Uavcan.rs v1 progress tracking

As we talked about in the dev call, I’m creating this to track some of the progress and issues with the v1 re-write. Preferably, specific tracking issues for individual tasks will be created (as opposed to the current overall tracking issue).

Previous discussion/Tasks to complete

Copy pasted from my email conversation with Alexander, the main tasks are:

Two API issues that need to be addressed.

  • Timestamping in a no_std environment. Right now I use std::time::Instant to handle everything, which doesn’t fly in a no_std environment. It will likely require some tweaks to the overall API as well.
  • Random source node IDs (for use in anonymous frames). Not really essential; it can be pushed to the user’s application. Thinking about it right now, that’s probably what I’ll do anyway.
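
To illustrate the timestamping issue, here is a minimal sketch of how the time source could be abstracted behind a trait so that library code never names std::time::Instant directly. All names here (Clock, StdClock, ManualClock, is_timed_out) are hypothetical and not the crate’s actual API; in a real crate the std-backed implementation would presumably sit behind a `std` feature.

```rust
use core::cell::Cell;

/// Hypothetical clock abstraction (illustrative, not the crate's API).
pub trait Clock {
    /// Monotonic timestamp type supplied by the platform.
    type Instant: Copy;
    /// Returns the current timestamp.
    fn now(&self) -> Self::Instant;
    /// Microseconds elapsed from `earlier` to `later`.
    fn micros_between(&self, earlier: Self::Instant, later: Self::Instant) -> u64;
}

/// Hosted implementation backed by `std::time::Instant`.
/// In a no_std-by-default crate this would be gated on a `std` feature.
pub struct StdClock;

impl Clock for StdClock {
    type Instant = std::time::Instant;

    fn now(&self) -> Self::Instant {
        std::time::Instant::now()
    }

    fn micros_between(&self, earlier: Self::Instant, later: Self::Instant) -> u64 {
        later.duration_since(earlier).as_micros() as u64
    }
}

/// Embedded-style implementation: a microsecond counter that the
/// application advances itself (e.g. from a timer interrupt).
pub struct ManualClock {
    micros: Cell<u64>,
}

impl ManualClock {
    pub fn new() -> Self {
        ManualClock { micros: Cell::new(0) }
    }

    /// Moves the clock forward by `micros` microseconds.
    pub fn advance(&self, micros: u64) {
        self.micros.set(self.micros.get().wrapping_add(micros));
    }
}

impl Clock for ManualClock {
    type Instant = u64;

    fn now(&self) -> Self::Instant {
        self.micros.get()
    }

    fn micros_between(&self, earlier: Self::Instant, later: Self::Instant) -> u64 {
        later.wrapping_sub(earlier)
    }
}

/// Library-side code only sees the trait, never a concrete time type.
pub fn is_timed_out<C: Clock>(clock: &C, started: C::Instant, timeout_micros: u64) -> bool {
    clock.micros_between(started, clock.now()) >= timeout_micros
}
```

The trade-off is that the clock type leaks into the generic parameters of anything that stores timestamps, which is the kind of API tweak mentioned above.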

And general work to be done before release is:

  • Message structure generation (via templating in nunavut)
  • Message serialization/deserialization (can be nunavut or via Rust macros, I’m leaning towards macros but not by a lot)
  • UAVCAN/UDP transport support
  • no_std, but alloc-capable SessionManager

Work that I would like to be done, but not considered essential for release:

  • no_std, fully static SessionManager (I don’t think anyone actually wants that other than me, so this isn’t really essential)
  • UAVCAN/Serial transport support
  • some form of automated testing to continuously verify interoperability with pyuavcan

Dev call details

Minor Notes

  • Crate should provide no_std by default, only enable std with a feature
  • Probably borrow canadensis’ solution for Instant (check the canadensis_core crate)
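
As a rough idea of what a small, no_std-friendly instant type can look like, the sketch below uses a wrapping 32-bit microsecond counter. It is not canadensis’ code and makes no claim about that crate’s API; the name Micros32 and its methods are assumptions made for illustration.

```rust
/// Hypothetical 32-bit microsecond timestamp.
/// It wraps around roughly every 71.6 minutes (2^32 microseconds).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Micros32(pub u32);

impl Micros32 {
    /// Microseconds elapsed since `earlier`.
    /// Wrapping subtraction keeps the result correct across one wrap-around.
    pub fn duration_since(self, earlier: Micros32) -> u32 {
        self.0.wrapping_sub(earlier.0)
    }

    /// The timestamp `micros` microseconds later, wrapping on overflow.
    pub fn add_micros(self, micros: u32) -> Micros32 {
        Micros32(self.0.wrapping_add(micros))
    }
}
```

Such a type only needs `core`, so it works with or without the `std` feature enabled.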

Alexander’s tasks (subject to modification due to thesis and general progress)

  • Work on implementing more generic timestamping
  • Work on no_std SessionManager
  • Work on the proc_macro side of generation (David will do Nunavut templating - should be pretty easy)

Action Items

  • (David) Get example working again (or more robust against dependency issues if that’s the problem)
  • (David) Create tracking issues for smaller tasks

There was also some talk about future abstractions on top of the library. I generally didn’t include any of that in my roadmap because I wanted to focus on the core, but there is lots of room for upward growth and nicer things. (Specifically talked about implementing things like Heartbeat)

@alexander_huebener I got the example working again, either the pyuavcan API changed slightly or I was using it wrong in the first place (likely the latter :slight_smile: ).

The Rust side also needed some work for my latest changes, that has been updated too - note that it currently needs nightly for GATs.

Created the following issues to track specific tasks:

@pavel.kirienko could you provide @alexander_huebener with write access to the main repo so he doesn’t have to fork it?

I’ve also migrated david/rearchitecting to v1, as a nicer main branch name, so please rebase all future work off of v1 and merge there until we’re ready for a proper release.

Sure.

Alexander, please share your GitHub username so I could send you an invitation. Thanks.

I’m very sorry for the delayed response. I had to do some groundwork on my thesis to get it heading in the right direction.

@pavel.kirienko My GitHub username is teamplayer3.

@david.lenfesty
Right now, I’m working on basic CAN usage on my STM32G4 dev board, first in C++ and then in Rust. I’m trying out the Arduino C++ library for UAVCAN support. In general, I need a reference implementation for my thesis, which would be the C++ one, to evaluate the Rust implementation against. After that, I want to test the uavcan crate on the controller.

Done, you should receive the invitation now.

Says that you are invited:

Please navigate here and see if you can accept the invitation on the page directly: uavcan · GitHub

Ah. Here we go. Thanks. I don’t know why there is no notification in GitHub. Or is this normal?

Idk, maybe you opted out of their emails or it ended up in spam or whatever. Things fail.

Porting the crate to an embedded system:

I want to give a quick overview of my progress.
I want to port the crate to a microcontroller of the STM32G4 series. For that, I use the stm32G431B development board. The microcontroller has a built-in CAN controller. It supports FDCAN, which UAVCAN considers future-proof.

My first step was to get CAN working with the CUBE IDE of STMicroelectronics. That step was not that difficult because the IDE helps a lot with code generation.

After that, I wanted to get it working in Rust. For that, I set up a small project with the cortex-m-quickstart template. Getting an LED blinking was easy. The next step was to set up CAN. In Rust, there is the bxcan crate, but as I discovered, it does not support FDCAN. To overcome this, I started a crate that generates Rust bindings to the STM32 C HAL. I got this working, so now I can use nearly the same code as in CUBE IDE. The biggest problem with this is that most of the code is unsafe. But finally, I can send CAN messages in Rust.

The next thing would be to write a safe wrapper around the bindings, or to write a crate for FDCAN support on STM32 controllers. I think for my thesis I will go with the first approach: rely on the STM32 C HAL and write a proper interface.
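
A minimal sketch of the "safe wrapper" idea, under stated assumptions: hal_fdcan_send_raw is a stand-in written in Rust for a generated binding (a real binding would be an `extern "C"` declaration with a different name and signature), and FdCanTx and SendError are hypothetical names. The point is only that the single `unsafe` call is confined to one place and its preconditions are checked beforehand.

```rust
/// Stand-in for a generated binding to the C HAL (hypothetical name and
/// signature). Returns 0 on success, non-zero on error.
unsafe fn hal_fdcan_send_raw(id: u32, data: *const u8, len: usize) -> i32 {
    let _ = id; // the simulation ignores the identifier
    if data.is_null() || len > 64 {
        1
    } else {
        0
    }
}

/// Errors the safe interface can report.
#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    /// The payload exceeds the 64-byte CAN FD maximum.
    PayloadTooLong,
    /// The HAL returned a non-zero status code.
    Hal(i32),
}

/// Safe handle for the transmit side of the peripheral.
pub struct FdCanTx {
    _private: (),
}

impl FdCanTx {
    pub fn new() -> Self {
        FdCanTx { _private: () }
    }

    /// Sends one frame; callers never touch raw pointers.
    pub fn send(&mut self, id: u32, data: &[u8]) -> Result<(), SendError> {
        if data.len() > 64 {
            return Err(SendError::PayloadTooLong);
        }
        // SAFETY: `data` is a valid slice, so the pointer is non-null and
        // readable for `data.len()` bytes for the duration of the call.
        let status = unsafe { hal_fdcan_send_raw(id, data.as_ptr(), data.len()) };
        if status == 0 {
            Ok(())
        } else {
            Err(SendError::Hal(status))
        }
    }
}
```

A real wrapper would also have to deal with the discrete CAN FD frame lengths and with ownership of the peripheral, which this sketch leaves out.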

The next step is to port the uavcan.rs crate to the microcontroller and see what steps have to be done.

@pavel.kirienko may correct me on this, but CAN FD is not a hard requirement for UAVCAN. You can just as well use good old CAN 2.0B (29 bit addresses, 8 byte data), as 107-Arduino-UAVCAN is currently doing. Of course CAN FD can deliver up to 64 bytes of data, so it’s the more future-proof transport layer (I intend to port the Arduino library to CAN FD eventually), but it’s not a must-have to get started.

Yes, you are right @aentinger. CAN FD is not a hard requirement. I will correct it.

The specification states:

“This section specifies a concrete transport based on ISO 11898 CAN bus. Throughout this section, “CAN” implies both Classic CAN 2.0 and CAN FD, unless specifically noted otherwise. CAN FD should be considered the primary transport protocol.”
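
To make the practical difference of the small Classic CAN MTU concrete, here is a small sketch that counts how many Classic CAN frames a transfer needs. It assumes my reading of the UAVCAN/CAN v1 transport: every frame carries one tail byte, and multi-frame transfers append a 2-byte transfer CRC to the payload. The function name is made up, and CAN FD (with its padding rules) is deliberately not covered.

```rust
/// Classic CAN data field size in bytes.
const CLASSIC_CAN_MTU: usize = 8;
/// One tail byte per frame (assumption based on the v1 CAN transport).
const TAIL_BYTE: usize = 1;
/// Transfer CRC appended to multi-frame transfers only.
const TRANSFER_CRC_BYTES: usize = 2;

/// Number of Classic CAN frames needed for a payload of `payload_len` bytes.
pub fn classic_can_frames(payload_len: usize) -> usize {
    let per_frame = CLASSIC_CAN_MTU - TAIL_BYTE; // 7 payload bytes per frame
    if payload_len <= per_frame {
        // Fits into a single frame; no transfer CRC is added.
        1
    } else {
        // Multi-frame: payload plus CRC, split into 7-byte chunks (rounded up).
        let total = payload_len + TRANSFER_CRC_BYTES;
        (total + per_frame - 1) / per_frame
    }
}
```

For example, under these assumptions a 12-byte payload needs two frames and a 19-byte payload needs three.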

Right now, I am doing some timing analysis and got a weird result. I measured the transmission of a frame. For this, the loop has to iterate once.

Measurements were taken on a Cortex-M4 at 170 MHz.

I got a result of 21 micros when I measured the following code:

-- start --
for frame in node.transmit(&transfer).unwrap() {
    ...
}
-- stop --

To measure only the time the library needs, I subtracted the time spent in the loop body. Compared with the time the Arduino lib needs (~12 micros), I thought this was too much and searched for the reason.

I decided to split the for loop into separate function calls. It looks like this:

let mut iter = get_iter_from_transfer(node, &transfer, clock);
let frame = get_next_frame(&mut iter);
transmit_fdcan(frame, can);

#[no_mangle]
fn get_iter_from_transfer<'a>(
    node: &mut Node<HeapSessionManager<CanMetadata, Milliseconds<u32>, StmClock>, Can, StmClock>,
    transfer: &'a Transfer<StmClock>,
    clock: &MonoTimer,
) -> CanIter<'a, StmClock> {
    node.transmit(&transfer).unwrap()
}

#[no_mangle]
fn get_next_frame(iter: &mut CanIter<StmClock>) -> CanFrame<StmClock> {
    iter.next().unwrap()
}

#[no_mangle]
fn transmit_fdcan(frame: CanFrame<StmClock>, can: &mut FdCan<FDCAN1, NormalOperationMode>) {
    ...
}

Now I got these results:

  • get_iter_from_transfer => ~21 micros
  • get_next_frame => ~0 micros
  • transmit_fdcan => ~3 micros (only for completeness, not in calculations)

Next, I looked into the transmit function. The only thing it does is create a CanIter. I copied the CanIter struct into my project, measured the time inside its new function, and got a result of 0 micros.

I couldn’t figure out why the function call takes so long when, measured from inside the function, no time is needed.

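// measured around the call
let start = clock.now();
CanIter::new(...).unwrap();
let elapsed = start.elapsed(); // ~21 micros

impl CanIter {
    fn new() {
        // measured inside the function
        let start = clock.now();
        ...
        let elapsed = start.elapsed(); // ~0 micros
    }
}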

Details on how I measured:

I took the MonoTimer of the stm32g4xx_hal. In Rust it looks like this:

let clock = MonoTimer::new(cp.DWT, cp.DCB, &rcc.clocks);

let start = clock.now();
...
let elapsed = start.elapsed();

let micros = clock.frequency().duration(elapsed).0;
info!("elapsed: {} micros", micros);

I repeated the measurements on my Windows PC and got these values.

  • get_iter_from_transfer => ~1000/700 nanos
  • get_next_frame => ~100/200 nanos
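
For measurements on a PC, a helper along the following lines can be used. This is only a sketch of one possible harness, not the one used for the numbers above; time_it and time_runs are hypothetical names. It uses std::time::Instant and std::hint::black_box so the compiler is less likely to optimize the measured call away.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed time.
pub fn time_it<R>(f: impl FnOnce() -> R) -> (R, Duration) {
    let start = Instant::now();
    // black_box keeps the result "used" so the call is not optimized out.
    let result = black_box(f());
    (result, start.elapsed())
}

/// Runs `f` `runs` times and returns one sample per run, in order.
/// Keeping the first sample separate makes first-call costs visible.
pub fn time_runs<R>(runs: usize, mut f: impl FnMut() -> R) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            black_box(f());
            start.elapsed()
        })
        .collect()
}
```

Reporting the first sample next to the minimum of the rest helps separate one-time setup cost from steady-state cost.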

This shows that on the PC, too, creating the CanIter takes more time than getting the next frame.

My thoughts:

I think this could be related to creating the CanIter struct on the stack, which is a bit strange. But I don’t have a real idea why.

Furthermore, I think this is more of a Rust thing, but I thought these measurements would be interesting for the development.

Resolved my measurement problem. When a new CanIter is created, a CRC struct is initialized. This creation involves copying a long array that is used as a lookup table for the CRC, which took around 20 micros. An optimization is to use a static lookup table and only save the state of the CRC sum on every iteration of the CanIter.

This gets addressed in this issue.
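
A minimal sketch of that optimization, assuming the transfer CRC is CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). The names are illustrative and this is not the code from the crate: the lookup table is built at compile time and lives in a static, so the per-transfer state is only the 16-bit running value.

```rust
/// CRC-16/CCITT-FALSE polynomial.
const POLY: u16 = 0x1021;

/// Builds the 256-entry lookup table at compile time.
const fn build_table() -> [u16; 256] {
    let mut table = [0u16; 256];
    let mut i = 0;
    while i < 256 {
        let mut crc = (i as u16) << 8;
        let mut bit = 0;
        while bit < 8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ POLY } else { crc << 1 };
            bit += 1;
        }
        table[i] = crc;
        i += 1;
    }
    table
}

/// Shared, read-only table; it is never copied when a CRC is created.
static CRC_TABLE: [u16; 256] = build_table();

/// Per-transfer CRC state: just the running 16-bit value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TransferCrc {
    value: u16,
}

impl TransferCrc {
    /// Starts a new CRC with the initial value 0xFFFF.
    pub const fn new() -> Self {
        TransferCrc { value: 0xFFFF }
    }

    /// Feeds more bytes into the running CRC.
    pub fn add(&mut self, data: &[u8]) {
        for &byte in data {
            let index = ((self.value >> 8) as u8 ^ byte) as usize;
            self.value = (self.value << 8) ^ CRC_TABLE[index];
        }
    }

    /// Returns the current CRC value.
    pub fn get(&self) -> u16 {
        self.value
    }
}
```

With this layout, creating a new CRC state copies two bytes instead of the whole table.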

Performance measurements

Did some more measurements on receiving messages.

In these measurements, I measured the time that elapsed during library calls. Specifically, from the point where a single received frame was fed into the library until the library reported a successful receive. As a transport protocol, I used CAN with an MTU of at most 8 bytes. No physical CAN is involved in these measurements. The controller on which these tests ran had the following specs:

Cortex M4 170 MHz

Both compilers are set to optimization level 3 (speed optimization).

  • Rust: rustc 1.57.0-nightly (11491938f 2021-09-29)

  • C++: gcc version 9.2.1 20191025

Arduino:


// single frame
-- start --
uc.onCanFrameReceived(frame);

// callback
void on_data_receive(CanardTransfer const &transfer, ArduinoUAVCAN &uavcan)
{
  -- stop --
}

Rust:


// single frame
-- start --
if let Some(_) = node.try_receive_frame(frame).unwrap() {
    -- stop --
}

Measurements are done with one subscription and one session. No data types are used (not supported in Rust right now).

As a reference, the Arduino (C++) implementation performed as follows:

Frames (n)   Payload [bytes]   Time [micros]
1            7                 9
2            12                18
3            19                24
8            61                55

Rust measurements

  • Used allocator for heap: rlsf

  • Used container: alloc crate

  • Clock precision: 6 nanos (measures in micros)

Frames (n)   Payload [bytes]   Time 1 [micros]   Time 1.1 [micros]   Time 1.2 [micros]
1            7                 6                 5                   (5) 2
2            12                15                9                   (9) 5
3            19                22                12                  (12) 8
8            61                44                29                  (27) 23

(Value in brackets is the time of the first message)

Description of different measurements

  • (1) These measurements were taken without optimization (before this commit).

  • (1.1) Taken after this commit. The session buffer is initialized with the maximum message length, so the buffer doesn’t have to reallocate when more frames are received. This can be seen by comparing the reception of one frame versus two frames. The time saved by this optimization shows up in multi-frame transfers.

  • (1.2) For these measurements, the subscription::update() function was refactored (commit). Now, old sessions get reused, which means the buffer of the session does not need to be reallocated. This gives a performance boost after the first message has been received. In detail, a new session is created for the first message; after that, the following transfers can reuse the same session object.
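
The two optimizations can be sketched as follows. This is a simplified illustration with hypothetical names (Session, push_frame, reset), not the code from the commits: the buffer is allocated once at the maximum transfer length, and a finished session is cleared and reused instead of being dropped and recreated.

```rust
/// Simplified receive session holding the reassembly buffer.
pub struct Session {
    buffer: Vec<u8>,
}

impl Session {
    /// Allocates the buffer once, sized for the largest expected transfer,
    /// so appending frames never triggers a reallocation.
    pub fn new(max_transfer_len: usize) -> Self {
        Session {
            buffer: Vec::with_capacity(max_transfer_len),
        }
    }

    /// Appends the payload of one received frame.
    pub fn push_frame(&mut self, payload: &[u8]) {
        self.buffer.extend_from_slice(payload);
    }

    /// Prepares the session for the next transfer.
    /// `clear` drops the contents but keeps the allocation.
    pub fn reset(&mut self) {
        self.buffer.clear();
    }

    /// Bytes received so far.
    pub fn payload(&self) -> &[u8] {
        &self.buffer
    }

    /// Currently allocated capacity in bytes.
    pub fn capacity(&self) -> usize {
        self.buffer.capacity()
    }
}
```

In this sketch, only the first transfer pays for the allocation; later transfers on the same session reuse it, which corresponds to the bracketed first-message times versus the following ones.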

Conclusion

Even without optimization, the Rust implementation performed better than the Arduino implementation (C++). The optimizations had a large impact on the measured times and have not led to any bugs so far. This shows the potential of Rust. The uavcan.rs lib uses good abstractions and, in my measurements, is faster than the C/C++ implementation.

Further investigation

Further scenarios must be measured. This means varying the number of subscriptions and the number of sessions per subscription. This could lead to different numbers because of higher memory usage, but that depends heavily on the allocator implementation used. It could be interesting to measure with different allocator implementations.

To take this to the next level, some kind of benchmark suite would be nice for measuring performance across new releases. Maybe I will spend some time on this.

Awesome work!

Sorry for the lack of activity on your PRs, school picked up a bit pretty recently for me so I haven’t had the proper time to manage the PRs and general progress as much as I would like, but hopefully next week I’ll have some proper time to get all this work merged in (and maybe work on FD-CAN support).

Hi :wave: Good work!

I’m just piping in because I do not share your conclusion; imho you’ve chosen the wrong reference implementation to measure the Rust implementation against.

You’d get a more realistic comparison by creating a libcanard-only embedded application and measuring its execution time. 107-Arduino-UAVCAN is a convenient wrapper for libcanard but certainly not the end-all/be-all when it comes to optimisation. Also, the Arduino platform does not let you choose custom compilation flags, so there’s little chance for any kind of “fine-tuning”.

Of course building a custom embedded application requires more work but the comparison would be more meaningful that way.

Cheers, Alex

Thanks for your feedback @aentinger.

I wanted to compare the implementation to the Arduino one because it has a nice user interface and does most of the work on its own (C++ type system, not too much to keep in mind during setup). I see the same things in the Rust implementation.
I know that in embedded systems it takes more than installing a library and taking the example for everything to be optimized. I think this is where the strength of the Rust implementation lies. It has a user-friendly abstraction of the underlying mechanisms, but this does not impact performance.

For compiling the Arduino example, I use platformio, which lets me set all the compiler flags I want. Maybe I have to investigate further which flags I can set to make it faster. But I think this is another point where Rust shines. There are not that many things to keep in mind. On one hand, this could be a good thing; on the other, it could be bad.

In my opinion, 107-Arduino-UAVCAN is such a thin wrapper around the C implementation that it doesn’t matter whether I take the pure C implementation or the Arduino one. A second point, which I didn’t point out clearly: I don’t use the type system of UAVCAN for the measurements.

This means I use the private function subscribe() for receiving (no type serialization), and for transmission I use enqeueTransfer(). These two functions do nothing more than I would do with the pure C implementation, but have the advantage of the C++ type system.

I hope these facts make the comparison clearer and explain why I use the Arduino one. In my opinion, this is the right choice. I know I should have explained my intention in more detail.

If there is more that I did not see or that I have to keep in mind, let me know. It helps me a lot to see it from different points of view.
If I have some time, I can do the performance tests with the pure C implementation.

You flatter me :blush: However, there are also dynamic memory allocations outside o1heap (the first time you transmit anything, there’s a std::map involved), so depending on how you measured your timings, this might also factor into the required time. But I’ll stop bad-mouthing my code now :wink:. Regardless, I stand by my recommendation to use libcanard for direct comparisons.

Yes, this is one reason why I use that implementation. This happens in the Rust impl as well; those are the measurements in brackets (first receive).

I have now also done measurements for the transmission part. I will show them soon.

Can you show me the source code line? I can’t spot it quickly.