UAVCAN v1 crash course

One of the core design principles of UAVCAN is simplicity. That means two things: the protocol is constructed in a straightforward way and it is trivial to apply. We believe that this goal is achieved by the current design; yet, there were several occasions where our fellow humans would exclaim:

But if your protocol is so simple, how come it takes over a hundred pages to describe? :face_with_monocle:

That question arises from a misunderstanding of how design specifications work. Among other objectives, UAVCAN is designed to facilitate the robust interoperability of equipment from different vendors in high-integrity applications. This requires that every detail of the protocol be meticulously specified to ensure that, first, there are no unforeseen behaviors that might jeopardize a safety-critical application; second, that every implementation of the protocol can interoperate successfully with any other spec-compliant implementation, possibly from a different vendor. The formal project documentation also enables the integration of UAVCAN into high-assurance V-model systems design workflows.

As a result, a concept that takes one sentence to explain in everyday speech takes two pages in the Specification. This post provides a legalese-free description of the protocol as a response to the misperception of UAVCAN’s complexity.

While we are at it, the post also covers how to migrate from the earlier experimental revision of the protocol commonly known as UAVCAN v0. We are not going to cover how the design decisions were made because that would take forever, considering that even a simple idea might turn out to be a huge can of worms if explored in depth. If you want the background, feel free to search this forum because this is the place where many of the decisions were discussed and agreed upon (which often involved much back-and-forth bikeshedding).

Basics

UAVCAN is a composition of two decoupled parts: the presentation layer and the transport layer. The presentation layer is modeled by DSDL, meaning data structure description language, which describes data formats and how that data is to be serialized and interpreted. The transport layer determines how to transfer serialized data structures over the network.

There are different kinds of data structures:

  • Messages. A node may publish a message using a specific numerical subject identifier. Using that subject identifier, another node or several may subscribe to specific messages. This is the standard publish-subscribe pattern. This is typically the main mode of communication in UAVCAN applications and it is possible to implement a system using only messages.

  • Services. A node may send another node a request identified by a specific numerical service identifier. That other node would receive the request and then maybe send a response back. This is the standard client-server model.

When discussing subject or service identifiers generically we call them port identifiers. There’s nothing special about this term and any port identifier is always either a subject or service identifier in concrete terms.
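
To make the two interaction patterns concrete, here is a toy in-memory model written for this post. It is not how any UAVCAN library works: the class ToyNetwork, the service-ID 42, and the handler functions are all hypothetical, and there is no serialization or transport involved. It only illustrates that messages are routed by subject-ID to any number of subscribers, while a service request is addressed to one specific node.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


class ToyNetwork:
    """Hypothetical in-process stand-in for a network; illustration only."""

    def __init__(self) -> None:
        self._subscribers: Dict[int, List[Callable[[bytes], None]]] = defaultdict(list)
        self._servers: Dict[Tuple[int, int], Callable[[bytes], Optional[bytes]]] = {}

    def subscribe(self, subject_id: int, handler: Callable[[bytes], None]) -> None:
        self._subscribers[subject_id].append(handler)

    def publish(self, subject_id: int, message: bytes) -> None:
        # Every subscriber of this subject gets the message; nobody else does.
        for handler in self._subscribers.get(subject_id, []):
            handler(message)

    def serve(self, server_node_id: int, service_id: int,
              handler: Callable[[bytes], Optional[bytes]]) -> None:
        self._servers[(server_node_id, service_id)] = handler

    def call(self, server_node_id: int, service_id: int, request: bytes) -> Optional[bytes]:
        # A request goes to exactly one node; it may or may not respond.
        handler = self._servers.get((server_node_id, service_id))
        return handler(request) if handler else None


network = ToyNetwork()
received: List[bytes] = []
network.subscribe(4919, received.append)
network.publish(4919, b"hello")            # delivered
network.publish(1234, b"nobody listens")   # no subscribers, silently dropped
network.serve(59, 42, lambda request: request.upper())
print(received, network.call(59, 42, b"ping"))
```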

Messages and service calls are exchanged between nodes. A given hardware unit could implement one node (e.g. an air-speed sensor might be a single UAVCAN node) or many nodes (e.g. a flight controller might have several functions that each act as a UAVCAN node). UAVCAN is a stateless protocol, meaning a node can join the network and begin operation immediately upon powering on without any kind of registration or preparatory data exchange with other participants. This is an important feature as it enables highly deterministic fault-tolerant systems. UAVCAN is a democratic protocol, meaning that there is no such thing as a “master” or any other kind of centralized intelligence – all nodes are equal.

Presentation

Anybody familiar with C-like languages should immediately feel at home. Suppose there’s a file my_project/MyMessageType.1.0.uavcan, where my_project is a namespace directory and 1.0 is the version number of the data type:

uint16 VALUE_LOW  = 1000
uint16 VALUE_HIGH = 2000
uint16 VALUE_MID = (VALUE_HIGH + VALUE_LOW) / 2  # Exact rational arithmetic!
uint16 value
uint8[<=100] key  # Variable-length array, at most 100 items.

The wire representation is straightforward. The byte order is little-endian and variable-length arrays (like key here) are prepended with a length prefix which is either uint8, uint16, uint32, or uint64:

$ pyuavcan dsdl-gen-pkg dsdl_src/my_project  # Compile the namespace
$ python             # Whip out the python to test it using PyUAVCAN
>>> import pyuavcan, my_project
>>> serialized = pyuavcan.dsdl.serialize(my_project.MyMessageType_1_0(
...     value=1234, key='Hello world!'))
>>> bytes(next(serialized))
b'\xd2\x04\x0cHello world!'

Here’s what we see: 1234 = 0x04D2, so it is serialized as [0xD2, 0x04]. The greeting encoded in UTF-8 is 12 bytes long, so the length prefix is 0x0C, and it is immediately followed by the bytes of Hello world!. You can generate and parse such serialized representations using auto-generated code (Nunavut will help you here) or you can just twiddle bytes manually if you don’t want to get your hands dirty with automatic transcompilers.
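
For those who prefer twiddling bytes manually, a minimal sketch using only the Python standard library might look like this. The function name is made up for this post, and the one-byte length prefix is an assumption based on the array capacity (100 fits into a uint8); check the Specification before relying on it for other types.

```python
import struct


def serialize_my_message_1_0(value: int, key: bytes) -> bytes:
    """Hand-rolled serializer for my_project.MyMessageType.1.0 (illustration only)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit into uint16")
    if len(key) > 100:
        raise ValueError("key holds at most 100 items")
    # uint16 value in little-endian order, then the uint8 length prefix, then the items.
    return struct.pack("<HB", value, len(key)) + key


print(serialize_my_message_1_0(1234, "Hello world!".encode("utf-8")))
# b'\xd2\x04\x0cHello world!'
```

The constants do not appear in the output because they are not part of the serialized representation.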

The crucial thing to note here is that DSDL does not exist at runtime. As a specification language, DSDL can be read by humans to serialize and deserialize objects manually, and by machines to generate such serialization and deserialization code automatically. An embedded system does not know anything about DSDL, because at the time it is deployed DSDL has already done its job.

Suppose you manufactured a gazillion devices using the above definition and then you suddenly realized that the definition is deficient. You can’t just migrate all devices to a newer version at once because 1/4 gazillion of these devices are already in the field (sales have been brisk)! At this point the concept of semantic compatibility will become extremely prominent in your life. The UAVCAN designers endured two years of occasionally heated debates about data type versioning and in the end, they summoned the implicit truncation rule and the implicit zero extension rule into existence. Here’s how they work:

# my_project/MyMessageType.1.1.uavcan
uint16 VALUE_LOW  = 1000
uint16 VALUE_HIGH = 2000
uint16 VALUE_MID = (VALUE_HIGH + VALUE_LOW) * 0.5  # Rational arithmetic.
uint16 value
# This definition has no key. Who needs keys anyway?

# my_project/MyMessageType.1.2.uavcan
uint16 value
uint8[<=100] key     # The key is back.
float64 extra_value  # A new field!

We have two new versions of the same type which look quite different, but they are all semantically compatible :exploding_head:. A node may publish my_project/MyMessageType.1.1, another node may subscribe and deserialize the message using my_project/MyMessageType.1.2, and they would communicate just fine thanks to the implicit zero extension rule. The rule says that if the deserializer expects more data than there is, it shall assume that it’s just zeros all the way down, so the key would look empty and the extra field would be zero. If the nodes reversed their roles, the implicit truncation rule would enter the scene, which says that if there’s more data than the node expected, it should pretend that the extra data isn’t there at all. At the DSDL level, a related concept is structural polymorphism or structural sub-typing.
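
Both rules are easy to imitate by hand. The sketch below is an illustration written for this post, not code from any UAVCAN library; the function names are hypothetical, and it assumes the same byte-aligned layout as the example above (uint16, then a one-byte length prefix, then the items, then a little-endian float64).

```python
import struct


def deserialize_my_message_1_2(data: bytes):
    """Hand-rolled deserializer for version 1.2 (illustration only)."""
    # Implicit zero extension: behave as if the buffer continued with zeros.
    # 2 + 1 + 100 + 8 = 111 bytes is the longest possible 1.2 representation.
    padded = data + bytes(111)
    value, key_length = struct.unpack_from("<HB", padded, 0)
    if key_length > 100:
        raise ValueError("length prefix exceeds the array capacity")
    key = padded[3:3 + key_length]
    (extra_value,) = struct.unpack_from("<d", padded, 3 + key_length)
    return value, key, extra_value


def deserialize_my_message_1_1(data: bytes) -> int:
    """Hand-rolled deserializer for version 1.1 (illustration only)."""
    # Implicit truncation: read the one field we know, ignore whatever follows.
    (value,) = struct.unpack_from("<H", data + bytes(2), 0)
    return value


from_1_1 = struct.pack("<H", 1234)                                   # published as 1.1
from_1_2 = struct.pack("<HB", 1234, 2) + b"hi" + struct.pack("<d", 1.5)  # published as 1.2
print(deserialize_my_message_1_2(from_1_1))  # (1234, b'', 0.0)
print(deserialize_my_message_1_1(from_1_2))  # 1234
```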

There are two other things that are sometimes relevant: tagged unions and service types. A tagged union is a way of encoding one value out of several possible options (like std::variant<> in C++ or enumerations in Rust; C has no direct equivalent). The encoded value is prepended with a byte that says which option it is:

@union  # This directive adds one byte in front of the message.
uint16 integer       # If the tag is zero, it's an integer.
uint8[<=100] string  # If the tag is one... You know the drill.
my_project.MyMessageType.1.2 my_object  # Yeah, composition.
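
As a rough illustration of the wire format of this union, here is a hand-written Python sketch. The function name is hypothetical, the third option (my_object, which would get tag 2) is left out for brevity, and the one-byte length prefix of the string is the same assumption as in the earlier examples.

```python
import struct


def serialize_union(option: str, value) -> bytes:
    """Hand-rolled serializer for the union above (illustration only)."""
    if option == "integer":
        # Tag 0 selects the first field, followed by a little-endian uint16.
        return bytes([0]) + struct.pack("<H", value)
    if option == "string":
        if len(value) > 100:
            raise ValueError("string holds at most 100 items")
        # Tag 1 selects the second field, followed by its length prefix and items.
        return bytes([1, len(value)]) + bytes(value)
    raise ValueError("unsupported option: " + option)


print(serialize_union("integer", 1234))  # b'\x00\xd2\x04'
print(serialize_union("string", b"hi"))  # b'\x01\x02hi'
```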

A service type is defined by inserting three minus characters (---) somewhere into the definition, which separates service request schema from the response schema. If you have experience with ROS, you already know everything there is to know.

Looking at the examples here might help:

The public regulated data types define certain standard application-level functions such as the heartbeat message uavcan.node.Heartbeat, the only application-level function that every UAVCAN node is required to support. Except for publishing a Heartbeat once a second, every other application-level function is optional and can be implemented at the discretion (or lack thereof) of the designer. The documentation for such application-level behaviors is provided right in the comments of the respective DSDL definitions so that everything is kept conveniently in one place.

The last thing you need to know about the presentation layer of UAVCAN is how subject and service identifiers (aka port identifiers) are assigned unique numbers. Suppose there is a node that publishes messages of type my_project.MyMessageType.1.2 or provides a service of such and such type. How does it know what exact port to use? The vendor of the node could go the UAVCAN v0 way and just hard-code a specific identifier, but you might see, perhaps, how this could get out of hand? Another vendor would do the same thing; collisions galore! So we say this:

  • If the vendor really needs that fixed port identifier, it should send a pull request with the new data type definition to the public regulated data types repository linked above. The UAVCAN maintainers will be picky about which types are allowed into the regulated data type set; if the proposed type serves a very specific use case of a small vendor, it might get rejected. Think of it like the USB standard classes or CANopen standard profiles.

  • If the above does not hold (it rarely does), the vendor shall provide the ability to reconfigure the subject/service identifier by the end-user or integrator (such identifiers are called non-fixed identifiers). Failure to do so will render the device not UAVCAN-compliant and will cause many headaches for the customer.

  • If the vendor happens to be using UAVCAN in a closed project with no exposure to the outside world the vendor can do as it pleases (in most countries). Nobody cares, really.

Seems restrictive? That’s the cost of robust interoperability.

The specification and the public regulated data types repository document the ranges of port identifiers that can be used with fixed and non-fixed identifiers; in UAVCAN parlance the former are called regulated identifiers and the latter are called unregulated identifiers.

This is a case where the specification might actually be pretty clear so we’ll repeat table 2.1 here to sum up this section:

         | regulated                | unregulated
public   | robust interoperability  | no fixed port identifiers/must be configurable
private  | (nope, not a thing)      | sure, just keep it to yourself

Transport

The job of the transport layer is to ferry serialized objects around the network (such occurrences are called transfers) and to facilitate topic-based filtering. The transports are designed with the requirements of high-integrity applications in mind, which include strict temporal predictability guarantees, redundant interfaces, and more exotic concepts like tunable reliability controls for very special snowflakes.

There are several transport protocols designed on top of different networking technologies such as CAN (FD) (called UAVCAN/CAN) or UDP/IP (called UAVCAN/UDP); they are replaceable, meaning that the presentation layer and the application on top of it are isolated from the specifics of the transport and can be migrated from one transport to another easily.

The transport protocols are designed to support topic-based data filtering in hardware, such that when the application requests a particular subscription or a service, the transport layer configures the underlying hardware to accept the relevant messages/requests/responses and to reject the rest automatically. All common implementations of high-speed and/or real-time networking hardware, like CAN controllers or Ethernet adapters, provide the necessary functionality out of the box so the software doesn’t need to sift through copious amounts of data in real-time.

UAVCAN/CAN

The UAVCAN/CAN transport is the direct successor of the experimental UAVCAN v0, extended with CAN FD support. Anyone familiar with v0 will have no trouble migrating to v1 because, in essence, it relies on the same core concepts with just a few bits shifted around. The specification of UAVCAN/CAN is hardly five pages long, so implementing it from scratch should generally be a no-brainer that takes barely more than a few hundred lines of code. Even that is unlikely to be necessary given that there are portable MIT-licensed implementations available.

Not to replicate the specification but for clarity’s sake, the CAN transport treats both Classic CAN and CAN FD equivalently, the only difference being the maximum transmission unit (MTU, the amount of data per CAN frame). Most of the metadata (such as the subject/service ID, source node ID, and priority) is packed into the CAN ID in the most obvious way possible, except for four things: the transfer-ID and the three flags, which are start-of-transfer, end-of-transfer, and the toggle bit. These four things go into the last byte of the frame payload, also known as the tail byte.
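
To give a feel for what "the most obvious way possible" means, here is a sketch that packs and unpacks the CAN ID of a message frame. The bit positions are an assumption inferred from the CAN IDs in the console dumps later in this post (0x1013373B for subject 4919 from node 59), as are the field widths and the priority value 4; the function names are made up. Service frames are not covered. Treat the Specification as the authority on the actual layout.

```python
def make_message_can_id(priority: int, subject_id: int, source_node_id: int) -> int:
    """Pack a message-frame CAN ID (layout assumed from the dumps in this post)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority is assumed to be 3 bits wide")
    if not 0 <= subject_id <= 0x7FFF:
        raise ValueError("subject-ID is assumed to be 15 bits wide")
    if not 0 <= source_node_id <= 0x7F:
        raise ValueError("node-ID is assumed to be 7 bits wide")
    return (priority << 26) | (subject_id << 8) | source_node_id


def parse_message_can_id(can_id: int):
    """Return (priority, subject-ID, source node-ID) under the same assumptions."""
    return (can_id >> 26) & 0x7, (can_id >> 8) & 0x7FFF, can_id & 0x7F


print(hex(make_message_can_id(4, 4919, 59)))  # 0x1013373b
print(parse_message_can_id(0x107D553B))       # (4, 32085, 59)
```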

The transfer-ID is a peculiarity of the UAVCAN jargon – most other protocols call it the sequence number. It is an integer that is incremented every time a message of a specific subject is published or a specific service is invoked, and it is quite paramount for many functions of the protocol.

The flags are only used when the serialized entity does not fit into a single frame, which means that the transfer is a multi-frame transfer. Multi-frame transfers may appear convoluted but essentially they are a zero-cost feature because any other implementation that takes into account all relevant edge cases (which are many) will end up being functionally similar. The purpose of the flags should be evident: the start-of-transfer and the end-of-transfer flags mark the first and the last frame of the transfer, respectively, and the toggle bit toggles, starting from one.
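
A tail byte can be picked apart with a few masks. The sketch below assumes the following layout, which is not spelled out in this post: start-of-transfer in bit 7, end-of-transfer in bit 6, toggle in bit 5, and the transfer-ID in the five least significant bits. It agrees with the tail bytes in the console dumps further down, but the Specification has the final word. The names are made up for this post.

```python
from typing import NamedTuple


class TailByte(NamedTuple):
    start_of_transfer: bool
    end_of_transfer: bool
    toggle: bool
    transfer_id: int


def parse_tail_byte(tail: int) -> TailByte:
    """Decode a UAVCAN/CAN tail byte (bit positions are an assumption, see above)."""
    return TailByte(
        start_of_transfer=bool(tail & 0x80),
        end_of_transfer=bool(tail & 0x40),
        toggle=bool(tail & 0x20),
        transfer_id=tail & 0x1F,
    )


for tail in (0xE0, 0xE1, 0xA0, 0x00, 0x60):
    print(hex(tail), parse_tail_byte(tail))
```

A single-frame transfer has both the start and the end flag set with the toggle at one, which is why such frames end with 0xE0 plus the transfer-ID.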

It is well known that one console dump is worth 1024 words. Suppose we start up the PyUAVCAN CLI tool and use it to publish a message of the type we defined above over a GNU/Linux SocketCAN interface using the subject-ID 4919 (0x1337 in hex) from node-ID 59:

# Generate Python packages from DSDL namespaces:
$ pyuavcan dsdl-gen-pkg https://github.com/UAVCAN/public_regulated_data_types/archive/master.zip dsdl_src/my_project
# Publish a message:
$ pyuavcan pub 4919.my_project.MyMessageType.1.0 '{value: 1234, key: "Hello world!"}' --tr='CAN(can.media.socketcan.SocketCANMedia("vcan1",64),59)'

Meanwhile, having started the candump utility in another terminal, we observe the following developments:

# Columns: timestamp, iface, flags (B means BRS), CAN ID, [payload size], payload.
$ candump -decaxta any
(1.365)  vcan1  TX B -  107D553B  [08]  00 00 00 00 20 89 00 E0   '.... ...'
(1.366)  vcan1  TX B -  1013373B  [16]  D2 04 0C 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 E0
(2.366)  vcan1  TX B -  107D553B  [08]  01 00 00 00 20 89 00 E1   '.... ...'

The first and the last frames here are the heartbeats from the CLI tool – remember, they are mandatory for all nodes. In the following example, we equip a pair of sunglasses and publish the same message but the MTU is set to 8 bytes, forcing the publisher to resort to a multi-frame transfer:

$ pyuavcan pub 4919.my_project.MyMessageType.1.0 '{value: 1234, key: "Hello world!"}' --tr='CAN(can.media.socketcan.SocketCANMedia("vcan2",8),59)'
$ candump -decaxta any
(7.925)  vcan2  TX - -  107D553B   [8]  00 00 00 00 20 3D 01 E0   '.... =..'
(7.925)  vcan2  TX - -  1013373B   [8]  D2 04 0C 48 65 6C 6C A0   '...Hell.'
(7.925)  vcan2  TX - -  1013373B   [8]  6F 20 77 6F 72 6C 64 00   'o world.'
(7.925)  vcan2  TX - -  1013373B   [4]  21 F9 02 60               '!..`'
(8.926)  vcan2  TX - -  107D553B   [8]  01 00 00 00 20 3D 01 E1   '.... =..'

Look at it go. The last two bytes at the end of the transfer (right after the exclamation mark !) are the multi-frame transfer CRC – the CRC-16-CCITT function of the serialized representation. It is needed to let the receiver ensure that the received multi-frame transfer is reassembled correctly.
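
The frames above can be reproduced with a short sketch. It assumes the CRC-16-CCITT variant with the polynomial 0x1021 and the initial value 0xFFFF, the CRC appended most significant byte first, and the tail byte layout visible in the dump (start, end, and toggle flags in the three upper bits, transfer-ID below them). The function names are made up for this post, and CAN FD padding is deliberately ignored, so do not mistake this for a complete implementation.

```python
from typing import List


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT (polynomial 0x1021, initial value 0xFFFF assumed)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def split_transfer(payload: bytes, mtu: int, transfer_id: int) -> List[bytes]:
    """Split a serialized object into CAN frame payloads (no CAN FD padding)."""
    data_per_frame = mtu - 1  # The last byte of every frame is the tail byte.
    if len(payload) <= data_per_frame:
        # Single-frame transfer: start, end, and toggle are all set; no CRC.
        return [payload + bytes([0xE0 | (transfer_id & 0x1F)])]
    body = payload + crc16_ccitt(payload).to_bytes(2, "big")
    chunks = [body[i:i + data_per_frame] for i in range(0, len(body), data_per_frame)]
    frames = []
    toggle = True  # The toggle bit starts from one.
    for index, chunk in enumerate(chunks):
        tail = transfer_id & 0x1F
        if index == 0:
            tail |= 0x80  # start-of-transfer
        if index == len(chunks) - 1:
            tail |= 0x40  # end-of-transfer
        if toggle:
            tail |= 0x20
        frames.append(chunk + bytes([tail]))
        toggle = not toggle
    return frames


payload = b"\xd2\x04\x0cHello world!"
for frame in split_transfer(payload, mtu=8, transfer_id=0):
    print(frame.hex(" ").upper())
# D2 04 0C 48 65 6C 6C A0
# 6F 20 77 6F 72 6C 64 00
# 21 F9 02 60
```

With mtu=64 the same call returns a single 16-byte frame ending with E0, as in the first dump.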

UAVCAN/whatever

UAVCAN/UDP and UAVCAN/serial are very new transports that are reviewed in detail in other sources like the PyUAVCAN docs or the post on Alternative transport protocols. Essentially they all reify the same concepts using the functional means provided by the underlying networking technology. For example, in the case of UAVCAN/UDP, the subjects and services are manifested as UDP port numbers, which is optimal because it allows the implementation to offload the network traffic handling to the standard IP stack and the underlying networking hardware.

Migration from UAVCAN v0

The protocol has been simplified noticeably since v0, and several design issues were resolved. Here is the full list of substantial changes:

  • The Data Type ID has been removed. Without going much into detail, it was coupling the syntax of data (a data type definition) with its semantics (how it was used). It was the cause of certain architectural imperfections in the applications that relied on v0, so proceeding further without resolving that issue was considered undesirable. In v1, this is resolved with the new concepts of Subjects and Services, which are decoupled from the type identity and permit surjective mapping of subjects or services onto types, rendering UAVCAN architecturally identical to conventional publish-subscribe frameworks.

  • The Data Type Signature went the same way. It was shown to make data type definitions unnecessarily difficult to evolve. The new design permits polymorphic subtyping and arbitrary modification of data types so that deployed systems can be upgraded incrementally. This is the second most important upgrade after the syntax-semantics decoupling shown above.

  • The multi-frame transfer CRC is no longer pre-seeded with anything. This is a direct consequence of the above. Also, in UAVCAN/CAN v1 the CRC has been moved to the end of the transfer.

  • Data type definitions can now be explicitly versioned and evolved sensibly. Messed up a type? No problem, just release a new version.

  • Tail Array Optimization is removed. Every array has a length prefix now, always.

  • The implicit fields (array length and union tag) are now either 8, 16, 32, or 64-bit wide. They used to have odd sizes like uint3. This change simplifies data type design and serialization.

  • The byte order is kept little-endian but the bits are now populated LSB-to-MSB, not the other way around. This change provides enhanced compatibility with 3rd-party tools and enables faster serialization and deserialization on conventional little-endian microarchitectures (big-endian platforms shall convert the byte order during serialization and deserialization).

  • The CAN ID bit layout of UAVCAN/CAN v1 is different and the toggle bit starts with 1 instead of 0. The toggle bit change is to make v0 and v1 distinguishable at runtime, enabling their coexistence in the same application.

  • UAVCAN/CAN v1 supports CAN FD.

  • New logo. Better. Bluer.
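
To illustrate the bit order item from the list above, here is a small sketch of LSB-to-MSB packing as this post's author of the example understands it: each field is placed at the next free bit position counting from the least significant bit, and the resulting bytes are emitted in little-endian order. The function name and the field widths are made up for illustration.

```python
from typing import Iterable, Tuple


def pack_lsb_first(fields: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, bit_length) pairs, least significant bit first (illustration only)."""
    accumulator = 0
    bit_offset = 0
    for value, bit_length in fields:
        if not 0 <= value < (1 << bit_length):
            raise ValueError("value does not fit into the given bit length")
        accumulator |= value << bit_offset  # Next field goes above the previous one.
        bit_offset += bit_length
    return accumulator.to_bytes((bit_offset + 7) // 8, "little")


# A 3-bit field holding 5 followed by a 5-bit field holding 3 share one byte:
print(pack_lsb_first([(5, 3), (3, 5)]).hex())  # 1d
# A 9-bit field spills its top bit into the lowest bit of the second byte:
print(pack_lsb_first([(0x1FF, 9)]).hex())      # ff01
```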

There are no changes that affect the hardware. A single unit or a whole system can migrate from v0 to v1 by a trivial software update. Said software update amounts to a few things:

  • Replace your old v0 library with its v1 equivalent. The API will be slightly different but architecturally they are all alike. Some things got new names; for example, the Data Type ID is now the subject/service-ID. Some things are removed completely, which makes development easier: no more Data Type Signature or Tail Array Optimization.

  • Don’t forget to RTFM! All libraries are supplied with extensive documentation.

  • Add version numbers to your DSDL definitions and remove the manual padding before variable-length arrays and unions. Read the design guidelines in the public regulated DSDL repository and consider the recommendations about idempotency, if applicable. Don’t forget that there’s no tail array optimization anymore.

It doesn’t take much effort on the software side and there are zero repercussions for the hardware. Most importantly, the new implementations are built to a much higher quality standard. The only valid reason for hodling onto v0 is the legacy, and we are actively working with vendors to ensure their speedy convergence on v1.

Background

Further reading:

Here are some of the threads that shaped v1 or describe the key design decisions ordered by date (oldest first):

The list is incomplete, search the forum and GitHub for more. Some of the sources have been linked in this post already.

Conclusion

What perhaps speaks best about the level of complexity is the fact that the protocol can be implemented in a little over 1000 lines of code and work in a device with ca. 32K ROM, 8K RAM (OpenGrab EPM v3, Thiemar S2740VC), taking only a few kibibytes of ROM for itself. The old PX4 UAVCAN bootloader fits into 8K ROM using a custom protocol implementation.

The best place to start for a newcomer is probably the PyUAVCAN demo, as it allows experimenting on the local machine without much preparation. The key concepts are easily transferable to other implementations such as the deeply embedded real-time Libcanard.

The project accepts donations via OpenCollective.

This is a wiki post that can be edited by any regular user at this forum. If you notice a mistake, a piece of obsolete information, or see something that can be improved, please update it. Thanks!
