Handling mixed v0/v1 and Classic CAN / CAN FD networks

How well we handle the transition between v0 and v1, and how well we handle mixed networks, will be one of the key things that determines whether v1 gains traction or whether users come to regret ever considering it.
We should expect that v0 will be common on production vehicles in the UAS space for at least the next 3 years, and 1MBit operation will be common for even longer. There will be lots of setups that are pure v1 at high bitrates before then (or at least I hope there will be!), but for many production environments making the switch will be a slow process. I’m opening this topic to have a place to discuss how we will handle this transition period. My apologies if this is already covered somewhere that I have missed. Note that the issue is a lot more than just wire-protocol compatibility.

Types of mixed networks
There are several quite distinct types of mixing that will matter:

  • mixtures of nodes that can only do v0 with v1 capable nodes
  • a single sensor node that is configured to broadcast the same data both as v0 and v1
  • an autopilot configured to accept both v0 and v1 sensors
  • mixtures of v1 nodes that are FDCAN capable and those that are not
  • mixtures of data consumers (of which autopilots are one example) that have different capabilities, such as companion computers or safety systems that only understand v0 while the flight controller understands v1.

I know that my nomenclature with things like “sensor node” goes against the philosophy of v1. That is quite deliberate. Users will see a physical device on their vehicle and the primary purpose of the device is providing sensor data. That is a “sensor node”. Hiding that distinction does not help.

Key Issues
A whole pile of interesting (and likely frustrating) issues will come up with the various types of mixing. Some of the key ones are:

  • the existence of diagnostic tools (such as config and bus monitoring tools) that understand these mixed networks and can help system integrators navigate the issues
  • alignment of node identification between v0 and v1
  • automatic upgrading of bitrates on FDCAN if all nodes are capable of higher rates
  • possibly supporting the “cover your ears now” method of doing high data rates when one or more nodes on the network can’t handle high rates
  • proper testing of mixed network scenarios so we discover the issues before they bite our end users (who may have pretty complex setups)

Diagnostic Tools
In the ArduPilot world users either use uavcan_gui_tool or they use MissionPlanner to analyse what is happening on their CAN buses. Both have packet inspectors, firmware update UIs, parameter control UIs etc.
We need to be able to point users at diagnostic tools that can handle all of the above mixed network scenarios, and that give them clear information on what is happening so they can diagnose and fix issues. That means the tools need to either bind to two separate APIs and combine them, or have a single API that can handle both v0 and v1. The display needs to make it really clear whether v0 or v1 is being used in each packet (more colors?) and needs to have filtering capabilities to allow them to be separated. It is not really good enough to have to launch separate tools for v0 and v1.
We also need these tools to be smart enough to make it easy to relate a packet they are seeing in the bus trace to a physical device. The abstractions in v1 may make this harder.
Being able to save a full bus trace in a format that can be uploaded to a forum for analysis will also be very useful, much like we use tlog for mavlink logs.

Node Alignment
One potentially tricky issue is nodes that are broadcasting the same sensor data as both v0 and v1. If a flight controller is on the network it needs to know that it is a duplicate or we will end up double fusing the data, which would not be good. Even just the user seeing on their ground stations “you have 3 GPS modules” when they only have 2 physically connected is something we should avoid.
We’ve had this issue in the past with things like the Fix and Fix2 GNSS messages in v0. For that, ArduPilot keeps track of whether it has ever received a Fix2 from a particular node, and if it has then it discards any Fix messages from that node. That prevents the double fusion of the data for that special case.
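The Fix/Fix2 rule described above can be sketched roughly as follows. This is only an illustration of the idea, not ArduPilot's actual code: the class name `GnssFixFilter`, the method `accept`, and the string message-type labels are all made up for the example.

```python
class GnssFixFilter:
    """Drops legacy Fix messages from any node that has ever sent Fix2.

    Hypothetical sketch; names and message labels are assumptions.
    """

    def __init__(self):
        self._fix2_nodes = set()  # node IDs that have sent at least one Fix2

    def accept(self, node_id, msg_type):
        """Return True if the message should be passed on for fusion."""
        if msg_type == "Fix2":
            self._fix2_nodes.add(node_id)
            return True
        if msg_type == "Fix":
            # Legacy message: only use it if this node never sent Fix2.
            return node_id not in self._fix2_nodes
        return True  # other message types are not affected by this filter


gnss_filter = GnssFixFilter()
decisions = [
    gnss_filter.accept(42, "Fix"),   # nothing known about node 42 yet
    gnss_filter.accept(42, "Fix2"),  # node 42 is now known to send Fix2
    gnss_filter.accept(42, "Fix"),   # duplicate of the Fix2 stream
    gnss_filter.accept(43, "Fix"),   # node 43 only ever sends Fix
]
```

Note that the filter keys on the node ID, which is exactly the piece of information that is in question for the mixed v0/v1 case.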
How do we do this for mixed v0/v1? I must admit that I’m still quite fuzzy on the node ID stuff in v1, but the bit that I think I may understand makes me nervous about this case. How can we robustly detect that a v0 GNSS packet is a duplicate of a packet from the same node that came in as v1? I do hope the answer is not going to be that we shouldn’t support such mixing, as that would just cause v1 deployment to be deferred a lot longer.

Bitrate Negotiation
We really should aim for as much of this mixed network stuff to be automatic as possible. That includes the bitrate negotiation. Ideally, if the user only plugs in nodes that are capable of higher bitrates, the default action should be that all the nodes automatically switch to the higher rate. It should be possible to configure nodes not to do this, but I’d like to see if we can do it automatically by default.
It is made tricky because it isn’t just a matter of whether the node has a FDCAN capable peripheral; we also have to know that its transceiver is capable of higher rates. For example, the pretty common TJA1051 is only rated at 5MBit, whereas a lot of boards (likely most high end boards?) have transceivers capable of 8MBit.
It is also complicated by the fact that most UAVCAN sensors currently commercially available are only capable of 1MBit. We hope we can do updated firmwares for these that will make them v1 capable, but we can’t make them capable of more than 1MBit. We should not be asking our users to throw out their existing equipment, so we need to cope gracefully while these are on the network.
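As a rough sketch of the selection rule implied above: the network can only go as fast as its slowest member, and a single node without a CAN FD controller keeps everyone on Classic CAN. How the capabilities would actually be discovered over the bus is left open here; `NodeCapability`, `negotiate` and the example node names are hypothetical.

```python
from dataclasses import dataclass

CLASSIC_BPS = 1_000_000  # the 1MBit Classic CAN rate discussed in this thread


@dataclass
class NodeCapability:
    # All fields are assumptions made for this sketch.
    name: str
    fd_controller: bool       # does the MCU have a CAN FD peripheral?
    transceiver_max_bps: int  # rated maximum of the fitted transceiver


def negotiate(nodes):
    """Return ("classic", rate) or ("fd", data_rate) for the whole bus."""
    if not all(n.fd_controller for n in nodes):
        # One non-FD node is enough to keep the bus on Classic CAN.
        return ("classic", CLASSIC_BPS)
    # Both the controller and the transceiver matter: take the lowest rating.
    return ("fd", min(n.transceiver_max_bps for n in nodes))


fd_only = [
    NodeCapability("autopilot", True, 8_000_000),
    NodeCapability("gps", True, 5_000_000),
]
with_legacy = fd_only + [NodeCapability("old_compass", False, 1_000_000)]

fd_result = negotiate(fd_only)          # limited by the 5MBit transceiver
legacy_result = negotiate(with_legacy)  # falls back to Classic CAN
```

A real scheme would also need to restrict the result to a set of agreed standard rates and cope with nodes joining after the negotiation has finished.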

Cover your Ears Method
Sometimes we will really want higher bitrates for a bulk transfer (e.g. a firmware update) while there is still a node on the network that can’t handle high bitrates. One possible way to handle this is for the initiator of the transfer to send out a broadcast message effectively saying “everyone who is not 8MBit capable please stop listening for the next NNN milliseconds”. You wouldn’t want to do this while flying, but for ground config it could make things a lot faster. Do we want to try and implement this?
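To make the idea a bit more concrete, here is a minimal sketch of the listener side. No such message exists today as far as this thread is concerned; the class `QuietWindowListener`, the method names and the request fields (required bit rate, duration in milliseconds) are all assumptions.

```python
class QuietWindowListener:
    """Tracks whether a node should currently ignore the bus.

    Hypothetical sketch of the "cover your ears" idea.
    """

    def __init__(self, max_data_bps):
        self.max_data_bps = max_data_bps
        self._quiet_until_ms = 0

    def on_quiet_request(self, now_ms, required_bps, duration_ms):
        """Handle the broadcast sent by the initiator of a fast transfer."""
        if self.max_data_bps < required_bps:
            # This node cannot follow the fast traffic, so it stops listening.
            # Overlapping requests extend the window, never shorten it.
            self._quiet_until_ms = max(self._quiet_until_ms,
                                       now_ms + duration_ms)

    def is_listening(self, now_ms):
        return now_ms >= self._quiet_until_ms


slow_node = QuietWindowListener(max_data_bps=1_000_000)
fast_node = QuietWindowListener(max_data_bps=8_000_000)
for node in (slow_node, fast_node):
    node.on_quiet_request(now_ms=1000, required_bps=8_000_000, duration_ms=500)
```

On real hardware, "stop listening" would presumably also mean keeping the CAN controller from flagging the fast frames as errors, which this sketch does not model.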

That should be enough to get this discussion going. It is certainly not a comprehensive list of the issues we’re likely to encounter as we start to add v1 to the network, but it should give some starting points for discussion.

The philosophy of UAVCAN v1 does not have anything to say about sensor nodes. DS-015 does, but it does not “hide” anything, it just models the network differently. I explained this in the adjacent thread; let’s keep this one clear of DS-015.

This is true. I actually have a solution for this that I intend to work on personally in the foreseeable future. If you require it urgently, let me know, then I will assign this task a higher priority.

I have outlined the solution very briefly in the Wireshark thread: UAVCAN v1 Wireshark plugin - #2 by pavel.kirienko. The plan is to define a binary log format based on uavcan.metatransport serialized using UAVCAN/serial. It will be protocol-agnostic and implementable both in software (such as Yakut) and hardware nodes.

This is trivially handled by subject configuration. Suppose there is node A that broadcasts certain data using both v0 and v1. Then there is subscriber B that is v0-only and subscriber C that is also mixed v0/v1. Then, seeing as v0 does not require subject configuration (there are data type identifiers instead of that), you don’t need to do anything to link A and B, they just work. Likewise, leaving node C not subscribed to the v1 subjects ensures that only the v0 link is active.
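The A/B/C example can be modelled in a few lines. This is a toy model under stated assumptions, not a real configuration API: nodes are plain dictionaries, `active_links` is a made-up helper, and the subject ID 1500 is arbitrary.

```python
UNSET = None  # stands for "subject not configured" in this sketch


def active_links(publisher, subscriber):
    """List the protocol versions over which data flows between two nodes."""
    links = []
    # v0 needs no configuration: fixed data type IDs make it just work.
    if publisher["v0"] and subscriber["v0"]:
        links.append("v0")
    # v1 only works when both ends are configured with the same subject ID.
    pub_id, sub_id = publisher["v1_subject"], subscriber["v1_subject"]
    if pub_id is not UNSET and pub_id == sub_id:
        links.append("v1")
    return links


node_a = {"v0": True, "v1_subject": 1500}   # publishes as both v0 and v1
node_b = {"v0": True, "v1_subject": UNSET}  # v0-only subscriber
node_c = {"v0": True, "v1_subject": UNSET}  # mixed, v1 left unsubscribed
node_c_both = {"v0": True, "v1_subject": 1500}

links_ab = active_links(node_a, node_b)
links_ac = active_links(node_a, node_c)
links_ac_both = active_links(node_a, node_c_both)
```

The last case shows the situation the integrator has to avoid: if C is subscribed on v1 while still accepting v0, it receives the same data twice.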

Removal of data type identifiers greatly simplifies the design and configuration of networks.


Regarding the bit rate, the plan adopted by DS-015 is derived from SAE J2284-4, where the data bit rate is fixed at four times the arbitration bit rate. Here is an excerpt from the DS-015 specification:

This means that the maximum is currently limited to 4 Mbps.
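The arithmetic behind that limit, as a small sketch. Only the factor of four and the 1 Mbps / 4 Mbps pair come from this thread; the other arbitration rates in the loop are illustrative assumptions, not values taken from DS-015.

```python
DATA_TO_ARBITRATION_RATIO = 4  # data phase runs at 4x the arbitration rate


def data_bit_rate(arbitration_bps):
    """Data-phase bit rate for a given arbitration-phase bit rate."""
    return DATA_TO_ARBITRATION_RATIO * arbitration_bps


# Illustrative arbitration rates (assumed); 1 Mbps is the maximum.
rates = {arb: data_bit_rate(arb) for arb in (250_000, 500_000, 1_000_000)}
```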

In general, DS-015 defines not only the application layer but also the physical layer, which can be discussed separately:

Classic CAN vs CAN-FD

Sadly, this problem does not have a clear-cut solution. The lack of a large selection of affordable MCUs with CAN-FD capability (especially more than 1 CAN-FD controller) paired with the current silicon shortage is likely to drag adoption out for the next few years at least. The biggest issue is that a single node that is not FD-compatible causes problems for the entire network. A band-aid solution to this might be to implement conversion nodes as @tridge mentioned, to connect a classic network to an FD one. This is far from ideal, I know, but it might accelerate adoption a bit. If anyone has any good ideas on how to go about this, please speak up :slight_smile:

Consider the Texas Instruments TCAN4550. I have never worked with it myself but it seems mostly harmless: a single-chip solution that incorporates a CAN FD controller along with the physical layer driver. By virtue of being external, it can be cheaply galvanically isolated from the host via SPI (isolating SPI is easier than isolating CAN FD, because CAN FD is latency-sensitive).

@pavel.kirienko Thanks for the recommendation, I hadn’t seen this chip and it seems quite convenient. However, am I missing anything, or is it out of stock just like every other chip on the market?

The lead time seems better than on some other chips, though.

It might be out of stock now, but I wouldn’t expect this to last for much longer. If you are designing a new product, the current situation should not matter that much.

This looks really nice.

Perhaps one option for flight controllers with two CAN ports is to run v0 on CAN1 and v1 on CAN2. While I understand that not all FCs have multiple CAN ports, at least it’s something.

Hi David. I don’t think that would help much because v0 and v1 can run on the same bus and v0 frames can be robustly differentiated from v1 ones (this is explained in the Specification, section 4.2). If you segregated the protocols by CAN bus, you would still have the same problems to solve at the application layer.
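For readers who have not looked at that section, a minimal sketch of how a bus monitor might tell the two apart. It relies on my understanding that the toggle bit in the tail byte (the last payload byte of each frame) starts at 1 in v1 and at 0 in v0; the function name and return convention are made up, and the Specification remains the authority on the details.

```python
START_OF_TRANSFER = 0x80  # tail byte bit 7
END_OF_TRANSFER = 0x40    # tail byte bit 6
TOGGLE = 0x20             # tail byte bit 5; bits 0-4 carry the transfer ID


def protocol_version_from_tail(tail_byte):
    """Return 0 or 1 for a start-of-transfer frame, otherwise None.

    Only the first frame of a transfer is decisive, because the toggle
    bit alternates on the following frames of a multi-frame transfer.
    """
    if not tail_byte & START_OF_TRANSFER:
        return None  # caller has to remember what the first frame said
    return 1 if tail_byte & TOGGLE else 0


v1_single = protocol_version_from_tail(0xE5)  # SOT, EOT, toggle=1
v0_single = protocol_version_from_tail(0xC5)  # SOT, EOT, toggle=0
mid_frame = protocol_version_from_tail(0x25)  # not a start of transfer
```

A real tool would apply this per transfer and carry the result over to the remaining frames of that transfer.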