
UAVCAN/serial: issues with DMA friendliness and bandwidth overhead

Hi @pavel.kirienko and the forum,

I’m posting this in the hope of better understanding which requirements are central to the use case UAVCAN/serial attempts to address. While the UAVCAN ecosystem has the potential to become something amazing, the issues below would either keep me away from it or force me into an incompatible fork, at least where the serialization format is concerned.

My major points are:

  1. The byte stuffing approach is not DMA-friendly.
    This approach can work well in HDLC, where it is implemented by hardware logic. Doing byte stuffing on a microcontroller, on a high-throughput data stream that could otherwise be handled by DMA, would have a major performance impact.

  2. Guaranteeing that the link can always support 100% overhead is not practical.
    This is a real concern because data is often not high-entropy, so the “transparency” of the channel is an illusion: the throughput limit is also a critical channel parameter.
    For instance, if forced to use such an underlying channel implementation in one of my current applications (a high-speed data logger), I would need to implement an “anti-escaping randomizer” before handing the data to the channel. Without it, some states of the world would be guaranteed to overflow the channel limits due to escaping.
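To illustrate the worst case, here is a minimal sketch of HDLC-style stuffing (the 0x7E delimiter and 0x7D escape values are assumptions for illustration, not taken from the draft): a low-entropy payload consisting entirely of delimiter bytes doubles in size after stuffing.

```python
# Hypothetical HDLC-style byte stuffing: 0x7E delimits frames, 0x7D escapes.
DELIM, ESC = 0x7E, 0x7D

def stuff(payload: bytes) -> bytes:
    out = bytearray()
    for b in payload:
        if b in (DELIM, ESC):
            out += bytes([ESC, b ^ 0x20])  # escape, then flip bit 5 (HDLC-style)
        else:
            out.append(b)
    return bytes(out)

print(len(stuff(bytes(64))))            # benign payload: 64 bytes, 0% overhead
print(len(stuff(bytes([DELIM]) * 64)))  # pathological payload: 128 bytes, 100% overhead
```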

If you acknowledge that these issues are relevant and worth addressing, I’d be happy to collaborate on some alternative that would make the protocol usable to me and others with similar constraints.

If not, well, one more custom framing protocol …

Hey Vadim. We are certainly open to new proposals since this is an ongoing design process and the existing draft is expected to be suboptimal.

The core requirements presented to the serial encapsulation format are transparency and implementation simplicity. The simplicity requirement is probably self-evident and requires no further explanation – the simplicity of the whole protocol is one of its core design goals.

The communication channel transparency here is to be understood as the lack of coupling between the probability of a failure occurring in the communication channel and the data exchanged over the channel. We define “failure” as a temporary disruption of communication, undetected data corruption, or misinterpretation of data.

The designer of a high-integrity system has to perform a comprehensive analysis of the failure modes to prove the behavioral properties of the system. It is convenient to approach the task by assuming that there is a malicious agent attempting to subvert the system, even though the problem is not related to security in general.

If you adopt that mindset, you can quickly observe that any type of communication channel that does not offer robust separation between data and control symbols is non-transparent because it is always possible to construct such a data symbol sequence that can be interpreted as a control sequence. Embed a valid frame into another frame and evaluate if the former frame is represented as a valid frame on the communication channel; if yes, the protocol can be subverted by invalidating the control symbols of the outer frame, causing the parser state machine to interpret (part of) the data as a control sequence.

For example, suppose a frame is defined as (this is a very common pattern): (control-symbol, length, payload). Then, define payload as (control-symbol, length2, payload2). If the first control-symbol is missed, can the protocol state machine determine that the following symbols are to be ignored?

Another issue pertaining to transparency is fault recovery. An encapsulation that does not offer robust segregation of control symbols may exhibit poor worst-case failure recovery time if the parser state machine mislabels incoming control symbols as data or vice versa due to data corruption (or malicious activity of the sender).

For example, suppose that in the above example the length prefix is damaged to indicate a very long value, resulting in the parser state machine incapacitating the communication channel until either the expected length is traversed or a failure recovery heuristic is triggered. Such heuristics tend to introduce undue constraints upon the protocol and complicate the analysis of the system, so they are to be avoided.

One might exclaim that the last failure case is addressable with recursive parsers, and it is true, but it does not help against the first issue and recursive parsers are hard to implement in a system whose resource consumption has to be tightly bounded.

As mentioned in the PyUAVCAN documentation, consistent overhead byte stuffing (COBS) may be leveraged as a means of implementing transparent channels while keeping the overhead low, but this approach suffers from poor error recovery characteristics (what happens if a large-valued data symbol is misinterpreted as a COBS control symbol, or a correctly placed COBS symbol is corrupted such that it indicates a large skip distance?).

You might be tempted to think that these considerations have an academic feel and are hardly relevant in practical applications. This is not true. First, as I mentioned already, if you are dealing with a high-integrity system then fringe cases have to be analyzed. Second, a sufficiently noisy channel may occasionally trigger failure cases in very real applications leading to potentially dangerous circumstances; for example, you may have seen already the discussion related to the VESC motor control protocol whose poor design may cause safety issues.

Non-transparent encapsulation formats are still used successfully in many applications where a high degree of functional safety is not required or the communication medium is assumed to be robust, such as GNSS hardware (e.g., RTCM, u-blox), communication modems (e.g., XBee), etc. Assumptions that are valid in such applications do not hold for UAVCAN.

If you look at protocols designed for high-reliability applications, you will see that they implement transparent channels virtually universally. The commonly-known MODBUS RS-485 comes to mind, where the control-data segregation is implemented with the help of strict timing requirements (impractical for UAVCAN). DDS/XRCE offers a serial transport design recommendation which is based on HDLC as well, for the same reasons.

The statement “byte stuffing approach is not DMA friendly” is true but the whole truth is “stream-oriented serial interfaces like UART are not DMA friendly”. Either we have to live with that or I am missing something, in which case please correct me.

Hey, thanks for the detailed answer.

I’ll start with the points of complete agreement:

  • clarity of design and simplicity of implementation - extremely important
  • robustness and fault recovery - no serious system can be built on a foundation that lacks those
  • careful “academic-like” analysis - there is no alternative unless we’re building hobby toys.

So with those out of the way, I think the presented dichotomy between robustness and byte stuffing is a false one. It should be possible to construct a framing scheme that is arbitrarily robust and performs no payload modification.

Let’s start by taking your example of (control-symbol, length, payload) and modifying it into: (control-symbol, length, CRC1, payload, CRC2).
This scheme is as robust to control-symbol or length injection as the CRC is. By making the CRC larger, we can increase the robustness as far as we need it to go. The probability is easily calculated given the underlying bit error rate of the channel.

The resync algorithm would just have to read the triplet (control-symbol, length, CRC1), validate the CRC, and either accept or reject the header. The recovery is very fast - the first non-corrupted packet would be accepted.
(Side note: the security metaphor breaks down here. A malicious attacker would be able to forge the CRC and slip in any bad data undetected in any protocol implementation short of PKI. Denial of service with intentionally large packets is similarly out of scope, and exists in the stuffing option as well.)
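A rough sketch of that resync loop (the 0x7E control symbol, one-byte fields, and the additive header checksum are placeholders for whatever real CRC would be chosen):

```python
# Illustrative resync: scan for a (control, length, CRC1) triplet whose
# header CRC validates. Field widths and the additive checksum are stand-ins.
CONTROL = 0x7E

def find_header(stream: bytes):
    """Return (offset, length) of the first plausible header, or None."""
    for i in range(len(stream) - 2):
        if stream[i] != CONTROL:
            continue
        length, crc1 = stream[i + 1], stream[i + 2]
        if crc1 == (CONTROL + length) & 0xFF:  # header CRC over (control, length)
            return i, length
    return None

noise = b'\x13\x37'  # leftover garbage preceding the next intact header
header = bytes([CONTROL, 5, (CONTROL + 5) & 0xFF])
print(find_header(noise + header))  # -> (2, 5): first intact header accepted
```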

So I think the above proves that both stuffing and length+CRC have equally good recovery times and equally good protection properties (up to the strength of the CRC in both options). I would go further and say length+CRC also wins on implementation simplicity.

Another benefit of the length+CRC option lies in its consistent overhead with a much better worst case, and in block-transfer friendliness.

If we look, for example, at the STM32 family of microcontrollers, both the UART and USB peripherals have supported DMA for a while now (the latter with a lot of limitations, unfortunately, but hopefully they will improve).
So transferring a block of data over a serial port does not involve feeding the bytes one by one anymore, and this makes a non-stuffing protocol more performant on those platforms.

What do you think ?

I think this is incorrect. The extra CRC does protect against length corruption but it does not protect against malicious frames, unlike the transparent channel option.

Let us define a maliciously constructed frame as a frame whose payload is a valid frame itself which, if interpreted by the parser, may lead to undesirable consequences (such as a denial of service caused by a long length).

Any approach that does not offer strict separation of control symbols from data symbols is susceptible to this condition because one can trivially construct a frame and embed it into the payload knowing that the protocol will forward it verbatim. The approach based on byte stuffing is robust against this condition because it guarantees that a control-symbol cannot occur in the data section. Reception of a control-symbol forces the parser state machine to unconditionally begin a new frame, hence its behavior is simpler and easily predictable.
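To illustrate the point, here is a minimal receiver sketch (assuming HDLC-style 0x7E/0x7D values and omitting CRC validation): every delimiter unconditionally resets the state machine, so the frame following any amount of garbage is received intact; the garbage itself becomes a bogus frame that the omitted CRC check would reject.

```python
# Sketch: a stuffed-channel receiver whose state is reset unconditionally
# by every delimiter, regardless of any prior (possibly corrupted) state.
DELIM, ESC = 0x7E, 0x7D

def parse(stream: bytes):
    frames, buf, escaping = [], bytearray(), False
    for b in stream:
        if b == DELIM:          # hard reset: stuffed data can never contain DELIM
            if buf:
                frames.append(bytes(buf))
            buf, escaping = bytearray(), False
        elif escaping:
            buf.append(b ^ 0x20)
            escaping = False
        elif b == ESC:
            escaping = True
        else:
            buf.append(b)
    return frames

# The partial junk before the first delimiter forms a frame (which a real CRC
# would reject); the following frame is recovered intact no matter what:
print(parse(b'\x01\x02\x7eAB\x7e'))  # -> [b'\x01\x02', b'AB']
```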

Let us review your DMA issues and see if there is a sensible way to implement the transparent channel while avoiding excessive computational load on the MCU. Could you please describe in detail what desirable aspects of the implementation are unattainable with the current approach?

Also, regarding the variable overhead: if Scott Dixon was here, he would have certainly said that the variable overhead inherent to the byte stuffing may cause undesirable control coupling because the temporal properties of data transmissions become dependent on their contents:

I haven’t given much thought to that yet and I am not sure how critical this issue is, but one possible way to mitigate it is to always enforce 100% overhead regardless of the payload. For example, one radical solution would be to make the protocol text-based like NMEA-0183, exchanging data in printable hexadecimal characters. This is not meant to be a proposal; I am aware of the performance-related costs, but we need to be aware of this option.

After a bit more thinking it seems obvious that an efficient binary-to-text encoding format will allow us to replace the current variable overhead of 0%–100% with a near-constant overhead within 25%–33% while keeping the transparent channel.

A well-known Base64 encoding yields a virtually constant overhead of 33.(3)% (“virtually constant” because the padding introduces a minor uncertainty). There are more efficient but less commonly used alternatives like Ascii85/Base85/Z85 which offer 25% at the cost of the trailing padding being implicit.
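A quick way to see the near-constant overhead, using the standard base64 module (Z85 is not in the Python standard library, so only Base64 is demonstrated):

```python
import base64

# Base64 emits 4 output bytes per 3 input bytes, so the overhead converges to
# 33.3% regardless of the payload contents (compare with the content-dependent
# 0-100% range of byte stuffing). Padding causes minor jitter for small sizes.
for n in (3, 12, 100, 1024):
    enc = len(base64.b64encode(bytes(n)))
    print(n, enc, f'{(enc - n) / n:.1%}')
```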

Human-readability of the resulting format is an interesting feature considering that it is also currently intended for use as a log file format.

Should we explore this further?

I think this “vulnerability” applies equally to the length and stuffing approaches. Just as you can send a gigabyte length header, you can also begin sending a gigabyte packet with no delimiters. If anything, with the length approach this would be detected immediately, while with the stuffing approach - only when the packet buffer overflows. In both cases, the recovery action would be to discard and begin re-synchronization.

That brings up the point that any protocol which aims to guarantee interoperability should have some notion of an MTU, either static or negotiated. Otherwise there is no sensible choice of packet buffer size - the party with more RAM could always overrun the party with less. And you need a packet buffer to keep the data around until the trailing CRC arrives to validate it.

Also, lacking an explicitly defined threat model, I’m not sure how far you intend to take the security features. After all, a sufficiently motivated attacker with access to one side of the serial/USB line can also:

  • display insulting messages on the operator console
  • send fake data
  • apply a few kilovolts to the data lines to cause even more undesirable consequences

We do not define any security threat model at all because we are not dealing with security. In my original post I introduced the notion of a malicious agent as a convenient way of analyzing failure cases rather than intentional attacks. If you find this terminology confusing, we can replace “malicious agent” with “worst-case scenario”, all else unchanged.

I think you are mistaken in your assumption that the failure modes of a transparent channel and the non-transparent channel framing structures are equivalent. As I indicated above, the crucial difference is that a transparent channel link is able to recover from arbitrary prior state unconditionally upon the beginning of a new frame. This does not hold for the other option and I hope my previous replies provide an adequate illustration of that.

Do you see any flaws in my reasoning?

The RAM argument is independent of the MTU because the reassembly of a multi-frame transfer requires the target node to allocate sufficient memory to contain the entire reassembled message. The message cannot be processed sequentially due to the integrity check issues and certain complex edge cases pertaining to data serialization.

Irrespective of the above, the worst-case data requirement is deducible statically from the DSDL definitions – statically defined resource utilization limits are one of the core features of the protocol.

This is incorrect. Keeping the data to compute the CRC is unnecessary because all CRC hashes are single-pass algorithms. This property is the foundation of the implicit truncation rule that is already defined in the specification and used in the existing UAVCAN/CAN implementations, for example.
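The single-pass property can be illustrated with the standard zlib.crc32 (CRC-32 here merely stands in for whichever CRC the transport actually specifies): validating a stream does not require buffering it, because each chunk can be folded into the running value as it arrives.

```python
import zlib

# CRCs are computed incrementally: feeding the data chunk by chunk, passing
# the running value back in, yields the same result as one whole-buffer pass.
data = b'Hello world!' * 100
crc_whole = zlib.crc32(data)

crc_running = 0
for i in range(0, len(data), 64):  # e.g., one chunk per DMA block
    crc_running = zlib.crc32(data[i:i + 64], crc_running)

print(crc_running == crc_whole)  # -> True
```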

I think the flaw here lies in the treatment of the CRC. It is assumed “invincible” in the stuffing protocol, while treated as vulnerable in the analysis of the “protected length” variant. If we agree that a “malicious agent” is outside the scope of this analysis, then the correct thing is to assign a probability of a “CRC collision” due to random channel noise and use it consistently for both protocols.
If that qualitative reasoning is not convincing, I’ll try to come up with a numerical analysis to illustrate the point.

Ok. I was referring to this message size when I said MTU, rather than the transfer frame. I think for a serial protocol they are likely to be identical. So if the maximum message size is statically decidable, then any frame larger than this size should be rejected.

Of course. I meant to point out that the message can’t be processed and discarded in a streaming fashion before the CRC arrives and is validated, and that we need a message buffer. Seems we are in agreement here.

Sorry, I don’t follow. From my perspective, the probability of a CRC collision does not affect the current analysis in any way. The CRC can be excluded from the argument without altering its validity.

The core idea (unaffected by CRC) is that the transparent channel parser state machine is reset unambiguously with the beginning of the next frame (assuming that the frame delimiter is undamaged). This makes its behavior simple. Any non-transparent encoding that does not introduce an implicit differentiation between control symbols and data symbols cannot guarantee that by design.

The high and variable overhead introduced by the byte stuffing is obviously a disadvantage but I think it could be mitigated by using alternative representations as I mentioned above.

That behavior would interfere with the message extensibility feature built into the protocol. Even if it was implemented as you described, the utility of this constraint would be limited on a system where the maximum size of a message is large.

I think that is not correct - otherwise we could also say that an arbitrary noisy message would be accepted by either protocol. The CRC is a crucial part of the design, and excluding it from consideration leads to wrong conclusions.

The “length+CRC” encoding guarantees that in the same way the message tail CRC guarantees that a message corrupted by noise will be rejected. So I would call it “guaranteed by design” as well.

Yes, but this is a different failure mode. The one we were discussing in the last few posts here is not affected by the frame check sequence.

You are talking about a completely different failure mode that is handled identically by both approaches.

Here is a concrete example to augment my earlier replies. Imagine you have a frame format like (magic=0x7E ‘~’, length [1 byte], checksum [1 byte], payload [‘length’ bytes], checksum [1 byte]).

A trivial specimen would be:

>>> payload = b'Hello world!'
>>> bytes([0x7E, len(payload), len(payload)]) + payload + bytes([sum(payload) & 0xFF])
b'~\x0c\x0cHello world!]'
>>> old_payload = _

The following demonstrates the first of the failure cases I described:

>>> bytes([0x7E, len(old_payload), len(old_payload)]) + old_payload + bytes([sum(old_payload) & 0xFF])
b'~\x10\x10~\x0c\x0cHello world!]P'

If the first frame delimiter is lost, the parser would receive the original payload even though it was never transmitted.

The other is:

>>> payload = b'\x7E\xFF\xFF'
>>> bytes([0x7E, len(payload), len(payload)]) + payload + bytes([sum(payload) & 0xFF])
b'~\x03\x03~\xff\xff|'

If the first frame delimiter is lost, the parser would become stuck receiving the non-existent 255-byte-long frame.

The transparent channel encoding is robust against both of the failure cases. A non-transparent encoding is always prone to the described failures, by design.

Now, being equipped with these practical examples, can you please re-read my first reply and re-evaluate it?

Ok, I have re-read and re-evaluated.

Here’s a very similar attack sequence crafted against the transparent protocol. I’ve taken popcop as a specimen with 0x8e as the delimiter and one-byte sum() as the CRC, same as in your example.

>>> payload = b'\x42\x42\x0e'
>>> delim = b'\x8e'
>>> packet = delim + payload + bytes([sum(payload) & 0xFF]) + delim
>>> packet.hex()
'8e42420e928e'

Now if the noise flips the MSB of packet[3], turning it from 0x0e to 0x8e then the payload of 0x42 will be accepted even though it was never transmitted.

So I still do not see what forms a special failure case here that separates the two protocols.

Your second example is a little more interesting, since it crosses from the “accept wrong data” result to the “wait” result. Here my answer is that I consider both results to be of equal weight in a failure cost analysis. The probability of both errors is controlled by the CRC strength.

The probability of a collision here is packet_error_rate / 2^CRC_bits. The packet error rate is usually approximated as packet_size × bit_error_rate, and since the length header is expected to be much smaller than the payload, the probability of a header collision resulting in a wait will be correspondingly smaller than the probability of accepting a corrupted packet under either protocol.
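To put rough numbers on that formula (every parameter value below is illustrative only, not taken from any real channel):

```python
# Illustrative numbers for the argument above (all parameters hypothetical).
bit_error_rate = 1e-6
crc_bits = 32
header_bytes, payload_bytes = 4, 256

header_err = header_bytes * 8 * bit_error_rate    # approx. header error rate
payload_err = payload_bytes * 8 * bit_error_rate  # approx. payload error rate

p_bad_wait = header_err / 2**crc_bits    # corrupted header passes CRC1 -> wait
p_bad_accept = payload_err / 2**crc_bits # corrupted payload passes CRC2 -> accept

# With these numbers the wait outcome is 64x less likely than the accept
# outcome, in proportion to the header/payload size ratio.
print(f'{p_bad_wait:.3e} {p_bad_accept:.3e}')
```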

Also, the duration of the wait in case of a CRC collision is bounded by heartbeats, which I would design into the protocol in any case, to be transmitted when there is no real payload.

So to recap: given a protocol that accepts a corrupted frame with probability P, I see no reason to spend an extraordinary amount of effort to guarantee that, with a probability of, say, P/10, it will not spend up to max_message_length bytes recovering.

If the analysis shows that recovery time is significantly more important than corrupted data reception (not that I consider this to be the case, but I’m open to the possibility that such a use case can be constructed), even then the more cost-effective solution is to increase the strength of the CRC. Using CRC64, or simply repeating CRC32 twice, would make the probability of a wait negligibly small.

This thread appears to have drifted somewhat away from the subject of “DMA friendliness.” @VadimZ: reading your original post, my understanding is that you want to modify the current serial transport in two ways:

  1. Lower computational complexity of encoding and decoding to reduce CPU load.
  2. Lower the maximum overhead of the encoding algorithm to reduce the maximum required bandwidth.

Is this correct? If so then we aren’t really discussing anything specific to DMA. What we care about is reducing the amount of CPU needed to send or receive these messages. Specifically I’m looking at the reference manual for the STM32F7 family of microcontrollers based on this comment you made:

I’m unsure how the bits put onto the serial line modify the performance of transferring to/from DMA. DMA simply allows you to buffer serial words without CPU intervention. No matter what those words contain, you have to reassemble this data using the CPU, so it seems like we only really care about the computational complexity of encoding/decoding, unless I’m missing something?

As an aside, in response to @pavel.kirienko’s proposal to use a text-based encoding: the STM32F7’s USART peripheral supports generating interrupts for MODBUS delimiters (either RTU or ASCII). I think this, combined with DMA, would allow reception of a full frame without CPU intervention (where the full frame fits into the DMA buffer).

Hi Scott,

I think the subjects are closely interrelated. On a high level, the stated rationale for using payload escaping is to achieve better robustness of the protocol. So the last few messages on this thread were spent trying to analyze whether this is indeed the case. My current understanding, as detailed in the message above, is that escaping does not lead to a more robust protocol.

The downsides of escaping are the “DMA unfriendliness” and the overhead. Both are consequences of the payload on the stream not being bit-wise identical to the payload in memory. Let’s call this property “payload transparency”, as opposed to the often-mentioned “channel transparency”.

Well, yes, the purpose of DMA is to reduce the computational load of data transfer tasks.

To clarify, let’s compare two cases. In both, we’ll deliver data from a high-speed SPI A/D converter to an upstream USB host.

  1. Payload transparency, no escaping. When data is available, the DMA engine transfers the samples from the SPI peripheral to a memory buffer. When the buffer is filled, another DMA transaction pushes the buffer over to USB. CPU cycles are only spent to set up each transfer: O(1) complexity.
  2. Payload escaping. The CPU has to go over the entire buffer, perform the escaping, and write the results into another memory buffer, potentially twice as big, before sending them to USB: O(n) complexity.

Gotcha. So the optimization is for chaining DMA transactions where the CPU is only needed to de-frame a message before initiating the next sequence in the chain.

So, in this world, where a UAVCAN frame embedded in a UAVCAN/Serial frame could, potentially, contain a UAVCAN/Serial delimiter what is your proposed synchronization algorithm for the serial reader?

You mean nested serial over serial, where a complete valid frame can be part of the payload? Interesting point, need to think it over :slight_smile:

Both protocols respond to the noise case you demonstrated similarly, so it does not seem to warrant additional attention. What you should also consider is how a non-transparent protocol responds to the following practical cases: 1. where a valid long message is aborted halfway through transmission and a new message is initiated, as is the case when the link is transiently downed or the sender is reset; 2. where the link is used to carry tunneled messages of the same format.

The differentiator between the transparent and non-transparent channels is that only a transparent-channel receiver is guaranteed to accept the first valid packet regardless of the prior state. Or, in other words, a transparent channel provides guaranteed recovery properties from transient link failures. Judging by your mentioning of probabilities you seem to understand that now.

You are correct to say that we should be aware of the costs of the guarantees we obtain by choosing the transparent channel model. If it were possible to apply probabilistic approaches reducing the effects of the failure modes I described earlier without resorting to the more expensive transparent channel option, we would have done that, but as it stands, I am not aware of such methods.

I find your proposal of forcing the receiver state machine out of the long-receive failure state using heartbeats non-viable because it does add considerable complexity – something that I talked about specifically in my first reply here. Are you sure that the serial transport is indeed the optimal choice for your application? If bandwidth and CPU utilization are major concerns (they appear to be), should you not consider a different transport, like UDP perhaps?

You are correct that the worst-case overhead is large and it could possibly use an improvement. I mentioned Z85 above as a constant-overhead alternative to the variable-overhead byte stuffing used now (Z85 – fixed 25%; byte stuffing – variable 0…100%). Among the different text-based frame encapsulation formats that come to mind (NMEA 0183, G codes, MODBUS ASCII), none fit our objectives perfectly, so we could use a well-established data representation format (as opposed to an encapsulation format) instead. UAVCAN frames contain a fixed set of fields, which calls for a rigid, table-like structure; this in turn suggests a delimiter-separated format like TSV. For example, if we naively put one field per column, we end up with:

  • priority
  • source node-ID
  • destination node-ID
  • port-ID
  • transfer-ID
  • frame index
  • data type hash
  • header CRC
  • Z85 payload with CRC

Hi Pavel and Scott,

Regarding tunneling, I’m coming to the conclusion that, indeed, a transparent (aka unescaped) payload and tunneling are basically incompatible. Or, to be more precise, of the three properties - transparent payload, tunneling support, and no out-of-band signaling - only two can be satisfied simultaneously. So, practically, with a transparent payload, TCP, which has out-of-band signaling, can support tunneling, and a UART serial port can’t. In channels with OOB signaling, like TCP or USB, a link reset would trigger a connection reset, therefore no recovery is ever required.

Regarding the text-based protocol:
I consider these two good reasons to build text protocols: when the underlying medium only supports text, or when eyeball debugging/keyboard simulation is desirable. I don’t actually see either of those in UAVCAN/serial. Eyeball debugging is not realistic with the efficient encodings, and the underlying channel is based on bytes (we’ve come a long way from teletype terminals). So I see no good reason to ignore the high bit or, worse, limit ourselves to 85 characters out of 256.

So, practically, the sensible choice seems to be to pick “transparent payload” for channels with OOB signaling, and use stuffing for pure UART serial links. (Since a VCP is sometimes chained with a UART unbeknownst to the end device, it will have to be bundled with the latter for general-purpose implementations.)

With that in mind, and looking back at efficient stuffing options, what do you mean by “while it [COBS] offers a substantially lower overhead, it undermines the synchronization recovery properties of the protocol”? Since each packet is encoded separately, the recovery time should not be affected.

P.S.
You might find this interesting:


(source)

P.S.2
For my immediate specific application, where, in addition to bandwidth and CPU utilization, code simplicity is also a concern, a direct USB link with a transparent payload is probably the best choice. Implementing IP and guaranteed delivery (TCP or an alternative) for a sample stream adds rather significant complexity.

Thank you for the link. I think my understanding of COBS was incorrect because I hadn’t realized that the receiver state is unconditionally reset by the first zero occurring in the stream. If confirmed, this would invalidate the claim of the current design to optimality.

Can you confirm that:

  1. A delimiter (zero) occurring in a COBS-encoded stream unconditionally terminates the current frame (regardless of its validity).

  2. If the delimiter occurs earlier than predicted by the last pointer byte, the frame is malformed. Example:

    • 02 11 03 22 33 00 decodes into 11 00 22 33
    • 02 11 03 22 00 is malformed – unexpected delimiter
  3. Delimiting both the beginning and the end of a frame is feasible (this may result in the reception of spurious zero-length frames which is acceptable).

I think this quote from the original Cheshire and Baker paper confirms point 1.

Point 2 rather logically follows from point 1: such a condition would result in a premature end of buffer during decoding.
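For illustration, here is a minimal COBS decoder sketch consistent with points 1 and 2 (the trailing zero delimiter is assumed to be stripped before decoding; this is not a reference implementation):

```python
# Minimal COBS decoder: each code byte n announces n-1 data bytes, followed by
# an implied zero (unless n == 0xFF or the frame ends). A zero in the stream
# always terminates the frame; a zero arriving earlier than the last code byte
# predicted marks the frame as malformed.
def cobs_decode(encoded: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError('unexpected delimiter inside frame')
        block = encoded[i + 1:i + code]
        if len(block) != code - 1 or 0 in block:
            raise ValueError('delimiter arrived earlier than predicted')
        out += block
        i += code
        if i < len(encoded) and code != 0xFF:
            out.append(0)  # implied zero between blocks
    return bytes(out)

print(cobs_decode(bytes.fromhex('0211032233')))  # -> b'\x11\x00\x22\x33'
```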

I’m not sure what you mean in point 3. I think in the original design, the intention is to have a zero byte as a separator between frames, so a single zero byte per packet: 0 [packet] 0 [packet] 0, etc.

Multiple consecutive zeros would generate “empty packets” and be effectively NOPs, but for what purpose?