UAVCAN/serial: issues with DMA friendliness and bandwidth overhead

pavel.kirienko · June 15, 2020, 2:01pm

Yes, but the corruption affects both B and C. Bilateral delimiters confine the error to one packet.

Per the model above, it stops once each encoded packet is fully delimited such that its representation is invariant to the prior or future state of the link; i.e., at two delimiters.

VadimZ · June 15, 2020, 2:30pm

Here is what I meant with double delimiters not helping in your example:

[packet-A] 0 0 [packet-B] 0 0  ~~~~~~~~~~~~~~~~  [packet-C] 0 0
                ^                   ^        ^
                |                   |        |
                Link                Link     Link
                disconnected        remains  recovered
                near this point     down

I understand the intuitive argument in favor of them in the presence of random errors (not prolonged link loss) because the error in delimiter could “take out” two packets rather than one.
I’m wondering is there’s a good numeric model to figure out if this tradeoff in overhead is worthwhile.

pavel.kirienko · June 15, 2020, 3:31pm

If the link utilization is <100% (as is usually the case) so that the packets do not follow each other back-to-back, the link status change (disruption/recovery) may occur between the packets:

0 [packet-A] 0 0 [packet-B] 0  ~~~~~~~~~~~~~~~~  0 [packet-C] 0
                   ^                   ^        ^
                   |                   |        |
                   Link                Link     Link
                   disconnected        remains  recovered
                   near this point     down

If the link is fully utilized, dual delimiters appear to add no value other than:

VadimZ · June 15, 2020, 9:39pm

Makes sense.

So that said, do you consider COBS a viable contender for UAVCAN/serial channel framing ?

pavel.kirienko · June 15, 2020, 9:40pm

Yes. Are you available to modify the current implementation in PyUAVCAN?

scottdixon · June 17, 2020, 3:57am

¡CONSENSUS!

VadimZ · June 17, 2020, 12:35pm

Can’t commit to it right now, but will try to find time.

BTW, there’s this python implementation: https://pypi.org/project/cobs/

pavel.kirienko · June 17, 2020, 3:16pm

If you need help getting familiar with the codebase, don’t hesitate to reach me directly.

I registered this on GitHub:

JediJeremy · June 29, 2020, 4:04am

Sorry I’m late to this discussion - I’ve been in implementor hell battling with UDP - I love the proposal to move to COBS but wanted to throw in a couple of ideas before code gets written, since if things are changing anyway might as well consider them now. Most of these ideas are probably bad for various reasons, but shooting them down now might save troubles later.

Since COBS does a look-ahead anyway, and since zero is probably going to be a very common byte value in most streams, we might want to consider a modification where instead of one byte of overhead we add two - the first to indicate where the next zero is, and the second to count the zeroes. That would compress runs of zeroes, which potentially could reduce the total frame size. (Of course the analysis of that is very app-dependent)

Another variant of that idea might use the top bit of the first overhead byte to signal that the second run-count byte was “not one”. That would increase overhead to at least one byte every 126 payload bytes, but if we expect zeros to turn up that often anyway, that’s not a big loss.

The second thing I’d suggest is making it explicit that trailing zeroes in the frame be truncated. (or at least, the expected behavior) I think at the moment that’s kind of optional and expected only with minor version polymorphism? Since the decoder is supposed to already pretend each frame has infinite trailing zeroes that will only modify the encoder, and makes full use of the existing decoder logic. (If this is already the recommendation whoops sorry I missed it…) Again this may have the effect of actually reducing the transmitted frame size, which is always nice. This idea is mutually exclusive with the previous zero-run-compression idea… if run compression is already happening this adds almost nothing.

And now the third and final thing… this is the big one… many years ago when I was looking into serial encoding I spent some time investigating Low Density Parity Checks (LDPC) which is the ultimate evolution of forward error correction algorithms. Basically, this would replace the CRC32 check.

LDPC is related to “Turbo Codes”, which are slightly less ultimate (LDPC codes get as close as possible to the Shannon limit) but perhaps slightly easier to compute. Neither would be as quick as CRC32 but they have a big payoff… while CRC is an error checking algorithm, LDPC and Turbo Codes are error correction algorithms. So long as relatively few bits are corrupted by noise the algorithms can figure out which, and flip them back. An otherwise lost packet becomes recoverable.

Most Forward Error Correction (FEC) codes have the issue that a corruption to the error check bits is an unrecoverable error… you’d need another FEC to ‘protect’ your previous check code and then you’d need one to error correct that block… so it’s either “turtles all the way down” or you have parts of the frame which are inherently more vulnerable to noise than other parts.

LDPC is an improvement because the parity check bits are themselves included in the error correction. Another way of thinking of it is to say that the check block is never transmitted (and assumed to be a fixed value like zero) and ‘extra bits’ are appended to the payload that are flipped so that condition is true. Those “parity check” bits are either scattered through the payload or simply appended to the end. The algorithm has other nice properties, such as being constant overhead and easily implementable in ASIC/FPGA hardware, although it requires a “parity check matrix” much like the CRC32 table. Most of the complexity of the algorithm is in the generation of this table - it takes quite a lot of math to create it - but of course that doesn’t matter to the encoder/decoder once the table is prepared.

In essence you can think of LDPC codes as having a set of parity bits (of constant known value) which are connected to a ‘random-ish’ subset of the payload and ‘extra’ bits. If the receiver detects that the bits don’t quite match, it asks “what’s the minimum set of bits I have to flip back to make all the parity checks work”.

It sounds like a lot of CPU overhead, but the matrix pre-encodes much of the work. When I was last looking at the algorithm the transcoders were still a LOT slower than CRC32, but that was years ago, and there may have been significant practical improvements since then, such as matrices that are optimal for software encoders. Either way, LPDC represents the “Gold Standard” for best-possible FEC (and therefore most efficient channel use) and is worth looking at.

There are very few times in Comp. Sci that you can say an algorithm is “unbeatable” and LDPC is one of them.

Turbo Codes don’t quite reach that level but get pretty close. While developed beforehand they can be considered a kind of ‘special case’ of LDPC with a sub-optimal matrix, but are quite fast, if I recall.

In both cases, the assumption is that correcting occasional small errors gives better channel use than simply detecting errors and discarding frames. You can 'tune" the LDPC matrix so that it has a truly stunning level of reliability. Some variants even allow the payload to parity check ratio to be variable so that the algorithm can dynamically allocate more error-correction bits to cope with changing line noise conditions, at the cost of implementation complexity.

It would certainly make more efficient use of the channel than the temporal redundancy option, although at the cost of extra CPU. If we’re changing the protocol anyway, I’d say now’s the time to look at forward error-correction.

In situations where the error-correction introduces a variable-time overhead which breaks hard real-time systems, you can revert to just using it as error detection and discard the packet (as you would have to anyway if more bits were corrupted than the parity block can cope with) but that gets to be an implementation choice.

Argh, look at the time… I gotta run.

pavel.kirienko · June 29, 2020, 4:26am

Regarding RLE and zero truncation, both are undesirable because in a real-time system, we are typically interested in the worst case scenario; this is unlike conventional or soft real-time systems where the average case or the amortized worst case are of interest. Compressing or trimming the zeros will save you the network throughput or computing power, but in a real-time application you are unlikely to be able to make use of the freed-up resources because you have to account for the worst case where the zeroes could not be eliminated. Beside that, the variability may result in the so called “control coupling” effect where the variance in the network timings may alias with a process loop altering its response characteristics.

LDPC is supremely interesting. I looked closely at the error-correcting codes while working on PoPCoP, the exploratory predecessor of UAVCAN/serial. I agree that the possibilities are interesting and I think that the main reason why these methods are not yet commonplace in communication protocols is the mental inertia. Since we are defining a new protocol here, we might as well use the opportunity to do it better. Despite all that, I turned to CRC due to my time constraints, but I would welcome an attempt to work out a superior error-correcting solution if there is anybody willing to dedicate enough time to that.

Is error correction really related to COBS, though?

JediJeremy · June 29, 2020, 5:23am

Ah, I knew there’d be a good reason. Yup I’m definitely in the soft real-time mindspace… perhaps one more option if we ever go for ‘profiles’? USB and TCP/IP are inherently non-realtime already, but I take the point.

Well it might be if we chose an algorithm like Reed-Solomon where we can ‘reserve’ certain codes so that eg: zero byte is never emitted by the encoder… and many FEC codes expect a fixed block size which would affect the framing.

I’ll do a review of the current literature and see if anything interesting falls out… one funny thing I just (re)learned is that 802.11n WiFi already uses LDPC so basically I already get that for ‘free’… and that also means there might be hardware in modern microcontrollers which could be co-opted for the purpose. I also just learned about “polar codes” which are an attempt to get close to LDPC but with much less CPU.

pavel.kirienko · September 13, 2020, 12:51am

Hey Vadim,

I’m curious how is it looking there on your end? Have you been able to deploy your application? How did UAVCAN/serial perform in your use case? Any chance to see you among contributors to PyUAVCAN?

Thanks!

VadimZ · September 27, 2020, 4:09pm

Hi Pavel,

Still hope to find time for it, but doesn’t look good for the near future. The project in the context of which I was looking for serial encapsulation protocol was sidelined …

pavel.kirienko · September 27, 2020, 4:36pm

I am not sure which option to take:

Wait for you or someone else to contribute the COBS encoder to replace the current one before the end of this year. I am currently unable to dedicate time to that because we are hell-bent on releasing v1 and other things like DS-015.
Continue using the current suboptimal encoder. Later, with the help of @JediJeremy or someone else who also understands error-correcting codes, release a new major revision of the serial transport (UAVCAN/serial2) that will either exist alongside the current one or deprecate it completely.

Right now, considering how things have been developing lately, I suspect that #1 is not feasible so we are on track to keep the inferior format.

VadimZ · September 27, 2020, 4:54pm

Would you be open to using external python library dependency for COBS implementation in PyUavCan ? Namely, https://pypi.org/project/cobs/ . I might be able to scrape some time to do it this way …

pavel.kirienko · September 27, 2020, 4:57pm

That seems like a lesser evil than #2 so yes.

VadimZ · September 27, 2020, 5:13pm

Alright, I’ll read https://github.com/UAVCAN/pyuavcan/blob/master/CONTRIBUTING.rst and try to get to “passing tests” state. I’m on windows and it says “if you are on a different system, you are on your own” towards the end.

pavel.kirienko · September 27, 2020, 5:16pm

May I suggest that you base your work on the bootloader branch? I introduced certain improvements to the unit test suite there that I expected to merge into master very quickly, but I was distracted to other issues and my fixes remain on that branch to this day. I should get back to that as soon as DS-015 is sorted.

VadimZ · October 9, 2020, 4:01pm

Sure, however I don’t see this branch now. Was it merged already ?

pavel.kirienko · October 9, 2020, 4:01pm

Yes.