
Big-endian vs. little-endian in the context of bit-level encoding

Yeah, we are confusing transfer syntax and serialization. We don’t care about the transfer syntax; it’s the domain of the transport layer. What we care about is how we implement field boundaries when they do not coincide with byte boundaries.

A field whose boundary does not match with a byte boundary has to occupy only a fraction of the byte. The question is: should the fraction be aligned left or right?


We have to define an ordering rule for bits within a byte. There are two sensible possibilities:

  • Big-endian is when the most significant bit is considered to be first.
  • Little-endian is when the least significant bit is considered to be first.

Suppose we have a field that takes three bits of a byte; let’s label its bits f, and the unoccupied bits would be x. Then, in the case of the big-endian format, the byte will be divided up as follows (MSB on the left, LSB on the right):

bit number   0 1 2 3 4 5 6 7
value        f f f x x x x x

And the little-endian would be (same display: MSB on the left, LSB on the right):

bit number   7 6 5 4 3 2 1 0
value        x x x x x f f f

Which bit is transmitted first is absolutely irrelevant.
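To make the two layouts concrete, here is a minimal Python sketch that computes which bits of a byte a sub-byte field occupies under each rule. The names `field_mask` and `layout` are made up for this illustration and are not part of PyUAVCAN or any other library.

```python
def field_mask(width: int, bit_order: str) -> int:
    """Return a mask of the bits occupied by a `width`-bit field in one byte.

    Hypothetical helper: "big" aligns the field toward the MSB,
    "little" aligns it toward the LSB.
    """
    if not 1 <= width <= 8:
        raise ValueError("width must be in the range 1..8")
    ones = (1 << width) - 1
    if bit_order == "big":
        return ones << (8 - width)
    if bit_order == "little":
        return ones
    raise ValueError("bit_order must be 'big' or 'little'")


def layout(width: int, bit_order: str) -> str:
    """Render the byte with the MSB on the left: 'f' = field bit, 'x' = unoccupied."""
    mask = field_mask(width, bit_order)
    return "".join("f" if mask & (1 << i) else "x" for i in range(7, -1, -1))


print(layout(3, "big"))     # fffxxxxx
print(layout(3, "little"))  # xxxxxfff
```

Nothing in this sketch depends on the order in which the bits are transmitted; it only describes where the field sits inside the byte.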

Kent Lennartsson got back to me with a suggestion to “make as few changes between two versions as possible”, which translates into big-endian. He wrote other things on the subject as well, but they do not relate directly to this discussion, so I am omitting them.

Sergey said at the dev call that he is concerned about the fact that a byte-boundary-aligned number that takes n<8 bits will be represented on the wire as if it were shifted left by (8-n). In other words, suppose we have uint3 x, x=1, then a serialized representation would be (1<<(8-3)) = 32. Then (x+1) would be (2<<(8-3)) = 64, (x+2) -> 96, etc.
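The arithmetic behind this concern is easy to check. The snippet below is only an illustration of the shift described above; `wire_byte_msb_aligned` is a hypothetical name, not an existing API.

```python
def wire_byte_msb_aligned(x: int, width: int) -> int:
    """Byte value observed when a `width`-bit field is aligned toward the MSB.

    Hypothetical helper for illustration: the field is shifted left by
    (8 - width), so the byte no longer equals the field value.
    """
    if not 1 <= width <= 8:
        raise ValueError("width must be in the range 1..8")
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit into the given width")
    return x << (8 - width)


for x in (1, 2, 3):
    # Prints 1 -> 32, 2 -> 64, 3 -> 96 for a uint3 field.
    print(x, "->", wire_byte_msb_aligned(x, 3))
```

With the field aligned toward the LSB instead, the byte would simply hold 1, 2, 3.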

I am not entirely against the little-endian format but my little experiment with PyUAVCAN (see PR 99, linked earlier) seems to indicate that it doesn’t really make things much more approachable. Whichever way you turn it, unaligned fields are a disaster and you can’t do anything about it.

This is what we propose:

Note that the convention used here is that bits are placed on the bus starting on the left and reading to the right, i.e., b7, b6, …, b0. This example demonstrates Least-Significant-Byte ordering with Most-Significant-Bit ordering.

Our experience is that this convention maximizes compatibility with industry-standard CAN databases offered by companies like Kvaser or Peak Systems.

Well. No wonder. This is the CANopen format. I don’t understand two things:

the convention used here is that bits are placed on the bus starting on the left and reading to the right

Where does the bus come into this discussion? Why do we care how the bits are transmitted? If my very special bus transmits even bits first, odd bits later, how and why does it affect the serialization format?

This example demonstrates Least-Significant-Byte ordering with Most-Significant-Bit ordering.

If we were to use the terminology I proposed in the previous post, the ordering would be least-significant-bit-first, because we align sub-byte fields on the right first (i.e., towards the LSB). Do you think the terminology is incorrect? Maybe we should just avoid saying things like MSB/LSB, seeing as we are not transmitting anything here, but merely smudging bits around?

Edit: also, how important is compatibility with COTS tools, in your experience, on a scale from 1 to 5?


Actually, it’s rather intuitive. It’s obvious, but for some reason I did not see it. In the original post, I said that the big-endian byte order with the current (big-endian) bit order would have been nice because its native representation of data matches what we use in text (assuming the standard positional binary system). In the case of little-endian bytes + little-endian bits, the same holds if the bit string is inverted such that the most significant digit is on the right.

Take the above example: (uint5)14 + (uint8)187 = 01110 + 10111011. The obvious case of big-endian bit/byte ordering:

  • Concatenated: 01110 10111011
  • Padded to 8: 01110 10111011 000
  • Split into bytes: 01110101 11011000
  • As hex: 75, D8
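The big-endian steps above can be reproduced with a few lines of Python. This is a sketch for illustration only; `pack_big_endian` is a made-up name and does not reflect how PyUAVCAN is implemented.

```python
from typing import Iterable, Tuple


def pack_big_endian(fields: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, width) pairs with big-endian bit and byte ordering.

    Hypothetical helper: fields are concatenated MSB-first as a bit string,
    zero-padded on the right to a whole number of bytes, then split into bytes.
    """
    bits = ""
    for value, width in fields:
        if width < 1 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit into the given width")
        bits += format(value, f"0{width}b")  # concatenate
    bits += "0" * (-len(bits) % 8)           # pad to a multiple of 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(pack_big_endian([(14, 5), (187, 8)]).hex())  # 75d8
```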

The same applies in the case of little-endian bit/byte ordering, but the concatenation operands are reversed and a final byte swap is added to account for the fact that the written numeral system is bit-big-endian:

  • Concatenated: 10111011 01110
  • Padded to 8: 000 10111011 01110
  • Split into bytes: 00010111 01101110
  • Swap bytes: 01101110 00010111
  • As hex: 6E, 17
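The little-endian variant can be sketched even more compactly, because placing each new field above the previous ones in one big integer is the same as reversing the concatenation operands, and emitting that integer in little-endian byte order is the byte swap. As before, `pack_little_endian` is a hypothetical name used only for this illustration.

```python
from typing import Iterable, Tuple


def pack_little_endian(fields: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, width) pairs with little-endian bit and byte ordering.

    Hypothetical helper: each field is placed immediately above the bits
    already used, starting from the least significant bit of the first byte.
    """
    acc = 0      # all fields merged into one integer
    offset = 0   # number of bits already occupied
    for value, width in fields:
        if width < 1 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit into the given width")
        acc |= value << offset
        offset += width
    # Round up to whole bytes; the unused high bits stay zero.
    return acc.to_bytes((offset + 7) // 8, "little")


print(pack_little_endian([(14, 5), (187, 8)]).hex())  # 6e17
```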

We should switch to little-endian. I am going to merge https://github.com/UAVCAN/pyuavcan/pull/99.

Whoa, ho there. “Switch to little-endian”? You mean for both bits and bytes? I’m not comfortable with that without looking at the ramifications for real peripherals on the market. What was wrong with going with the CANopen style of LSB/MSb? That seems to be well-supported by tools in the industry.

(the conversation was completed at the dev call; the resolution is that we are switching to the CANopen format which means the bit order will be changed and the byte order will remain unchanged; the issue is tracked in the Specification repository)

Hello, what is the result of the discussion? I see that the latest spec (Revision 2020-07-14, page 28) says the byte/bit order is LSB/MSb, but the Guide (The UAVCAN Guide) claims there is a difference between UAVCAN v.0 and UAVCAN v.1:
The byte order is kept little-endian but the bits are now populated LSB-to-MSB, not the other way around
And UAVCAN v.0 has the same LSB/MSb order.
How should I interpret this info?

On page 28 it says:

Eight bits form one byte; within the byte, the bits are ordered so that the least significant bit is considered first (0-th index), and the most significant bit is considered last (7-th index).

Which means that the bit order is LSB-first, or little-endian.
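One way to read that sentence in code, as a sketch rather than the reference implementation: the bit with index i of a byte is `(byte >> i) & 1`, and fields are consumed starting from index 0 of the first byte. The helper names below are made up for this example; the input bytes are the 6E, 17 pair from the worked example earlier in the thread.

```python
from typing import List, Sequence


def bit_at(byte: int, index: int) -> int:
    """Bit with the given index, where index 0 is the least significant bit."""
    if not 0 <= index <= 7:
        raise ValueError("index must be in the range 0..7")
    return (byte >> index) & 1


def unpack_lsb_first(data: bytes, widths: Sequence[int]) -> List[int]:
    """Read consecutive unsigned fields, LSB first, from little-endian bytes.

    Hypothetical helper for illustration only.
    """
    acc = int.from_bytes(data, "little")
    values = []
    offset = 0
    for width in widths:
        values.append((acc >> offset) & ((1 << width) - 1))
        offset += width
    return values


print([bit_at(0x6E, i) for i in range(8)])             # [0, 1, 1, 1, 0, 1, 1, 0]
print(unpack_lsb_first(bytes([0x6E, 0x17]), [5, 8]))  # [14, 187]
```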

Thank you for the clarification, but the picture from the spec is confusing; it shows exactly the MSb order:
bit_order_uavcan

Observe that the least significant bit bears the 0-th index, which is the lowest index, which is, by definition, least significant bit first.

Ok, thank you, it makes sense!