Proposed change to variable-length arrays

In the spirit of simplification over optimization I think we should change the default serialization behaviour of variable-length arrays and then introduce an advanced option to reenable the existing bandwidth-optimized behaviour:

uint8[<=16] a

would silently expand to:

void3
uint8[<=16] a

Optionally the system designer could do

@unaligned
uint8[<=16] a

to reenable the behaviour defined in v1 alpha and v0.

Rationale

The current behaviour is unintuitive and requires a thorough reading of the specification by the system designer. Even then the system designer has to manually perform and remember to maintain the ceil(log2(capacity+1)) calculation needed to install the proper void padding. The implicit behaviour would always waste bits to keep the first element of the array aligned.

Complications

uint2 a
uint8[<=16] b

So what about this? One possibility is that the DSDL compiler knows to imply this:

uint2 a
void1
uint8[<=16] b

The other is that we assume unaligned is expected if the variable length array is started on an unaligned boundary implying this:

uint2 a
@unaligned
uint8[<=16] b

I breif survey might be in order to determine which of these two behaviours are “intuitive”.

I think the change proposed here would introduce a special case by making the array length field implicitly aligned unlike other types:

We briefly discussed the possibility of introducing an opposite directive @aligned later in v1.x (x>0), but there are certain issues that arise with variable-offset fields. Specifically, let’s suppose that your proposal is implemented. How would you suggest it handle the following:

uint3[<=16] a
@assert (_offset_ % 8).count > 1  # Yup, non byte-aligned.
# <-- Insert dynamic, runtime-computed padding here?
uint8[<=16] b

I think that explicit alignment with @align is more aligned :drum: with the core design goals, but this is probably debatable. At any rate, this discussion should not be focused on arrays, because they are not special. There was a misguided statement made in Big-endian vs. little-endian in the context of bit-level encoding - #4 by scottdixon where it was implied that the serialization rules of variable-length arrays differ from other data types – that is not true.

I think we have a real opportunity here to improve the accessibility of UAVCAN by making it easier to design zero-cost types. My focus on the variable length array is because of the implicit length field that often causes unalignment without the author realizing it. Everywhere else in a message definition the author makes visible choices that would suggest alignment issues. For example:

uint2
uint2
uint4
# I'm aligned here on and 8-bit boundary
uint8[16]
# Still aligned
uint8[<=16]
# I'll probably assume each item in this array is aligned on 8-bit boundaries. I have to get
# out the spec and a calculator to find out that I'm wrong.

I propose two modifications to v1:

  1. We add the @aligned directive immediately
  2. We implicitly add @align 8 to the implicit length field of variable length arrays. We also allow @align 1 to appear before a variable length array which will override the implicit @align 8 directive and restore the current layout behaviour.
    • The @align always applies to the start of an array and not to each element in an array. To change alignment rules for array elements authors must make the element a composite type and provide alignment directives in that type’s definition.

I agree, the implicitness here is bad.

implicitly add @align 8 to the implicit…

image

Can we please postpone this until v1.1? (There are other tabled issues out there). I am a little concerned about the dynamic nature of this @align field, it’s a rather major change (as I wrote above, if _offset_ % 8 != {0} then @aligned cannot be replaced with a voidX). It won’t affect backward compatibility though so it can be added later easily.

If you are okay with postponing it, can you please open a new issue in the Specification repository and label it future?

First: HA! +1 meme usage there.

Second: is there anyway we can avoid this pitfall in 1.0 without the @align directive? Do we just always align this field on a byte boundary and only allow tighter packing in 1.1? Do we add @align but only allow it for aligning dynamic arrays in 1.0 and expanding its role in 1.1? It’s sort of the first rock people trip over when starting to use UAVCAN so it’s low-hanging fruit (waiting for meme here) to reduce the friction to adoption.

Sigh. Okay. I suppose we could go with this one, but we should also allow its placement before the @union directive to indicate that the tag should be aligned:

@align
@union  # The tag is prepended with up to 7 bits of padding.
uint8 a
float16 b

Later we will be able to expand its functionality as follows:

  • Add an optional argument to select arbitrary alignment in bits.
  • Support the case where (_offset_ % 8).count > 1 by requiring implementations to compute the required alignment at runtime (I am worried about this runtime thing because it does complicate min/max size computation).

Is it “align” or “aligned”? I suspect “align” suits the purpose better.

Would you like to submit a PR to the spec yourself where we could then settle down the exact implementation details?

We need to revisit this now because after we accepted the bit order change some of the issues involved in this discussion have been altered. I mentioned this issue briefly in the linked topic, too.

Imagine we have an array type like this:

void7
uint8[<=256] a

One might expect that its serialized representation would look like the length field promoted to uint16 encoded in the little-endian byte order and followed by the array data, for example:

>>> [bytes(x) for x in pyuavcan.dsdl.serialize(uavcan.primitive.String_1_0(
...       'Yesterday I yelled a degrading insult at an elderly lady.'))]
[b'\x39\x00Yesterday I yelled a degrading insult at an elderly lady.']

Where \x39\x00 would be the length of the string in the little-endian byte order. That would be intuitive and extensible since it would be possible to chop off a bit from the padding to enlarge the capacity. What we get instead with the new bit ordering rule is:

>>> [bytes(x) for x in pyuavcan.dsdl.serialize(uavcan.primitive.String_1_0(
...       'Yesterday I yelled a degrading insult at an elderly lady.'))]
[b'\x80\x1cYesterday I yelled a degrading insult at an elderly lady.']

Where \x80\x1c is the least significant bit of the length shifted up seven places followed by 57 >> 1.

Perhaps, the resulting bit layouts may be considered to contradict the core principles by being more convoluted than one might think by looking at a DSDL definition? In that view, I think your proposal to make implicit fields byte-size-aligned should probably be accepted by changing the following two sections of the Specification:

  • 3.7.4.2 Variable-length array type
  • 3.7.5.1 Tagged unions

Requiring the implementation to round the size of an implicit field up to the smallest of [8, 16, 32, 64].

A directive reversing this behavior can be introduced as well later.

Edit: Fields from the regulated set that will be affected, including the preceding manual padding:

$ grep '\[<' -R . -B1
./uavcan/diagnostic/32760.Record.1.0.uavcan-void6
./uavcan/diagnostic/32760.Record.1.0.uavcan:uint8[<=112] text
--
./uavcan/file/408.Read.1.0.uavcan-void7
./uavcan/file/408.Read.1.0.uavcan:uint8[<=256] data
--
./uavcan/file/409.Write.1.0.uavcan-
./uavcan/file/409.Write.1.0.uavcan:uint8[<=192] data   # 192 = 128 + 64; the write protocol permits usage of smaller chunks.
--
./uavcan/file/Path.1.0.uavcan-void1
./uavcan/file/Path.1.0.uavcan:uint8[<=MAX_LENGTH] path
--
./uavcan/internet/udp/500.HandleIncomingPacket.0.1.uavcan-void7
./uavcan/internet/udp/500.HandleIncomingPacket.0.1.uavcan:uint8[<=309] payload
--
./uavcan/internet/udp/32750.OutgoingPacket.0.1.uavcan-void2
./uavcan/internet/udp/32750.OutgoingPacket.0.1.uavcan:uint8[<=45] destination_address
--
./uavcan/internet/udp/32750.OutgoingPacket.0.1.uavcan-
./uavcan/internet/udp/32750.OutgoingPacket.0.1.uavcan:uint8[<=261] payload
--
./uavcan/metatransport/can/DataClassic.0.1.uavcan-void4
./uavcan/metatransport/can/DataClassic.0.1.uavcan:uint8[<=8] data
--
./uavcan/metatransport/can/DataFD.0.1.uavcan-void1
./uavcan/metatransport/can/DataFD.0.1.uavcan:uint8[<=64] data
--
./uavcan/metatransport/serial/Fragment.0.1.uavcan-void7
./uavcan/metatransport/serial/Fragment.0.1.uavcan:uint8[<=CAPACITY_BYTES] data
--
./uavcan/metatransport/udp/Frame.0.1.uavcan-void2
./uavcan/metatransport/udp/Frame.0.1.uavcan:uint8[<=MTU] data
--
./uavcan/node/port/431.List.0.1.uavcan-
./uavcan/node/port/431.List.0.1.uavcan:uint16[<=128] port_ids  # See above for the full description of the logic.
--
./uavcan/node/port/432.GetInfo.0.1.uavcan-void2
./uavcan/node/port/432.GetInfo.0.1.uavcan:uint8[<=50] data_type_full_name
--
./uavcan/node/434.GetTransportStatistics.0.1.uavcan-void6
./uavcan/node/434.GetTransportStatistics.0.1.uavcan:IOStatistics.0.1[<=MAX_NETWORK_INTERFACES] network_interface_statistics
--
./uavcan/node/430.GetInfo.1.0.uavcan-void2
./uavcan/node/430.GetInfo.1.0.uavcan:uint8[<=50] name
--
./uavcan/node/430.GetInfo.1.0.uavcan-void7
./uavcan/node/430.GetInfo.1.0.uavcan:uint64[<=1] software_image_crc
--
./uavcan/node/430.GetInfo.1.0.uavcan-
./uavcan/node/430.GetInfo.1.0.uavcan:uint8[<=222] certificate_of_authenticity
--
./uavcan/node/435.ExecuteCommand.1.0.uavcan-void1
./uavcan/node/435.ExecuteCommand.1.0.uavcan:uint8[<=uavcan.file.Path.1.0.MAX_LENGTH] parameter
--
./uavcan/pnp/cluster/32740.Discovery.1.0.uavcan-void2
./uavcan/pnp/cluster/32740.Discovery.1.0.uavcan:uavcan.node.ID.1.0[<=5] known_nodes
--
./uavcan/pnp/cluster/390.AppendEntries.1.0.uavcan-void7
./uavcan/pnp/cluster/390.AppendEntries.1.0.uavcan:Entry.1.0[<=1] entries
--
./uavcan/pnp/32742.NodeIDAllocationDataMTU8.0.1.uavcan-
./uavcan/pnp/32742.NodeIDAllocationDataMTU8.0.1.uavcan:uavcan.node.ID.1.0[<=1] allocated_node_id
--
./uavcan/primitive/array/Bit.1.0.uavcan-void4
./uavcan/primitive/array/Bit.1.0.uavcan:bool[<=2048] value
--
./uavcan/primitive/array/Integer16.1.0.uavcan:int16[<=128] value
--
./uavcan/primitive/array/Integer32.1.0.uavcan-void1
./uavcan/primitive/array/Integer32.1.0.uavcan:int32[<=64] value
--
./uavcan/primitive/array/Integer64.1.0.uavcan-void2
./uavcan/primitive/array/Integer64.1.0.uavcan:int64[<=32] value
--
./uavcan/primitive/array/Integer8.1.0.uavcan-void7
./uavcan/primitive/array/Integer8.1.0.uavcan:int8[<=256] value
--
./uavcan/primitive/array/Natural16.1.0.uavcan:uint16[<=128] value
--
./uavcan/primitive/array/Natural32.1.0.uavcan-void1
./uavcan/primitive/array/Natural32.1.0.uavcan:uint32[<=64] value
--
./uavcan/primitive/array/Natural64.1.0.uavcan-void2
./uavcan/primitive/array/Natural64.1.0.uavcan:uint64[<=32] value
--
./uavcan/primitive/array/Natural8.1.0.uavcan-void7
./uavcan/primitive/array/Natural8.1.0.uavcan:uint8[<=256] value
--
./uavcan/primitive/array/Real16.1.0.uavcan:float16[<=128] value                    # Exactly representable integers: [-2048, +2048]
--
./uavcan/primitive/array/Real32.1.0.uavcan-void1
./uavcan/primitive/array/Real32.1.0.uavcan:float32[<=64] value                     # Exactly representable integers: [-16777216, +16777216]
--
./uavcan/primitive/array/Real64.1.0.uavcan-void2
./uavcan/primitive/array/Real64.1.0.uavcan:float64[<=32] value                     # Exactly representable integers: [-2**53, +2**53]
--
./uavcan/primitive/String.1.0.uavcan-void7
./uavcan/primitive/String.1.0.uavcan:uint8[<=256] value
--
./uavcan/primitive/Unstructured.1.0.uavcan-void7
./uavcan/primitive/Unstructured.1.0.uavcan:uint8[<=256] value
--
./uavcan/register/Name.1.0.uavcan-void2
./uavcan/register/Name.1.0.uavcan:uint8[<=50] name
--
./regulated/zubax/actuator/esc/AngVelSetpoint.0.1.uavcan-void2
./regulated/zubax/actuator/esc/AngVelSetpoint.0.1.uavcan:uavcan.si.unit.angular_velocity.Scalar.1.0[<64] motor_mechanical_angular_velocity
--
./regulated/zubax/actuator/esc/RatiometricSetpoint.0.1.uavcan-
./regulated/zubax/actuator/esc/RatiometricSetpoint.0.1.uavcan:int10[<64] ratiometric
--
./regulated/zubax/sensor/bms/BatteryPackParams.0.1.uavcan-void2
./regulated/zubax/sensor/bms/BatteryPackParams.0.1.uavcan:uint8[<64] name
--
./regulated/zubax/sensor/bms/BatteryPackStatus.0.1.uavcan-
./regulated/zubax/sensor/bms/BatteryPackStatus.0.1.uavcan:float16[<256] cell_voltages  # [volt]

Let’s address your worrisome behaviour towards the elderly in a separate post. Otherwise I think this is the right decision.