New DSDL directive for simple compile-time checks

pavel.kirienko · October 11, 2018, 12:46pm

While working on the new standard data type set for v1.0 I noticed that getting byte alignment right requires a bit of attention and it is easy to mess things up by miscalculating the bit length of array length prefixes and union tags. It would help a lot if one could drop a directive right in the definition that would instruct the compiler to check a simple condition and abort compilation if the condition is not met.

Simple arithmetic checks against the bit offset from the beginning of the message allow one to ensure that a given field is byte-aligned and that the size of the message does not exceed a certain limit. The latter is especially useful for standard messages as we want to ensure that no message can be larger than 400 bytes, to maximize compatibility with resource-constrained nodes.

For example:

uint7 node_id
void1
# Suppose that we want to ensure that the following field is byte-aligned:
@assert offset % 8 == 0
uint8[16] unique_id
# Now checking that we got the size of the message right
@assert offset / 8 == 17
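
The offset arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: `fixed_offset` is a hypothetical helper (not part of any DSDL tool), and it assumes that every field has a statically known bit length.

```python
def fixed_offset(field_bit_lengths):
    """Return the bit offset after a sequence of fixed-size fields."""
    return sum(field_bit_lengths)

# uint7 node_id, void1
offset = fixed_offset([7, 1])
assert offset % 8 == 0          # unique_id starts on a byte boundary

# uint8[16] unique_id: a fixed-size array carries no length prefix
offset += fixed_offset([8] * 16)
assert offset // 8 == 17        # the whole message is 17 bytes long
```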

The above approach is simple but breaks on variable-size structures. For those, we can use ranges instead.

uint7 node_id
void1
# Suppose that we want to ensure that the following field is byte-aligned:
@assert offset % 8 == 0
void3
uint8[<=16] unique_id
# Can't use offset anymore because after the above field it cannot be
# determined statically. Using ranges instead.
@assert min_offset / 8 == 2
@assert max_offset / 8 == 18
# Making sure the message is not longer than a predefined limit
@assert max_offset / 8 < 400
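
The range arithmetic can be sketched the same way. The helpers below are hypothetical, and they assume that the length prefix of a `[<=N]` array is the smallest number of bits able to represent every count from 0 to N, which is consistent with the 5-bit prefix implied by the numbers above.

```python
def length_prefix_bits(max_elements):
    """Bits needed to encode an element count in the range 0..max_elements."""
    return max_elements.bit_length()

def array_offset_range(start, element_bits, max_elements):
    """Return (min_offset, max_offset) after a variable-length array."""
    data_start = start + length_prefix_bits(max_elements)
    return data_start, data_start + element_bits * max_elements

# uint7 node_id, void1, void3, then uint8[<=16] unique_id
min_offset, max_offset = array_offset_range(7 + 1 + 3, 8, 16)
assert min_offset // 8 == 2
assert max_offset // 8 == 18
assert max_offset // 8 < 400
```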

Supporting both the strict offset and its range (between min and max) is useful because variable-length fields tend to gravitate towards the end of a definition, so it makes sense to let the user rely on the strict offset as much as possible for extra determinism. Additionally, having the strict offset provided directly in the definition helps users who can’t rely on code generation tools in their applications (e.g., up until recently Libcanard did not support message code generation, so one had to serialize/deserialize messages by fiddling with bit offsets manually, which is error-prone).

To simplify implementation of third-party compilers, the specification may make this directive optional to support, allowing compilers to ignore it. Probably the most tedious part of this feature would be to come up with a sufficiently rigorous definition of the syntax of arithmetic expressions.

kjetilkjeka · October 17, 2018, 12:04pm

This seems quite messy, and a bit error-prone as well. I think it would make more sense to support the following symbols for asserting the maximum length.

Assert length

MAX_LENGTH - Maximum bit_length of the serialized definition.
MIN_LENGTH - Minimum bit_length of the serialized definition.

@assert MAX_LENGTH <= 400*8
# Definition here

or to assert that a type is statically sized

@assert MAX_LENGTH == MIN_LENGTH
# Definition here

Aligned

If we want to make sure things are aligned, we should do so with primitives that relate to alignment. With a directive of the form @align <offset?> we can express alignment in an ergonomic way even in situations where it was previously not possible (e.g. right after a dynamically sized field).

uint2 data1
@align -5 # Aligns to end
uint5 data2
@align # Due to last datatype being aligned to end, does nothing
uint7[<=1] data3
@align end # equals to `@align [ - bit_length(uint7) ]`
uint7 data4
@align data # equals to `@align [- log(1)]`
uint8[<=1] data5

Examples

Your example would then look like the following

@assert MAX_LENGTH == MIN_LENGTH
@assert MAX_LENGTH == 8*17

uint7 node_id
@align
uint8[16] unique_id

The other example would be

@assert MAX_LENGTH <= 8*400

uint7 node_id
@align data
uint8[<=16] unique_id

pavel.kirienko · October 17, 2018, 4:31pm

Entities are not to be multiplied without necessity. I don’t have much to say against MAX_LENGTH, other than that it introduces a new symbol where we could keep using the old MAX_OFFSET (or max_offset – lowercase or uppercase?) at the end of the definition.

As for @align – explicit is better than implicit! You are delegating the work of aligning stuff to the DSDL compiler, which is great in that the compiler is a machine and thus it’s far less likely to mess things up than us meatbags. It is bad in that it makes definitions less transparent and makes re-use of void fields more awkward because void fields used for alignment are no longer there (not explicitly at least); manual serialization also gets a bit more difficult because of the implicit fields. This approach seems to cross the line between a very simple DSL whose inner workings are immediately obvious and something that does code generation under the hood. If we are to cross that line someday in the future, perhaps there should be a more compelling reason than just adding some void fields.

Also this:

uint7 node_id
@align data             # Adding 4 bits implicitly, ok
uint3[<4] items
@align                  # What do now? Possible offsets: 16, 19, 22, 25 bits
int32 not_really_aligned

Whereas one could manually do this:

uint7 node_id
void4
@assert offset == 12
uint3[<4] items
# min_offset after a dynamic field is equivalent to "@align data" before it
@assert min_offset == 16
@assert max_offset == 25
int32 not_really_aligned

kjetilkjeka · October 18, 2018, 11:00am

pavel.kirienko:

Entities are not to be multiplied without necessity. I don’t have much to say against MAX_LENGTH, other than that it introduces a new symbol where we could keep using the old MAX_OFFSET (or max_offset – lowercase or uppercase?) at the end of the definition.

I think the assert on offset is a bit manual and hard to use, which is the problem we want to solve. I, therefore, wanted to replace it by assertion on length and explicit desire of alignment.

pavel.kirienko:

Also this:

uint7 node_id
@align data             # Adding 4 bits implicitly, ok
uint3[<4] items
@align                  # What do now? Possible offsets: 16, 19, 22, 25 bits
int32 not_really_aligned

The idea was that @align would have a bit more expressive power than the current void fields. It would always align with the next byte even when following a dynamic field. So it becomes

uint7 node_id
@align data             # Adding 4 bits implicitly, ok
uint3[<4] items
@align # Dynamic offset of 0, 3, 6, 1 or 2 bits
int32 in_fact_really_aligned

pavel.kirienko · October 18, 2018, 1:43pm

This is a whole new feature. I am quite certain that we shouldn’t open this brand new CAN of worms when we’re one foot in the v1.0 already. I understand that DSDL could use a lot of improvement, but let us not get ahead of ourselves here. Perhaps the whole language can be reviewed from scratch in v1.1, perhaps not.

Without the @align directive we still have to rely on offsets, so adding length alongside offset seems like unnecessary sugar. Let’s keep it simple.

kjetilkjeka · October 18, 2018, 2:19pm

It feels like the offset assertions are a bit low-level and clutter the definitions. It seems almost as easy to make mistakes with the assertions as with the DSDL code. And they’re definitely making the DSDL definitions harder to read.

I think it would be better to choose some higher level mechanism that describes intent better than the offset, like asserting on alignments.

@assert aligned # same as `@assert aligned 0`
uint7 data0
@assert aligned data
uint8[<=1] data1
void3
@assert aligned end
uint5 data2
void2
@assert aligned +2
uint64 data3

It would not be helpful for min/max offset. But what is really the point of asserting on min/max offset anyway?

It kind of feels like what we really want is a good DSDL visualizer though?

This would allow manual inspection during creation.

pavel.kirienko · October 18, 2018, 3:15pm

We want to be sure that fields that follow variable-length entities are still byte-aligned (or otherwise aligned) at all times. Here’s an example:

uint7 node_id
void1
@assert offset % 8 == 0
void3
uint8[<=16] unique_id
@assert min_offset / 8 == 2
@assert max_offset / 8 == 18

# Making sure that the next field is always aligned
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0
uint16 foo

# Eventually (in a future release) we could go further and ask the compiler to
# verify the offset against every possible permutation of the preceding
# variable-length fields.
@assert any_offset % 8 == 0 # I have not given much thought to this yet

@assert max_offset / 8 < 400

Yes, it won’t work with variable-length fields where the element sizes are not a multiple of 8 bits, but those are rare (they don’t exist in the standard data type set, for example).

If strongly desired, we could add some sugar that would expand as follows:

@assert aligned
# Is equivalent to:
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0

I am not sure why we need @assert aligned data or end, since the same goals can be achieved by moving the assertion statement after the field it is intended to protect.

@assert aligned
uint7 data0
uint8[<=1] data1
@assert aligned   # Applies both to the above and to the below
void3
uint5 data2
@assert aligned
void2
uint64 data3

Stuff like @assert aligned +2 doesn’t seem to be really needed, does it? Unconventional checks can always be implemented by checking against the offset variables directly, e.g. @assert (offset - 2) % 8 == 0

kjetilkjeka · October 19, 2018, 9:00am

pavel.kirienko:

We want to be sure that fields that follow variable-length entities are still byte-aligned (or otherwise aligned) at all times. Here’s an example:

uint7 node_id
void1
@assert offset % 8 == 0
void3
uint8[<=16] unique_id
@assert min_offset / 8 == 2
@assert max_offset / 8 == 18

# Making sure that the next field is always aligned
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0
uint16 foo

# Eventually (in a future release) we could go further and ask the compiler to
# verify the offset against every possible permutation of the preceding
# variable-length fields.
@assert any_offset % 8 == 0 # I have not given much thought to this yet

@assert max_offset / 8 < 400

Yes, it won’t work with variable-length fields where the element sizes are not a multiple of 8 bits, but those are rare (they don’t exist in the standard data type set, for example).

This is a big problem with min/max offset. We can only check the arrays that are incredibly hard to get wrong to begin with. We could make @assert aligned work the same as your proposed @assert any_offset % 8 == 0 without needing to reason about min/max/any offsets and mod operators.

pavel.kirienko:

I am not sure why we need @assert aligned data or end, since the same goals can be achieved by moving the assertion statement after the field it is intended to protect.

@assert aligned
uint7 data0
uint8[<=1] data1
@assert aligned   # Applies both to the above and to the below
void3
uint5 data2
@assert aligned
void2
uint64 data3

Stuff like @assert aligned +2 doesn’t seem to be really needed, does it? Unconventional checks can always be implemented by checking against the offset variables directly, e.g. @assert (offset - 2) % 8 == 0

Do you take the challenge to remove end and data in the following examples?

void7
@assert aligned data
uint47[<=1] data

void5
@assert aligned data # After you've looked up SomeType you will realize that the definition is not a multiple of 8 bits
SomeType[<=7] data

void5
@assert aligned end
SomeType[13]

@assert aligned should check that it’s always aligned. Not just in 2 arbitrarily chosen cases (min/max length).
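
The concern can be made concrete with a small sketch. The layout used here is hypothetical (it does not appear in the thread): `void6` followed by `uint4[<=2]`, with an assumed 2-bit length prefix, so that the array data starts at bit 8. Checking only the minimum and maximum offsets passes, although one of the three possible offsets is misaligned.

```python
def possible_offsets(data_start, element_bits, max_elements):
    """Every offset that can follow a variable-length array."""
    return {data_start + element_bits * n for n in range(max_elements + 1)}

# Hypothetical layout: void6, then uint4[<=2] with a 2-bit length prefix,
# so the array data starts at bit 8.
offsets = possible_offsets(8, 4, 2)      # {8, 12, 16}

min_max_aligned = min(offsets) % 8 == 0 and max(offsets) % 8 == 0
always_aligned = all(o % 8 == 0 for o in offsets)

assert min_max_aligned          # the min/max check passes...
assert not always_aligned       # ...although one element ends at bit 12
```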

pavel.kirienko · October 19, 2018, 11:59am

I accept the challenge for the first two definitions.

void7
uint47[<=1] data
@assert min_offset % 8 == 0

void5
# After you've looked up SomeType you will realize that the definition is
# not a multiple of 8 bits
SomeType[<=7] data
@assert min_offset % 8 == 0

The next one cannot be solved within the limits of the current type system regardless of the chosen syntax, because dynamically computed padding is not allowed/defined. I wrote above that I don’t want to add it to UAVCAN v1.0, because it is such a major change.

void5
@assert aligned end
SomeType[13]

scottdixon · October 23, 2018, 4:27am

(and in comes Scott 10 days late, sorry).

TLDR;

For 1.0 I like the idea of supporting a simple assert as a way to start down this path without fully designing it.

I would like to see an align directive in the future but I wouldn’t introduce it just yet.
For assert to be useful we need a maximum and minimum boundary token like Pavel’s max_offset and min_offset so I agree these should go into 1.0. I would suggest using more concise tokens instead; . for max offset and $ for min offset but I understand this might be controversial.
I’d add a message parameter to the assert to make build output more meaningful.

Long Form

First, I’m really excited by the direction we are going here. I’ve been skeptical about the value of DSDL as a unique syntax until now. I really want to revisit this in 1.1 and pull in concepts from the GNU linker which solves some of the same problems we’re attacking here in its linker script syntax. But for DSDL we should add tools to help UAVCAN type authors build and enforce rules for their types.

I’d suggest using more concise tokens instead of “max_offset” and “min_offset” to allow more compact assert statements and I’d also add a message field:

void7
uint47[<=1] data
@assert $ % 8 == 0, "The minimum offset must be 8-bit aligned at this point"

This syntax (yes, the . is borrowed from ld) would allow us to add an ld-like align directive and use index math in the future to replace all our use of void:

. = ALIGN(8)
uint47[<=1] data
@assert $ == ALIGN(8), "The minimum offset must be 8-bit aligned at this point"

Speculation About the Future?

We could also consider adding a directive that helped detect sub-optimal framing based on transport. For example:

uint8[8] data
@assert . < FRAMES(CAN_2.0, 1), "This message must fit in a single CAN 2.0 frame."

For CAN FD we could add a token representing the end of the padded frame to allow assertions like:

uint8[8] data
# So I'm using ; as the end-of-padding token
@assert . - ; < 16, "This message should not put more than 2 bytes of padding on the bus."

Details aside, I think there’s an opportunity to make DSDL a more powerful DSL for designing types that are optimized for CAN.

pavel.kirienko · October 23, 2018, 8:28am

While I am on the same page with Scott as far as general ideas go, I have a strong objection against the proposed syntax. The GNU linker script is really a poor role model to follow.

. and $ are a good start; €, £, ¥, and ₽ provide endless possibilities for future extensions.

If we’re working with offsets, then let’s call them offsets (in English). DSDL definitions are not something people code all day long; rather, they are more or less static: written once, read a lot. So it makes sense to optimize for readability.

An optional message for the assert statement seems very sensible, as well as transport-specific checks.

scottdixon · October 23, 2018, 4:40pm

Okay. I’ll argue that when we introduce ALIGN we should use . for “current position” so we can do . = ALIGN(8) but we’re agreeing this is a 1.1 argument.

kjetilkjeka · October 25, 2018, 8:45pm

It seems like we pretty much agree that we want to be able to do compile-time assertions using a new directive, and that the syntax will be @assert <condition>.

+1 to this

I think we can do better than offset, max_offset and min_offset even as a start. The problem with offset is that it’s impossible to give a “real” number once a dynamic structure has been encountered. What about the following proposal?

offset actually means “possible_offsets” and is defined as a set of possible offsets (unsigned integers).
Checking for the max offset is possible by checking against the max element in the set @assert offset.max == 12
The same with min @assert offset.min == 2
Checking for exact offset is possible through @assert offset == {5}
Checking for all possible values is possible through @assert offset == {2,3,4,5}
You can also do element wise modulo on the set. Meaning that if offset == {8, 16, 24, 32} the following assertion will still hold @assert offset % 8 == {0}

Edited in notes:

The set {1, 1, 1} will be equal to {1} (since we don’t care about duplicates in a set). We can disallow duplicates in set construction initially if it makes implementation easier.
The set {1, 3, 2} will be equal to {1, 2, 3} (since we don’t care about ordering in a set). We can disallow unordered set construction initially if it makes implementation easier.
Operations are applied element wise for sets. The set {8, 16, 24, 32} % 8 is equal to the set {0, 0, 0, 0} == {0}
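
A minimal Python sketch of these set semantics follows. `OffsetSet` is a hypothetical class written for illustration only; it is not taken from PyDSDL or any other implementation.

```python
class OffsetSet:
    """A set of possible bit offsets with element-wise arithmetic."""

    def __init__(self, values):
        self._values = frozenset(values)   # duplicates and ordering vanish

    def __mod__(self, divisor):
        # The operation is applied to each element; equal results collapse.
        return OffsetSet(v % divisor for v in self._values)

    def __eq__(self, other):
        if isinstance(other, OffsetSet):
            return self._values == other._values
        return self._values == frozenset(other)

    def __hash__(self):
        return hash(self._values)

    @property
    def min(self):
        return min(self._values)

    @property
    def max(self):
        return max(self._values)


offset = OffsetSet({8, 16, 24, 32})
assert offset % 8 == {0}
assert offset == {32, 8, 24, 16, 8}
assert offset.min == 8 and offset.max == 32
```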

pavel.kirienko · October 31, 2018, 7:35pm

I have implemented that in the exact form you described in PyDSDL.

Simplicity of the DSDL grammar allowed me to take some drastic shortcuts: instead of parsing definitions using context-free parsers (as you seem to be doing in your Rust implementation), I made a simple regular grammar matcher based on regular expressions. Initially, I went the way of separate lexing and then parsing using a hand-built parser (because I wanted the library to be dependency-free for a number of unrelated reasons), but at some point I realized that it would be prudent to take advantage of the radical simplicity of DSDL, and rebuilt that part from scratch.

Contrary to the above reasoning, assert expressions are directly evaluated using the Python eval() function, which would probably count as cheating, as I expect you don’t have that level of introspection in Rust, seeing as it’s a native language without a runtime. In order to retain simplicity, I would suggest that we define a number of predefined regular forms of assert expressions and codify them in the specification, so that future implementers deprived of scripting languages could keep their implementations simpler. Once that fixed set of forms is defined, I will update PyDSDL to check against them to ensure portability across different parsers/compilers (because the current implementation, by virtue of being eval()-based, can eat arbitrarily complex expressions without complaining, which is not what we want).

Also, I took the liberty to add another simple directive: @print, which I expect will be useful for debugging. Its behavior is like that of @assert except that it doesn’t check anything, only prints:

@print offset      # Outputs {1024, 2048, 1032, 2056, 1040, etc.}
@assert offset % 8 == {0}
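
To illustrate how such a set of offsets can arise, here is a sketch that propagates a set of possible offsets across variable-length arrays. The helper is hypothetical and assumes that the length prefix is the smallest number of bits able to hold the maximum element count; the field layout is made up for the example.

```python
def after_variable_array(offsets, element_bits, max_elements):
    """Advance a set of possible offsets past a variable-length array."""
    prefix_bits = max_elements.bit_length()   # assumed prefix width
    return {
        start + prefix_bits + element_bits * count
        for start in offsets
        for count in range(max_elements + 1)
    }

offsets = {6}                                     # e.g. after a void6 field
offsets = after_variable_array(offsets, 8, 2)     # uint8[<=2] -> {8, 16, 24}
offsets = {o + 7 for o in offsets}                # void7
offsets = after_variable_array(offsets, 8, 1)     # uint8[<=1]
print(sorted(offsets))                            # [16, 24, 32, 40]
assert all(o % 8 == 0 for o in offsets)           # like @assert offset % 8 == {0}
```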

kjetilkjeka · November 1, 2018, 12:09pm

Great, it seems like the form works well in practice and doesn’t clutter the definitions unnecessarily. I’m happy with the end result!

We have several embeddable scripting languages in Rust. Gluon, Dyon and Rhai are the main ones.

Another simple way to do it is to compile the assertions into compile-time assertions in Rust (using rust constant evaluations for the operations) and just try to compile the resulting file with the Rust compiler.

Yes, we must validate the const expressions before handing them to the evaluator (no matter if this is Python, Gluon or a custom-written evaluator). This will allow us to control which operations are used and that they are used in a manner that is allowed by DSDL. Allowing all Python operations for const evaluation in DSDL would allow mutating offset and invalidating the invariant, and as you say, this is definitely not what we want.
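
One possible way to validate an expression before evaluating it in Python is to walk its syntax tree and reject every construct that is not explicitly whitelisted. The sketch below is an assumption about how this could be done, not a description of PyDSDL; the whitelist and the function names are hypothetical.

```python
import ast

# Node types permitted in an assertion expression (an assumed whitelist).
_ALLOWED_NODES = (
    ast.Expression, ast.Compare, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load,
    ast.Constant, ast.Set,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.USub,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)
_ALLOWED_NAMES = {'offset'}


def validate_expression(text):
    """Parse an expression and reject anything outside the whitelist."""
    tree = ast.parse(text, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError('Disallowed construct: %s' % type(node).__name__)
        if isinstance(node, ast.Name) and node.id not in _ALLOWED_NAMES:
            raise ValueError('Unknown name: %s' % node.id)
        if isinstance(node, ast.Constant) and type(node.value) is not int:
            raise ValueError('Only integer literals are allowed')
    return tree


def evaluate(text, offset):
    """Validate first, then evaluate with no builtins available."""
    tree = validate_expression(text)
    code = compile(tree, '<assert>', 'eval')
    return eval(code, {'__builtins__': {}}, {'offset': offset})
```

With this in place, `evaluate('offset % 8 == 0', 16)` is accepted, while attribute access or function calls such as `offset.__class__` are rejected before `eval()` is ever reached.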

I don’t mind keeping the const expressions simple enough that the regex validator can be used. But I also think we should think a bit more generically than just assert when defining this const eval. Specifically, I think that we should unify the const values in an assert, array lengths, and const definition assignments. I think the current const assignment and array length assignment are a bit weird.

I talked to @aasmune last week, and know they have similar use cases as the following minimal examples:

Using a const for array length:

uint16 SENSORS_COUNT = 19

SensorReading[SENSORS_COUNT] sensor_readings

Mixing it all up:

uint16 SENSORS_COUNT = Sensor.GROUPA_COUNT + Sensor.GROUPB_COUNT
uint16 SENSOR_ARRAY_MAX = 10

void4
SensorReading[<=SENSOR_ARRAY_MAX] sensor_readings
@assert 4*SENSOR_ARRAY_MAX <= SENSORS_COUNT # Need to send all sensor readings in 4 messages
@assert offset % 8 == {0}

I think we can unify this with the following:

The const Integer can represent all values of all DSDL integer types (int2-64 + uint2-64) and when using a const in a const expression the conversion to const integer is implicit.
Assignment to a const is done from a const integer; type conversion is implicit but includes bounds checking that will cause a compile error (similar to an assert) if it fails.
The array length is also implicitly converted from a const including a positive number check.
offset is a function that returns a Set of const integers.
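
The bounds-checked conversions described in this list could look roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the type-name parsing accepts any `intN`/`uintN` rather than only the widths DSDL permits.

```python
import re


def integer_range(type_name):
    """Return (min, max) for a DSDL-style intN/uintN type name."""
    match = re.fullmatch(r'(u?)int(\d+)', type_name)
    if not match:
        raise ValueError('Not an integer type: %r' % type_name)
    unsigned, bits = match.group(1) == 'u', int(match.group(2))
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def assign_constant(type_name, value):
    """Bounds-checked implicit conversion from a const integer."""
    low, high = integer_range(type_name)
    if not low <= value <= high:
        raise ValueError('%d does not fit in %s' % (value, type_name))
    return value


def array_length(value):
    """An array length must be a positive const integer."""
    if value <= 0:
        raise ValueError('Array length must be positive, got %d' % value)
    return value


SENSORS_COUNT = assign_constant('uint16', 19)
capacity = array_length(SENSORS_COUNT)
```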

This will also help us formalize whether things such as the following are allowed: float32 FOO = 0x0b, uint7 CHAR = 'a'

And make things consistent by allowing: uint8[0xa] bar

pavel.kirienko · November 3, 2018, 5:32pm

We seem to be perfectly on the same page. Would you be able to bring the DSDL chapter up to date with our recent advancements? If you limited your edits solely to the DSDL directory, we might even avoid merge conflicts completely. It seems that at least the sections 3.2 to 3.4 should be rewritten from scratch, although other sections may require edits as well.

On the subject of whether to allow certain things or not, I think it is important that we lean on the conservative side, prohibiting everything unless there is a good reason not to. Lifting restrictions in future revisions of the protocol is always easier than adding them in a backward-compatible way.

In the spirit of the above declaration, I added some experimental checks in PyDSDL, which you can easily find by searching for usages of the exception types:

To address your question specifically:

This will also help us formalize whether things as the following is allowed: float32 FOO = 0x0b, uint7 CHAR = 'a'

The first one is permitted because PyDSDL does an implicit conversion int -> float (but not the other way). The second one is prohibited by an explicit check:

github.com/UAVCAN/pydsdl/blob/6434860189ddb61a722095fe6503c7ec35a9310a/pydsdl/data_type.py#L544-L573


# Type check
if isinstance(data_type, BooleanType):
    if isinstance(value, bool):
        self._value = bool(value)  # type: typing.Union[float, int, bool]
    else:
        raise InvalidConstantValueError('Invalid value for boolean constant: %r' % value)

elif isinstance(data_type, IntegerType):
    if isinstance(value, int):
        self._value = int(value)
    elif isinstance(value, str):
        if len(value.encode('utf8')) != 1:
            raise InvalidConstantValueError('A constant string must be exactly one ASCII character long')

        if not isinstance(data_type, UnsignedIntegerType) or data_type.bit_length != 8:
            raise InvalidConstantValueError('Constant strings can be used only with uint8')

        self._value = ord(value.encode('utf8'))
    else:
        raise InvalidConstantValueError('Invalid value type for integer constant: %r' % value)

Further, it is enforced that identifier names are not one of the following:

# Disallowed name patterns apply to any part of any name, e.g.,
# an attribute name, a namespace component, type name, etc.
_DISALLOWED_NAME_PATTERNS = [
    r'(?i)(bool|uint|int|void|float)\d*$',          # Data type like names
    r'(?i)(saturated|truncated)$',                  # Keywords
    r'(?i)(con|prn|aux|nul|com\d?|lpt\d?)$',        # Reserved by the specification (MS Windows compatibility)
]
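
As an illustration of how these patterns behave, the sketch below applies them to individual name components with `re.match`. Whether PyDSDL applies them in exactly this way is an assumption; `is_name_allowed` is a hypothetical helper.

```python
import re

_DISALLOWED_NAME_PATTERNS = [
    r'(?i)(bool|uint|int|void|float)\d*$',          # Data type like names
    r'(?i)(saturated|truncated)$',                  # Keywords
    r'(?i)(con|prn|aux|nul|com\d?|lpt\d?)$',        # MS Windows compatibility
]


def is_name_allowed(name):
    """True if no disallowed pattern matches the whole name component."""
    return not any(re.match(p, name) for p in _DISALLOWED_NAME_PATTERNS)


assert not is_name_allowed('uint8')
assert not is_name_allowed('Truncated')     # matching is case-insensitive
assert not is_name_allowed('COM1')
assert is_name_allowed('interval')          # merely starts with "int"
assert is_name_allowed('component')         # merely starts with "com"
```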

Which follows the same principle “if not sure, prohibit”. This can be discussed, of course.

PyDSDL treats void fields in a special way: they are considered to be like regular fields with an empty name. This may be considered an implementation detail, or we could codify that in the specification explicitly for ease of reasoning.

PyDSDL prohibits root namespaces from being nested within each other. This is very important, as nesting can lead to malformed definitions, e.g. a type ns1.ns2.Type might successfully reference another type located in the namespace ns2, which does not make practical sense.

PyDSDL prohibits the same root namespace from being defined in several different locations. Meaning that spreading definitions belonging to the same root namespace across different directories is not permitted (e.g. /foo/bar/uavcan, /baz/bar/uavcan).
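
The two root namespace restrictions above can be sketched as a simple check over directory paths. This is an illustration only, not PyDSDL's code: `check_root_namespaces` is a hypothetical function, and it assumes that each root namespace is identified by the last component of its directory path.

```python
from pathlib import PurePosixPath


def check_root_namespaces(directories):
    """Reject nested or duplicated root namespace directories."""
    paths = [PurePosixPath(d) for d in directories]
    for a in paths:
        for b in paths:
            if a is b:
                continue
            if a in b.parents:
                raise ValueError('%s is nested inside %s' % (b, a))
            if a.name == b.name:
                raise ValueError('Root namespace %r is defined in both %s and %s'
                                 % (a.name, a, b))


check_root_namespaces(['/foo/bar/uavcan', '/foo/bar/vendor'])   # accepted
```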

PyDSDL treats a version number of zeros (v0.0) as invalid, because it makes little practical sense. Normally, v0.1 should be preferred.

The maximum data type name length has been reduced to 63 characters due to the changes in the standard data type set. PyDSDL checks that, too.

Bit compatibility across minor versions and regulated port ID assignment rules are also all checked; I won’t list that here, it’s easier to see that directly in namespace_parser.py.

Service types are treated in a special way: they are modeled as a quasi-type which can’t be serialized directly and contain two nested types, suffixed with .Request and .Response, e.g. uavcan.node.GetInfo.Request and uavcan.node.GetInfo.Response. I am not sure whether we should leave that as an implementation detail, or if we should specify that explicitly for extra rigor.

@kjetilkjeka would you be up to describe that, together with the grammar? Meanwhile I will finish up the transport layer specs.

pavel.kirienko · November 4, 2018, 2:32pm

Further, I would like to introduce a correction to the current naming conventions: I propose to not force lower or upper case on units of measurement (units of measurement are to be specified explicitly only when they deviate from the SI). It should also be recommended to always put the unit of measurement at the end of the identifier.

float32 pressure_kPa
uint16 duration_us
uint32 TIMEOUT_ms = 150

The objective is to prevent possible confusion that may arise due to the metric prefixes being case-sensitive:

pavel.kirienko · November 11, 2018, 1:06am

I propose that the form u?q\d+_\d+ be also prohibited. It will be needed once we introduce fixed-point arithmetic primitives (e.g., q16_8 or uq24_16).

kjetilkjeka · November 11, 2018, 8:27pm

I agree on all points concerning the reserved keywords and type names (including the fixed-point q).

I would like to keep this an implementation detail.

What does this mean?

That if I have a ns1.ns2.Type1 it cannot reference ns1.ns2.Type2? I don’t think this makes much sense?

This is unsound as different manufacturers will have overlapping names for different namespaces. manufacturers1.ecs.Status vs manufacturers2.ecs.Command

To be precise, this is just the name not the full path, right?

This should be an implementation detail as well.

Sorry it took me so long finishing up my other work, I will do this as soon as we sort out these minor issues.

pavel.kirienko · November 11, 2018, 8:43pm

What does this mean?
That if I have a ns1.ns2.Type1 it cannot reference ns1.ns2.Type2? I don’t think this makes much sense?

I should have been more clear, indeed. What I meant is that having two root namespaces located one inside the other is not allowed, because it may lead to problems and is not useful.

This is unsound as different manufacturers will have overlapping names for different namespaces. manufacturers1.ecs.Status vs manufacturers2.ecs.Command

My suggestion applies specifically to root namespaces, not nested namespaces. Your example is not affected by this restriction; it would be only if ecs was the root namespace.

To be precise, this is just the name not the full path, right?

This is like uavcan.node.Heartbeat.