New DSDL directive for simple compile-time checks

pavel.kirienko · October 18, 2018, 1:43pm

This is a whole new feature. I am quite certain that we shouldn’t open this brand new CAN of worms when we’re one foot in the v1.0 already. I understand that DSDL could use a lot of improvement, but let us not get ahead of ourselves here. Perhaps the whole language can be reviewed from scratch in v1.1, perhaps not.

Without the @align directive we still have to rely on offsets, so adding length alongside offset seems like unnecessary sugar. Let’s keep it simple.

kjetilkjeka · October 18, 2018, 2:19pm

It feels likes the offset assertions are a bit low level and clutter the definitions. It seems almost as easy to make mistakes with the assertions as with the dsdl code. And they’re definitely making the dsdl definitions harder to read.

I think it would be better to choose some higher level mechanism that describes intent better than the offset, like asserting on alignments.

@assert aligned # same as `@assert aligned 0`
uint7 data0
@assert aligned data
uint8[<=1] data1
void3
@assert aligned end
uint5 data2
void2
@assert aligned +2
uint64 data3

It would not be helpful for min/max offset. But what is really the point of asserting on min/max offset anyway?

It kind of feels like what we really want is a good DSDL visualizer though?

This would allow manual inspection during creation.

pavel.kirienko · October 18, 2018, 3:15pm

We want to be sure that fields that follow variable-length entities are still byte-aligned (or otherwise aligned) at all times. Here’s an example:

uint7 node_id
void1
@assert offset % 8 == 0
void3
uint8[<=16] unique_id
@assert min_offset / 8 == 2
@assert max_offset / 8 == 18

# Making sure that the next field is always aligned
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0
uint16 foo

# Eventually (in a future release) we could go further and ask the compiler to
# verify the offset against every possible permutation of the preceding
# variable-length fields.
@assert any_offset % 8 == 0 # I have not given much thought to this yet

@assert max_offset / 8 < 400

Yes, it won’t work with variable-length fields where the element sizes are not a multiple of 8 bits, but those are rare (they don’t exist in the standard data type set, for example).

If strongly desired, we could add some sugar that would expand as follows:

@assert aligned
# Is equivalent to:
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0

I am not sure why do we need @assert aligned data or end since the same goals can be achieved by moving the assertion statement after the field it is intended to protect.

@assert aligned
uint7 data0
uint8[<=1] data1
@assert aligned   # Applies both to the above and to the below
void3
uint5 data2
@assert aligned
void2
uint64 data3

Stuff like @assert aligned +2 doesn’t seem to be really needed, does it? Unconventional checks can always be implemented by checking against the offset variables directly, e.g. @assert (offset - 2) % 8 == 0

kjetilkjeka · October 19, 2018, 9:00am

pavel.kirienko:

We want to be sure that fields that follow variable-length entities are still byte-aligned (or otherwise aligned) at all times. Here’s an example:
uint7 node_id
void1
@assert offset % 8 == 0
void3
uint8[<=16] unique_id
@assert min_offset / 8 == 2
@assert max_offset / 8 == 18

# Making sure that the next field is always aligned
@assert min_offset % 8 == 0
@assert max_offset % 8 == 0
uint16 foo

# Eventually (in a future release) we could go further and ask the compiler to
# verify the offset against every possible permutation of the preceding
# variable-length fields.
@assert any_offset % 8 == 0 # I have not given much thought to this yet

@assert max_offset / 8 < 400
Yes, it won’t work with variable-length fields where the element sizes are not a multiple of 8 bits, but those are rare (they don’t exist in the standard data type set, for example).

This is a big problem with min/max offset. We can only check the arrays that are incredibly hard to get wrong, to begin with. We could make @assert aligned work equal to your proposed @assert any_offset % 8 == 0 without needing to reason about min/max/any offsets and mod operators.

pavel.kirienko:

I am not sure why do we need @assert aligned data or end since the same goals can be achieved by moving the assertion statement after the field it is intended to protect.
@assert aligned
uint7 data0
uint8[<=1] data1
@assert aligned   # Applies both to the above and to the below
void3
uint5 data2
@assert aligned
void2
uint64 data3
Stuff like @assert aligned +2 doesn’t seem to be really needed, does it? Unconventional checks can always be implemented by checking against the offset variables directly, e.g. @assert (offset - 2) % 8 == 0

Do you take the challange to remove end and data in the following examples?

void7
@assert aligned data
uint47[<=1] data

void5
@assert aligned data # After you've looked up SomeType you will realize that the definition is not a multiple of 8 bytes
SomeType[<=7] data

void5
@assert aligned end
SomeType[13]

@assert aligned should check that it’s always aligned. Not just in 2 arbitrarily chosen cases (min/max length).

pavel.kirienko · October 19, 2018, 11:59am

I accept the challenge for the first two definitions.

void7
uint47[<=1] data
@assert min_offset % 8 == 0

void5
# After you've looked up SomeType you will realize that the definition is
# not a multiple of 8 bytes
SomeType[<=7] data
@assert min_offset % 8 == 0

The next one cannot be solved within the limits of the current type system regardless of the chosen syntax, because dynamically computed padding is not allowed/defined. I wrote above that I don’t want to add it to UAVCAN v1.0, because it is such a major change.

void5
@assert aligned end
SomeType[13]

scottdixon · October 23, 2018, 4:27am

(and in comes Scott 10 days late, sorry).

TLDR;

For 1.0 I like the idea of supporting a simple assert as a way to start down this path without fully designing it.

I would like to see an align directive in the future but I wouldn’t introduce it just yet.
For assert to be useful we need a maximum and minimum boundary token like Pavel’s max_offset and min_offset so I agree these should go into 1.0. I would suggest using more concise tokens instead; . for max offset and $ for min offset but I understand this might be controversial.
I’d add a message parameter to the assert to make build output more meaningful.

Long Form

First, I’m really excited by the direction we are going here. I’ve been skeptical about the value of DSDL as a unique syntax until now. I really want to revisit this in 1.1 and pull in concepts from the GNU linker which solves some of the same problems we’re attacking here in its linker script syntax. But for DSDL we should add tools to help UAVCAN type authors build and enforce rules for their types.

I’d suggest using more concise tokens instead of “max_offset” and “min_offset” to allow more compact assert statements and I’d also add a message field:

void7
uint47[<=1] data
@assert $ % 8 == 0, "The minimum offset must be 8-bit aligned at this point"

This syntax (yes, the . is borrowed from ld) would allow us to add an ld-like align directive and use index math in the future to replace all our use of void:

. = ALIGN(8)
uint47[<=1] data
@assert $ == ALIGN(8), "The minimum offset must be 8-bit aligned at this point"

Speculation About the Future?

We could also consider adding a directive that helped detect sub-optimal framing based on transport. For example:

uint8[8] data
@assert . < FRAMES(CAN_2.0, 1), "This message must fit in a single CAN 2.0 frame."

For can FD would could add a token representing the end of the padded frame to allow assertions like:

uint8[8] data
# So I'm using ; as the end-of-padding token
@assert . - ; < 16, "This message should not put more than 2 bytes of padding on the bus."

Details aside, I think there’s an opportunity to make DSDL a more powerful DSL for designing types that are optimized for CAN.

pavel.kirienko · October 23, 2018, 8:28am

While I am on the same page with Scott as far as general ideas go, I have a strong objection against the proposed syntax. The GNU linker script is really a poor role model to follow.

. and $ are a good start; €, £, ¥, and ₽ provide endless possibilities for future extensions.

If we’re working with offsets, then let’s call them offsets (in English). DSDL definitions are not something people code all day long; rather, they are more or less static: written once, read a lot. So it makes sense to optimize for readability.

An optional message for the assert statement seems very sensible, as well as transport-specific checks.

scottdixon · October 23, 2018, 4:40pm

Okay. I’ll argue that when we introduce ALIGN we should use . for “current position” so we can do . = ALIGN(8) but we’re agreeing this is a 1.1 argument.

kjetilkjeka · October 25, 2018, 8:45pm

It seems like we pretty much agree that we want to be able to do compile time assertions using a new directive. And the syntax will be the following @assert <condition>.

+1 to this

I think we can do better than offset, max_offset and min_offset even as a start. The problem with offset is that its impossible to give a “real” number once a dynamic structure has been encountered. What about the following proposal.

offset actually means “possible_offsets” and is defined as a set of possible offsets (unsigned integers).
Checking for the max offset is possible by checking against the max element in the list @assert offset.max == 12
The same with min @assert offset.min == 2
Checking for exact offset is possible through @assert offset == {5}
Checking for all possible values is possible through @assert offset == {2,3,4,5}
You can also do element wise modulo on the set. Meaning that if offset == {8, 16, 24, 32} the following assertion will still hold @assert offset % 8 == {0}

Edited in notes:

The set {1, 1, 1} will be equal to {1} (since we don’t care about duplicates in a set). We can disallow duplicates in set construction initially if it makes implementation easier.
The set {1, 3, 2} will be equal to {1, 2, 3} (since we don’t care about ordering in a set). We can diallow unordered set construction initially if it makes implementation easier.
Operations are applied element wise for sets. The set {8, 16, 24, 32} % 8 is equal to the set {0, 0, 0, 0} == {0}

pavel.kirienko · October 31, 2018, 7:35pm

I have implemented that in the exact form you described in PyDSDL.

Simplicity of the DSDL grammar allowed me to take some drastic shortcuts: instead of parsing definitions using context-free parsers (as you seem to be doing in your Rust implementation), I made a simple regular grammar matcher based on regular expressions. Initially, I went the way of separate lexing and then parsing using a hand-built parser (because I wanted the library to be dependency-free for a number of unrelated reasons), but at some point I realized that it would be prudent to take advantage of the radical simplicity of DSDL, and rebuilt that part from scratch.

Contrary to the above reasoning, assert expressions are directly evaluated using the Python eval() function, which would probably count as cheating as I expect you don’t have that level of introspection in Rust, seeing as it’s a native language without runtime. In order to retain simplicity, I would suggest us to define a number of predefined regular forms of assert expressions and codify them in the specification, so that future implementers deprived of scripting languages could keep their implementations simpler. Once that fixed set of forms is defined, I will update PyDSDL to check against them to ensure portability across different parsers/compilers (because the current implementation, by virtue of being eval()-based, can eat arbitrarily complex expressions without complaining, which is not what we want).

Also, I took the liberty to add another simple directive: @print, which I expect will be useful for debugging. Its behavior is like that of @assert except that it doesn’t check anything, only prints:

@print offset      # Outputs {1024, 2048, 1032, 2056, 1040, etc.}
@assert offset % 8 == {0}

kjetilkjeka · November 1, 2018, 12:09pm

Great, it seems like the form works well in practice and don’t clutter the definitions unnecessarily. I’m happy with the end result!

We have several embeddable scripting languages in Rust. Gluon, Dyon and Rhai are the main ones.

Another simple way to do it is to compile the assertions into compile-time assertions in Rust (using rust constant evaluations for the operations) and just try to compile the resulting file with the Rust compiler.

Yes, we must validate the const expressions before handing them to the evaluator (no matter if this is Python, Gluon or a custom written evaluator). This will allow us to control which operations are used and that they are used in a manner that is allowed by DSDL. Allow all Python operations for const evaluation in DSDL would allow mutating offset and invalidate the invariant, and as you say, this is definitely not what we want.

I don’t mind keeping the const expressions simple enough that the regex validator can be used. But I also think we should think a bit more generic than just assert when defining this const eval. I think specifically that we should unify the const values in an assert, array lengths, and const definition assignments. I think the current const assignment and array length assignment is a bit weird.

I talked to @aasmune last week, and know they have similar use cases as the following minimal examples:

Using a const for array length:

uint16 SENSORS_COUNT= 19

SensorReading[SENSORS_COUNT] sensor_readings

Mixing it all up:

uint16 SENSORS_COUNT = Sensor.GROUPA_COUNT +  Sensor.GROUPB_COUNT
uint16 SENSOR_ARRAY_MAX = 10

void4
SensorReading[<=SENSOR_ARRAY_MAX] sensor_readings
@assert 4*SENSOR_ARRAY_MAX <= SENSORS_COUNT # Need to send all sensor readings in 4 messages
@assert offset % 8 == {0}

I think we can unify this with the following:

The const Integer can represent all values of all DSDL integer types (int2-64 + uint2-64) and when using a const in a const expression the conversion to const integer is implicit.
Assignment to a const is done from a const integer, type conversion is implicit but includes bounds checking that will cause a compile error (similar to an assert) it fails.
The array length is also implicitly converted from a const including a positive number check.
offset is a function that returns a Set of const integers.

This will also help us formalize whether things as the following is allowed: float32 FOO = 0x0b, uint7 CHAR = 'a'

And make things consistent by allowing: uint8[0xa] bar

pavel.kirienko · November 3, 2018, 5:32pm

We seem to be perfectly on the same page. Would you be able to bring the DSDL chapter up to date with our recent advancements? If you limited your edits solely to the DSDL directory, we might even avoid merge conflicts completely. It seems that at least the sections 3.2 to 3.4 should be rewritten from scratch, although other sections may require edits as well.

On the subject of whether to allow certain things or not, I think it is important that we lean on the conservative side, prohibiting everything unless there is a good reason not to. Lifting restrictions in future revisions of the protocol is always easier than adding them in a backward-compatible way.

In the spirit of the above declaration, I added some experimental checks in PyDSDL, which you can easily find by searching for usages of the exception types:

To address your question specifically:

This will also help us formalize whether things as the following is allowed: float32 FOO = 0x0b, uint7 CHAR = ‘a’

The first one is permitted because PyDSDL does an implicit conversion int -> float (but not the other way). The second one is prohibited by an explicit check:

github.com

UAVCAN/pydsdl/blob/6434860189ddb61a722095fe6503c7ec35a9310a/pydsdl/data_type.py#L544-L573


# Type check
if isinstance(data_type, BooleanType):
    if isinstance(value, bool):
        self._value = bool(value)  # type: typing.Union[float, int, bool]
    else:
        raise InvalidConstantValueError('Invalid value for boolean constant: %r' % value)


elif isinstance(data_type, IntegerType):
    if isinstance(value, int):
        self._value = int(value)
    elif isinstance(value, str):
        if len(value.encode('utf8')) != 1:
            raise InvalidConstantValueError('A constant string must be exactly one ASCII character long')


        if not isinstance(data_type, UnsignedIntegerType) or data_type.bit_length != 8:
            raise InvalidConstantValueError('Constant strings can be used only with uint8')


        self._value = ord(value.encode('utf8'))
    else:
        raise InvalidConstantValueError('Invalid value type for integer constant: %r' % value)

This file has been truncated. show original

Further, it is enforced that identifier names are not one of the following:

# Disallowed name patterns apply to any part of any name, e.g.,
# an attribute name, a namespace component, type name, etc.
_DISALLOWED_NAME_PATTERNS = [
    r'(?i)(bool|uint|int|void|float)\d*$',          # Data type like names
    r'(?i)(saturated|truncated)$',                  # Keywords
    r'(?i)(con|prn|aux|nul|com\d?|lpt\d?)$',        # Reserved by the specification (MS Windows compatibility)
]

Which follows the same principle “if not sure, prohibit”. This can be discussed, of course.

PyDSDL treats void fields in a special way: they are considered to be like regular fields with an empty name. This may be considered an implementation detail, or we could codify that in the specification explicitly for ease of reasoning.

PyDSDL prohibits root namespaces from being nested within each other. This is very important, as it can lead to malformed definitions, e.g. a type ns1.ns2.Type might successfully reference another type located in the namespace ns2., which does not make practical sense.

PyDSDL prohibits the same root namespace from being defined in several different locations. Meaning that spreading definitions belonging to the same root namespace across different directories is not permitted (e.g. /foo/bar/uavcan, /baz/bar/uavcan).

PyDSDL treats a version number of zeros (v0.0) as invalid, because it makes a limited practical sense. Normally, v0.1 should be preferred.

The maximum data type name length has been reduced to 63 characters due to the changes in the standard data type set. PyDSDL checks that, too.

Bit compatibility across minor versions and regulated port ID assignment rules are also all checked; I won’t list that here, it’s easier to see that directly in namespace_parser.py.

Service types are treated in a special way: they are modeled as a quasi-type which can’t be serialized directly and contain two nested types, suffixed with .Request and .Response, e.g. uavcan.node.GetInfo.Request and uavcan.node.GetInfo.Response. I am not sure whether we should leave that as an implementation detail, or if we should specify that explicitly for extra rigor.

@kjetilkjeka would you be up to describe that, together with the grammar? Meanwhile I will finish up the transport layer specs.

pavel.kirienko · November 4, 2018, 2:32pm

Further, I would like to introduce a correction to the current naming conventions: I propose to not force lower or upper case on units of measurements (units of measurements are to be specified explicitly only when they deviate from the SI). It should also be recommended to always put the unit of measurement at the end of the identifier.

float32 pressure_kPa
uint16 duration_us
uint32 TIMEOUT_ms = 150

The objective is to prevent possible confusion that may arise due to the metric prefixes being case-sensitive:

pavel.kirienko · November 11, 2018, 1:06am

I propose that the form u?q\d+_\d+ be also prohibited. It will be needed once we introduced fixed-point arithmetic primitives (e.g., q16_8 or uq24_16).

kjetilkjeka · November 11, 2018, 8:27pm

I agree on all point concerning the reserved keywords and type names (including the fixed point q)

I would like to keep this an implementation detail.

What does this mean?

That if I have a ns1.ns2.Type1 it cannot reference ns1.ns2.Type2? I don’t think this makes much sense?

This is unsound as different manufacturers will have overlapping names for different namespaces. manufacturers1.ecs.Status vs manufacturers2.ecs.Command

To be precise, this is just the name not the full path, right?

This should be an implementation detail as well.

Sorry it took me so long finishing up my other work, I will do this as soon as we sort out these minor issues.

pavel.kirienko · November 11, 2018, 8:43pm

What does this mean?
That if I have a ns1.ns2.Type1 it cannot reference ns1.ns2.Type2? I don’t think this makes much sense?

I should have been more clear, indeed. What I meant is that having two root namespaces located one inside the other is not allowed, because it may lead to problems and is not useful.

This is unsound as different manufacturers will have overlapping names for different namespaces. manufacturers1.ecs.Status vs manufacturers2.ecs.Command

My suggestion applies specifically to root namespaces, not nested namespaces. Your example is not affected by this restriction; it would be only if ecs was the root namespace.

To be precise, this is just the name not the full path, right?

This is like uavcan.node.Heartbeat.

pavel.kirienko · November 11, 2018, 8:46pm

Which I suggest to call simply “name”. The last part of it could be referred to as “short name”. The full notation could be like this: https://github.com/UAVCAN/pydsdl/blob/008c8f16a45ed2a73d7fc22bdc2991c6a538ae55/pydsdl/dsdl_definition.py#L80-L103

kjetilkjeka · November 12, 2018, 2:54pm

How is this possible, isn’t the definition of a root namespace that it isn’t contained inside another namespace?

Is what you mean that when naming a type it must either be inside the exact same namespace (and use the short name) or you must use the fully qualified name?

I still don’t understand this restriction. Does this restriction disallow having a root.esc1.control namespace and a root.esc2.control namespace? Or a root.esc1.Command definitions together with a root.esc2.Command definitions?

I think of “name” instantly as short name, if you think of it as the fully qualified name perhaps we should avoid using only “name” to avoid ambiguities. I suggest:
uavcan.node - “namespace”
Heartbeat - “short-name”
uavcan.node.Heartbeat - “fully qualified name”

pavel.kirienko · November 12, 2018, 6:01pm

Exactly, and that’s what we’re checking. Suppose a parser is invoked to parse a namespace, and that namespace needs to use types defined in another namespace. Both will have to be supplied to the parser. Now, the parser will need to ensure that one namespace directory is not nested within the other.

Neither of those is affected. What is prohibited is that you can’t invoke a parser with two root namespace directories both ending with the same final path component: /foo/bar/uavcan, /abc/baz/uavcan; because that would imply that either the same definitions are supplied twice, or members of the same root namespace are distributed across several directories. The latter is dangerous because if a parser is ever invoked on only one of those directories, it will be unable to detect possible conflicts with the other.

Ok, let’s avoid saying “name” and use either “fully qualified name” (can we just say “full name” for brevity?) or “short name”.

pavel.kirienko · November 12, 2018, 9:01pm

DSDL processing tools (e.g. parsers) should issue a warning when a non-deprecated type is dependent on a deprecated type. This would typically mean that the developer forgot to update a dependency version number somewhere in a dependent definition.

Or do we want to treat that as an error? That would make sense to ensure that all dependencies are updated timely? In that case it should also be specified in the spec.

Example:

Foo:

@deprecated
# ...definition...

Accepted dependency:

@deprecated     # The referred version of foo is deprecated, so is this one
Foo.1.0 foo

Incorrect dependency:

# This type should be deprecated by extension
# The developer probably forgot to bump the version number of Foo
Foo.1.0 foo