Problems with DS-015

We should just stop using covariance matrices.

It’s helpful to have them for the cases where they are actually used, but I agree that in the majority of scenarios they are not worth the bandwidth (for instance, I just send zeros most of the time).
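To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. The field size and publication rate below are illustrative assumptions, not figures taken from any actual DS-015 type:

```python
# Rough cost of carrying an upper-triangular 3x3 covariance in every message.
# All numbers below are illustrative assumptions, not DS-015 figures.
BYTES_PER_ENTRY = 2   # float16 per covariance entry (assumed)
ENTRIES = 6           # upper triangle of a symmetric 3x3 matrix
RATE_HZ = 100         # assumed publication rate of the sensor

overhead_bytes_per_s = BYTES_PER_ENTRY * ENTRIES * RATE_HZ
print(overhead_bytes_per_s)  # 1200 bytes/s per subject, even when all zeros
```

At 100 Hz that is over a kilobyte per second per subject spent on fields that many publishers fill with zeros anyway.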

We need it to be broadcast in a form that says: “this is from a pitot tube; if you want a pitot-based airspeed, you can use this”.

Can’t this be done with subject name semantics? I don’t think a specific message is necessary.

Strings aren’t great for separating semantics unless you have a well-defined set of strings to ensure everyone always uses the same one.
Personally, I’d prefer a separate message.

Dropping my two cents into the conversation. I, like others, agree with @tridge and what he has said. But a hard fork would be going somewhat backwards, IMO.

I wouldn’t be posting here if I didn’t want to avoid a fork. I’d much rather find a solution that meets the technical needs while working across the broader UAVCAN community.
I just can’t ignore the technical requirements, so if we have to split into APCAN then we will, but at this stage I’m hopeful it will not be necessary.


I alluded to some concerns with mixed networks in the above discussion. I’ve now started a separate topic for that:

@tridge I apologize for being a bit late for this discussion — it took me a bit of time to dig through the extensive background and understand where ArduPilot is coming from. The worst thing I could do in this situation is to give a quick knee-jerk response. Let me know if you think I missed anything critical from the preceding conversation.

Common ground

Everyone involved in this conversation or who would like to get involved should first understand what DS-015 is and what it is not. Please, do not post anything regarding DS-015 until you have thoroughly familiarized yourself with the following sources (I know that @tridge, @dagar, and @coder_kalyan have already done so):

I want to set the foundation by listing certain points that I believe we can easily agree on.

First, this conversation is not about UAVCAN v1. I think both @tridge and I agree that ArduPilot, and the entire ecosystem around it, will definitely benefit from supporting v1 because it does fix many issues that used to be present in v0. For instance, v1 allows one to modify data types without breaking wire compatibility (which has already been successfully demonstrated); also, it is transport-agnostic, which will become important in the longer term. This conversation, however, is about the application-layer communication standard built on top of UAVCAN v1. Unlike v0, the stable version does not address any application-layer objectives; instead, it delegates this task to higher-level standards. DS-015 is one such standard. There may be others. One such standard may be crafted by the ArduPilot team independently from anyone else (although I would gladly offer my best advice if they welcome it); we will be referring to this hypothetical entity as “APCAN”.

Second, DS-015 is not the best choice for building sensor networks, just as a microscope is a poor choice of tool for hammering in nails. The criticism provided by Tridge is entirely correct in that narrow sense — if you take a thing designed to do X and apply it to task Y, you should expect suboptimal results. It saddens me that the critique is so off the mark; I take it that I should have done a better job at explaining why DS-015 is designed this way. I will try to correct this, but in return, I would like to ask you to do your part by honestly zooming out without holding onto your existing preconceptions.

Third, we should strive to unify our requirements instead of building APCAN next to DS-015, as a fork would benefit neither party. Find my proposal at the end of this post.

What DS-015 is not

It is not a replacement for I2C, SPI, UART, or UAVCAN v0. Much of the frustration in @tridge’s post comes from incorrectly set expectations.

Let’s say you have been using a hammer for a long while. It wasn’t a great hammer, but it was just good enough to do its job. Then I came along, took your hammer away, and gave you a new screwdriver as a replacement. You looked at me in bewilderment, asking if I had completely gone insane, because how are you supposed to hammer in nails with that?

You can’t use idiomatic DS-015 for ferrying sensor measurements or actuator commands between the flight controller and its peripherals.

UAVCAN v0 integrates with your flight controller at the driver layer. DS-015 integrates with your flight controller at the layer of its business logic because the other network participants become part of the high-level control processes that used to be concentrated exclusively in the autopilot. This is what you can build using v0, but not idiomatic DS-015:

UAVCAN v0 is built for data exchange. DS-015 is built on the modern theory of distributed computing. @coder_kalyan has done a decent job summarizing the basics — partly repeating the UAVCAN Guide — so I will omit the details.

No more data type identifiers

We need to correct a critical error of interpretation that I spot in these posts (note the added emphasis):

These passages make me realize that I have probably done a poor job writing section Semantic segregation of the Interface Design Guidelines, introduction to the DS-015 standard, and the chapter “Basic concepts” of the UAVCAN Specification because all of these were supposed to address or prevent this misunderstanding. Let me quote the relevant bit from The Guide:

Instantiating a service necessarily involves assigning its subjects and UAVCAN-services certain specific port-identifiers at the discretion of the implementer (for example, configuring an air data computer of an aircraft to publish its estimates as defined by the air data service contract over specific subject-IDs chosen by the integrator).

Excepting special use cases, the port-ID assignment is necessarily removed from the scope of service specification because its inclusion would render the service inherently less composable and less reusable, and additionally forcing the service designer to decide in advance that there should be at most one instance of the said service per network. While it is possible to embed the means of instance identification into the service contract itself (for example, by extending the data types with a numerical instance identifier like it is sometimes done in DDS keyed topics), this practice is ill-advised because it constitutes a leaky abstraction that couples the service instance identification with its domain objects.

So what does it mean practically? Here is an example. Suppose you want to connect a differential pressure sensor to a legacy UAVCAN v0 network. You would make a data type that might look as follows:

uint8 sensor_id              # Which specific sensor is it?
float32 pressure_difference  # [pascal]

You would assign it a data type ID, let’s say, 20001. Then, when integrating a new sensor into the network, you configure it, defining which value of sensor_id it should publish. Alternatively, you could use the node-ID of the sensor to differentiate its data from that of other sensors of the same kind.
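A minimal sketch of what this v0-style pattern looks like on the subscriber side — routing by the instance identifier baked into the message itself. The names and values here are hypothetical, mirroring the data type above:

```python
# Hypothetical v0-style dispatch: the subscriber inspects the sensor_id
# field embedded in each message to tell sensors of the same kind apart.
from dataclasses import dataclass

@dataclass
class DiffPressure:             # mirrors the v0 data type sketched above
    sensor_id: int              # which specific sensor is it?
    pressure_difference: float  # [pascal]

readings = {}  # latest reading per sensor

def on_message(msg: DiffPressure) -> None:
    # Route by the instance identifier embedded in the message payload --
    # exactly the "leaky abstraction" the Guide warns about.
    readings[msg.sensor_id] = msg.pressure_difference

on_message(DiffPressure(sensor_id=0, pressure_difference=120.5))
on_message(DiffPressure(sensor_id=1, pressure_difference=98.2))
print(readings)  # {0: 120.5, 1: 98.2}
```

Note how the application logic has to know about sensor identities; this coupling is what v1 removes by moving instance identification to the subject-ID.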

None of these methods work in UAVCAN v1: there are no data type identifiers. Further, the Guide also explains why the node-ID should not be used at the application layer (excepting special scenarios that are not related to this discussion).

In UAVCAN v1, your data type would look as follows:

float32 pressure_difference  # [pascal]

Or, since it is just a physics thing, you could just use uavcan.si.unit.pressure.Scalar.1.0 to the same effect.

Okay, but if there is no data type identifier, then how does your sensor know where to publish the data, and how does your subscriber (e.g., the autopilot) know where to look for it? The nodes are configured at the time of their integration into the system. UAVCAN v1 fixes the so-called syntax-semantic entanglement problem, which was, by far, the worst offender among the design deficiencies present in v0.

UAVCAN v1 offers port-identifiers as a replacement, but they cannot be set statically at the data type definition level, as explained in the linked resources. Exceptions are given for some standard data type definitions, but only that — a vendor cannot define a data type with a fixed identifier. Section “Basic concepts” of the Specification explains why.

This means that the user of your differential pressure sensor cannot possibly integrate the unit into the system without first configuring the subject-ID at which its data should be published; said configuration is also performed via UAVCAN using the Register Interface (no need to craft additional user interfaces). Did you miss this point?
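To illustrate, here is a toy model of that configuration step. The register naming pattern `uavcan.pub.<port>.id` follows the standard register interface convention; the sentinel value, the port names, and the subject-ID values are assumptions made up for this sketch:

```python
# Toy model of subject-ID assignment via named registers.
# Register names follow the "uavcan.pub.<port>.id" convention;
# everything else here (values, port names) is illustrative.
UNSET = 0xFFFF  # assumed sentinel: not a valid subject-ID, so "not configured"

registers = {
    "uavcan.pub.differential_pressure.id": UNSET,
    "uavcan.pub.outside_air_temperature.id": UNSET,
}

def configure(port_name: str, subject_id: int) -> None:
    # Subject-IDs are 13-bit values in UAVCAN v1.
    assert 0 <= subject_id <= 8191, "subject-ID out of range"
    registers[f"uavcan.pub.{port_name}.id"] = subject_id

# The integrator, not the vendor, picks the subject-IDs at integration time:
configure("differential_pressure", 1234)
configure("outside_air_temperature", 1235)
print(registers)
```

The key point is that the mapping from data to subject-ID lives in per-node configuration, not in the data type definition.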

It is also possible to automate the port-ID assignment in certain scenarios, although this is expected to be of limited utility in general. There is an unfinished proposal that can be resurrected if there is interest.

If you are surprised by this, please stop now and read this discussion before continuing, because it is absolutely instrumental for us to have a sensible conversation:

To have a hands-on experience with the computation graph offered by UAVCAN v1, please run this Python demo on your local computer (works best with GNU/Linux): https://pyuavcan.readthedocs.io/en/stable/pages/demo.html. Then also run the DS-015 servo demo to see the embedded side of the same problem.

I hope the questions that I quoted at the beginning of this section are now answered.

No more sensor nodes

UAVCAN v1 is built to enable (hard) real-time distributed computing. Distribution enables one to construct more complex systems using less complex individual parts. This point, in essence, mirrors the old debate about the differences between monolithic and microkernels in OS design, which I presume most are well-familiar with. If you want a more in-depth discussion of this aspect, refer to the Guide.

DS-015 takes advantage of the capabilities offered by v1, bringing the specifics of one particular domain (drones) together with UAVCAN such that the tasks that are pervasive in this domain are addressed consistently in a way that is easy to standardize around.

While plug-and-play is, generally, in the scope of DS-015, one should not expect it to be as extensive as in v0. Since we are talking about a distributed system, expecting it to be entirely plug-and-play is akin to expecting the autopilot firmware to write itself.

Similar principles of distribution and compartmentalization stand behind ARINC 653, which also enables one to construct independently certifiable components, simplifying the maintenance and upgrade of the system. The benefits of such architectures do not necessarily need to be confined to safety-certifiable systems only, of course.

DS-015 assumes that every component is an independent agent that works in collaboration with its peers, such that the role of the underlying UAVCAN v1 is similar to IPC. An existing piece of COTS UAVCAN v0 drone hardware can be made DS-015 compliant by merely swapping its software, without the need to alter the hardware (excepting, perhaps, some marginal scenarios I am not aware of). However, the software does become more complex, which is acknowledged. I presume that vendors of low-cost drone hardware who don’t have access to adequate software development expertise won’t be able to pull it off unless we provide them with extensive support, perhaps in a fashion similar to AP_Periph. The UAVCAN Consortium is well-equipped for this type of collaboration.

The following quote from an adjacent topic reveals critical misunderstanding:

There are several issues:

  1. UAVCAN v1 does not really say anything about sensor nodes, because it is below the layer of abstraction where this distinction makes sense. In UAVCAN v1, there are just nodes. You may call them sensor nodes if you want, this is fine.

  2. DS-015 models physical processes and subsystems instead of sensors. Calling a DS-015 node a “sensor node” is like calling a public Java method a “subroutine”: it is either wrong, or it is evidence of poor design. Idiomatic DS-015 assumes that you hide your sensor behind a higher-layer abstraction. The airspeed estimation example, with its IAS/CAS debate, is a good illustration of this distinction.

We are equipped now to address the specific problems listed in the OP post:

Do you also need to analyze the raw current measurements made by the ESCs? Unless you have any highly non-trivial requirements I am not aware of, I don’t expect the need to analyze the raw data of every sensor to persist once you adopt the distributed mindset.

The logging aspect is handled by UAVCAN itself, which assumes publishing diagnostic data (such as internal states or informational messages) at a low priority level. We will talk about this more in the thread about the transition from v0 to v1.

This is not really a deficiency of DS-015, but rather a case of incompatible requirements. More on this in the next section.

If this is manageable for the autopilot, then it is manageable for the air data computer node as well.

Your requirements are incompatible with idiomatic DS-015

I understand that what you are looking for at this moment has little to do with idiomatic DS-015. While I am confident that sooner or later you will acknowledge the benefits of distributed architecture outside of the most trivial scenarios, at the moment you need a simpler solution. To this end, @dagar has suggested a middle-ground solution that is mostly valid and can be implemented without causing undue fragmentation of the ecosystem.

A basic airspeed sensor node (sic!) can be easily implemented using the physics data types provided by DS-015 and the standard uavcan.si namespace. This does not agree with idiomatic DS-015 but it will be built on the same basic foundations and opens a solid path for an eventual transition to DS-015 for adopters who find value in it.

Taking our airspeed sensor node (that is, taking the low-level approach as opposed to the alternative offered by DS-015) as an example, we could conceivably make it publish on the following subjects:

| Subject | Type |
|---|---|
| differential_pressure | uavcan.si.unit.pressure.Scalar.1.0 |
| outside_air_temperature | uavcan.si.unit.temperature.Scalar.1.0 |

You could also take the more complex types from the physics namespace. We could also alter them or define new ones. We are entirely open to extending the reg.drone.physics namespace to suit your requirements.
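For a sense of what a subscriber (e.g., the autopilot) might do with those two subjects, here is a sketch of the downstream airspeed computation, using the simple incompressible approximation and ISA constants. The formulas and constants are standard physics; the function names are mine:

```python
# Airspeed from a differential pressure subject and an OAT subject.
# Incompressible approximation; fine for low-speed small UAVs.
import math

RHO_0 = 1.225   # [kg/m^3] ISA sea-level air density
R_AIR = 287.05  # [J/(kg*K)] specific gas constant of dry air
P_0 = 101325.0  # [Pa] ISA sea-level static pressure

def indicated_airspeed(dp_pa: float) -> float:
    """IAS from differential (dynamic) pressure: v = sqrt(2*dp/rho_0)."""
    return math.sqrt(2.0 * max(dp_pa, 0.0) / RHO_0)

def true_airspeed(dp_pa: float, oat_kelvin: float, static_pa: float = P_0) -> float:
    """TAS corrects IAS for actual density derived from OAT and static pressure."""
    rho = static_pa / (R_AIR * oat_kelvin)
    return indicated_airspeed(dp_pa) * math.sqrt(RHO_0 / rho)

print(round(indicated_airspeed(612.5), 2))  # 31.62 (m/s)
```

In idiomatic DS-015 this computation would live on the air data computer node; in the non-idiomatic variant discussed here, it would run on the flight controller instead.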

There is another alternative that is not mutually exclusive with the above. Your usage of UAVCAN does not really allow you to leverage the architectural advantages it offers — you are treating it as a point-to-point, star topology network (I am talking about the application layer topology here) to mimic I2C/SPI. In this case, we could also consider defining an additional profile next to reg.drone specifically for tunneling I2C, SPI, and possibly other protocols over UAVCAN. This way you could plug UAVCAN v1 as a backend for one of your I2C/SPI sensor drivers, which I suppose should be relatively easy to do.
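To show what such a tunnel might involve, here is a sketch of framing a single I2C transaction as a byte payload. This message layout is entirely invented for illustration; no such data type exists in DS-015 today:

```python
# Hypothetical framing of one I2C transaction for tunneling over UAVCAN.
# Layout (invented): [7-bit address][read length][write length][write bytes...]
import struct

def encode_i2c_transfer(addr7: int, write: bytes, read_len: int) -> bytes:
    assert 0 <= addr7 < 128, "I2C addresses are 7-bit"
    return struct.pack("<BBB", addr7, read_len, len(write)) + write

def decode_i2c_transfer(frame: bytes):
    addr7, read_len, wlen = struct.unpack_from("<BBB", frame)
    return addr7, frame[3:3 + wlen], read_len

# Example: write register 0x00, then read 4 bytes from device 0x28.
frame = encode_i2c_transfer(0x28, b"\x00", read_len=4)
assert decode_i2c_transfer(frame) == (0x28, b"\x00", 4)
```

A real profile would also need to carry completion status and the read-back data in the response direction, but the round-trip above captures the basic idea.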

Here is a call to action for you: please define a list of data objects that the airspeed sensor node has to publish and subscribe to. Then we will work together to make this design aligned with the DS-015 type system. It won’t be idiomatic DS-015, but at least it will rest on the same foundation using the same type library, opening the path for future convergence. I would like you to postpone forming premature opinions about the results of this experiment until it is concluded.

Other applications can benefit from idiomatic DS-015

As I wrote in my last email, the UAVCAN Consortium is inclined to fund work on implementing the support for idiomatic DS-015 on the ArduPilot side. The first step might be focused on supporting DS-015 actuators. I would like to gauge your opinion on this and whether you would be open to accepting high-quality contributions to this end. I do not expect this work to burden the core dev team beyond reviewing pull requests.

Meta: about this discussion

I would like to invite everyone to keep the conversation constructive and strictly on-point. Please desist from posting anecdotes and “+1”-style responses that do not add new information. Also, I would kindly ask everyone to avoid sharing opinions about DS-015 or UAVCAN v1 until you have at least a basic understanding of what these are.

We aspire to somewhat higher standards of discourse than some of the newly registered users might be used to. If this seems new, consider reading the FAQ.


Hello @pavel.kirienko, thanks for the lengthy and clear explanation. I’d like to summarize a few things that we might want to prioritize going forward in the community:

UAVCAN v0 versus v1

There is a lot of software (and, more importantly, hardware, as it’s easier to scrap software than hardware) that was designed to run UAVCAN v0. A lot of it does not translate to idiomatic UAVCAN v1/DS-015, but I don’t think we should respond by deprecating all of this hardware and telling people to completely rethink their network architecture overnight.

> There is another alternative that is not mutually exclusive with the above. Your usage of UAVCAN does not really allow you to leverage the architectural advantages it offers — you are treating it as a point-to-point, star topology network (I am talking about the application layer topology here) to mimic I2C/SPI. In this case, we could also consider defining an additional profile next to reg.drone specifically for tunneling I2C, SPI, and possibly other protocols over UAVCAN. This way you could plug UAVCAN v1 as a backend for one of your I2C/SPI sensor drivers, which I suppose should be relatively easy to do.

Perhaps one layer higher than a raw protocol tunnel would be preferable as it would probably reduce bandwidth waste and allow the node to clean up some of the useless hardware-specific implementation details, but I agree that having a lower-level-than-DS-015 solution alongside DS-015 will help especially with the migration process from UAVCAN v0 idioms that I mentioned in the previous point.

Another consideration is that despite UAVCAN v1 being designed as a facilitator for application-level, abstract distributed computing, there are many scenarios where people are simply looking to use the physical CAN transport to solve physical-layer deficiencies of I2C/SPI/UART. I personally think that the UAVCAN transport layer is flexible enough to be applied to at least a subset of these use cases, so I think developing an application-level spec, or at least idioms that enable this, is worthwhile. Part of this confusion comes from the legacy name of “CAN” inside “UAVCAN,” but that’s something we have to live with. There are also specific situations where the costs of implementing distributed computing (one possible example being an air_data_computer) outweigh the benefits, which I think also presents the need for a low-level spec.

APCAN

> I understand that what you are looking for at this moment has little to do with idiomatic DS-015. While I am confident that sooner or later you will acknowledge the benefits of distributed architecture outside of the most trivial scenarios, at the moment you need a simpler solution. To this end, @dagar has suggested a middle-ground solution that is mostly valid and can be implemented without causing undue fragmentation of the ecosystem.

I agree with @dagar’s points; DS-015 could do with some improvements. While I agree with the architectural goals of DS-015, the standard neglects some realities of the current state of vehicular systems, and not all of these can simply be brushed under the rug in the name of “better architecture.” A compromise can be made, however: for instance, a lower-level message set that accompanies the higher-level one, or making the covariance types optional.

> One such standard may be crafted by the ArduPilot team independently from anyone else (although I would gladly offer my best advice if they welcome it); we will be referring to this hypothetical entity as “APCAN”.

I don’t believe that developing another high-level spec is beneficial to the community. It encourages the “fork when you disagree instead of improving” mindset, leads to software and hardware fragmentation, and further widens the architectural gap between the ArduPilot community and the PX4/Dronecode community, which ultimately hurts users of both platforms.

PnP

> While plug-and-play is, generally, in the scope of DS-015, one should not expect it to be as extensive as in v0. Since we are talking about a distributed system, expecting it to be entirely plug-and-play is akin to expecting the autopilot firmware to write itself.

I think this is one point on which I disagree with @pavel.kirienko. I agree that it is very ambitious, and perhaps impossible, to make a 100% plug-and-play system while upholding the current design principles of UAVCAN v1. However, I don’t think enough emphasis is being placed on PnP. As it currently stands, UAVCAN v1 is only really positioned to be leveraged by an experienced vehicular system/network designer. I think UAVCAN has a bigger scope than that, and continuing with this mindset would significantly stunt adoption. Users of both PX4 and ArduPilot have enjoyed a very plug-and-play-friendly architecture for many years now (being able to grab both supported and community-maintained flight controller boards, wire up sensors and peripherals to the ports, and get most equipment to “just work” with relatively little manual configuration), which is one of the reasons someone can get involved with building autonomous vehicles without knowing the intricacies of how everything works. If, suddenly, every actuator, battery management system, or external smart ADC/GNSS/etc. required hours of reading its documentation to get a system up and running, there would be much less incentive to use it.

I think the spec and implementation of PnP should stay independent of both DS-015 and “APCAN”, if that comes to be, and I’m interested in pursuing this further.

Usage semantics

I would appreciate it if @pavel.kirienko could improve the documentation regarding how message data definitions (DSDL) differ from the semantics of how a subject is used. I hope that at least some of this can be automated with PnP in the future, but using message data type names to represent the use of a subject, while perhaps the most obvious solution, is not very flexible.

P.S:

> Then also run the DS-015 servo demo to see the embedded side of the same problem.

@pavel.kirienko This link seems to be broken?


Sure, but observe that many hardware products can support DS-015 by means of a mere software update. This should be the preferred way forward. Where this is impractical, non-idiomatic DS-015 (as suggested by Daniel) remains available as the alternative.


This is what the non-idiomatic DS-015 option is about though — if you must carry raw measurements, you can use the data types defined by DS-015 (or even UAVCAN v1 itself, see uavcan.si) directly. I hope we can construct an example airspeed sensor node with Andrew to demonstrate that this approach is viable.


I think you are overestimating the amount of manual configuration required to get a piece of basic hardware to work. If you open the servo demo (the link is valid btw, you just need to accept the invitation I just shared), you will see that connecting it to another system (which could be an autopilot or your laptop) only takes assigning a few numbers. This is comparable to configuring a slightly unconventional GNSS unit.

Regardless, we can resurrect the existing limited auto-configuration proposal that I mentioned, I just don’t think it should be a priority right now because we don’t have anything to apply it to.


Section “Semantic segregation” of the Guide is dedicated to this problem. It should help one understand the core idea: view subjects as objects (class instances) in an OOP program, and DSDL types as classes.

You define a class to model a concept, then you apply it to a concrete problem by creating an instance. The same goes for DSDL data types and subjects: you have a data type that models the kinematic state of a body; then you create a subject of this type to model the kinematic state of a particular mechanism on your robot.

“Using message data type names to represent the use of a subject” is valid in a very limited set of scenarios. In OOP these are called singletons.
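The class/instance analogy can be sketched in a few lines of Python. The names here are illustrative, not any actual library API:

```python
# Subjects-as-instances analogy: a DSDL type plays the role of a class,
# while a subject is an "instance" bound to a concrete role by the integrator.
from dataclasses import dataclass

@dataclass(frozen=True)
class PressureScalar:  # analogous to uavcan.si.unit.pressure.Scalar.1.0
    pascal: float

@dataclass(frozen=True)
class Subject:  # a (subject-ID, data type) pairing chosen at integration time
    subject_id: int
    dtype: type

# One type, many subjects -- just like one class, many instances:
pitot_dp = Subject(subject_id=1234, dtype=PressureScalar)
cabin_pressure = Subject(subject_id=1300, dtype=PressureScalar)

assert pitot_dp.dtype is cabin_pressure.dtype         # same "class"
assert pitot_dp.subject_id != cabin_pressure.subject_id  # different "instances"
```

Binding the meaning to the data type name alone would be like insisting that every class in a program has exactly one instance.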


Thesis: the direction DS-015 is going in is fundamentally misaligned with ArduPilot’s expectations of UAVCAN. Alternatively, the ArduPilot community is “misusing” UAVCAN to create sensor networks.

Proposal: Not deprecating UAVCAN v0


Having read the sources recommended by @pavel.kirienko, I agree that the requirements proposed by @tridge are incompatible with current DS-015. At the risk of adding a “+1 type post” to the discussion, I also think that ensuring interoperation of v0 and v1 in the same networks would be more important to current UAVCAN on ArduPilot users than getting a design-wise better, but currently (!) less usable standard.

I think that ArduPilot cannot fully embrace the current goals of UAVCAN. The very first design goal is the first incompatibility that, in my opinion, is not really feasible to implement:

> Democratic network — there will be no master node. All nodes in the network will have the same communication rights; there should be no single point of failure.

I think that there must be a master node in the UAV, just as there is a pilot in a manned aircraft. Even if a co-pilot is present, there is no democracy: one senior entity has command, and I believe the same has been true at sea for millennia. The vast majority (if not all) of ArduPilot users run their UAV as a central flight controller connected to a network of sensors and actuators, and for complexity and cost reasons I doubt that will change anytime soon.

It is understandable that the developers of the AP flight stack, acting in the interest of us (AP users), are repeatedly pushing to use UAVCAN as the basis for a star topology with only the central element being “smart”. I agree that a more distributed model of computing has the potential to massively improve the capabilities of autonomous vehicles. But from the user feedback, it appears that a large part of the UAV community is not (yet!) ready to benefit from it.


From the point of view of a UAVCAN developer and maintainer, it certainly would be better to mark v0 as legacy and focus all effort on improving the v1 feature set. There is, however, a major problem with that:

Isn’t data exchange precisely what ArduPilot expects from UAVCAN: a common interface to peripherals, compatible with the CAN physical layer? In line with the “improve, not fork” attitude, AP developers turned to the UAVCAN standard, and after spending considerable development effort they received the features they wanted.

I am afraid that UAVCAN v1 decision-makers dismiss these functions as trivial, but all users do with their UAVs is:

  • connect sensors
  • connect motors and servos
  • set parameters

Even if the current v0 implementation does not offer infinite routes of self-governing improvement and new data types, as long as it provides these services it is really usable for us (UAV operators). The ability to update firmware over the data link? Brilliant, love it, but it wouldn’t be a deal-breaker for me if it weren’t possible. Vendors and developers have to coordinate their message types? It would be great if this were automagically solved by the devices, but as long as the developers figure it out, I’m okay with that. Even though there seems to be a divide between ArduPilot and PX4, thanks to MAVLink it usually “just works”.


To sum up my lengthy, opinion-filled post: judging by the fact that the discussion constantly derails into v0/v1 critique, maybe this is what the Drone Special Interest Group is actually interested in. I am afraid that for my use case, the benefits of v1 are remote, too advanced, and academic.

I would feel cheated upon if I was invited to collaborate on a standard supposedly meant to help me, but all my input was dismissed with “But you see, you’re making your drones wrong. We will not work with you on simple sensor networks.”


Thanks Marek.
I should also explain that I am very familiar with distributed computing. I programmed systems with tens of thousands of CPUs back in the 90s when those types of systems first came out (e.g., the CM2). A large part of my PhD research was in parallel computing. That type of parallel algorithm is different from what Pavel is talking about here (distributed algorithms versus distributed components), but many of the key ideas are in common.
There is a huge difference between a network that enables you to distribute components and one that tries to force you to distribute components. I am perfectly happy for ArduPilot to use distributed computing components. A node that just does state estimation (implementing the ArduPilot 23 state EKF in a CAN node) is something I have thought about doing, as I’ve wanted to create an ArduPilot implementation of “state estimation in a box”, much like VectorNav, InertialLabs etc. That was in mind when I added the “external AHRS” subsystem in ArduPilot recently, and you are likely to see that being an option with AP_Periph in the future.
That type of system where the system designer can choose to distribute components is perfectly achievable with v0. We already have ArduPilot partners flying triple-redundant autopilot systems (3 flight controllers, 2 in standby) and they are doing it with UAVCAN v0.
Trying to force distribution of the algorithms involved in flying a UAV does not make it more robust. In fact it will almost certainly lead to it being more complex and less robust. Enabling a system designer to distribute components where it makes sense for the design of the vehicle can make it more robust, but that must come from an overall understanding of the failure modes and a careful design of the system to minimise those failures.
The vast majority of ArduPilot users are best served with a central flight controller with a set of attached sensors and actuators.
Cheers, Tridge

Hi Marek,

Thank you for the sensible input. It is great that you recognize the potential benefits of the distributed architecture.

I think your point about the distinction between democratic and centralized networks might be a bit off the mark. The core design goal that you mentioned — that the network be democratic — refers to the communication protocol design, rather than the design of the application built on top of said protocol. The objective here is that the network itself does not require centralization, the rest is irrelevant at this level. I think this point is not actually related to the conversation; but, while we’re at it, it’s pertinent to mention that the very same design goal is present in the v0 specification, and in the CAN specification itself (it uses the term “multimaster” to the same effect).

This thread is mostly about v0 vs. DS-015 rather than v0 vs. v1 since we are primarily discussing the application-layer aspects rather than the underlying connectivity. As I wrote earlier, and as previously suggested by @dagar, we can stretch DS-015 towards simpler architectures that involve the exchange of basic, concrete states, as opposed to rich abstractions. I call this “non-idiomatic DS-015”, but a better term is welcome if one comes to mind. The point is that we address the existing requirements using the means provided by DS-015 while avoiding undue fragmentation of the ecosystem. I think we should construct a convincing proof-of-concept to demonstrate that the adoption of that “non-idiomatic DS-015” does not really go against the business objectives of ArduPilot. I need @tridge’s input for this to be sure that we are sufficiently aligned; see the “call to action” in my post above.

Speaking of the business objectives, I don’t think we have seen any evidence yet that the special interest group (SIG) is actually interested in the perpetuation of v0 or its particular style of problem-solving. This is rather an XY problem.

According to my mental model, the users you mentioned don’t want to transfer data or build a sensor network. They don’t want to build a distributed computing system either. Rather, they want their unmanned vehicles to do the job well. In this light, the approaches proposed by v0 and DS-015 should be considered equivalent. Does the user care whether the vehicle computes the airspeed on the flight management unit or on a separate node? When you flip your cellphone from portrait to landscape, do you care or know which part of it was responsible for determining its orientation? Is that the CPU or a dedicated IMU chip?

Leaving aside the purely technical benefits offered by the more advanced architecture (which you have already recognized and which have been covered earlier in this thread), there is also the network effect.

An architecturally cleaner design that is sufficiently abstracted from the specifics of one extremely narrow set of applications (i.e., small drones of a particular configuration) can foster a much larger ecosystem of DS-015-compliant products (hardware and software) and services (design & consulting). Meaning that:

  1. Vendors of said products benefit from a larger target market (not only small drones that run ArduPilot or PX4 but also robotics and other advanced cyber-physical systems).

  2. Users of said products benefit from a larger pool of available options thanks to the expanded market of DS-015-compliant items.

It is my understanding (based on years of engagement with various users of UAVCAN v0) that these benefits are relevant to the SIG, even though a regular user may not immediately recognize it.

Thank you for your explanation, Pavel. I misunderstood the scope of this design goal; indeed, it does not prove the point I was trying to make. I hoped to illustrate that at the application level we have a centralized master-slave design, and I doubt we’ll move away from it.

I share the opinion that it would be better if a communication standard did not require every node to have complex processing in order to cooperate. But requiring uncomplicated data on the network forces the complexity of processing that data onto the nodes. As some other posts here have pointed out, for this specific application that is undesirable.

But it is the central element. The user assumes it is the central element. We still call it the autopilot, like the single person that controls the aircraft. Please don’t force the users of the standard (both developers and end users) to reject this and pretend it is not true. A more abstract, extensible standard will be welcomed by those who use it; I am afraid that for all others it will be a source of confusion.

These users do care about implementation

In a perfect environment the implementation details are hidden from users, but that is rarely the case for us. I have no idea what kind of architecture a drone sold by DJI has, and I will never need to know, because that is a closed, proprietary system. For user experience, the openness of the standard and the wide variety of vendors are a double-edged sword. Things will inevitably be misconfigured, come with wrong defaults, or need to be fixed by a “simple software update” that requires the user to install SLCAN drivers, flip their autopilot to passthrough mode, etc. Learning to do that is not trivial even with just the autopilot, and I fear that I’d need to learn to configure every smart device I buy. Even if UAVCAN provides a unified parameter service, which I do appreciate, there will still be a need to read separate manuals to learn what these parameters do.

I am afraid that there will never be an “in the long run” in this particular case, because people turn to open flight stacks in order to build custom machines with cutting-edge capabilities, using devices that only went on sale the previous year. If they were fine with waiting until a feature is widespread and well developed, they would simply buy a whole COTS drone.

With that in mind, even ignoring current solutions, I believe that keeping the application architecture similar to the physical system is a valid advantage.

The tolerant middle ground

I think the key disagreement happens on the “smart nodes - simple nodes” axis. It seems that participants of this discussion stand at and see from different points on this line. I hope that everyone will look at the following example from the same direction:

In the new approach we find attitude control too coupled to the specifics of the implementation. We define a new service called reg.drone.attitude.Roll. This may be provided by an aileron servo, or an ESC with propeller placed on a relevant arm of the multicopter. Thanks to this new service-oriented approach, the system will be much more composable, as you will be able to swap autopilots between different airframes more easily. The autopilot will no longer need to know if it’s driving a fixed-wing or a multicopter aircraft.

There is a decision to be made about how far we take the service concept. The example is a bit extreme, but this is how I perceive the discussion when coming from the simpler side. I hope it illustrates that this specific application calls for some special treatment.

I appreciate that Pavel recognises that a compromise needs to be made. I don’t think that anyone is trying to persuade you to abandon the long-term goals of UAVCAN, just to stretch the standard to satisfy the needs of its adopters as they perceive them, if you care about adoption at all. Even if the requests seem wrong. It is true that the XY-problem response does apply to me, but I would refrain from suggesting that about the core AP dev team.

2 Likes

A few problems with this:

  • monolithic kernels won, and for very good reason. The complexity of the layers needed to make real microkernels created worse problems than what microkernels tried to solve. That old microkernel vs monolithic kernel debate is long over, and monolithic won. There is a good parallel with your vision for v1 - you’re making the same mistake that Tanenbaum did. The complexity inherent in your vision of smart nodes will make the system, taken as a whole, much more complex and less robust.
  • You seem to be implying that you can’t do hard real-time with sensor nodes feeding a central autopilot. You can. Real-time is completely orthogonal to the topology.

That is a complete cop out. We can and should expect plug and play to be the default for most users just like it is for v0. Saying this is like a firmware writing itself is utter nonsense.

it is a good example, but only of why the DS-015 idiom is an absolutely terrible one. It brings no tangible benefit to our user community, increases complexity and greatly increases fragility of the system. Alignment of firmware versions to get a working system gets a lot harder.

no, it’s not. The air data computer doesn’t have the information available to do it.
I’m not sure if you realise that selection of sensors in a modern autopilot is dynamic not static. The EKF will switch what sensors it uses based on the data, not just based on a configuration. You can’t just assign an ID and think it is equivalent.
We’ve been moving towards more and more dynamic sensor selection in ArduPilot as it is a huge win for robustness. Take a look at the “sensor affinity” documentation for ArduPilot EKF3 as an example:
https://ardupilot.org/plane/docs/common-ek3-affinity-lane-switching.html

this could work for differential pressure, but would not extend nicely to GNSS and other more complex sensors.
The “smart nodes” and “forcing decentralisation” in DS-015 is terminal. We need a completely different message set with a design that meets our requirements.

I have in fact prototyped that, tunnelling i2c and spi, but I abandoned that path as it has awful timing requirements, and results in very fragile sensors, plus uses a lot more bandwidth. So no, we’re not going to do that for commonly used sensors.
We may support it for rarely used sensors, say a radiation counter or similar sensor which is not flight critical, but we’re not going down that path for core sensors.

In my model most airspeed sensors would just publish a pressure, a temperature and an ID to allow for multiple sensors on one node. That’s it.
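For concreteness, such a message fits in nine bytes. The sketch below is purely illustrative: the field names, ordering, and widths are my assumptions, not a proposed DSDL definition from any standard.

```python
import struct

# Hypothetical wire layout for the simple airspeed message described above:
# a one-byte sensor ID plus two float32 fields (differential pressure in
# pascals, temperature in kelvin). Illustrative only, not DSDL.
def pack_airspeed(sensor_id: int, diff_pressure_pa: float, temperature_k: float) -> bytes:
    return struct.pack("<Bff", sensor_id, diff_pressure_pa, temperature_k)

def unpack_airspeed(payload: bytes):
    return struct.unpack("<Bff", payload)

msg = pack_airspeed(0, 123.5, 298.15)
assert len(msg) == 9  # the whole payload is 9 bytes
```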
If you’re referring to a hypothetical airspeed sensor that publishes CAS, then no, I’m not going to do that exercise with you. It is just pointless. If you still have not understood after all my explanations why it is a terrible idea to go down that path for a simple differential pressure sensor then I suspect there is no use discussing it further.

DS-015, in anything like its current form, is not something we want.
If the UAVCAN consortium is OK with the ArduPilot community developing an alternative message set which is published for anyone in the UAV community to use and collaborate on then that would be great. I think you’ll find it will suit vendors, users and autopilot projects a lot better than DS-015 and will give us a real way to move v1 forward.
If not then we’ll look at extending v0 to add FDCAN support, larger frames and the ability to extend messages (similar to how mavlink2 allows extensions). In that framework we’d create a new message set. Those are the features we really want and what will provide real benefit to our users.
Cheers, Tridge

  • There are sometimes valid reasons to choose a lower level of abstraction that mirrors the physical decomposition of the system.
    Among those reasons:
    • The design expertise and complexity tolerance are not distributed evenly in the ecosystem: there is significantly more of both in the AP dev community than in peripheral development. So by Conway’s law (architecture follows organization) it makes sense to split the functionality accordingly.
    • Debug and diagnostic tools favor a more centralized design.
  • All abstractions are leaky when put under sufficient pressure, so it takes very deep domain expertise to choose an abstraction level. It is a complex tradeoff with an optimum point that cannot be driven by one-sided mandates like “move as high as possible”.
  • “thin client” hardware architecture does not equal bad design and “god object” antipattern. There can still be well-designed software on the central node(s) with optimal modularity of architecture.
  • Meta point 1: I think it serves little useful purpose to load an excellent transport layer with opinions on what are essentially technically independent adjacent layers. UAVCAN v1 as the transport layer can support different system architectures (“sensor networks”, “smart nodes”, and everything in between) equally well. It would be much better to let players with “skin in the game” work to converge on the suitable solution while having the best platform possible to work on. It is entirely possible that different groups would converge on different solutions (Ardupilot vs Nuclear reactor manufacturers association). And none would be abstractly better, just more suited to the respective use case.
    • The argument over “air data computer” looks especially redundant in this light: if someone wants to make one, fine. If someone else prefers to make a simple sensor, fine too. Let the market select which one wins (if not both).
    • The approach of @tridge and Ardupilot is not in contradiction to “UAVCANv1 as the transport layer”. Any attempts to couple the transport layer with the system design and specific decomposition/architecture decisions are bound to slow adoption and harm the nascent and still fragile ecosystem.
  • Meta point 2: It would help to foster the spirit of collaboration and good faith if there were less use of terms loaded with negative connotations applied to concepts the writer is opposed to. Examples would include “modern” vs “legacy”. State practical downsides, don’t attack emotionally.
3 Likes

A look at GNSS in DS-015
We’ve spent a lot of time now analyzing airspeed. For a simple float differential pressure it has caused a lot of discussion, but it is time to move on to the true horrors of DS-015. Nothing exemplifies those horrors more than the GNSS message. This will be a long post because it will dive into exactly what is proposed for GNSS, and that is extremely complex.
What it should be
This is the information we actually need from a CAN GNSS sensor:

        uint3 instance
        uint3 status
        uint32 time_week_ms
        uint16 time_week
        int36 latitude
        int36 longitude
        float32 altitude
        float32 ground_speed
        float32 ground_course
        float32 yaw
        uint16 hdop
        uint16 vdop
        uint8 num_sats
        float32 velocity[3]
        float16 speed_accuracy
        float16 horizontal_accuracy
        float16 vertical_accuracy
        float16 yaw_accuracy

It is 56 bytes long, and pretty easy to understand. Note that it includes yaw support, as yaw from GPS is becoming pretty common these days, and in fact is one of the key motivations for users buying the higher-end GNSS modules. When a float value in the above isn’t known, a NaN would be used. For example, if you have a NMEA sensor attached (most of those don’t report vertical velocity), the 3rd element of velocity would be NaN. The same applies to any accuracies you don’t get from the sensor.
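As a sanity check, the field widths above sum to 454 bits, just under 57 bytes when bit-packed, which is roughly the quoted figure. The NaN convention is also straightforward to apply. A quick sketch (Python used purely for the arithmetic):

```python
import math
import struct

# Field widths in bits, copied from the listing above.
widths = [
    3, 3,             # instance, status
    32, 16,           # time_week_ms, time_week
    36, 36,           # latitude, longitude
    32, 32, 32, 32,   # altitude, ground_speed, ground_course, yaw
    16, 16, 8,        # hdop, vdop, num_sats
    32 * 3,           # velocity[3]
    16, 16, 16, 16,   # the four float16 accuracies
]
total_bits = sum(widths)
assert total_bits == 454  # just under 57 bytes when bit-packed

# NaN survives a float32 round trip, so a receiver can reliably test
# each field with isnan() before fusing it.
wire = struct.pack("<f", float("nan"))
assert math.isnan(struct.unpack("<f", wire)[0])
```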
I should note that the above structure is a big improvement over the one in UAVCAN v0, which requires a bunch of separate messages to achieve the same thing.
GNSS in DS-015
Now let’s look at what GNSS would be like using current DS-015. Following the idiom of UAVCAN v1 the GNSS message is a very deeply nested set of types. It took me well over an hour to work out what is actually in the message set for GNSS as the nesting goes so deep.
To give you a foretaste though, to get the same information as the 56 byte message above you need 243 bytes in DS-015, and even then it is missing some key information.
How does it manage to expand such a simple set of data to 243 bytes? Here are some highlights:

  • there are 55 covariance floats. wow
  • there are 6 timestamps. Some of the timestamps have timestamps on them! Some of the timestamps even have a variance on them.
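For reference, the 243-byte total is simply the sum of the per-message sizes worked out in the remainder of this post:

```python
# The 243-byte figure is the sum of the message sizes detailed below:
# point_kinematics (74), time (21), heartbeat (25), sensor_status (12),
# plus the kinematics message needed for yaw (111).
sizes = [74, 21, 25, 12, 111]
assert sum(sizes) == 243
```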

I’m sure you must be skeptical by now, so I’ll go into it in detail. I’ll start from the top level and work down to the deepest part of the tree of types

Top Level
The top level of GNSS is this:

#   point_kinematics            reg.drone.physics.kinematics.geodetic.PointStateVarTs   1...100
#   time                        reg.drone.service.gnss.Time                             1...10
#   heartbeat                   reg.drone.service.gnss.Heartbeat                        ~1
#   sensor_status               reg.drone.service.sensor.Status

The “1…100” is the range of update rates. This is where we hit the first snag: it presumes you’ll be sending point_kinematics fast (typically 5Hz or 10Hz) and the other messages less often. The problem is that you then don’t get critical information the autopilot needs on each update, so you could be fusing data from point_kinematics while the sensor status is saying “I have no lock”. The separation of the time from the kinematics also means you can’t do proper jitter correction for transport timing delays.
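To illustrate the jitter-correction point: when a sample carries the sensor’s own timestamp, the receiver can estimate the fixed part of the transport delay and strip out bus-induced jitter. A minimal sketch of the technique (illustrative only, not DS-015 or ArduPilot code); this is exactly what becomes hard when the time information travels in a different message from the data:

```python
# Sketch: remove transport jitter from arrival times using a per-sample
# sensor timestamp. The smallest observed (arrival - sensor) difference
# approximates the clock offset plus the minimum transport delay.
def dejitter(sensor_ts, arrival_ts):
    base = min(a - s for s, a in zip(sensor_ts, arrival_ts))
    return [s + base for s in sensor_ts]

# Samples sent every 100 ms but arriving with 3..9 ms of bus-induced jitter:
sensor = [0, 100, 200, 300]
arrival = [9, 105, 203, 307]
assert dejitter(sensor, arrival) == [3, 103, 203, 303]  # uniform spacing restored
```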

point_kinematics - 74 bytes
The first item in GNSS is point_kinematics. It is the following:

reg.drone.physics.kinematics.geodetic.PointStateVarTs:
   74 bytes
   uavcan.time.SynchronizedTimestamp.1.0 timestamp
   PointStateVar.0.1 value

Breaking down the types we find:

uavcan.time.SynchronizedTimestamp.1.0:
   7 bytes
   uint56

Keep note of this SynchronizedTimestamp, we’re going to be seeing it a lot.

PointStateVar.0.1:
   67 bytes
   PointVar.0.1 position
   reg.drone.physics.kinematics.translation.Velocity3Var.0.1 velocity

Looking into PointVar.0.1 we find:

PointVar.0.1 position:
   36 bytes
   Point.0.1 value
   float16[6] covariance_urt

and there we have our first covariances. I’d guess most developers will just shrug their shoulders and fill in zero for those 6 float16 values. The chances that everyone treats them in a consistent and sane fashion are zero.
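For readers unfamiliar with the notation: `covariance_urt` appears to be the upper-right triangle of a symmetric 3×3 matrix. A sketch of the packing, assuming row-major upper-triangle order (my reading, not a confirmed DS-015 rule):

```python
# Pack/unpack a symmetric 3x3 covariance as its 6 upper-right-triangle
# elements, in assumed row-major order: (0,0) (0,1) (0,2) (1,1) (1,2) (2,2).
def pack_urt(cov):
    return [cov[i][j] for i in range(3) for j in range(i, 3)]

def unpack_urt(urt):
    cov = [[0.0] * 3 for _ in range(3)]
    k = 0
    for i in range(3):
        for j in range(i, 3):
            cov[i][j] = cov[j][i] = urt[k]
            k += 1
    return cov

# A sensor with no covariance estimate would, as predicted, just send zeros:
assert pack_urt([[0.0] * 3 for _ in range(3)]) == [0.0] * 6
```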
Ok, so now we need to parse Point.0.1:

Point.0.1:
   24 bytes
   float64 latitude   # [radian]
   float64 longitude  # [radian]
   uavcan.si.unit.length.WideScalar.1.0 altitude

there at last we have the latitude/longitude. Instead of 36 bits for UAVCAN v0 (which gives mm accuracy) we’re using float64, which allows us to get well below the atomic scale. Not a great use of bandwidth.
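The resolution arithmetic is easy to verify (rough numbers, assuming a mean Earth radius of about 6371 km, and assuming the 36-bit fixed-point encoding maps to a full circle):

```python
import math

EARTH_RADIUS_M = 6_371_000  # rough mean Earth radius

# float64 latitude in radians: resolution is bounded by the ulp near pi/2.
float64_res_m = math.ulp(math.pi / 2) * EARTH_RADIUS_M

# A 36-bit integer spanning a full circle (one plausible reading of the
# v0-style fixed-point encoding referred to above).
int36_res_m = (2 * math.pi / 2**36) * EARTH_RADIUS_M

assert float64_res_m < 2e-9       # nanometre scale
assert 5e-4 < int36_res_m < 7e-4  # a bit over half a millimetre
```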
What about altitude? That is a WideScalar:

uavcan.si.unit.length.WideScalar.1.0:
   8 bytes, float64

yep, another float64. So atomic scale vertically too.
Back up to our velocity variable (from PointStateVar.0.1) we see:

reg.drone.physics.kinematics.translation.Velocity3Var.0.1:
  31 bytes
  uavcan.si.sample.velocity.Vector3.1.0 value
  float16[6] covariance_urt

so, another 6 covariance values. More confusion, more rubbish consuming the scant network resources.
Looking inside the actual velocity value we see:

uavcan.si.sample.velocity.Vector3.1.0:
  19 bytes
  uavcan.time.SynchronizedTimestamp.1.0
  float32[3] velocity

there is our old friend SynchronizedTimestamp again, consuming another useless 7 bytes.
Now we get to the Time message in GNSS:

time: reg.drone.service.gnss.Time
  21 bytes
  reg.drone.physics.time.TAI64VarTs.0.1 value
  uavcan.time.TAIInfo.0.1 info

Diving deeper we see:

reg.drone.physics.time.TAI64VarTs.0.1:
  19 bytes
  uavcan.time.SynchronizedTimestamp.1.0 timestamp
  TAI64Var.0.1 value

yes, another SynchronizedTimestamp! And what is this timestamp timestamping? A timestamp. You’ve got to see the funny side of this.
Looking into TAI64Var.0.1 we see:

TAI64Var.0.1:
  12 bytes
  TAI64.0.1 value
  float32 error_variance

so there we have it. A timestamped timestamp with a 32 bit variance. What the heck does that even mean?
Completing the timestamp type tree we have:

TAI64.0.1:
  8 bytes
  int64 tai64n

so finally we have the 64 bit time. It still hasn’t given me the timestamp that I actually want though. I want the iTOW. That value in milliseconds tells me about the actual GNSS fix epochs. Tracking that timestamp in its multiples of 100ms or 200ms is what really gives you the time info you want from a GNSS. Can I get it from the huge tree of timestamps in DS-015? Maybe. I’m not sure yet if it’s possible.
Now on to the heartbeat. This is where we finally know what the status is. Note that the GNSS top level docs suggest this is sent at 1Hz. There is no way we can do that, as it contains information we need before we can fuse the other data into the EKF.
A heartbeat is a reg.drone.service.gnss.Heartbeat

reg.drone.service.gnss.Heartbeat:
  25 bytes
  reg.drone.service.common.Heartbeat.0.1 heartbeat
  Sources.0.1 sources
  DilutionOfPrecision.0.1 dop
  uint8 num_visible_satellites
  uint8 num_used_satellites
  bool fix
  bool rtk_fix

here we finally find out the fix status. But we can’t differentiate between 2D, 3D, 3D+SBAS, RTK-Float and RTK-Fixed, which are all distinct levels of fix and are critical for users and log analysis. Instead we get just 2 bits (presumably to keep the protocol compact?).
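The arithmetic of that complaint: the five fix grades listed above need at least three bits, while two booleans give only four combinations, one of which (rtk_fix without fix) is presumably not even a valid state:

```python
from itertools import product

# The fix grades ArduPilot distinguishes in logs, per the text above.
fix_grades = ["2D", "3D", "3D+SBAS", "RTK-Float", "RTK-Fixed"]

# Everything the two booleans (fix, rtk_fix) can express.
expressible = list(product([False, True], repeat=2))

assert len(fix_grades) == 5
assert len(expressible) == 4  # one short, even before discarding invalid states
```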
We do however get both the number of used and number of visible satellites. That is somewhat handy, but is a bit tricky as “used” has multiple meanings in the GNSS world.
Looking deeper we have:

reg.drone.service.common.Heartbeat.0.1:
  2 bytes
  Readiness.0.1 readiness
  uavcan.node.Health.1.0 health

which is made up of:

Readiness.0.1:
  1 byte
  truncated uint2 value
uavcan.node.Health.1.0:
  1 byte
  uint2

these are all sealed btw. If 2 bits ain’t enough then you can’t grow it.
Now on to the sources:

Sources.0.1:
  48 bits, 6 bytes
  bool gps_l1
  bool gps_l2
  bool gps_l5
  bool glonass_l1
  bool glonass_l2
  bool glonass_l3
  bool galileo_e1
  bool galileo_e5a
  bool galileo_e5b
  bool galileo_e6
  bool beidou_b1
  bool beidou_b2
  void5
  bool sbas
  bool gbas
  bool rtk_base
  void3
  bool imu
  bool visual_odometry
  bool dead_reckoning
  bool uwb
  void4
  bool magnetic_compass
  bool gyro_compass
  bool other_compass
  void14

so we have lots of bits (48 of them) telling us exactly which satellite signals we’re receiving, but no information on what type of RTK fix we have.
What are “imu”, “visual_odometry”, “dead_reckoning” and “uwb” doing in there? Does someone really imagine you’ll be encoding your UWB sensors on UAVCAN using this GNSS service? Why??
Diving deeper we have the DOPs:

DilutionOfPrecision.0.1:
  14 bytes
  float16 geometric
  float16 position
  float16 horizontal
  float16 vertical
  float16 time
  float16 northing
  float16 easting

that is far more DOP values than we need. The DOPs are mostly there to keep users who are used to them happy; they want one, or at most two, values. We don’t do fusion with these as they are not a measure of accuracy. Sending seven of them is silly.
Now onto sensor_status:

sensor_status: reg.drone.service.sensor.Status

reg.drone.service.sensor.Status:
  12 bytes
  uavcan.si.unit.duration.Scalar.1.0 data_validity_period
  uint32 error_count
  uavcan.si.unit.temperature.Scalar.1.0 sensor_temperature

yep, we have the temperature of the GNSS in there, along with an “error_count”. What sort of error? I have no idea. The doc says it is implementation-dependent.
The types in the above are:

uavcan.si.unit.duration.Scalar.1.0:
  4 bytes
  float32

uavcan.si.unit.temperature.Scalar.1.0:
  4 bytes
  float32

quite what you are supposed to do with the “data_validity_period” from a GNSS I have no idea.
Ok, we’re done with what is needed for a GNSS that doesn’t do yaw, but as I mentioned, yaw from GNSS is one of the killer features attracting users to new products, so how would that be handled?
We get this:

# Sensors that are able to estimate orientation (e.g., those equipped with IMU, VIO, multi-antenna RTK, etc.)
# should also publish the following in addition to the above:
#
#   PUBLISHED SUBJECT NAME      SUBJECT TYPE                                            TYP. RATE [Hz]
#   kinematics                  reg.drone.physics.kinematics.geodetic.StateVarTs        1...100

so, our GNSS doing moving baseline RTK for yaw needs to publish reg.drone.physics.kinematics.geodetic.StateVarTs, presumably at the same rate as the above. For ArduPilot we fuse the GPS yaw in the same measurement step as the GPS position and velocity, so we’d like it at the same rate. We could split that out to a separate fusion step, but given yaw is just a single float, why not send it at the same time?
Well, we could, but in DS-015 it takes us 111 bytes to send that yaw. Hold onto your hat while I dive deep into how it is encoded.

reg.drone.physics.kinematics.geodetic.StateVarTs:
  111 bytes
  uavcan.time.SynchronizedTimestamp.1.0 timestamp
  StateVar.0.1 value

another SynchronizedTimestamp. Why? Because more timestamps is good timestamps I expect.
Now into the value:

StateVar.0.1:
  104 bytes
  PoseVar.0.1 pose
  reg.drone.physics.kinematics.cartesian.TwistVar.0.1 twist

yep, our yaw gets encoded as a pose and a twist. I’ll give you all of that in one big lump now, just so I’m not spending all day writing this post. Take a deep breath:

StateVar.0.1:
  104 bytes
  PoseVar.0.1 pose
  reg.drone.physics.kinematics.cartesian.TwistVar.0.1 twist

reg.drone.physics.kinematics.cartesian.TwistVar.0.1:
  66 bytes
  Twist.0.1 value
  float16[21] covariance_urt

Twist.0.1:
  24 bytes
  uavcan.si.unit.velocity.Vector3.1.0 linear
  uavcan.si.unit.angular_velocity.Vector3.1.0 angular

PoseVar.0.1:
  82 bytes
  Pose.0.1 value
  float16[21] covariance_urt

Pose.0.1:
  40 bytes
  Point.0.1                           position
  uavcan.si.unit.angle.Quaternion.1.0 orientation

uavcan.si.unit.angular_velocity.Vector3.1.0:
  12 bytes
  float32[3]

uavcan.si.unit.velocity.Vector3.1.0:
  12 bytes
  float32[3]

Point.0.1:
  24 bytes
  float64 latitude   # [radian]
  float64 longitude  # [radian]
  uavcan.si.unit.length.WideScalar.1.0 altitude

uavcan.si.unit.angle.Quaternion.1.0:
  16 bytes
  float32[4]

phew! That simple yaw has cost us 111 bytes, including 42 covariance variables, some linear and angular velocities, our latitude and longitude (again!!) and even a 2nd copy of our altitude, all precise enough for quantum physics. Then finally the yaw itself is encoded as a 16 byte quaternion, just to make it maximally inconvenient.
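For reference, extracting the yaw back out of that quaternion takes the standard conversion (ordinary quaternion math, nothing DS-015-specific); a pure-yaw rotation leaves two of the quaternion’s three rotational degrees of freedom unused:

```python
import math

def yaw_from_quaternion(w, x, y, z):
    # Standard ZYX-convention yaw extraction from a unit quaternion.
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

def quaternion_from_yaw(yaw):
    # A pure-yaw rotation: 16 bytes on the wire to carry what a single
    # float could express directly.
    return (math.cos(yaw / 2.0), 0.0, 0.0, math.sin(yaw / 2.0))

q = quaternion_from_yaw(math.radians(90.0))
assert abs(yaw_from_quaternion(*q) - math.radians(90.0)) < 1e-6
```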
Conclusion
If you’ve managed to get this far then congratulations. If someone would like to check my work then please do. Diving through the standard to work out what actually goes into a service is a tricky task in UAVCAN v1, and it is very possible I’ve missed a turn or two.
The overall message should be pretty clear however. The idiom of DS-015 (and to a pretty large degree UAVCAN v1) is “abstraction to ridiculous degrees”. It manages to encode a simple 56 byte structure into a 243 byte monster, spread across multiple messages, with piles of duplication.
We’re already running low on bandwidth with v0 at 1MBit. When we switch to v1 we will be stuck at 1MBit for quite a long time, as there will be some node on the bus that can’t do higher rates. So keeping the message set compact is essential. Even when the day comes that everyone has FDCAN, the proposed DS-015 GNSS will swallow up most of that new bandwidth. It will also swallow a huge pile of flash, as the structure of DS-015 and of v1 means massively deep nesting of parsing functions. So expect the expansion in bandwidth to come with an equally large (or perhaps larger?) expansion in flash cost.
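A rough order-of-magnitude check of the bus-load concern, for a single GNSS unit at 10 Hz on a 1 Mbit/s classic CAN bus. The framing overhead here is approximated (real figures vary with bit stuffing), and redundant sensors plus the rest of the message set multiply the load:

```python
# Very rough bus-load estimate. Assumes ~7 bytes of multi-frame payload
# per classic CAN frame and roughly 64 bits of framing overhead per frame;
# treat the results as order-of-magnitude figures only.
def rough_bus_load(payload_bytes, rate_hz, bitrate=1_000_000):
    frames = -(-payload_bytes // 7)          # ceiling division
    bits_per_update = frames * (7 * 8 + 64)  # payload + framing overhead
    return bits_per_update * rate_hz / bitrate

load_56 = rough_bus_load(56, 10)    # about 1% of the bus
load_243 = rough_bus_load(243, 10)  # about 4% of the bus, per GNSS unit
assert load_243 > 4 * load_56
```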
The DS-015 “standard” should be burnt. It is an abomination.

3 Likes

I’m trying to stay somewhat neutral in this, but the amount of extra data on the bus is unacceptable. As someone leading a project to deliver a UAVCAN-based GNSS to the market in the $2000 USD price range, and as someone driving adoption of UAVCAN v1 onto our actuators: fix this, Pavel. The time for debate has passed; collaborate, or we’re going to consider leaving.

Our hardware will be following Tridge’s recommendation here. Why put in the effort to move to FDCAN just to throw it away on wasted data?

The experimental, continued reinventing of UAVCAN needs to stop! Improvements are fine, but this is just going backwards @pavel.kirienko

This topic has gotten far out of hand. The hostility and unwillingness to collaborate is not productive.

As someone with some stake in the UAVCAN v1 / DS-015 game, I decided I should step in to try to make this discussion more constructive - because this is a totally solvable problem, and the rancor here is hiding how simple I think it could be to solve.

@tridge and others - you seem to be assuming that as soon as DS-015 was “released”, it was carved in stone and no longer subject to any further modifications. This is absolutely not the case. If you had been present in the discussions surrounding DS-015, I think you would have very different context on it. I wasn’t too actively involved in most of the discussions, but I did follow along for many of them, and my main takeaway was that the decision in the end was along the lines of “we don’t know how this is going to work until we try it, so let’s just release something to get the ball rolling and iterate from experience”. At no point, from my perspective, did I hear the opinion that the current form of DS-015 was expected to be the end-all be-all of UAVCAN drone communications.

So please, instead of foaming at the mouth at how bad DS-015 is, let’s be reasonable and start working together on improving the situation. To @tridge’s point, the GNSS service, in its current form, is rather abysmal. In hindsight, however, I’m not convinced anyone had previously bothered to go through the details of looking at the full size of the service as defined - it was designed at the level of the abstract types, and that’s as far as it went. So now is the time to iterate on that.

For additional context (at least from my tangentially-involved perspective; apologies if I’m putting words in anyone’s mouths), the goal was to release something just so we could stop debating the details and go try to create a real proof of concept implementation, because none of this matters if it doesn’t get built into a real system. That’s the state that PX4 is currently working towards - the UAVCAN v1 infrastructure is still being implemented, with the composable services of DS-015 being used to guide what that implementation looks like (specifically, driving us towards dynamic, reconfigurable Publisher/Subscriber objects from
which to build DS-015-like services). Most of the work is actually being done by @PetervdPerk who has (to my knowledge) really only focused on the Battery service, while I have focused on the
ESC and Servo services. We haven’t gotten to a full GNSS service, or any others for that matter.

So this is my roundabout way of saying yes, I agree that DS-015 needs to change, and I don’t think you’ll find any vocal arguments to the contrary. But before it’s completely overhauled, we need to find the issues and propose alternatives. The way in which the PX4 community has been doing that is by just going out and building it, which I think is far more constructive than any number of critical forum posts.

So in the spirit of collaboration and revising DS-015, here’s a rough proposal for a revised
GNSS service, for the sake of discussion (based on @tridge’s recommendation):

# Revised GNSS Service:
reg.drone.service.gnss.TimeWeek.0.1 gnss_time_week # Proposed new topic
  6 bytes
  + uint32 time_week_ms
  + uint16 time_week
reg.drone.physics.kinematics.Point.0.1
  24 bytes
  + float64 latitude
  + float64 longitude
  + uavcan.si.unit.length.WideScalar
    + float64
uavcan.si.unit.velocity.Vector3.0.1
  12 bytes
  + float32[3]
reg.drone.service.gnss.Status.0.2 dop # Proposed new topic -- placeholder name
  6 bytes
  + uint16 hdop
  + uint16 vdop
  + uint8 num_sats
  + uint3 fix_type

# Optional Topics
reg.drone.service.gnss.Accuracy.0.1 # Proposed new topic
  8 bytes
  + float16 speed_accuracy
  + float16 horizontal_accuracy
  + float16 vertical_accuracy
  + float16 yaw_accuracy
reg.drone.service.common.GroundTrack.0.0 # (Very rough) Proposed new topic
  12 bytes
  + uavcan.si.unit.velocity.Scalar.1.0 ground_speed
    + float32 meter_per_second
  + uavcan.si.unit.velocity.Scalar.1.0 ground_course
    + float32 radian
  + uavcan.si.unit.velocity.Scalar.1.0 heading
    + float32 radian

Total Bytes:
6 + 24 + 12 + 6 + 8 + 12 = 68 (20 bytes of which are optional).

(Omitted: The standard Heartbeat message that all nodes must publish).

Note that this keeps the UAVCAN v1 mindset of using a composition of basic types to do most
of the work, rather than a single message (like Fix or Fix2) that not everyone agrees on.
DS-015-specific types are added only where the basic UAVCAN types don’t suffice, split up
such that a change to one won’t affect the others, and some of the data can be an optional
part of the service.

This same approach can be taken to all of the services defined by DS-015, and future services
that need to be added to it (e.g. rangefinders, optical flow sensors, IMUs, VectorNav-type
units, …).

Also note that, if we add perhaps just a little more clarification and detail to the port-naming
conventions, we can very easily develop plug & play support around all of these services,
so the hobbyist ArduPilot user can plug & play with new devices to their heart’s content,
while still letting the professional / commercial integrators fine-tune the system to their
own specifications.

Let’s try to remain constructive here, friends, and work towards a better solution!


Differential pressure sensor demo

@tridge It is great that you have moved on to analyze other parts of DS-015, but I would like to reach some sort of conclusion regarding the airspeed sensor node (mind the difference: not an air data computer, so not an idiomatic DS-015) before switching the topic. I proposed that we construct a very simple demo based on your inputs. I did that yesterday; please, do have a look (@scottdixon also suggested that we make the repository public, so it is now public):

I hope this demo will be a sufficient illustration of my proposition that DS-015 can be stretched to accommodate your requirements (at least in this case for now). In fact, as it stands, the demo does not actually leverage any data types from the reg.drone namespace, not that it was an objective to avoid it.

May I suggest that you run it locally using Yakut and poke around it a little? I have to note, though, that Yakut is a rather new tool; if you run into any issues during its installation, please open a ticket, and sorry for any inconvenience. You may notice that the register read/write command is terribly verbose; that's true, and I should improve this experience soon (this is, by the way, the part that can be automated if the aforementioned plug-and-play auto-configuration proposal is implemented).

We can easily construct additional demos as visual aids for this discussion (it only takes an hour or two).

Goals and motivation

I risk repeating myself here, since this topic was covered in the Guide, but please bear with me: I don't want any accidental misunderstanding to poison this conversation further.

My reference to Torvalds vs. Tanenbaum was to illustrate the general scope of the debate, not to recycle the arguments from a different domain. We both know that distributed systems are commonly leveraged in state-of-the-art robotics and ICT. I am not sure one can confidently claim that "distributed systems won" (I am not even sure what that would mean, exactly), but I think we can easily agree that there exists a fair share of applications where they are superior. Avionics is one of them. Robotic systems are another decent example: look at the tremendous ecosystem built around ROS!

It is my aspiration (maybe an overly ambitious one? I guess we’ll see) to enable a similar ecosystem, spanning a large set of closely related domains from avionics to robotics and more, with the help of UAVCAN. It is already being leveraged in multiple domains, although light unmanned aviation remains, by far, the largest adopter (this survey was also heavily affected by a selection bias, so the numbers are only crude approximations):

Requirements for LRUs and software components overlap to a significant extent between many of these domains. It is therefore possible to let engineers and researchers working in any of these fields rely on the advances made in the adjacent areas. I am certain that many business-minded people who are following this conversation will recognize the benefits of this.

UAVCAN v1 is perfectly able to serve as the foundation for such an inter-domain ecosystem, but it will only succeed if we make the first steps right and position it properly from the start. One of the preconditions is that we align its design philosophy with the expectations of modern-day experts, many of whom are well-versed in software engineering. This is not surprising, considering that just about every sufficiently complex automation system being developed today — whether vehicular, robotic, industrial — is software-defined.

The idea of UAVCAN becoming a technology that endorses and propagates flawed design practices like the specimen below keeps me up at night. I take full responsibility for it because an engineer working with UAVCAN v0 simply does not have access to adequate tools to produce good designs.

You might say that a man is always free to shoot himself in the foot, no matter how great the tool is. But as a provider of said tool, it falls to me to do my part in raising the sanity waterline, if only by a notch. Hence the extensive design guide, philosophy, tropes, ideologies, and opinionated best practices. Being a mere human, I might occasionally get carried away and produce overly idealistic proposals, which is why I depend on you and other adopters to keep the hard business objectives in sight. My experience with the PX4 leadership indicates that it is always possible to find a compromise between immediate interests and the long-term benefits of a cleaner architecture by way of sensible and respectful dialogue.

Your last few posts are a bit troubling, as they appear to contain critique directed at your original interpretation of the standard, without taking into account the corrections I introduced over the course of this conversation. Perhaps clarity of expression is not my strong suit. What troubles me most is that you appear to be evaluating DS-015 as a sensor network rather than as what it really is. I hope the demos will make things clearer.

On GNSS service

I think I understand where you are coming from. We had a lengthy conversation a year and a half ago about the deficiencies of the v0 application layer, where we agreed it could be improved; you went on to list the specifics. I suppose I have spent enough time tinkering with v0 and its many applications that I could have produced a naïve design exactly matching your (and many other adopters') expectations. Instead, I made DS-015. Why?

When embarking on any non-trivial project, one starts by asking "what are the business requirements" and "what are the constraints", then applies constrained optimization to approximate the former. With DS-015, the requirements (the publicly visible subset thereof) can be seen here: DS-015 MVP progress tracking. One specific constraint was that it must be possible to deploy a basic DS-015 configuration on a small UAV equipped with a 1 Mbps Classic CAN bus. If the constraint is satisfied and the business requirements are met, the design is acceptable. I suppose it makes sense.

One might instead maximize an arbitrary parameter without regard for other aspects of the design: for example, the bus utilization, data transfer latency, flash space, or how many lines of code one needs to write to bring up a working system. Such blind optimization is reminiscent of games or hobby projects, where the process of optimization is a goal in itself. This is not how engineering works.

At 10 Hz, the example 57-byte message you’ve shown requires 57 bytes * 10 Hz = 570 bytes per second of bandwidth, or 90 Classic CAN frames per second. At 1 Mbps, the resulting worst-case bus load would be about 1%.

At the same 10 Hz, the DS-015 GNSS service along with a separate yaw message requires 156 frames per second, thereby loading the bus by 2%:

The yaw is to be represented using uavcan.si.sample.angle.Scalar. The kinematic state message is intended only for publishers that are able to estimate the full kinematic state, which is not the case in this example.

Is DS-015 less efficient? Yes! It uses roughly twice the bandwidth of your optimized message, or about one extra percentage point of bus load, depending on how you squint. Should you care? I don't think so. At 2% of the bus per service, you would need to run on the order of 50 GNSS receivers before exhausting its throughput. And you can selectively disable subjects that your application doesn't require to conserve more bandwidth (this is built into UAVCAN v1).
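For readers who want to reproduce the frame arithmetic above, here is a back-of-the-envelope sketch. It assumes UAVCAN/CAN over Classic CAN: 7 payload bytes per frame (8 data bytes minus the tail byte), a 2-byte transfer CRC appended to multi-frame transfers only, and a nominal 128 bits on the wire per extended-ID data frame (bit stuffing can add roughly 20% on top, which does not change the conclusion):

```python
import math

PAYLOAD_PER_FRAME = 7   # 8 data bytes minus the tail byte
TRANSFER_CRC_BYTES = 2  # appended to multi-frame transfers only
BITS_PER_FRAME = 128    # nominal 29-bit-ID Classic CAN frame, stuffing ignored

def frames_per_transfer(payload_bytes: int) -> int:
    if payload_bytes <= PAYLOAD_PER_FRAME:
        return 1  # single-frame transfer carries no transfer CRC
    return math.ceil((payload_bytes + TRANSFER_CRC_BYTES) / PAYLOAD_PER_FRAME)

def bus_load(frames_per_second: float, bitrate_bps: float = 1_000_000) -> float:
    return frames_per_second * BITS_PER_FRAME / bitrate_bps

# A 57-byte payload takes 9 frames per transfer; at 10 Hz that is
# 90 frames/s, i.e. on the order of 1% of a 1 Mbps bus:
assert frames_per_transfer(57) == 9
assert 0.01 < bus_load(90) < 0.015

# The 156 frames/s quoted for the DS-015 service comes out at about 2%:
assert round(bus_load(156) * 100) == 2
```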

If you understand the benefits of service-oriented design in general (assuming idealized settings detached from our constraints), you might then see how this service definition is superior to your alternative, while having a negligible cost in terms of bandwidth. I should, however, stop making references to the Guide, where this point is explained sufficiently well.

I should also address your note about double timestamping in reg.drone.physics.time.TAI64VarTs. In robotics, it is commonly required to map various sensor feeds, data transfers, and events onto a shared time system; this enables complex distributed activities. In UAVCAN, we call it "synchronized time". Said time may be chosen arbitrarily as long as all network participants agree on it. In PX4-based systems (maybe this is also true for ArduPilot?), this time is usually the autopilot's own monotonic clock. In more complex systems, like ROS-based ones, it is usually the wall clock. Hence, this message represents the GNSS time in terms of the local distributed system's time, which is actually a rather common design choice.

Timestamping of all sensor feeds also allows you to address the transport latency variations since each sample from time-sensitive sensor feeds comes with an explicit timestamp that is invariant to the transmission latency.
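To make the time-in-terms-of-time idea and the latency-invariance point concrete, here is a minimal sketch (all names are made up for illustration; a real implementation would also model clock drift):

```python
def sample_age(now_synced_s: float, sample_timestamp_s: float) -> float:
    """Age of a sensor sample in the network-synchronized timebase.

    Because the timestamp travels with the sample, the result is unaffected
    by how long the transfer spent queued on the bus.
    """
    return now_synced_s - sample_timestamp_s


def local_to_gnss(local_s: float, pair_local_s: float, pair_gnss_s: float) -> float:
    """Map a local synchronized time onto the GNSS timebase.

    A time-sample message pairs a value (the GNSS time) with the local
    synchronized timestamp at which that value was valid; their offset maps
    one timebase onto the other (clock drift is neglected in this sketch).
    """
    return local_s + (pair_gnss_s - pair_local_s)


# A sample taken at t=100.000 s is exactly 10 ms old at t=100.010 s,
# regardless of how long it took to traverse the bus:
assert abs(sample_age(100.010, 100.000) - 0.010) < 1e-9
```

This is why a "time" message carries a timestamp of its own: the outer timestamp says when, in the local timebase, the inner time value was valid.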

Regarding the extra timestamp in reg.drone.physics.kinematics.translation.Velocity3Var: this is a defect. The nested type should have been uavcan.si.unit.velocity.Vector3 rather than uavcan.si.sample.velocity.Vector3. @coder_kalyan has already volunteered to fix this, thanks Kalyan.

As for the excessive variance states, you simply counted them incorrectly. This is understandable because crawling through the many files in the DS-015 namespace is unergonomic at best. The good news is that @bbworld1 is working to improve this experience (at the time of writing this, data type sizes reported by this tool may be a bit nonsensical, so beware):

https://bbworld1.gitlab.io/uavcan-documentation-example/reg/Namespace.html

It is hard not to notice that your posts are getting a bit agitated. I can relate. But do you not see how a hasty dismissal may have long-lasting negative consequences for the entire ecosystem? People like @proficnc, @joshwelsh, and other prominent members of the community look up to you to choose the correct direction for ArduPilot, and, by extension, for the entire world of light unmanned systems for a decade to come. We don't want this conversation to result in irresponsible decisions, so let us please find a way to communicate more sensibly.

I don’t want to imply that the definitions we have are perfect and you are just reading them wrong. Sorry if it came out this way. I think they are, in fact, far from perfect (which is why the version numbers are v0.1, not v1.0), but the underlying principles are worth building upon.

Should we craft and explore a GNSS demo along with the airspeed one?

Questions

Lastly, I should run through the following questions that appear to warrant special attention.

I implied no such thing. Sorry if I was unclear.

I agree this is useful in many scenarios, but the degree to which you can make the system self-configurable is obviously limited. By way of example, your existing v0 implementation is not fully auto-configurable either, otherwise, there would be no bits like this:

What I am actually suggesting is that we build the implementation gradually. We start with an MVP that takes a bit of dedication to set up correctly. Then we can apply autoconfiguration where necessary to improve the user experience. As the thread I linked explains, said autoconfiguration need not require active collaboration from simple nodes.

Observe that the main product of the GNSS service is the set of kinematic states published over separate subjects. These subjects are completely abstracted from the means of estimating the vehicle’s pose/position. Whether it is UWB, VO, or any related technology, the output is representable using the same basic model.

I approve of your intent to move the conversation onto a more constructive plane. But before we propose any changes, we should first identify how exactly @tridge's business requirements differ, and why. For example, it is not clear why the iTOW is actually necessary, or how it is compatible with GNSS constellations other than GPS; I suspect another case of an XY problem, but maybe we don't have the full information yet (in which case I invite Andrew to share it).
