The purpose of this topic is to discuss what to do about frame queuing on Linux for UAVCAN.
The current libuavcan implementation provides a single priority queue in user space that sits in front of SocketCAN. This is an interesting choice, since most CAN devices appear to default to the pfifo_fast qdisc. In this configuration significant latency can be added: a frame passes first through the user-space queue, then through the unprioritized pfifo_fast queues*, and finally through any queues in the peripheral itself (which may also be unprioritized FIFO queues). For broadcast frames the added latency is fairly linear but still significant, and it comes with priority inversions that are not supposed to happen with CAN (for service call latency, see the research on bufferbloat for a more nuanced evaluation). Because of this problem, as I work on libuavcan v1 I am struggling to provide sane defaults for code that is supposed to be consistent between bare-metal and OS integrations. There are a few approaches I can see. Please provide feedback on these options or suggest others I may not be considering:
Do everything in user-space. Require systems to configure themselves properly.
In this option we would provide priority queues as part of the common media layer implementation. Linux systems would need to set the CAN device’s qdisc to noqueue. All platforms would need to ensure that their CAN driver did not employ overly large queues, that these internal queues were prioritized, and that the queues were adequately sized to prevent buffer under-run.
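To make the first option concrete, here is a minimal sketch of what a user-space priority queue in the media layer could look like. It is written in Python for brevity and is not libuavcan code: the names CanFrame and TxPriorityQueue are hypothetical, and it assumes that all frames use the same (extended) ID format, so that a numerically lower CAN ID always means a higher priority. Frames with equal IDs are kept in FIFO order, and the queue is bounded so that it cannot itself become a source of bufferbloat.

```python
import heapq
import itertools
from typing import NamedTuple, Optional


class CanFrame(NamedTuple):
    """Hypothetical frame type used only for this illustration."""
    can_id: int   # arbitration ID; a lower value wins arbitration on the bus
    data: bytes


class TxPriorityQueue:
    """Bounded transmit queue ordered by CAN ID, FIFO among equal IDs."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._heap = []
        # Monotonic counter used as a tie-breaker so that frames with the
        # same CAN ID leave the queue in the order they were pushed.
        self._sequence = itertools.count()

    def push(self, frame: CanFrame) -> bool:
        """Queue a frame; return False (and drop nothing) if the queue is full."""
        if len(self._heap) >= self._capacity:
            return False
        heapq.heappush(self._heap, (frame.can_id, next(self._sequence), frame))
        return True

    def pop(self) -> Optional[CanFrame]:
        """Return the highest-priority frame, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Usage: frames are pushed in arbitrary order and come out by priority.
queue = TxPriorityQueue(capacity=4)
for can_id, payload in ((0x300, b"a"), (0x100, b"b"), (0x200, b"c"), (0x100, b"d")):
    queue.push(CanFrame(can_id, payload))

drained = []
while len(queue) > 0:
    drained.append(queue.pop())
```

A queue like this only preserves priority if nothing behind it reorders or buffers frames, which is why this option also depends on the qdisc and driver queues being configured as described above.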
Require the platform to be optimal.
In this option libuavcan would act naively, assuming that the system’s APIs provide optimal queueing behaviour. System integrators would need to understand how to implement or select proper queues, how to avoid the bufferbloat problem for CAN, and how to prevent buffer under-run at the peripheral level.
Provide software to do queuing in user-space but allow/require a system to configure this in as needed.
This is a hybrid of the two previous options where we implement a naive media layer but provide components and documentation for how to optimally assemble a system integration.
* My assumption is that, because pfifo_fast chooses its bucket based on TOS bits, its behaviour is either undefined or simply not functional when given CAN frames?
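A rough way to reason about the footnote: the tc-pfifo_fast man page describes the band selection as a lookup of the packet's priority value in a 16-entry priomap, with the TOS bits being one way that priority gets set for IP traffic. The sketch below reproduces that lookup with the documented default priomap. The assumption that CAN frames reach the qdisc with the default priority of 0 is mine and has not been checked against the kernel source; if it holds, every frame lands in the same band and pfifo_fast degenerates into a single FIFO regardless of CAN ID.

```python
# Default pfifo_fast priomap as documented in tc-pfifo_fast(8).
# Index = packet priority (0..15), value = band (band 0 is dequeued first).
DEFAULT_PRIOMAP = (1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1)
TC_PRIO_MAX = 15


def pfifo_fast_band(packet_priority: int, priomap=DEFAULT_PRIOMAP) -> int:
    """Return the band a packet with the given priority value is queued in."""
    return priomap[packet_priority & TC_PRIO_MAX]


# ASSUMPTION (unverified): CAN frames carry priority 0 unless the
# application sets a socket priority explicitly.
ASSUMED_CAN_FRAME_PRIORITY = 0

# (can_id, packet priority) pairs; the CAN ID plays no part in band selection.
frames = [(0x300, ASSUMED_CAN_FRAME_PRIORITY),
          (0x100, ASSUMED_CAN_FRAME_PRIORITY),
          (0x200, ASSUMED_CAN_FRAME_PRIORITY)]

bands_used = {pfifo_fast_band(priority) for _, priority in frames}
```

Under that assumption bands_used contains a single band, so a low-priority frame queued first is sent before a high-priority frame queued after it.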
An interesting aside: the bufferbloat community seems to be converging on CoDel as the state of the art for Linux queueing disciplines (specifically fq_codel), and many Linux distros are switching to it as their default. This means a SocketCAN device can get CoDel as its default, which is a problem because CoDel drops some frames as a normal part of the algorithm’s operation. Most of the kernel experts suggest noqueue or pfifo_fast as the best default for CAN devices, but there are bound to be bugs where this is not applied to CAN devices and a system drops CAN frames in the kernel, severely degrading the expected performance of UAVCAN on that system. Because of this, we should be proactive about discussing and documenting SocketCAN queueing for users of libuavcan.
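One thing such documentation could include is a start-up check that warns when a CAN interface has ended up with a dropping qdisc. The sketch below parses the text printed by `tc qdisc show dev <ifname>` and takes the first listed qdisc as the one in effect. The function names, the LOSSY_QDISCS set, and the sample line are illustrative assumptions; the interface name can0 is only an example, and a multi-queue device may list more than one qdisc.

```python
import re
import subprocess
from typing import Optional

# Qdiscs that drop frames as part of normal operation (illustrative set).
LOSSY_QDISCS = {"codel", "fq_codel"}


def first_qdisc_kind(tc_output: str) -> Optional[str]:
    """Return the kind of the first qdisc listed in `tc qdisc show` output."""
    match = re.search(r"^qdisc\s+(\S+)", tc_output, re.MULTILINE)
    return match.group(1) if match else None


def read_qdisc_kind(ifname: str) -> Optional[str]:
    """Query the qdisc of a network interface using the tc command-line tool."""
    result = subprocess.run(
        ["tc", "qdisc", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    )
    return first_qdisc_kind(result.stdout)


def may_drop_frames(kind: Optional[str]) -> bool:
    """True if the qdisc kind is one that drops frames by design."""
    return kind in LOSSY_QDISCS


# Illustrative sample of what tc might print for an fq_codel interface.
SAMPLE_OUTPUT = (
    "qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 "
    "quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn\n"
)
sample_kind = first_qdisc_kind(SAMPLE_OUTPUT)
sample_is_lossy = may_drop_frames(sample_kind)
```

On a real system an integration would call read_qdisc_kind("can0") instead of parsing a sample string, and log a warning or refuse to start when may_drop_frames returns True.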
Patch Series to make pfifo_fast the default for CAN
pfifo_fast on tldp.org
Codel on Bufferbloat.net
K. Nichols, V. Jacobson, “Controlling Queue Delay”, 2012
M. Sojka, R. Lisový, P. Píša, “SocketCAN and queueing disciplines”, 2012
M. Sojka, P. Píša, “Timing Analysis of Linux CAN Drivers”