Generate rich navigable HTML docs using Nunavut

DSDL definitions in their original form are hard for humans to read, which impedes the adoption of UAVCAN and DS-015. Yet DSDL is sufficient for describing the behaviors of a distributed computing system (DCS) without resorting to additional means of documentation (which would run the risk of divergence). It is therefore desirable to make DSDL specifications more approachable for humans without changing the language or the specifications themselves.

To illustrate, suppose that you want to implement the servo network service as defined by the DS-015 standard. You go to the service definition file:

…whereat you see that to fully grasp what’s in there you need to do quite a bit of jumping around the files in the repo that are not even syntax-highlighted. This is a serious obstacle if you are just evaluating whether UAVCAN/DS-015 are the right solutions for you.

We, therefore, need to come up with a better presentation of DSDL definitions. The solution I propose is to define an additional target for Nunavut that yields HTML pages with documentation per DSDL root namespace. But before we get to that, there is one blocker to take care of:

Exposing comments in the AST constructed by PyDSDL

PyDSDL is the DSDL processing front-end used by Nunavut. It accepts a root namespace and yields a well-annotated AST based on that. Currently, PyDSDL discards comments, so we need to change this behavior:

The AST should be extended with two extra entities — composite type documentation and attribute documentation:

# This header comment is the documentation for this composite type.
# It may span an arbitrary number of lines and is terminated by the first non-comment line.
float64[4] foo  # This is an attribute comment for field "foo"
bool bar
# This is an attribute comment for field "bar".
# It spans multiple lines.

# This comment is not attached to anything because it follows a blank line, so it is dropped.
uavcan.primitive.Empty.1.0 baz  # This is for "baz".
# And this one is for "baz", too.
---
# This comment is attached to the response section.
void64 # This comment is for the padding field.
int64 MATH_PI = 4
# This is the best known approximation of Pi.

The composite type documentation is to be exposed via a new property doc: str on pydsdl.CompositeType. A similar property should be added to pydsdl.Attribute.

Comments can be extracted from the source file by adding a new node handler visit_comment() to the internal class pydsdl.parser._ParseTreeProcessor.

The leading # and the space after it (if present) should be removed.
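
To make the attachment rules from the example above concrete, here is a minimal line-based sketch. It is not how PyDSDL parses DSDL (the real implementation would hook into the parse tree as described above); the names strip_marker, attach_docs, and SAMPLE are hypothetical, it handles a single section only (no `---`), and it naively assumes that `#` never appears inside a string literal.

```python
from typing import Dict, List, Optional, Tuple


def strip_marker(comment: str) -> str:
    """Drop the leading '#' and one following space, if present."""
    body = comment[1:] if comment.startswith("#") else comment
    return body[1:] if body.startswith(" ") else body


def attach_docs(source: str) -> Tuple[str, Dict[str, str]]:
    """Return (type_doc, {attribute_statement: doc}) for one section."""
    type_doc: List[str] = []
    attr_docs: Dict[str, List[str]] = {}
    current: Optional[str] = None  # statement that following comments attach to
    in_header = True               # True until the first non-comment line
    for raw in source.splitlines():
        line = raw.strip()
        if not line:               # a blank line detaches whatever follows
            current, in_header = None, False
            continue
        code, sep, comment = line.partition("#")
        code = code.strip()
        text = strip_marker(sep + comment) if sep else None
        if code:                   # attribute line, maybe with a trailing comment
            in_header = False
            current = code
            attr_docs[current] = [text] if text is not None else []
        elif in_header:
            type_doc.append(text)
        elif current is not None:
            attr_docs[current].append(text)
        # else: orphan comment after a blank line; dropped
    return "\n".join(type_doc), {k: "\n".join(v) for k, v in attr_docs.items()}


SAMPLE = "\n".join([
    "# Header line one.",
    "# Header line two.",
    "float64[4] foo  # For foo.",
    "bool bar",
    "# For bar.",
    "# Second line.",
    "",
    "# Orphan.",
    "uavcan.primitive.Empty.1.0 baz  # For baz.",
    "# Also baz.",
])
```

Calling attach_docs(SAMPLE) yields the two header lines as the type documentation, attaches the multi-line comment to "bool bar", and drops the orphan comment that follows the blank line.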

Once this is done, we can proceed to the second part.

Emitting HTML using Nunavut

Proper templates provided, Nunavut can map a DSDL root namespace to a fully-static website (which may be contained in one or several HTML files, perhaps with additional files for styles, scripts, or other resources; in the interest of portability it might be better to bundle everything into one large file). It is important to rely on a web-compatible format because we can’t require the user to download any artifacts to be able to explore DSDL.

The view should be similar to a directory tree. Take the standard root namespace uavcan:

- uavcan
  + diagnostic
  + file
  + internet
  + metatransport
  + node
  + pnp
  + primitive
  + register
  + si
  + time

The user clicks on a namespace and it expands in-place. The same goes for data type definitions; this is important:

- uavcan
  - diagnostic
    + Record.1.0 [fixed subject-ID 8184, extent 300 bytes]
    - Record.1.1 [fixed subject-ID 8184, extent 300 bytes]

        Generic human-readable text message for logging and displaying purposes.
        Generally, it should be published at the lowest priority level.
      + uavcan.time.SynchronizedTimestamp.1.0 timestamp
        Optional timestamp in the network-synchronized time system; zero if undefined.
        The timestamp value conveys the exact moment when the reported event took place.
      + Severity.1.0 severity
        uint8[<256] text
        Message text.
        Normally, messages should be kept as short as possible, especially those of high severity.

    + Severity.1.0
  + file
  + internet
  + metatransport
  + node
  + pnp
  + primitive
  + register
  + si
  + time
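
One way to get this expand-in-place behavior without any scripting is the standard HTML details/summary element. The sketch below is only an illustration under that assumption: render_tree and the nested-dict input format are hypothetical, and a real Nunavut template would walk the PyDSDL output instead.

```python
from html import escape


def render_tree(tree: dict, prefix: str = "") -> str:
    """Render nested namespaces and types as collapsible <details> elements."""
    parts = []
    for name, child in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            body = render_tree(child, path)       # nested namespace
        else:
            body = f"<pre>{escape(child)}</pre>"  # data type contents
        parts.append(
            f'<details id="{escape(path)}">'
            f"<summary>{escape(name)}</summary>{body}</details>"
        )
    return "".join(parts)


page = render_tree(
    {"uavcan": {"diagnostic": {"Record.1.1": "uint8[<256] text"}}}
)
```

Every entry starts collapsed; the browser handles expansion on click, and the id attribute gives each namespace and type an anchor that links can target.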

The text should be syntax-highlighted but it does not need to replicate the source token-by-token (it is not even possible because the AST does not contain the required information). It is easier to re-generate the text by simply invoking __str__() on each attribute and adding the docs around them:

>>> import pydsdl
>>> composites = pydsdl.read_namespace('public_regulated_data_types/uavcan')
>>> str(composites[1].attributes[2])
'saturated uint8[<=112] text'
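
A minimal sketch of "adding the docs around them" follows. AttributeStub is a hypothetical stand-in so that the example runs without PyDSDL; its doc field mirrors the property proposed above, which does not exist in PyDSDL yet. The render_attribute helper and the CSS class name are assumptions as well.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class AttributeStub:
    """Stand-in for pydsdl.Attribute; 'doc' is the proposed, not yet existing, property."""
    text: str
    doc: str = ""

    def __str__(self) -> str:
        return self.text


def render_attribute(attr) -> str:
    """Emit the regenerated attribute text followed by its documentation."""
    lines = [f"<code>{escape(str(attr))}</code>"]
    if attr.doc:
        lines.append(f'<p class="doc">{escape(attr.doc)}</p>')
    return "\n".join(lines)


rendered = render_attribute(
    AttributeStub("saturated uint8[<=112] text", "Message text.")
)
```

Escaping matters here because array capacity expressions such as `<=112` would otherwise be read by the browser as the start of a tag.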

The user may click any attribute inside a composite type and it would expand in-place in the same manner. Another kind of click (with a modifier key like shift+click or using a dedicated button) should take the user directly to the definition of the attribute’s type instead of unfurling it in-place.

Hovering over a field, type, or namespace should display a quick pop-up with its contents and key information such as size, but without doc comments.

PyDSDL provides the offset information per field; it should be displayed next to the field to simplify manual serialization and to keep the user aware of the data footprint.

Many doc comments contain references to other data types. They lack any special formatting but full data type names are sufficiently unique to unambiguously detect them in text as-is. For example:

Notice the reference to reg.drone.physics.kinematics.translation.Velocity1VarTs. The version number is not given, which means that the latest one is implied (v0.1 in this case). Such references should be automatically highlighted as clickable links. There may also be links to namespaces (with or without the trailing .*):

This fragment should take the user to the namespace reg.drone.service.actuator.common.sp.
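
The detection could be sketched as follows: find dotted identifiers in the doc text and link only those that match a known full type name or namespace. The function linkify, the known set, and the `#full.name` anchor scheme are assumptions for illustration; explicit version suffixes are not handled.

```python
import re
from html import escape

# Dotted identifier, optionally followed by ".*" (namespace reference).
_DOTTED = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+(?:\.\*)?"
)


def linkify(doc: str, known: set) -> str:
    """Turn references to known types and namespaces into links."""
    def replace(match):
        token = match.group(0)
        target = token[:-2] if token.endswith(".*") else token
        if target not in known:
            return token  # e.g. "e.g" or a file name; leave as-is
        return f'<a href="#{target}">{token}</a>'

    return _DOTTED.sub(replace, escape(doc, quote=False))


KNOWN = {
    "reg.drone.physics.kinematics.translation.Velocity1VarTs",
    "reg.drone.service.actuator.common.sp",
}
linked = linkify("See reg.drone.service.actuator.common.sp.* for details, e.g. units.", KNOWN)
```

Checking candidates against the set of names that actually exist in the processed namespace is what keeps ordinary dotted text from being linked by accident.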

Because Nunavut is unable to process more than one namespace at once, links to foreign root namespaces would necessarily navigate the user to a different generated site. If the generated site is compressed into a single HTML file, the navigation would be trivial to implement, since we know that an entity like reg.anything can be reached via a URI like reg.html#anything.
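
Under the single-file assumption, the mapping from a full name to a URI is a one-liner. The helper name uri_for is hypothetical:

```python
def uri_for(full_name: str) -> str:
    """Map 'root.rest.of.name' to 'root.html#rest.of.name'."""
    root, _, rest = full_name.partition(".")
    return f"{root}.html#{rest}" if rest else f"{root}.html"


example_uri = uri_for("reg.anything")  # "reg.html#anything"
```

A bare root namespace such as "reg" maps to the page itself, without a fragment.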

There are special data type definitions that are used to document namespaces. They are named _ (a single low line); one is shown above. Such data types need not be shown in the output; instead, their contents should be expanded directly under the corresponding namespace entry.
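
A sketch of how a generator might separate these documentation-only types from the regular ones. The input format, a list of (full name, doc) pairs, and the function name split_namespace_docs are assumptions made for illustration:

```python
from typing import Dict, List, Tuple


def split_namespace_docs(
    types: List[Tuple[str, str]]
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Separate '_' documentation types from the types shown in the output."""
    namespace_docs: Dict[str, str] = {}
    visible: List[Tuple[str, str]] = []
    for full_name, doc in types:
        namespace, _, short_name = full_name.rpartition(".")
        if short_name == "_":
            namespace_docs[namespace] = doc  # shown under the namespace entry
        else:
            visible.append((full_name, doc))
    return namespace_docs, visible


ns_docs, shown = split_namespace_docs([
    ("reg.drone.service.actuator.common._", "Namespace overview."),
    ("reg.drone.service.actuator.common.Status", "Status doc."),
])
```

The template can then print ns_docs[namespace] directly under each namespace heading and iterate over shown for the type entries.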

I think it is sensible to interpret the text of doc comments as Markdown to allow data type developers to construct more appealing documentation. It would require fixing the formatting across the public regulated data types repository but it is no big deal.


@bbworld1 Would you like to work on this? This is very high-priority right now (above Yukon) because it is perceived to be an adoption blocker.

@scottdixon Did I miss anything important?

Sorry for the slightly late response. I am definitely interested in working on this, but I don’t have much time this weekend; I will, however, probably be able to work on it next week.

What are the time constraints on this task?

There is no rigid limit but we should aim to have it deployed by end of April, so in 4 weeks.

Design-wise, I’d model this as a language in Nunavut. By treating this as just another language you shouldn’t have to build anything new in the API or core library and can utilize the language ‘support’ mechanism for the ancillary, static artifacts.

What the language is, specifically, is interesting. Our options (in my view) are:

  • html – Generating html is the easiest path and would require no new dependencies or capabilities by Nunavut. However, it would lead to a lot of duplication if we wanted to support additional documentation formats like PDF, and it might drive Nunavut to reinvent existing and mature documentation translation frameworks like pandoc, which would be distracting and not useful.
  • latex – Generating latex is interesting since this can act as an intermediate format from which PDF or HTML can be generated. The downside is that HTML from latex tends to be … not great.
  • docbook – Using an XML format like docbook as an intermediate would allow good translations to other formats like HTML using pandoc. The downside to this is that Nunavut alone would not be sufficient to generate HTML (i.e. you would need to go through nunavut -> docbook -> html).
  • sphinx – Generating Sphinx-compliant ReStructuredText would provide a pythonic intermediate format that has good html generators with minimal additional Python dependencies but which would allow for translation to PDF and other formats using pandoc.
  • xhtml – Similar to docbook, generating XHTML would output a valid set of HTML documents but only the ugly, structural part of the information. This ugliness could then be transformed (possibly using only css) into something beautiful and human.

These are valid points but if we want really rich previews (think George Soros-rich) with hovers and folds, can we obtain that with docbook or sphinx? LaTeX is an interesting option but any work done with latex is 30% suffering so my preference is to steer clear of it unless one requires really high-quality static output (like we do with the Specification).

I’m unfamiliar with docbook but, being XML and therefore pure structure, I have to imagine it would translate into rich HTML nicely. Another option I didn’t mention in my list is XHTML which we could use to generate sterile, ugly, but structurally correct HTML that can be translated into beautiful tag soups using pandoc (I assume) or perhaps just using advanced CSS and JavaScript frameworks. This needs a front-end expert to validate my assumptions (the last time I created HTML/JavaScript the DOM was just a twinkle in the W3C’s eye).

This seems like something we should build a quick prototype of. Ultimately, my desired requirement is that Nunavut outputs a single, structured, and correct documentation format and that further translations are performed by other tools.

Do we have a target HTML style template we can use in such a prototype?

Can you share practical examples where final output formats other than HTML may be required?

Nope, we’re starting from scratch.

As part of a structured avionics program, Interface Control Documents (ICDs) are often required. These documents may be in many formats including Word, PDF, or XML. It is advantageous to programmatically generate such ICDs instead of maintaining them by hand.

But then again, HTML is exportable into PDF, too.

I have very basic HTML generation working:

Obviously, a lot of it is still missing, but the general idea is there, I think.

As for what generation format to use, I agree that an intermediate format would probably be best, but on the other hand in my view it’s simpler to start with HTML and then go from there to other formats, not the other way around.

Related to the generated view - after experimenting with the nested directory-tree style view, it seems to me that it’s a bit difficult to read and search, as you have to expand each namespace and type definition to see what’s inside. It’s a great view if you know where the message you’re looking for is, but if you’re searching through the docs to find a certain kind of message (e.g. you want to find a message for latitude/longitude of a drone, but don’t know the name) then it becomes difficult to navigate. I propose instead that we split the docs into a tree view and a list of all types within the namespace, a la MAVLink docs:


The tree view then becomes a handy navigation aid, rather than a cumbersome view of all information inside the namespace. Having all the types in a list on the page also makes it much easier to CTRL-F for relevant types, and other types can be linked to on the page. What do you think, @pavel.kirienko?

P.S. Once we have documentation generation working we could possibly tweak Nunaweb to generate and statically host documentation pages. It shouldn’t be a significant load increase (just static pages), and it would make it much easier to host easily accessible online documentation. What do you think of this idea?

A tree view on the left is a sensible idea but it is important to keep the nested structure in the main view as well such that one could expand/collapse nested fields and namespaces there. The MAVLink experience will not work well for UAVCAN because DSDL types tend to be much more complex with multiple levels of nesting whereas MAVLink types are flat.

Let’s just take what you have on the first screenshot and attach the tree view on the left.

Let’s do it. It would be best to automatically regenerate the docs whenever new commits are pushed to the public regulated data types repo. Do you think we could somehow automate this eventually, e.g., via webhooks?