Heartbeats published by the bootloader time out due to slow mass erase

I am using the yakut file server to test firmware updates. When my node receives ExecuteCommand with the begin-firmware-update command, it erases the flash. On STM32 the CPU is blocked while the flash erase is ongoing, so heartbeats stop and yakut reports the node as offline.
The flash partition has 2 sections of 128 KB each, and erasing it takes almost 5 s. What should I do when this happens?

Normally, an embedded system would boot into the bootloader once it receives an update request, then it would commence the update process asynchronously from the standpoint of the server:

  • Server sends the update request to the updatee.

  • The updatee sends a response with confirmation and launches the bootloader with the required arguments (such as the server’s node-ID and the file name). The server at this point forgets about this updatee as the protocol is mostly stateless — the only requirement for the server is to keep the file available for reading through the network.

  • The updatee’s bootloader does whatever it needs to do without being constrained by any deadlines: it starts and erases the flash. The server doesn’t care because it has already forgotten about this node.

  • When the bootloader is ready, it begins downloading the file by issuing the uavcan.file.Read requests sequentially.

  • Upon completion, the bootloader launches the newly downloaded application (if successful) or reports status=WARNING and sits there waiting for rescue ⛑
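
The download step in the list above can be sketched roughly as follows. This is an illustration in Python rather than bootloader code: `download_image`, `read_chunk`, `make_fake_server`, and the 256-byte chunk size are names and values assumed for the example, with `read_chunk` standing in for one uavcan.file.Read request/response exchange. It also assumes that the end of the file is signalled by a chunk shorter than the maximum size.

```python
from typing import Callable, Optional

CHUNK_SIZE = 256  # assumed maximum payload of one read response


def download_image(
    read_chunk: Callable[[int], Optional[bytes]],
    chunk_size: int = CHUNK_SIZE,
) -> Optional[bytes]:
    """Read the file sequentially from offset 0.

    read_chunk(offset) is a hypothetical stand-in for one Read request;
    it returns the data at that offset, or None on error or timeout.
    """
    image = bytearray()
    while True:
        chunk = read_chunk(len(image))  # next offset = bytes received so far
        if chunk is None:
            return None  # caller would report WARNING and wait for rescue
        image += chunk
        if len(chunk) < chunk_size:  # short (or empty) chunk: end of file
            return bytes(image)


def make_fake_server(file_bytes: bytes, chunk_size: int = CHUNK_SIZE):
    """Build a read_chunk callable that serves file_bytes from memory."""
    def read_chunk(offset: int) -> Optional[bytes]:
        return file_bytes[offset:offset + chunk_size]
    return read_chunk
```

Because the server only answers read requests, nothing here depends on how long the bootloader spent erasing flash before the first call.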

UAVCAN does not require you to follow this protocol, though. Some nodes may implement A/B fast swapping by downloading the new firmware into separate memory while the main firmware remains operational.

In your case, just make sure you send a response before commencing the erase process, and don’t worry about your node disappearing for five seconds, as it shouldn’t cause any drastic consequences in the firmware update scenario (for a mission-critical node it would likely be a problem but not for a bootloader).

Thank you. I have read the code in yakut/cmd/file_server/_cmd.py, line 184:

            if heartbeat.mode.value == heartbeat.mode.SOFTWARE_UPDATE:
                _logger.info("Node %r is in the software update mode already: %r", node_id, heartbeat)
                return

After the firmware update begins, the firmware needs to change its heartbeat mode to heartbeat.mode.SOFTWARE_UPDATE
so that yakut will not retrigger this node.

Why do you need your node to be retriggered if the update is already in progress?

While the node erases its flash, it appears offline.
After the erase finishes, heartbeats resume, and their mode is heartbeat.mode.OPERATIONAL,
so yakut retriggers the node with ExecuteCommand to begin the firmware update.

If the mode is changed to heartbeat.mode.SOFTWARE_UPDATE before erasing the flash, yakut will not retrigger the node.
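
The difference between the two cases can be illustrated with a small model of the server-side check. This is a simplified sketch, not yakut's actual logic: `count_update_requests` is a hypothetical helper, the model assumes the server evaluates a node each time it (re)appears online, and it ignores any other checks the real file server performs. The `Mode` values are meant to mirror uavcan.node.Mode and should be verified against the DSDL definition.

```python
from enum import IntEnum
from typing import Iterable


class Mode(IntEnum):
    # Assumed to mirror uavcan.node.Mode; verify against the DSDL.
    OPERATIONAL = 0
    INITIALIZATION = 1
    MAINTENANCE = 2
    SOFTWARE_UPDATE = 3


def count_update_requests(modes_on_appearance: Iterable[Mode]) -> int:
    """Count update requests sent by a server that skips nodes already
    reporting SOFTWARE_UPDATE (simplified model of the check quoted above).

    Each item is the heartbeat mode seen when the node (re)appears online.
    """
    return sum(1 for mode in modes_on_appearance if mode != Mode.SOFTWARE_UPDATE)


# Mode left unchanged: the node reappears as OPERATIONAL after the erase.
unchanged = [Mode.OPERATIONAL, Mode.OPERATIONAL]
# Mode switched before the erase: the node reappears as SOFTWARE_UPDATE.
switched = [Mode.OPERATIONAL, Mode.SOFTWARE_UPDATE]

print(count_update_requests(unchanged), count_update_requests(switched))
```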

I’m not sure I’m following you. How can the node be operational if its firmware is gone? Maybe I need a bit more context here.

Responding to your edit:

This is the way it is intended to work. You don’t need the update process to be retriggered again. After the flash is erased, just go on downloading the file instead of waiting for another COMMAND_BEGIN_SOFTWARE_UPDATE.
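
A rough sketch of that order of operations on the node side, written in Python for illustration only. `FakeNode`, its `log` list, and the handler name are hypothetical; a real bootloader would perform these steps in C or C++ against its own transport and flash drivers. The `Mode` values are assumed to mirror uavcan.node.Mode.

```python
from enum import IntEnum
from typing import List


class Mode(IntEnum):
    # Assumed to mirror uavcan.node.Mode; verify against the DSDL.
    OPERATIONAL = 0
    INITIALIZATION = 1
    MAINTENANCE = 2
    SOFTWARE_UPDATE = 3


class FakeNode:
    """Records the order of actions instead of touching real hardware."""

    def __init__(self) -> None:
        self.mode = Mode.OPERATIONAL
        self.log: List[str] = []

    def on_begin_software_update(self, server_node_id: int, path: str) -> None:
        # 1. Confirm the request first, so the server is not left waiting.
        self.log.append("send ExecuteCommand response")
        # 2. Switch the reported mode before the CPU is blocked.
        self.mode = Mode.SOFTWARE_UPDATE
        self.log.append(f"heartbeat mode={self.mode.name}")
        # 3. Erase; no heartbeats are published while this runs.
        self.log.append("erase flash")
        # 4. Heartbeats resume with the same mode, so no new command arrives.
        self.log.append(f"heartbeat mode={self.mode.name}")
        # 5. Go straight on to the download without waiting for a command.
        self.log.append(f"read {path} from node {server_node_id}")


node = FakeNode()
node.on_begin_software_update(server_node_id=42, path="fw.bin")
```

The server's node-ID and the file name are taken from the original request, which is why no second COMMAND_BEGIN_SOFTWARE_UPDATE is needed after the erase.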

Got it, I will change the heartbeat mode value before erasing the flash. Thank you very much.

Maybe this would help you:

It is already finished and functional but is not yet released (UPD: already released). I am actually getting some final cleanups done, updating the docs and tests, and so on; I expect this to go live today or tomorrow. I am also going to swap the branches such that uavcan-v1 becomes the new master, and the legacy is moved into legacy-v0.

The README contains a basic usage example, and a more advanced one is available in the integration test suite under /tests/integration/bootloader/main.cpp.

Here is an open-source MCU bootloader.