For a long time, Auriga has been cooperating with semiconductor industry leaders on firmware development for various platforms. Our engineering teams have solved many tasks based on standardized approaches such as IPMI and OpenBMC as well as UEFI and ATF implementations for specific platforms. This article summarizes the experience of Auriga’s developers in firmware updating.
Updating firmware on a workstation or server has evolved from a procedure for geeks into a routine for many users. This is especially true for new hardware products, where updates are needed because the firmware is still “raw,” and for older hardware, where updates extend the existing functionality and so prolong the “shelf life” of the equipment.
To improve product quality, equipment manufacturers fix critical errors and vulnerabilities found in firmware and regularly provide users with patched firmware images. They also provide utilities for flashing these images onto the hardware (firmware is usually held in non-volatile storage devices such as flash memory).
Because the firmware runs immediately after the system is powered on or reset, initializes the hardware, and prepares it to be managed by the operating system, it is crucial to maintain its integrity and operability, especially during the update process, no matter how that process ends. After all, we don’t want an expensive motherboard to become an idle “brick” that has to be repaired at a service center or even discarded because of an update error.
Therefore, objective requirements are imposed on the implementation of the firmware update process:
- Protection against flashing firmware that is damaged or built for different hardware (if the firmware is not intended for the target hardware, or if the integrity of the image is compromised, the update process should be safely aborted before any irreversible changes are made that would leave the equipment non-functional)
- Resilience to external events that may unexpectedly interrupt the update process (e.g., a power outage)
- The ability to restore a working state after external events that the update process cannot withstand
- The usability of the update process (user experience)
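The first requirement can be illustrated with a minimal sketch of a pre-flash check. The image layout, the `FWIM` magic value, and the function names below are assumptions made up for this example and do not correspond to any real firmware format. The point is only that the target platform and the integrity of the image are verified before anything is written to flash.

```python
import hashlib
import struct
from typing import Tuple

# Hypothetical image layout (an assumption for illustration only):
#   4 bytes   magic "FWIM"
#   4 bytes   platform ID (little-endian unsigned int)
#   4 bytes   payload length (little-endian unsigned int)
#   32 bytes  SHA-256 digest of the payload
#   N bytes   payload
HEADER = struct.Struct("<4sII32s")
MAGIC = b"FWIM"


def build_image(platform_id: int, payload: bytes) -> bytes:
    """Pack a payload into the hypothetical image format."""
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(MAGIC, platform_id, len(payload), digest) + payload


def validate_image(image: bytes, expected_platform_id: int) -> Tuple[bool, str]:
    """Check an image before flashing; nothing is written at this stage."""
    if len(image) < HEADER.size:
        return False, "image too short"
    magic, platform_id, length, digest = HEADER.unpack_from(image)
    if magic != MAGIC:
        return False, "bad magic"
    if platform_id != expected_platform_id:
        return False, "wrong target platform"
    payload = image[HEADER.size:]
    if len(payload) != length:
        return False, "length mismatch"
    if hashlib.sha256(payload).digest() != digest:
        return False, "integrity check failed"
    return True, "ok"
```

Note that a plain hash only detects accidental corruption. Protecting against deliberately modified images would additionally require a cryptographic signature, which this sketch leaves out.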
Fortunately, precisely because firmware flashing has become so widespread, equipment manufacturers build various protection measures into the update process. These measures satisfy the above requirements to varying degrees and make the process itself easier to use.
The issue of firmware updating is even more acute on server platforms. Server platforms are designed so that maintenance has a minimal impact on their main functions. Switching to a special update shell in which the other functions of the system are unavailable may be acceptable for a workstation or a desktop PC, but it is practically unacceptable for a server. Losing server availability during an update can be costly and can put a product at a disadvantage against competing server solutions. The same applies to the usability of the update process when dozens, hundreds, or even thousands of deployed servers have to be updated simultaneously.
Thus, for server platforms, the following requirements are added:
- Minimization of server outage time
- The ability to update individual firmware components, if the firmware consists of several components that can be updated independently
- The ability to make upgrades without direct access to equipment (remotely)
- The ability to scale a single update process across multiple machines
Meeting these requirements involves creating new update protocols or adopting existing ones, as well as adding the elements these protocols need to the hardware design.
Until recently, manufacturers often had to rely on proprietary firmware update solutions, with all their drawbacks: limited development and support resources and a narrow range of applications. Standardized approaches, however, are gaining popularity, which is not surprising given their availability, relatively low cost, mutual compatibility, and support from a large engineering community. One example is the HPM.1 specification, developed for server solutions with IPMI support. It defines both a data model and an update protocol that satisfy most of the typical requirements for the firmware update process. For example, HPM.1 allows updates to be performed remotely while verifying the integrity of the image and checking that it matches the target platform. It also allows the firmware to be split into components that can be upgraded separately. The update process is divided into stages, which makes it possible to scale it across multiple systems. For example, uploading an image and activating it are separate stages, so the activation of uploaded components can be synchronized across several servers, thereby reducing service downtime.
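The benefit of separating upload from activation can be shown with a small simulation. This is only a sketch of the idea: the `Server` class and the `staged_update` function are hypothetical, work purely in memory, and do not implement HPM.1 commands or any real IPMI library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    """Hypothetical in-memory stand-in for a managed server."""
    name: str
    active: Dict[str, str] = field(default_factory=dict)  # component -> running version
    staged: Dict[str, str] = field(default_factory=dict)  # component -> uploaded version

    def upload(self, component: str, version: str) -> None:
        # Stage 1: store the new image; the running firmware is untouched.
        self.staged[component] = version

    def activate(self) -> None:
        # Stage 2: switch to everything that was staged.
        # In a real system this is the only step that causes downtime.
        self.active.update(self.staged)
        self.staged.clear()


def staged_update(servers: List[Server], component: str, version: str) -> bool:
    """Upload to every server first, then activate all of them together."""
    for server in servers:
        server.upload(component, version)
    # Activate only if every server really holds the new image.
    if not all(s.staged.get(component) == version for s in servers):
        return False
    for server in servers:
        server.activate()
    return True
```

Because only one component is staged, the other components keep their running versions, and the slow upload phase can run on all servers while they are still in service.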
At the hardware level, the update process requirements are met by several measures. The first is redundancy: the server often holds two or more copies of the firmware, possibly on different storage devices. Typical algorithms either switch the active firmware between devices with each new update, or use a single active firmware for all updates and switch to a known working copy if the main one is found to be inoperative.
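The two redundancy schemes described above can be sketched as follows. The `Bank` type and both function names are illustrative assumptions; real platforms implement this logic in hardware or in early boot code rather than in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bank:
    """Hypothetical model of one firmware copy in non-volatile storage."""
    version: str
    valid: bool = True  # False if the copy is found to be inoperative


def update_alternating(banks: List[Bank], active: int, new_version: str) -> int:
    """Scheme 1: write the update to the inactive bank and make it active.

    Returns the index of the new active bank. The previously active bank
    keeps its contents and remains available for a rollback.
    """
    target = (active + 1) % len(banks)
    banks[target] = Bank(new_version)
    return target


def select_boot_bank(primary: Bank, golden: Bank) -> Bank:
    """Scheme 2: always boot the primary copy unless it is inoperative,
    in which case fall back to the known working ("golden") copy."""
    return primary if primary.valid else golden
```

In the first scheme, every update leaves the previous version intact in the other bank. In the second, the fallback copy is never touched by regular updates, so it stays in a known working state.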
In addition, if newly updated firmware hangs, a dedicated watchdog timer usually fires and either rolls the system back to the previous firmware version or switches to the known working firmware, depending on the algorithm implemented.
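The watchdog behavior can be modeled as a simple simulation in which time is counted in abstract ticks. The function name and its parameters are assumptions for this example. On real hardware the timer is a separate circuit that the new firmware must disarm once it has booted successfully.

```python
from typing import Optional


def boot_with_watchdog(
    ticks_until_ready: Optional[int],
    timeout_ticks: int,
    new_version: str,
    fallback_version: str,
) -> str:
    """Return the firmware version that ends up running.

    ticks_until_ready is the number of ticks the new firmware needs before
    it disarms the watchdog; None models firmware that hangs forever.
    """
    for tick in range(timeout_ticks):
        if ticks_until_ready is not None and tick >= ticks_until_ready:
            # The new firmware confirmed a successful boot in time.
            return new_version
    # The watchdog expired: fall back to the previous or known working firmware.
    return fallback_version
```

For example, `boot_with_watchdog(3, 10, "1.1", "1.0")` keeps the new version, while passing `None` as the first argument (a hang) or a value at or above the timeout results in the fallback version.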
Typically, the requirements for the firmware update process are defined at the design stage of a future server solution. Equipment manufacturers can expand the list of hardware and software measures they implement to meet these requirements and thus make their products more attractive. Given the current state of the market, meeting these requirements is one of the ways to improve the competitiveness of a future solution.