Over the past 10 years, I have been working with simulators, that is, software models of various devices, from cell phones to servers. Simulation is a widely used technique in device manufacturing in general and in chip manufacturing in particular, as well as in software development, architecture R&D, and telecommunications. Software simulation of embedded systems can also benefit healthcare and automotive equipment manufacturers, since a model of virtually any type of device can be created. If a device, chip, or embedded system is unavailable, limited, or inaccessible to developers and testers, simulation allows us to perform a so-called “shift left”: software can be developed in parallel with, or even ahead of, hardware releases, significantly reducing time to market.
A wide variety of simulator tools, also known as virtual machines or hypervisors, have been introduced to the market: for example, Parallels, VMware Workstation, VirtualBox, KVM and QEMU for Linux, VCS by Synopsys, Questa by Mentor Graphics, and the Simics simulator by Intel and Wind River. Over the years, I have worked with most of these simulators. Therefore, in this and upcoming articles, I would like to share my experience with simulation and how our clients from various industries benefit from this technology. Let us start with some fundamentals to set the framework.
Airbnb in simulation: A guest and a host
A simulator executes the so-called “guest code.” This could be a “guest program” or a whole “guest operating system.” The simulated system itself is called “guest,” while the computer running the simulator is called “host.” The operating system running on the host is called “host OS.”
The simulator carries out the guest system’s instructions by mapping them onto operations available on the host CPU.
Simulation and emulation – which term is correct?
A model can mimic a device to varying degrees of accuracy and detail. Typically, however, we only talk about simulating the system’s external behavior, that is, what is visible to program code. The code does not care how any specific CPU instruction is implemented internally; the main thing that matters is that it works. This kind of simulation is common, easy to develop, and quite fast; even an ordinary PC has enough computing power to run it without trouble.
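To make this concrete, here is a minimal sketch, in C, of what “simulating only the result” means; the register file and the ADD instruction are made up for illustration. A guest addition is modeled by a single host addition, regardless of how the adder is actually built in real silicon.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical guest CPU state: just a register file, for illustration. */
typedef struct {
    uint32_t regs[16];
} guest_cpu_t;

/* Functional simulation of a guest "ADD rd, rs1, rs2" instruction:
 * only the architecturally visible result matters, so a single host
 * addition is enough; no adders, pipelines, or timing are modeled. */
void sim_add(guest_cpu_t *cpu, int rd, int rs1, int rs2)
{
    cpu->regs[rd] = cpu->regs[rs1] + cpu->regs[rs2];
}

int main(void)
{
    guest_cpu_t cpu = {0};
    cpu.regs[1] = 2;
    cpu.regs[2] = 3;
    sim_add(&cpu, 3, 1, 2);
    printf("r3 = %u\n", cpu.regs[3]);   /* prints 5 */
    return 0;
}
```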
However, this kind of simulation is not sufficient if we need to know, for example, how long a program will run on real hardware. Such tasks require modeling not only the external behavior but also the internal structure and logic of the device. This, too, can be done with varying degrees of detail and accuracy. It is more correct to call such models emulators, since they really emulate the device rather than just simulate the results of its operations.
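For contrast, here is a purely illustrative sketch of the same ADD in an emulator-style model that also accounts for where the time goes. The cycle costs are invented for this example and stand in for the pipeline and memory behavior a real emulator would model.

```c
#include <stdint.h>

/* Guest state extended with a notion of simulated time. */
typedef struct {
    uint32_t regs[16];
    uint64_t cycles;   /* accumulated simulated cycles */
} emul_cpu_t;

/* The cycle costs are invented for this sketch; a real emulator would
 * derive them from modeled pipeline stages, caches, and so on. */
enum { FETCH_CYCLES = 1, DECODE_CYCLES = 1, EXEC_ADD_CYCLES = 1 };

/* The architectural result is identical to the functional simulator;
 * what changes is that the emulator also tracks how long it takes. */
void emul_add(emul_cpu_t *cpu, int rd, int rs1, int rs2)
{
    cpu->regs[rd] = cpu->regs[rs1] + cpu->regs[rs2];
    cpu->cycles += FETCH_CYCLES + DECODE_CYCLES + EXEC_ADD_CYCLES;
}
```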
Creating an emulator is much more complicated because of the greater amount of functionality the model must implement. Emulators also run much slower than simulators of external behavior; booting Windows on a detailed emulator could take years. Therefore, no one builds a software emulator of an entire platform; it would be far too expensive and time-consuming. Instead, only specific components of a system, such as the CPU, are emulated, and only part of the workload is run on them. Various hybrid schemes are also possible, in which one part of the system is a high-level model, another is a low-level model, another runs in a Field-Programmable Gate Array (FPGA), and yet another is a real device.
Four levels of simulation
Application Binary Interface (ABI)
Simulating a specific Application Binary Interface (ABI) implementation is probably the highest possible level of abstraction. Essentially, an ABI specifies the binary interface for the interaction of two programs, typically a user program and a library or the OS. An ABI covers calling conventions (how parameters are passed and values returned), data type sizes, and system calls. How does it work? For example, when a Linux program creates an additional application thread, it calls the pthread_create() function. But what if we provided a library with such a function on Windows and implemented the necessary mechanisms for dynamically linking the application against that library? This would allow users to run Linux applications on Windows; in other words, Windows would be “simulating” Linux. This is exactly what the Windows Subsystem for Linux does in Windows 10, allowing users to run unmodified Linux binaries on Windows. Cool, isn’t it?
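The snippet below is an ordinary POSIX-threads program that illustrates the point: the binary-level contract of the pthread_create() call (symbol name, argument passing, return value) is part of the ABI, so any environment that implements that same ABI, whether native Linux or a compatibility layer such as WSL, can run the same unmodified binary.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread body: just prints a message. */
static void *worker(void *arg)
{
    printf("hello from thread %s\n", (const char *)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    /* The way this call is expressed in the binary is defined by the ABI.
     * Whatever implements that ABI underneath (glibc on Linux, or a
     * compatibility layer) can satisfy it without recompiling the program. */
    if (pthread_create(&tid, NULL, worker, "A") != 0)
        return 1;
    pthread_join(tid, NULL);
    return 0;
}
```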
Instruction Set Architecture (ISA)
As I mentioned above, the most common option is simulation at the level of CPU instructions, the so-called Instruction Set Architecture (ISA). In other words, we simulate the result of executing instructions without emulating all of the internal logic of how this happens in a real processor and without tracking how long each instruction takes. Such simulators are also called functional simulators; the well-known VirtualBox, VMware Workstation, Wind River Simics, KVM, and QEMU platforms all fall into this category. They allow developers to run applications built for the simulated device without recompilation or any other modification. In other words, it is possible to run unmodified binary code.
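At its core, a functional ISA simulator is an interpreter loop over guest instructions. The sketch below uses a deliberately tiny, made-up ISA (not any real instruction set) just to show the fetch-decode-execute structure.

```c
#include <stdint.h>
#include <stdio.h>

/* A toy guest ISA for illustration: 16 registers, fixed 4-byte encoding. */
enum { OP_ADD = 0, OP_LOADI = 1, OP_HALT = 2 };

typedef struct {
    uint32_t regs[16];
    uint32_t pc;
    uint8_t  halted;
} cpu_t;

/* Each instruction: opcode, destination, source1, source2/immediate. */
void step(cpu_t *cpu, const uint8_t *mem)
{
    const uint8_t *insn = &mem[cpu->pc];                          /* fetch  */
    uint8_t op = insn[0], rd = insn[1], a = insn[2], b = insn[3]; /* decode */
    cpu->pc += 4;

    switch (op) {                                                 /* execute */
    case OP_ADD:   cpu->regs[rd] = cpu->regs[a] + cpu->regs[b]; break;
    case OP_LOADI: cpu->regs[rd] = a | (b << 8);                break;
    case OP_HALT:  cpu->halted = 1;                             break;
    }
}

int main(void)
{
    /* Tiny guest program: r1 = 2; r2 = 3; r3 = r1 + r2; halt. */
    const uint8_t program[] = {
        OP_LOADI, 1, 2, 0,
        OP_LOADI, 2, 3, 0,
        OP_ADD,   3, 1, 2,
        OP_HALT,  0, 0, 0,
    };
    cpu_t cpu = {0};

    while (!cpu.halted)
        step(&cpu, program);
    printf("r3 = %u\n", cpu.regs[3]);   /* prints 5 */
    return 0;
}
```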
Microarchitecture
A lower-level and more detailed kind of simulation is the CPU microarchitecture level, where the real internal algorithms and blocks of the processor are modeled: instruction decoders, queues, branch predictors, caches, schedulers, and the execution units themselves. Such modeling makes it possible to profile the execution time and performance of real programs and to optimize them for the target architecture. Furthermore, when simulating prototypes of future microprocessors, it is possible to predict and evaluate how those devices will perform.
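To give a flavor of what a microarchitectural model contains, here is a hypothetical sketch of a single such block, a classic 2-bit saturating-counter branch predictor; a real model would combine many blocks like this and attach timing to them.

```c
#include <stdint.h>
#include <stdbool.h>

#define BPRED_ENTRIES 1024   /* table size chosen arbitrarily for this sketch */

/* One 2-bit saturating counter per entry:
 * values 0-1 predict "not taken", values 2-3 predict "taken". */
typedef struct {
    uint8_t counters[BPRED_ENTRIES];
} bpred_t;

/* Predict a branch at address pc using the corresponding counter. */
bool bpred_predict(const bpred_t *bp, uint32_t pc)
{
    return bp->counters[(pc >> 2) % BPRED_ENTRIES] >= 2;
}

/* Train the predictor with the branch's actual outcome. */
void bpred_update(bpred_t *bp, uint32_t pc, bool taken)
{
    uint8_t *c = &bp->counters[(pc >> 2) % BPRED_ENTRIES];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```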
Emulation of logic components
The lowest level of simulation is the emulation of the logic components from which modern chips are built. Such emulators are either software-based or hardware-based (using FPGAs). FPGA logic is described at the Register Transfer Level (RTL) in languages such as Verilog or VHDL. Compilation produces an image (bitstream) that is then flashed into the FPGA device. This requires neither a soldering iron nor a master’s degree in electrical engineering: the FPGA board is connected to a PC via a USB or JTAG interface, and special software from the board’s manufacturer performs the programming. The cost of such boards ranges from about $10 for the simplest options to millions of dollars for cabinet-sized FPGA systems used by major chip-making companies. At such companies, FPGA simulation is the final step before the RTL goes to production.
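Production designs are written in Verilog or VHDL, but to show the flavor of modeling at the level of individual logic components, here is a small C sketch that emulates a 1-bit full adder gate by gate, much the way a netlist simulator evaluates a design.

```c
#include <stdint.h>
#include <stdio.h>

/* Gate-level model of a 1-bit full adder, built only from logic gates. */
void full_adder(uint8_t a, uint8_t b, uint8_t cin,
                uint8_t *sum, uint8_t *cout)
{
    uint8_t axb = a ^ b;            /* XOR gate     */
    *sum  = axb ^ cin;              /* XOR gate     */
    *cout = (a & b) | (axb & cin);  /* AND/OR gates */
}

int main(void)
{
    /* Exercise all input combinations, like a tiny testbench. */
    for (uint8_t a = 0; a <= 1; a++)
        for (uint8_t b = 0; b <= 1; b++)
            for (uint8_t cin = 0; cin <= 1; cin++) {
                uint8_t s, c;
                full_adder(a, b, cin, &s, &c);
                printf("a=%u b=%u cin=%u -> sum=%u cout=%u\n", a, b, cin, s, c);
            }
    return 0;
}
```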
For simple devices, having a working FPGA design on hand means you can contact specialized companies that will manufacture a real (non-FPGA) device with the same programmed logic.
In addition to the simulation levels described above, I have also worked with hybrid simulators, in which separate interconnected simulators model different parts of a system at different levels. For example, suppose an engineer needs to analyze the bandwidth of a new network card together with a driver developed for a particular OS. The network device and closely related components can be implemented first at the microarchitectural level for preliminary analysis and then in an FPGA, at the level of logic components, for final validation. Meanwhile, the rest of the system, which is only partially involved in the validation flow, is implemented at the instruction level. This part cannot be skipped, because it is needed for things such as booting the OS, but there is no point in implementing it at a lower and more complex level.
This fundamental knowledge of simulation, its types, and its levels will help us move on to full-platform simulators, cycle-accurate models, and working with traces, which I will discuss in my next article.