Dynamic Program Analysis with Intel Pin

Dynamic program analysis (profiling is its most familiar form) is widely used across industries, from healthcare to industrial automation. It helps identify application hotspots and race conditions, find memory errors, and measure actual RAM consumption. Moreover, dynamic analysis techniques such as Taint Analysis and In-Memory Fuzzing help identify the parts of a program that are most susceptible to exploitation.

Compared to the more common and easier-to-implement static analysis, dynamic code analysis has several advantages:

  • The source code of the program is not required (though some tools collect more metrics when the source code is available),
  • Runtime information is available to the engineer (for example, register contents, memory contents, and the values of environment variables),
  • Errors in multi-threaded code can be diagnosed (for example, threads competing for access to shared resources, or deadlocks),
  • The resources consumed by the program can be measured (execution time of the program or its parts, the number of calls to external databases or files),
  • Fewer false positives. Unlike static analysis, dynamic analysis does not try to predict the program’s behavior; it detects errors as the program actually executes.

At the same time, the obvious disadvantages of dynamic analysis include:

  • Inability to guarantee 100% code coverage (auxiliary techniques such as Concolic Execution exist, but they are practical only for smaller applications; otherwise they require huge computing resources),
  • Difficulty in finding the exact place in the source code to fix a detected bug.

Dynamic Program Analysis Tool Creation with Intel Pin

Dynamic analysis should be included in the test plan to ensure a more comprehensive analysis of the application. Moreover, in some cases, such as FDA certification of medical applications, dynamic analysis is mandatory. There are many ready-to-use solutions and tools on the market, but what if the necessary tool is unavailable, too expensive, or not sufficiently functional? These are exactly the situations in which Intel Pin comes to the rescue!

Intel Pin is a dynamic binary instrumentation (DBI) framework for building dynamic program analysis tools that analyze user-space applications.

The framework lets you instrument compiled programs by injecting arbitrary C/C++ code at run time, so neither the source code nor recompilation is required. Pin supports Linux, Windows, and OS X. The Android version of Pin can also be found in official sources.

Tools created with the Pin API are called Pintools. For Pintool development, Intel provides Pin kits that include the source code of many tools.

A Pintool is a compiled binary file. For Linux systems, it is a shared library with a .so extension; for Windows, it is a dynamic library with a .dll extension. The interaction of Intel Pin, Pintool, and the analyzed program can be represented as follows:

[Figure: interaction of Intel Pin, the Pintool, and the analyzed program]

Intel Pin works in JIT (just-in-time) and Probe modes. JIT mode is more capable but slower: the instrumented program runs significantly slower than the original. Probe mode has limited functionality; however, the performance of the original program combined with the Pintool is close to that of the original program alone.

Thus, dynamic program analysis using Intel Pin can be represented as follows:

[Figure: dynamic program analysis using Intel Pin]

Interestingly, Intel actively uses the framework in its own products, such as Intel® VTune™ Amplifier, Intel® Inspector, Intel® Advisor, and Intel® Software Development Emulator (Intel® SDE).

Using Intel Pin for Software Vulnerability Diagnostics

Intel Pin allows implementing various dynamic code analysis techniques to diagnose potential software vulnerabilities. Here are some scenarios from my team’s experience:

  • Taint Analysis is a powerful technique for identifying and protecting areas of code that are potentially vulnerable to user input (anti-exploit software protection). The idea is to assign a special label (tag) to each object in the code (for example, a variable) that records whether the object can spread taint. An object is ‘tainted’ if its value is obtained from an unreliable source, e.g. a user, a network, or a file. During program execution, a taint tree grows: labels are merged, spread to other objects, or deleted depending on the GET/PUT and LOAD/STORE instructions. Intel Pin is capable of adding handlers for those instructions. By analyzing the taint tree, it is possible to determine which parts of the program are potentially susceptible to exploitation and, thus, to better protect them from malicious influence. If the user input is considered sensitive data, it is also possible to trace the lifecycle of that data within the program and check whether the code complies with secure coding guidelines (e.g., storing confidential data only when it is deemed necessary).
  • Dynamic Symbolic Execution, or Concolic Execution, is a technique for maximizing dynamic code coverage by checking that all branches of the code are reachable and executable. Graphically, the task can be represented as a binary tree traversal, where the nodes are conditional statements (if statements) and the edges are sequences of non-conditional statements (branches of code). On the first pass through any conditional statement, the concrete values of the input parameters make the condition either hold (true) or fail (false). For the next pass, input values that produce the opposite outcome are calculated. Intel Pin’s task is to add the appropriate handlers for conditional statements in the code. By applying this technique, the engineer can, in principle, cover the entire code of the application using an automatically generated test suite.
  • In-Memory Fuzzing is a technique for testing code areas that are potentially vulnerable to user input. The method is based on the idea that almost any program can be broken by input data that is arranged randomly. First, the code area to be tested is selected. Then the set (range) of potential values for the input parameters is defined. In black-box testing (when nothing is known about the test object), a random set of input parameters is generated from the range of potential values. In gray-box (information about the tested object is incomplete) or white-box (all information about the tested object is available) testing, the prepared input sets are modified instead. If the executed code area exhibits unexpected behavior (e.g. an exception is thrown), the input parameter set is cached and the results are analyzed. These steps are repeated for all sets of input data. Intel Pin makes this straightforward: it adds breakpoints marking the beginning and the end of the tested code area. When the first breakpoint is reached, the context is saved and all exceptions in the code are tracked. If an exception occurs, the set of input parameters is cached and the context is restored. If no exception occurs, the context is simply restored. This technique helps ensure that the code is robust against user input. One potential use case is testing a binary file parser.
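
The concolic loop described in the list above (run with concrete inputs, record each branch decision, then compute inputs that flip a decision) can be modeled in a few lines. This is a minimal Python sketch rather than a Pintool, which would be written in C/C++ against the Pin API. All names (`program`, `run_once`, `solve`, `explore`) are hypothetical, the `branch` hook stands in for the handler that would be attached to conditional statements, and a brute-force search over a small integer range stands in for a real constraint solver.

```python
def program(x, branch):
    """Hypothetical program under test; every `if` goes through the branch hook."""
    if branch(lambda v: v > 10):
        if branch(lambda v: v % 2 == 0):
            return "big-even"
        return "big-odd"
    if branch(lambda v: v == -3):
        return "magic"
    return "small"


def run_once(x):
    """Concrete run that also records the path as (predicate, outcome) pairs."""
    path = []

    def branch(pred):
        taken = pred(x)
        path.append((pred, taken))
        return taken

    result = program(x, branch)
    return result, path


def solve(constraints, domain):
    """Stand-in for a constraint solver: brute-force search over a small domain."""
    for candidate in domain:
        if all(pred(candidate) == wanted for pred, wanted in constraints):
            return candidate
    return None


def explore(seed, domain=range(-50, 51)):
    """Explore paths by flipping each branch decision of every new path once."""
    worklist = [seed]
    seen_paths = set()
    results = {}  # outcome -> one concrete input that reaches it
    while worklist:
        x = worklist.pop()
        result, path = run_once(x)
        signature = tuple(taken for _, taken in path)
        if signature in seen_paths:
            continue  # this path was already explored
        seen_paths.add(signature)
        results[result] = x
        # Keep the path prefix, negate one decision, and solve for a new input.
        for i in range(len(path)):
            flipped = path[:i] + [(path[i][0], not path[i][1])]
            new_input = solve(flipped, domain)
            if new_input is not None:
                worklist.append(new_input)
    return results
```

Starting from a single seed, `explore(0)` reaches all four return values of `program`, including the `"magic"` branch that only the input `-3` triggers. The brute-force `solve` is the part that does not scale, which is one way to see why the technique is expensive for larger applications.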
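
The in-memory fuzzing steps described in the list above (save the context at the first breakpoint, run the code area with a mutated input, record the input if an exception occurs, restore the context) can be sketched as follows. This is a simplified Python model under stated assumptions, not Pin API code: the "context" is a plain dictionary rather than CPU registers and memory, `parse_record` is a hypothetical code area under test, and `fuzz_in_memory` is a hypothetical driver that uses gray-box style mutation of a known-good input.

```python
import copy
import random


def parse_record(state, data):
    """Hypothetical code area under test: a tiny length-prefixed record parser."""
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("declared length exceeds available bytes")
    state["records"].append(bytes(payload))
    state["bytes_seen"] += length


def fuzz_in_memory(target, state, seed_input, iterations=200, rng_seed=1234):
    """Run `target` repeatedly from one saved context, collecting crashing inputs."""
    rng = random.Random(rng_seed)    # fixed seed keeps the run reproducible
    snapshot = copy.deepcopy(state)  # first breakpoint: save the context
    crashes = []
    for _ in range(iterations):
        # Gray-box style mutation: overwrite one byte of a known-good input.
        data = bytearray(seed_input)
        data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            target(state, data)      # execute the code area under test
        except Exception as exc:     # track every exception it raises
            crashes.append((bytes(data), type(exc).__name__))
        finally:
            # Second breakpoint: restore the context, crash or no crash.
            state.clear()
            state.update(copy.deepcopy(snapshot))
    return crashes
```

A possible call is `fuzz_in_memory(parse_record, {"records": [], "bytes_seen": 0}, bytes([3, 65, 66, 67]))`. Every recorded crash is an input whose length byte exceeds the three payload bytes that are actually present, and the state dictionary is back to its saved value after each iteration.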

Are There Any Drawbacks to Intel Pin?

It is important to note that, despite its wide acceptance, Intel Pin still has a number of drawbacks.

Firstly, it does not expose an IR (Intermediate Representation). An IR is an intermediate form of the program used for subsequent processing (optimization, for instance), typically in SSA (Static Single Assignment) form, where each variable is assigned exactly once. Without access to an IR, implementing some of the above-mentioned techniques becomes much more complicated. For example, implementing Taint Analysis would require monitoring all memory accesses to expose the entire chain of tainted objects.
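
To illustrate what monitoring all memory accesses involves, here is a minimal Python model of instruction-level taint tracking with byte-granular shadow state. It is a sketch under stated assumptions, not Pin API code: the `TaintTracker` class, its handler names, and the instruction trace in `run_demo` are all hypothetical, and each handler stands in for an analysis routine that a Pintool would attach to the corresponding kind of instruction.

```python
class TaintTracker:
    """Shadow state: which registers and memory bytes currently hold tainted data."""

    def __init__(self):
        self.tainted_mem = set()   # addresses of tainted bytes
        self.tainted_regs = set()  # names of tainted registers

    def taint_input(self, addr, size):
        """Mark a buffer filled from an untrusted source as tainted."""
        self.tainted_mem.update(range(addr, addr + size))

    def on_load(self, reg, addr, size):
        """reg <- memory: the register is tainted if any source byte is."""
        if any(a in self.tainted_mem for a in range(addr, addr + size)):
            self.tainted_regs.add(reg)
        else:
            self.tainted_regs.discard(reg)

    def on_store(self, addr, size, reg):
        """memory <- reg: spread or clear taint on every written byte."""
        span = range(addr, addr + size)
        if reg in self.tainted_regs:
            self.tainted_mem.update(span)
        else:
            self.tainted_mem.difference_update(span)

    def on_binop(self, dst, src):
        """dst <- dst op src: labels merge, dst is tainted if either operand is."""
        if src in self.tainted_regs:
            self.tainted_regs.add(dst)

    def on_load_imm(self, reg):
        """reg <- constant: overwriting with a constant deletes the label."""
        self.tainted_regs.discard(reg)


def run_demo():
    """Replay a short, made-up instruction trace through the tracker."""
    t = TaintTracker()
    t.taint_input(0x1000, 4)      # 4 bytes of user input land at 0x1000
    t.on_load("eax", 0x1000, 4)   # mov eax, [0x1000]  -> eax tainted
    t.on_load_imm("ebx")          # mov ebx, 7         -> ebx clean
    t.on_binop("ebx", "eax")      # add ebx, eax       -> ebx tainted
    t.on_store(0x2000, 4, "ebx")  # mov [0x2000], ebx  -> 0x2000..0x2003 tainted
    t.on_load_imm("eax")          # mov eax, 0         -> eax clean again
    t.on_store(0x1000, 4, "eax")  # mov [0x1000], eax  -> input buffer now clean
    return t
```

After `run_demo()`, only `ebx` and the four bytes at `0x2000` remain tainted. Because the tracker sees raw registers and addresses instead of SSA variables, every load and store has to pass through a handler, or the chain of tainted objects is lost.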

Secondly, on Linux, there may be compatibility problems with the latest version of GCC. If you use the latest compiler in your project, you may have to wait for an updated Intel Pin version.

Another issue arises if the Pintool is meant to be multi-threaded. To implement the threading logic, you will have to use the Pin Thread API. Intel officially states that you cannot use pthreads on Linux, and if you are a Windows developer, you will have to forgo the Win32 API.

Finally, C++11 cannot be used in Pintools, at least not out of the box. You can still enable C++11 by adjusting the Makefiles shipped with Intel Pin; we have not observed any negative effects from this change.

Summing up, despite these limitations, if you need to develop your own tool for dynamic code analysis, Intel Pin is a great option. Its API is well documented and easy to work with. Additionally, Intel invests serious effort in building an engineering community around its products.