Using AWS Jobs for Software Upgrades and Configuration Propagation on IoT Devices

Amazon Web Services (AWS) is one of the most popular framework environments for the Internet of Things (IoT) alongside Microsoft Azure and Google Cloud IoT. Smart devices are connected to the framework using the internet and interact with it using the MQTT protocol. Besides interacting with devices, the framework also provides great opportunities for data storage and processing, data representation to a user, data analysis (including artificial intelligence methods), access control with a powerful system of privileges, and a lot more.

For storing data, the AWS environment provides (besides different relational and non-relational DBMS) a cloud-based hierarchical file storage system called Simple Storage Service (S3). Each file in S3 storage can have a uniform resource locator (URL) that is accessible from outside, in which case the file can be opened in a web browser. If the file content is an HTML page, an interactive user can use it to access both the AWS framework options and the intelligent devices connected to the framework. The capabilities of this page are determined by the JavaScript code it contains (this code can call functions of the application programming interfaces (APIs) of the framework as a whole and of its separate components).

Lambda Functions

Besides webpages, program code in the AWS framework environment can be stored as lambda functions. These are named pieces of code written in one of the supported languages, such as Python, Java, C#, or Node.js. They are stored in the cloud and are invoked on certain events. An event can be initiated by a webpage (for example, by calling an HTTP REST API on a certain URL), by another lambda function, or by an intelligent device (by sending an MQTT message of a certain type). In all of these cases, events can have parameters. Lambda functions are used as middleware for the interaction between intelligent devices, AWS resources (e.g., databases), and the webpages that the user directly interacts with (Fig. 1).

Figure 1. Architecture of AWS component interaction (Source: Auriga)

AWS lambdas are subject to hard limits: for example, both the execution time and the amount of memory available for handling a single request are limited. If a limit is exceeded, execution of the lambda is aborted. The user configures these limits when creating the lambda, but they cannot exceed certain maximum values.

An IoT device connects to the cloud using the TCP protocol, which provides data integrity and buffering; in the case of a slow connection, the protocol takes care of accumulating data on the sending side and pushing it through the pipeline when it becomes possible. Also, an AWS protocol on top of TCP takes care of re-establishing the TCP connection persistently in the case of connection loss.

However, connectivity issues between an IoT device and the cloud do not normally affect lambdas, because MQTT messaging is asynchronous: a lambda that communicates with an IoT device just sends an MQTT message and does not wait for a response. If and when the response arrives, a different lambda function is responsible for handling it and, if needed, for sending another MQTT message to the IoT device.
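The fire-and-forget pattern described above can be sketched as follows. This is a minimal illustration, assuming the boto3 SDK is used inside the lambda; the topic layout `devices/<thing>/commands` and the event fields `thingName`, `command`, and `params` are hypothetical and do not come from the article.

```python
import json


def send_command(iot_data_client, thing_name, command, params=None):
    """Publish one command to a device and return without waiting for a reply."""
    # Hypothetical topic layout, chosen only for this example.
    topic = f"devices/{thing_name}/commands"
    payload = json.dumps({"command": command, "params": params or {}})
    # publish() only hands the message to the broker. Any reply from the device
    # arrives later as a separate event handled by a different lambda function.
    iot_data_client.publish(topic=topic, qos=1, payload=payload.encode("utf-8"))
    return topic


def lambda_handler(event, context):
    """Lambda entry point; 'event' carries the parameters of the triggering event."""
    import boto3  # imported here so send_command() can be exercised offline

    client = boto3.client("iot-data")
    topic = send_command(
        client, event["thingName"], event["command"], event.get("params")
    )
    return {"statusCode": 200, "body": json.dumps({"publishedTo": topic})}
```

Because the handler returns as soon as the message is published, its running time does not depend on how quickly, or whether, the device answers.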

AWS Jobs

One of the AWS framework components is the job service (AWS Jobs). It is used for creating, executing, and managing long-lasting actions (jobs) on one or several IoT devices connected to AWS. Compared with other AWS services, AWS Jobs is a relatively recent addition.

Access to the AWS Jobs service is provided interactively via the management console as well as programmatically via a set of API functions.

A certain subset of these functions can be used by the intelligent device itself (can be invoked by sending MQTT messages). The functions accessible over the MQTT protocol execute actions necessary for accessing jobs and their parameters from the device side: GetPendingJobExecutions, StartNextPendingJobExecution, UpdateJobExecution, DescribeJobExecution, etc.
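As an illustration of how a device might address these functions over MQTT, the sketch below maps each operation name to the reserved topic it is published to. The topic names follow the AWS IoT Jobs MQTT API as I understand it; the helper names `jobs_topic` and `update_payload` are made up for this example, and the exact topics should be checked against the current AWS documentation.

```python
import json


def jobs_topic(thing_name, operation, job_id=None):
    """Return the MQTT request topic for a device-side Jobs operation."""
    prefix = f"$aws/things/{thing_name}/jobs"
    if operation == "GetPendingJobExecutions":
        return f"{prefix}/get"
    if operation == "StartNextPendingJobExecution":
        return f"{prefix}/start-next"
    if operation in ("DescribeJobExecution", "UpdateJobExecution"):
        if not job_id:
            raise ValueError(f"{operation} requires a job_id")
        suffix = "get" if operation == "DescribeJobExecution" else "update"
        return f"{prefix}/{job_id}/{suffix}"
    raise ValueError(f"unsupported operation: {operation}")


def update_payload(status, details=None):
    """Build the JSON body of an UpdateJobExecution request."""
    body = {"status": status}
    if details:
        # statusDetails is a flat map of string keys to string values.
        body["statusDetails"] = {str(k): str(v) for k, v in details.items()}
    return json.dumps(body)
```

A device would publish `update_payload("IN_PROGRESS")` to `jobs_topic(name, "UpdateJobExecution", job_id)` and listen for the reply on the same topic with `/accepted` or `/rejected` appended.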

Other functions are defined over the HTTPS protocol and are intended to be called from JavaScript code on webpages, from the program code of lambda functions, and by users in the interactive mode. These functions are used mostly for the creation and deletion of jobs and job execution management: CreateJob, DeleteJob, DescribeJob, ListJobs, ListJobExecutionsForThing, etc.

In the AWS Jobs terminology, the main information about a job is stored in its job document. This is a JSON document that is passed from the framework to the target device and describes what should be done. Normally, a job document includes the name of the operation and a URL (or URLs) that refers to the location of the data that serves as the job parameters.

This URL can be “pre-signed” by an AWS user. In this case, the URL allows access to a certain object for the intelligent device with the privileges of the user who has pre-signed it (so the device can have access to the data it normally can’t access). Pre-signed URLs have a limited lifetime and expire after that lifetime is over, making the object inaccessible again.

A job document can be created on the fly during the creation of a job or can be stored as a file in the S3 file storage of the AWS framework. A link to this file can be specified during job creation.
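A pre-signed URL with a limited lifetime could be produced roughly as shown below. This is a sketch that assumes the boto3 SDK; the bucket name, object key, and lifetime are placeholders chosen for the example.

```python
def presign_download(s3_client, bucket, key, lifetime_seconds=3600):
    """Return a URL that grants read access to one S3 object for a limited time."""
    # The URL carries the privileges of the credentials behind s3_client,
    # so a device without its own S3 permissions can still fetch the object.
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=lifetime_seconds,  # after this many seconds the URL stops working
    )


def example():
    """Usage with a real client (requires boto3 and AWS credentials)."""
    import boto3

    s3 = boto3.client("s3")
    # Placeholder bucket and key.
    return presign_download(s3, "example-bucket", "images/firmware.dat", 900)
```

The lifetime should be long enough to cover devices that are offline when the job is created and pick it up later.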

Other job attributes include the following:

  • Target device or device group. If a device group is targeted, the job runs on all devices that are members of the group.
  • Snapshot or continuous job. A snapshot job is completed after finishing on the selected device or group of devices. A continuous job always applies to a group of devices; it continues to exist after it finishes on existing devices and runs on devices that are later added to the group.

When a job is being executed on a specific device, it has a state. A limited set of states is defined by the framework: QUEUED, IN_PROGRESS, FAILED, SUCCEEDED, CANCELED, REJECTED, REMOVED. The current state is changed by a request from the device (e.g., by calling the UpdateJobExecution function) or when a user calls one of the job management functions (e.g., cancels the job using the CancelJob function). Normally, the job execution state is IN_PROGRESS while the device is executing the job, and it becomes SUCCEEDED or FAILED after the device completes the job.

The state diagram for job execution is shown in Fig. 2 (here, the transitions initiated by the device are shown in blue, and those initiated by other AWS components are shown in green).

Figure 2. Diagram of states during AWS Jobs execution (Source: Auriga)
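The state machine can also be written down as a small transition table. The sketch below is a simplified model inferred from the text, not a transcription of Fig. 2: which transitions exist and who may trigger them are assumptions, and SUCCEEDED is the name the AWS IoT Jobs API uses for the successful terminal state.

```python
DEVICE, CLOUD = "device", "cloud"

# (current state, requested state) -> who is allowed to request the change.
# Assumed, simplified transition set for illustration only.
TRANSITIONS = {
    ("QUEUED", "IN_PROGRESS"): DEVICE,
    ("QUEUED", "REJECTED"): DEVICE,
    ("IN_PROGRESS", "SUCCEEDED"): DEVICE,
    ("IN_PROGRESS", "FAILED"): DEVICE,
    ("QUEUED", "CANCELED"): CLOUD,
    # Canceling an execution that is already running may need a forced cancel.
    ("IN_PROGRESS", "CANCELED"): CLOUD,
    ("QUEUED", "REMOVED"): CLOUD,
    ("IN_PROGRESS", "REMOVED"): CLOUD,
}

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "REJECTED", "CANCELED", "REMOVED"}


def next_state(current, requested, initiator):
    """Return the new state, or raise ValueError if the change is not allowed."""
    if TRANSITIONS.get((current, requested)) != initiator:
        raise ValueError(f"{initiator} cannot move {current} -> {requested}")
    return requested
```

No transition starts from a terminal state, which matches the idea that a finished job execution is never resumed.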

Software Update Implementation

The solution for updating software and configuration on IoT-enabled devices was developed as part of an embedded software project for an IoT device connected to the AWS framework. It is based on the capabilities of the AWS Jobs service.

Software downloading is designed as follows. The images to be downloaded are stored as files in S3 file storage, which allows the user to keep several software versions simultaneously. In addition, a special webpage is created in S3 storage, which allows an interactive user to choose the parameters of the update procedure (the name of the device or group of devices and the software version) and invoke a certain lambda function.

This lambda function (written in Python) implements the interaction with the AWS Jobs service. When a request for a software update comes from an interactive user, the lambda function creates a job. The job document is created on the fly and passed as a parameter to the CreateJob function, together with the name of the device or device group to be updated. The job document includes two text fields: the requested operation (“install”) and the URL of the software image in S3 storage to be installed. This URL corresponds to the software version the user has chosen to install on the device. A third, Boolean field named “forced” allows a so-called downgrade (i.e., the installation of a software version that precedes the current one).

{
  "operation": "install",
  "url": "https://s3-us-west-2.amazonaws.com/smrc-www/ipdu-images/ipdu-latest.dat",
  "forced": true
}

Figure 3. Job document for software update (Source: Auriga)
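A lambda-side helper that builds such a job document and passes it to CreateJob might look like the sketch below. It assumes the boto3 SDK, whose `create_job` call takes the document as a JSON string and the targets as ARNs; the helper name, the job ID scheme, and the example ARN are hypothetical.

```python
import json
import uuid


def create_install_job(iot_client, target_arn, image_url, forced=False, job_id=None):
    """Create a one-off (snapshot) job asking the target to install an image."""
    document = {"operation": "install", "url": image_url, "forced": forced}
    # Job IDs must be unique; this naming scheme is only an example.
    job_id = job_id or f"install-{uuid.uuid4().hex}"
    iot_client.create_job(
        jobId=job_id,
        targets=[target_arn],           # ARN of a thing or of a thing group
        document=json.dumps(document),  # job document created on the fly
        targetSelection="SNAPSHOT",
    )
    return job_id


def example():
    """Usage with a real client (requires boto3 and AWS credentials)."""
    import boto3

    iot = boto3.client("iot")
    # Placeholder account, region, thing name, and image URL.
    arn = "arn:aws:iot:us-west-2:123456789012:thing/example-device"
    return create_install_job(iot, arn, "https://example.com/image.dat")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before it is wired into the lambda handler.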

When a new job is created, the framework sends a notification to the corresponding device (or all devices if it is a group). If one of the devices is not accessible at that moment, the framework preserves the notification and sends it to the device later when it becomes available again.

When a device receives the “next job available” notification, it acknowledges the job and accesses the job document. The job then enters the IN_PROGRESS state. If the operation in the job document is “install”, the device downloads the software update image from the URL provided in the job document, using the “libcurl” library. If the image is accessible and not corrupted and the new software version is acceptable, the device performs the update and initiates a reboot to activate it. During this process, the device does not inform the framework of the job completion, so the job remains active (IN_PROGRESS state). If one of the checks fails, the device completes the job by moving it to the FAILED state and informs the framework.

After the reboot, the device again receives the “next job available” notification for the pending job. At this point, the device recognizes that it is in the middle of a software update, and instead of starting the job again, it completes it with the SUCCEEDED state. However, if the update turns out to be unsuccessful and the new software version cannot be started, the device automatically falls back to the previous software version, and after the reboot, it completes the ongoing job by moving it to the FAILED state.
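The device-side decision logic described above can be summarized in a short sketch. Python is used here only for illustration (the device software in the article uses libcurl, which suggests C or C++); the names, the dotted version format, and the rule that a version is "acceptable" when it is newer or the job is forced are all assumptions. SUCCEEDED is the name the AWS IoT Jobs API uses for the successful terminal state.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class DeviceState:
    """State that a real device would keep in non-volatile storage."""
    running_version: str
    pending_job_id: Optional[str] = None   # job that triggered the last reboot
    pending_version: Optional[str] = None  # version that job tried to install


def handle_install_job(state, job_id, image_version, image_ok, forced=False):
    """Return 'REBOOT', 'SUCCEEDED', or 'FAILED' for a received install job."""
    if state.pending_job_id == job_id:
        # Same job seen again after a reboot: report the outcome, do not reinstall.
        succeeded = state.running_version == state.pending_version
        state.pending_job_id = None
        state.pending_version = None
        return "SUCCEEDED" if succeeded else "FAILED"
    if not image_ok:
        return "FAILED"  # image missing or corrupted
    is_upgrade = parse_version(image_version) > parse_version(state.running_version)
    if not (is_upgrade or forced):
        return "FAILED"  # downgrade requested without the "forced" flag
    # Remember the job, then reboot; the job stays IN_PROGRESS meanwhile.
    state.pending_job_id = job_id
    state.pending_version = image_version
    return "REBOOT"
```

If the new image fails to start and the device falls back, `running_version` still holds the old version after the reboot, so the second call reports FAILED.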

Configuration Downloading and Propagation

A device configuration consists of a set of named parameters whose values are represented as JSON data structures.

Operations for configuration downloading and configuration uploading are defined in the job document. This allows users to download a predefined configuration to the device as well as propagate one configuration between devices in the AWS framework (by uploading the configuration from one device to the cloud and then downloading it to another device/devices).

To upload the configuration from the device, a job is also created; the job document contains the “config-upload” operation and the URL of the location in S3 storage where the configuration will be stored. On getting the job, the device uploads its configuration to the given URL over the network using the “libcurl” library. To prevent unauthorized access to S3 file storage from the IoT devices’ side, this URL must be pre-signed by an authorized AWS user.

Configuration downloading is performed in the same way as software downloading. The new configuration is also stored in S3 storage as a file, and the job document contains the URL of this file. Only the name of the operation in the job document differs (“config-download” instead of “install”).

For configuration propagation, the source configuration is uploaded from one device into a certain file in S3 file storage, and the URL of this file is then used in the configuration download jobs on other devices.
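The pair of job documents needed for propagation could be built as in the sketch below, assuming the boto3 SDK. The upload URL is pre-signed for a PUT request, as the text requires; pre-signing the download URL as well is an assumption of this example, as are the helper name and the lifetime.

```python
def build_propagation_documents(s3_client, bucket, key, lifetime_seconds=3600):
    """Return (upload_document, download_document) for one shared S3 object."""
    # URL the source device uses to store its configuration.
    put_url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=lifetime_seconds,
    )
    # URL the receiving devices use to fetch that same object.
    get_url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=lifetime_seconds,
    )
    upload_document = {"operation": "config-upload", "url": put_url}
    download_document = {"operation": "config-download", "url": get_url}
    return upload_document, download_document
```

The download jobs should only be created once the upload job has finished on the source device; otherwise the receiving devices would fetch a missing or stale file.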

Using Continuous Jobs

A continuous job can be used to provision a large number of devices during a prolonged period (e.g., during manufacturing).

In this case, a group of devices is created in AWS. Then, continuous jobs for downloading certain software versions and certain configurations are created and applied to this device group. If several different device types are used, then several groups and several sets of continuous jobs are created.

Once manufactured, a device is connected to AWS and added to a certain group. The continuous job automatically applies to this device and ensures that it gets the latest software and the desired configuration installed.
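The provisioning flow can be sketched with three boto3 calls: creating the group, attaching continuous jobs to it, and later adding each manufactured device. The helper names and the job ID scheme are hypothetical; the job documents are assumed to have the shape shown earlier in the article.

```python
import json


def provision_group(iot_client, group_name, documents):
    """Create a device group and one continuous job per job document."""
    response = iot_client.create_thing_group(thingGroupName=group_name)
    group_arn = response["thingGroupArn"]
    job_ids = []
    for index, document in enumerate(documents):
        job_id = f"{group_name}-provision-{index}"  # example naming scheme
        iot_client.create_job(
            jobId=job_id,
            targets=[group_arn],
            document=json.dumps(document),
            # CONTINUOUS: the job also runs on devices added to the group later.
            targetSelection="CONTINUOUS",
        )
        job_ids.append(job_id)
    return group_arn, job_ids


def enroll_device(iot_client, group_name, thing_name):
    """Add a newly manufactured device; the continuous jobs then apply to it."""
    iot_client.add_thing_to_thing_group(
        thingGroupName=group_name, thingName=thing_name
    )
```

With several device types, `provision_group` would be called once per type, each time with its own group name and set of job documents.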

The article was initially published at www.embedded.com.