Demystifying a Docker image

Published: November 14th, 2019

Six months ago ForAllSecure started analyzing Docker images. What does this mean? Imagine we have a user who wants us to fuzz their application. How do they give it to us? Do they tar it up? Do they give us access to an environment where it’s running? Do we integrate into their build pipeline? Applications are an entire ecosystem: they require specific library versions, environment variables, users, etc. Packaging may seem like a small hurdle conceptually, but it adds to the friction between development and security teams, especially as organizations look to incorporate security into their build cycles.

This is where Docker comes into play. We wanted Docker as a packaging solution for our users because it’s accessible and easy to use, but we didn’t want the overhead of the Docker daemon and all the other fancy features that come with it. We ended up building our own lightweight version of Docker, which lets ForAllSecure accept Docker images while running them with the bare-bones runc runtime. This allows us to analyze code without requiring changes to developer behavior. In this post, we’ll focus on the first part of the problem: how to ingest Docker images.

Accompanying this post is the open sourcing of Rootfs Builder, the tool we use to extract a rootfs from a Docker image. A Docker image provides a portable, efficient format. Instead of sending a 4GB rootfs across the wire, users can simply give us a string like “ubuntu:latest” and ForAllSecure servers can pull the image and extract the rootfs. This benefit doesn’t just apply to ForAllSecure: Rootfs Builder allows any runtime to ingest a Docker image and extract the rootfs. We chose runc, but the extracted rootfs is vanilla (i.e., it contains no Docker-specific information) and will work with rkt, NSJail, etc.

It’s worth noting that there were a few existing solutions for building a rootfs from an image. Unfortunately, they do not handle whiteouts correctly (explained further below). I also want to give a shout-out to Makisu and Kaniko (written by Uber and Google respectively), which do provide functionality for extracting a rootfs from an image. They solve the problem of building Docker images in an environment not suitable for Docker, namely Kubernetes. We chose not to use their software because it was still a bit too feature-rich for us.

Now that you understand the problem we are trying to solve, we can dive into the question: what is a Docker image? How do we go from a Docker “image,” which is just some string like “alpine:latest,” to a running instance of Alpine? In short, an image is a glorified tarball. It consists of various layers which, when merged together, form the rootfs of the container. To understand these layers, we need to make a quick detour to discuss the underlying technology, OverlayFS (OFS).

OverlayFS
OverlayFS layers two directories on a single Linux host and presents them as a single directory. The first directory, referred to as the “lower” directory, is read-only and usually provides the base file system. The second directory, referred to as the “upper” directory, records any changes made on top of the lower directory, while leaving the lower directory itself unchanged. If a file is removed, a “whiteout” file is created in the upper directory to simulate the removal. The mount point presents the merged view of the two directories. Note that OFS requires support for extended attributes in order to store metadata regarding whiteouts.

OFS backs overlay2, a commonly used Docker storage driver, and, as you can imagine, is well-suited for containers. The lowest directory is the base filesystem, and each layer on top is a snapshot of the changes made to the container filesystem at a given time. OFS is an efficient way to generate and store diffs to a filesystem.
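
To make the lookup rules concrete, here is a minimal Python model of how a merged view could be computed from one lower and one upper directory. It is only a sketch under simplifying assumptions: each directory is a flat dictionary from path to content, and the names `WHITEOUT` and `merged_view` are invented for this illustration rather than taken from OFS or Docker.

```python
# Sentinel standing in for a whiteout entry in the upper directory.
WHITEOUT = object()


def merged_view(lower, upper):
    """Return what the mount point would show for the given lower and upper contents."""
    view = dict(lower)  # start from the read-only base, without modifying it
    for path, content in upper.items():
        if content is WHITEOUT:
            view.pop(path, None)   # a whiteout hides the file from the lower directory
        else:
            view[path] = content   # new or changed files in upper take precedence
    return view
```

Calling `merged_view({"bin/arch": "binary"}, {"bin/arch": WHITEOUT, "hello": ""})` leaves only `hello` visible, while the lower dictionary itself is untouched.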

Try it out yourself:

# Create a tmpfs because a tmpfs has support for extended attributes
root@5e2bb73f7afd:/tmp/tmpfs# mount -t tmpfs tmpfs /tmp/tmpfs
# Create the lowerdir, upperdir, workdir and the merged dir
root@5e2bb73f7afd:/tmp/tmpfs# cd /tmp/tmpfs
root@5e2bb73f7afd:/tmp/tmpfs# mkdir lowerdir upperdir workdir merged
# I moved the alpine base filesystem into the lowerdir to make the example more meaningful
root@5e2bb73f7afd:/tmp/tmpfs# mv /alpine/* lowerdir/
# Create the OFS mount
root@5e2bb73f7afd:/tmp/tmpfs# mount -t overlay -o lowerdir=/tmp/tmpfs/lowerdir,upperdir=/tmp/tmpfs/upperdir,workdir=/tmp/tmpfs/workdir overlay /tmp/tmpfs/merged
# Notice that now the merged directory, previously empty, reflects the lower directory
root@5e2bb73f7afd:/tmp/tmpfs# ls merged/
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
# If we create a file in the merged directory, it gets reflected in the upper directory
root@5e2bb73f7afd:/tmp/tmpfs# ls upperdir/
root@5e2bb73f7afd:/tmp/tmpfs# touch merged/hello
root@5e2bb73f7afd:/tmp/tmpfs# ls upperdir/
hello
# If we remove a file from merged, it’s also reflected in the upper directory as a whiteout file
root@5e2bb73f7afd:/tmp/tmpfs# rm merged/bin/arch
root@5e2bb73f7afd:/tmp/tmpfs# ls -la upperdir/bin/
total 0
drwxr-xr-x 2 root root 60 Sep 16 20:47 .
drwxr-xr-x 3 root root 80 Sep 16 20:44 ..
c--------- 1 root root 0, 0 Sep 16 20:47 arch
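
In the listing above, the leading `c` and the `0, 0` columns show that the whiteout for `arch` is a character device with device number 0/0. As a rough sketch, the Python below walks an upper directory and reports such entries. The function names `is_whiteout` and `find_whiteouts` are made up for this example, and it assumes a Unix system where `os.major` and `os.minor` are available.

```python
import os
import stat
from typing import List


def is_whiteout(mode: int, rdev: int) -> bool:
    """True when the stat values describe a character device numbered 0/0."""
    return stat.S_ISCHR(mode) and os.major(rdev) == 0 and os.minor(rdev) == 0


def find_whiteouts(upperdir: str) -> List[str]:
    """List paths (relative to upperdir) that look like OFS whiteouts."""
    found = []
    for root, dirs, files in os.walk(upperdir):
        for name in dirs + files:
            path = os.path.join(root, name)
            info = os.lstat(path)  # lstat so symlinks are not followed
            if is_whiteout(info.st_mode, info.st_rdev):
                found.append(os.path.relpath(path, upperdir))
    return sorted(found)


# Example (run against the upper directory from the walkthrough above):
#   find_whiteouts("/tmp/tmpfs/upperdir")
```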

What’s in an image
Now that we understand the tech underlying a Docker image, we can look inside and better understand its contents. The Docker image contains 3 components:

manifest.json: points to the config file and to every layer.
config.json: contains the metadata necessary for running the container, such as the Docker version, environment variables, and mounts. In a saved image this file is named <hash>.json, as in the listing below.
Layers: these are OFS layers as described above, named using the hash of their contents. When merged together, they form the rootfs.

Let’s step through this using Docker to shed some more light on this:

Start by pulling the image and saving it with Docker. `docker save` writes the image to a tar archive.

marli 9:32:50 /tmp () docker pull httpd
marli 9:32:57 /tmp () docker save httpd:latest -o httpd.tar

Extract the tar archive and look around.

marli 9:33:10 /tmp () mkdir httpd && tar -C httpd -xvf httpd.tar && cd httpd
marli 9:33:51 /tmp/httpd () ls
19459a87219415cc5751fb48e9aa37ea580b98eed1fe784e76c4bc3c9b3b0741.json
4354893d890be3cc8574d2a43a153f01d3dbf2c3b6680bb54ad91e56e2103b19
49adc30abd56426f5889a7edbe19c463d6b5c4d0e515b531ef09b33d7839476b
55185aeb1145d787ef29c0c917b0a169737cc479df5c799805b65f22f0be848e
5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff
83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea
manifest.json
repositories

We see the manifest.json, which points to the config, as well as each layer.

marli 9:33:51 /tmp/httpd () jq . manifest.json
[
  {
    "Config": "19459a87219415cc5751fb48e9aa37ea580b98eed1fe784e76c4bc3c9b3b0741.json",
    "RepoTags": [
      "httpd:latest"
    ],
    "Layers": [
      "5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff/layer.tar",
      "49adc30abd56426f5889a7edbe19c463d6b5c4d0e515b531ef09b33d7839476b/layer.tar",
      "55185aeb1145d787ef29c0c917b0a169737cc479df5c799805b65f22f0be848e/layer.tar",
      "4354893d890be3cc8574d2a43a153f01d3dbf2c3b6680bb54ad91e56e2103b19/layer.tar",
      "83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea/layer.tar"
    ]
  }
]
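
The same information can be read programmatically without unpacking the archive. The sketch below uses Python's standard `tarfile` and `json` modules to pull `manifest.json` out of a `docker save` archive and return the config name, tags, and layer order. The helper names `read_manifest` and `describe_image` are hypothetical, and the code assumes the manifest layout shown above with a single image in the archive.

```python
import json
import tarfile


def read_manifest(archive: tarfile.TarFile) -> list:
    """Load manifest.json from an opened `docker save` archive."""
    member = archive.extractfile("manifest.json")
    if member is None:
        raise ValueError("manifest.json is not a regular file")
    return json.load(member)


def describe_image(archive: tarfile.TarFile) -> dict:
    """Summarize the first image listed in the manifest."""
    entry = read_manifest(archive)[0]
    return {
        "config": entry["Config"],
        "tags": entry.get("RepoTags") or [],
        "layers": entry["Layers"],  # ordered from the base layer to the top layer
    }


# Example, using the archive created earlier:
#   with tarfile.open("httpd.tar") as archive:
#       print(describe_image(archive))
```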

I also suggest taking a look at the config.json, but it’s a bit large to include here.
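
For a shorter view of the config, a small script can print only the fields a runtime typically cares about. This is a sketch with assumptions: the field names (`config.Env`, `config.Entrypoint`, `config.Cmd`, `config.WorkingDir`, `config.User`, and `rootfs.diff_ids`) follow the Docker/OCI image config format and are not shown in this post, and `runtime_settings` is a name invented for the example.

```python
import json


def runtime_settings(config_text: str) -> dict:
    """Extract a few run-time fields from the text of an image config file."""
    doc = json.loads(config_text)
    cfg = doc.get("config") or {}
    rootfs = doc.get("rootfs") or {}
    return {
        "env": cfg.get("Env") or [],
        "entrypoint": cfg.get("Entrypoint") or [],
        "cmd": cfg.get("Cmd") or [],
        "working_dir": cfg.get("WorkingDir") or "",
        "user": cfg.get("User") or "",
        "layer_count": len(rootfs.get("diff_ids") or []),
    }


# Example: pass in the contents of the <hash>.json file named by manifest.json.
#   with open("19459a87...0741.json") as f:   # file name abbreviated here
#       print(runtime_settings(f.read()))
```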

Let’s check out the base layer. We see a complete file system.

marli 9:58:08 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () ls
VERSION
json
layer.tar
marli 9:58:11 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () mkdir layer && tar -C layer -xvf layer.tar
marli 9:59:44 /tmp/httpd/5f6bd574c212bf1b00fa21bb12b588712d32bca72866be4061268498d90140ff () ls -la layer
total 0
drwxr-xr-x 21 marli wheel 672 Sep 16 09:59 .
drwxr-xr-x 6 marli wheel 192 Sep 16 09:59 ..
drwxr-xr-x 72 marli wheel 2304 Sep 9 17:00 bin
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 boot
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 dev
drwxr-xr-x 69 marli wheel 2208 Sep 9 17:00 etc
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 home
drwxr-xr-x 7 marli wheel 224 Sep 9 17:00 lib
drwxr-xr-x 3 marli wheel 96 Sep 9 17:00 lib64
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 media
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 mnt
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 opt
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 proc
drwx------ 4 marli wheel 128 Sep 9 17:00 root
drwxr-xr-x 4 marli wheel 128 Sep 9 17:00 run
drwxr-xr-x 66 marli wheel 2112 Sep 9 17:00 sbin
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 srv
drwxr-xr-x 2 marli wheel 64 Aug 30 05:31 sys
drwxr-xr-x 2 marli wheel 64 Sep 9 17:00 tmp
drwxr-xr-x 10 marli wheel 320 Sep 9 17:00 usr
drwxr-xr-x 13 marli wheel 416 Sep 9 17:00 var

Check out the top layer. Look familiar? This is simply an OFS upper directory, as described above.

marli 10:02:02 /tmp/httpd/83be7a564d0c2bad81aca09479229afba3cb114a10cc05a28774e166653e2aea () ls -la layer
total 0
drwxr-xr-x 3 marli wheel 96 Sep 16 10:02 .
drwxr-xr-x 6 marli wheel 192 Sep 16 10:02 ..
drwxr-xr-x 3 marli wheel 96 Sep 9 17:00 usr

Building a rootfs

Docker merges all the layers to create a single rootfs. The merging itself is pretty straightforward. We do it ourselves in Rootfs Builder, which takes the name of a Docker image, pulls the tarball, and extracts it. For every layer, we iterate through each tar header, making two passes. The first pass removes whiteouts; recall that these mark files or directories that were removed in a layer. In the second pass, we read the tar header for metadata about the file or directory, specifically the mode, uid, and gid. If the file doesn’t exist we create it; otherwise we simply replace it.

We also have logic to update the uid and gid. This is necessary if you want to unshare user namespaces. For example, you may want to appear to be root in the container, while outside the container you are an unprivileged user. This requires creating a subuid mapping. The mapping looks something like:

root@21d94d3c4539:/workdir# cat /etc/subuid
fas:100000:65536

This mapping reserves 65536 subordinate uids, starting at 100000, for the user fas. According to this mapping, uid 0 inside the container maps to uid 100000 outside the container.
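
The two-pass merge and the id shift described above can be sketched in Python as follows. This is an illustration, not Rootfs Builder's actual code. It relies on the Docker/OCI layer convention that a deletion is stored in the layer tarball as an empty file prefixed with `.wh.`, with `.wh..wh..opq` marking a directory whose earlier contents are hidden. The function names are hypothetical, the gid is shifted with the same range as the uid for brevity (a real tool would consult /etc/subgid), and the sketch does not guard against member names that escape the rootfs.

```python
import os
import shutil
import tarfile

WHITEOUT_PREFIX = ".wh."        # "etc/.wh.a" means: delete "etc/a"
OPAQUE_MARKER = ".wh..wh..opq"  # "tmp/.wh..wh..opq" means: hide what is already in "tmp"


def parse_subuid(line):
    """Split a line such as 'fas:100000:65536' into (user, start, count)."""
    user, start, count = line.strip().split(":")
    return user, int(start), int(count)


def host_id(container_id, start, count):
    """Translate a uid or gid inside the container to the id used on the host."""
    if not 0 <= container_id < count:
        raise ValueError(f"id {container_id} is outside the mapped range")
    return start + container_id


def _remove(path):
    """Delete a file, symlink, or whole directory tree if it exists."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def apply_layer(layer, rootfs, subuid=None):
    """Apply one opened layer tarball on top of the directory `rootfs`."""
    members = layer.getmembers()

    # Pass 1: honor whiteouts by deleting what earlier layers put in place.
    for member in members:
        parent, base = os.path.split(member.name)
        if base == OPAQUE_MARKER:
            directory = os.path.join(rootfs, parent)
            if os.path.isdir(directory):
                for child in os.listdir(directory):
                    _remove(os.path.join(directory, child))
        elif base.startswith(WHITEOUT_PREFIX):
            _remove(os.path.join(rootfs, parent, base[len(WHITEOUT_PREFIX):]))

    # Pass 2: create each remaining entry, replacing anything already there.
    for member in members:
        if os.path.basename(member.name).startswith(WHITEOUT_PREFIX):
            continue
        dest = os.path.join(rootfs, member.name)
        if os.path.lexists(dest) and not (member.isdir() and os.path.isdir(dest)):
            _remove(dest)
        layer.extract(member, rootfs, numeric_owner=True)
        # Changing ownership needs privileges, so it only happens when running as root.
        if subuid is not None and os.geteuid() == 0:
            _, start, count = subuid
            os.lchown(dest, host_id(member.uid, start, count),
                      host_id(member.gid, start, count))
```

To build a full rootfs, one would call `apply_layer` once per entry in the manifest's layer list, in order from the base layer to the top layer, optionally passing `parse_subuid("fas:100000:65536")` as `subuid`.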

Next Steps
Developers use Docker images every day, and now you know they are just glorified tarballs. There’s plenty of room for improvement in Rootfs Builder. Outstanding features we hope to add will allow the user to specify:

The number of layers to untar.
A layer to omit when untarring.
A binary the user is interested in. Instead of returning an entire rootfs, this will just return the binary.

But for now, hopefully Rootfs Builder will help users inspect Docker images. You can get started with Rootfs Builder here.

Article Tags

containers, docker, ForAllSecure, fuzzing, rootfs

About Marlies Ruck

Marlies Ruck is a software engineer at ForAllSecure.

