Ever Wondered How Docker Build Works Behind the Scenes?

Docker Storage Driver: OverlayFS

TJ. Podobnik, @dorkamotorka
Level Up Coding


Overlay filesystems (also called union filesystems), such as Linux’s OverlayFS, let you stack directories on top of each other and present them as a single layered tree of files and directories. This technique is widely used with containers. In this post, I’ll give a short introduction to where OverlayFS is used, and then I’ll show you how you can use it on the command line.

Container images vary widely in size. Some are quite small, such as Alpine Linux at around 2.5MB, while others are larger, such as Ubuntu 16.04 at around 27MB or the Anaconda Python distribution at 800MB to 1.5GB.

When you start a container from an image, it begins with a clean copy of the image’s filesystem, as if the image had been copied exclusively for that container’s use. However, for larger container images like the 800MB Anaconda distribution, copying the entire image would be inefficient in terms of both disk space and speed. Therefore, Docker doesn’t make direct copies; instead, it uses an overlay.

How Overlays work

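In short, overlay filesystems let you mount a filesystem using two directories: a lower directory and an upper directory. The mount also takes a work directory, an empty directory on the same filesystem as the upper directory that overlayfs uses internally.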

mount -t overlay overlay -o lowerdir=/lower,upperdir=/upper,workdir=/work /merged
  • the lower directory of the filesystem is read-only
  • the upper directory of the filesystem can be both read from and written to
  • the merged directory (the overlay mount point) shows the lower and upper directories combined together

The lower and upper directories are just ordinary directories. In a container setup, the lower directory would typically hold the image’s filesystem, and the upper directory would hold the changes layered on top of it.
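To make this concrete, here is a minimal sketch that builds a scratch overlay under a temporary directory. The directory and file names (lower-only.txt, shared.txt) are made up for the demo. Mounting usually needs root, and can fail in some environments (for example, when the temporary directory itself lives on an overlay), so the script only reports the failure instead of aborting.

```shell
# Scratch directories; every path and file name here is made up for the demo.
base=$(mktemp -d)
mkdir "$base/lower" "$base/upper" "$base/work" "$base/merged"

echo "from lower" > "$base/lower/lower-only.txt"
echo "from lower" > "$base/lower/shared.txt"
echo "from upper" > "$base/upper/shared.txt"

# workdir must be an empty directory on the same filesystem as upperdir.
opts="lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work"

# Try the mount; if it is not permitted, just print the command instead.
if mount -t overlay overlay -o "$opts" "$base/merged" 2>/dev/null; then
    ls "$base/merged"               # lists lower-only.txt and shared.txt
    cat "$base/merged/shared.txt"   # the upper copy shadows the lower one
    umount "$base/merged"
    mounted=yes
else
    echo "could not mount; try as root:"
    echo "mount -t overlay overlay -o $opts $base/merged"
    mounted=no
fi
```

When the mount succeeds, the merged directory lists both files, and reading shared.txt returns the upper directory’s version because the upper layer takes precedence.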

Below are some notes on the changes one might encounter using this setup:

  • When a process reads a file in the merged directory, the overlayfs filesystem driver looks in the upper directory and reads the file from there if it’s present. Otherwise, it looks in the lower directory.
  • When a process writes a file in the merged directory, overlayfs stores the write in the upper directory, and the result shows up in the merged directory. If the file existed only in the lower directory, overlayfs first copies it to the upper directory (a “copy-up”) and then modifies that copy; the lower directory is left untouched.
  • When a process removes a file from the merged directory and that file exists in the lower directory, overlayfs leaves the lower copy alone. Instead, it creates a character device with device number 0/0 under the same name in the upper directory. This marker is called a whiteout; it is how the overlayfs driver records that a file has been deleted, and it hides the lower file from the merged directory.
  • When a process removes a file from the merged directory and that file exists only in the upper directory, overlayfs simply deletes it from the upper directory, and it disappears from the merged directory as well.
  • Writing to or removing files from the lower or upper directories directly while the overlay is mounted is not supported, and the kernel documentation describes the resulting behavior as undefined. All changes should go through the merged directory.
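To see what actually lands in the upper directory after writes and deletes made through the merged directory, here is a small sketch. The file names (config.txt, stale.txt, scratch.txt) are hypothetical, and the mount step again assumes root; if the mount is not permitted, the script only says so.

```shell
# Hypothetical scratch layout; all names are invented for this demo.
base=$(mktemp -d)
mkdir "$base/lower" "$base/upper" "$base/work" "$base/merged"
echo "v1"  > "$base/lower/config.txt"
echo "old" > "$base/lower/stale.txt"

opts="lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work"

if mount -t overlay overlay -o "$opts" "$base/merged" 2>/dev/null; then
    m="$base/merged"

    # Modify a file that exists only in lower: overlayfs copies it up first.
    echo "v2" >> "$m/config.txt"

    # Delete a file that exists in lower: a whiteout appears in upper.
    rm "$m/stale.txt"

    # Create and delete a file that never existed in lower: no whiteout needed.
    echo "tmp" > "$m/scratch.txt"
    rm "$m/scratch.txt"

    umount "$m"
    mounted=yes

    ls -l "$base/upper"             # config.txt plus a character device stale.txt
    cat "$base/lower/config.txt"    # lower is untouched
else
    echo "could not mount; run as root to see copy-up and whiteouts"
    mounted=no
fi
```

After a successful run, the upper directory holds the modified copy of config.txt and a character device named stale.txt, while both original files are still intact in the lower directory.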

Multiple layers

Docker images are often composed of many layers, sometimes 25 or so. Overlayfs supports multiple lower directories, separated by colons and listed from the topmost layer to the bottommost, so you can run

mount -t overlay overlay -o lowerdir=/dir1:/dir2:/dir3:...:/dir25,upperdir=...

So I assume that’s how containers with many Docker layers work: Docker unpacks each layer into a separate directory and then asks overlayfs to combine them all, with an empty upper directory on top that the container writes its changes to.
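Here is a rough sketch of stacking several lower directories. The three layer directories and the who.txt file are invented for illustration, and this is not necessarily how Docker lays out its layers on disk. The leftmost entry in lowerdir is the top layer, so its files shadow same-named files in the layers to its right. As before, the mount needs root and the script only reports a failure.

```shell
# Three made-up "image layers" plus an empty upper directory for changes.
base=$(mktemp -d)
mkdir "$base/layer1" "$base/layer2" "$base/layer3" \
      "$base/upper" "$base/work" "$base/merged"

echo "layer1" > "$base/layer1/who.txt"
echo "layer2" > "$base/layer2/who.txt"
echo "layer3" > "$base/layer3/who.txt"
echo "only in layer1" > "$base/layer1/base.txt"

# Leftmost lowerdir is the top of the stack: layer3 over layer2 over layer1.
lowers="$base/layer3:$base/layer2:$base/layer1"
opts="lowerdir=$lowers,upperdir=$base/upper,workdir=$base/work"

if mount -t overlay overlay -o "$opts" "$base/merged" 2>/dev/null; then
    top=$(cat "$base/merged/who.txt")   # comes from layer3, the top layer
    echo "who.txt comes from: $top"
    cat "$base/merged/base.txt"         # still visible from the bottom layer
    umount "$base/merged"
    mounted=yes
else
    echo "could not mount; try as root:"
    echo "mount -t overlay overlay -o $opts $base/merged"
    mounted=no
fi
```

Files that exist in only one layer, like base.txt here, stay visible in the merged directory no matter how deep in the stack they sit.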

Conclusion

In conclusion, Docker’s use of OverlayFS offers a flexible solution for managing containerized file systems, allowing for efficient layering and storage management. By leveraging OverlayFS, users can efficiently merge multiple directories into a unified file system while optimizing disk space and performance. Understanding how OverlayFS works behind the scenes provides valuable insights into Docker’s storage architecture and enhances containerization workflows for developers and system administrators alike.

To stay current with the latest cloud technologies, make sure to subscribe to my weekly newsletter, Cloud Chirp. 🚀
