11. Container basics
11.1. Motivation and plan
My motivation for using containers has usually been the idea of a “clean room”: I like to quickly stand up a minimal operating system installation to test and install software.
My own development host usually has a ton of extra stuff installed, and I have probably done some custom setup as well. Some of these differences might bleed into the packages I produce, which might suddenly (for example) assume a more modern version of python than what ships with the base system.
There are a few techniques to produce clean room environments. After looking at “chroot environments” and the “mock” program for building RPMs, we will explore the “docker container” approach, which has been quite successful in recent years. We will then look at some examples of things you can do with containers.
A more in-depth example of the use of containers to set up a web site is Section 12.
11.2. Prerequisites
Super user access on the machine you use.
11.3. History and concepts
Many topics in computing become huge quite quickly, and containers are one of those. Here is how some of the ideas came around.
In the late 1970s UNIX acquired a command called chroot, which allowed you to fake where the root of your filesystem is for the current process and its children. Those processes would only have access to that restricted area, and thus would not be able to muck with the rest of the system.
Around the year 2000 the idea was extended to that of jails in the FreeBSD operating system: these sandbox a group of processes to only be aware of each other, and to share a single IP address.
Virtualization was becoming more and more important, so all major purveyors of UNIX-like systems started introducing similar ideas.
In 2006-2008 the Linux kernel added cgroups (control groups) and kernel namespaces: a low level mechanism that allowed the introduction of lightweight virtual systems in Linux.
Soon after, in 2013, a young company called “Docker Inc.” released the docker program, which adds a convenient user layer to the Linux system calls.
After 2013 docker took off remarkably quickly and is widely used to create and manage containers, although it is not the only way of doing so.
Nowadays some alternatives are springing up. Red Hat announced in 2018 a next generation tool called podman that seems to be compatible with Docker. It is still young.
docker is available for Linux as well as proprietary operating systems, and thus can be a way of running a Linux container on a proprietary system.
docker itself is free/open-source software, but some of its extra options are not. Thus it is important to only use features of the “community edition”.
11.4. Starting small: chroot
To give a simple example of a chroot environment and what it does, try the following:
mkdir -p ~/work/chroot_jail/{bin,lib,lib64,etc,dev}
mkdir -p ~/work/chroot_jail/lib/x86_64-linux-gnu
PROGS="bash touch ls rm ps"
for prog in $PROGS
do
cp -v /bin/$prog ~/work/chroot_jail/bin/
done
# now we need some shared libraries for these commands to use
# for example, try "ldd /bin/bash" to see how to find these
for prog in $PROGS
do
echo "======== $prog ========"
shlib_list="$(ldd /bin/$prog | grep -E -o '/lib.*\.[0-9]*')"
echo $shlib_list
cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/ # debian/ubuntu
cp -v $shlib_list ~/work/chroot_jail/lib64/ # fedora/redhat
done
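The two copy loops above can be wrapped into a single helper function. This is a hedged sketch: copy_into_jail is a hypothetical name, and the regular expression for picking library paths out of the ldd output is just a heuristic like the one used above:

```shell
#!/bin/sh
# copy_into_jail (hypothetical helper): copy a program and the shared
# libraries it needs into a jail directory, populating both the
# debian/ubuntu and the fedora/redhat library layouts.
copy_into_jail() {
    jail="$1"
    prog="$2"
    mkdir -p "$jail/bin" "$jail/lib/x86_64-linux-gnu" "$jail/lib64"
    cp "/bin/$prog" "$jail/bin/"
    # pick out the absolute .so paths from the ldd output
    for lib in $(ldd "/bin/$prog" | grep -E -o '/[^ ]*\.so[^ ]*'); do
        cp "$lib" "$jail/lib/x86_64-linux-gnu/"   # debian/ubuntu layout
        cp "$lib" "$jail/lib64/"                  # fedora/redhat layout
    done
}

# example: populate a scratch jail with the ls program
jail_dir="$(mktemp -d)/jail"
copy_into_jail "$jail_dir" ls
ls "$jail_dir/bin"
```

With a helper like this, adding more programs to the PROGS list later becomes a one-line change per program.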
You have now set things up with the very few programs and libraries needed to play around. Examine your work area with:
tree ~/work/chroot_jail
and that is the full system you will get to work with when you run the following commands:
sudo chroot ~/work/chroot_jail /bin/bash
# now we can do things from bash
echo "dude, I'm super user - let me see what is in /etc and /dev"
echo $USER
/bin/ls /etc
/bin/ls /dev
echo "dude, if now I try to do mean and nasty things to the system"
echo "bad stuff" > /dev/bogus_file
/bin/ls -l /dev/bogus_file
Now in another terminal (where you have not run chroot) take a look at /dev/bogus_file and you will see that it’s not there.
In your chroot shell notice that you can’t just type ls or ps – you have to type /bin/ls and /bin/ps. This is because you do not have a PATH environment variable. You can do:
export PATH=/bin
ls
ps
You will see that the ps command does not work as it is: you need the mount command to mount the /proc filesystem. Go ahead and exit the chroot-ed shell and re-run the setup with more programs in the PROGS variable – we will add mount, umount, and mkdir:
# after exiting the chroot-ed bash shell:
PROGS="bash touch ls rm ps mount umount mkdir"
for prog in $PROGS
do
cp -v /bin/$prog ~/work/chroot_jail/bin/
done
# now we need some shared libraries for these commands to use
# for example, try "ldd /bin/bash" to see how to find these
for prog in $PROGS
do
echo "======== $prog ========"
shlib_list="$(ldd /bin/$prog | grep -E -o '/lib.*\.[0-9]*')"
echo $shlib_list
cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/ # debian/ubuntu
cp -v $shlib_list ~/work/chroot_jail/lib64/ # fedora/redhat
done
Now you can re-run your chroot command:
sudo chroot ~/work/chroot_jail /bin/bash
export PATH=/bin
ps
# after the error output we run the mount command they gave us:
mkdir /proc
mount -t proc proc /proc
ps
Now we have been able to see some of the processes on our system. Let us try the super duper loaded ps command to see all processes in the system:
ps -wwaux
What you see here is that all the processes on the system show up. This means that we do not have a truly segmented system.
Warning
The chroot environment does not give full isolation from the running operating system: only from the filesystem. This means that, if the chroot environment had a kill command, you could kill other people’s processes, and do other mean and nasty things. Conclusion: the chroot environment is not a solution for full segmentation, so you cannot use it to run potentially hostile programs.
A final reflection on using chroot for segmentation: you saw how much work was needed to bring in even a handful of programs and their shared libraries. Building a complete, usable system this way would be a great deal of tedious work.
11.5. Using mock to build RPMs and other packages
See tutorials at:
https://blog.packagecloud.io/building-rpm-packages-with-mock/
https://rpm-software-management.github.io/mock/
https://fedoraproject.org/wiki/Using_Mock_to_test_package_builds
All of these need some updates, and need to account for some possible discrepancy in how the .spec file is saved in the .src.rpm file in meson builds. More on this later.
11.6. Simple examples of docker
Note
You need to make sure that docker is set up well on your computer. The docker documentation is surprisingly good for such a complex topic, and you should be able to get it going well. If you are behind some kind of firewall then you will need to find out how to handle that, once again there are good guides to dealing with it.
11.6.1. CentOS7 bare
Do you want to run a minimal CentOS7 system? Just type:
$ docker run -it centos:7
Unable to find image 'centos:7' locally
7: Pulling from library/centos
2d473b07cdd5: Pull complete
Digest: sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987
Status: Downloaded newer image for centos:7
[root@68f16c54e92d /]#
It might have taken a few seconds to download the image and then start it up. Let us exit by typing exit and then run it again:
[root@68f16c54e92d /]# exit
$ docker run -it centos:7
[root@28dd5290c67f /]#
and this time it was instantaneous.
Try running the most basic commands, like:
df -h
du --max-depth 2 /
ls /usr/bin
rpm -qa
rpm -qa | wc -l
# output is 148
and you will see that there are 148 packages installed. Now go to your fully featured CentOS7 host on which you do advanced software development and you’ll see that you can have some 2000 packages (I have 2493 on my main CentOS7 development host as I write this).
- Q:
Wait, I just typed docker run -it centos:7 and it started running a CentOS7 host in no time at all. How does that happen?
- A:
You have hit on what I think has made docker (and probably other container approaches) so useful: optimization of size and startup time. This happens at two levels: (a) the actual size of the CentOS7 operating system distribution, and (b) the collection of techniques used by the docker software to boot this image very quickly. We will talk about this optimization more.
- Q:
How well is this container segmented from the rest of the system?
- A:
Try it out: run the container with docker run -it centos:7 and then run ps -wwaux and you will notice that you only see the processes associated with this container. More detail: the host can see all container processes, but the containers can only see their own. This (and other such touches allowed by the cgroups and kernel namespace features in modern Linux) is much more protection than you get with chroot.
Let us now say we want to use software development tools on our CentOS7 container. By typing:
[root@28dd5290c67f /]# python3
bash: python3: command not found
[root@28dd5290c67f /]# python
Python 2.7.5 (default, Oct 14 2020, 14:45:30)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
we notice the insanity that CentOS7 ships with python2 and not python3 (yes, python2 is well past end of life at the time I write this – 2022-02-09).
To install things off the web we will need to configure proxy environment variables (if we are behind a firewall). This could look something like:
[root@28dd5290c67f /]# export http_proxy=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export https_proxy=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export no_proxy='localhost,mydomain.mytld'
[root@28dd5290c67f /]# export HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export NO_PROXY='localhost,mydomain.mytld'
[...]
Then to install python3 (and for good measure emacs and texlive-latex):
[root@28dd5290c67f /]# yum install python3 emacs texlive-latex
[... lots of time ... maybe best to just install python3!]
[root@28dd5290c67f /]# rpm -qa | wc -l
416
[root@28dd5290c67f /]# python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
So we started from a bare (even more bare than the minimal host install) el7, but we can do any amount of installing and get our system to be as rich in tools as any development host. We see that in installing python3, emacs, and texlive-latex, we went from 148 packages to 416 packages.
11.6.2. Just python
Try this:
$ docker run -it python
Unable to find image 'python:latest' locally
latest: Pulling from library/python
[...]
>>>
The python image is quite big and takes a long time to download. It turns out they have slimmer containers with Python. For example:
$ docker run -it python:3-alpine
Unable to find image 'python:3-alpine' locally
3-alpine: Pulling from library/python
[...]
>>>
The docker web site discusses this image at https://hub.docker.com/_/python and gives some examples of what you can do with it, and how you can adapt it to be a simple appliance that runs your own python program.
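To give a flavor of such an adaptation, here is a hedged sketch of a Dockerfile that turns the python:3-alpine image into a one-program appliance; myscript.py is a hypothetical program of your own, not something from the docker documentation:

```dockerfile
# hedged sketch: a one-program appliance based on python:3-alpine
FROM python:3-alpine

# set a directory for the app
WORKDIR /usr/src/app

# myscript.py is a hypothetical program of your own
COPY myscript.py .

# run the program when the container starts
CMD ["python", "./myscript.py"]
```

You would build and run it with something like docker build -t myname/myscript . followed by docker run myname/myscript.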
11.6.3. Alpine: a really small linux image
$ docker run -it alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
59bf1c3509f3: Pull complete
Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
Status: Downloaded newer image for alpine:latest
/ #
The minimum image for CentOS7 was 203MB. The minimum for Alpine linux is 5.59MB - about 36 times smaller.
Alpine is often a base for specialized containers. You add to it with the apk package command. For example:
apk update # before you can run other things
apk list
apk list | wc
apk add python3 emacs
This installs quite quickly. Alpine has quite a few packages. If you want to find MariaDB you can do:
apk update
apk list | grep -i mariadb
Many specialized containers are based on Alpine, since it is so small.
11.6.4. How optimized are they for size?
Let us talk about how much space these images use up, which relates to how much agility you will have in building and shipping them.
The CentOS7 image is clearly not the 4.4 gigabyte .iso file that you get from the CentOS7 download page: it downloaded in a few seconds, and booted even more quickly. There is a collection of “standard docker images” that are very carefully curated to have a minimal (but automatically expandable) basis for that operating system distribution and tools.
To see how much space is used you can use the command docker images:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
python latest dfce7257b7ba 2 days ago 917MB
python 3-alpine c7100ae3ac4d 5 days ago 48.7MB
busybox latest ec3f0931a6e6 5 days ago 1.24MB
nginx latest c316d5a335a5 2 weeks ago 142MB
alpine latest c059bfaa849c 2 months ago 5.59MB
centos 7 7e6257c9f8d8 18 months ago 203MB
My stream-of-consciousness take on this output is:
Centos7 (203MB) is much smaller than the full 4.4GB installation image, but wow! it’s much bigger than Alpine – so you can really fit a Linux starting point in 5.59MB? Hmm, but once you pack it up with the nginx web server it starts getting there at 142MB. And why on earth is python so massive at 917MB? Ah, there is the much smaller python:3-alpine, which starts from Alpine linux instead of Debian or CentOS, and is thus much smaller.
11.7. Specific pre-existing containers
https://awesome-docker.netlify.app/
https://hub.docker.com/ – then follow the Explore link at the top
You will find:
Ubuntu
Debian
Fedora
Python
nginx
MariaDB
CentOS
PostgreSQL
wordpress
Alpine
These are “official images”. This usually means that the team that produces that product has worked with the docker maintainers to ship images that are truly minimal (but easily expandable).
There are many many others. It is worth going through the list, but you might want to restrict it to “Verified Publisher” and “Official Images”.
11.9. Container orchestration
This is covered in much more detail in the next chapter, but my basic take on container orchestration is:
Why do you need it? Because a big plus with containers is specialization. You can have a container that just runs a web server, and another that runs a database. The web server will need to make database calls and to pass that information back.
This interaction between different containers is called container orchestration. There are a few different ways people have come up with to do this in docker. I have not needed to go beyond the docker-compose approach.
I give an example of orchestration with docker-compose in Section 12.
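To give a small taste here, this is a hedged sketch of what a docker-compose.yml for the web-server-plus-database pairing described above can look like; the service names and the password are illustrative, while the images are the official nginx and mariadb ones:

```yaml
# hypothetical two-service setup: a web server and a database
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"      # expose the web server on host port 8080
    depends_on:
      - db
  db:
    image: mariadb
    environment:
      MARIADB_ROOT_PASSWORD: example   # illustrative only
```

Running docker-compose up in the directory with this file starts both containers on a shared network, where the web service can reach the database at the hostname db.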
11.10. Creating your own container
A key thing to remember about containers is that when you exit they simply wink out of existence. If you had started from a centos:7 image, and then you had added C compilers and other tools, then you will have to start again next time!
This can take a long time, so there is a way of preparing a custom image, starting from a previous image and installing/configuring your software prerequisites.
To do this you create a file called Dockerfile which specifies what base container you start from, and what commands to run to build it up to what we want.
11.10.1. A python program
Prepare your work space with something like this:
mkdir -p ~/work/docker/python-example
cd ~/work/docker/python-example
and in that directory place the following python program into a file called optimize_rosenbrock.py:
#! /usr/bin/env python3

import numpy as np
from scipy.optimize import minimize

def main():
    print('example of optimization')
    x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
    res = minimize(rosen, x0, method='nelder-mead',
                   options={'xatol': 1e-8, 'disp': True})

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

if __name__ == '__main__':
    main()
Make this program executable with chmod +x optimize_rosenbrock.py and then put the following text into a file called Dockerfile:
# I have had some trouble with the python:3-alpine base container, so
# I use python:3-slim
FROM python:3-slim
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
# install dependencies
RUN pip install --no-cache-dir scipy
# run the command
CMD ["python", "./optimize_rosenbrock.py"]
Now you build, view, and run your container with:
docker build -t markgalassi/optimize .
docker images
# (output looks like:)
# REPOSITORY TAG IMAGE ID CREATED SIZE
# markgalassi/optimize latest 38e4688158a8 About a minute ago 1.23GB
docker run markgalassi/optimize
# (output looks like:)
# example of optimization
# Optimization terminated successfully.
# Current function value: 0.000000
# Iterations: 339
# Function evaluations: 571
Note
When you are behind an http proxy you might have to add some stuff to the Dockerfile. The following lines before the first RUN statement should do it:
ENV HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
ENV http_proxy=http://ourproxy.mydomain.mytld:8080
ENV HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
ENV https_proxy=http://ourproxy.mydomain.mytld:8080
11.10.2. A C program
Prepare your work space with something like this:
mkdir -p ~/work/docker/c-example
cd ~/work/docker/c-example
and in that directory place the following C program into a file called hello.c:
#include <stdio.h>

int main()
{
    printf("hello world\n");
    return 0;
}
and then put the following text into a file called Dockerfile:
FROM centos:7
# set a directory for the app
WORKDIR /home/mywork
# copy all the files to the container
COPY . .
# install dependencies
RUN yum install -y gcc
RUN gcc -o hello hello.c
# run the command
CMD ["./hello"]
Now you build, view, and run your container with:
docker build -t markgalassi/hello .
docker images
# (output looks like:)
# REPOSITORY TAG IMAGE ID CREATED SIZE
# markgalassi/hello latest 0429a7c4af32 8 minutes ago 432MB
docker run markgalassi/hello
# (output looks like:)
# hello world
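Much of that 432MB is the gcc toolchain, which the final container does not need at run time. A common way to shrink this kind of image is a multi-stage build; here is a hedged sketch (not tested against this exact example) that compiles in one stage and copies only the binary into a fresh base image:

```dockerfile
# stage 1: build the program with the full toolchain
FROM centos:7 AS builder
WORKDIR /home/mywork
COPY . .
RUN yum install -y gcc
RUN gcc -o hello hello.c

# stage 2: start again from the bare base image and copy in
# only the compiled binary, leaving gcc behind
FROM centos:7
WORKDIR /home/mywork
COPY --from=builder /home/mywork/hello .
CMD ["./hello"]
```

The final image then carries the centos:7 base plus just the hello binary, rather than the whole compiler installation.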
11.11. Thoughts on docker
11.11.1. Licensing
Docker releases its core software under a free/open-source (FOSS) license, calling it “docker-ce” (community edition).
It also distributes some add-on products under a proprietary license.
As with all software matters, I strongly recommend only using the FOSS version: containers are often a crucial piece of infrastructure, and one should not depend on the vagaries of products with a proprietary license.
11.11.2. Other approaches
11.11.2.1. Podman
Seems promising, as discussed here:
https://www.imaginarycloud.com/blog/podman-vs-docker/
https://computingforgeeks.com/using-podman-and-libpod-to-run-docker-containers/
My first attempt at using podman to run a verbatim Dockerfile ran into some hitches on CentOS7. It is also not in the Ubuntu 20.04 repositories.
11.11.2.2. CharlieCloud
https://hpc.github.io/charliecloud/
Written by Los Alamos’s very own Reid Priedhorsky
A few possibly useful citations:
https://www.usenix.org/publications/login/fall2017/priedhorsky
https://dl.acm.org/doi/abs/10.1145/3126908.3126925
https://dl.acm.org/doi/abs/10.1145/3458817.3476187
11.12. Nomenclature
I would love to find better ways of expressing this. It feels counterintuitive.
- image
The thing that you could run.
- container
The thing that is running.
11.13. A grabbag of small tricks
11.13.1. Awareness
Before you do many things you want some awareness of what’s going on:
docker ps # show running images
You will see that a running container has a unique id that looks like 124df05cf518. When you want to do something like kill a container you can just use the first few characters of that id string. You might recognize this shorthand approach from git.
To see what images are available for running you can try:
docker images # show installed images
Individual docker image commands can be applied using the same kind of shortened id.
11.13.2. Cleanup
When you have been running docker for a while you might have a lot of container images, running containers, and who knows what else that is taking up your disk space.
Some tips for cleanup are at:
https://betterprogramming.pub/docker-tips-clean-up-your-local-machine-35f370a01a78
The recipe seems to be:
docker ps # are there any *running* images?
docker system df # how much disk space is docker using
docker container prune # delete stopped containers and reclaim space
docker system df # how did we do?
docker container rm -f $(docker container ls -aq)
This removes all the resources used by containers. As for images:
docker image prune # removes "dangling" images
docker image rm $(docker image ls -q) # removes all images
docker system df # how did we do?
# the following is the strongest non-destructive remove command
docker system prune -a
11.13.3. An end to end example of starting and cleanup
Bring up two terminals. In one type:
docker run nginx
In the other window do:
docker ps
# output should look like:
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# 369ff435d18b nginx "/docker-entrypoint.…" About a minute ago Up About a minute 80/tcp zen_wing
docker images
# REPOSITORY TAG IMAGE ID CREATED SIZE
# nginx latest c316d5a335a5 2 weeks ago 142MB
To kill the running container you can run:
docker kill 369f # the first few chars of the container id
Then you will get an empty output from docker ps. But docker images and docker system df will still show that the images are there.
So now type:
docker system prune -a
docker system df
and you will see that the space has been cleaned up.
11.14. Additional resources
How do you clean up all those huge containers and images? There is a good writeup of this at:
https://medium.com/better-programming/docker-tips-clean-up-your-local-machine-35f370a01a78
and
https://linuxize.com/post/how-to-remove-docker-images-containers-volumes-and-networks/
https://docs.docker.com/engine/reference/commandline/rmi/
https://docs.docker.com/engine/reference/commandline/image_rm/
https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes
And how do you share disk space between the host and the container? At build time you can’t do much because the build has to be very clearly segregated from anything except the “build context” (the directory from which you run “docker build”).
At run time you can use the -v option, which works quite well to map host and container paths. But there is a wealth of other suggestions at this stackoverflow answer:
https://stackoverflow.com/a/39382248/693429
A nice discussion of how to keep your docker images small:
https://opensource.com/article/18/7/building-container-images
Discussions of “who is logged in”:
https://jtreminio.com/blog/running-docker-containers-as-current-host-user/
https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15
An article with a trick to avoid COPY and ADD, in favor of having an ad-hoc web server.
https://medium.com/ncr-edinburgh/docker-tips-tricks-516b9ba41aa2
An article with commands, tips, tricks. Mentions docker-compose
https://medium.com/@clasikas/docker-tips-tricks-or-just-useful-commands-6e1fd8220450