11. Container basics
11.1. Motivation and plan
My motivation for using containers has usually been the idea of a “clean room”: I like to quickly stand up a minimal operating system installation to test and install software.
My own development host usually has a ton of extra stuff installed, and I have probably done some custom setup as well. Some of these differences might bleed into the packages I produce, which might suddenly (for example) assume a more modern version of python than what ships with the base system.
There are a few techniques to produce clean room environments. After looking at “chroot environments” and the “mock” program for building RPMs, we will explore the “docker container” approach, which has been quite successful in recent years. We will then look at some examples of things you can do with containers.
A more in-depth example of the use of containers to set up a web site is Section 12.
11.2. Prerequisites
Super user access on the machine you use.
11.3. History and concepts
Many topics in computing become huge quite quickly, and containers are one of those. Here is how some of the ideas came around.
In the late 1970s UNIX acquired a command called chroot, which allowed you to fake where the root of your filesystem is for the current process and its children. Those processes would only have access to that restricted area, and thus would not be able to muck with the rest of the system.
Around the year 2000 the idea was extended to that of jails in the FreeBSD operating system: these sandbox a group of processes to only be aware of each other, and to share a single IP address.
Virtualization was becoming more and more important, so all major purveyors of UNIX-like systems started introducing similar ideas.
In 2006-2008 the Linux kernel added cgroups (control groups) and kernel namespaces: a low level mechanism that allowed the introduction of lightweight virtual systems in Linux.
Soon after, in 2013, a young company called “Docker Inc.” released the docker program, which adds a convenient user layer to the Linux system calls.
After 2013 docker took off remarkably quickly and is widely used to create and manage containers, although it is not the only way of doing so.
Nowadays some alternatives are springing up. Red Hat announced in 2018 a next generation tool called podman that seems to be compatible with Docker. It is still young.
docker is available for Linux as well as proprietary operating systems, and thus can be a way of running a Linux container on a proprietary system.
docker itself is free/open-source software, but some of its extra options are not. Thus it is important to only use features of the “community edition”.
11.4. Starting small: chroot
To give a simple example of a chroot environment and what it does, try the following:
mkdir -p ~/work/chroot_jail/{bin,lib,lib64,etc,dev}
mkdir -p ~/work/chroot_jail/lib/x86_64-linux-gnu
PROGS="bash touch ls rm ps"
for prog in $PROGS
do
cp -v /bin/$prog ~/work/chroot_jail/bin/
done
# now we need some shared libraries for these commands to use
# for example, try "ldd /bin/bash" to see how to find these
for prog in $PROGS
do
echo "======== $prog ========"
shlib_list="$(ldd /bin/$prog | grep -E -o '/lib.*\.[0-9]*')"
echo $shlib_list
cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/ # debian/ubuntu
cp -v $shlib_list ~/work/chroot_jail/lib64/ # fedora/redhat
done
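The two copy loops above can be wrapped into a single helper function. This is a hedged sketch: copy_into_jail is a hypothetical name, and the regular expression for picking library paths out of the ldd output is just a heuristic like the one used above:

```shell
#!/bin/sh
# copy_into_jail (hypothetical helper): copy a program and the shared
# libraries it needs into a jail directory, populating both the
# debian/ubuntu and the fedora/redhat library layouts.
copy_into_jail() {
    jail="$1"
    prog="$2"
    mkdir -p "$jail/bin" "$jail/lib/x86_64-linux-gnu" "$jail/lib64"
    cp "/bin/$prog" "$jail/bin/"
    # pick out the absolute .so paths from the ldd output
    for lib in $(ldd "/bin/$prog" | grep -E -o '/[^ ]*\.so[^ ]*'); do
        cp "$lib" "$jail/lib/x86_64-linux-gnu/"   # debian/ubuntu layout
        cp "$lib" "$jail/lib64/"                  # fedora/redhat layout
    done
}

# example: populate a scratch jail with the ls program
jail_dir="$(mktemp -d)/jail"
copy_into_jail "$jail_dir" ls
ls "$jail_dir/bin"
```

With a helper like this, adding more programs to the PROGS list later becomes a one-line change per program.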
You have now set things up with the very few programs and libraries needed to play around. Examine your work area with:
tree ~/work/chroot_jail
and that is the full system you will get to work with when you run the following commands:
sudo chroot ~/work/chroot_jail /bin/bash
# now we can do things from bash
echo "dude, I'm super user - let me see what is in /etc and /dev"
echo $USER
/bin/ls /etc
/bin/ls /dev
echo "dude, if now I try to do mean and nasty things to the system"
echo "bad stuff" > /dev/bogus_file
/bin/ls -l /dev/bogus_file
Now in another terminal (where you have not run chroot) take a look at /dev/bogus_file and you will see that it’s not there.
In your chroot shell notice that you can’t just type ls or ps – you have to type /bin/ls and /bin/ps. This is because you do not have a PATH environment variable. You can do:
export PATH=/bin
ls
ps
You will see that the ps command does not work as it is: you need the mount command to mount the /proc filesystem. Go ahead and exit the chroot-ed shell and re-run the setup with more programs in the PROGS variable – we will add mount, umount, and mkdir:
# after exiting the chroot-ed bash shell:
PROGS="bash touch ls rm ps mount umount mkdir"
for prog in $PROGS
do
cp -v /bin/$prog ~/work/chroot_jail/bin/
done
# now we need some shared libraries for these commands to use
# for example, try "ldd /bin/bash" to see how to find these
for prog in $PROGS
do
echo "======== $prog ========"
shlib_list="$(ldd /bin/$prog | grep -E -o '/lib.*\.[0-9]*')"
echo $shlib_list
cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/ # debian/ubuntu
cp -v $shlib_list ~/work/chroot_jail/lib64/ # fedora/redhat
done
Now you can re-run your chroot command:
sudo chroot ~/work/chroot_jail /bin/bash
export PATH=/bin
ps
# after the error output we run the mount command they gave us:
mkdir /proc
mount -t proc proc /proc
ps
Now we have been able to see some of the processes on our system. Let us try the super duper loaded ps command to see all processes in the system:
ps -wwaux
What you see here is that all the processes on the system show up. This means that we do not have a truly segmented system.
Warning
The chroot environment does not give full isolation from the running operating system: only from the filesystem. This means that, if the chroot environment had a kill command, you could kill other people’s processes, and do other mean and nasty things. Conclusion: the chroot environment is not a solution for full segmentation, so you cannot use it to run potentially hostile programs.
A final reflection on using chroot for segmentation: you saw how much work was needed to bring in even a handful of programs and their shared libraries. Building a complete, usable system this way would be a great deal of tedious work.
11.5. Using mock to build RPMs and other packages
See tutorials at:
https://blog.packagecloud.io/building-rpm-packages-with-mock/
https://rpm-software-management.github.io/mock/
https://fedoraproject.org/wiki/Using_Mock_to_test_package_builds
All of these need some updates, and need to account for some possible discrepancy in how the .spec file is saved in the .src.rpm file in meson builds. More on this later.
11.6. Simple examples of docker
Note
You need to make sure that docker is set up well on your computer. The docker documentation is surprisingly good for such a complex topic, and you should be able to get it going well. If you are behind some kind of firewall then you will need to find out how to handle that, once again there are good guides to dealing with it.
11.6.1. CentOS7 bare
Do you want to run a minimal CentOS7 system? Just type:
$ docker run -it centos:7
Unable to find image 'centos:7' locally
7: Pulling from library/centos
2d473b07cdd5: Pull complete
Digest: sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987
Status: Downloaded newer image for centos:7
[root@68f16c54e92d /]#
It might have taken a few seconds to download the image and then start it up. Let us exit by typing exit and then run it again:
[root@68f16c54e92d /]# exit
$ docker run -it centos:7
[root@28dd5290c67f /]#
and this time it was instantaneous.
Try running the most basic commands, like:
df -h
du --max-depth 2 /
ls /usr/bin
rpm -qa
rpm -qa | wc -l
# output is 148
and you will see that there are 148 packages installed. Now go to your fully featured CentOS7 host on which you do advanced software development and you’ll see that you can have some 2000 packages (I have 2493 on my main CentOS7 development host as I write this).
- Q:
Wait, I just typed docker run -it centos:7 and it started running a CentOS7 host in no time at all. How does that happen?
- A:
You have hit on what I think has made docker (and probably other container approaches) so useful: optimization of size and startup time. This happens at two levels: (a) the actual size of the CentOS7 operating system distribution, and (b) the collection of techniques used by the docker software to boot this image very quickly. We will talk about this optimization more.
- Q:
How well is this container segmented from the rest of the system?
- A:
Try it out: run the container with docker run -it centos:7 and then run ps -wwaux and you will notice that you only see the processes associated with this container. More detail: the host can see all container processes, but the containers can only see their own. This (and other such touches allowed by the cgroups and kernel namespace features in modern Linux) is much more protection than you get with chroot.
Let us now say we want to use software development tools on our CentOS7 container. By typing:
[root@28dd5290c67f /]# python3
bash: python3: command not found
[root@28dd5290c67f /]# python
Python 2.7.5 (default, Oct 14 2020, 14:45:30)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
we notice the insanity that CentOS7 ships with python2 and not python3 (yes, python2 is well past end of life at the time I write this – 2022-02-09).
To install things off the web we will need to configure proxy environment variables (if we are behind a firewall). This could look something like:
[root@28dd5290c67f /]# export http_proxy=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export https_proxy=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export no_proxy='localhost,mydomain.mytld'
[root@28dd5290c67f /]# export HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
[root@28dd5290c67f /]# export NO_PROXY='localhost,mydomain.mytld'
[...]
Then to install python3 (and for good measure emacs and texlive-latex):
[root@28dd5290c67f /]# yum install python3 emacs texlive-latex
[... lots of time ... maybe best to just install python3!]
[root@28dd5290c67f /]# rpm -qa | wc -l
416
[root@28dd5290c67f /]# python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
So we started from a bare (even more bare than the minimal host install) el7, but we can do any amount of installing and get our system to be as rich in tools as any development host. We see that in installing python3, emacs, and texlive-latex, we went from 148 packages to 416 packages.
11.6.2. Just python
Try this:
$ docker run -it python
Unable to find image 'python:latest' locally
latest: Pulling from library/python
[...]
>>>
The python image is quite big and takes a long time to download. It turns out they have slimmer containers with Python. For example:
$ docker run -it python:3-alpine
Unable to find image 'python:3-alpine' locally
3-alpine: Pulling from library/python
[...]
>>>
The docker web site discusses this image at https://hub.docker.com/_/python and gives some examples of what you can do with it, and how you can adapt it to be a simple appliance that runs your own python program.
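To give a flavor of such an adaptation, here is a hedged sketch of a Dockerfile that turns the python:3-alpine image into a one-program appliance; myscript.py is a hypothetical program of your own, not something from the docker documentation:

```dockerfile
# hedged sketch: a one-program appliance based on python:3-alpine
FROM python:3-alpine

# set a directory for the app
WORKDIR /usr/src/app

# myscript.py is a hypothetical program of your own
COPY myscript.py .

# run the program when the container starts
CMD ["python", "./myscript.py"]
```

You would build and run it with something like docker build -t myname/myscript . followed by docker run myname/myscript.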
11.6.3. Alpine: a really small linux image
$ docker run -it alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
59bf1c3509f3: Pull complete
Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
Status: Downloaded newer image for alpine:latest
/ #
The minimum image for CentOS7 was 203MB. The minimum for Alpine linux is 5.59MB - about 36 times smaller.
Alpine is often a base for specialized containers. You add to it with the apk package command. For example:
apk update # before you can run other things
apk list
apk list | wc
apk add python3 emacs
This installs quite quickly. Alpine has quite a few packages. If you want to find MariaDB you can do:
apk update
apk list | grep -i mariadb
Many specialized containers are based on Alpine, since it is so small.
11.6.4. How optimized are they for size?
Let us talk about how much space these images use up, which relates to how much agility you will have in building and shipping them.
The CentOS7 image is clearly not the 4.4 gigabyte .iso file that you get from the CentOS7 download page: it downloaded in a few seconds, and booted even more quickly. There is a collection of “standard docker images” that are very carefully curated to have a minimal (but automatically expandable) basis for that operating system distribution and tools.
To see how much space is used you can use the command docker images:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
python latest dfce7257b7ba 2 days ago 917MB
python 3-alpine c7100ae3ac4d 5 days ago 48.7MB
busybox latest ec3f0931a6e6 5 days ago 1.24MB
nginx latest c316d5a335a5 2 weeks ago 142MB
alpine latest c059bfaa849c 2 months ago 5.59MB
centos 7 7e6257c9f8d8 18 months ago 203MB
My stream-of-consciousness take on this output is:
Centos7 (203MB) is much smaller than the full 4.4GB installation image, but wow! it’s much bigger than Alpine – so you can really fit a Linux starting point in 5.59MB? Hmm, but once you pack it up with the nginx web server it starts getting there at 142MB. And why on earth is python so massive at 917MB? Ah, there is the much smaller python:3-alpine, which starts from Alpine linux instead of Debian or CentOS, and is thus much smaller.
11.7. Specific pre-existing containers
https://awesome-docker.netlify.app/
https://hub.docker.com/ – then follow the Explore link at the top
You will find:
Ubuntu
Debian
Fedora
Python
nginx
MariaDB
CentOS
PostgreSQL
wordpress
Alpine
These are “official images”. This usually means that the team that produces that product has worked with the docker maintainers to ship images that are truly minimal (but easily expandable).
There are many many others. It is worth going through the list, but you might want to restrict it to “Verified Publisher” and “Official Images”.
11.9. Container orchestration
This is covered in much more detail in the next chapter, but my basic take on container orchestration is:
Why do you need it? Because a big plus with containers is specialization. You can have a container that just runs a web server, and another that runs a database. The web server will need to make database calls and to pass that information back.
This interaction between different containers is called container orchestration. There are a few different ways people have come up with to do this in docker. I have not needed to go beyond the docker-compose approach.
I give an example of orchestration with docker-compose in Section 12.
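To give a small taste here, this is a hedged sketch of what a docker-compose.yml for the web-server-plus-database pairing described above can look like; the service names and the password are illustrative, while the images are the official nginx and mariadb ones:

```yaml
# hypothetical two-service setup: a web server and a database
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"      # expose the web server on host port 8080
    depends_on:
      - db
  db:
    image: mariadb
    environment:
      MARIADB_ROOT_PASSWORD: example   # illustrative only
```

Running docker-compose up in the directory with this file starts both containers on a shared network, where the web service can reach the database at the hostname db.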
11.10. Creating your own container
A key thing to remember about containers is that when you exit they simply wink out of existence. If you had started from a centos:7 image, and then you had added C compilers and other tools, then you will have to start again next time!
This can take a long time, so there is a way of preparing a custom image, starting from a previous image and installing/configuring your software prerequisites.
To do this you create a file called Dockerfile which specifies what base container you start from, and what commands to run to build it up to what we want.
11.10.1. A python program
Prepare your work space with something like this:
mkdir -p ~/work/docker/python-example
cd ~/work/docker/python-example
and in that directory place the following python program into a file called optimize_rosenbrock.py:
#! /usr/bin/env python3

import numpy as np
from scipy.optimize import minimize

def main():
    print('example of optimization')
    x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
    res = minimize(rosen, x0, method='nelder-mead',
                   options={'xatol': 1e-8, 'disp': True})

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

if __name__ == '__main__':
    main()
Make this program executable with chmod +x optimize_rosenbrock.py and then put the following text into a file called Dockerfile:
# I have had some trouble with the python:3-alpine base container, so
# I use python:3-slim
FROM python:3-slim
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
# install dependencies
RUN pip install --no-cache-dir scipy
# run the command
CMD ["python", "./optimize_rosenbrock.py"]
Now you build, view, and run your container with:
docker build -t markgalassi/optimize .
docker images
# (output looks like:)
# REPOSITORY TAG IMAGE ID CREATED SIZE
# markgalassi/optimize latest 38e4688158a8 About a minute ago 1.23GB
docker run markgalassi/optimize
# (output looks like:)
# example of optimization
# Optimization terminated successfully.
# Current function value: 0.000000
# Iterations: 339
# Function evaluations: 571
Note
When you are behind an http proxy you might have to add some stuff to the Dockerfile. The following lines before the first RUN statement should do it:
ENV HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
ENV http_proxy=http://ourproxy.mydomain.mytld:8080
ENV HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
ENV https_proxy=http://ourproxy.mydomain.mytld:8080
11.10.2. A C program
Prepare your work space with something like this:
mkdir -p ~/work/docker/c-example
cd ~/work/docker/c-example
and in that directory place the following C program into a file called hello.c:
#include <stdio.h>

int main()
{
    printf("hello world\n");
    return 0;
}
and then put the following text into a file called Dockerfile:
FROM centos:7
# set a directory for the app
WORKDIR /home/mywork
# copy all the files to the container
COPY . .
# install dependencies
RUN yum install -y gcc
RUN gcc -o hello hello.c
# run the command
CMD ["./hello"]
Now you build, view, and run your container with:
docker build -t markgalassi/hello .
docker images
# (output looks like:)
# REPOSITORY TAG IMAGE ID CREATED SIZE
# markgalassi/hello latest 0429a7c4af32 8 minutes ago 432MB
docker run markgalassi/hello
# (output looks like:)
# hello world
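Much of that 432MB is the gcc toolchain, which the final container does not need at run time. A common way to shrink this kind of image is a multi-stage build; here is a hedged sketch (not tested against this exact example) that compiles in one stage and copies only the binary into a fresh base image:

```dockerfile
# stage 1: build the program with the full toolchain
FROM centos:7 AS builder
WORKDIR /home/mywork
COPY . .
RUN yum install -y gcc
RUN gcc -o hello hello.c

# stage 2: start again from the bare base image and copy in
# only the compiled binary, leaving gcc behind
FROM centos:7
WORKDIR /home/mywork
COPY --from=builder /home/mywork/hello .
CMD ["./hello"]
```

The final image then carries the centos:7 base plus just the hello binary, rather than the whole compiler installation.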
11.11. Thoughts on docker
11.11.1. Licensing
Docker releases its core software under a free/open-source (FOSS) license, calling it “docker-ce” (community edition).
It also distributes some add-on products under a proprietary license.
As with all software matters, I strongly recommend only using the FOSS version: containers are often a crucial piece of infrastructure, and one should not depend on the vagaries of products with a proprietary license.
11.11.2. Other approaches
11.11.2.1. Podman
Seems promising, as discussed here:
https://www.imaginarycloud.com/blog/podman-vs-docker/
https://computingforgeeks.com/using-podman-and-libpod-to-run-docker-containers/
My first attempt at using podman to run a verbatim Dockerfile ran into some hitches on CentOS7. It is also not in the Ubuntu 20.04 repositories.
11.11.2.2. CharlieCloud
https://hpc.github.io/charliecloud/
Written by Los Alamos’s very own Reid Priedhorsky
A few possibly useful citations:
https://www.usenix.org/publications/login/fall2017/priedhorsky
https://dl.acm.org/doi/abs/10.1145/3126908.3126925
https://dl.acm.org/doi/abs/10.1145/3458817.3476187
11.12. Nomenclature
I would love to find better ways of expressing this. It feels counterintuitive.
- image
The thing that you could run.
- container
The thing that is running.
11.13. A grabbag of small tricks
11.13.1. Awareness
Before you do many things you want some awareness of what’s going on:
docker ps # show running images
You will see that a running container has a unique id that looks like 124df05cf518. When you want to do something like kill a container you can just use the first few characters of that id string. You might recognize this shorthand approach from git.
To see what images are available for running you can try:
docker images # show installed images
Individual docker image commands can be applied using the same kind of shortened id.
11.13.2. Cleanup
When you have been running docker for a while you might have a lot of container images, running containers, and who knows what else that is taking up your disk space.
Some tips for cleanup are at:
https://betterprogramming.pub/docker-tips-clean-up-your-local-machine-35f370a01a78
The recipe seems to be:
docker ps # are there any *running* images?
docker system df # how much disk space is docker using
docker container prune # delete stopped containers and reclaim space
docker system df # how did we do?
docker container rm -f $(docker container ls -aq)
This removes all the resources used by containers. As for images:
docker image prune # removes "dangling" images
docker image rm $(docker image ls -q) # removes all images
docker system df # how did we do?
# the following is the strongest non-destructive remove command
docker system prune -a
11.13.3. An end to end example of starting and cleanup
Bring up two terminals. In one type:
docker run nginx
In the other window do:
docker ps
# output should look like:
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# 369ff435d18b nginx "/docker-entrypoint.…" About a minute ago Up About a minute 80/tcp zen_wing
docker images
# REPOSITORY TAG IMAGE ID CREATED SIZE
# nginx latest c316d5a335a5 2 weeks ago 142MB
To kill the running container you can run:
docker kill 369f # the first few chars of the container id
Then you will get an empty output from docker ps. But docker images and docker system df will still show that the images are there.
So now type:
docker system prune -a
docker system df
and you will see that the space has been cleaned up.
11.14. Additional resources
How do you clean up all those huge containers and images? There is a good writeup of this at:
https://medium.com/better-programming/docker-tips-clean-up-your-local-machine-35f370a01a78
and
https://linuxize.com/post/how-to-remove-docker-images-containers-volumes-and-networks/
https://docs.docker.com/engine/reference/commandline/rmi/
https://docs.docker.com/engine/reference/commandline/image_rm/
https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes
And how do you share disk space between the host and the container? At build time you can’t do much because the build has to be very clearly segregated from anything except the “build context” (the directory from which you run “docker build”).
At run time you can use the -v option, which works quite well to map host and container paths. But there is a wealth of other suggestions at this stackoverflow answer:
https://stackoverflow.com/a/39382248/693429
A nice discussion of how to keep your docker images small:
https://opensource.com/article/18/7/building-container-images
Discussions of “who is logged in”:
https://jtreminio.com/blog/running-docker-containers-as-current-host-user/
https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15
An article with a trick to avoid COPY and ADD, in favor of having an ad-hoc web server.
https://medium.com/ncr-edinburgh/docker-tips-tricks-516b9ba41aa2
An article with commands, tips, tricks. Mentions docker-compose
https://medium.com/@clasikas/docker-tips-tricks-or-just-useful-commands-6e1fd8220450