.. _sec-container-basics:

================
Container basics
================

Motivation and plan
===================

My motivation for using containers has usually been the idea of a
"clean room": I like to quickly stand up a *minimal* operating system
installation to test and install software.  My own development host
usually has a ton of extra stuff installed, and I have probably done
some custom setup as well.  This means that some of these differences
might *bleed* into the packages I produce, which might suddenly (for
example) assume a more modern version of python than what ships with
the base system.

There are a few techniques to produce clean room environments.  After
looking at "chroot environments" and the "mock" program for building
RPMs, we will explore the "docker container" approach, which has been
quite successful in recent years.  We will then look at some examples
of things you can do with containers.  A more in-depth example of the
use of containers to set up a web site is in
:numref:`sec-services-with-containers`.

Prerequisites
=============

Super user access on the machine you use.

History and concepts
====================

Many topics in computing become huge quite quickly, and containers are
one of those.  Here is how some of the ideas came about.

In the late 1970s UNIX acquired a command called **chroot**, which
allowed you to fake where the root of your filesystem is for the
current process and its children.  That process and its children would
then only have access to that restricted area, and thus would not be
able to muck with the rest of the system.

Around the year 2000 the idea was extended to that of **jails** in the
FreeBSD operating system: these sandbox a group of processes to only
be aware of each other, and to share a single IP address.
Virtualization was becoming more and more important, so all major
purveyors of UNIX-like systems started introducing similar ideas.
In 2006-2008 the Linux kernel added **cgroups** (control groups) and
**kernel namespaces**: low-level mechanisms that allowed the
introduction of lightweight virtual systems in Linux.

Soon after, in 2013, a young company called "Docker Inc." released the
**docker** program, which adds a convenient user layer on top of those
Linux system calls.  After 2013 docker took off remarkably quickly and
is widely used to create and manage containers, although it is not the
only way of doing so.

Nowadays some alternatives are springing up.  Red Hat announced in
2018 a next-generation tool called **podman** that seems to be
compatible with docker.  It is still young.

docker is available for Linux as well as for proprietary operating
systems, and thus can be a way of running a Linux container on a
proprietary system.  docker itself is free/open-source software, but
some of its extra options are not.  Thus it is important to only use
features of the "community edition".

Starting small: chroot
======================

To give a simple example of a chroot environment and what it does, try
the following:

::

   mkdir -p ~/work/chroot_jail/{bin,lib,lib64,etc,dev}
   mkdir -p ~/work/chroot_jail/lib/x86_64-linux-gnu
   PROGS="bash touch ls rm ps"
   for prog in $PROGS
   do
       cp -v /bin/$prog ~/work/chroot_jail/bin/
   done
   # now we need some shared libraries for these commands to use
   # for example, try "ldd /bin/bash" to see how to find these
   for prog in $PROGS
   do
       echo "======== $prog ========"
       shlib_list="$(ldd /bin/$prog | egrep -o '/lib.*\.[0-9]*')"
       echo $shlib_list
       cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/  # debian/ubuntu
       cp -v $shlib_list ~/work/chroot_jail/lib64/                 # fedora/redhat
   done

You have now set things up by putting in place the very few programs
and libraries that are needed to play around.
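If you are curious how that ``egrep`` incantation pulls the library
paths out of the ``ldd`` output, here is a small self-contained check.
The sample ``ldd`` output below is illustrative (the libraries and
addresses will differ on your machine):

::

   # sample ldd output, as a shell variable so we can experiment on it
   sample='  linux-vdso.so.1 (0x00007ffd8a9f2000)
     libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f2a1c000000)
     /lib64/ld-linux-x86-64.so.2 (0x00007f2a1c400000)'
   # the regex keeps only the absolute /lib... paths, which is
   # exactly what we feed to cp in the loop above
   echo "$sample" | egrep -o '/lib.*\.[0-9]*'

The vdso line contains no ``/lib`` path, so nothing is printed for it;
only the two real shared-library paths survive the filter.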
Examine your work area with:

::

   tree ~/work/chroot_jail

and that is the full system you will get to work with when you run the
following commands:

::

   sudo chroot ~/work/chroot_jail /bin/bash
   # now we can do things from bash
   echo "dude, I'm super user - let me see what is in /etc and /dev"
   echo $USER
   /bin/ls /etc
   /bin/ls /dev
   echo "dude, if now I try to do mean and nasty things to the system"
   echo "bad stuff" > /dev/bogus_file
   /bin/ls -l /dev/bogus_file

Now in another terminal (where you have not run chroot) take a look
for /dev/bogus_file and you will see that it is not there.

In your chroot shell notice that you cannot just type ``ls`` or ``ps``
-- you have to type ``/bin/ls`` and ``/bin/ps``.  This is because you
do not have a ``PATH`` environment variable.  You can fix that with:

::

   export PATH=/bin
   ls
   ps

You will see that the ``ps`` command does not work as it is: you need
the ``mount`` command to mount the ``/proc`` filesystem.  Go ahead and
exit the chroot-ed shell and re-run the setup with more programs in
the ``PROGS`` variable -- we will add mount, umount, and mkdir:

::

   # after exiting the chroot-ed bash shell:
   PROGS="bash touch ls rm ps mount umount mkdir"
   for prog in $PROGS
   do
       cp -v /bin/$prog ~/work/chroot_jail/bin/
   done
   # now we need some shared libraries for these commands to use
   # for example, try "ldd /bin/bash" to see how to find these
   for prog in $PROGS
   do
       echo "======== $prog ========"
       shlib_list="$(ldd /bin/$prog | egrep -o '/lib.*\.[0-9]*')"
       echo $shlib_list
       cp -v $shlib_list ~/work/chroot_jail/lib/x86_64-linux-gnu/  # debian/ubuntu
       cp -v $shlib_list ~/work/chroot_jail/lib64/                 # fedora/redhat
   done

Now you can re-run your chroot command:

::

   sudo chroot ~/work/chroot_jail /bin/bash
   export PATH=/bin
   ps
   # after the error output we run the mount command suggested
   # by the error message:
   mkdir /proc
   mount -t proc proc /proc
   ps

Now we have been able to see some of the processes on our system.
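The reason mounting ``/proc`` brings ``ps`` to life is that ``ps``
does no magic: it simply reads the ``/proc`` virtual filesystem.  You
can see this for yourself on the host (outside the jail), where /proc
is already mounted:

::

   # each numbered directory under /proc corresponds to one running
   # process; ps works by walking these directories
   ls /proc | grep '^[0-9]' | head -5
   # information about the current shell process itself:
   head -2 /proc/self/status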
Let us try the super-duper loaded ps command to see all the processes
on the system:

::

   ps -wwaux

What you see here is that *all* the processes on the system show up.
This means that we do not have a truly segmented system.

.. warning::

   The **chroot** environment does not give full isolation from the
   running operating system: only from the filesystem.  This means
   that, if the chroot environment had a ``kill`` command, you could
   kill random people's processes.  And do other mean and nasty
   things.

Conclusion: the chroot environment is not the solution for full
segmentation, such that you could run potentially hostile programs.

A final reflection on using chroot for segmentation: you saw that
there was a lot of work to be done to bring in even a handful of
programs and their shared libraries.

Using mock to build RPMs and other packages
===========================================

See tutorials at:

https://blog.packagecloud.io/building-rpm-packages-with-mock/

https://rpm-software-management.github.io/mock/

https://fedoraproject.org/wiki/Using_Mock_to_test_package_builds

All of these need some updates, and need to account for some possible
discrepancy in how the .spec file is saved in the .src.rpm file in
meson builds.  More on this later.

Simple examples of docker
=========================

.. note::

   You need to make sure that docker is set up well on your computer.
   The docker documentation is surprisingly good for such a complex
   topic, and you should be able to get it going well.  If you are
   behind some kind of firewall then you will need to find out how to
   handle that; once again there are good guides to dealing with it.

CentOS7 bare
------------

Do you want to run a minimal CentOS7 system?
Just type:

::

   $ docker run -it centos:7
   Unable to find image 'centos:7' locally
   7: Pulling from library/centos
   2d473b07cdd5: Pull complete
   Digest: sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987
   Status: Downloaded newer image for centos:7
   [root@68f16c54e92d /]#

It might have taken a few seconds to download the image and then start
it up.  Let us exit by typing ``exit`` and then run it again:

::

   [root@68f16c54e92d /]# exit
   $ docker run -it centos:7
   [root@28dd5290c67f /]#

and this time it was instantaneous.

Try running the most basic commands, like:

::

   df -h
   du --max-depth 2 /
   ls /usr/bin
   rpm -qa
   rpm -qa | wc -l     # output is 148

and you will see that there are 148 packages installed.  Now go to
your fully featured CentOS7 host on which you do advanced software
development and you will see that you can have some 2000 packages (I
have 2493 on my main CentOS7 development host as I write this).

Q: Wait, I just typed this ``docker run -it centos:7`` and it started
running a CentOS7 host in no time at all.  How does that happen?

A: You have hit on what I think has made docker (and probably other
container approaches) so useful: *optimization of size and startup
time.*  This happens at two levels: (a) the actual size of the CentOS7
operating system distribution, and (b) the collection of techniques
used by the docker software to boot this image very quickly.  We will
talk about this optimization more.

Q: How well is this container segmented from the rest of the system?

A: Try it out: run the container with ``docker run -it centos:7`` and
then run ``ps -wwaux`` and you will notice that you only see the
processes associated with this container.  More detail: the host can
see all container processes, but the containers can only see their
own.  This (and other such touches allowed by the cgroups and kernel
namespace features in modern Linux) is much more protection than you
get with chroot.
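You can peek at the kernel namespace machinery that makes this
segmentation possible.  Every Linux process carries a set of namespace
ids, visible under ``/proc``, and a process inside a docker container
carries *different* ids than the host.  A quick sketch (the numeric
ids will differ on your machine):

::

   # the namespaces the current shell belongs to:
   ls /proc/self/ns
   # the pid namespace id of this shell; run the same command inside
   # "docker run -it centos:7" and you will see a different id, which
   # is why the container only sees its own processes
   readlink /proc/self/ns/pid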
Let us now say we want to use software development tools on our
CentOS7 container.  By typing:

::

   [root@28dd5290c67f /]# python3
   bash: python3: command not found
   [root@28dd5290c67f /]# python
   Python 2.7.5 (default, Oct 14 2020, 14:45:30)
   [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
   Type "help", "copyright", "credits" or "license" for more information.
   >>>

we notice the insanity that CentOS7 ships with python2 and not python3
(yes, python2 is well past its end of life at the time I write this --
2022-02-09).

To install things off the web we will need to configure proxy
environment variables (if we are behind a firewall).  This could look
something like:

::

   [root@28dd5290c67f /]# export http_proxy=http://ourproxy.mydomain.mytld:8080
   [root@28dd5290c67f /]# export https_proxy=http://ourproxy.mydomain.mytld:8080
   [root@28dd5290c67f /]# export no_proxy='localhost,mydomain.mytld'
   [root@28dd5290c67f /]# export HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
   [root@28dd5290c67f /]# export HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
   [root@28dd5290c67f /]# export NO_PROXY='localhost,mydomain.mytld'
   [...]

Then to install python3 (and for good measure emacs and texlive-latex):

::

   [root@28dd5290c67f /]# yum install python3 emacs texlive-latex
   [... lots of time ... maybe best to just install python3!]
   [root@28dd5290c67f /]# rpm -qa | wc -l
   416
   [root@28dd5290c67f /]# python3
   Python 3.6.8 (default, Nov 16 2020, 16:55:22)
   [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
   Type "help", "copyright", "credits" or "license" for more information.

So we started from a bare (even more bare than the minimal host
install) el7, but we can do any amount of installing and get our
system to be as rich in tools as any development host.  We see that in
installing python3, emacs, and texlive-latex, we went from 148
packages to 416 packages.
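Note that everything installed this way evaporates when the container
goes away.  As a preview of the custom-image mechanism covered later
in this chapter, here is a sketch of how these manual yum steps could
be baked into a reusable image.  This Dockerfile is an illustration
using the package names from above, not a file that ships with this
book:

::

   # Dockerfile sketch: start from the same stock image we have been
   # running by hand, and bake in the tools we installed interactively
   FROM centos:7
   RUN yum install -y python3 emacs texlive-latex

Building this once with ``docker build`` would give you an image that
already contains python3 every time you start it.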
Just python
-----------

Try this:

::

   $ docker run -it python
   Unable to find image 'python:latest' locally
   latest: Pulling from library/python
   [...]
   >>>

the python image seemed big, and it took a while to download.  It
turns out they have slimmer containers with Python.  For example:

::

   $ docker run -it python:3-alpine
   Unable to find image 'python:3-alpine' locally
   3-alpine: Pulling from library/python
   [...]
   >>>

The docker web site discusses this image at
https://hub.docker.com/_/python and gives some examples of what you
can do with it, and how you can adapt it to be a simple appliance that
runs your own python program.

Alpine: a really small linux image
----------------------------------

::

   $ docker run -it alpine
   Unable to find image 'alpine:latest' locally
   latest: Pulling from library/alpine
   59bf1c3509f3: Pull complete
   Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
   Status: Downloaded newer image for alpine:latest
   / #

The minimum image for CentOS7 was 203MB.  The minimum for Alpine linux
is 5.59MB -- about 36 times smaller.

Alpine is often a base for specialized containers.  You add to it with
the ``apk`` package command.  For example:

::

   apk update          # before you can run other things
   apk list
   apk list | wc
   apk add python3 emacs

This installs quite quickly.  Alpine has quite a few packages.  If you
want to find MariaDB you can do:

::

   apk update
   apk list | grep -i mariadb

Many specialized containers are based on Alpine, since it is so small.

How optimized are they for size?
--------------------------------

Let us talk about how much space these images use up, which relates to
how much agility you will have in building and shipping them.  The
CentOS7 image is clearly not the 4.4 gigabyte ``.iso`` file that you
get from the CentOS7 download page: it downloaded in a few seconds,
and booted even more quickly.
There is a collection of "standard docker images" that are very
carefully curated to have a minimal (but automatically expandable)
basis for that operating system distribution and tools.

To see how much space is used you can use the command ``docker
images``:

::

   $ docker images
   REPOSITORY   TAG        IMAGE ID       CREATED         SIZE
   python       latest     dfce7257b7ba   2 days ago      917MB
   python       3-alpine   c7100ae3ac4d   5 days ago      48.7MB
   busybox      latest     ec3f0931a6e6   5 days ago      1.24MB
   nginx        latest     c316d5a335a5   2 weeks ago     142MB
   alpine       latest     c059bfaa849c   2 months ago    5.59MB
   centos       7          7e6257c9f8d8   18 months ago   203MB

My stream-of-consciousness take on this output is: CentOS7 (203MB) is
much smaller than the full 4.4GB installation image, but wow! it is
much bigger than Alpine -- so you can really fit a Linux starting
point in 5.59MB?  Hmm, but once you pack it up with the nginx web
server it starts getting up there at 142MB.  And why on earth is
python so massive at 917MB?  Ah, there is the much smaller
python:3-alpine, which starts from Alpine linux instead of Debian or
CentOS, and is thus much smaller.

Specific pre-existing containers
================================

https://awesome-docker.netlify.app/

https://hub.docker.com/ -- then follow the Explore link at the top

You will find:

.. container:: twocol

   .. container:: leftside

      * Ubuntu
      * Debian
      * Fedora
      * Python
      * nginx

   .. container:: rightside

      * MariaDB
      * CentOS
      * PostgreSQL
      * wordpress
      * Alpine

These are "official images".  This usually means that the team that
produces that product has worked with the docker maintainers to ship
images that are truly minimal (but easily expandable).

There are many, many others.  It is worth going through the list, but
you might want to restrict it to "Verified Publisher" and "Official
Images".
Sharing files between host and container
========================================

If you want the container to access your files, or if you want to pick
up files that were created by the container, you need some machinery
we have not yet discussed.

There are various methods for doing this, with different levels of
complexity.  I personally have not needed to go beyond the use of the
``-v`` option to docker.  For example:

::

   docker run -it -v /home/$USER:/home_$USER centos:7
   [root@06de48420246 /]# ls /home_markgalassi/
   # and this listing shows all the files in /home/markgalassi

The typical way I have used the ``-v`` option is to stage things by
putting the files I need in some fresh directory in /tmp, and then to
mount that directory with the ``-v`` option.

Container orchestration
=======================

This is covered in much more detail in the next chapter, but my basic
take on container orchestration is: why do you need it?  Because a big
plus with containers is specialization.  You can have a container that
just runs a web server, and another that runs a database.  The web
server will need to make database calls and to pass that information
back.  This interaction between different containers is called
*container orchestration*.

There are a few different ways people have come up with to do this in
docker.  I have not needed to go beyond the ``docker-compose``
approach.  I give an example of orchestration with docker-compose in
:numref:`sec-services-with-containers`.

Creating your own container
===========================

A key thing to remember about containers is that when you exit they
simply wink out of existence.  If you had started from a centos:7
image, and then added C compilers and other tools, you would have to
start again next time!  This can take a long time, so there is a way
of preparing a *custom image*: starting from a previous image and
installing/configuring your software prerequisites.
To do this you create a file called ``Dockerfile`` which specifies
what base container you start from, and what commands to run to build
it up to what we want.

A python program
----------------

Prepare your work space with something like this:

::

   mkdir -p ~/work/docker/python-example
   cd ~/work/docker/python-example

and in that directory place the following python program into a file
called optimize_rosenbrock.py:

.. literalinclude:: optimize_rosenbrock.py
   :language: python
   :caption: optimize_rosenbrock.py -- A brief scipy demonstration
             program from the scipy documentation.

Make this program executable with ``chmod +x optimize_rosenbrock.py``
and then put the following text into a file called Dockerfile:

.. literalinclude:: Dockerfile-python-example
   :language: bash
   :caption: Dockerfile -- A Dockerfile to build a container that just
             has our trivial python program.

Now you build, view, and run your container with:

::

   docker build -t markgalassi/optimize .
   docker images
   # (output looks like:)
   # REPOSITORY             TAG       IMAGE ID       CREATED              SIZE
   # markgalassi/optimize   latest    38e4688158a8   About a minute ago   1.23GB
   docker run markgalassi/optimize
   # (output looks like:)
   # example of optimization
   # Optimization terminated successfully.
   #          Current function value: 0.000000
   #          Iterations: 339
   #          Function evaluations: 571

.. note::

   When you are behind an http proxy you might have to add some stuff
   to the Dockerfile.  The following lines before the first RUN
   statement should do it:

   .. code-block:: bash

      ENV HTTP_PROXY=http://ourproxy.mydomain.mytld:8080
      ENV http_proxy=http://ourproxy.mydomain.mytld:8080
      ENV HTTPS_PROXY=http://ourproxy.mydomain.mytld:8080
      ENV https_proxy=http://ourproxy.mydomain.mytld:8080

A C program
-----------

Prepare your work space with something like this:

::

   mkdir -p ~/work/docker/c-example
   cd ~/work/docker/c-example

and in that directory place the following C program into a file called
hello.c:

.. literalinclude:: hello.c
   :language: c
   :caption: hello.c -- A trivial C program.

and then put the following text into a file called Dockerfile:

.. literalinclude:: Dockerfile-c-example
   :language: bash
   :caption: Dockerfile -- A Dockerfile to build a container that just
             builds our trivial C program at build time, and runs it
             at run time.

Now you build, view, and run your container with:

::

   docker build -t markgalassi/hello .
   docker images
   # (output looks like:)
   # REPOSITORY          TAG       IMAGE ID       CREATED         SIZE
   # markgalassi/hello   latest    0429a7c4af32   8 minutes ago   432MB
   docker run markgalassi/hello
   # (output looks like:)
   # hello world

Thoughts on docker
==================

Licensing
---------

Docker releases its core software under a free/open-source (FOSS)
license, calling it "docker-ce" (community edition).  It also
distributes some add-on products under a proprietary license.  As with
all software matters, I strongly recommend only using the FOSS
version: containers are often a crucial piece of infrastructure, and
one should not depend on the vagaries of products with a proprietary
license.

Other approaches
----------------

Podman
++++++

Podman seems promising, as discussed here:

https://www.imaginarycloud.com/blog/podman-vs-docker/

https://computingforgeeks.com/using-podman-and-libpod-to-run-docker-containers/

My first attempt at using podman to run a verbatim Dockerfile ran into
some hitches on CentOS7.  It is also not in the ubuntu 20.04
repositories.

CharlieCloud
++++++++++++

https://hpc.github.io/charliecloud/

Written by Los Alamos's very own Reid Priedhorsky.

A few possibly useful citations:

https://www.usenix.org/publications/login/fall2017/priedhorsky

https://dl.acm.org/doi/abs/10.1145/3126908.3126925

https://dl.acm.org/doi/abs/10.1145/3458817.3476187

Nomenclature
============

I would love to find better ways of expressing this.  It feels
counterintuitive.

image
   The thing that you *could* run.

container
   The thing that *is* running.
A grabbag of small tricks
=========================

Awareness
---------

Before you do many things you want some awareness of what is going on:

::

   docker ps          # show running containers

You will see that a *running* container has a unique id that looks
like ``124df05cf518``.  When you want to do something like kill a
running container you can just use the first few characters of that id
string.  You might recognize this shorthand approach from git.

To see what images are *available* for running you can try:

::

   docker images      # show installed images

Individual ``docker image`` commands can be applied to the same kind
of id.

Cleanup
-------

When you have been running docker for a while you might have a lot of
container images, running containers, and who knows what else that is
taking up your disk space.  Some tips for cleanup are at:
https://betterprogramming.pub/docker-tips-clean-up-your-local-machine-35f370a01a78

The recipe seems to be:

::

   docker ps               # are there any *running* containers?
   docker system df        # how much disk space is docker using?
   docker container prune  # delete stopped containers and reclaim space
   docker system df        # how did we do?
   docker container rm -f $(docker container ls -aq)

This removes all the resources used by *containers*.  As for images:

::

   docker image prune                      # removes "dangling" images
   docker image rm $(docker image ls -q)   # removes all images
   docker system df                        # how did we do?
   # the following is the strongest non-destructive remove command
   docker system prune -a

An end to end example of starting and cleanup
---------------------------------------------

Bring up two terminals.
In one type:

::

   docker run nginx

In the other window do:

::

   docker ps
   # output should look like:
   # CONTAINER ID   IMAGE     COMMAND                  CREATED              STATUS              PORTS     NAMES
   # 369ff435d18b   nginx     "/docker-entrypoint.…"   About a minute ago   Up About a minute   80/tcp    zen_wing
   docker images
   # REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
   # nginx        latest    c316d5a335a5   2 weeks ago   142MB

To kill the running container you can run:

::

   docker kill 369f     # the first few chars of the container id

Then you will get an empty output on ``docker ps``.  But ``docker
images`` and ``docker system df`` will still show that the images are
there.  So now type:

::

   docker system prune -a
   docker system df

and you will see that the space has been cleaned up.

Additional resources
====================

How do you clean up all those huge containers and images?  There is a
good writeup of this at:

https://medium.com/better-programming/docker-tips-clean-up-your-local-machine-35f370a01a78

and

https://linuxize.com/post/how-to-remove-docker-images-containers-volumes-and-networks/

https://docs.docker.com/engine/reference/commandline/rmi/

https://docs.docker.com/engine/reference/commandline/image_rm/

https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes

And how do you share disk space between the host and the container?
At build time you cannot do much, because the build has to be very
clearly segregated from anything except the "build context" (the
directory from which you run "docker build").  At run time you can use
the -v option, which works quite well to map host and container paths.
But there is a wealth of other suggestions at this stackoverflow
answer:

https://stackoverflow.com/a/39382248/693429

A nice discussion of how to keep your docker images small:

https://opensource.com/article/18/7/building-container-images

Discussions of "who is logged in":

https://jtreminio.com/blog/running-docker-containers-as-current-host-user/

https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15

An article with a trick to avoid COPY and ADD, in favor of having an
ad-hoc web server:

https://medium.com/ncr-edinburgh/docker-tips-tricks-516b9ba41aa2

An article with commands, tips, and tricks; mentions docker-compose:

https://medium.com/@clasikas/docker-tips-tricks-or-just-useful-commands-6e1fd8220450