Containers: A "quick" tour
March 25, 2018

I recently moved to elementary OS Juno and, aside from the usual bugs of an unstable release, I expected to be able to do most of what I was doing before. I had just published my first application in elementary’s AppCenter, so the first thing I wanted to do in my clean system was to install it and check that everything was working fine. But to my surprise, there were no elementary apps in AppCenter! What I hadn’t realized was that Houston (AppCenter’s backend) also needs some preparation for Juno, which means rebuilding every application for the new environment based on Ubuntu Bionic. This new repository is not yet ready, as is to be expected in an unstable release. While I was thinking about how to test my application, I saw Spotify running inside a Docker container. That made me think this was a good opportunity to learn about containers and see if I could use them as a test environment for older Linux distributions, especially elementary OS Loki.

This blog post was originally going to be a GitHub repository with the scripts necessary to create a system image and install it. Sadly, the more I learned about containers, the more I realized it’s not really about the commands you use, but about the capabilities of each type of container and the relationship between the guest system, the host system and the application we try to run. In the end, I decided the actual scripts were not that interesting. Instead, an overview of the kinds of containers I tried seemed more useful: for me in the future, and maybe for other people trying to understand how containers work.

This means I won’t focus on a short description of the commands required to get a container up and running (there are already several short tutorials for that); instead I will try to explain my journey through different kinds of containers and how they relate to each other. That being said, if you want to try this out, you need the image of a system. This means having a folder with all the files of a typical installation except for a few special ones (more on these later). Before starting, I should warn you that tinkering with these things may confuse your operating system in unexpected ways, so I don’t recommend running this on your main Linux installation. I tested with two installations, one of elementary OS Loki and another of elementary OS Juno. I created the images as a copy of the host, by running the following in a clean Linux installation:

$ sudo rsync -avP --exclude=/proc/* \
 --exclude=/sys/* \
 --exclude=/dev/* \
 --exclude=/run/* \
 --exclude=/tmp/* \
 --exclude=/media/* \
 --exclude=/home/*/* \
 / ~/my_container

$ cp -r /etc/skel/. ~/my_container/home/$(whoami)/

These commands copy almost everything from the host, including network configuration, users, groups, passwords and installed packages, and create a default home directory. Another way of getting an image is debootstrap, but then you get a minimal system that still has to be configured. The main topic here is containers, so I chose to create an image that should cause the least amount of problems. I know it sounds pointless to use the host system as the guest system, but believe me, we will have enough fun as it is.
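If you do want to go the debootstrap route instead, a minimal invocation could look like the one below. This is only a sketch: the suite name and mirror are assumptions, based on Juno being built on Ubuntu Bionic.

$ sudo debootstrap bionic ~/my_container http://archive.ubuntu.com/ubuntu/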

My objective was to create a container from which I could run a graphical application with internet connectivity and sound: firefox. This would show that most of what a desktop application uses from its operating system is available inside the container.

Now, let’s talk about containers. My initial understanding of containers was “something like a virtual machine but faster, smaller and easier to create”. From my experience with virtual machines, communication with hardware from the guest system has always been a pain to set up. But desktop applications often try to communicate with the hardware, and my first impression of systems like Docker was that they favor security by creating overly isolated containers. While reading about this, chroot came up as a simpler alternative, closer to what I was looking for.

Chroot

While looking for simpler, less isolated containers, I remembered the installation procedure for Arch Linux. After booting a minimal system from a USB, one had to create another minimal root filesystem where the final installation would live, then use chroot to change the current root to the one just created. Next, the rest of the packages of the desired system were installed using pacman, Arch’s package manager. Because the root had been changed, these packages would get installed into the new minimal system. Finally, the installation configured the bootloader to point to this new root, and after a reboot the new system would be used. The cool thing to note about chroot is the simplicity of what it does: change the root directory, nothing more.

The environment we change into is called a chroot jail (ever heard of jailbreaking? Guess what kind of jail you are breaking out of). Technically, chroot is a system call and has been part of Unix kernels pretty much since the beginning. Because in Unix everything is a file, we can get an idea of how the container will behave. Everything being a file means that communication between the kernel and user space happens through the filesystem, using special files that represent hardware devices or kernel interfaces. By making these files available inside the new root, applications will be able to communicate with the kernel even after changing the root directory.

On a simple system we want to keep /proc, /sys and /dev. In broad terms these directories contain, respectively: the running processes, interfaces into the kernel, and files representing hardware. The content of these directories is created by the kernel and can’t be copied like normal files; instead we have to bind-mount them inside our new root filesystem. Doing this makes directories in our new root point to the original files in the real root.

$ sudo mount --bind /proc ~/my_container/proc
$ sudo mount --bind /sys ~/my_container/sys
$ sudo mount --rbind /dev ~/my_container/dev

You can check that everything went well by running mount | grep my_container. You will notice we actually mounted more than 3 filesystems. This happens because the --rbind option creates a recursive bind, binding not only /dev but also any other mount that was inside /dev. To unmount everything once we are done, the simplest thing is to reboot. You can try sudo umount with the -R option for recursive unmounts, but this usually fails for me. After these filesystems are mounted, we can change into our new root with:

$ sudo chroot ~/my_container

Inside the chroot jail we can now run commands to see how the environment was set up. For example, calling id shows we are logged in as root, and checking env shows we have a different environment from the one we had before. These are symptoms of the first drawback of using chroot: it keeps the user and environment that were active when it was called. That sounds like what we want, until we realize that our user is not the one calling chroot: we are using sudo to call it as root, and sudo creates a special, sanitized environment. To log in as another user we can use chroot’s --userspec option, and to keep our environment we can use sudo’s -E, but we still have to pass $PATH explicitly because sudo overrides it. In the end we have to get out of the session using exit and then get back in with:

$ sudo -E "PATH=$PATH" chroot --userspec=$(whoami) ~/my_container
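
Once back inside the jail, a quick check should confirm we kept our user; this is only a rough sketch of what you might see, with {username} standing in for your user:

$ id -un
 > {username}

Running echo $PATH should likewise print the same value as on the host.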

We can now run things inside our container; for example, we can install packages as we normally would in a terminal.

$ sudo apt-get install cowsay

Why do we care so much about having the same environment as the one in the host system? Isn’t the point of creating a test environment for desktop applications to have a different system? The problem is that the host operating system is not just a set of files, but also the state of each process that runs in the background as a daemon: the init system, the dbus server, the X11 server, the display manager and so on. Right now we are carrying all these processes over from the host system into the guest system by binding /proc. The problem with these daemons is that there can only be one instance of each on a computer; they can have multiple clients, but we can’t have multiple servers.
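
You can see this coupling from inside the jail: because /proc is bind-mounted, the process list you get is the host’s. A quick (hypothetical) way to check, where the exact daemon names will vary by system:

$ ps -e | grep -E 'systemd|dbus|Xorg'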

On top of that, daemons also talk to the kernel and to other processes via sockets or some other IPC mechanism. This means the three directories we mounted are not enough to provide all the services a process inside the chroot may need; for instance, a more complex graphical application like AppCenter (which, by the way, has its own daemon) will most likely fail to run.

I will go into more detail about specific services that are relevant for desktop applications and how to make them available inside the container. But before that, let’s introduce schroot to ease the process of setting up the container.

Schroot

Schroot is a command that lets us modify the container’s setup quickly, so we can experiment with different configurations without having to mount and unmount everything every time.

First, install schroot with sudo apt-get install schroot. Next, we need to tell schroot which filesystems to mount; for this it uses the same syntax as /etc/fstab. To set up the same container as before, create /etc/schroot/my_container/fstab as root with the following content:

# file system   mount point    type     options        dump    pass
/proc           /proc          none     rw,bind        0       0
/sys            /sys           none     rw,bind        0       0
/dev            /dev           none     rw,rbind       0       0

For the rest of the container’s configuration, schroot reads /etc/schroot/chroot.d/my_container.conf. Its content is shown below; you need to replace every instance of {username} with your username on the host machine.

[my_container]
description=simple container
type=directory
directory=/home/{username}/my_container
profile=desktop
groups={username}
root-users={username}
setup.fstab=my_container/fstab
preserve-environment=true

Even though we will be able to iterate more quickly over our container’s configuration, we are trading off some of the simplicity of chroot. Schroot does more things for us, and we need to be aware of what they are. For instance, we don’t need to be root: editing my_container.conf as root grants permission to launch the container to the users listed in the root-users property. Other things schroot does that we have to take into account are copying certain files from the host into the container (listed in a copyfiles file) and copying NSS databases such as users and groups (listed in an nssdatabases file).

The default copyfiles and nssdatabases files are located in /etc/schroot/default. In there you can also find a default fstab, which will not be used because we configured schroot to use our own version instead; this was done with the setup.fstab property in the configuration file. If you want to change the behavior of the other two files you can use the setup.copyfiles and setup.nssdatabases properties.
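
As a sketch, pointing the configuration at your own versions would look like the lines below, assuming you create the corresponding files under /etc/schroot/my_container:

setup.copyfiles=my_container/copyfiles
setup.nssdatabases=my_container/nssdatabases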

On systems using systemd, this my_container.conf may actually fail because one of the files copied by the default copyfiles is a symlink. If that happens, create a new, empty copyfiles inside /etc/schroot/my_container and point my_container.conf at it (using setup.copyfiles as shown above) so that nothing gets copied. All that being said, we are now ready to launch our container with:

$ schroot -c my_container

Back to our discussion about specific services provided by the host: we can now look at what else is needed to get a graphical application running inside our container. Because I’m using elementary OS, I will first test with a simple GUI application, the calculator. When I try to run io.elementary.calculator in the container, the application runs fine, but I get a bunch of errors:

[dconf] unable to create directory '/run/user/1000/dconf': Permission denied. dconf will not work properly.

This error is related to dconf, the runtime configuration daemon used by GNOME. It stores settings like themes, fonts or the state of your app when it was closed, and tells interested apps when something changes. It seems the application is trying to communicate through the /run filesystem, and we didn’t mount that. To fix this, let’s add the following line to our fstab file:

...
/run            /run            none    rw,rbind         0       0

Now calling io.elementary.calculator works without throwing weird errors! The problem with doing this is that a lot of daemons also communicate through this filesystem, especially through the directory /run/user/1000, which is owned by your user and can be written to by any application. Mounting /run couples some daemons we want, like dconf, but also some we don’t want. To see this, run poweroff inside the container and watch your host machine shut down (you don’t even need to be root for this!). After you reboot, if you call mount on the host you will see that schroot keeps everything mounted; it’s better to end the session we never closed. Use --list to find the name of the session, and then close it:

$ schroot --list
  > chroot:my_container     session:{session_name}
$ schroot -c session:{session_name} --end-session

The troubling part is that mounting /run also couples other daemons that use dbus, including notifications, Gala and several others. It’s now easy to see that testing applications gets problematic, because we will be mixing daemons from the host system with clients from the guest system. And having the correct versions of daemons and clients is not enough; things also depend on what we did before to get a session very similar to the host’s, by copying usernames, network configuration and environment variables. But hey, firefox runs inside this container! Still, something like AppCenter may cause problems and reset Wingpanel on your host machine.

This shows that such a lack of isolation is not that good for testing desktop applications. What we would really like is to launch as much as we can inside the container and only make some daemons from the host (also called services) available inside it. The init system (systemd in the case of elementary OS and almost every modern Linux distribution) is the one in charge of running all these services. While looking for ways to ask systemd to create a similar session inside our container, I found systemd-nspawn. This container system was developed specifically for creating test environments; it is closer to chroot than other container solutions, with fewer configuration options but still powerful enough to really isolate the guest system safely.

Nspawn

Nspawn is really simple to use, but it again increases the complexity of what it does compared to chroot or schroot. It creates a clean chroot environment and calls an init system with a real PID 1, using Linux namespaces. This process, in turn, tries to run all the services configured with systemd in the guest system.

The main difference is that we will try not to explicitly bind any filesystems from the host into the guest and instead let nspawn do its magic: we will launch our container and cross our fingers hoping that nspawn mounts what we need, and that the init scripts of the guest system do what we want and avoid what we don’t want. If you have ever tried to debug the startup procedure of your machine, you will know that the number of things that can fail is rather depressing. But let’s try it out; run nspawn with:

$ sudo systemd-nspawn -bD ~/my_container

As you can see, this command has to be run as root, just like chroot. The -b option tells it to launch the init system instead of just calling a shell, and the -D option tells it where our image is. To get out of the container, use poweroff; this time it will not shut down your host but instead run the shutdown sequence of the processes started by systemd inside the guest. The --bind option can be used to manually tell nspawn to bind something inside the container.
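
For instance, a hypothetical bind of a shared directory (the path is just an example) would look like this:

$ sudo systemd-nspawn -bD ~/my_container --bind=/home/{username}/shared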

As I said in the beginning, one of the key objectives was to correctly configure the network inside the container, which means having internet access and DNS resolution. These can be checked using ping 8.8.8.8 and nslookup www.google.com, as shown below. When testing this I ran into different problems depending on the system I was trying things on. I will explain what I did, but you may get different results if you are running different systems. The main thing you should know is that in Linux the file /etc/resolv.conf is supposed to contain a list of DNS servers; no servers means no DNS resolution.
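
From inside the container the two checks look like this:

$ ping -c 3 8.8.8.8
$ nslookup www.google.com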

When I tried a Loki system inside a Loki system, DNS resolution was not working. Following Arch’s guide on setting up systemd-resolved inside the container was enough to fix it:

$ sudo systemctl enable --now systemd-networkd systemd-resolved
$ sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

This works because Loki uses resolvconf to generate /etc/resolv.conf, and inside the container it does not find any server and generates an empty file. systemd-resolved, on the other hand, generates a fallback file that includes Google’s DNS servers 8.8.8.8 and 8.8.4.4.
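
You can confirm this by looking at the generated file from inside the container; this is only a rough sketch of the output, since the exact fallback list depends on how systemd was built:

$ cat /etc/resolv.conf
 > nameserver 8.8.8.8
 > nameserver 8.8.4.4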

When I tried a Juno system inside Juno, bad things happened: running the container broke DNS resolution in the host system, let alone the guest. After hours of messing with this, I found that the correct solution was to (ironically) disable systemd-resolved inside the container:

$ sudo systemctl disable systemd-resolved

This works because Juno uses systemd-resolved for DNS resolution, which provides a stub DNS server at 127.0.0.53 and sets it as the only DNS server in /etc/resolv.conf. The stub also ends up configured in the guest system, because nspawn mounts the host’s /etc/resolv.conf whenever it detects systemd-resolved running on the host. What I think happens then is that two servers try to serve the stub at 127.0.0.53, one in the host and another in the guest, which ends up breaking DNS resolution everywhere. Disabling systemd-resolved inside the container leaves only one server, and the mounted /etc/resolv.conf with 127.0.0.53 as the only DNS server works correctly. So, are we done now? Let’s try running firefox as before:

$ firefox
 > Error: no DISPLAY environment variable specified

Well, things are never as easy as one expects. This happens because the environment inside the container is the result of all the scripts run by systemd, and it so happens that the all-important DISPLAY variable, used by every graphical application that talks to X11, was not set. To be able to establish a connection with X11, we must set the display number as follows:

$ DISPLAY=:0 firefox

Success! firefox now runs inside the container (at least in mine). If this does not work for you, there may be several reasons. Some systems don’t use :0 as their display; in that case you should check the output of echo $DISPLAY on the host system. If it still does not work, I’ve seen people suggest binding the file ~/.Xauthority, as not doing so caused a No protocol specified error for them. Other people recommend passing --bind-ro=/tmp/.X11-unix so that nspawn mounts X11’s socket, which should be read-only or else the systemd in the guest will delete it on the host system. I didn’t need any of these, but I’ve read it may depend on which display manager the host uses. Maybe some of this stuff got moved into dbus? I don’t know.
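
If you do need those suggestions, a hypothetical invocation combining them could look like this (the .Xauthority path is an assumption and depends on your setup):

$ sudo systemd-nspawn -bD ~/my_container \
       --bind-ro=/tmp/.X11-unix \
       --bind=/home/{username}/.Xauthority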

Sadly, the success does not last too long: you may notice that firefox inside the container has no sound. This happens because there is no audio server (PulseAudio in elementary OS) inside the container, which is systemd working for us so that we don’t get multiple servers and cause chaos like the multiple DNS servers from before. So, how do we talk to the host’s audio server from inside the container? The socket used to communicate with it is located in /run/user/1000/pulse; yes, that dreadful filesystem we didn’t want to mount, and the whole reason we started using nspawn. If you run mount inside the container you will notice there is already a /run/user/1000 filesystem being created, but it does not contain the pulse directory. Because we don’t want to mess with nspawn’s mount, we’d better mount the pulse directory somewhere else and then tell firefox where to find it.

$ sudo systemd-nspawn -bD ~/my_container \
       --bind=/run/user/1000/pulse:/run/user/pulse
... [boot sequence] ...

(my_container)$ DISPLAY=:0 PULSE_SERVER=/run/user/pulse/native firefox

Final remarks

At last we got firefox running inside the container! (At least I did, did you?) Maybe all the problems I had can give you some insight into the complexity that hides behind containers. The main problem is the coupling between the host and the guest systems. In some cases we want them to interact, as with networking, audio and video, but in other cases we don’t, as with daemons like the display manager, the init system, the DNS stub server or a server storing global configuration. Because modern operating systems are composed of so many different parts, and services can interact in so many different ways, running something inside a container will always depend on the application we are trying to run, the guest system and the host system. Regarding the application, we have to know which services it needs and which services it can modify. For the host and the guest systems, we should know which services they provide and how they interact with each other. This becomes very difficult for full systems that run hundreds of services.

There is really no silver bullet that will allow any application to run flawlessly in any guest system inside any host; that would require an unreasonable number of variables to be taken into account. From what I can see, popular container systems try to limit these variables to some extent. Some focus on a certain kind of application, like pbuilder (build systems) or Docker (web services), and some focus on a specific kind of guest system, like nspawn, which is specific to Linux. In the end, we will never get rid of mixing daemons from the host system with other daemons or clients in the (probably different) guest system, and any ABI breakage in any of these daemons will cause problems when running things inside the container.

There are still a lot of container types I haven’t talked about. Even though I have tried Docker, I think it will be very similar to nspawn with respect to my use case. Where Docker shines is in deploying and releasing services that run on servers, which is something I don’t need here. Nevertheless, I will try to get an image of Loki running inside Juno using both Docker and nspawn; results will come later in another blog post. Aside from these, there are still other options like Snaps or LXC, which I have yet to try out.

For the time being I will keep testing on my spare computer. I haven’t found a reliable container solution that guarantees I won’t spend a lot of time debugging the container itself. Using a spare computer gives me confidence that all the daemons and clients being tested are the correct versions, that nothing weird happens to any service during the startup procedure, and that I can even debug the daemons themselves (which has been necessary several times). Containers are a very good option as long as you require few services from the host; so far I have yet to find something that works as well as I want for testing desktop applications.