Dual-Purpose File System
LXCFS, a file system that runs in userspace, provides the /proc file system that the LXC container sees; this makes the container look like a full system so that LXD can provide that type of system-level container, explains Dustin Kirkland, Strategist, Container Offerings, Canonical. Userspace is the portion of system memory in which user processes, applications, and certain drivers run, as opposed to the memory reserved for the kernel. Canonical provides /sys, /dev, and /proc so that LXD can work its system-level container mojo.
Canonical limits the container to a subset of /proc. “If you have 64 CPUs on the host, but you only want to give this container access to four of those CPUs, then you need to make sure that that container can only see four of those CPUs. That’s what LXCFS provides. It’s a virtual file system inside of the container that provides the necessary portion of /proc,” explains Kirkland.
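The idea Kirkland describes can be sketched in a few lines. The Python below is a simplified illustration, not LXCFS's implementation (LXCFS is a C program that serves these files through FUSE): it takes a host's /proc/cpuinfo text and a cpuset-style CPU list such as "0-3", and returns a view that shows only those CPUs, renumbered from 0. The function names and the synthetic 64-CPU host data are hypothetical.

```python
from __future__ import annotations


def parse_cpu_list(spec: str) -> list[int]:
    """Parse a kernel CPU list such as "0-3,8,10-11" into sorted CPU ids."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def filter_cpuinfo(host_cpuinfo: str, allowed: list[int]) -> str:
    """Return a /proc/cpuinfo-style view containing only the allowed CPUs.

    Blocks are separated by blank lines; kept blocks are renumbered from 0 so the
    container sees processors 0..N-1, as it would on a smaller machine.
    """
    allowed_set = set(allowed)
    kept: list[str] = []
    for block in host_cpuinfo.strip().split("\n\n"):
        lines = block.splitlines()
        # Locate the "processor : N" line that identifies this block.
        idx = next(
            (i for i, line in enumerate(lines)
             if line.split(":", 1)[0].strip() == "processor"),
            None,
        )
        if idx is None:
            continue  # not a per-CPU block
        cpu_id = int(lines[idx].split(":", 1)[1])
        if cpu_id in allowed_set:
            lines[idx] = f"processor\t: {len(kept)}"
            kept.append("\n".join(lines))
    return "\n\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Synthetic 64-CPU host; real cpuinfo blocks carry many more fields.
    host = "\n\n".join(
        f"processor\t: {i}\nmodel name\t: Example CPU" for i in range(64)
    ) + "\n"
    view = filter_cpuinfo(host, parse_cpu_list("0-3"))
    print(view.count("processor"))  # 4
```

Renumbering matters because software inside the container typically counts processor entries to decide how many worker threads to spawn; a view that listed host CPU ids 8 and 10 would still read as two CPUs, but with numbering that looks nothing like a real small machine.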
Next, and closely related to giving LXD something that looks like a machine inside the container, LXCFS also provides a cgroupfs-compatible file system for CGManager. “We need that, and systemd, the current-generation init system, depends on having one of these cgroup file systems to manipulate, to do the things that are necessary to boot,” says Kirkland.
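What systemd manipulates is the cgroup hierarchy. As a small illustration of that interface, the kernel reports each process's cgroup membership in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path (see the cgroups(7) man page); on cgroup v1 systems, systemd's own named hierarchy shows up as a name=systemd line. The sketch below parses that format. CgroupEntry and parse_proc_cgroup are hypothetical names, the sample data in the test is made up, and the code shows the kind of data a container's cgroup view has to present, not how LXCFS builds that view.

```python
from __future__ import annotations

from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int       # 0 for the cgroup v2 unified hierarchy
    controllers: list[str]  # empty for cgroup v2
    path: str               # path relative to the hierarchy's mount point


def parse_proc_cgroup(text: str) -> list[CgroupEntry]:
    """Parse /proc/<pid>/cgroup text (hierarchy-ID:controller-list:cgroup-path)."""
    entries: list[CgroupEntry] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice, since the cgroup path itself could contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup") as f:
            for entry in parse_proc_cgroup(f.read()):
                print(entry)
    except FileNotFoundError:
        print("/proc/self/cgroup is not available (not Linux?)")
```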
For decades, UNIX systems have used an init (initialization) system to start processes at boot. Until roughly the last decade, the industry relied on sysvinit, or System V init; then systemd came into being and began to take hold. “Sysvinit was the classic set of scripts that many UNIX admins will find familiar,” says Kirkland. And there were many of those scripts.
Any UNIX or Linux system will present the administrator with a laundry list of services that require initialization. “In the sysvinit world, those were enumerated and serialized, and so, you’d have, you know, s-1, and s-2, and s-55, and s-99, and all of those services would be started in a series, in serial order, and it was a very deterministic boot,” says Kirkland.
Early init systems like this would not start the next initialization step until the current one had finished. Still, sysvinit was a useful boot configuration tool, thanks in part to its runlevels and to the editable numeric prefixes that fixed each script's place in the boot order.
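The serialized ordering Kirkland describes can be sketched as follows. On sysvinit systems, a runlevel directory such as /etc/rc2.d holds links named S<NN><service> (start) and K<NN><service> (kill), and the rc script runs the start links in lexical order, which the zero-padded two-digit prefix turns into numeric order. The sketch below is a simplified model of that behavior, not the real rc script: the directory listing is hypothetical, and a real rc script also passes a "start" argument to each script and processes the K links.

```python
from __future__ import annotations

import re
from typing import Callable

# Start links look like S<two digits><name>, e.g. "S20networking".
START_LINK = re.compile(r"^S\d{2}\S+$")


def sysv_start_order(entries: list[str]) -> list[str]:
    """Return start links in the order a sysvinit-style rc script would run them.

    The two-digit prefixes are zero-padded, so a plain lexical sort gives numeric
    order; links sharing a number fall back to alphabetical order of the name.
    """
    return sorted(e for e in entries if START_LINK.match(e))


def boot_serially(entries: list[str], start: Callable[[str], None]) -> list[str]:
    """Run start() on each link in turn; the next begins only after start() returns."""
    started: list[str] = []
    for link in sysv_start_order(entries):
        start(link)
        started.append(link)
    return started


if __name__ == "__main__":
    # Hypothetical contents of /etc/rc2.d.
    rc2 = ["K20nfs", "S99local", "S01mountall", "S55ssh", "S20networking"]
    boot_serially(rc2, lambda link: print("starting", link))
```

The determinism comes directly from the sort: the same directory listing always yields the same boot order, and renaming S55ssh to S15ssh is all it takes to move that service earlier.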
“Then about 2004-2005, at the inception of Ubuntu, we started working on a parallelized init system called Upstart,” says Kirkland. The reasoning was that with multiple CPUs, multicore processors, multithreading, and concurrent process execution, an unnecessarily serialized init process had become a boot bottleneck.
“So, we created Upstart, which would basically boot every service as soon as its constraints were met, and it was truly a multi-threaded, multi-symmetric boot system,” says Kirkland. Upstart also managed services for as long as the system was running, between boot and shutdown. Canonical encouraged broad acceptance of Upstart by making it backward compatible with sysvinit.
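The advantage of booting "every service as soon as its constraints were met" can be put in numbers. The sketch below assumes a made-up dependency graph and made-up durations in seconds, and idealizes the machine as having enough CPUs that nothing contends. It compares the serial total with the time at which the last service finishes when each service starts the moment its dependencies are done. It uses Python's standard graphlib module (3.9+) for the topological order and is not Upstart's or systemd's actual scheduler.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services: each maps to the services it must wait for.
EXAMPLE_DEPS: dict[str, set[str]] = {
    "mountall": set(),
    "networking": {"mountall"},
    "syslog": {"mountall"},
    "ssh": {"networking"},
    "cron": {"syslog"},
}
# Hypothetical start-up durations, in seconds.
EXAMPLE_DURATION: dict[str, int] = {
    "mountall": 2, "networking": 3, "syslog": 1, "ssh": 1, "cron": 1,
}


def schedule(deps, duration):
    """Start each service as soon as all of its dependencies have finished.

    Returns (start, finish) dicts. Assumes unlimited CPUs, so services whose
    constraints are met run concurrently. Raises graphlib.CycleError on a cycle.
    """
    start: dict[str, int] = {}
    finish: dict[str, int] = {}
    for svc in TopologicalSorter(deps).static_order():
        start[svc] = max((finish[d] for d in deps.get(svc, ())), default=0)
        finish[svc] = start[svc] + duration[svc]
    return start, finish


if __name__ == "__main__":
    start, finish = schedule(EXAMPLE_DEPS, EXAMPLE_DURATION)
    serial_total = sum(EXAMPLE_DURATION.values())  # one service at a time
    parallel_total = max(finish.values())          # last service to finish
    for svc in sorted(start, key=start.get):
        print(f"{svc:<10} starts at t={start[svc]}s")
    print(f"serial: {serial_total}s, dependency-driven: {parallel_total}s")
```

With these made-up numbers the serial total is 8 s and the dependency-driven boot finishes at 6 s, because syslog and cron run while networking is still coming up. The dependency-driven time is bounded below by the longest chain of dependencies (here mountall, networking, ssh), so the gain grows as the graph gets wider rather than deeper.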
Then, approximately two years ago, with Ubuntu 15.04, Canonical replaced Upstart with systemd, an init system that originated at Red Hat rather than at Canonical, as Ubuntu's default. “Now all of the Linux distributions are agreeing to get behind a single init system, systemd, and avoid that sort of UNIX fragmentation,” says Kirkland. Like Upstart, systemd starts processes in parallel to shorten boot.
All of this builds toward the role systemd plays in machine containers. Machine containers are containers that boot, look, and respond the way a VM or a physical machine would. “I can’t emphasize strongly enough that application containers do not have that. That’s not part of the application container design. And that’s fine, that’s just the way things work in application containers. There is no init, there is no booting of the system, whereas, inside of machine containers, there is,” says Kirkland.
Coming Full Circle
LXCFS supplies the virtual file system that systemd needs in order to run and boot inside that system container. “Really, it’s all about fooling systemd into thinking that it’s booting a real system, when in fact, it’s just booting a series of processes running inside of a container,” says Kirkland.