One of the more widely used mechanisms for limiting memory consumption in IT environments running containers is control groups, better known as cgroups, a feature of the Linux kernel. In fact, LinkedIn has made extensive use of cgroups to build LinkedIn Platform as a Service (LPS), an environment for deploying a range of applications that are based mostly on containers.
But the company recently revealed that it has encountered some unexpected issues relying on cgroups at scale. Rather than completely isolating resources, LinkedIn reports, cgroups merely limits resource usage, preventing applications running in memory from starving other applications drawing on a common pool of memory.
Specifically, the company is advising fellow developers to remember that cgroups does not reserve memory the way traditional virtual machines do. In addition, LinkedIn reports that the page cache used by an application counts toward its cgroups memory limit, and that the underlying operating system can also claim a portion of that limit.
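The accounting LinkedIn describes can be illustrated with a short sketch. In cgroups v1, the memory controller exposes a memory.stat file whose "cache" (page cache) and "rss" (anonymous memory) fields are both charged against memory.limit_in_bytes. The sample stat contents and the 1 GiB limit below are hypothetical values chosen for illustration, not figures from LinkedIn:

```python
# Sketch: why page cache matters under a cgroup v1 memory limit.
# SAMPLE_MEMORY_STAT is a hypothetical excerpt of a memory.stat file;
# on a real system it would be read from
# /sys/fs/cgroup/memory/<group>/memory.stat.

SAMPLE_MEMORY_STAT = """\
cache 536870912
rss 268435456
mapped_file 4194304
"""

def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

limit = 1 * 1024**3  # hypothetical memory.limit_in_bytes: 1 GiB
stats = parse_memory_stat(SAMPLE_MEMORY_STAT)
charged = stats["cache"] + stats["rss"]  # both count toward the limit
headroom = limit - charged

print(f"page cache: {stats['cache'] / 2**20:.0f} MiB")
print(f"rss:        {stats['rss'] / 2**20:.0f} MiB")
print(f"headroom:   {headroom / 2**20:.0f} MiB of {limit / 2**20:.0f} MiB")
```

In this example, half the 1 GiB limit is consumed by page cache alone, which is exactly the surprise LinkedIn is warning about: an application's anonymous memory can hit the ceiling well before its heap reaches the nominal limit.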
LinkedIn has yet to test the latest version of cgroups. But it is advising other developers using the first version to manually limit the amount of memory made available to each cgroup to avoid this issue.
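On a cgroups v1 system, setting such a limit amounts to writing a byte count into the group's memory.limit_in_bytes file under /sys/fs/cgroup/memory, which normally requires root. The sketch below takes the cgroup root as a parameter so it can be exercised against a scratch directory; the group name "lps-app" and the 2 GiB figure are hypothetical:

```python
import os
import tempfile

def set_cgroup_memory_limit(cgroup_root, group, limit_bytes):
    """Write a hard memory limit for a cgroup v1 group.

    On a real system cgroup_root would be /sys/fs/cgroup/memory and
    writing the file would require root privileges.
    """
    group_dir = os.path.join(cgroup_root, group)
    os.makedirs(group_dir, exist_ok=True)
    with open(os.path.join(group_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

# Exercise the sketch against a temporary directory standing in for
# the cgroup filesystem.
root = tempfile.mkdtemp()
set_cgroup_memory_limit(root, "lps-app", 2 * 1024**3)  # 2 GiB cap
with open(os.path.join(root, "lps-app", "memory.limit_in_bytes")) as f:
    print(f.read())  # the value the kernel would enforce
```

Because page cache counts toward this figure, the value chosen has to leave headroom for cache on top of the application's anonymous memory, which is the manual isolation step LinkedIn recommends.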
Steve Ihde, director of engineering at LinkedIn, says the social media company is committed to sharing its experiences with other developers as it builds out LPS and the other technologies it uses to run applications at web scale. In fact, he notes, the company decided to build LPS in the first place to eliminate the repetitive manual work associated with DevOps. But because no PaaS environment optimized for the type of microservices architecture LinkedIn wanted to construct existed when the project started, the company decided to build one from the ground up as part of Project InVersion, which it launched back in 2011.
In addition to automating provisioning, Ihde says LPS provides LinkedIn with a better way to automate policy management across a broad range of isolated microservices.
The issues that LinkedIn is encountering highlight some of the challenges IT operations teams may face running containers on bare metal servers. Despite the processing overhead involved, most IT organizations opt to deploy containers on top of virtual machines in production environments, both to ensure isolation and to leverage existing investments in IT management frameworks. But web-scale companies such as LinkedIn have a much more vested interest in driving down their IT infrastructure costs: more efficient use of memory at that scale can translate into millions of dollars in savings. As a result, there is much more interest in running containers on bare metal servers, both among the providers of these applications and among the cloud service providers that often host them.
A PaaS environment can give IT organizations more flexibility in managing containers at the level of individual applications. At this juncture, most IT organizations won't need to build their own PaaS to accomplish that goal. But one thing they can count on is that when organizations such as LinkedIn share more of their experiences as early adopters of containers, the rest of the DevOps community benefits.