Out of memory (OOM) killing has historically happened in kernel space: when a system runs out of physical memory, the Linux kernel is forced to OOM-kill one or more processes. This is typically slow and painful, because the kernel can spend an unbounded amount of time swapping pages in and out and evicting the page cache before it acts. Configuring the kill policy is also inflexible and somewhat complicated.
oomd solves these problems in userspace: it takes corrective action before an OOM occurs in kernel space. Corrective action is configured through a flexible plugin system for which custom code can be written. By default, this means killing the offending processes. This design gives each workload its own custom protection rules, and it minimizes the time the kernel spends churning pages.
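To make the detect-then-act idea concrete, here is a minimal Python sketch of a plugin-style rule: detectors inspect the current state, and actions run only when every detector fires. The names `Detector`, `Action`, and `Ruleset` and the all-detectors-must-fire rule are hypothetical choices for illustration; they are not oomd's actual plugin API or configuration semantics.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical plugin signatures (not oomd's real API):
# a detector returns True when its condition holds,
# an action performs corrective work on the given state.
Detector = Callable[[dict], bool]
Action = Callable[[dict], None]


@dataclass
class Ruleset:
    name: str
    detectors: list[Detector]
    actions: list[Action]

    def run(self, state: dict) -> bool:
        """Run all actions only if every detector fires; report whether they ran."""
        if self.detectors and all(detect(state) for detect in self.detectors):
            for act in self.actions:
                act(state)
            return True
        return False


# Usage: a per-workload rule. The threshold and the "kill" are stand-ins.
events: list[str] = []
protect = Ruleset(
    name="protect workload.slice",
    detectors=[lambda s: s["memory_pressure"] > 60.0],  # hypothetical threshold (%)
    actions=[lambda s: events.append(f"kill {s['cgroup']}")],  # records instead of killing
)
protect.run({"memory_pressure": 72.0, "cgroup": "workload.slice"})
```

Because detectors and actions are plain callables, each workload can get its own ruleset without changing the loop that evaluates them. That separation is the kind of flexibility the plugin system is meant to provide.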
See the following sections in the oomd README for instructions on getting started with and configuring oomd:
See the Facebook case study for information on how oomd delivers significant memory utilization gains in production across Facebook's data centers.
oomd works with several other Linux tools to provide advanced OOM-killing capabilities. Be sure to check out the following components for deeper insight into how to get the most benefit from oomd:
- PSI: oomd uses `memory.pressure` metrics, which are part of the Pressure Stall Information (PSI) Linux kernel feature, to trigger specified actions.
- cgroup2: oomd brings maximum memory utilization benefits to large data centers when used in conjunction with cgroup2, a kernel mechanism to group processes and allocate specified amounts of resources to each group. oomd can be configured to kill entire cgroups, rather than discrete processes.
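PSI reports memory stall time per cgroup in a `memory.pressure` file whose lines look like `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, with a matching `full` line. The sketch below shows how the two components can fit together. It is an illustration, not oomd's implementation, and it assumes that cgroup2 is mounted at `/sys/fs/cgroup` and that a hypothetical threshold on the `avg10` window is used as the trigger. Killing through `cgroup.kill` requires Linux 5.14 or newer. The fallback signals only the PIDs listed in the cgroup's own `cgroup.procs`.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path

# Assumption: cgroup2 is mounted at the common default location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text (e.g. a cgroup's memory.pressure) into {"some": {...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def under_pressure(psi: dict[str, dict[str, float]], threshold: float,
                   kind: str = "some", window: str = "avg10") -> bool:
    """Hypothetical trigger: the chosen stall percentage exceeds the threshold."""
    return psi.get(kind, {}).get(window, 0.0) > threshold


def kill_cgroup(cgroup: str) -> None:
    """Kill a whole cgroup rather than a single process."""
    path = CGROUP_ROOT / cgroup
    kill_file = path / "cgroup.kill"
    if kill_file.exists():
        # Linux 5.14+: writing "1" kills the cgroup and all its descendants.
        kill_file.write_text("1")
        return
    # Fallback: SIGKILL each PID in this cgroup only (racy; may miss new forks).
    for pid in (path / "cgroup.procs").read_text().split():
        try:
            os.kill(int(pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited


def check_and_kill(cgroup: str, threshold: float) -> bool:
    """Read the cgroup's memory pressure and kill the cgroup if it is too high."""
    psi = parse_psi((CGROUP_ROOT / cgroup / "memory.pressure").read_text())
    if under_pressure(psi, threshold):
        kill_cgroup(cgroup)
        return True
    return False
```

For example, `check_and_kill("workload.slice", 60.0)` would kill every process in `workload.slice` once more than 60% of recent wall time was spent with some task stalled on memory. Acting on the whole cgroup avoids leaving a partially killed workload behind.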