Pod memory allocation failures

Description

Containers fail to start, and describing the pod shows the following Docker error:

Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown

The error may indicate problems with memory allocation on the host, even when the system appears to have enough resources judging by the memory usage reported by standard Linux tools such as top and free.
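
One reason free memory and failed allocations can coexist is fragmentation: higher-order allocations need physically contiguous pages and can fail on a fragmented node even when free reports gigabytes available. As a quick check, the kernel's buddy allocator statistics show how many free blocks of each order remain per memory zone:

$ cat /proc/buddyinfo

Each count column corresponds to a block order (0 through 10 on typical x86 kernels); near-zero counts in the higher-order columns point to fragmentation.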

Such failures are often accompanied by page allocation errors that can be seen in the kernel log (dmesg):

[Tue Mar 31 20:08:11 2020] runc:[1:CHILD]: page allocation failure: order:8, mode:0xc0d0
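
The order:8 here means the kernel failed to find a contiguous block of 2^8 = 256 pages (1 MB with 4 KB pages). To check whether a node has been hitting these failures, a search of the kernel log along these lines should work:

$ dmesg | grep -i 'page allocation failure'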

Another symptom of the issue may be high memory usage by the kubelet process:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2496 root      20   0 13.454g 9.362g  36472 S  10.9  9.9 949:57.99 kubelet
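
As a quick spot check of the kubelet's memory footprint on a node (a sketch; RSS and VSZ are reported in kilobytes, and option support may vary slightly between distributions):

$ ps -C kubelet -o pid,rss,vsz,comm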

Scope

The issue was observed on RHEL-based systems running older Linux kernels, such as 3.10.0-1062.el7.
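
To confirm which kernel a node is running:

$ uname -r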

Resolution

There are a couple of possible workarounds for the issue.

One is to upgrade the kernel to a newer version. In the case mentioned above, the user stopped seeing memory issues after upgrading the kernel from 3.10.0-1062.el7 to 3.10.0-1127.8.2.el7.
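
On a RHEL/CentOS 7 node the upgrade could look like the following (a sketch; <node-name> is a placeholder, and the exact procedure depends on the environment):

$ kubectl drain <node-name> --ignore-daemonsets
$ sudo yum update -y kernel
$ sudo reboot

After the node comes back up and uname -r shows the new kernel, it can be returned to service with kubectl uncordon <node-name>.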

Another possible workaround is to set the vm.zone_reclaim_mode kernel parameter to 1 if it is set to 0. This enables a more aggressive memory reclaim mode, allowing the system to reclaim memory back from caches:

$ sudo sysctl -w vm.zone_reclaim_mode=1
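
The current value can be checked first; on a node using the default setting it prints:

$ sysctl vm.zone_reclaim_mode
vm.zone_reclaim_mode = 0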

Note: To make the kernel parameter persist across node restarts, set it via a .conf file in the /etc/sysctl.d/ directory, as shown below.
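
For example (the file name here is arbitrary; any .conf file under /etc/sysctl.d/ is read):

$ echo 'vm.zone_reclaim_mode = 1' | sudo tee /etc/sysctl.d/99-zone-reclaim.conf
$ sudo sysctl --system

The sysctl --system command reloads parameters from all system configuration directories, including /etc/sysctl.d/, so the setting takes effect immediately as well as after a reboot.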

P.S.

The following Red Hat KB article provides more information about the issue and the workarounds mentioned above.