Ubuntu Manpage: /proc/sys/vm/ - virtual memory subsystem

NAME

       /proc/sys/vm/ - virtual memory subsystem

DESCRIPTION

/proc/sys/vm/
This directory contains files for memory management tuning, buffer, and cache
management.

/proc/sys/vm/admin_reserve_kbytes (since Linux 3.10)
This file defines the amount of free memory (in KiB) on the system that should be
reserved for users with the capability CAP_SYS_ADMIN.

The default value in this file is the minimum of [3% of free pages, 8MiB] expressed
as KiB. The default is intended to provide enough for the superuser to log in and
kill a process, if necessary, under the default overcommit 'guess' mode (i.e., 0 in
/proc/sys/vm/overcommit_memory).

Systems running in "overcommit never" mode (i.e., 2 in
/proc/sys/vm/overcommit_memory) should increase the value in this file to account
for the full virtual memory size of the programs used to recover (e.g., login(1)
ssh(1), and top(1)) Otherwise, the superuser may not be able to log in to recover
the system. For example, on x86-64 a suitable value is 131072 (128MiB reserved).

Changing the value in this file takes effect whenever an application requests
memory.

/proc/sys/vm/compact_memory (since Linux 2.6.35)
When 1 is written to this file, all zones are compacted such that free memory is
available in contiguous blocks where possible. The effect of this action can be
seen by examining /proc/buddyinfo.

Present only if the kernel was configured with CONFIG_COMPACTION.

/proc/sys/vm/drop_caches (since Linux 2.6.16)
Writing to this file causes the kernel to drop clean caches, dentries, and inodes
from memory, causing that memory to become free. This can be useful for memory
management testing and performing reproducible filesystem benchmarks. Because
writing to this file causes the benefits of caching to be lost, it can degrade
overall system performance.

To free pagecache, use:

echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes, use:

echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries, and inodes, use:

echo 3 > /proc/sys/vm/drop_caches

Because writing to this file is a nondestructive operation and dirty objects are
not freeable, the user should run sync(1) first.

/proc/sys/vm/sysctl_hugetlb_shm_group (since Linux 2.6.7)
This writable file contains a group ID that is allowed to allocate memory using
huge pages. If a process has a filesystem group ID or any supplementary group ID
that matches this group ID, then it can make huge-page allocations without holding
the CAP_IPC_LOCK capability; see memfd_create(2), mmap(2), and shmget(2).

/proc/sys/vm/legacy_va_layout (since Linux 2.6.9)
If nonzero, this disables the new 32-bit memory-mapping layout; the kernel will use
the legacy (2.4) layout for all processes.

/proc/sys/vm/memory_failure_early_kill (since Linux 2.6.32)
Control how to kill processes when an uncorrected memory error (typically a 2-bit
error in a memory module) that cannot be handled by the kernel is detected in the
background by hardware. In some cases (like the page still having a valid copy on
disk), the kernel will handle the failure transparently without affecting any
applications. But if there is no other up-to-date copy of the data, it will kill
processes to prevent any data corruptions from propagating.

The file has one of the following values:

1 Kill all processes that have the corrupted-and-not-reloadable page mapped as
soon as the corruption is detected. Note that this is not supported for a
few types of pages, such as kernel internally allocated data or the swap
cache, but works for the majority of user pages.

0 Unmap the corrupted page from all processes and kill a process only if it
tries to access the page.

The kill is performed using a SIGBUS signal with si_code set to BUS_MCEERR_AO.
Processes can handle this if they want to; see sigaction(2) for more details.

This feature is active only on architectures/platforms with advanced machine check
handling and depends on the hardware capabilities.

Applications can override the memory_failure_early_kill setting individually with
the prctl(2) PR_MCE_KILL operation.

Present only if the kernel was configured with CONFIG_MEMORY_FAILURE.

/proc/sys/vm/memory_failure_recovery (since Linux 2.6.32)
Enable memory failure recovery (when supported by the platform).

1 Attempt recovery.

0 Always panic on a memory failure.

Present only if the kernel was configured with CONFIG_MEMORY_FAILURE.

/proc/sys/vm/oom_dump_tasks (since Linux 2.6.25)
Enables a system-wide task dump (excluding kernel threads) to be produced when the
kernel performs an OOM-killing. The dump includes the following information for
each task (thread, process): thread ID, real user ID, thread group ID (process ID),
virtual memory size, resident set size, the CPU that the task is scheduled on,
oom_adj score (see the description of /proc/pid/oom_adj), and command name. This
is helpful to determine why the OOM-killer was invoked and to identify the rogue
task that caused it.

If this contains the value zero, this information is suppressed. On very large
systems with thousands of tasks, it may not be feasible to dump the memory state
information for each one. Such systems should not be forced to incur a performance
penalty in OOM situations when the information may not be desired.

If this is set to nonzero, this information is shown whenever the OOM-killer
actually kills a memory-hogging task.

The default value is 0.

/proc/sys/vm/oom_kill_allocating_task (since Linux 2.6.24)
This enables or disables killing the OOM-triggering task in out-of-memory
situations.

If this is set to zero, the OOM-killer will scan through the entire tasklist and
select a task based on heuristics to kill. This normally selects a rogue memory-
hogging task that frees up a large amount of memory when killed.

If this is set to nonzero, the OOM-killer simply kills the task that triggered the
out-of-memory condition. This avoids a possibly expensive tasklist scan.

If /proc/sys/vm/panic_on_oom is nonzero, it takes precedence over whatever value is
used in /proc/sys/vm/oom_kill_allocating_task.

The default value is 0.

/proc/sys/vm/overcommit_kbytes (since Linux 3.14)
This writable file provides an alternative to /proc/sys/vm/overcommit_ratio for
controlling the CommitLimit when /proc/sys/vm/overcommit_memory has the value 2.
It allows the amount of memory overcommitting to be specified as an absolute value
(in kB), rather than as a percentage, as is done with overcommit_ratio. This
allows for finer-grained control of CommitLimit on systems with extremely large
memory sizes.

Only one of overcommit_kbytes or overcommit_ratio can have an effect: if
overcommit_kbytes has a nonzero value, then it is used to calculate CommitLimit,
otherwise overcommit_ratio is used. Writing a value to either of these files
causes the value in the other file to be set to zero.

/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:

0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit

In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default
check is very weak, leading to the risk of getting a process "OOM-killed".

In mode 1, the kernel pretends there is always enough memory, until memory actually
runs out. One use case for this mode is scientific computing applications that
employ large sparse arrays. Before Linux 2.6.0, any nonzero value implies mode 1.

In mode 2 (available since Linux 2.6), the total virtual address space that can be
allocated (CommitLimit in /proc/meminfo) is calculated as

CommitLimit = (total_RAM - total_huge_TLB) *
overcommit_ratio / 100 + total_swap

where:

• total_RAM is the total amount of RAM on the system;

• total_huge_TLB is the amount of memory set aside for huge pages;

• overcommit_ratio is the value in /proc/sys/vm/overcommit_ratio; and

• total_swap is the amount of swap space.

For example, on a system with 16 GB of physical RAM, 16 GB of swap, no space
dedicated to huge pages, and an overcommit_ratio of 50, this formula yields a
CommitLimit of 24 GB.

Since Linux 3.14, if the value in /proc/sys/vm/overcommit_kbytes is nonzero, then
CommitLimit is instead calculated as:

CommitLimit = overcommit_kbytes + total_swap

See also the description of /proc/sys/vm/admin_reserve_kbytes and
/proc/sys/vm/user_reserve_kbytes.

/proc/sys/vm/overcommit_ratio (since Linux 2.6.0)
This writable file defines a percentage by which memory can be overcommitted. The
default value in the file is 50. See the description of
/proc/sys/vm/overcommit_memory.

/proc/sys/vm/panic_on_oom (since Linux 2.6.18)
This enables or disables a kernel panic in an out-of-memory situation.

If this file is set to the value 0, the kernel's OOM-killer will kill some rogue
process. Usually, the OOM-killer is able to kill a rogue process and the system
will survive.

If this file is set to the value 1, then the kernel normally panics when out-of-
memory happens. However, if a process limits allocations to certain nodes using
memory policies (mbind(2) MPOL_BIND) or cpusets (cpuset(7)) and those nodes reach
memory exhaustion status, one process may be killed by the OOM-killer. No panic
occurs in this case: because other nodes' memory may be free, this means the system
as a whole may not have reached an out-of-memory situation yet.

If this file is set to the value 2, the kernel always panics when an out-of-memory
condition occurs.

The default value is 0. 1 and 2 are for failover of clustering. Select either
according to your policy of failover.

/proc/sys/vm/swappiness
The value in this file controls how aggressively the kernel will swap memory pages.
Higher values increase aggressiveness, lower values decrease aggressiveness. The
default value is 60.

/proc/sys/vm/user_reserve_kbytes (since Linux 3.10)
Specifies an amount of memory (in KiB) to reserve for user processes. This is
intended to prevent a user from starting a single memory hogging process, such that
they cannot recover (kill the hog). The value in this file has an effect only when
/proc/sys/vm/overcommit_memory is set to 2 ("overcommit never" mode). In this
case, the system reserves an amount of memory that is the minimum of [3% of current
process size, user_reserve_kbytes].

The default value in this file is the minimum of [3% of free pages, 128MiB]
expressed as KiB.

If the value in this file is set to zero, then a user will be allowed to allocate
all free memory with a single process (minus the amount reserved by
/proc/sys/vm/admin_reserve_kbytes). Any subsequent attempts to execute a command
will result in "fork: Cannot allocate memory".

Changing the value in this file takes effect whenever an application requests
memory.

/proc/sys/vm/unprivileged_userfaultfd (since Linux 5.2)
This (writable) file exposes a flag that controls whether unprivileged processes
are allowed to employ userfaultfd(2). If this file has the value 1, then
unprivileged processes may use userfaultfd(2). If this file has the value 0, then
only processes that have the CAP_SYS_PTRACE capability may employ userfaultfd(2).
The default value in this file is 1.

NAME

DESCRIPTION

SEE ALSO