Ubuntu Manpage: ndctl-inject-error - inject media errors at a namespace offset

NAME

       ndctl-inject-error - inject media errors at a namespace offset

SYNOPSIS

       ndctl inject-error <namespace> [<options>]

THEORY OF OPERATION

       The capacity of an NVDIMM REGION (contiguous span of persistent memory) is accessed via
       one or more NAMESPACE devices. REGION is the Linux term for what ACPI and UEFI call a
       DIMM-interleave-set, or a system-physical-address-range that is striped (by the memory
       controller) across one or more memory modules.

       The UEFI specification defines the NVDIMM Label Protocol as the combination of label area
       access methods and a data format for provisioning one or more NAMESPACE objects from a
       REGION. Note that label support is optional and if Linux does not detect the label
       capability it will automatically instantiate a "label-less" namespace per region. Examples
       of label-less namespaces are the ones created by the kernel’s memmap=ss!nn command line
       option (see the nvdimm wiki on kernel.org), or NVDIMMs without a valid namespace index in
       their label area.

           Note
           Label-less namespaces lack many of the features of their label-rich cousins. For
           example, their size cannot be modified and they cannot be fully destroyed (i.e. the
           space reclaimed); a destroy operation only zeroes any mode-specific metadata. Finally,
           for create-namespace operations on label-less namespaces, ndctl bypasses the region
           capacity availability checks, and always satisfies the request using the full region
           capacity. The only reconfiguration operation supported on a label-less namespace is
           changing its mode.

       A namespace can be provisioned to operate in one of four modes: fsdax, devdax, sector, or
       raw. The expected usage models for these modes are:

       •   fsdax: Filesystem-DAX mode is the default mode of a namespace when specifying ndctl
           create-namespace with no options. It creates a block device (/dev/pmemX[.Y]) that
           supports the DAX capabilities of Linux filesystems (xfs and ext4 to date). DAX removes
           the page cache from the I/O path and allows mmap(2) to establish direct mappings to
           persistent memory media. The DAX capability enables workloads / working-sets that
           would exceed the capacity of the page cache to scale up to the capacity of persistent
           memory. Workloads that fit in page cache or perform bulk data transfers may not see
           benefit from DAX. When in doubt, pick this mode.

       •   devdax: Device-DAX mode enables similar mmap(2) DAX mapping capabilities as
           Filesystem-DAX. However, instead of a block-device that can support a DAX-enabled
           filesystem, this mode emits a single character device file (/dev/daxX.Y). Use this
           mode to assign persistent memory to a virtual-machine, register persistent memory for
           RDMA, or when gigantic mappings are needed.

       •   sector: Use this mode to host legacy filesystems that do not checksum metadata or
           applications that are not prepared for torn sectors after a crash. Expected usage for
           this mode is for small boot volumes. This mode is compatible with other operating
           systems.

       •   raw: Raw mode is effectively just a memory disk that does not support DAX. Typically
           this indicates a namespace that was created by tooling or another operating system
           that did not know how to create a Linux fsdax or devdax mode namespace. This mode is
           compatible with other operating systems, but again, does not support DAX operation.

       ndctl-inject-error can be used to ask the platform to simulate media errors in the NVDIMM
       address space to aid debugging and development of features related to error handling.

       By default, injecting an error only injects an error into the first n bytes of each
       block, where n is the output of ndctl_cmd_ars_cap_get_size(). In other words, only one
       ars_unit is injected per 512-byte sector. This is sufficient for Linux to mark the whole
       sector as bad, and it will show up as such in the various badblocks lists in the kernel.
       If multiple blocks are being injected, only the first n bytes of each specified block
       are injected as errors. This can be overridden with the --saturate option, which forces
       the entire block to be injected as an error.
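
       The difference between the default behavior and --saturate can be sketched with a
       little arithmetic. This is only an illustration: ARS_UNIT below is a hypothetical
       value (the real one is whatever ndctl_cmd_ars_cap_get_size() reports on the platform),
       it is assumed to be no larger than a block, and injected_bytes is a helper invented
       for this example, not part of ndctl.

```shell
#!/bin/sh
# Rough estimate of how many bytes one inject-error call covers.
# ARS_UNIT is a hypothetical value chosen for illustration; the real value
# is reported by ndctl_cmd_ars_cap_get_size() and is assumed <= BLOCK_SIZE.
ARS_UNIT=256
BLOCK_SIZE=512   # inject-error blocks are fixed at 512 bytes

# injected_bytes COUNT [saturate]
injected_bytes() {
    count=$1
    if [ "${2:-}" = "saturate" ]; then
        # --saturate: every byte of each block is injected
        echo $((count * BLOCK_SIZE))
    else
        # default: only the first ARS_UNIT bytes of each block are injected
        echo $((count * ARS_UNIT))
    fi
}

injected_bytes 2            # default injection over 2 blocks
injected_bytes 2 saturate   # the same 2 blocks with --saturate
```

       In both cases the kernel is expected to report the same two bad sectors; only the
       amount of media the platform is asked to poison differs.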

           Warning
           These commands are DANGEROUS and can cause data loss. They are only provided for
           testing and debugging purposes.

EXAMPLES

       Inject errors in namespace0.0 at block 12 for 2 blocks (i.e. 12, 13)

       ndctl inject-error --block=12 --count=2 namespace0.0

       Check status of injected errors on namespace0.0

       ndctl inject-error --status namespace0.0

       Uninject errors at block 12 for 2 blocks on namespace0.0

       ndctl inject-error --uninject --block=12 --count=2 namespace0.0

OPTIONS

-B, --block=
Namespace block offset in 512 byte sized blocks where the error is to be injected.

NOTE: The offset is interpreted in different ways based on the "mode"
of the namespace. For "raw" mode, the offset is the base namespace
offset. For "fsdax" mode (i.e. a "pfn" namespace), the offset is
relative to the user-visible part of the namespace, and the offset
introduced by the kernel's metadata will be accounted for. For a
"sector" mode namespace (i.e. a "BTT" namespace), the offset is
relative to the base namespace, as the BTT translation details are
internal to the kernel, and can't be accounted for while injecting
errors.
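
Because --block and --count are expressed in 512-byte blocks, a byte range has to be
converted before it can be passed to the command. The following is a minimal sketch of
that conversion; the helper names (first_block, block_count) and the sample offset and
length are hypothetical, and the result is only printed, not executed. The byte offset
is taken in whatever reference frame the note above prescribes for the namespace's mode.

```shell
#!/bin/sh
# Convert a byte range inside a namespace into --block/--count values.
BLOCK_SIZE=512

# first_block OFFSET_BYTES: index of the 512-byte block containing the offset
first_block() {
    echo $(( $1 / BLOCK_SIZE ))
}

# block_count OFFSET_BYTES LENGTH_BYTES: number of blocks the range touches
# (LENGTH_BYTES must be at least 1)
block_count() {
    start=$(( $1 / BLOCK_SIZE ))
    end=$(( ($1 + $2 - 1) / BLOCK_SIZE ))
    echo $(( end - start + 1 ))
}

offset=6144   # hypothetical byte offset
length=1024   # hypothetical length in bytes

# Print the resulting command line instead of running it
echo "ndctl inject-error --block=$(first_block "$offset") --count=$(block_count "$offset" "$length") namespace0.0"
```

A range that is not block-aligned still touches every block it overlaps, which is why
block_count rounds outward rather than dividing the length by 512.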

-n, --count=
Number of blocks to inject as errors. This is also in terms of fixed, 512 byte blocks.

-d, --uninject
This option will ask the platform to remove any injected errors for the specified
block offset and count.

WARNING: This will not clear the kernel's internal badblock tracking;
those entries can only be cleared by doing a write to the affected
locations. Hence use the --uninject option only if you know exactly what
you are doing. For normal usage, injected errors should only be cleared
by doing writes. Do not expect to have the original data intact after
injecting an error and clearing it using --uninject. It will be lost,
as the only "real" way to clear the error location is to write to it
or zero it (truncate/hole-punch).
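
As a sketch of what "clearing by writing" could look like, the helper below builds a dd
command that overwrites the affected 512-byte blocks with zeroes. It only prints the
command and does not run it. The helper name clear_by_write_cmd and the device path
/dev/pmem0 are assumptions made for illustration, and the block numbers only line up
with the inject offsets when the block device exposes the same offsets that --block
uses (see the note under --block). Running the printed command destroys the data in
those blocks.

```shell
#!/bin/sh
# Print (do not run) a dd command that zeroes COUNT 512-byte blocks at BLOCK.
# clear_by_write_cmd DEV BLOCK COUNT
# DEV is a hypothetical block device path for the namespace.
clear_by_write_cmd() {
    dev=$1
    block=$2
    count=$3
    # seek= is counted in units of bs, so bs=512 matches inject-error blocks;
    # oflag=direct bypasses the page cache so the write reaches the media
    echo "dd if=/dev/zero of=$dev bs=512 seek=$block count=$count oflag=direct conv=notrunc"
}

# Blocks 12 and 13, matching the EXAMPLES section
clear_by_write_cmd /dev/pmem0 12 2
```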

-t, --status
This option will retrieve the status of injected errors. Note that this will not
retrieve all known/latent errors (i.e. non injected ones), and is NOT equivalent to
performing an Address Range Scrub.

-N, --no-notify
This option is only valid when injecting errors. By default, the error inject command
will ask platform firmware to trigger a notification in the kernel, asking it to
update its state of known errors. With this option, the error will still be injected,
but the kernel will not get a notification, and the error will appear as a latent media
error when the location is accessed. If the platform firmware does not support this
feature, this will have no effect.

-S, --saturate
This option forces error injection or un-injection to cover the entire address range
covered by the specified block(s).

-v, --verbose
Emit debug messages for the error injection process.

-u, --human
Format numbers representing storage sizes, or offsets as human readable strings with
units instead of the default machine-friendly raw-integer data. Convert other numeric
fields into hexadecimal strings.

-r, --region=
A regionX device name, or a region id number. Restrict the operation to the specified
region(s). The keyword all can be specified to indicate the lack of any restriction,
however this is the same as not supplying a --region option at all.

-b, --bus=
A bus id number, or a provider string (e.g. "ACPI.NFIT"). Restrict the operation to
the specified bus(es). The keyword all can be specified to indicate the lack of any
restriction, however this is the same as not supplying a --bus option at all.

COPYRIGHT