IO Cost Parameters

Overview

The iocost controller uses an IO cost model to estimate the cost of each IO, and implements work-conserving proportional control based on the estimated cost. Each IO is classified as sequential or random and given a base cost accordingly. On top of that, a size cost proportional to the length of the IO is added. While simple, this model sufficiently captures the operational characteristics of a wide variety of devices.
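The model described above can be sketched in a few lines. This is a simplified illustration, not the kernel's implementation: the names DirectionParams and io_cost_seconds are hypothetical, costs are expressed in seconds of device time, and the way the base cost is derived from the bandwidth and IOPS limits (the parameters introduced in the next section) is an assumption chosen so that a single 4k IO costs exactly 1/IOPS.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # the IOPS figures are measured with 4k IOs


@dataclass(frozen=True)
class DirectionParams:
    """Hypothetical container for one direction (read or write) of the model."""
    bps: float       # maximum sequential bytes per second
    seqiops: float   # maximum 4k sequential IOs per second
    randiops: float  # maximum 4k random IOs per second


def io_cost_seconds(params: DirectionParams, nbytes: int, sequential: bool) -> float:
    """Estimate the device time one IO consumes: base cost plus size cost."""
    # Size cost: time to move one 4k page at the maximum bandwidth.
    per_page = PAGE_SIZE / params.bps
    # Base cost: whatever remains of a 4k IO's time after the transfer itself.
    iops = params.seqiops if sequential else params.randiops
    base = max(1.0 / iops - per_page, 0.0)
    return base + (nbytes / PAGE_SIZE) * per_page


# Made-up example device: 400MBps, 50k sequential and 10k random 4k read IOPS.
reads = DirectionParams(bps=400e6, seqiops=50_000, randiops=10_000)
seq_4k = io_cost_seconds(reads, 4096, sequential=True)
rand_4k = io_cost_seconds(reads, 4096, sequential=False)
seq_1m = io_cost_seconds(reads, 1 << 20, sequential=True)
```

In this sketch a random 4k read costs five times as much as a sequential one, and a 1MB sequential read is dominated by its size cost rather than its base cost.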

For more high-level explanations of IO control and the io.cost controller, see the IO Control page.

The Parameters

While the kernel comes with a few sets of default parameters, achieving a reasonable level of control requires configuring the IO cost model in /sys/fs/cgroup/io.cost.model for the specific device, using the following parameters:

  • rbps - Maximum sequential read BPS
  • rseqiops - Maximum 4k sequential read IOPS
  • rrandiops - Maximum 4k random read IOPS
  • wbps - Maximum sequential write BPS
  • wseqiops - Maximum 4k sequential write IOPS
  • wrandiops - Maximum 4k random write IOPS
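As an illustration, the six parameters can be assembled into a single configuration line for io.cost.model. The sketch below only builds the string and does not write to the file. The helper name format_cost_model_line, the device number 8:0, and all numeric values are made up for the example; the "ctrl=user model=linear" prefix follows the format shown in the kernel's cgroup v2 documentation and should be checked against your kernel version.

```python
MODEL_KEYS = ("rbps", "rseqiops", "rrandiops", "wbps", "wseqiops", "wrandiops")


def format_cost_model_line(devno: str, **coefs: int) -> str:
    """Build one io.cost.model line for the device identified by MAJ:MIN."""
    missing = [key for key in MODEL_KEYS if key not in coefs]
    if missing:
        raise ValueError(f"missing model parameters: {', '.join(missing)}")
    fields = " ".join(f"{key}={int(coefs[key])}" for key in MODEL_KEYS)
    # ctrl=user tells the kernel to use these values instead of its defaults.
    return f"{devno} ctrl=user model=linear {fields}"


# Hypothetical device 8:0 with made-up benchmark results.
model_line = format_cost_model_line(
    "8:0",
    rbps=400_000_000, rseqiops=50_000, rrandiops=10_000,
    wbps=200_000_000, wseqiops=40_000, wrandiops=8_000,
)
```

Writing the resulting line to /sys/fs/cgroup/io.cost.model requires root privileges and a kernel with the iocost controller enabled.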

The cost model is, of course, an approximation of reality. It can't predict exactly how the hardware will behave, especially as devices themselves show dynamic performance deviations over time. The controller adapts by scaling the total command issue rate according to the Quality-of-Service (QoS) parameters: the latency targets and the vrate bounds. The following parameters are configured in /sys/fs/cgroup/io.cost.qos:

  • rpct - Read latency percentile to use
  • rlat - Read target latency
  • wpct - Write latency percentile to use
  • wlat - Write target latency
  • min - vrate bound minimum
  • max - vrate bound maximum
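The QoS parameters are likewise written as one line per device. The following sketch formats such a line without writing it anywhere. The helper name format_qos_line, the device number 8:0, and the values are illustrative assumptions; the "enable=1 ctrl=user" prefix and the two-decimal percentages follow the format shown in the kernel's cgroup v2 documentation, where latencies are given in microseconds.

```python
def format_qos_line(devno: str, rpct: float, rlat: int, wpct: float,
                    wlat: int, vmin: float, vmax: float) -> str:
    """Build one io.cost.qos line; latencies are in microseconds."""
    if not 0 < vmin <= vmax:
        raise ValueError("vrate bounds must satisfy 0 < min <= max")
    return (
        f"{devno} enable=1 ctrl=user "
        f"rpct={rpct:.2f} rlat={int(rlat)} "
        f"wpct={wpct:.2f} wlat={int(wlat)} "
        f"min={vmin:.2f} max={vmax:.2f}"
    )


# Hypothetical settings: p95 targets of 5ms (reads) and 10ms (writes),
# with vrate allowed to move between 50% and 125%.
qos_line = format_qos_line("8:0", rpct=95, rlat=5000, wpct=95, wlat=10000,
                           vmin=50, vmax=125)
```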

The latency targets determine when the controller considers the device fully saturated. For example, rpct=95 and rlat=5000 means that if the 95th percentile of read completion latency is above 5ms, the device is at capacity and command issuing should be throttled.
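The saturation test in the rpct=95, rlat=5000 example can be sketched as below. This is only an illustration of the comparison, assuming a nearest-rank percentile over a window of completion latencies in microseconds; the function names are hypothetical and the kernel's own bookkeeping is not reproduced here.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def is_saturated(latencies_us: list[float], pct: float, target_us: float) -> bool:
    """True when the chosen latency percentile exceeds the target."""
    return percentile(latencies_us, pct) > target_us


# Made-up windows of 100 read completion latencies each, in microseconds.
healthy = [1000] * 96 + [9000] * 4              # fewer than 5% slow reads
overloaded = [1000] * 90 + [4000] * 4 + [9000] * 6  # more than 5% slow reads

healthy_saturated = is_saturated(healthy, pct=95, target_us=5000)
overloaded_saturated = is_saturated(overloaded, pct=95, target_us=5000)
```

With these made-up samples, only the second window pushes the 95th percentile above 5ms, so only that window would call for throttling.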

The vrate bounds express, as percentages, the range within which the command issue rate may be scaled up and down to meet the latency targets. For example, a range of 50% - 125% tells the controller to adjust the maximum command issue rate between 0.5x and 1.25x of what would add up to 100% according to the cost model parameters. If rbps is 400MBps and the workload is only doing sequential reads, then depending on the completion latency, the iocost controller will allow issuing between 200MBps and 500MBps.
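The arithmetic in the 400MBps example is simply the model rate scaled by each vrate bound. A minimal sketch, with a hypothetical helper name:

```python
def issue_rate_bounds(model_rate: float, vrate_min_pct: float,
                      vrate_max_pct: float) -> tuple[float, float]:
    """Scale the cost-model rate by the vrate bounds (given in percent)."""
    if not 0 < vrate_min_pct <= vrate_max_pct:
        raise ValueError("vrate bounds must satisfy 0 < min <= max")
    return (model_rate * vrate_min_pct / 100,
            model_rate * vrate_max_pct / 100)


# rbps of 400MBps with a vrate range of 50% - 125%.
low_mbps, high_mbps = issue_rate_bounds(400, 50, 125)
```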

The QoS parameters are affected by both the device itself and, to a lesser extent, the requirements of the workloads. In most cases, a device's latency response graph has a point where latency takes off. Beyond that point, the device is already saturated and adding more concurrent commands only increases latency. Setting target latencies around that point is one way to configure the QoS parameters.

Another interesting aspect is that the vrate range can guide the underlying device. For example, some SSDs can complete a lot of writes at a very high speed for a short time and then go into a semi-comatose state, failing to complete other commands for hundreds or even thousands of milliseconds. While such bursts might look good on simple short benchmarks, they don't bring much practical benefit and are detrimental to latency-sensitive workloads, which may end up getting hit by the stalls that follow. iocost can avoid such irregularities by limiting vrate max close to 100% so that no matter how quickly the device signals write completion, the system never issues more than the device can sustain. There are also SSDs that show significantly raised latencies for a while no matter how few IOs are thrown at them, likely during a certain phase of garbage collection. In such cases, scaling down the command issue rate further gains nothing and only reduces the total amount of work done. The vrate min bound can protect against such temporary extreme cases.

These are a lot of numbers to configure, but they're, for the most part, device model specific. In the future, we're hoping to build a database with known devices and their parameters so that they get configured automatically.

The Benchmark

/var/lib/resctl-demo/misc-bin/iocost_coef_gen.py runs as rd-bench-iocost.service and determines both the cost model and QoS parameters.

The QoS latency targets are calculated as 4 times the random IO completion latency at 90% load, and the vrate range is set to between 25% and 90%. The formulas are derived empirically to achieve reliable demo behavior across various devices and may not be optimal for other use cases.
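The formula above can be written out as a short sketch. This is not the benchmark script's code: the function name is hypothetical, and it is an assumption that the read target comes from the random read latency, the write target from the random write latency, and that both are in microseconds.

```python
LATENCY_MULTIPLIER = 4  # targets are 4x the latency measured at 90% load
VRATE_MIN_PCT = 25
VRATE_MAX_PCT = 90


def demo_qos(rand_read_lat_us: float, rand_write_lat_us: float) -> dict:
    """Derive demo-style QoS values from latencies measured at 90% load."""
    return {
        "rlat": LATENCY_MULTIPLIER * rand_read_lat_us,
        "wlat": LATENCY_MULTIPLIER * rand_write_lat_us,
        "min": VRATE_MIN_PCT,
        "max": VRATE_MAX_PCT,
    }


# Made-up measurements: 250us random reads, 1000us random writes.
qos = demo_qos(250, 1000)
```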

Once the benchmarks are complete, the demo records the results in /var/lib/resctl-demo/bench.json and propagates them to /sys/fs/cgroup/io.cost.model and /sys/fs/cgroup/io.cost.qos. If you edit the file, the demo updates the kernel configurations accordingly.

You can re-run or cancel the iocost benchmark by clicking the Toggle iocost benchmark button.