irqbalance/irqbalance.1
Neil Horman 6e6ac7bc65 policyscript: Add ability to specify a directory for multiple scripts
The policyscript directive allows for the specification of a single script
to define policy for all hardware on a system, which is good as a
site/host specific utility, but it makes for difficult work in the event
that vendors wish to provide guidance for only their own hardware (i.e.
if vendor A wants certain hardware to follow affinity_hinting without
affecting other hardware).  To manage this, let's enhance policyscript to
allow the specification of an entire directory, to which multiple
scripts can be added.  Semantics for this new directory feature are the
same as for the single script case, except that the script exit codes
have additional meaning:

exit code 0 - the script indicates that the referenced irq relates to a
device that this script recognizes and further script processing should
stop

exit code 1 - the script indicates that the referenced irq does not
relate to a device the script recognizes, and script processing should
continue

exit code >2 - the script indicates an error has occurred; any output
from it should be ignored, and script processing should continue

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
2018-07-09 12:54:27 -04:00


.de Sh \" Subsection
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Ip \" List item
.br
.ie \\n(.$>=3 .ne \\$3
.el .ne 3
.IP "\\$1" \\$2
..
.TH "IRQBALANCE" 1 "Dec 2006" "Linux" "irqbalance"
.SH NAME
irqbalance \- distribute hardware interrupts across processors on a multiprocessor system
.SH "SYNOPSIS"
.nf
\fBirqbalance\fR
.fi
.SH "DESCRIPTION"
.PP
The purpose of \fBirqbalance\fR is to distribute hardware interrupts across
processors on a multiprocessor system in order to increase performance\&.
.SH "OPTIONS"
.TP
.B -o, --oneshot
Causes irqbalance to be run once, after which the daemon exits.
.TP
.B -d, --debug
Causes irqbalance to print extra debug information. Implies --foreground.
.TP
.B -f, --foreground
Causes irqbalance to run in the foreground (without --debug).
.TP
.B -j, --journal
Enables log output optimized for systemd-journal.
.TP
.B -p, --powerthresh=<threshold>
Set the threshold at which irqbalance attempts to move a CPU into powersave mode.
If more than <threshold> CPUs are more than 1 standard deviation below the
average CPU softirq workload, and no CPUs are more than 1 standard deviation
above (and have more than 1 IRQ assigned to them), attempt to place 1 CPU in
powersave mode. In powersave mode, a CPU will not have any IRQs balanced to it,
in an effort to prevent that CPU from waking up unnecessarily.
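.IP
The rule above can be sketched in Python as follows. This is an illustration only, not irqbalance's implementation: the function name, its arguments, and the use of a population standard deviation are assumptions made for the example.
.IP
.nf
```python
from statistics import mean, pstdev


def should_enter_powersave(loads, irq_counts, threshold):
    """Return True if the rule described for --powerthresh would fire.

    loads       per-CPU softirq workload samples (hypothetical units)
    irq_counts  number of IRQs currently assigned to each CPU
    threshold   the <threshold> value given to --powerthresh
    """
    avg = mean(loads)
    dev = pstdev(loads)  # population standard deviation (an assumption)
    # CPUs more than one standard deviation below the average load.
    underloaded = sum(1 for load in loads if load < avg - dev)
    # CPUs more than one standard deviation above the average that
    # also have more than one IRQ assigned to them.
    overloaded = sum(
        1 for load, irqs in zip(loads, irq_counts)
        if load > avg + dev and irqs > 1
    )
    return underloaded > threshold and overloaded == 0
```
.fi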
.TP
.B -i, --banirq=<irqnum>
Add the specified IRQ to the set of banned IRQs. irqbalance will not affect
the affinity of any IRQs on the banned list, allowing them to be specified
manually. This option is additive and can be specified multiple times. For
example, to ban IRQs 43 and 44 from balancing, use the following command line:
.B irqbalance --banirq=43 --banirq=44
.TP
.B --deepestcache=<integer>
This allows a user to specify the cache level at which irqbalance partitions
cache domains. Specifying a deeper cache may allow a greater degree of
flexibility for irqbalance to assign IRQ affinity to achieve greater performance
increases, but setting a cache depth too large on some systems (specifically
where all CPUs on a system share the deepest cache level), will cause irqbalance
to see balancing as unnecessary. For example:
.B irqbalance --deepestcache=2
.P
The default value for deepestcache is 2.
.TP
.B -l, --policyscript=<script>
When specified, the referenced script (or each script in the referenced directory) is executed once for each discovered IRQ,
with the sysfs device path and IRQ number passed as arguments. Note that the
device path argument will point to the parent directory from which the IRQ
attributes directory may be directly opened.
The script may specify zero or more key=value pairs that will guide irqbalance in
the management of that IRQ. Key=value pairs are printed by the script on stdout
and will be captured and interpreted by irqbalance. Irqbalance expects a zero
exit code from the provided utility. Recognized key=value pairs are:
.TP
.I ban=[true | false]
Directs irqbalance to exclude the passed in IRQ from balancing.
.TP
.I balance_level=[none | package | cache | core]
This allows a user to override the balance level of a given IRQ. By default the
balance level is determined automatically based on the PCI device class of the
device that owns the IRQ.
.TP
.I numa_node=<integer>
This allows a user to override the NUMA node that sysfs indicates a given device
IRQ is local to. Often, systems will not specify this information in ACPI, and as a
result devices are considered equidistant from all NUMA nodes in a system.
This option allows that hardware-provided information to be overridden, so
that irqbalance can bias IRQ affinity for these devices toward their most local
node. Note that specifying \-1 here forces irqbalance to consider an interrupt
from a device to be equidistant from all nodes.
.TP
Note that if a directory is specified rather than a regular file, all files in
the directory are treated as policy scripts and are executed each time an
IRQ is added to the database. If such a directory is specified, scripts in the directory
must additionally exit with one of the following exit codes:
.TP
.I 0
This indicates that the script has a policy for the referenced IRQ, and that further
script processing should stop.
.TP
.I 1
This indicates that the script has no policy for the referenced IRQ, and that
script processing should continue.
.TP
.I 2
This indicates that an error has occurred in the script, and that it should be skipped
(further processing continues).
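.IP
As an illustration of the exit codes above, a policy script for directory mode might look like the following Python sketch. The vendor ID, the choice of key=value pairs, and the assumption that the device path contains a sysfs
.B vendor
file (as PCI devices do) are hypothetical; none of them comes from irqbalance itself.
.IP
.nf
```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical PCI vendor ID claimed by this script; replace as needed.
VENDOR_ID = "0x1234"


def policy_for(devpath, irq):
    """Return (exit_code, lines) for one IRQ.

    irq mirrors the second argument irqbalance passes; this sketch
    decides on the device alone and does not use it.
    """
    try:
        with open(os.path.join(devpath, "vendor")) as f:
            vendor = f.read().strip()
    except FileNotFoundError:
        return 1, []  # no vendor file: not ours, keep processing
    except OSError:
        return 2, []  # unexpected error: this script should be skipped
    if vendor != VENDOR_ID:
        return 1, []  # no policy for this IRQ
    # Recognized device: emit key=value pairs and stop further scripts.
    return 0, ["ban=false", "balance_level=core"]


def main(argv):
    if len(argv) != 3:
        return 2
    code, lines = policy_for(argv[1], argv[2])
    for line in lines:
        print(line)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```
.fi
.IP
Such a script would presumably need to be executable and placed in the directory passed to \fB--policyscript\fR.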
.TP
.B -s, --pid=<file>
Have irqbalance write its process ID to the specified file. By default no
pidfile is written. The written pidfile is automatically unlinked when
irqbalance exits. This option is ignored when used with --debug or --foreground.
.TP
.B -t, --interval=<time>
Set the measurement time for irqbalance. irqbalance will sleep for <time>
seconds between samples of the IRQ load on the system CPUs. Defaults to 10.
.SH "ENVIRONMENT VARIABLES"
.TP
.B IRQBALANCE_ONESHOT
Same as --oneshot.
.TP
.B IRQBALANCE_DEBUG
Same as --debug.
.TP
.B IRQBALANCE_BANNED_CPUS
Provides a mask of CPUs which irqbalance should ignore and never assign interrupts to.
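.IP
The text above does not spell out the mask format. Assuming the usual Linux convention of a hexadecimal bitmask in which bit N stands for CPU N, a mask can be computed with a small Python helper such as the one below (the function name is hypothetical). Systems with many CPUs may need a different grouping of the digits; check the behaviour of your irqbalance version.
.IP
.nf
```python
def cpu_mask(cpus):
    """Return a hexadecimal bitmask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit N represents CPU N (an assumption)
    return format(mask, "x")
```
.fi
.IP
Under that assumption, banning CPUs 2 and 3 would correspond to the mask
.B c
and could be written as
.B IRQBALANCE_BANNED_CPUS=c irqbalance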
.SH "SIGNALS"
.TP
.B SIGHUP
Forces a rescan of the available IRQs and system topology.
.SH "API"
irqbalance can communicate over a socket: it returns its current assignment
tree and setup, and applies new settings based on the values sent. The socket is abstract,
with a name of the form
.BR irqbalance<PID>.sock ,
where <PID> is the process ID of the irqbalance instance to communicate with.
Possible values to send:
.TP
.B stats
Retrieve the assignment tree of IRQs to CPUs, recursively. For each CPU node
in the tree, its type, number, load and whether powersave mode is active are sent. For
each assigned IRQ, its number, load, number of interrupts since the last rebalancing
and its class are sent. Refer to the types.h file for an explanation of the defines.
.TP
.B setup
Get the current sleep interval, the mask of banned CPUs and the list of banned IRQs.
.TP
.B settings sleep <s>
Set a new sleep interval; <s> must be >= 1.
.TP
.B settings cpus <cpu_number1> <cpu_number2> ...
Ban the listed CPUs from IRQ handling; any previously banned CPUs are forgotten.
.TP
.B settings ban irqs <irq1> <irq2> ...
Ban the listed IRQs from being balanced; any previously banned IRQs are forgotten.
.PP
irqbalance checks the SCM_CREDENTIALS of the sender; only the root user is allowed to interact.
Depending on the tool used, an ancillary message carrying the credentials may need to be sent with the request.
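.PP
A minimal Python sketch of a client for this interface is shown below. It assumes a Linux system, a stream socket, and that one command is sent per connection; the function names are hypothetical, and a single receive call may truncate long replies.
.PP
.nf
```python
import os
import socket
import struct


def socket_address(pid):
    """Abstract socket address: a leading NUL byte, then the name."""
    return bytes([0]) + "irqbalance{}.sock".format(pid).encode()


def credentials():
    """Pack this process's pid, uid and gid in struct ucred layout."""
    return struct.pack("iII", os.getpid(), os.getuid(), os.getgid())


def request(pid, command, bufsize=8192):
    """Send one command (for example "stats") and return the raw reply."""
    # Ancillary message carrying SCM_CREDENTIALS, as the text requires.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, credentials())]
    # SOCK_STREAM is an assumption about the socket type.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_address(pid))
        sock.sendmsg([command.encode()], ancillary)
        return sock.recv(bufsize)
```
.fi
.PP
Run as root, a call such as request(1234, "stats") would query the instance whose process ID is 1234 (a made-up value).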
.SH "HOMEPAGE"
https://github.com/Irqbalance/irqbalance