Krister Johansen 7bc1244fbf Teach irqbalance about Intel CoD.
This originally surfaced as a bug in placing network interrupts.  In
the case that this submitter observed, the NIC was in NUMA domain
0, but each RSS interrupt was getting an affinity list for all CPUs in
the domain.  The expected behavior is for a single cpu to be chosen when
attempting to fan out NIC interrupts.  Due to other implementation
details of interrupt placement, this effectively caused all interrupt
mappings for this NIC to end up on CPU 0.

The bug turns out to have been caused by Intel Cluster on Die breaking
an assumption in irqbalance about the design of the component hierarchy.
The CoD topology allows a CPU package to belong to more than one NUMA
node, which is not expected.

The root cause was that when the second NUMA node was wired up to the existing
physical package, it overwrote the mappings that were placed there by
the first.

This patch attempts to solve that problem by permitting a package to
have multiple NUMA nodes.  The CPU component hierarchy is preserved, in
case other parts of the code depend upon walking it.  When a CoD
topology is detected, the NUMA node -> CPU component mapping is moved
down a level, so that the nodes point to the first level where the
affinity becomes distinct.  In practice, this has been observed to be
the LLC.

A quick illustration (with CoD, the topology now looks like this):

                 +-----------+
                 | NUMA Node |
                 |     0     |
                 +-----------+
                       |
                       |        +-------+
                      \|/     / | CPU 0 |
                   +---------+  +-------+
                   | Cache 0 |
                   +---------+  +-------+
                   /          \ | CPU 1 |
      +-----------+             +-------+
      | Package 0 |
      +-----------+             +-------+
                  \           / | CPU 2 |
                   +---------+  +-------+
                   | Cache 1 |
                   +---------+
                       ^      \ +-------+
                       |        | CPU 3 |
                       |        +-------+
                 +-----------+
                 | NUMA Node |
                 |     1     |
                 +-----------+

Previously, only NUMA Node 1 would end up pointing to Package 0.
The topology should not be different on platforms that do not enable
CoD.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2017-07-11 09:21:04 -07:00
glib-local glib-local: ad call for g_list_remove 2012-08-31 13:12:35 -04:00
misc Revert "service: Block irqbalance from running in virt environments" 2013-06-28 09:46:44 -04:00
ui Add missing #include <string.h> in user interface files 2017-01-15 10:10:38 +02:00
.gitignore irqbalance: Misc build enhancements 2012-03-26 10:19:00 -04:00
activate.c remove affinity_hint infrastructure 2016-04-26 14:55:55 -04:00
AUTHORS Add autotools scripts to irqbalance 2009-09-01 15:50:39 +00:00
autogen.sh Make sure the m4 directory is created in autogen.sh 2012-06-20 11:45:37 -04:00
bitmap.c import __bitmap_parselist from Linux kernel 2015-03-12 17:47:00 -04:00
bitmap.h import __bitmap_parselist from Linux kernel 2015-03-12 17:47:00 -04:00
classify.c x86/x64: Exclude devices with legacy IRQ 255 2017-06-02 09:46:54 -05:00
configure.ac configure.ac: Update release version 2017-01-09 08:41:02 -05:00
constants.h Compute load in nanoseconds 2013-02-18 14:08:57 -05:00
COPYING Adding missing configure files 2009-10-02 18:22:21 +00:00
cpumask.h fix cpulist_parse definition to match bitmap_parselist and kernel 2015-03-12 17:47:00 -04:00
cputree.c Teach irqbalance about Intel CoD. 2017-07-11 09:21:04 -07:00
irqbalance.1 Add user interface to configuration and build, document socket API in man page 2017-01-03 08:48:42 -05:00
irqbalance.c Teach irqbalance about Intel CoD. 2017-07-11 09:21:04 -07:00
irqbalance.h Teach irqbalance about Intel CoD. 2017-07-11 09:21:04 -07:00
irqlist.c remove affinity_hint infrastructure 2016-04-26 14:55:55 -04:00
Makefile.am Add user interface to configuration and build, document socket API in man page 2017-01-03 08:48:42 -05:00
non-atomic.h initial import 2006-12-09 15:59:16 +00:00
numa.c Teach irqbalance about Intel CoD. 2017-07-11 09:21:04 -07:00
placement.c remove affinity_hint infrastructure 2016-04-26 14:55:55 -04:00
procinterrupts.c fix aarch64 compile error due to undefined variable 2017-01-15 10:10:38 +02:00
README.md Surely we don't want to minimize cache hit rates? 2015-06-02 10:41:57 -07:00
types.h Teach irqbalance about Intel CoD. 2017-07-11 09:21:04 -07:00

What is Irqbalance

Irqbalance is a daemon that helps balance the CPU load generated by interrupts across all of a system's CPUs. Irqbalance identifies the highest-volume interrupt sources and isolates each of them to a single unique CPU, so that load is spread as much as possible over the entire processor set while minimizing cache miss rates for IRQ handlers.

Building and Installing

./autogen.sh
./configure [options]
make
make install

Developing Irqbalance

Irqbalance is currently hosted on GitHub, so developers are welcome to use the issue and pull request infrastructure found there. However, most development discussions take place on the irqbalance mailing list, which can be subscribed to at: http://lists.infradead.org/mailman/listinfo/irqbalance

New developers are encouraged to use this mailing list to discuss ideas and propose patches.

Bug reporting

When something goes wrong, feel free to send us a bug report by one of the ways described above. Your report should include:

  • Irqbalance version you've been using (or commit hash)
  • /proc/interrupts output
  • irqbalance --debug output
  • content of smp_affinity files - can be obtained by e.g.: $ for i in $(seq 0 300); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
  • your hw hierarchy - e.g. lstopo-no-graphics output
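
The items above that can be captured non-interactively could be gathered with a small helper along these lines. This is only a sketch: the function name `collect_irq_report`, the optional second argument for the proc root, and the output file names are assumptions made for illustration and are not part of irqbalance. The irqbalance version and the `irqbalance --debug` output are left out and should still be attached by hand.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with irqbalance): gather bug report data
# into one directory. Function name, arguments and file names are assumptions.
collect_irq_report() {
    out_dir=$1
    proc_root=${2:-/proc}   # overridable so the helper can be tried on a copy
    mkdir -p "$out_dir" || return 1

    # Snapshot of the per-CPU interrupt counters.
    if [ -r "$proc_root/interrupts" ]; then
        cat "$proc_root/interrupts" > "$out_dir/interrupts.txt"
    fi

    # One "path:mask" line for every readable smp_affinity file.
    : > "$out_dir/smp_affinity.txt"
    for f in "$proc_root"/irq/*/smp_affinity; do
        [ -r "$f" ] || continue
        printf '%s:%s\n' "$f" "$(cat "$f")" >> "$out_dir/smp_affinity.txt"
    done

    # Hardware hierarchy, only when lstopo-no-graphics is installed.
    if command -v lstopo-no-graphics >/dev/null 2>&1; then
        lstopo-no-graphics > "$out_dir/topology.txt" 2>/dev/null
    fi
    return 0
}
```

For example, after loading the function into a shell, `collect_irq_report ./irq-report` would write `interrupts.txt`, `smp_affinity.txt` and, if lstopo-no-graphics is available, `topology.txt` into `./irq-report`.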