commit 7bc1244fbf
This originally surfaced as a bug in placing network interrupts. In the case that this submitter observed, the NIC was in NUMA domain 0, but each RSS interrupt was getting an affinity list for all CPUs in the domain. The expected behavior is for a single CPU to be chosen when attempting to fan out NIC interrupts. Due to other implementation details of interrupt placement, this effectively caused all interrupt mappings for this NIC to end up on CPU 0.

The bug turns out to have been caused by Intel Cluster-on-Die (CoD) breaking an assumption in irqbalance about the design of the component hierarchy. The CoD topology allows a CPU package to belong to more than one NUMA node, which is not expected. The root cause was that when the second NUMA node was wired up to the existing physical package, it overwrote the mappings that were placed there by the first.

This patch attempts to solve that problem by permitting a package to have multiple NUMA nodes. The CPU component hierarchy is preserved, in case other parts of the code depend upon walking it. When a CoD topology is detected, the NUMA node -> CPU component mapping is moved down a level, so that the nodes point to the first level where the affinity becomes distinct. In practice, this has been observed to be the LLC.

A quick illustration. With CoD, the hierarchy now looks like this:

                  +-----------+
                  | NUMA Node |
                  |     0     |
                  +-----------+
                        |
                       \|/
    +-------+
    | CPU 0 | \
    +-------+  \ +---------+
                 | Cache 0 | \
    +-------+  / +---------+  \
    | CPU 1 | /                \ +-----------+
    +-------+                    | Package 0 |
    +-------+                  / +-----------+
    | CPU 2 | \               /
    +-------+  \ +---------+ /
                 | Cache 1 |
    +-------+  / +---------+
    | CPU 3 | /       ^
    +-------+         |
                +-----------+
                | NUMA Node |
                |     1     |
                +-----------+

Whereas previously, only NUMA Node 1 would end up pointing to Package 0 (the second node's mapping overwrote the first's). The topology should not be different on platforms that do not enable CoD.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
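To make the remapping concrete, here is a minimal, self-contained sketch of the idea described above. It is not the patch itself: `struct topo_obj`, `struct numa_node`, `cpus_subset`, and `attach_node` are simplified stand-ins for irqbalance's real types and helpers, and CPU masks are collapsed to a single unsigned long. It walks down from the package until it reaches the first component whose CPU set is entirely covered by the node, which in the CoD case is the LLC:

```c
#include <stdio.h>

struct topo_obj {                /* simplified stand-in for irqbalance's type */
    const char *name;
    unsigned long cpu_mask;      /* CPUs covered by this component  */
    struct topo_obj *children;   /* first child (caches, CPUs, ...) */
    struct topo_obj *next;       /* next sibling                    */
};

struct numa_node {
    const char *name;
    unsigned long cpu_mask;      /* CPUs that belong to this node   */
    struct topo_obj *attach;     /* component the node points at    */
};

/* True if every CPU in a is also in b. */
static int cpus_subset(unsigned long a, unsigned long b)
{
    return (a & ~b) == 0;
}

/*
 * Normally a NUMA node attaches to a whole package.  With CoD, two
 * nodes share one package, so walk down from the package until we
 * reach the first component whose CPU set is entirely covered by
 * this node (in practice the LLC), and attach there instead.
 */
static void attach_node(struct numa_node *node, struct topo_obj *pkg)
{
    struct topo_obj *obj = pkg;

    while (obj && !cpus_subset(obj->cpu_mask, node->cpu_mask)) {
        struct topo_obj *child = obj->children;

        /* descend into the child that overlaps this node's CPUs */
        while (child && !(child->cpu_mask & node->cpu_mask))
            child = child->next;
        obj = child;
    }
    node->attach = obj ? obj : pkg;  /* non-CoD: loop never runs, stays on pkg */
}

int main(void)
{
    /* The CoD example from the diagram: one package, two LLCs,
     * CPUs 0-1 in NUMA node 0 and CPUs 2-3 in NUMA node 1. */
    struct topo_obj cache0 = { "Cache 0",   0x3, 0, 0 };
    struct topo_obj cache1 = { "Cache 1",   0xc, 0, 0 };
    struct topo_obj pkg    = { "Package 0", 0xf, &cache0, 0 };
    struct numa_node n0    = { "NUMA Node 0", 0x3, 0 };
    struct numa_node n1    = { "NUMA Node 1", 0xc, 0 };

    cache0.next = &cache1;

    attach_node(&n0, &pkg);
    attach_node(&n1, &pkg);
    printf("%s -> %s\n", n0.name, n0.attach->name);  /* Cache 0 */
    printf("%s -> %s\n", n1.name, n1.attach->name);  /* Cache 1 */
    return 0;
}
```

With a single NUMA node covering the whole package, `cpus_subset` succeeds immediately and the node stays attached to the package, matching the commit's note that non-CoD platforms are unaffected.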
What is Irqbalance
Irqbalance is a daemon to help balance the CPU load generated by interrupts across all of a system's CPUs. Irqbalance identifies the highest-volume interrupt sources and isolates each of them to a single unique CPU, so that load is spread as much as possible over an entire processor set, while minimizing cache miss rates for IRQ handlers.
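For context on the underlying mechanism: on Linux, an interrupt is steered to a set of CPUs by writing a hexadecimal CPU bitmask to /proc/irq/&lt;n&gt;/smp_affinity. The sketch below is not irqbalance's own code, just a minimal, hand-rolled illustration of that kernel interface; the IRQ number 24 and CPU 2 are arbitrary examples, and it only handles CPUs 0-31.

```c
/*
 * Minimal sketch of the kernel interface irqbalance drives: pin one
 * IRQ to one CPU by writing a hex bitmask to
 * /proc/irq/<irq>/smp_affinity.  Error handling is kept minimal.
 */
#include <stdio.h>

static int pin_irq_to_cpu(int irq, int cpu)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

    f = fopen(path, "w");
    if (!f)
        return -1;                 /* no such IRQ, or no permission */

    /* one set bit per CPU; CPUs above 31 need a wider mask */
    fprintf(f, "%x\n", 1u << cpu);
    return fclose(f) == 0 ? 0 : -1;
}

int main(void)
{
    /* hypothetical example: pin IRQ 24 to CPU 2 (mask 0x4) */
    return pin_irq_to_cpu(24, 2) ? 1 : 0;
}
```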
Building and Installing
```
./autogen.sh
./configure [options]
make
make install
```
Developing Irqbalance
Irqbalance is currently hosted on GitHub, so developers are welcome to use the issue/pull request/etc. infrastructure found there. However, most development discussion takes place on the irqbalance mailing list, which can be subscribed to at: http://lists.infradead.org/mailman/listinfo/irqbalance

New developers are encouraged to use this mailing list to discuss ideas and propose patches.
Bug reporting
When something goes wrong, feel free to send us a bug report through one of the channels described above. Your report should include:

- the irqbalance version you've been using (or the commit hash)
- /proc/interrupts output
- irqbalance --debug output
- the content of the smp_affinity files (see the sample output after this list), which can be obtained by e.g.:

  ```
  $ for i in $(seq 0 300); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
  ```

- your hardware hierarchy, e.g. lstopo-no-graphics output
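The grep invocation above prints one file:mask pair per IRQ, where the mask is a hexadecimal CPU bitmask. The IRQ numbers and masks below are invented purely to show the format:

```
/proc/irq/24/smp_affinity:0000000f
/proc/irq/25/smp_affinity:00000001
```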