Description of problem:
irqbalance seems to die during or right after boot:

[root@localhost init.d]# service irqbalance status
irqbalance dead but subsys locked
[root@localhost init.d]#

Manually restarting seems to work:

[root@localhost init.d]# service irqbalance restart
Stopping irqbalance: [FAILED]
Starting irqbalance: [ OK ]
[root@localhost init.d]#

But it dies soon thereafter:

[root@localhost init.d]# service irqbalance status
irqbalance dead but subsys locked
[root@localhost init.d]#

I see no messages in /var/log/messages.

Running with --debug produces:

[root@localhost ~]# irqbalance --debug
Package 0: cpu mask is 00000003 (workload 0)
    Cache domain 0: cpu mask is 00000003 (workload 0)
        CPU number 1 (workload 0)
        CPU number 0 (workload 0)
Interrupt 217 (class ethernet) has workload 14
Interrupt 0 (class timer) has workload 1967
Interrupt 8 (class timer) has workload 0
Interrupt 20 (class storage) has workload 140
Interrupt 14 (class storage) has workload 22
Interrupt 219 (class storage) has workload 7
Interrupt 15 (class storage) has workload 0
Interrupt 22 (class legacy) has workload 230
Interrupt 21 (class legacy) has workload 1
Interrupt 23 (class legacy) has workload 0
Interrupt 12 (class legacy) has workload 0
Interrupt 9 (class legacy) has workload 0
Interrupt 7 (class legacy) has workload 0
Interrupt 1 (class legacy) has workload 0
Interrupt 218 (class other) has workload 23
-----------------------------------------------------------------------------
IRQ delta is 2015
Package 0: cpu mask is 00000003 (workload 11987)
    Cache domain 0: cpu mask is 00000003 (workload 11888)
        CPU number 1 (workload 11165)
            Interrupt 217 (ethernet/9)
            Interrupt 219 (storage/185)
            Interrupt 14 (storage/117)
            Interrupt 22 (legacy/1113)
            Interrupt 12 (legacy/0)
            Interrupt 7 (legacy/0)
        CPU number 0 (workload 10458)
            Interrupt 20 (storage/657)
            Interrupt 15 (storage/0)
            Interrupt 21 (legacy/60)
            Interrupt 23 (legacy/0)
            Interrupt 9 (legacy/0)
            Interrupt 1 (legacy/0)
            Interrupt 218 (other/98)
[root@localhost ~]#

Running latest Rawhide kernel, etc. on Thinkpad X60 (centrino duo)

Version-Release number of selected component (if applicable):
irqbalance-0.55-6.fc8

How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
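The "dead but subsys locked" status means that no irqbalance process is running while the init script's lock file, /var/lock/subsys/irqbalance, still exists. As a minimal Python sketch, assuming the LSB exit-code convention for an init script's status action (0 running, 1 dead with a pid file, 2 dead with a lock file, 3 stopped), the check can be modeled as follows. `service_status`, `LsbStatus` and `MESSAGES` are hypothetical names; the real check is shell code in the distribution's init scripts.

```python
from enum import IntEnum


class LsbStatus(IntEnum):
    """LSB exit codes for an init script's 'status' action."""
    RUNNING = 0
    DEAD_PIDFILE = 1  # no process, but a pid file remains
    DEAD_LOCKED = 2   # no process, but the subsys lock file remains
    STOPPED = 3       # no process and nothing left behind


# Only the DEAD_LOCKED wording is taken from this report; the rest is illustrative.
MESSAGES = {
    LsbStatus.RUNNING: "irqbalance is running",
    LsbStatus.DEAD_PIDFILE: "irqbalance dead but pid file exists",
    LsbStatus.DEAD_LOCKED: "irqbalance dead but subsys locked",
    LsbStatus.STOPPED: "irqbalance is stopped",
}


def service_status(process_alive: bool, pidfile_exists: bool,
                   lockfile_exists: bool) -> LsbStatus:
    """Hypothetical model of the status check, in the order the checks are made."""
    if process_alive:
        return LsbStatus.RUNNING
    if pidfile_exists:
        return LsbStatus.DEAD_PIDFILE
    if lockfile_exists:
        return LsbStatus.DEAD_LOCKED
    return LsbStatus.STOPPED


# This report: the daemon has exited, /var/lock/subsys/irqbalance is still there.
status = service_status(process_alive=False, pidfile_exists=False,
                        lockfile_exists=True)
print(int(status), MESSAGES[status])  # 2 irqbalance dead but subsys locked
```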
What are the contents of your /etc/sysconfig/irqbalance file?
I believe it is 'stock', unchanged from the distribution:

[tbl@localhost ~]$ cat /etc/sysconfig/irqbalance
# irqbalance is a daemon process that distributes interrupts across
# CPUS on SMP systems. The default is to rebalance once every 10
# seconds. There is one configuration option:
#
# ONESHOT=yes
# after starting, wait for a minute, then look at the interrupt
# load and balance it once; after balancing exit and do not change
# it again.
ONESHOT=
#
# IRQ_AFFINITY_MASK
# 64 bit bitmask which allows you to indicate which cpu's should
# be skipped when reblancing irqs. Cpu numbers which have their
# corresponding bits set to zero in this mask will not have any
# irq's assigned to them on rebalance
#
#IRQ_AFFINITY_MASK=
[tbl@localhost ~]$
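To make the IRQ_AFFINITY_MASK comment concrete, here is a minimal sketch that follows its last sentence literally: a CPU whose bit is zero receives no IRQs on rebalance. It rests on two assumptions that the file does not state. First, the mask is written in hex. Second, an unset mask, as in this stock file where the line is commented out, leaves every CPU eligible. `eligible_cpus` is a hypothetical helper and is not part of irqbalance.

```python
from typing import List, Optional


def eligible_cpus(mask: Optional[str], num_cpus: int) -> List[int]:
    """CPUs that may receive IRQs under a hypothetical hex IRQ_AFFINITY_MASK.

    Follows the sysconfig comment: a CPU whose bit is zero gets no IRQs
    on rebalance. An unset/empty mask is assumed to leave every CPU eligible.
    """
    if not mask:
        return list(range(num_cpus))
    value = int(mask, 16)  # assumption: the mask is written in hex
    if value >> 64:
        raise ValueError("IRQ_AFFINITY_MASK is documented as a 64-bit mask")
    return [cpu for cpu in range(num_cpus) if (value >> cpu) & 1]


print(eligible_cpus(None, 2))  # [0, 1]  (stock file: mask not set)
print(eligible_cpus("2", 2))   # [1]     (CPU 0 excluded)
```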
Oh, I see the issue. This is actually working as designed. Your system has two cores, but they share a cache. It is therefore beneficial to distribute the IRQ workload across the cores. It is not beneficial to migrate the IRQs periodically, however, because the likelihood of cacheline misses is independent of which core services the IRQ. So irqbalance enters one_shot_mode, in which it balances the IRQs once and then exits.
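The rule described above can be sketched as a decision on the cache-domain masks that `irqbalance --debug` prints. On this X60 there is one cache domain with mask 00000003, which covers CPUs 0 and 1. This minimal sketch assumes that the decision depends only on how many distinct cache domains exist: periodic rebalancing only pays off when there is more than one domain to move IRQs between. `cpus_in_mask` and `choose_mode` are hypothetical names, and irqbalance's actual code may make this decision differently.

```python
from typing import FrozenSet, List


def cpus_in_mask(mask_hex: str) -> FrozenSet[int]:
    """Decode a hex CPU mask such as '00000003' into CPU numbers {0, 1}."""
    value = int(mask_hex, 16)
    return frozenset(cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1)


def choose_mode(cache_domain_masks: List[str]) -> str:
    """Return 'oneshot' when all CPUs share one cache domain, else 'periodic'."""
    domains = {cpus_in_mask(m) for m in cache_domain_masks}
    domains.discard(frozenset())  # ignore empty masks
    return "periodic" if len(domains) > 1 else "oneshot"


print(choose_mode(["00000003"]))              # oneshot  (the X60 in this report)
print(choose_mode(["00000003", "0000000c"]))  # periodic (two separate caches)
```

The design choice here is to compare sets of CPUs rather than raw mask strings, so that equivalent masks such as "3" and "00000003" count as the same domain.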
Thanks for tracking this down. Is there any way to get it to clean up '/var/lock/subsys/irqbalance' on its way out, so you don't get the 'red flag' message on shutdown?
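One possible approach, offered as an assumption and not as something this report confirms irqbalance does, is for the process to remove the lock file itself when it exits after its single pass. A later status check would then report the service as stopped rather than "dead but subsys locked", and shutdown would presumably no longer try to stop a daemon that has already exited. `run_oneshot` and `balance_once` are hypothetical names in this minimal Python sketch.

```python
import contextlib
import os
import tempfile
from typing import Callable


def run_oneshot(lock_path: str, balance_once: Callable[[], None]) -> None:
    """Run one balancing pass, then remove the subsys lock file.

    Hypothetical: removing the lock on exit lets a later 'status' report
    the service as stopped instead of 'dead but subsys locked'.
    """
    try:
        balance_once()
    finally:
        # Clean up even if balancing failed; a missing lock is not an error.
        with contextlib.suppress(FileNotFoundError):
            os.remove(lock_path)


# Usage against a temporary directory standing in for /var/lock/subsys.
with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "irqbalance")
    open(lock, "w").close()  # stands in for the lock the init script creates on start
    run_oneshot(lock, lambda: None)
    print(os.path.exists(lock))  # False
```

A daemon that deletes a file owned by the init script couples the two. An alternative is for the init script itself to skip creating the lock, or to remove it, whenever irqbalance is expected to exit after one pass.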