321491 – irqbalance seems to die right after boot...

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	irqbalance
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Neil Horman
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-06 18:01 UTC by Tom London
Modified:	2007-11-30 22:12 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-10-10 14:51:04 UTC
Type:	---
Embargoed:
Dependent Products:

Description Tom London 2007-10-06 18:01:34 UTC

Description of problem:
irqbalance seems to die during or right after boot:

[root@localhost init.d]# service irqbalance status
irqbalance dead but subsys locked
[root@localhost init.d]# 

Manually restarting seems to work:
[root@localhost init.d]# service irqbalance restart
Stopping irqbalance:                                       [FAILED]
Starting irqbalance:                                       [  OK  ]
[root@localhost init.d]# 

But it dies soon thereafter:

[root@localhost init.d]# service irqbalance status
irqbalance dead but subsys locked
[root@localhost init.d]# 

I see no messages in /var/log/messages

Running with --debug produces:

[root@localhost ~]# irqbalance --debug
Package 0:  cpu mask is 00000003 (workload 0)
        Cache domain 0: cpu mask is 00000003  (workload 0) 
                CPU number 1  (workload 0)
                CPU number 0  (workload 0)
Interrupt 217 (class ethernet) has workload 14 
Interrupt 0 (class timer) has workload 1967 
Interrupt 8 (class timer) has workload 0 
Interrupt 20 (class storage) has workload 140 
Interrupt 14 (class storage) has workload 22 
Interrupt 219 (class storage) has workload 7 
Interrupt 15 (class storage) has workload 0 
Interrupt 22 (class legacy) has workload 230 
Interrupt 21 (class legacy) has workload 1 
Interrupt 23 (class legacy) has workload 0 
Interrupt 12 (class legacy) has workload 0 
Interrupt 9 (class legacy) has workload 0 
Interrupt 7 (class legacy) has workload 0 
Interrupt 1 (class legacy) has workload 0 
Interrupt 218 (class other) has workload 23 



-----------------------------------------------------------------------------
IRQ delta is 2015 
Package 0:  cpu mask is 00000003 (workload 11987)
        Cache domain 0: cpu mask is 00000003  (workload 11888) 
                CPU number 1  (workload 11165)
                  Interrupt 217 (ethernet/9) 
                  Interrupt 219 (storage/185) 
                  Interrupt 14 (storage/117) 
                  Interrupt 22 (legacy/1113) 
                  Interrupt 12 (legacy/0) 
                  Interrupt 7 (legacy/0) 
                CPU number 0  (workload 10458)
                  Interrupt 20 (storage/657) 
                  Interrupt 15 (storage/0) 
                  Interrupt 21 (legacy/60) 
                  Interrupt 23 (legacy/0) 
                  Interrupt 9 (legacy/0) 
                  Interrupt 1 (legacy/0) 
  Interrupt 218 (other/98) 
[root@localhost ~]# 

Running the latest Rawhide kernel, etc. on a ThinkPad X60 (Centrino Duo)

Version-Release number of selected component (if applicable):
irqbalance-0.55-6.fc8

How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Neil Horman 2007-10-08 12:30:44 UTC

What are the contents of your /etc/sysconfig/irqbalance file?

Comment 2 Tom London 2007-10-08 13:38:03 UTC

Believe it is 'stock' and unchanged from distribution:

[tbl@localhost ~]$ cat /etc/sysconfig/irqbalance
# irqbalance is a daemon process that distributes interrupts across
# CPUS on SMP systems.  The default is to rebalance once every 10
# seconds.  There is one configuration option:
#
# ONESHOT=yes
#    after starting, wait for a minute, then look at the interrupt
#    load and balance it once; after balancing exit and do not change
#    it again.
ONESHOT=

#
# IRQ_AFFINITY_MASK
#    64 bit bitmask which allows you to indicate which cpu's should
#    be skipped when reblancing irqs.  Cpu numbers which have their 
#    corresponding bits set to zero in this mask will not have any
#    irq's assigned to them on rebalance
#
#IRQ_AFFINITY_MASK=
[tbl@localhost ~]$
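
For readers unfamiliar with CPU bitmasks such as the `00000003` in the debug output or a value for IRQ_AFFINITY_MASK, the following minimal Python sketch shows how such a mask maps to CPU numbers. It is not part of irqbalance; the function name `cpus_in_mask` is hypothetical, and it assumes the mask is written in hexadecimal with bit N standing for CPU N.

```python
def cpus_in_mask(mask_hex, width=64):
    """Return the CPU numbers whose bits are set in a hexadecimal mask.

    Assumption: bit N of the mask corresponds to CPU N, and the mask is
    at most `width` bits wide (the config comment mentions 64 bits).
    """
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(width) if mask & (1 << cpu)]


# The mask reported by `irqbalance --debug` in this report.
print(cpus_in_mask("00000003"))  # [0, 1]
```

Under the rule stated in the config comment, a CPU whose bit is zero in IRQ_AFFINITY_MASK would not appear in the returned list and would get no irqs on rebalance.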

Comment 3 Neil Horman 2007-10-10 14:51:04 UTC

Oh, I see the issue.  This is actually working as designed.  Your system has
two cores, but they share a cache.  As such, it's beneficial to distribute the
irq workload across the cores, but it's not beneficial to migrate the irqs
periodically, since the likelihood of cacheline misses is independent of
which core services the irq.  As such, irqbalance enters a one_shot_mode where it
balances the irqs once, then exits.
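
The decision described in this comment can be sketched roughly as follows. This is an illustrative Python sketch only, not code from irqbalance: the helper names `count_cache_domains` and `choose_mode` are hypothetical, and it assumes the number of cache domains can be read off the first topology dump printed by `irqbalance --debug`.

```python
def count_cache_domains(topology_dump):
    """Count 'Cache domain' lines in one topology dump from --debug output."""
    return sum(
        1
        for line in topology_dump.splitlines()
        if line.strip().startswith("Cache domain")
    )


def choose_mode(cache_domain_count):
    """One shared cache domain: balance once and exit; otherwise keep rebalancing."""
    return "one_shot" if cache_domain_count == 1 else "periodic"


# First topology dump from the debug output in this report.
sample = """Package 0:  cpu mask is 00000003 (workload 0)
        Cache domain 0: cpu mask is 00000003  (workload 0)
                CPU number 1  (workload 0)
                CPU number 0  (workload 0)"""

print(choose_mode(count_cache_domains(sample)))  # one_shot
```

Note that the full debug output prints the topology twice (before and after balancing), so only one of the two dumps should be passed to the counting helper.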

Comment 4 Tom London 2007-10-10 14:59:04 UTC

Thanks for tracking this down.

Any way to get it to clean up '/var/lock/subsys/irqbalance' on its way out so you
don't get the 'red flag' message on shutdown?
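
For context on why the lock file matters, here is a simplified Python sketch of how a SysV-style `status` check can arrive at "dead but subsys locked". It is an approximation for illustration, not the actual init script code: the function `service_status` and its parameters are hypothetical, the returned strings are simplified, and the pid-file check that real init scripts typically perform is left out.

```python
import os


def service_status(name, is_running, lock_dir="/var/lock/subsys"):
    """Approximate a SysV-style status check (simplified, assumed logic).

    name       -- service name, e.g. "irqbalance"
    is_running -- whether a process for the service was found
    lock_dir   -- directory holding per-service lock files
    """
    if is_running:
        return f"{name} is running"
    # The process is gone, but the lock file created at start is still there.
    if os.path.exists(os.path.join(lock_dir, name)):
        return f"{name} dead but subsys locked"
    return f"{name} is stopped"


# A daemon that exits on its own after one-shot balancing leaves the lock
# file behind, so a later status check takes the second branch.
print(service_status("irqbalance", is_running=False))
```

Under this model, the message goes away only if something removes the lock file once the daemon has exited, or if the lock file is never created for a one-shot run.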