Bug 592409

Summary: sirq-net-rx/0: page allocation failure.
Product: Red Hat Enterprise MRG
Reporter: Jon Thomas <jthomas>
Component: realtime-kernel
Assignee: Red Hat Real Time Maintenance <rt-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: David Sommerseth <davids>
Severity: high
Priority: urgent
Version: 1.2
CC: acme, bhu, lgoncalv, ovasik, tao, williams
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-10-21 14:28:11 UTC

Description Jon Thomas 2010-05-14 19:39:35 UTC
During heavy network reception (about 60,000 packets/s), the kernel thread sirq-net-rx/0 hit a page allocation failure and the network froze for one minute.

Additional info:

We bind all network IRQs to CPU 0, which is isolated (isolcpus=0 on the kernel command line). The IRQs are pinned from /etc/rc.local, where 2297 and 2298 are the ethX IRQs:

echo 1 > /proc/irq/2297/smp_affinity
echo 1 > /proc/irq/2298/smp_affinity
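The value written to /proc/irq/N/smp_affinity is a hexadecimal CPU bitmask in which bit i allows CPU i to service the IRQ, so "1" selects CPU 0 only. A minimal Python sketch of building that mask and writing it; affinity_mask and set_irq_affinity are hypothetical helper names, not part of any tool mentioned in this bug, and a plain hex value is assumed to be enough for this 8-CPU machine:

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for /proc/irq/N/smp_affinity (hypothetical helper).

    Bit i set means CPU i may service the IRQ.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    return format(mask, "x")


def set_irq_affinity(irq, cpus):
    """Hypothetical helper, equivalent to: echo <mask> > /proc/irq/<irq>/smp_affinity (needs root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus) + "\n")


if __name__ == "__main__":
    print(affinity_mask([0]))     # CPU 0 only, the value used in rc.local above
    print(affinity_mask([0, 6]))  # CPUs 0 and 6
```

affinity_mask([0]) yields "1", matching the rc.local commands; set_irq_affinity is only defined, not called, so the sketch runs without root.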

From /var/log/messages:

May 14 15:42:42 ptparedge29 kernel: sirq-net-rx/0: page allocation failure. order:0, mode:0x20
May 14 15:42:42 ptparedge29 kernel: Pid: 8, comm: sirq-net-rx/0 Not tainted 2.6.24.7-108.el5rt #1
May 14 15:42:42 ptparedge29 kernel:
May 14 15:42:42 ptparedge29 kernel: Call Trace:
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8108b66a>] __alloc_pages+0x2fa/0x312
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810aac90>] kmem_getpages+0x9d/0x16e
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810ab576>] fallback_alloc+0x125/0x1b3
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810ab725>] ____cache_alloc_node+0x121/0x136
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810ab19f>] kmem_cache_alloc_node+0xac/0xf5
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810ab20c>] __kmalloc_node+0x24/0x29
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8120ccdf>] __alloc_skb+0x69/0x133
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8120d93e>] __netdev_alloc_skb+0x31/0x4d
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8812ad25>] :e1000:e1000_clean_rx_irq+0x208/0x4d1
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8812810a>] :e1000:e1000_clean+0x78/0x24f
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff81214242>] net_rx_action+0xab/0x1e9
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff81043d5a>] ksoftirqd+0x16a/0x26f
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff81043bf0>] ? ksoftirqd+0x0/0x26f
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff81043bf0>] ? ksoftirqd+0x0/0x26f
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810526f7>] kthread+0x49/0x76
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8100d078>] child_rip+0xa/0x12
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff810526ae>] ? kthread+0x0/0x76
May 14 15:42:42 ptparedge29 kernel:  [<ffffffff8100d06e>] ? child_rip+0x0/0x12
May 14 15:42:42 ptparedge29 kernel:
May 14 15:42:42 ptparedge29 kernel: Mem-info:
May 14 15:42:42 ptparedge29 kernel: Node 0 Normal per-cpu:
May 14 15:42:42 ptparedge29 kernel: CPU    0: Hot: hi:  186, btch:  31 usd: 134   Cold: hi:   62, btch:  15 usd:  51
May 14 15:42:42 ptparedge29 kernel: CPU    1: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    2: Hot: hi:  186, btch:  31 usd:  57   Cold: hi:   62, btch:  15 usd:  49
May 14 15:42:42 ptparedge29 kernel: CPU    3: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    4: Hot: hi:  186, btch:  31 usd: 163   Cold: hi:   62, btch:  15 usd:  52
May 14 15:42:42 ptparedge29 kernel: CPU    5: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    6: Hot: hi:  186, btch:  31 usd:  50   Cold: hi:   62, btch:  15 usd:  54
May 14 15:42:42 ptparedge29 kernel: CPU    7: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA per-cpu:
May 14 15:42:42 ptparedge29 kernel: CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA32 per-cpu:
May 14 15:42:42 ptparedge29 kernel: CPU    0: Hot: hi:  186, btch:  31 usd: 172   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    1: Hot: hi:  186, btch:  31 usd:  63   Cold: hi:   62, btch:  15 usd:  55
May 14 15:42:42 ptparedge29 kernel: CPU    2: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    3: Hot: hi:  186, btch:  31 usd:  45   Cold: hi:   62, btch:  15 usd:  61
May 14 15:42:42 ptparedge29 kernel: CPU    4: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    5: Hot: hi:  186, btch:  31 usd: 157   Cold: hi:   62, btch:  15 usd:  49
May 14 15:42:42 ptparedge29 kernel: CPU    6: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    7: Hot: hi:  186, btch:  31 usd: 162   Cold: hi:   62, btch:  15 usd:  59
May 14 15:42:42 ptparedge29 kernel: Node 1 Normal per-cpu:
May 14 15:42:42 ptparedge29 kernel: CPU    0: Hot: hi:  186, btch:  31 usd:  94   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    1: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  59
May 14 15:42:42 ptparedge29 kernel: CPU    2: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    3: Hot: hi:  186, btch:  31 usd:  34   Cold: hi:   62, btch:  15 usd:  52
May 14 15:42:42 ptparedge29 kernel: CPU    4: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    5: Hot: hi:  186, btch:  31 usd: 107   Cold: hi:   62, btch:  15 usd:  54
May 14 15:42:42 ptparedge29 kernel: CPU    6: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:   0
May 14 15:42:42 ptparedge29 kernel: CPU    7: Hot: hi:  186, btch:  31 usd: 163   Cold: hi:   62, btch:  15 usd:  53
May 14 15:42:42 ptparedge29 kernel: Active:2510385 inactive:3132127 dirty:15783 writeback:68 unstable:32
May 14 15:42:42 ptparedge29 kernel:  free:13295 slab:394117 mapped:40025 pagetables:9734 bounce:0
May 14 15:42:42 ptparedge29 kernel: Node 0 Normal free:3512kB min:9920kB low:12400kB high:14880kB active:5386620kB inactive:6106804kB present:12288000kB pages_scanned:0 all_unreclaimable? no
May 14 15:42:42 ptparedge29 kernel: lowmem_reserve[]: 0 0 0 0
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA free:10228kB min:4kB low:4kB high:4kB active:0kB inactive:0kB present:9572kB pages_scanned:0 all_unreclaimable? no
May 14 15:42:42 ptparedge29 kernel: lowmem_reserve[]: 0 2950 11950 11950
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA32 free:36896kB min:2440kB low:3048kB high:3660kB active:736680kB inactive:1413056kB present:3021796kB pages_scanned:0 all_unreclaimable? no
May 14 15:42:42 ptparedge29 kernel: lowmem_reserve[]: 0 0 9000 9000
May 14 15:42:42 ptparedge29 kernel: Node 1 Normal free:2544kB min:7440kB low:9300kB high:11160kB active:3918240kB inactive:5008648kB present:9216000kB pages_scanned:576 all_unreclaimable? no
May 14 15:42:42 ptparedge29 kernel: lowmem_reserve[]: 0 0 0 0
May 14 15:42:42 ptparedge29 kernel: Node 0 Normal: 57*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3556kB
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA: 3*4kB 3*8kB 1*16kB 2*32kB 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 10228kB
May 14 15:42:42 ptparedge29 kernel: Node 1 DMA32: 7163*4kB 504*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 36780kB
May 14 15:42:42 ptparedge29 kernel: Node 1 Normal: 38*4kB 6*8kB 6*16kB 4*32kB 2*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2472kB
May 14 15:42:42 ptparedge29 kernel: Swap cache: add 103335, delete 81317, find 0/0, race 0+0
May 14 15:42:42 ptparedge29 kernel: Free swap  = 16363548kB
May 14 15:42:42 ptparedge29 kernel: Total swap = 16777208kB
May 14 15:42:42 ptparedge29 kernel: Free swap:       16363548kB
May 14 15:42:42 ptparedge29 kernel: 6291456 pages of RAM
May 14 15:42:42 ptparedge29 kernel: 201278 reserved pages
May 14 15:42:42 ptparedge29 kernel: 5075485 pages shared
May 14 15:42:42 ptparedge29 kernel: 21986 pages swap cached
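The failing request was order:0 (a single page) with mode:0x20. Assuming the 2.6.24-era bit values in include/linux/gfp.h, 0x20 is __GFP_HIGH on its own, which is GFP_ATOMIC: the allocation may not sleep, so it cannot wait for reclaim. The Python sketch below decodes that mode and re-runs a simplified version of the kernel's zone_watermark_ok() check against the Mem-info numbers above. It assumes the min watermark is halved for __GFP_HIGH and cut by a further quarter for a non-sleeping request, and that lowmem_reserve[2] (the ZONE_NORMAL entry) applies because the request targets ZONE_NORMAL. Treat it as an illustration of the arithmetic, not a reimplementation of this kernel's allocator.

```python
# Assumed 2.6.24-era low GFP bits (include/linux/gfp.h); other kernels may differ.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

PAGE_KB = 4  # 4 kB pages

# Copied from the Mem-info dump: (zone, free kB, min kB, lowmem_reserve[2] in pages).
ZONES = [
    ("Node 0 Normal", 3512, 9920, 0),
    ("Node 1 DMA", 10228, 4, 11950),
    ("Node 1 DMA32", 36896, 2440, 9000),
    ("Node 1 Normal", 2544, 7440, 0),
]


def decode_mode(mode):
    """Return the names of the known GFP bits set in a 'mode:0x..' value."""
    return [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]


def watermark_ok(free_kb, min_kb, reserve_pages, alloc_high=True, alloc_harder=True):
    """Simplified order-0 version of zone_watermark_ok(), working in pages."""
    free = free_kb // PAGE_KB
    mark = min_kb // PAGE_KB
    if alloc_high:    # __GFP_HIGH: may dip into half of the min reserve
        mark -= mark // 2
    if alloc_harder:  # cannot sleep (no __GFP_WAIT): a further quarter off
        mark -= mark // 4
    return free > mark + reserve_pages


if __name__ == "__main__":
    print("mode 0x20 ->", decode_mode(0x20))
    for name, free_kb, min_kb, reserve in ZONES:
        print(f"{name:14} free={free_kb:6}kB ok={watermark_ok(free_kb, min_kb, reserve)}")
```

With these inputs every zone fails the check: both Normal zones sit below even the reduced min watermark, and the DMA and DMA32 zones are held back by their lowmem_reserve for Normal-zone requests. That is consistent with an order:0 atomic failure on a machine with 6291456 pages (24 GB) of RAM.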

- Kernel is 2.6.24.7-108.el5rt
- sirq-net-rx threads run at priority 75, SCHED_FIFO (FF)

Comment 2 Clark Williams 2010-05-14 20:26:27 UTC
Jon,

That's a very old release; the current release is 2.6.24.7-149. Please ask them to try that kernel.

Also, the correct way to change IRQ affinity is through the /proc/irq entries (e.g. /proc/irq/11/smp_affinity sets the affinity of IRQ 11). Changing the affinity of the IRQ thread has no effect: the thread is tied to the IRQ's affinity setting, so even if you run taskset on it, it will be reset to whatever the IRQ affinity is.

The safest way to do isolation and affinity is to use 'tuna':

# tuna --cpu 0 --isolate --irq eth* --move

Run the above from a boot script: it moves everything off CPU 0, then moves all the eth* IRQs onto CPU 0.

Comment 3 Jon Thomas 2010-05-14 20:57:32 UTC
hmm, 

Are you saying 2297/2298 are PIDs and not IRQs? Am I misreading something here?

It seems IRQs 2297 and 2298 exist, and they appear to be set to CPU 0:

 cat proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
...
2297: 1311775712          0          0          0          0          0         50          0   PCI-MSI-edge      eth1
2298:  121775118          0          0          0          0          0       1482          0   PCI-MSI-edge      eth0
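The per-CPU columns are what confirm the affinity: IRQ 2297 (eth1) shows 1311775712 interrupts on CPU0 and 50 on CPU6, and IRQ 2298 (eth0) shows 121775118 on CPU0 and 1482 on CPU6. A minimal Python sketch for checking this programmatically, assuming the layout shown above (a CPU header row, then the IRQ number, one count per CPU, the interrupt type and the device name); parse_interrupts is a hypothetical helper, not part of any tool mentioned in this bug:

```python
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
2297: 1311775712          0          0          0          0          0         50          0   PCI-MSI-edge      eth1
2298:  121775118          0          0          0          0          0       1482          0   PCI-MSI-edge      eth0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (per_cpu_counts, device)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip named rows (NMI, LOC, ERR, ...) and elisions like "..."
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        device = fields[-1] if len(fields) > ncpus else ""
        result[int(label)] = (counts, device)
    return result


if __name__ == "__main__":
    for irq, (counts, dev) in sorted(parse_interrupts(SAMPLE).items()):
        busiest = max(range(len(counts)), key=counts.__getitem__)
        share = counts[busiest] / sum(counts)
        print(f"IRQ {irq} ({dev}): CPU{busiest} handled {share:.4%} of interrupts")
```

To check a live system, the same function can be fed the contents of /proc/interrupts instead of SAMPLE.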

from lspci:

04:00.1 0200: 8086:105e (rev 06)
	Subsystem: 8086:135e
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 2297

Comment 4 Clark Williams 2010-05-14 21:11:13 UTC
Argh, I misread the listing above; you are correct, they're IRQs.

So yeah, they're doing affinity correctly.

I'd still like to see them run the latest .24 kernel, though...

Comment 5 Jon Thomas 2010-05-14 21:27:20 UTC
Whew, I thought I was losing my mind for a minute.

yeah, I'll see if they can upgrade to the newer release. 

thanks for the help