Bug 817636

Summary: BUG: soft lockup - CPU#2 stuck for 22s!
Product: [Fedora] Fedora
Reporter: g. artim <gartim>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-16 20:30:52 UTC
Type: Bug

Description g. artim 2012-04-30 17:30:30 UTC
Description of problem:

Got a soft lockup message for CPU 2 in the logs.


Version-Release number of selected component (if applicable):
smolt profile:
General
=================================
UUID: 82aeec7f-a6db-457c-9120-27706baea2fe
OS: Fedora release 16 (Verne)
Default run level: Unknown
Language: en_US.UTF-8
Platform: x86_64
BogoMIPS: 7005.75
CPU Vendor: GenuineIntel
CPU Model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
CPU Stepping: 7
CPU Family: 6
CPU Model Num: 42
Number of CPUs: 8
CPU Speed: 3401
System Memory: 7893
System Swap: 6015
Vendor: System manufacturer
System: System Product Name System Version
Form factor: Desktop
Kernel: 3.3.2-6.fc16.x86_64
SELinux Enabled: 0
SELinux Policy: targeted
SELinux Enforce: Unknown
MythTV Remote: Unknown
MythTV Role: Unknown
MythTV Theme: Unknown
MythTV Plugin: 
MythTV Tuner: -1


Devices
=================================
(6945:4224:6945:4224) pci, None, PCI/PCI, N/A
(4358:12356:4163:33278) pci, firewire_ohci, FIREWIRE, M4A series motherboard
(32902:256:4163:33869) pci, agpgart-intel, HOST/PCI, P8P67 Deluxe Motherboard
(32902:261:4163:33869) pci, pcieport, PCI/PCI, Xeon E3-1200/2nd Generation Co
(32902:257:4163:33869) pci, pcieport, PCI/PCI, Xeon E3-1200/2nd Generation Co
(32902:4220:32902:4982) pci, e1000, ETHERNET, PRO/1000 GT Desktop Adapter
(32902:7202:4163:33869) pci, i801_smbus, SERIAL, P8P67 Deluxe Motherboard
(32902:7170:4163:33869) pci, ahci, STORAGE, P8P67 Deluxe Motherboard
(32902:7236:4163:33869) pci, None, PCI/ISA, Z68 Express Chipset Family LPC Co
(32902:5379:4163:33948) pci, e1000e, ETHERNET, P8P67 Deluxe Motherboard
(32902:7188:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(36869:645:36869:694) pci, aacraid, RAID, ASR5805
(6987:37234:4163:33911) pci, ahci, STORAGE, N/A
(32902:7213:4163:33869) pci, ehci_hcd, USB, P8P67 Deluxe Motherboard
(32902:7206:4163:33869) pci, ehci_hcd, USB, P8P67 Deluxe Motherboard
(32902:7226:4163:33869) pci, None, SIMPLE, P8P67 Deluxe Motherboard
(6945:4162:4163:33928) pci, xhci_hcd, USB, ASM1042 SuperSpeed USB Host Contro
(36869:645:36869:694) pci, aacraid, RAID, ASR5805
(32902:258:4163:33869) pci, i915, VIDEO, 2nd Generation Core Processor Family
(32902:7190:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(32902:7184:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(32902:7186:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(32902:9294:4163:33869) pci, None, PCI/PCI, 82801 PCI Bridge
(32902:7198:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(32902:7192:4163:33869) pci, pcieport, PCI/PCI, 6 Series/C200 Series Chipset 
(32902:7200:4163:33808) pci, snd_hda_intel, MULTIMEDIA, 6 Series/C200 Series 
(6523:9058:4163:33888) pci, ahci, STORAGE, P8P67 Deluxe Motherboard
(32902:4190:32902:4958) pci, e1000e, ETHERNET, PRO/1000 PT Dual Port Server A
(32902:4190:32902:4958) pci, e1000e, ETHERNET, PRO/1000 PT Dual Port Server A


Filesystem Information
=================================
device mtpt type bsize frsize blocks bfree bavail file ffree favail
-------------------------------------------------------------------
/dev/mapper/VolGroup-lv_root / ext4 4096 4096 11894941 7936068 7340100 298188
/dev/sda1 /boot ext4 1024 1024 508745 410243 384643 128016 127785 127785
/dev/mapper/VolGroup-lv_home /home ext4 4096 4096 5930326 1570859 1273490 148



How reproducible:

Unsure; nothing was happening on the machine when this occurred.


Steps to Reproduce:
1.
2.
3.
  
Actual results:

Apr 28 02:15:08 localhost kernel: [229181.222297] BUG: soft lockup - CPU#2 stuck for 22s! [irqbalance:1010]
Apr 28 02:15:08 localhost kernel: [229181.222300] Modules linked in: xfs rfcomm xt_iprange bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack btusb snd_hda_codec_hdmi snd_hda_codec_realtek eeepc_wmi asus_wmi sparse_keymap ath3k bluetooth rfkill microcode serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000 snd_pcm uinput snd_timer snd soundcore snd_page_alloc e1000e(O) nfsd lockd nfs_acl auth_rpcgss sunrpc mxm_wmi firewire_ohci firewire_core crc_itu_t aacraid wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Apr 28 02:15:08 localhost kernel: [229181.222331] CPU 2 
Apr 28 02:15:08 localhost kernel: [229181.222332] Modules linked in: xfs rfcomm xt_iprange bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack btusb snd_hda_codec_hdmi snd_hda_codec_realtek eeepc_wmi asus_wmi sparse_keymap ath3k bluetooth rfkill microcode serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000 snd_pcm uinput snd_timer snd soundcore snd_page_alloc e1000e(O) nfsd lockd nfs_acl auth_rpcgss sunrpc mxm_wmi firewire_ohci firewire_core crc_itu_t aacraid wmi i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Apr 28 02:15:08 localhost kernel: [229181.222356] 
Apr 28 02:15:08 localhost kernel: [229181.222358] Pid: 1010, comm: irqbalance Tainted: G           O 3.3.2-1.fc16.x86_64 #1 System manufacturer System Product Name/P8Z68-V PRO
Apr 28 02:15:08 localhost kernel: [229181.222362] RIP: 0010:[<ffffffff810935cd>]  [<ffffffff810935cd>] rebalance_domains+0x9d/0x180
Apr 28 02:15:08 localhost kernel: [229181.222368] RSP: 0000:ffff88022fa83e00  EFLAGS: 00000282
Apr 28 02:15:08 localhost kernel: [229181.222370] RAX: 0000000000000000 RBX: ffff88022fa83e2c RCX: 0000000000000200
Apr 28 02:15:08 localhost kernel: [229181.222371] RDX: 0000000000000200 RSI: ffff88022fa83e2c RDI: ffff88022700ba00
Apr 28 02:15:08 localhost kernel: [229181.222373] RBP: ffff88022fa83e60 R08: ffff88022fa83e2c R09: 0000000000000000
Apr 28 02:15:08 localhost kernel: [229181.222374] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88022fa83d78
Apr 28 02:15:08 localhost kernel: [229181.222376] R13: ffffffff815fd05e R14: ffff88022fa83e60 R15: 0000000000000320
Apr 28 02:15:08 localhost kernel: [229181.222378] FS:  00007fe66329c740(0000) GS:ffff88022fa80000(0000) knlGS:0000000000000000
Apr 28 02:15:08 localhost kernel: [229181.222380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 28 02:15:08 localhost kernel: [229181.222382] CR2: 00007fe6632bb000 CR3: 000000021aa05000 CR4: 00000000000406e0
Apr 28 02:15:08 localhost kernel: [229181.222383] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 28 02:15:08 localhost kernel: [229181.222385] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 28 02:15:08 localhost kernel: [229181.222387] Process irqbalance (pid: 1010, threadinfo ffff88021fc74000, task ffff880221e31730)
Apr 28 02:15:08 localhost kernel: [229181.222388] Stack:
Apr 28 02:15:08 localhost kernel: [229181.222389]  0000000000000000 0000000000000000 ffff88022fa93680 000000022fa80000
Apr 28 02:15:08 localhost kernel: [229181.222392]  0000000000000e80 00000000000101de 0000000000000000 ffffffff81c040b8
Apr 28 02:15:08 localhost kernel: [229181.222395]  0000000000000002 0000000000000002 0000000000013680 0000000000013680
Apr 28 02:15:08 localhost kernel: [229181.222398] Call Trace:
Apr 28 02:15:08 localhost kernel: [229181.222399]  <IRQ> 
Apr 28 02:15:08 localhost kernel: [229181.222402]  [<ffffffff81093700>] run_rebalance_domains+0x50/0x170
Apr 28 02:15:08 localhost kernel: [229181.222405]  [<ffffffff8105f0e8>] __do_softirq+0xb8/0x230
Apr 28 02:15:08 localhost kernel: [229181.222409]  [<ffffffff810362c4>] ? ack_apic_level+0x74/0x140
Apr 28 02:15:08 localhost kernel: [229181.222413]  [<ffffffff815fda5c>] call_softirq+0x1c/0x30
Apr 28 02:15:08 localhost kernel: [229181.222416]  [<ffffffff81016455>] do_softirq+0x65/0xa0
Apr 28 02:15:08 localhost kernel: [229181.222418]  [<ffffffff8105f4fe>] irq_exit+0x9e/0xc0
Apr 28 02:15:08 localhost kernel: [229181.222421]  [<ffffffff815fe2c3>] do_IRQ+0x63/0xe0
Apr 28 02:15:08 localhost kernel: [229181.222424]  [<ffffffff815f486e>] common_interrupt+0x6e/0x6e
Apr 28 02:15:08 localhost kernel: [229181.222425]  <EOI> 
Apr 28 02:15:08 localhost kernel: [229181.222427]  [<ffffffff815fc5a9>] ? system_call_fastpath+0x16/0x1b
Apr 28 02:15:08 localhost kernel: [229181.222429] Code: e0 48 03 43 58 48 8b 15 52 6a c4 00 48 39 c2 78 34 48 8b 75 b0 8b 7d bc 4c 8d 45 cc 44 89 f1 48 89 da 44 89 4d a8 e8 a3 f5 ff ff <85> c0 b8 01 00 00 00 44 0f 45 f0 48 8b 05 21 6a c4 00 48 89 43 
Apr 28 02:15:08 localhost kernel: [229181.222451] Call Trace:
Apr 28 02:15:08 localhost kernel: [229181.222452]  <IRQ>  [<ffffffff81093700>] run_rebalance_domains+0x50/0x170
Apr 28 02:15:08 localhost kernel: [229181.222456]  [<ffffffff8105f0e8>] __do_softirq+0xb8/0x230
Apr 28 02:15:08 localhost kernel: [229181.222458]  [<ffffffff810362c4>] ? ack_apic_level+0x74/0x140
Apr 28 02:15:08 localhost kernel: [229181.222461]  [<ffffffff815fda5c>] call_softirq+0x1c/0x30
Apr 28 02:15:08 localhost kernel: [229181.222463]  [<ffffffff81016455>] do_softirq+0x65/0xa0
Apr 28 02:15:08 localhost kernel: [229181.222465]  [<ffffffff8105f4fe>] irq_exit+0x9e/0xc0
Apr 28 02:15:08 localhost kernel: [229181.222468]  [<ffffffff815fe2c3>] do_IRQ+0x63/0xe0
Apr 28 02:15:08 localhost kernel: [229181.222470]  [<ffffffff815f486e>] common_interrupt+0x6e/0x6e
Apr 28 02:15:08 localhost kernel: [229181.222471]  <EOI>  [<ffffffff815fc5a9>] ? system_call_fastpath+0x16/0x1b
Apr 28 02:15:09 localhost abrt-dump-oops[1003]: abrt-dump-oops: Found oopses: 1
Apr 28 02:15:09 localhost abrt-dump-oops[1003]: abrt-dump-oops: Creating dump directories
Apr 28 02:15:09 localhost abrtd: Directory 'oops-2012-04-28-02:15:09-1003-0' creation detect
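
For anyone checking their own logs for the same message, the following is a minimal sketch (not part of the original report) that pulls the CPU number, stall duration, process name, and PID out of "BUG: soft lockup" lines. The function name find_soft_lockups and the regular expression are illustrative assumptions based only on the message format shown in the log above; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the line seen in this report, e.g.
#   "... BUG: soft lockup - CPU#2 stuck for 22s! [irqbalance:1010]"
# This format is an assumption taken from the log above, not a kernel guarantee.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup message found in an iterable of log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


# Sample input: the first log line from this report plus an unrelated line.
sample = [
    "Apr 28 02:15:08 localhost kernel: [229181.222297] BUG: soft lockup - "
    "CPU#2 stuck for 22s! [irqbalance:1010]",
    "Apr 28 02:15:09 localhost abrt-dump-oops[1003]: abrt-dump-oops: Found oopses: 1",
]
for event in find_soft_lockups(sample):
    print(event)
```

The same function can be fed an open log file object instead of the sample list, since it only iterates over lines.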

Expected results:


Additional info:

I do have the most current Intel e1000e driver installed so that my NIC works; the version shipped with current Fedora gets continuous resets. I have not reported that yet.

Comment 1 Dave Jones 2012-07-16 18:17:20 UTC
Has there been any recurrence of this on newer kernels?

Comment 2 g. artim 2012-07-16 18:52:56 UTC
No, seems okay now.