Description of problem: Using a Sun Fire x4500 and building 8 soft raid5 arrays of 5 disks, placed into one large lvm group, formatted with xfs... Using the kernel version 2.6.18-194.17.4.el5 and using my build script (attached below) the kernel will print many "kernel: BUG SOFT LOCKUP" entries to /var/log/messages and the console. Eventually the machine will stop responding. I have waited overnight for the machine to return, but it does not. Using the kernel version 2.6.18-194.8.1.el5 this does not occur. Version-Release number of selected component (if applicable): How reproducible: every time using the attached script and the latest kernel. Steps to Reproduce: 1.install rhel 5.5 to /dev/sdac and /dev/sdy (two raid1 md arrays, one 4g swap and one for / with remaining space using ext3) 2.install xfsprogs from ftp://oss.sgi.com/projects/xfs/cmd_srpms-oct_09/ 3.run attached script 4.script completes 5.within hours, machine locks up before array build is complete. Actual results: lockup (no ping, ssh, console), arrays build never completes. Expected results: load to exceed 14, machine to become slow, rebuilds to finish. Additional info: Each disk is 1TB This machine runs puppet every two hours. This machine currently has two disks SMART is complaining about (not being used in arrays.) example stack trace sent to /var/log/messages Nov 2 05:14:56 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 23s! [md3_raid5:3583] Nov 2 05:14:57 ra-nas2 kernel: CPU 3: Nov 2 05:14:57 ra-nas2 kernel: Modules linked in: cpufreq_ondemand powernow_k8 freq_table bonding ipv6 xfrm_nalgo crypto_api ipt_REJECT xt_tcpudp xt_state iptable_filter iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables loop dm_mirror dm_multipath scsi_dh raid456 xor video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev i2c_amd8111 i2c_amd756 i2c_core sg k8_edac k8temp edac_mc amd_rng pcspkr hwmon serio_raw e1000 dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage shpchp sata_mv libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd Nov 2 05:14:57 ra-nas2 kernel: Pid: 3583, comm: md3_raid5 Not tainted 2.6.18-194.17.4.el5 #1 Nov 2 05:14:57 ra-nas2 kernel: RIP: 0010:[<ffffffff80064b50>] [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9 Nov 2 05:14:57 ra-nas2 kernel: RSP: 0018:ffff8103fd42ddb8 EFLAGS: 00000287 Nov 2 05:14:57 ra-nas2 kernel: RAX: 0000000000000000 RBX: ffff8101ffda4800 RCX: 0000000000000000 Nov 2 05:14:57 ra-nas2 kernel: RDX: 0000000000000286 RSI: 0000000000000287 RDI: ffff8101ffda4850 Nov 2 05:14:57 ra-nas2 kernel: RBP: ffff8101ff1be528 R08: ffff8101ff1f31f0 R09: 00000001f97d2000 Nov 2 05:14:57 ra-nas2 kernel: R10: 00000000f97d1000 R11: ffff8101ff1f3200 R12: 0000000000000000 Nov 2 05:14:57 ra-nas2 kernel: R13: ffff8101ff1bc0e0 R14: 0000000000000220 R15: ffffffff880cbf30 Nov 2 05:14:57 ra-nas2 kernel: FS: 0000000041475940(0000) GS:ffff8102071b46c0(0000) knlGS:0000000000000000 Nov 2 05:14:57 ra-nas2 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Nov 2 05:14:57 ra-nas2 kernel: CR2: 00002aba6e3affe0 CR3: 0000000000201000 CR4: 00000000000006e0 Nov 2 05:14:57 ra-nas2 kernel: Nov 2 05:14:57 ra-nas2 kernel: Call Trace: Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff88084c70>] :scsi_mod:scsi_dispatch_cmd+0x27d/0x2ff Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8808a174>] :scsi_mod:scsi_request_fn+0x2c1/0x390 Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8005a85b>] generic_unplug_device+0x22/0x32 Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff88332063>] :raid456:unplug_slaves+0x63/0x99 Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8833735f>] :raid456:raid5d+0x175/0x180 Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8003acc1>] prepare_to_wait+0x34/0x61 Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8021b72e>] md_thread+0xf8/0x10e Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff800a09d4>] autoremove_wake_function+0x0/0x2e Nov 2 05:14:57 ra-nas2 kernel: [<ffffffff8021b636>] md_thread+0x0/0x10e Nov 2 05:14:58 ra-nas2 kernel: [<ffffffff8003290a>] kthread+0xfe/0x132 Nov 2 05:14:58 ra-nas2 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11 Nov 2 05:14:58 ra-nas2 kernel: [<ffffffff800a07bc>] keventd_create_kthread+0x0/0xc4 libsmb/cliconnect.c:cli_session_setup_spnego(877) Nov 2 05:14:58 ra-nas2 kernel: [<ffffffff8003280c>] kthread+0x0/0x132 Nov 2 05:14:58 ra-nas2 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 Second example stack trace: Nov 2 05:14:53 ra-nas2 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [rpmq:5131] Nov 2 05:14:53 ra-nas2 kernel: CPU 0: Nov 2 05:14:53 ra-nas2 kernel: Modules linked in: cpufreq_ondemand powernow_k8 freq_table bonding ipv6 xfrm_nalgo crypto_api ipt_REJECT xt_tcpudp xt_state iptable_filter iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables loop dm_mirror dm_multipath scsi_dh raid456 xor video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev i2c_amd8111 i2c_amd756 i2c_core sg k8_edac k8temp edac_mc amd_rng pcspkr hwmon serio_raw e1000 dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage shpchp sata_mv libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd Nov 2 05:14:53 ra-nas2 kernel: Pid: 5131, comm: rpmq Not tainted 2.6.18-194.17.4.el5 #1 Nov 2 05:14:53 ra-nas2 kernel: RIP: 0010:[<ffffffff800161f5>] [<ffffffff800161f5>] __bitmap_empty+0xf/0x62 Nov 2 05:14:53 ra-nas2 kernel: RSP: 0018:ffff8101ed23de18 EFLAGS: 00000212 Nov 2 05:14:53 ra-nas2 kernel: RAX: 0000000000000003 RBX: ffff810001003200 RCX: 00000000000000ff Nov 2 05:14:53 ra-nas2 kernel: RDX: 000000000000003f RSI: 00000000000000ff RDI: ffff810001003200 Nov 2 05:14:53 ra-nas2 kernel: RBP: 0000000000001000 R08: 0000000000000002 R09: 000000000000003e Nov 2 05:14:53 ra-nas2 kernel: R10: ffff8101ed23ddc8 R11: ffff8101ed37be48 R12: ffff8101ed23ddf8 Nov 2 05:14:53 ra-nas2 kernel: R13: 0000000000000000 R14: ffffffff8000c639 R15: ffff8101ed23de88 Nov 2 05:14:53 ra-nas2 kernel: FS: 00002b2336b01140(0000) GS:ffffffff803ca000(0000) knlGS:0000000000000000 Nov 2 05:14:53 ra-nas2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Nov 2 05:14:53 ra-nas2 kernel: CR2: 00002b2337390000 CR3: 00000001ecdec000 CR4: 00000000000006e0 Nov 2 05:14:53 ra-nas2 kernel: Nov 2 05:14:53 ra-nas2 kernel: Call Trace: Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff80023132>] flush_tlb_others+0x99/0xbc Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff8007698f>] flush_tlb_mm+0xca/0xd5 Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff80012d8e>] unmap_region+0x104/0x12b Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff80011e7f>] do_munmap+0x21b/0x29a Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff800645ab>] __down_write_nested+0x12/0x92 Nov 2 05:14:53 ra-nas2 kernel: [<ffffffff80016288>] sys_munmap+0x40/0x59 Nov 2 05:14:54 ra-nas2 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Third example stack trace: Nov 1 08:26:03 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 35s! [swapper:0] Nov 1 08:26:03 ra-nas2 kernel: CPU 3: Nov 1 08:26:03 ra-nas2 kernel: Modules linked in: xfs raid456 xor cpufreq_ondemand powernow_k8 freq_table ipt_REJECT xt_tcpudp xt_state iptable_filter iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables loop dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev e1000 i2c_amd756 k8temp sg i2c_amd8111 hwmon amd_rng i2c_core k8_edac pcspkr serio_raw edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage shpchp sata_mv libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd Nov 1 08:26:03 ra-nas2 kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.17.4.el5 #1 Nov 1 08:26:03 ra-nas2 kernel: RIP: 0010:[<ffffffff80064b50>] [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9 Nov 1 08:26:03 ra-nas2 kernel: RSP: 0018:ffff810107143d48 EFLAGS: 00000246 Nov 1 08:26:03 ra-nas2 kernel: RAX: 0000000000000000 RBX: ffff8101ff610800 RCX: 0000000000000001 Nov 1 08:26:03 ra-nas2 kernel: RDX: 0000000000000286 RSI: 0000000000000246 RDI: ffff8101ff610850 Nov 1 08:26:03 ra-nas2 kernel: RBP: ffff810107143cc0 R08: ffff810207204370 R09: ffff8101ff158000 Nov 1 08:26:03 ra-nas2 kernel: R10: ffff8103ec41d6c8 R11: ffffffff80045000 R12: ffffffff8005dc8e Nov 1 08:26:03 ra-nas2 kernel: R13: ffff8101f4eaa9c0 R14: ffffffff800781ff R15: ffff810107143cc0 Nov 1 08:26:03 ra-nas2 kernel: FS: 00002b985e74d190(0000) GS:ffff8102071b46c0(0000) knlGS:00000000f7f8f6c0 Nov 1 08:26:03 ra-nas2 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Nov 1 08:26:03 ra-nas2 kernel: CR2: 00002aaaae528000 CR3: 00000003fa934000 CR4: 00000000000006e0 Nov 1 08:26:03 ra-nas2 kernel: Nov 1 08:26:04 ra-nas2 kernel: Call Trace: Nov 1 08:26:04 ra-nas2 kernel: <IRQ> [<ffffffff88084c70>] :scsi_mod:scsi_dispatch_cmd+0x27d/0x2ff Nov 1 08:26:04 ra-nas2 kernel: [<ffffffff8808a174>] :scsi_mod:scsi_request_fn+0x2c1/0x390 Nov 1 08:26:04 ra-nas2 kernel: [<ffffffff8005c223>] blk_run_queue+0x41/0x73 Nov 1 08:26:05 ra-nas2 kernel: [<ffffffff8808879d>] :scsi_mod:scsi_run_queue+0x155/0x1bf Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff88088efe>] :scsi_mod:scsi_next_command+0x2d/0x39 Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff8808907d>] :scsi_mod:scsi_end_request+0xbf/0xcd Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff880891d9>] :scsi_mod:scsi_io_completion+0x14e/0x324 Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff880b6802>] :sd_mod:sd_rw_intr+0x252/0x28c Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff8808946e>] :scsi_mod:scsi_device_unbusy+0x67/0x81 Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff80037cac>] blk_done_softirq+0x5f/0x6d Nov 1 08:26:06 ra-nas2 kernel: [<ffffffff80012443>] __do_softirq+0x89/0x133 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8006cb8a>] do_softirq+0x2c/0x85 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8006ca12>] do_IRQ+0xec/0xf5 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8006b342>] default_idle+0x0/0x50 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa Nov 1 08:26:07 ra-nas2 kernel: <EOI> [<ffffffff8005a839>] generic_unplug_device+0x0/0x32 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8006b36b>] default_idle+0x29/0x50 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8004923a>] cpu_idle+0x95/0xb8 Nov 1 08:26:07 ra-nas2 kernel: [<ffffffff8007796b>] start_secondary+0x498/0x4a7 all lines containing "BUG" from /var/log/messages: Nov 1 06:47:24 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 64s! [swapper:0] Nov 1 08:26:03 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 35s! [swapper:0] Nov 1 08:26:08 ra-nas2 kernel: BUG: soft lockup - CPU#2 stuck for 35s! [md6_resync:9859] Nov 1 08:26:14 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 35s! [md9_resync:9898] Nov 1 12:26:12 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 35s! [md5_raid5:9841] Nov 1 12:26:14 ra-nas2 kernel: BUG: soft lockup - CPU#2 stuck for 35s! [md8_resync:9886] Nov 1 12:26:16 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 35s! [md5_resync:9845] Nov 2 05:14:53 ra-nas2 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [rpmq:5131] Nov 2 05:14:54 ra-nas2 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [rpmq:5131] Nov 2 05:14:55 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 23s! [md9_resync:3628] Nov 2 05:14:56 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 23s! [md3_raid5:3583] Nov 2 05:14:58 ra-nas2 kernel: BUG: soft lockup - CPU#2 stuck for 23s! [md3_resync:3585] Nov 2 13:14:41 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:38 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:40 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:46 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:47 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:49 ra-nas2 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [md3_raid5:24146] Nov 2 13:15:54 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 65s! [smartd:5143] Nov 2 13:15:55 ra-nas2 kernel: BUG: soft lockup - CPU#2 stuck for 65s! [md2_resync:24141] Nov 3 06:34:31 ra-nas2 kernel: BUG: soft lockup - CPU#1 stuck for 64s! [swapper:0]
Created attachment 457589 [details] The script used to build md arrays, setup lvm, format with xfs, and mount.
Created attachment 457590 [details] logs with line wrapping fixed.
Additional non-standard packages on this machine include ipmitool and hd from Sun's Driver and Tools disk for x4500 servers version 1.7.
One last note: this machine will have to go back into production next week so my ability to test will be reduced by the end of this week.
Created attachment 457596 [details] dmesg from the x4500
Sorry, previously stated 2.6.18-194.8.1.el5 also does not work. I am now trying kernel-2.6.18-164.15.1.el5 (the last 5.4 kernel)
kernel-2.6.18-164.15.1.el5 worked as expected.
I would appreciate advice on how to proceed. I'm a little stuck at the moment.
Is this still an issue for you?