Escalated to Bugzilla from IssueTracker
Description of problem:
On a RHEL 5.3 system, the customer created a RAID6 array with 24 disks, which results in a kernel panic and a call trace in the messages file. The issue is also reproducible when trying to re-sync the array after removing a disk.

# mdadm --create /dev/md_d1 --metadata=1.0 --level=6 --raid-devices=24 /dev/sdd1 /dev/sdh1 /dev/sdl1 /dev/sdp1 /dev/sdt1 /dev/sdx1 /dev/sdab1 /dev/sdaf1 /dev/sdaj1 /dev/sdan1 /dev/sdar1 /dev/sdav1 /dev/sdc1 /dev/sdg1 /dev/sdk1 /dev/sdo1 /dev/sds1 /dev/sdw1 /dev/sdaa1 /dev/sdae1 /dev/sdai1 /dev/sdam1 /dev/sdaq1 /dev/sdau1

dmesg:
Jan 27 17:05:14 sunstorage01 kernel: md: syncing RAID array md_d1
Jan 27 17:05:14 sunstorage01 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jan 27 17:05:14 sunstorage01 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jan 27 17:05:14 sunstorage01 kernel: md: using 128k window, over a total of 976759808 blocks.
Jan 27 17:05:14 sunstorage01 kernel: md_d1: unknown partition table
Jan 27 17:05:26 sunstorage01 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [md_d1_raid5:6923]
Jan 27 17:05:26 sunstorage01 kernel: CPU 1:
Jan 27 17:05:26 sunstorage01 kernel: Modules linked in: raid456 xor ipv6 xfrm_nalgo crypto_api mptctl autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp l2cap bluetooth sunrpc dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sg shpchp i2c_nforce2 mlx4_core forcedeth i2c_core pcspkr usb_storage mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jan 27 17:05:26 sunstorage01 kernel: Pid: 6923, comm: md_d1_raid5 Not tainted 2.6.18-92.1.22.el5 #1
Jan 27 17:05:56 sunstorage01 kernel: RIP: 0010:[<ffffffff8000b5c7>]  [<ffffffff8000b5c7>] memcmp+0x1a/0x22
Jan 27 17:05:56 sunstorage01 kernel: RSP: 0018:ffff811027d09cd8  EFLAGS: 00000206
Jan 27 17:05:56 sunstorage01 kernel: RAX: 0000000000000000 RBX: ffff81080b7c60b8 RCX: 0000000000000665
Jan 27 17:05:56 sunstorage01 kernel: RDX: 0000000000000000 RSI: ffff81080b7d899b RDI: ffff81081574c99b
Jan 27 17:05:56 sunstorage01 kernel: RBP: ffff81080b7d6000 R08: ffff810001000810 R09: 0000000000001000
Jan 27 17:05:56 sunstorage01 kernel: R10: ffff81080b7d9000 R11: ffff81080b7d8000 R12: ffff81080b7d5000
Jan 27 17:05:56 sunstorage01 kernel: R13: ffff81080b7d4000 R14: ffff81080b7d3000 R15: ffff81080b7d2000
Jan 27 17:05:56 sunstorage01 kernel: FS: 00002b47d593bb40(0000) GS:ffff81011c9608c0(0000) knlGS:0000000000000000
Jan 27 17:05:56 sunstorage01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jan 27 17:05:56 sunstorage01 kernel: CR2: 00000038a806abe0 CR3: 000000081aef4000 CR4: 00000000000006e0
Jan 27 17:05:56 sunstorage01 kernel:
Jan 27 17:05:56 sunstorage01 kernel: Call Trace:
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff883f7d10>] :raid456:handle_stripe+0xe5b/0x24d4
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff800892a6>] __wake_up_common+0x3e/0x68
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8002e284>] __wake_up+0x38/0x4f
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8009dc54>] keventd_create_kthread+0x0/0xc4
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8009dc54>] keventd_create_kthread+0x0/0xc4
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff883f94d6>] :raid456:raid5d+0x14d/0x17b
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8003a940>] prepare_to_wait+0x34/0x5c
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff802030c7>] md_thread+0xf8/0x10e
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8009de6c>] autoremove_wake_function+0x0/0x2e
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff80202fcf>] md_thread+0x0/0x10e
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff80032569>] kthread+0xfe/0x132
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8009dc54>] keventd_create_kthread+0x0/0xc4
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8003246b>] kthread+0x0/0x132
Jan 27 17:05:56 sunstorage01 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

How reproducible: Random

Steps to Reproduce:
Per the customer's comments, the issue is reproducible most of the time while creating the array and while re-syncing the devices, simply by using the command given in the problem description.

Actual results: Call trace is visible and a vmcore is generated.

Expected results: RAID creation and resync should complete smoothly with no call trace.

Additional info: We tried reproducing the issue in a test environment, but without success. The vmcore was not uploaded to the ftp site by the customer, and we have set up the vmcore on a test system: IP 10.65.6.56, root/redhat.

# crash /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux vmcore-1887814

I am also uploading the core to the ftp site under the name vmcore-1887814.gz.

This event sent from IssueTracker by djeffery  [Support Engineering Group]
issue 268390
With large numbers of disks and raid6, it seems the raid5 worker thread is saturating the CPU, and there is no guarantee it will ever schedule in a timely manner. Instead, it is stuck in a loop where new work is continuously added as it finishes older work. As an example, in one case while debugging the issue, over 1 million iterations of the loop in raid5d() occurred before the loop was finally broken; it took almost 3.5 minutes to leave the raid5d() function.
Created attachment 344232 [details] add cond_resched to loop in function raid5d

After adding a call to cond_resched() in raid5d() at a point where no locks are held, I have not been able to reproduce the soft lockup. The raid5 worker thread still runs almost continuously, but it no longer monopolizes a CPU while the resync thread feeds it requests.
This fix is in kernel-2.6.18-181.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5. Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified, and include a comment with verification details.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html