Created attachment 361076 [details]
lockdep warning log

Description of problem:
Shortly after booting a system that uses mdraid with external metadata support to access 2 BIOS RAID 10 sets sharing 4 disks, I get the attached locking-inconsistency warning. This might be related to one of the sets being unclean and needing a resync (so mdmon is actively syncing the set).

If I do something that causes a significant amount of disk IO while the sync is running, the kernel locks up, and I get hung / stuck task detected messages every 120 seconds, so this warning seems to be very real. Let me know if you want me to hook up a serial cable and capture the stuck task reports.

This is with:
kernel-2.6.31-2.fc12.i686.PAE
and older kernels too.
Created attachment 361145 [details]
Proposed patch: upgrade sysfs_open_dirent_lock to spin_lock_bh
The attached patch simply upgrades the lock. It makes sysfs_notify_dirent() more useful and is cleaner than adding logic to md to delay the notification to process context.
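For readers without the attachment handy, a minimal sketch of the kind of change described (this is not the actual patch; the function body shown is only illustrative of the 2.6.31-era sysfs_notify_dirent()):

```c
/* Sketch only, not the proposed patch itself: the idea is to replace the
 * plain spin_lock()/spin_unlock() on sysfs_open_dirent_lock with the
 * softirq-safe _bh variants, so sysfs_notify_dirent() can be called from
 * timer/softirq context (as md does) as well as from process context
 * without lockdep flagging an inconsistent lock state.
 */
void sysfs_notify_dirent(struct sysfs_dirent *sd)
{
	struct sysfs_open_dirent *od;

	spin_lock_bh(&sysfs_open_dirent_lock);	/* was: spin_lock() */

	od = sd->s_attr.open;
	if (od) {
		atomic_inc(&od->event);
		wake_up_interruptible(&od->poll);
	}

	spin_unlock_bh(&sysfs_open_dirent_lock);	/* was: spin_unlock() */
}
```

Disabling bottom halves while the lock is held prevents a softirq on the same CPU from re-acquiring it, which is the lock-state inconsistency lockdep reported.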
Neil already sent a fix for this [1] in response to bz#515471.

[1]: http://marc.info/?l=linux-kernel&m=124953744023803&w=2

*** This bug has been marked as a duplicate of bug 515471 ***
Ok, I've built a kernel with this fix in and the lockdep report is gone, but I still get deadlocks with my two-RAID10-set setup. I'll attach dmesg output of a machine with all processes trying to use the raid sets hanging.

Re-opening this one to track the deadlock case.
Created attachment 361300 [details]
dmesg output of a machine with all processes trying to use the raid sets hanging
It looks like the processes are waiting for mdmon to write 'active' to /sys/block/md*/md/array_state. Is mdmon still running at this point? Can you dump array_state to confirm that we are stuck at 'write-pending'?

Finally, can you say a bit more about what userspace is doing at this point? In the log the arrays are bouncing up and down (starting/stopping).
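A quick way to gather both data points on the affected machine might look like the following (a sketch, not from the bug report; the sysfs paths are the standard md locations, but the md* glob should be narrowed to the arrays in question):

```shell
# Hypothetical diagnostic helper: report whether mdmon is still alive,
# then dump array_state for every md array present so a 'write-pending'
# hang is visible at a glance.
check_md_state() {
    if pgrep mdmon >/dev/null 2>&1; then
        echo "mdmon is running"
    else
        echo "mdmon is NOT running"
    fi
    for f in /sys/block/md*/md/array_state; do
        # The glob stays literal when no md arrays exist; skip it then.
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
check_md_state
```

If mdmon has died, no one is left to transition the array out of 'write-pending', which would explain every writer blocking.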
(In reply to comment #6)
> It looks like the processes are waiting for mdmon to write 'active' to
> /sys/block/md*/md/array_state. Is mdmon still running at this point? Can you
> dump array_state to confirm that we are stuck at 'write-pending'?

When I hit this again (if I hit this again) I'll be sure to try and gather all this info.

> Finally can
> you say a bit more about what userspace is doing at this point, in the log the
> arrays are bouncing up and down (starting/stopping)?

That is correct: the mdraid container code is shared with the normal mdraid handling code in anaconda, and during install everything first gets scanned (so started) and then torn down again, so that, for example, partitions used as part of a native mdraid set can be repurposed to hold a PV or whatever. So the arrays are stopped / started several times.

Thanks for the hint that this might be mdmon, though. It helped me fix a big problem with my test machine no longer booting at all, which was caused by an mdmon segfault inside the initrd. I've written a patch fixing this, see bug 523860.
Created attachment 361436 [details]
Picture of another mdraid related stuck task

A slightly different call trace from another stuck mdraid task, this time during the initrd. Note that this initrd still has the crashy mdmon; I was trying to boot the machine to regenerate the initrd and it hung. So this could very well be another mdmon-no-longer-running case.

I'll attach another call trace picture from the same boot, which is yet again slightly different.
Created attachment 361437 [details]
Picture of another mdraid related stuck task (2)
I no longer seem to be seeing this now that I've managed to keep mdmon from crashing. Adjusting summary.
Ok, I can no longer reproduce this with the necessary patches in place to properly handle mdmon handover from the initrd to the running system and to not kill mdmon on reboot / halt. Closing.