Bug 600900
| Field | Value |
|---|---|
| Summary | Hot-pluggable RAID components sometimes not assembled correctly |
| Product | [Fedora] Fedora |
| Component | mdadm |
| Version | 13 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | low |
| Reporter | Piergiorgio Sartor <piergiorgio.sartor> |
| Assignee | Doug Ledford <dledford> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dledford, mschmidt, rmy |
| Fixed In Version | mdadm-3.1.3-0.git20100804.2.fc13 |
| Doc Type | Bug Fix |
| | 616596 (view as bug list) |
| Last Closed | 2010-12-07 20:12:56 UTC |
Description (Piergiorgio Sartor, 2010-06-06 15:30:38 UTC)
I'm seeing something similar, but with ATA devices, not USB. I have ten partitions across three ATA drives that are combined into five RAID1 volumes. Here's what they look like in F12:

```
Personalities : [raid1]
md123 : active raid1 sda5[0] sdb5[1]
      31463168 blocks [2/2] [UU]
md124 : active raid1 sda6[0] sdb6[1]
      30033408 blocks [2/2] [UU]
md125 : active raid1 sda7[0] sdb7[1]
      30796480 blocks [2/2] [UU]
md126 : active raid1 sda9[0] sdc10[1]
      41953600 blocks [2/2] [UU]
md127 : active raid1 sdc12[0] sda8[1]
      30788416 blocks [2/2] [UU]
unused devices: <none>
```

I installed F13 onto the partitions that used to hold F11 but, as is my custom, didn't tell anaconda what to do with the RAID volumes. Later I added them to fstab in F13. Initially there were problems that I took to be due to the line `AUTO +imsm +1.x -all` that anaconda had put in mdadm.conf; all my RAID partitions have 0.9 metadata. I commented out the AUTO line and put in ARRAY lines specifying the UUIDs of the RAID devices.

Now I find that, more often than not, F13 fails to assemble the arrays correctly. F12 always succeeds. In six boots of F13 the arrays were only properly built once. The failures are all different. Here's one example:

```
Personalities : [raid1]
md127 : active (auto-read-only) raid1 sda8[1]
      30788416 blocks [2/1] [_U]
md125 : active raid1 sda7[0] sdb7[1]
      30796480 blocks [2/2] [UU]
md123 : inactive sda5[0](S)
      31463168 blocks
md126 : active (auto-read-only) raid1 sdc10[1]
      41953600 blocks [2/1] [_U]
md124 : active raid1 sdb6[1] sda6[0]
      30033408 blocks [2/2] [UU]
unused devices: <none>
```

I'll attach some more information in case anyone can see a pattern in this. I certainly can't.

Created attachment 424916 [details]
dmesg and mdstat from six boots of F13
I seem to have got this working more reliably by removing rd_NO_MD from the kernel line in grub.conf. At least, I've been able to boot Fedora 13 five times now and the RAID arrays have been assembled correctly every time. Without rd_NO_MD the arrays are assembled earlier in the boot process, though I don't know why that would make any difference.

@Ron: The difference you are seeing is that earlier in the boot process udev is likely processing disk add events sequentially instead of in parallel. Evidently there is a race condition when devices are added in parallel.

OK, after some code inspection, I've found the race. Specifically, when two devices belonging to the same array are assembled in parallel and the array is not yet listed in the md-device-map file, each parallel instance opens a lock file and then attempts an exclusive lock on it. One process gets the lock; the other waits. The process that got the lock adds the array and calls map_update to write out the new map entry. Finally it unlocks the file and then unlinks it. The problem is that an instance already waiting on the lock doesn't care that the file has been unlinked: it acquires its lock on the now-unlinked file, while a completely different instance of mdadm creates a fresh lock file and locks that. The result is two instances holding exclusive locks on two different lock files, both allowed to run in parallel, which causes this problem.

My solution is to change the locking mechanism to pass the O_CREAT and O_EXCL flags to open(), so the open fails unless we are the process that actually created the file. As long as the open fails because the file already exists, we keep retrying. Once we manage to create the file, we hold the lock and are free to run. The fix for this will be in the next mdadm update (mdadm-3.1.3-0.git07202010.1 or later).
mdadm-3.1.3-0.git20100722.1.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100722.1.fc13

I reinstated rd_NO_MD in grub.conf and installed mdadm-3.1.3-0.git20100722.2.fc13.i686.rpm. It doesn't seem to help, though. In five boots the arrays were not assembled properly even once. And mdadm segfaults, which probably isn't good.

Created attachment 433800 [details]
Another five boot attempts.
Hi all, reconsidering this bug, I must say that maybe it does not belong to "mdadm", but rather to "udev". Consider the following script:

```
for i in /dev/sd*
do
    mdadm -I $i &
done
```

I guess, if I complained that it does not work, the answer would have been something like "remove the '&'"... Nevertheless, "udev" does the same and nobody complains (as I understand it). Is there any way to tell "udev" to serialize the operations under certain conditions? Or for certain rules? IMHO that would be the better solution, or not? Thanks, bye, pg

@Ron: In my testing, I was able to reliably make it fail every time using the reproducer below, and after the package I listed, it never failed once:

```
for i in 0 2 4 6 8 10 12 14; do
    dd if=/dev/zero bs=1024k count=100 of=/tmp/block$i
    dd if=/dev/zero bs=1024k count=100 of=/tmp/block$[ $i + 1 ]
    losetup /dev/loop$i /tmp/block$i
    losetup /dev/loop$[ $i + 1 ] /tmp/block$[ $i + 1 ]
    mdadm -C /dev/md/test$i -l1 -n2 --name=test$i /dev/loop$i /dev/loop$[ $i + 1 ]
done
mdadm -S /dev/md/test*
for i in /dev/loop{0..15}; do mdadm -I $i & done
```

As far as your current issue goes, it is *definitely* something different (but it needs to be figured out nonetheless). You will note that in *none* of your original dmesg outputs did mdadm segfault; it simply didn't add all the disks (which is the problem I fixed). In the last five dmesg outputs, all of the failures were the result of segfaults in mdadm. The race condition I fixed is gone, but you are now being affected by something else. I'll need a new bug to track the new problem, if you could open one please.

@Piergiorgio: no, the answer would be that it should work, and the latest mdadm fixes the problem you describe.

mdadm-3.1.3-0.git20100722.2.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100722.2.fc13

Hi, I tried the latest mdadm from updates-testing and I'm sorry to inform you it does not work. Unlike before, I did not get "multiple" md devices, but I got around 15 "mdadm -I ..." processes hanging, using the maximum CPU available. After killing these mdadm processes and removing the incomplete md devices, I was able to assemble the arrays, but not to stop them smoothly. Specifically, "mdadm --stop /dev/md/12X" hangs, but when interrupted (Ctrl-C or kill) the md device ends up removed. There are also some udevd errors, like:

```
udevd[669]: worker [20022] failed while handling '/devices/pci0000:00/0000:00:0b.1/usb1/1-7/1-7.3/1-7.3.4/1-7.3.4:1.0/host12/target12:0:0/12:0:0:0/block/sdi/sdi3
udevd-work[24331]: '/sbin/mdadm -I /dev/sdi4' unexpected exit with status 0x000f
```

Hope this helps. I guess the status of this bug should be changed to something else now. Bye, pg

(In reply to comment #11)
> Unlike before, I did not get "multiple" md devices, but I got around
> 15 "mdadm -I ..." processes hanging, using the maximum CPU available.

Probably related to bug 621524.

mdadm-3.1.3-0.git20100804.2.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13

mdadm-3.1.3-0.git20100804.2.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12

mdadm-3.1.3-0.git20100804.2.fc14 has been submitted as an update for Fedora 14. http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14

(In reply to comment #13)
> mdadm-3.1.3-0.git20100804.2.fc13 has been submitted as an update for Fedora 13.
> http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13

Didn't I mention this does not work?
Actually, this version is a regression compared to the current one, at least for my specific setup. Would it be possible to have this update pulled back? Thanks, bye, pg

No, you mentioned that 20100722.2 did not work for you. This is version 20100804.1, which was just built yesterday, so unless you downloaded it directly from koji, I'm positive you haven't tested this version yet.

Oh, sorry then, I misread the git date. Then I'll try this one (maybe this weekend). Thanks! Bye, pg

mdadm-3.1.3-0.git20100804.2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc12

mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc13

Hi again, I tried mdadm-3.1.3-0.git20100804.2.fc13 and it seems to work. The arrays are properly assembled on hot plug, without partial duplications. Furthermore, the "spinning" issue (15 mdadm processes hanging) also seems solved.

There is still a strange catch. Once the arrays are auto-assembled, some operations fail. For example, "mdadm --grow /dev/md121 --bitmap=none" returns "mdadm: failed to remove internal bitmap." and the logs ("dmesg" or "/var/log/messages") report "md: couldn't update array info. -16". If I stop the arrays and restart them manually, the above operations work. Any suggestions? Thanks, bye, pg

mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mdadm'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/mdadm-3.1.3-0.git20100804.2.fc14

Hi, about comment #21, it seems the arrays are assembled "auto-read-only". Is this expected? Or is there something that needs to be tuned? Thanks, bye, pg

mdadm-3.1.3-0.git20100804.2.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.

mdadm-3.1.3-0.git20100804.2.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.