Description of problem:
mdadm silently fails to add devices to a degraded array with bitmaps.

    # mdadm /dev/md12 --add /dev/sdd2; echo $?
    1

So the array keeps running degraded and there is no way to add devices.

There is a bug in mdadm's write_bitmap1 function: it writes to a device opened with the O_DIRECT flag using a non-aligned buffer. The write fails and mdadm exits with code 1 without printing any message. See the detailed bug description under "Additional info".

Version-Release number of selected component (if applicable):
mdadm-3.2.3-3.fc16.x86_64

How reproducible:
Difficult to tell, as the error is random by nature. The following steps will probably reproduce it.

Steps to Reproduce:
1. Create raid10 array /dev/md12 from 4 identical partitions
    mdadm --create /dev/md12 --raid-devices=4 --chunk=512 --level=raid10 --layout=o2 --bitmap=internal --name=12 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
2. Write something to /dev/md12
    dd if=/dev/urandom bs=1M count=1 of=/dev/md12
3. Stop raid10 array
    mdadm --stop /dev/md12
4. (Simulating failure) Disconnect the /dev/sdd and /dev/sde devices (you might need to reboot)
5. Assemble degraded raid10 array (unless it is already assembled after the reboot)
    mdadm --assemble /dev/md12 /dev/sdc2 /dev/sdf2
6. Write something to /dev/md12
    dd if=/dev/urandom bs=1M count=1 of=/dev/md12
7. Stop raid10 array
    mdadm --stop /dev/md12
8. Connect the /dev/sdd and /dev/sde devices (you might need to reboot)
9. Assemble degraded raid10 array (unless it is already assembled after the reboot)
10. Observe that the array consists of only 2 disks
    mdadm --query --detail /dev/md12
    0   8   34   0   active sync   /dev/sdc2
    1   0    0   1   removed
    2   0    0   2   removed
    3   8   82   3   active sync   /dev/sdf2
11. (Simulating new disk) Write random data to the beginning of the /dev/sdd2 partition
    dd if=/dev/urandom bs=16k count=1 of=/dev/sdd2
12. Try to add sdd2 to the array
    mdadm /dev/md12 --add /dev/sdd2; echo $?
    1
13. Observe that the array still consists of only 2 disks
    mdadm --query --detail /dev/md12
    --- same as before --
14. Zero the beginning of the /dev/sdd2 partition
    mdadm --zero-superblock /dev/sdd2
15. Try to add sdd2 to the array
    mdadm /dev/md12 --add /dev/sdd2; echo $?
    1
16. Observe that the array still consists of only 2 disks
    mdadm --query --detail /dev/md12
    --- same as before --

Actual results:
In steps 12 and 15 mdadm silently fails with exit code 1.
In steps 13 and 16 the array still consists of only 2 disks.
    0   8   34   0   active sync   /dev/sdc2
    1   0    0   1   removed
    2   0    0   2   removed
    3   8   82   3   active sync   /dev/sdf2

Expected results:
In steps 12 and 15 mdadm is supposed to add the drive, print something good, and exit with code 0.
In steps 13 and 16 the array is supposed to consist of 3 drives. For example
    0   8   34   0   active sync   /dev/sdc2
    1   8   65   1   blablabla     /dev/sdd2
    2   0    0   2   removed
    3   8   82   3   active sync   /dev/sdf2
It is also expected that, in case of an I/O error, some message is printed (not just a silent failure).

Additional info:
function Manage_subdevs                                       // Manage.c
    fd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);     // line 916
    write_init_super1([...,fd,...])

function write_init_super1(...)                               // super1.c
    write_bitmap1(...,fd)                                     // line 1184

function write_bitmap1(...,fd)                                // super1.c
    ...
    char buf[4096];                                           // line 1639 <-- NOT ALIGNED
    ...
    n = awrite(fd, buf, n);                                   // line 1654, n=1024, buf=0x7fffffffc570

function awrite(fd, buf, len)                                 // super1.c
    if (ioctl(fd, BLKSSZGET, &bsize) != 0 || bsize <= len)    // line 177, bsize=512
        write(fd, buf, len);                                  // line 179, len=1024, buf=0x7fffffffc570
                                                              // returns -1, errno = EINVAL

After awrite returns -1, every caller returns with one error code or another until exit(1) happens. No error messages are printed.
Note also that close(fd) is called twice (see below).

------- strace --------
open("/dev/sdd2", O_RDWR|O_EXCL|O_DIRECT) = 4
...
fsync(4) = 0
lseek(4, 8192, SEEK_SET) = 8192
ioctl(4, BLKSSZGET, 512) = 0
write(4, "bitm\4\0\0\0-\236'A\273!"..., 1024) = -1 EINVAL (Invalid argument)
fsync(4) = 0
close(4) = 0
close(4) = -1 EBADF (Bad file descriptor)

Note the unnecessary second close above.

-------- gdb ---------
(gdb) info stack
#0 awrite (fd=13, buf=0x7fffffffc570, len=1024) at super1.c:167
#1 0x000000000043054d in write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1654
#2 0x000000000043041f in write_init_super1 (st=0x86d420) at super1.c:1184
#3 0x000000000041231d in Manage_subdevs (devname=0x7fffffffe56f "/dev/md12", fd=12, devlist=<optimized out>, verbose=1, test=0, update=0x0, force=0) at Manage.c:916
#4 0x0000000000404e1a in main (argc=<optimized out>, argv=0x7fffffffe298) at mdadm.c:1228
(gdb) p bsize
$11 = 512
(gdb) n
write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1655
(gdb) p n
$12 = -1
(gdb) p errno
$13 = 22
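
For readers following the analysis: the EINVAL is consistent with the Linux O_DIRECT rule that the user buffer must be aligned to the device's logical block size (512 here, per bsize). The sketch below is an illustration only, not the attached patch or the upstream fix. The helper names is_aligned and alloc_aligned are hypothetical; it assumes a 64-bit Linux system and simply shows that the address seen in gdb is not 512-byte aligned, and how an aligned buffer could be obtained instead of a plain stack array.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of align.
 * align must be a nonzero power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: return a zeroed buffer of len bytes whose address
 * is a multiple of align, or NULL on failure. Release it with free(). */
void *alloc_aligned(size_t align, size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, align, len) != 0)
        return NULL;
    memset(p, 0, len);
    return p;
}
```

With these helpers, the reported address 0x7fffffffc570 fails the 512-byte check (its low nine bits are 0x170), while a buffer from alloc_aligned(4096, 4096) passes for both 512 and 4096.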
Created attachment 561801
Fixes buffer alignment in super1.c

The patch fixes one specific issue: the alignment of the write_bitmap1 buffer in super1.c. It is possible that there are other unaligned buffers. The patch does not fix mdadm's silent behaviour when an I/O error occurs while writing a superblock.
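
As a rough illustration of the remaining "silent failure" problem, a write wrapper could report the error before returning it to the caller. This is a minimal sketch under stated assumptions, not mdadm code: the function name write_or_report and the message wording are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper (not an mdadm function): write len bytes and print
 * a diagnostic on failure instead of failing silently.
 * Returns 0 on success, -1 on error or short write. */
int write_or_report(int fd, const void *buf, size_t len, const char *devname)
{
    ssize_t n;

    assert(buf != NULL && devname != NULL);

    n = write(fd, buf, len);
    if (n < 0) {
        /* Message text is an assumption, chosen only for illustration. */
        fprintf(stderr, "mdadm: failed to write to %s: %s\n",
                devname, strerror(errno));
        return -1;
    }
    if ((size_t)n != len) {
        fprintf(stderr, "mdadm: short write to %s: %zd of %zu bytes\n",
                devname, n, len);
        return -1;
    }
    return 0;
}
```

With something along these lines, the EINVAL from the strace above would have surfaced as a visible "Invalid argument" message rather than only an exit code of 1.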
Thanks for the patch! I was going to look at it today, but you beat me to it. However, rather than hard-coding 4096 as the buffer size, it should use getpagesize() to determine the size of the buffer. It also needs to be posted upstream. Do you want to go ahead with it, or do you want me to respin it for upstream? Cheers, Jes
Ok so much for replying before my third coffee - it doesn't require page size alignment, just blk size alignment. 4096 bytes ought to suffice then.
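
To make the block-size point concrete, here is a small sketch, assuming Linux and the BLKSSZGET ioctl already seen in awrite. The helper names direct_io_alignment and buffer_alignment_suffices are hypothetical, and the 4096 fallback when the query fails is an assumption for illustration. It shows why a 4096-byte-aligned buffer also satisfies a 512-byte logical block size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Hypothetical helper: logical block size of the device behind fd.
 * Falls back to 4096 (an assumption) if the ioctl fails. */
size_t direct_io_alignment(int fd)
{
    int bsize = 0;

    if (ioctl(fd, BLKSSZGET, &bsize) != 0 || bsize <= 0)
        return 4096;
    return (size_t)bsize;
}

/* Hypothetical helper: nonzero if a buffer aligned to buf_align bytes is
 * also aligned to sector_size bytes, i.e. buf_align is a multiple of it. */
int buffer_alignment_suffices(size_t buf_align, size_t sector_size)
{
    assert(sector_size != 0);
    return buf_align % sector_size == 0;
}
```

For the device in this report (bsize = 512), buffer_alignment_suffices(4096, 512) holds, so 4096-byte alignment is more than enough; the reverse, a 512-aligned buffer on a 4096-byte-sector device, would not be.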
mdadm-3.2.3-5.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/mdadm-3.2.3-5.fc16
Alexander, I have pushed Neil's fix for this bug into mdadm-3.2.3-5 - I would appreciate it if you could test it and report back if it fixes the problem for you. Thanks, Jes
Package mdadm-3.2.3-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mdadm-3.2.3-5.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-1862/mdadm-3.2.3-5.fc16
then log in and leave karma (feedback).
I have tested mdadm-3.2.3-5.fc16 successfully.

> It also needs to be posted upstream.
> Do you want to go ahead with it, or do you want me to respin it for upstream?

Please take care of the upstream.
mdadm-3.2.3-6.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/mdadm-3.2.3-6.fc16
mdadm-3.2.3-6.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.