Bug 789898 - mdadm silently fails to add devices to a degraded array with bitmaps
Summary: mdadm silently fails to add devices to a degraded array with bitmaps
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 791189
Reported: 2012-02-13 08:32 UTC by Alexander Murashkin
Modified: 2012-03-10 21:51 UTC (History)

Fixed In Version: mdadm-3.2.3-6.fc16
Clone Of:
: 791189 (view as bug list)
Environment:
Last Closed: 2012-03-10 21:51:43 UTC
Type: ---
Embargoed:


Attachments
Fixes buffer alignment in super1.c (425 bytes, patch)
2012-02-14 08:15 UTC, Alexander Murashkin

Description Alexander Murashkin 2012-02-13 08:32:23 UTC
Description of problem:

mdadm silently fails to add devices to a degraded array with bitmaps.

# mdadm /dev/md12 --add /dev/sdd2; echo $?
1

So the array is running degraded and there is no way to add devices.

The bug is in mdadm's write_bitmap1 function, which writes to a device opened with the O_DIRECT flag from a buffer that is not aligned. The write fails and mdadm exits with code 1 without printing any message.

See detailed bug description at the bottom.

Version-Release number of selected component (if applicable):

mdadm-3.2.3-3.fc16.x86_64

How reproducible:

Difficult to tell, as the error is random by nature. The following steps will probably reproduce it.

Steps to Reproduce:

1. Create raid10 array /dev/md12 from 4 identical partitions

mdadm --create /dev/md12 --raid-devices=4 --chunk=512 --level=raid10 --layout=o2 --bitmap=internal --name=12 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2

2. Write something to /dev/md12

dd if=/dev/urandom bs=1M count=1 of=/dev/md12
 
3. Stop raid10 array

mdadm --stop /dev/md12

4. (Simulating failure) Disconnect the /dev/sdd and /dev/sde devices (you might need to reboot)

5. Assemble degraded raid10 array (unless it is already assembled after the reboot)

mdadm --assemble /dev/md12 /dev/sdc2 /dev/sdf2

6. Write something to /dev/md12

dd if=/dev/urandom bs=1M count=1 of=/dev/md12

7. Stop raid10 array

mdadm --stop /dev/md12

8. Reconnect the /dev/sdd and /dev/sde devices (you might need to reboot)

9. Assemble degraded raid10 array (unless it is already assembled after the reboot)

10. Observe that the array consists of only 2 disks

mdadm --query --detail /dev/md12
       0       8       34        0      active sync   /dev/sdc2
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

11. (Simulating a new disk) Write random data to the beginning of the /dev/sdd2 partition

dd if=/dev/urandom  bs=16k count=1 of=/dev/sdd2

12. Try to add sdd2 to the array

mdadm /dev/md12 --add /dev/sdd2; echo $?
1

13. Observe that the array still consists of only 2 disks

mdadm --query --detail /dev/md12
--- same as before --

14. Zero the superblock on the /dev/sdd2 partition

mdadm --zero-superblock /dev/sdd2

15. Try to add sdd2 to the array

mdadm /dev/md12 --add /dev/sdd2; echo $?
1

16. Observe that the array still consists of only 2 disks

mdadm --query --detail /dev/md12
--- same as before --

Actual results:

In steps 12 and 15 mdadm silently fails with exit code 1. In steps 13 and 16 the array still consists of only 2 disks. 

       0       8       34        0      active sync   /dev/sdc2
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

Expected results:

In steps 12 and 15 mdadm is supposed to add the drive, print a confirmation, and exit with code 0. In steps 13 and 16 the array is supposed to consist of 3 drives, for example:

       0       8       34        0      active sync   /dev/sdc2
       1       8       65        1      blablabla     /dev/sdd2
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

It is also expected that an I/O error produces some message rather than a silent failure.
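
As an illustration of the kind of reporting meant here, the following is a minimal sketch and not mdadm code. The helper name write_or_report and the message wording are hypothetical; it only shows a write wrapper that names the device and the errno text instead of failing silently.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of mdadm: write len bytes and, on failure
 * or a short write, print which device failed and why.
 * Returns 0 on success, -1 on error. */
int write_or_report(int fd, const void *buf, size_t len, const char *devname)
{
	ssize_t n;

	assert(buf != NULL && devname != NULL);

	n = write(fd, buf, len);
	if (n < 0) {
		/* errno is read before fprintf runs, so it is still the
		 * value set by write(). */
		fprintf(stderr, "failed to write %zu bytes to %s: %s\n",
			len, devname, strerror(errno));
		return -1;
	}
	if ((size_t)n != len) {
		fprintf(stderr, "short write to %s: %zd of %zu bytes\n",
			devname, n, len);
		return -1;
	}
	return 0;
}
```

With a wrapper of this shape, the failing write shown in the traces in this report would at least have printed the device name and "Invalid argument" before the exit.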

Additional info:

function Manage_subdevs         // Manage.c

       fd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT); // line 916
       write_init_super1([...,fd,...]) 

function write_init_super1(...) // super1.c
       write_bitmap1(...,fd)    // line 1184
       
function write_bitmap1(...,fd)  // super1.c
       ...
       char buf[4096];          // line 1639 <-- NOT ALIGNED
       ...
       n = awrite(fd, buf, n);  // line 1654, n=1024, buf=0x7fffffffc570

function awrite(fd, buf, len)   // super1.c

       if (ioctl(fd, BLKSSZGET, &bsize) != 0 || bsize <= len) // line 177
                                 // bsize=512
          write(fd, buf, len);   // line 179, len=1024, buf=0x7fffffffc570
          // returns -1, errno = EINVAL

After awrite returns -1, each caller in turn returns an error code until exit(1) is reached. No error messages are printed. Note also that close(fd) is called twice (see below).
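
To make the alignment problem concrete, here is a small sketch and not the attached patch or the upstream fix. It checks the buffer address from the trace (buf=0x7fffffffc570) against the reported block size (bsize=512) and shows one common way to obtain an aligned buffer with posix_memalign. The helper names addr_is_aligned and alloc_dio_buf are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* True if addr is a multiple of align; align must be a power of two. */
int addr_is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: allocate a zeroed buffer whose address is a
 * multiple of align, as O_DIRECT I/O generally requires.
 * align must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release with free(). */
void *alloc_dio_buf(size_t align, size_t len)
{
	void *p = NULL;

	if (posix_memalign(&p, align, len) != 0)
		return NULL;
	memset(p, 0, len);
	return p;
}
```

Here addr_is_aligned(0x7fffffffc570, 512) is false, because the low nine bits of the address are 0x170. That is consistent with the EINVAL returned by write(). A buffer from alloc_dio_buf(4096, 4096) satisfies both a 512-byte and a 4096-byte requirement.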

------- strace --------

open("/dev/sdd2", O_RDWR|O_EXCL|O_DIRECT) = 4
...
fsync(4)                                = 0
lseek(4, 8192, SEEK_SET)                = 8192
ioctl(4, BLKSSZGET, 512)                = 0
write(4, "bitm\4\0\0\0-\236'A\273!"..., 1024) = -1 EINVAL (Invalid argument)
fsync(4)                                = 0
close(4)                                = 0
close(4)                                = -1 EBADF (Bad file descriptor)

Note also the unnecessary second close(4) above.

-------- gdb ---------

(gdb) info stack
#0  awrite (fd=13, buf=0x7fffffffc570, len=1024) at super1.c:167
#1  0x000000000043054d in write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1654
#2  0x000000000043041f in write_init_super1 (st=0x86d420) at super1.c:1184
#3  0x000000000041231d in Manage_subdevs (devname=0x7fffffffe56f "/dev/md12", fd=12, devlist=<optimized out>,
verbose=1, test=0, update=0x0, force=0) at Manage.c:916
#4  0x0000000000404e1a in main (argc=<optimized out>, argv=0x7fffffffe298) at mdadm.c:1228

(gdb) p bsize
$11 = 512

(gdb) n
write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1655
(gdb) p n
$12 = -1
(gdb) p errno
$13 = 22

Comment 1 Alexander Murashkin 2012-02-14 08:15:33 UTC
Created attachment 561801
Fixes buffer alignment in super1.c

The patch fixes one specific issue: the alignment of the write_bitmap1 buffer in super1.c. It is possible that there are other unaligned buffers. The patch does not fix mdadm's silent behaviour when an I/O error occurs while writing a superblock.

Comment 2 Jes Sorensen 2012-02-14 08:20:33 UTC
Thanks for the patch! I was going to look at it today, but you beat me to it.

However, rather than hard-coding 4096 as the buffer size, it should use
getpagesize() to determine the size of the buffer.

It also needs to be posted upstream.

Do you want to go ahead with it, or do you want me to respin it for upstream?

Cheers,
Jes

Comment 3 Jes Sorensen 2012-02-14 10:33:26 UTC
Ok so much for replying before my third coffee - it doesn't require page
size alignment, just blk size alignment. 4096 bytes ought to suffice then.
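
For reference, a rough sketch of how the required alignment could be queried at run time rather than assumed. It is not mdadm code. It uses the same BLKSSZGET ioctl that appears in the strace output; the helper names dio_alignment and round_up_len and the 4096-byte fallback are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Assumption: fall back to 4096 when the block size cannot be queried. */
#define FALLBACK_ALIGN 4096

/* Hypothetical helper: return the logical block size of the device behind
 * fd, or FALLBACK_ALIGN if the ioctl fails (for example, fd is not a block
 * device). */
size_t dio_alignment(int fd)
{
	int bsize = 0;

	if (ioctl(fd, BLKSSZGET, &bsize) != 0 || bsize <= 0)
		return FALLBACK_ALIGN;
	return (size_t)bsize;
}

/* Round len up to a multiple of align; align must be a power of two.
 * O_DIRECT transfers generally need an aligned length as well as an
 * aligned buffer address. */
size_t round_up_len(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}
```

For the device in this report the ioctl returned 512, and 512 divides 4096, so a 4096-byte-aligned buffer also meets the block size requirement.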

Comment 4 Fedora Update System 2012-02-16 15:07:34 UTC
mdadm-3.2.3-5.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/mdadm-3.2.3-5.fc16

Comment 5 Jes Sorensen 2012-02-16 15:14:17 UTC
Alexander,

I have pushed Neil's fix for this bug into mdadm-3.2.3-5 - I would appreciate
it if you could test it and report back if it fixes the problem for you.

Thanks,
Jes

Comment 6 Fedora Update System 2012-02-17 00:56:05 UTC
Package mdadm-3.2.3-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mdadm-3.2.3-5.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-1862/mdadm-3.2.3-5.fc16
then log in and leave karma (feedback).

Comment 7 Alexander Murashkin 2012-02-20 05:01:52 UTC
I have tested mdadm-3.2.3-5.fc16 successfully.

> It also needs to be posted upstream.
> Do you want to go ahead with it, or do you want me to respin it for upstream?

Please take care of the upstream submission.

Comment 8 Fedora Update System 2012-02-23 11:09:26 UTC
mdadm-3.2.3-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/mdadm-3.2.3-6.fc16

Comment 9 Fedora Update System 2012-03-10 21:51:43 UTC
mdadm-3.2.3-6.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

