Bug 791189

Summary: mdadm silently fails to add devices to a degraded array with bitmaps
Product: Fedora
Component: mdadm
Version: 15
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Jes Sorensen <Jes.Sorensen>
Assignee: Jes Sorensen <Jes.Sorensen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, alexandermurashkin, dledford, Jes.Sorensen, mbroz
Hardware: x86_64   
OS: Linux   
Fixed In Version: mdadm-3.2.3-6.fc15
Doc Type: Bug Fix
Clone Of: 789898
Last Closed: 2012-03-10 21:55:28 UTC
Bug Depends On: 789898    

Description Jes Sorensen 2012-02-16 12:36:21 UTC
+++ This bug was initially created as a clone of Bug #789898 +++

Description of problem:

mdadm silently fails to add devices to a degraded array with bitmaps.

# mdadm /dev/md12 --add /dev/sdd2; echo $?
1

So the array is running degraded and there is no way to add devices.

There is a bug in mdadm's write_bitmap1 function: it writes to a device opened with the O_DIRECT flag using a non-aligned buffer. The write fails and mdadm exits with code 1 without printing any message.

See the detailed analysis under "Additional info" below.

Version-Release number of selected component (if applicable):

mdadm-3.2.3-3.fc16.x86_64

How reproducible:

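Difficult to tell, as the failure is random by nature: it depends on whether the stack buffer in write_bitmap1 happens to land on a 512-byte boundary. The following steps will probably reproduce it: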

Steps to Reproduce:

1. Create raid10 array /dev/md12 from 4 identical partitions

mdadm --create /dev/md12 --raid-devices=4 --chunk=512 --level=raid10 --layout=o2 --bitmap=internal --name=12 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2

2. Write something to /dev/md12

dd if=/dev/urandom bs=1M count=1 of=/dev/md12
 
3. Stop raid10 array

mdadm --stop /dev/md12

4. (Simulating failure) Disconnect the /dev/sdd and /dev/sde devices (you might need to reboot)

5. Assemble degraded raid10 array (unless it is already assembled after the reboot)

mdadm --assemble /dev/md12 /dev/sdc2 /dev/sdf2

6.  Write something to /dev/md12

dd if=/dev/urandom bs=1M count=1 of=/dev/md12

7. Stop raid10 array

mdadm --stop /dev/md12

8. Reconnect the /dev/sdd and /dev/sde devices (you might need to reboot)

9. Assemble degraded raid10 array (unless it is already assembled after the reboot)

10. Observe that the array consists of only 2 disks

mdadm --query --detail /dev/md12
       0       8       34        0      active sync   /dev/sdc2
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

11. (Simulating a new disk) Write random data to the beginning of the /dev/sdd2 partition

dd if=/dev/urandom  bs=16k count=1 of=/dev/sdd2

12. Try to add sdd2 to the array

mdadm /dev/md12 --add /dev/sdd2; echo $?
1

13. Observe that the array still consists of only 2 disks

mdadm --query --detail /dev/md12
--- same as before --

14. Zero the md superblock on /dev/sdd2

mdadm --zero-superblock /dev/sdd2

15. Try to add sdd2 to the array

mdadm /dev/md12 --add /dev/sdd2; echo $?
1

16. Observe that the array still consists of only 2 disks

mdadm --query --detail /dev/md12
--- same as before --

Actual results:

In steps 12 and 15 mdadm silently fails with exit code 1. In steps 13 and 16 the array still consists of only 2 disks. 

       0       8       34        0      active sync   /dev/sdc2
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

Expected results:

In steps 12 and 15 mdadm is supposed to add the drive, print a success message, and exit with code 0. In steps 13 and 16 the array is supposed to consist of 3 drives, for example:

       0       8       34        0      active sync   /dev/sdc2
       1       8       50        1      spare rebuilding   /dev/sdd2
       2       0        0        2      removed
       3       8       82        3      active sync   /dev/sdf2

It is also expected that an I/O error produces an error message rather than a silent failure.

Additional info:

function Manage_subdevs         // Manage.c

       fd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT); // line 916
       write_init_super1([...,fd,...]) 

function write_init_super1(...) // super1.c
       write_bitmap1(...,fd)    // line 1184
       
function write_bitmap1(...,fd)  // super1.c
       ...
       char buf[4096];          // line 1639 <-- NOT ALIGNED
       ...
       n = awrite(fd, buf, n);  // line 1654, n=1024, buf=0x7fffffffc570

function awrite(fd, buf, len)   // super1.c

       if (ioctl(fd, BLKSSZGET, &bsize) != 0 || bsize <= len) // line 177
                                 // bsize=512
          write(fd, buf, len);   // line 179, len=1024, buf=0x7fffffffc570
          // returns -1, errno = EINVAL

After awrite returns -1, each caller in turn returns an error code until mdadm calls exit(1). No error message is printed. Note also that close(fd) is called twice (see the strace output below).

------- strace --------

open("/dev/sdd2", O_RDWR|O_EXCL|O_DIRECT) = 4
...
fsync(4)                                = 0
lseek(4, 8192, SEEK_SET)                = 8192
ioctl(4, BLKSSZGET, 512)                = 0
write(4, "bitm\4\0\0\0-\236'A\273!"..., 1024) = -1 EINVAL (Invalid argument)
fsync(4)                                = 0
close(4)                                = 0
close(4)                                = -1 EBADF (Bad file descriptor)

See also unnecessary close above.

-------- gdb ---------

(gdb) info stack
#0  awrite (fd=13, buf=0x7fffffffc570, len=1024) at super1.c:167
#1  0x000000000043054d in write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1654
#2  0x000000000043041f in write_init_super1 (st=0x86d420) at super1.c:1184
#3  0x000000000041231d in Manage_subdevs (devname=0x7fffffffe56f "/dev/md12", fd=12, devlist=<optimized out>, verbose=1, test=0, update=0x0, force=0) at Manage.c:916
#4  0x0000000000404e1a in main (argc=<optimized out>, argv=0x7fffffffe298) at mdadm.c:1228

(gdb) p bsize
$11 = 512

(gdb) n
write_bitmap1 (st=<optimized out>, fd=13) at super1.c:1655
(gdb) p n
$12 = -1
(gdb) p errno
$13 = 22

--- Additional comment from alexandermurashkin on 2012-02-14 03:15:33 EST ---

Created attachment 561801
Fixes buffer alignment in super1.c

The patch fixes one specific issue: the alignment of the write_bitmap1 buffer in super1.c. There may be other unaligned buffers. The patch does not fix mdadm's silent behaviour when I/O errors occur while writing a superblock.

--- Additional comment from Jes.Sorensen on 2012-02-14 03:20:33 EST ---

Thanks for the patch! I was going to look at it today, but you beat me to it.

However, rather than hard-coding 4096 as the buffer size, it should use
getpagesize() to determine the size of the buffer.

It also needs to be posted upstream.

Do you want to go ahead with it, or do you want me to respin it for upstream?

Cheers,
Jes

--- Additional comment from Jes.Sorensen on 2012-02-14 05:33:26 EST ---

Ok, so much for replying before my third coffee: it doesn't require page
size alignment, just block size alignment. 4096 bytes ought to suffice then.

Comment 1 Fedora Update System 2012-02-16 15:12:51 UTC
mdadm-3.2.3-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/mdadm-3.2.3-5.fc15

Comment 2 Fedora Update System 2012-02-17 00:52:30 UTC
Package mdadm-3.2.3-5.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mdadm-3.2.3-5.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-1843/mdadm-3.2.3-5.fc15
then log in and leave karma (feedback).

Comment 3 Fedora Update System 2012-02-23 11:12:46 UTC
mdadm-3.2.3-6.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/mdadm-3.2.3-6.fc15

Comment 4 Fedora Update System 2012-03-10 21:55:28 UTC
mdadm-3.2.3-6.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.