Bug 787276

Summary: segfault of mdadm --monitor --scan
Product: Red Hat Enterprise Linux 6
Reporter: Konstantin Olchanski <olchansk>
Component: mdadm
Assignee: Doug Ledford <dledford>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.1
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-02-03 19:03:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Konstantin Olchanski 2012-02-03 18:59:10 UTC
A RAID array failed and I got no email notice from mdadm. Why? mdadm was not running. An attempt to start it manually with "service mdmonitor start" fails with this in the syslog:
Feb  3 10:52:15 positron kernel: mdadm[18642]: segfault at 0 ip 0000000000421c76 sp 00007fffde4ca430 error 4 in mdadm[400000+61000]

Installed the mdadm debuginfo package and ran mdadm under gdb: it crashes on a NULL pointer dereference of mse->metadata_version (see stack trace below).
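To illustrate the failure mode: the crashing line passes mse->metadata_version straight to strncmp(), which is undefined behaviour when the pointer is NULL. Below is a minimal sketch of a NULL-safe version of that test. The struct and function names (mdstat_ent_sketch, is_external_metadata) are hypothetical stand-ins made up for this example, and this is not claimed to be the change that actually shipped in the fixed mdadm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mdstat entry; only the fields used here. */
struct mdstat_ent_sketch {
    const char *dev;
    const char *metadata_version; /* NULL when no version was reported */
};

/*
 * Returns 1 if the entry uses external metadata, 0 otherwise.
 * A NULL metadata_version is treated as "not external" instead of
 * being handed to strncmp().
 */
static int is_external_metadata(const struct mdstat_ent_sketch *mse)
{
    assert(mse != NULL); /* caller must pass a valid entry */

    return mse->metadata_version != NULL &&
           strncmp(mse->metadata_version, "external:", 9) == 0;
}
```

With a guard like this, an entry such as md127 above (metadata_version = 0x0) would simply be treated as not using external metadata rather than crashing the monitor.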

Confirm that some arrays in /proc/mdstat do not report a metadata version number (see contents of /proc/mdstat below).
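For checking other machines, here is a rough sketch of a helper that scans /proc/mdstat-style text and lists the arrays whose "blocks" line carries no "super <version>" field. The function name arrays_without_super and its simplified line matching are assumptions made for this example. It is not how mdadm itself parses the file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scans /proc/mdstat-style text. Writes the names of arrays whose
 * "blocks" line has no "super <version>" field into out, separated by
 * spaces, and returns how many such arrays were seen. Names that do
 * not fit into out are still counted. Illustrative helper only.
 */
static int arrays_without_super(const char *text, char *out, size_t outlen)
{
    char name[32] = "";
    int count = 0;

    if (outlen > 0)
        out[0] = '\0';

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[256];
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, text, n);
        line[n] = '\0';

        if (strncmp(line, "md", 2) == 0 && strstr(line, " : ") != NULL) {
            /* Array header line, e.g. "md1 : active raid1 sdb2[0]" */
            sscanf(line, "%31s", name);
        } else if (name[0] != '\0' && strstr(line, " blocks") != NULL) {
            /* Size line of the current array; look for "super <ver>" */
            if (strstr(line, " super ") == NULL) {
                if (outlen > 0 &&
                    strlen(out) + strlen(name) + 2 <= outlen) {
                    if (out[0] != '\0')
                        strcat(out, " ");
                    strcat(out, name);
                }
                count++;
            }
            name[0] = '\0';
        }
        text += len + (eol ? 1 : 0);
    }
    return count;
}
```

Fed the /proc/mdstat contents from this report, the helper should flag md127 and md1, the same two entries that show metadata_version = 0x0 in the gdb session.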

Confirm that all SL6.1 machines with no metadata reported do not have mdadm running.

This is VERY BAD because there will be no automatic notification of RAID failures, etc.

Confirm that freshly installed SL6.1 machines have metadata versions 1.0, 1.1 and 1.2 with mdadm running happily.

Also see the same bug filed and fixed in Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=698731

Attached below is the gdb output from mdadm and contents of /proc/mdstat.

Please fix or provide a workaround (e.g. how to convert a 0.9 array into a 1.0 array). Thanks in advance,
K.O.


[root@positron ~]# gdb `which mdadm`
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /sbin/mdadm...Reading symbols from /usr/lib/debug/sbin/mdadm.debug...done.
done.
(gdb) run --monitor --scan
Starting program: /sbin/mdadm --monitor --scan
Detaching after fork from child process 21267.
Detaching after fork from child process 21278.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000421c76 in check_array (st=0x67ca40, mdstat=<value optimized out>, test=<value optimized out>, ainfo=0x7fffffffde80, increments=<value optimized out>)
    at Monitor.c:580
580             if (strncmp(mse->metadata_version, "external:", 9) == 0 &&
(gdb) where
#0  0x0000000000421c76 in check_array (st=0x67ca40, mdstat=<value optimized out>, test=<value optimized out>, ainfo=0x7fffffffde80, increments=<value optimized out>)
    at Monitor.c:580
#1  0x00000000004225d6 in Monitor (devlist=<value optimized out>, mailaddr=<value optimized out>, alert_cmd=<value optimized out>, period=1000, daemonise=6811760, 
    scan=<value optimized out>, oneshot=0, dosyslog=0, test=<value optimized out>, pidfile=0x0, increments=20, share=1) at Monitor.c:223
#2  0x0000000000403d9b in main (argc=<value optimized out>, argv=0x7fffffffe5e8) at mdadm.c:1600

(gdb) up
#1  0x00000000004225d6 in Monitor (devlist=<value optimized out>, mailaddr=<value optimized out>, alert_cmd=<value optimized out>, period=1000, daemonise=6811760, 
    scan=<value optimized out>, oneshot=0, dosyslog=0, test=<value optimized out>, pidfile=0x0, increments=20, share=1) at Monitor.c:223
223                             if (check_array(st, mdstat, test, &info, increments))

(gdb) p *mdstat
$1 = {dev = 0x67f0c0 "md127", devnum = 127, active = 1, level = 0x67f0e0 "raid1", pattern = 0x67f140 "_U", percent = -1, resync = 0, devcnt = 1, raid_disks = 4, 
  metadata_version = 0x0, members = 0x67f100, next = 0x67f1c0}
(gdb) p *mdstat->next
$2 = {dev = 0x67f210 "md2", devnum = 2147483647, active = 1, level = 0x67f230 "raid1", pattern = 0x67f2f0 "U_", percent = -1, resync = 0, devcnt = 2, raid_disks = 4, 
  metadata_version = 0x67f2d0 "1.0", members = 0x67f290, next = 0x67f310}
(gdb) p *mdstat->next->next
$3 = {dev = 0x67f030 "md1", devnum = 2147483647, active = 1, level = 0x67f050 "raid1", pattern = 0x67f3e0 "U_", percent = -1, resync = 0, devcnt = 2, raid_disks = 4, 
  metadata_version = 0x0, members = 0x67f3a0, next = 0x0}
(gdb) p *mdstat->next->next->next
Cannot access memory at address 0x0


[root@positron ~]# cat /proc/mdstat
Personalities : [raid1] 
md127 : active (auto-read-only) raid1 sdc1[1]
      40957568 blocks [2/1] [_U]
      bitmap: 1/157 pages [4KB], 128KB chunk

md2 : active raid1 sdb3[2] sdc3[1](F)
      40959928 blocks super 1.0 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sdb2[0] sdc2[2](F)
      32764480 blocks [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

K.O.

Comment 2 Doug Ledford 2012-02-03 19:03:05 UTC
This has already been fixed as of RHEL 6.2. Please update to the latest mdadm, which resolves this issue.

Comment 3 Konstantin Olchanski 2012-02-03 19:07:04 UTC
I guess I should note the version of mdadm and the contents of /etc/mdadm.conf:

[root@positron ~]# rpm -q mdadm
mdadm-3.2.1-1.el6.x86_64

[root@positron ~]# cat /etc/mdadm.conf 
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=1674297a:39ef30ff:969902aa:3eab0b21
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=67358e91:01dc1f3f:07873abf:f749b94d
[root@positron ~]# 

K.O.

Comment 4 Konstantin Olchanski 2012-02-03 19:13:42 UTC
I confirm this problem does not exist in SL6.2:

[root@positron Packages]# rpm -q mdadm
mdadm-3.2.2-9.el6.x86_64

K.O.