A RAID array failed and I got no email notification from mdadm. Why? Because mdadm was not running. Attempting to start it manually with "service mdmonitor start" fails with this in the syslog:

Feb 3 10:52:15 positron kernel: mdadm[18642]: segfault at 0 ip 0000000000421c76 sp 00007fffde4ca430 error 4 in mdadm[400000+61000]

I installed the mdadm debuginfo package, ran mdadm under gdb, and saw it crash on a NULL pointer in mse->metadata_version (see the stack trace below). I confirmed that some arrays in /proc/mdstat do not report a metadata version number (see the contents of /proc/mdstat below), and that all SL6.1 machines with no metadata version reported do not have mdadm running. This is VERY BAD because there will be no automatic notification of RAID failures, etc. Freshly installed SL6.1 machines have metadata versions 1.0, 1.1 and 1.2, and mdadm runs happily on them.

Also see the same bug filed and fixed in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=698731

Attached below are the gdb output from mdadm and the contents of /proc/mdstat. Please fix or provide a workaround (i.e. how to convert a 0.90 array into a 1.0 array).

Thanks in advance,
K.O.

[root@positron ~]# gdb `which mdadm`
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /sbin/mdadm...Reading symbols from /usr/lib/debug/sbin/mdadm.debug...done.
done.
(gdb) run --monitor --scan
Starting program: /sbin/mdadm --monitor --scan
Detaching after fork from child process 21267.
Detaching after fork from child process 21278.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000421c76 in check_array (st=0x67ca40, mdstat=<value optimized out>, test=<value optimized out>,
    ainfo=0x7fffffffde80, increments=<value optimized out>) at Monitor.c:580
580             if (strncmp(mse->metadata_version, "external:", 9) == 0 &&
(gdb) where
#0  0x0000000000421c76 in check_array (st=0x67ca40, mdstat=<value optimized out>, test=<value optimized out>,
    ainfo=0x7fffffffde80, increments=<value optimized out>) at Monitor.c:580
#1  0x00000000004225d6 in Monitor (devlist=<value optimized out>, mailaddr=<value optimized out>,
    alert_cmd=<value optimized out>, period=1000, daemonise=6811760, scan=<value optimized out>, oneshot=0,
    dosyslog=0, test=<value optimized out>, pidfile=0x0, increments=20, share=1) at Monitor.c:223
#2  0x0000000000403d9b in main (argc=<value optimized out>, argv=0x7fffffffe5e8) at mdadm.c:1600
(gdb) up
#1  0x00000000004225d6 in Monitor (devlist=<value optimized out>, mailaddr=<value optimized out>,
    alert_cmd=<value optimized out>, period=1000, daemonise=6811760, scan=<value optimized out>, oneshot=0,
    dosyslog=0, test=<value optimized out>, pidfile=0x0, increments=20, share=1) at Monitor.c:223
223                     if (check_array(st, mdstat, test, &info, increments))
(gdb) p *mdstat
$1 = {dev = 0x67f0c0 "md127", devnum = 127, active = 1, level = 0x67f0e0 "raid1", pattern = 0x67f140 "_U",
  percent = -1, resync = 0, devcnt = 1, raid_disks = 4, metadata_version = 0x0, members = 0x67f100,
  next = 0x67f1c0}
(gdb) p *mdstat->next
$2 = {dev = 0x67f210 "md2", devnum = 2147483647, active = 1, level = 0x67f230 "raid1", pattern = 0x67f2f0 "U_",
  percent = -1, resync = 0, devcnt = 2, raid_disks = 4, metadata_version = 0x67f2d0 "1.0", members = 0x67f290,
  next = 0x67f310}
(gdb) p *mdstat->next->next
$3 = {dev = 0x67f030 "md1", devnum = 2147483647, active = 1, level = 0x67f050 "raid1", pattern = 0x67f3e0 "U_",
  percent = -1, resync = 0, devcnt = 2, raid_disks = 4, metadata_version = 0x0, members = 0x67f3a0, next = 0x0}
(gdb) p *mdstat->next->next->next
Cannot access memory at address 0x0

[root@positron ~]# cat /proc/mdstat
Personalities : [raid1]
md127 : active (auto-read-only) raid1 sdc1[1]
      40957568 blocks [2/1] [_U]
      bitmap: 1/157 pages [4KB], 128KB chunk

md2 : active raid1 sdb3[2] sdc3[1](F)
      40959928 blocks super 1.0 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sdb2[0] sdc2[2](F)
      32764480 blocks [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

K.O.
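From the data above, the crash mechanism looks straightforward: the md127 and md1 entries in /proc/mdstat carry no "super" field (they are old v0.90 arrays), so the parsed metadata_version stays NULL, and the unguarded strncmp() at Monitor.c:580 dereferences that NULL pointer. Below is a minimal stand-alone sketch of the kind of NULL guard that would avoid the crash; the struct and function names are illustrative only, not the actual mdadm source or the upstream patch.

/* Illustration of the failure mode and a guarded check.
 * Hypothetical names: mdstat_ent / is_external are not real mdadm symbols. */
#include <stdio.h>
#include <string.h>

struct mdstat_ent {
    const char *dev;               /* e.g. "md127"                          */
    const char *metadata_version;  /* NULL when /proc/mdstat has no "super" */
};

static int is_external(const struct mdstat_ent *mse)
{
    /* Check for NULL before strncmp(): a v0.90 array simply has no
     * metadata version string and cannot be an "external:" container. */
    return mse->metadata_version != NULL &&
           strncmp(mse->metadata_version, "external:", 9) == 0;
}

int main(void)
{
    struct mdstat_ent ents[] = {
        { "md127", NULL },            /* v0.90 array, as in this report */
        { "md2",   "1.0" },
        { "md9",   "external:imsm" },
    };

    for (size_t i = 0; i < sizeof(ents) / sizeof(ents[0]); i++)
        printf("%s: external=%d\n", ents[i].dev, is_external(&ents[i]));
    return 0;
}

With the guard in place the monitor would simply treat the v0.90 arrays as non-external and keep running instead of segfaulting.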
This has already been fixed as of RHEL 6.2. Please update to the latest mdadm, which resolves this issue.
I guess I should note the version of mdadm and the contents of /etc/mdadm.conf:

[root@positron ~]# rpm -q mdadm
mdadm-3.2.1-1.el6.x86_64
[root@positron ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=1674297a:39ef30ff:969902aa:3eab0b21
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=67358e91:01dc1f3f:07873abf:f749b94d
[root@positron ~]#

K.O.
I confirm this problem does not exist in SL6.2:

[root@positron Packages]# rpm -q mdadm
mdadm-3.2.2-9.el6.x86_64

K.O.