Bug 1009702

Summary: mdmonitor.service fails to start
Product: [Fedora] Fedora Reporter: markm <marek78uk>
Component: mdadm Assignee: Jes Sorensen <Jes.Sorensen>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 19 CC: agk, dledford, Jes.Sorensen
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-19 18:31:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description markm 2013-09-18 22:37:37 UTC
Description of problem:

Noticed that the mdmonitor service fails on boot.


Version-Release number of selected component (if applicable):

# rpm -qa | grep mdadm
mdadm-3.2.6-21.fc19.x86_64

How reproducible:

always

Steps to Reproduce:
1. Boot the computer with a RAID array.

Actual results:

mdmonitor.service fails

Expected results:

mdmonitor.service should start and keep running (I assume).


Additional info:

# systemctl status mdmonitor.service
mdmonitor.service - Software RAID monitoring and management
   Loaded: loaded (/usr/lib/systemd/system/mdmonitor.service; enabled)
   Active: failed (Result: exit-code) since Wed 2013-09-18 23:24:10 BST; 52s ago
  Process: 529 ExecStart=/sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid (code=exited, status=1/FAILURE)


# systemctl start mdmonitor.service
Job for mdmonitor.service failed. See 'systemctl status mdmonitor.service' and 'journalctl -xn' for details.


# journalctl -xn
-- Logs begin at Fri 2013-06-14 21:58:12 BST, end at Wed 2013-09-18 23:25:16 BST. --
Sep 18 23:25:15 stefan systemd[1]: Starting Software RAID monitoring and management...
-- Subject: Unit mdmonitor.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit mdmonitor.service has begun starting up.
Sep 18 23:25:15 stefan mdadm[2401]: mdadm: No mail address or alert command - not monitoring.
Sep 18 23:25:15 stefan systemd[1]: mdmonitor.service: control process exited, code=exited status=1
Sep 18 23:25:15 stefan systemd[1]: Failed to start Software RAID monitoring and management.
-- Subject: Unit mdmonitor.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/catalog/be02cf6855d2428ba40df7e9d022f03d
-- 
-- Unit mdmonitor.service has failed.
-- 
-- The result is failed.
Sep 18 23:25:15 stefan systemd[1]: Unit mdmonitor.service entered failed state.
Sep 18 23:25:16 stefan sendmail[1455]: unable to qualify my own domain name (stefan) -- using short name
Sep 18 23:25:16 stefan sendmail[2403]: starting daemon (8.14.7): SMTP+queueing@01:00:00
Sep 18 23:25:16 stefan systemd[1]: Started Sendmail Mail Transport Agent.
-- Subject: Unit sendmail.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sendmail.service has finished starting up.
-- 
-- The start-up result is done.
Sep 18 23:25:16 stefan systemd[1]: Starting Sendmail Mail Transport Client...
-- Subject: Unit sm-client.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit sm-client.service has begun starting up.



Now the interesting part:

# mdadm --examine --scan
ARRAY /dev/md/3  metadata=1.2 UUID=3b3bf277:65467954:29e7727b:a327eb46 name=Stefan:3

But:

# cat /etc/mdadm.conf 
ARRAY /dev/md0 metadata=1.2 name=Stefan:3 UUID=3b3bf277:65467954:29e7727b:a327eb46


Obviously I want my RAID array to be named /dev/md0, not /dev/md/3, /dev/md/123, or whatever mdadm thinks it should be.


Note that the RAID array is mounted and runs smoothly (I can see its status via Disk Utility).

Comment 1 Doug Ledford 2013-09-19 18:31:25 UTC
The contents of your mdadm.conf file suggest that you created this array yourself and not during the install process.

There is no MAILADDR (mail address) line in your mdadm.conf file, nor is there a PROGRAM line. Without one or the other of these configuration directives in mdadm.conf, the mdmonitor service will not run. Anaconda normally adds the MAILADDR directive to the file for you if you create an array during the install process. Since you didn't do that, you have to add it yourself.
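
As a rough pre-flight check, the shell function below tests whether a given mdadm.conf contains a MAILADDR or PROGRAM line with a value, which is the condition described above for mdmonitor to start. has_alert_target is a hypothetical helper, not part of mdadm, and it only approximates mdadm's own parsing: it looks for the full keywords at the very start of a line, case-insensitively (lines that begin with whitespace are continuation lines in mdadm.conf, so they are not matched).

```shell
#!/bin/sh
# has_alert_target CONF
# Hypothetical helper (not part of mdadm): exits 0 if CONF has a MAILADDR or
# PROGRAM line with a non-empty value, i.e. something mdadm --monitor could
# use to deliver alerts. This approximates mdadm's parsing; it is not mdadm's
# own check.
has_alert_target() {
    # Anchor at column 0: indented lines continue the previous line.
    grep -Eiq '^(MAILADDR|PROGRAM)[[:space:]]+[^[:space:]]' "$1"
}
```

If has_alert_target /etc/mdadm.conf fails, appending a line such as "MAILADDR root" (the address is an assumption; use whichever mailbox should receive RAID alerts) and then running systemctl start mdmonitor.service should let the monitor start.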

As for device naming: you are using version 1.x superblocks. The old standard with version 0.90 superblocks was to name devices by number. However, everyone always wanted /dev/md0 first, then /dev/md1, etc., which meant that any time you needed to mount an array on another host, you were pretty much guaranteed to have a name conflict.

With version 1.x superblocks we intentionally moved from using numbers to represent arrays to using names, so you can now name your array root, or home, or whatever. To avoid the same sort of conflicts, we added a homehost field to the array superblock and a HOMEHOST directive to the mdadm.conf file. So if you wanted to refer to the root array on host www1, you would name the array www1:root at creation time and put HOMEHOST www1 in the mdadm.conf file, and voilà, your array would be assembled as /dev/md/root. If you then needed to mount the array from www2 on www1 to copy some files, it would be assembled as /dev/md/www2:root, so you would not get a name conflict as used to happen.

And so that you can still mount old-style /dev/md0 arrays at the same time, the names in /dev/md are symlinks to the /dev/md# devices we actually use to assemble the array, and we pick those numbers starting at 127 and counting backwards (this was an old limitation of the block subsystem's major/minor device numbers that we had to deal with).
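
The "start at 127 and count backwards" rule can be illustrated with a small shell function. This is only a simplified model of the behaviour described in this comment, not mdadm's actual allocator; next_md_number is a hypothetical name, and the numbers already in use are passed in as arguments rather than discovered from the system.

```shell
#!/bin/sh
# next_md_number USED...
# Hypothetical illustration: print the highest md number, starting at 127 and
# counting down, that is not in the list of already-used numbers.
next_md_number() {
    n=127
    while [ "$n" -ge 0 ]; do
        taken=no
        for used in "$@"; do
            if [ "$used" -eq "$n" ]; then
                taken=yes
                break
            fi
        done
        if [ "$taken" = no ]; then
            echo "$n"
            return 0
        fi
        n=$((n - 1))
    done
    return 1   # every number from 127 down to 0 is taken
}
```

For example, with /dev/md127 already assembled, next_md_number 127 prints 126, which is why a second name-based array typically shows up behind /dev/md126.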

But, to preserve backward compatibility for people who really wanted to stick with the old /dev/md0-style names, we also added a special hook. If you create a new device with a version 1.x superblock and name it <homehost>:<number>, we will assume that you want the old-style /dev/md<number> name, and that you want us to use that number when creating the array. Because your array has Stefan:3 in the name field, you are in fact telling us that you are on homehost Stefan and that you want /dev/md3 as your device.
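
The naming rules from the last three paragraphs can be condensed into a short shell sketch that maps a superblock name and the local HOMEHOST to the device name you would expect to see. expected_md_name is a hypothetical helper and a simplified model of the rules as described here, not mdadm's real implementation; treating a name with no host part as local is an additional assumption.

```shell
#!/bin/sh
# expected_md_name NAME HOMEHOST
# Hypothetical sketch of the 1.x naming rules described above; a simplified
# model, not mdadm's real code.
expected_md_name() {
    name=$1
    homehost=$2
    case "$name" in
        *:*) host=${name%%:*}; short=${name#*:} ;;
        *)   host=""; short=$name ;;   # assumption: no host part means local
    esac
    if [ -n "$host" ] && [ "$host" != "$homehost" ]; then
        # Array from another host: keep the host prefix to avoid conflicts.
        echo "/dev/md/$name"
        return
    fi
    case "$short" in
        ''|*[!0-9]*) echo "/dev/md/$short" ;;   # a real name such as "root"
        *)           echo "/dev/md$short" ;;    # <homehost>:<number> hook
    esac
}
```

With this model, expected_md_name www1:root www1 gives /dev/md/root, expected_md_name www2:root www1 gives /dev/md/www2:root, and expected_md_name Stefan:3 Stefan gives /dev/md3, matching the examples in this comment.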

The fact that you have /dev/md0 in your mdadm.conf file, and that it is being ignored when it would normally override the name field of the superblock, suggests that you created this array after the system was installed, that you created your mdadm.conf file manually, and that you didn't regenerate your dracut initramfs images to include the mdadm.conf file. Your array is therefore being assembled by dracut from the initramfs, and since the mdadm.conf file is not there, it gets the default name that the name field in your superblock says it should get.
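
One way to confirm this diagnosis is to look for etc/mdadm.conf in a listing of the initramfs. The function below is a hypothetical helper that reads such a listing on standard input and succeeds if mdadm.conf is present. It assumes the listing format of dracut's lsinitrd, where each file is printed on one line with its path, relative to the initramfs root, as the last field; check the real output on your system before relying on it.

```shell
#!/bin/sh
# initrd_has_mdadm_conf
# Hypothetical helper: reads an initramfs file listing on stdin and exits 0
# if some line's last field is exactly etc/mdadm.conf.
# Assumes lsinitrd-style output (one file per line, path last).
initrd_has_mdadm_conf() {
    awk '$NF == "etc/mdadm.conf" { found = 1 } END { exit !found }'
}
```

Usage would be along the lines of lsinitrd | initrd_has_mdadm_conf. If it fails, running dracut -f as root after fixing /etc/mdadm.conf rebuilds the initramfs for the running kernel; whether the local mdadm.conf is copied in is governed by dracut's mdadmconf setting.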

In any case, pretty much everything in this bug report is a matter of setting up the superblock and the mdadm.conf file properly, not a bug in the program, so I'm closing this bug.

Comment 2 markm 2013-09-20 21:13:16 UTC
Thank you for your long and detailed reply.

A few things to note:

1) Anaconda didn't mount my array, so I had to generate the mdadm.conf file myself, using the same command I used to check what mdadm suggested. The file was not created manually from scratch; it was pure output from mdadm --scan.

2) I understand the explanation regarding names. Just to note: I never wanted "stefan:3" as a name; at some point it appeared and it stayed there (my array was created ca. 3 years ago, and I've been upgrading Fedora every year).

Once again, many thanks for a detailed explanation!