Bug 1348925
Summary: mdadm -IRs returns 1 (0 expected)

Product: Red Hat Enterprise Linux 6
Component: mdadm
Version: 6.7
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: nikhil kshirsagar <nkshirsa>
Assignee: XiaoNi <xni>
QA Contact: guazhang <guazhang>
Docs Contact: Milan Navratil <mnavrati>
CC: bhu, dledford, Jes.Sorensen, jjose, mnavrati, nkshirsa, xni, yizhan
Target Milestone: rc
Target Release: ---
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Fixed In Version: mdadm-3.3.4-7.el6
Doc Type: Bug Fix
Doc Text:
An incorrect exit status of "mdadm -IRs" no longer causes error messages at boot time
Previously, the load_container() function was incorrectly trying to load a container from the member array. As a consequence, the "mdadm -IRs" command incorrectly returned the `1` exit status, which led to error messages at boot time. The load_container() function has been modified to prevent loading a container from a member array. As a result, error messages at boot time no longer occur.
Last Closed: 2017-03-21 08:58:38 UTC
Type: Bug
Bug Blocks: 1269194, 1346447
Attachments:
Created attachment 1170679 [details]
raid snapshot
Created attachment 1170680 [details]
boot logs image
Reading this bug report it is completely unclear what the problem is here. Per /proc/mdstat it looks like the arrays are assembled correctly. Why do they have a manually mangled /etc/mdadm.conf? Where is the full dmesg boot log? md0 is the IMSM container - normally that would be md127.

Please find the update from customer
===
-- We would need more steps in detail, that is the exact steps followed.

Ans: As mentioned in the previous mail, those were the exact steps followed: a straightforward installation after doing HW RAID 1 configuration on the Intel SSDs. After the OS drive selection from the RAID tab, we created the volume partitions, and fdisk -l shows as below:

[root@node1 ~]# fdisk -l | grep -i dev
Disk /dev/sda: 120.0 GB, 120034123776 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      512000   83  Linux
/dev/sda2              64        1109     8388608   82  Linux swap / Solaris
/dev/sda3            1109       13864   102456320   8e  Linux LVM
Disk /dev/sdb: 120.0 GB, 120034123776 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          64      512000   83  Linux
/dev/sdb2              64        1109     8388608   82  Linux swap / Solaris
/dev/sdb3            1109       13864   102456320   8e  Linux LVM
Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdd: 4000.8 GB, 4000787030016 bytes
Disk /dev/sde: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdf: 4000.8 GB, 4000787030016 bytes
Disk /dev/md126: 114.0 GB, 114030542848 bytes
      Device Boot      Start         End      Blocks   Id  System
/dev/md126p1   *         257      128256      512000   83  Linux
/dev/md126p2          128257     2225408     8388608   82  Linux swap / Solaris
/dev/md126p3         2225409    27839488   102456320   8e  Linux LVM
Disk /dev/mapper/sys-root: 11.0 GB, 10997465088 bytes
Disk /dev/mapper/sys-tmp: 11.0 GB, 10997465088 bytes
Disk /dev/mapper/sys-backup: 5498 MB, 5498732544 bytes
Disk /dev/mapper/sys-opt: 42.9 GB, 42949672960 bytes
Disk /dev/mapper/sys-usr: 11.0 GB, 10997465088 bytes
Disk /dev/mapper/sys-var: 11.0 GB, 10997465088 bytes
Disk /dev/mapper/sys-home: 11.0 GB, 10997465088 bytes

-- Also, in which step did you run mdadm -IRs?
Ans: We didn't run mdadm -IRs after installation; it was requested during the support case, which is why we ran the command as advised and captured the output.

-- Did you run this command due to installation failure?

Ans: The installation was successful. For iostat we get the results below.

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               6.33         0.00        56.00          0        168
sdb               6.00         0.00        56.00          0        168
sdc               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
md126             8.00         0.00        56.00          0        168
dm-0              0.00         0.00         0.00          0          0
dm-1              2.67         0.00        21.33          0         64
dm-2              0.00         0.00         0.00          0          0
dm-3              4.33         0.00        34.67          0        104
dm-4              0.00         0.00         0.00          0          0
dm-5              0.00         0.00         0.00          0          0
dm-6              0.00         0.00         0.00          0          0
^C
[root@node1 ~]#

Let me know if you need further details. Not sure if the HW RAID array is any issue here; we are using this SuperMicro HW for MapR. Details were sent in a previous mail.
====

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Hello,
I have tested mdadm-3.3.4-1.el6 and reproduced this bug, then installed mdadm-3.3.4-7.el6 and could not reproduce it, so this bug was fixed.
Moved to "verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0569.html
Created attachment 1170678 [details]
strace output

Description of problem:
mdadm -IRs returns 1 (and hence fails in rc.sysinit)

Version-Release number of selected component (if applicable):
mdadm-3.3.2-5.el6.x86_64

How reproducible:
I am still getting the exact repro scenario from the customer but will paste all the details and how they managed to get this error on their system. The customer says, "What we did is, configured the HW RAID 1 in the 2 SSDs (for OS) from the Intel RAID controller and then installed the OS. During OS installation we did select the advanced drivers and selected the partition that came under the RAID tab. Maybe that's the cause."

The RAID hardware is:
RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05)
RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)

Actual results:
mdadm -IRs returns 1. The exact error seen is "mdadm: array /dev/md/OSVOLUME0 now has 2 devices (0 new) FAILED" during bootup in the rc.sysinit script. The exact command that fails is

[ -r /proc/mdstat -a -r /dev/md/md-device-map ] && action $"Run MD devices: " /sbin/mdadm -IRs

(see attached image of boot logs)

Expected results:
mdadm -IRs should return 0 to the shell/caller.

Additional info:
strace output attached.

# cat proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda[1] sdb[0]
      111357952 blocks super external:/md0/0 [2/2] [UU]

md0 : inactive sda[1](S) sdb[0](S)
      6306 blocks super external:imsm

unused devices: <none>

We see mdadm.conf is:

MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 UUID=823faff3:7d076a53:ea497222:7734106d
#ARRAY /dev/md127 UUID=501d37d2:5a0ce9b0:71148990:32a9d147

The md127 device was something we asked the customer to comment out, since we don't see it in the /proc/mdstat output, but even with it commented out we have the same problem.
# cat proc/cmdline
ro root=/dev/mapper/sys-root rd_MD_UUID=a52bcfbe:68cfe56b:27a6195d:6581c824 rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=sys/root SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_MD_UUID=3acb860a:5da1daa7:9c522152:fc97b5af KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet

# grep "RAID bus" lspci
00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05)
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)

Analysis so far:
I'm looking at mdadm-3.3.2 code. Looking at the strace output, exit_group(1) has happened without any additional logs (pr_err()) thrown on screen. So whatever happened returned 1 to the OS/shell without printing any pr_err logs prior to the exit(1). We use this to focus our search, and we also note the stat calls at the end of the strace output. Let's start by searching for those in the known code path we have.

assemble_container_content() is where we see the "array /dev/md/OSVOLUME0 now has 2 devices (0 new)" message, and after that we see no more messages; the strace shows exit(1). The strace ends this way:

rename("/dev/md/md-device-map.new", "/dev/md/md-device-map") = 0
open("/sys/block/md126/md//array_state", O_WRONLY) = 7
write(7, "readonly", 8)                 = -1 EBUSY (Device or resource busy)
close(7)                                = 0
write(2, "mdadm: array /dev/md/OSVOLUME0 n"..., 48) = 48   <-- printing the output seen
write(2, " (0 new)", 8)                 = 8
write(2, "\n", 1)                       = 1
close(6)                                = 0
stat("/sys/dev/block/8:16/partition", 0x7ffce60cd2d0) = -1 ENOENT (No such file or directory)
stat("/sys/dev/block/8:0/partition", 0x7ffce60cd2d0) = -1 ENOENT (No such file or directory)
unlink("/dev/md/md-device-map.lock")    = 0
close(3)                                = 0
munmap(0x7fd80e2ba000, 4096)            = 0
exit_group(1)                           = ?
+++ exited with 1 +++

So let's examine the code for whatever might be causing the stat calls seen among the system calls.
We see the code in assemble_container_content(); the function itself doesn't do much more after printing the output we see. From Assemble.c line 1986, assemble_container_content() ends this way:

	} else if (c->verbose >= 0) {
		if (err)                     <--- err is 1, the code path goes into this if case
			pr_err("array %s now has %d device%s",
			       chosen_name, working + preexist,
			       working + preexist == 1 ? "":"s");
		else
			pr_err("Started %s with %d device%s",
			       chosen_name, working + preexist,
			       working + preexist == 1 ? "":"s");
		if (preexist)
			fprintf(stderr, " (%d new)", working);
		if (expansion)
			fprintf(stderr, " ( + %d for expansion)", expansion);
		fprintf(stderr, "\n");       <--- the output we see ends here
	}
	if (!err)
		wait_for(chosen_name, mdfd); <--- won't go into this function since err is not 0
	return err;
	/* FIXME should have an O_EXCL and wait for read-auto */
}
#endif

The call to assemble_container_content() will be in Incremental.c (since the -I parameter was passed to mdadm). mdadm 3.3.2, Incremental.c line 1602:

		assemble_container_content(st, mdfd, ra, c,
					   chosen_name, &result);
		close(mdfd);                 <--- seen in the strace output before the stat calls
	}
	if (c->export && result) {
		char sep = '=';
		printf("MD_STARTED");
		if (result & INCR_NO) {
			printf("%cno", sep);
			sep = ',';
		}
		if (result & INCR_UNSAFE) {
			printf("%cunsafe", sep);
			sep = ',';
		}
		if (result & INCR_ALREADY) {
			printf("%calready", sep);
			sep = ',';
		}
		if (result & INCR_YES) {
			printf("%cyes", sep);
			sep = ',';
		}
		printf("\n");
	}
	/* don't move spares to container with volume being activated
	   when all volumes are blocked */
	if (ra_all == ra_blocked)
		return 0;
	/* Now move all suitable spares from spare container */
	domains = domain_from_array(list, st->ss->name);
	memcpy(suuid, uuid_zero, sizeof(int[4]));
	if (domains &&
	    (smp = map_by_uuid(&map, suuid)) != NULL &&

If we trace the source from here looking for stat calls, to get a clue of what might have happened, let's look at domain_from_array().
domains = domain_from_array(list, st->ss->name);

[root@nkshirsa mdadm-3.3.2]# grep domain_from_array *
Incremental.c:	dl = domain_from_array(sra, st2->ss->name);
Incremental.c:	domains = domain_from_array(list, st->ss->name);
mdadm.h:extern struct domainlist *domain_from_array(struct mdinfo *mdi,
grep: misc: Is a directory
policy.c:struct domainlist *domain_from_array(struct mdinfo *mdi, const char *metadata)   <--- definition here
grep: systemd: Is a directory
grep: tests: Is a directory
[root@nkshirsa mdadm-3.3.2]#

Let's see policy.c:

struct domainlist *domain_from_array(struct mdinfo *mdi, const char *metadata)
{
	struct domainlist *domlist = NULL;

	if (!mdi)
		return NULL;
	for (mdi = mdi->devs ; mdi ; mdi = mdi->next)
		domainlist_add_dev(&domlist, makedev(mdi->disk.major,
						     mdi->disk.minor),
				   metadata);
	return domlist;
}

Let's see domainlist_add_dev():

void domainlist_add_dev(struct domainlist **dom, int devid, const char *metadata)
{
	struct dev_policy *pol = devid_policy(devid);
	domain_merge(dom, pol, metadata);
	dev_policy_free(pol);
}

Let's see devid_policy():

struct dev_policy *devid_policy(int dev)
{
	struct mdinfo disk;
	disk.disk.major = major(dev);
	disk.disk.minor = minor(dev);
	return disk_policy(&disk);
}

Let's see disk_policy():

/*
 * disk_policy() gathers policy information for the
 * disk described in the given mdinfo (disk.{major,minor}).
 */
struct dev_policy *disk_policy(struct mdinfo *disk)
{
	char *path = NULL;
	char *type = disk_type(disk);        <--- contains a stat system call
	struct dev_policy *pol = NULL;

	if (config_rules_has_path)
		path = disk_path(disk);      <--- contains stat system calls
	pol = path_policy(path, type);
	free(path);
	return pol;
}

static char *disk_type(struct mdinfo *disk)
{
	char buf[30+20+20];
	struct stat stb;
	sprintf(buf, "/sys/dev/block/%d:%d/partition",
		disk->disk.major, disk->disk.minor);
	if (stat(buf, &stb) == 0)            <---------------------------
		return type_part;
	else
		return type_disk;
}

static char *disk_path(struct mdinfo *disk)
{
	struct stat stb;
	int prefix_len;
	DIR *by_path;
	char symlink[PATH_MAX] = "/dev/disk/by-path/";
	char nm[PATH_MAX];
	struct dirent *ent;
	int rv;

	by_path = opendir(symlink);
	if (by_path) {
		prefix_len = strlen(symlink);
		while ((ent = readdir(by_path)) != NULL) {
			if (ent->d_type != DT_LNK)
				continue;
			strncpy(symlink + prefix_len,
				ent->d_name,
				sizeof(symlink) - prefix_len);
			if (stat(symlink, &stb) < 0)   <-------------------------------------
				continue;
			if ((stb.st_mode & S_IFMT) != S_IFBLK)
				continue;
			if (stb.st_rdev != makedev(disk->disk.major, disk->disk.minor))
				continue;
			closedir(by_path);
			return xstrdup(ent->d_name);
		}
		closedir(by_path);
	}
	/* A NULL path isn't really acceptable - use the devname.. */
	sprintf(symlink, "/sys/dev/block/%d:%d", disk->disk.major, disk->disk.minor);
	rv = readlink(symlink, nm, sizeof(nm)-1);
	if (rv > 0) {
		char *dname;
		nm[rv] = 0;
		dname = strrchr(nm, '/');
		if (dname)
			return xstrdup(dname + 1);
	}
	return xstrdup("unknown");
}

Are these the two stat system calls we see at the end of the strace?

close(6)                                = 0
stat("/sys/dev/block/8:16/partition", 0x7ffce60cd2d0) = -1 ENOENT (No such file or directory)
stat("/sys/dev/block/8:0/partition", 0x7ffce60cd2d0) = -1 ENOENT (No such file or directory)
unlink("/dev/md/md-device-map.lock")    = 0
close(3)                                = 0
munmap(0x7fd80e2ba000, 4096)            = 0
exit_group(1)                           = ?
+++ exited with 1 +++

Anyway, going forward, assuming so: after this stat we see a sudden exit_group without any pr_err message. The only way this seems possible is if disk_path() exits somewhere without printing anything:

static char *disk_path(struct mdinfo *disk)
{
	struct stat stb;
	int prefix_len;
	DIR *by_path;
	char symlink[PATH_MAX] = "/dev/disk/by-path/";
	char nm[PATH_MAX];
	struct dirent *ent;
	int rv;

	by_path = opendir(symlink);
	if (by_path) {
		prefix_len = strlen(symlink);
		while ((ent = readdir(by_path)) != NULL) {
			if (ent->d_type != DT_LNK)
				continue;
			strncpy(symlink + prefix_len,
				ent->d_name,
				sizeof(symlink) - prefix_len);
			if (stat(symlink, &stb) < 0)
				continue;
			if ((stb.st_mode & S_IFMT) != S_IFBLK)
				continue;
			if (stb.st_rdev != makedev(disk->disk.major, disk->disk.minor))
				continue;
			closedir(by_path);
			return xstrdup(ent->d_name);   <--- can exit without any pr_err message
		}
		closedir(by_path);
	}
	/* A NULL path isn't really acceptable - use the devname.. */
	sprintf(symlink, "/sys/dev/block/%d:%d", disk->disk.major, disk->disk.minor);
	rv = readlink(symlink, nm, sizeof(nm)-1);
	if (rv > 0) {
		char *dname;
		nm[rv] = 0;
		dname = strrchr(nm, '/');
		if (dname)
			return xstrdup(dname + 1);
	}
	return xstrdup("unknown");                     <--- can exit without any pr_err message
}

char *xstrdup(const char *str)
{
	char *rv = strdup(str);
	char *msg;
	if (rv)
		return rv;
	msg = Name ": memory allocation failure - aborting\n";
	exit(4+!!write(2, msg, strlen(msg)));          <--- what is this? Could this result in exit(1)?
}

When I ran this code in a test program, it exited with 5, not 1. It's an unlikely case anyway; why would we get a memory allocation failure at bootup? But what else could cause an exit(1) from mdadm without any errors, warnings, or info messages logged? It is supposed to return 0 to the shell when called, right? I tried to grep for exit(1) to see if that would give any clue.
[root@nkshirsa mdadm-3.3.2]# grep -ri "exit(1)"
swap_super.c:	exit(1);
swap_super.c:	exit(1);
swap_super.c:	exit(1);
swap_super.c:	exit(1);
swap_super.c:	exit(1);
swap_super.c:	exit(1);
swap_super.c:	exit(1);
util.c.ddf:	exit(1);
util.c.ddf:	exit(1);
util.c:	exit(1);
util.c:	exit(1);
Grow.c:	exit(1);
restripe.c:	exit(1);
restripe.c:	exit(1);
restripe.c:	exit(1);
restripe.c:	exit(1);
Incremental.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdadm.c:	exit(1);
mdmon.c:	exit(1);

No case in mdadm.c has an exit(1) without a prior pr_err call, except this one, and I don't think we hit this case, since we passed -IRs to mdadm:

	if (mode == MANAGE || mode == BUILD || mode == CREATE || mode == GROW ||
	    (mode == ASSEMBLE && ! c.scan)) {
		if (devs_found < 1) {
			pr_err("an md device must be given in this mode\n");
			exit(2);
		}
		if ((int)ident.super_minor == -2 && c.autof) {
			pr_err("--super-minor=dev is incompatible with --auto\n");
			exit(2);
		}
		if (mode == MANAGE || mode == GROW) {
			mdfd = open_mddev(devlist->devname, 1);
			if (mdfd < 0)
				exit(1);   <--- only possibility, but unlikely to be executed in our case, right?
		} else

Both the exit(1)'s in util.c are for mdmon, not mdadm.

The one in Incremental.c is:

static void run_udisks(char *arg1, char *arg2)
{
	int pid = fork();
	int status;
	if (pid == 0) {
		execl("/usr/bin/udisks", "udisks", arg1, arg2, NULL);   <--- we don't see this call reflected in the strace
		execl("/bin/udisks", "udisks", arg1, arg2, NULL);
		exit(1);
	}

I don't think we execute the code behind the exit(1)'s in Grow.c or restripe.c in this case, and all the exit(1)'s in swap_super.c have a prior perror.

So what is causing the mdadm binary to exit with a return code of 1?