Bug 1375002

Summary: Intel IMSM raid starts resync after each reboot
Product: [Fedora] Fedora Reporter: Oleg Samarin <osamarin68>
Component: mdadmAssignee: XiaoNi <xni>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 24CC: agk, dledford, Jes.Sorensen, osamarin68, xni
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-08 17:17:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
a part of result of journalctl -a
none
screenshot of shutdown -H with Fedora24
none
screenshot of shutdown -H with Fedora23
none
dmesg with 'setenforce 0' none

Description Oleg Samarin 2016-09-11 15:59:09 UTC
Created attachment 1199902 [details]
a part of result of journalctl -a

Description of problem:

After installing Fedora24 with a bios fake raid device it starts rebuilding the array after each reboot. The bios shows the raid state as Normal, but after booting mdadm starts resync.

I have 2 not-raided ssd devices and 2 hard disks assembled to one RAID1 volume. I have LVM installation, some volumes are on SSD (/ and /boot) and some are on the raid volume (/var, /home and some jther filesystems).

Before I had fedora23 on the same box with the same configuration without such problems.


Version-Release number of selected component (if applicable):
fedora 24

systemd-229-13.fc24.x86_64
mdadm-3.4-2.fc24.x86_64
lvm2-2.02.150-2.fc24.x86_64
dracut-044-20.fc24.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create imsm raid1 in bios
2. Install fedora24 with LVM on top of imsm raid1
3. Reboot

Actual results:
mdadm starts resync of /dev/md126

Expected results:
/dev/md126 in the clean state

Additional info:

Which more information should I provide?

Comment 1 XiaoNi 2016-09-12 06:57:23 UTC
Hi Oleg

As I know, it should do resync after creating the fake raid. What will it happen if you let the resync to finish?

Thanks
Xiao

Comment 2 Oleg Samarin 2016-09-12 19:33:42 UTC
In the log attached resync has finished, but it starts again after reboot.

Comment 3 Oleg Samarin 2016-09-12 19:39:52 UTC
More tests: there was't such problem in f23. But if I boot with F24, wait for resync finishes, then reboot to F23, then resync startes again in F23.

So the root problem is not at the start time, but at the shutdown time. Seems F24 unmounts some filesystems incorrectly, that triggers resync after boot.

How to force mdadm to log the reason, why it starts resync?

Comment 4 XiaoNi 2016-09-17 02:42:49 UTC
I checked the dmesg. It starts resync because the raid is not clean. Are those underlying devices OK? Can you have a try with other devices?

сен 11 19:52:15 oleg2.oleghome kernel: md: bind<sda>
сен 11 19:52:15 oleg2.oleghome systemd-udevd[499]: inotify_add_watch(8, /dev/sda2, 10) failed: No such file or directory
сен 11 19:52:15 oleg2.oleghome systemd-udevd[518]: inotify_add_watch(8, /dev/sda1, 10) failed: No such file or directory
сен 11 19:52:15 oleg2.oleghome kernel: md: bind<sdb>
сен 11 19:52:15 oleg2.oleghome kernel: md: bind<sdb>
сен 11 19:52:15 oleg2.oleghome kernel: md: bind<sda>
сен 11 19:52:15 oleg2.oleghome kernel: md: raid1 personality registered for level 1
сен 11 19:52:15 oleg2.oleghome kernel: md/raid1:md126: not clean -- starting background reconstruction
сен 11 19:52:15 oleg2.oleghome kernel: md/raid1:md126: active with 2 out of 2 mirrors
сен 11 19:52:15 oleg2.oleghome kernel: md126: detected capacity change from 0 to 2000395698176

Comment 5 Oleg Samarin 2016-09-17 21:17:45 UTC
All devices are OK. F23 on the same machine does not trigger resync.

What does "clean" mean? Why can they be not clean?

Comment 6 XiaoNi 2016-09-18 03:22:18 UTC
There is a judgement in raid1.c

if (mddev->recovery_cp != MaxSector)
   printk(KERN_NOTICE "md/raid1:%s: not clean"
      " -- starting background reconstruction\n",
      mdname(mddev));

If the raid device have finished the resync, the recovery_cp should be MaxSector.
After rebooting, the raid device is created and it checked recovery_cp. And it goes into this judgement. 

I have tried in my workstation. I just have two disks in my workstation. First I created raid1 in intel BIOS tools. Then I installed f24 on raid1 device. The partition I used is the default choose. I clicked automatically button. I can't reproduce this problem.

Can you add some logs in the codes and build a new kernel to check where and when the recovery_cp is modified?

Comment 7 Oleg Samarin 2016-09-18 06:26:05 UTC
May be this behavior is a consequence of  https://bugzilla.redhat.com/show_bug.cgi?id=1371991 ?

Comment 8 XiaoNi 2016-09-18 07:02:33 UTC
Did you see the same log "mdadm: error waiting for XXX to become clean" during shutdown?

Comment 9 Oleg Samarin 2016-09-20 18:49:41 UTC
Yes, I did.

Comment 10 Oleg Samarin 2016-09-20 18:50:56 UTC
Created attachment 1203001 [details]
screenshot of shutdown -H with Fedora24

Comment 11 Oleg Samarin 2016-09-20 18:51:38 UTC
Created attachment 1203002 [details]
screenshot of shutdown -H with Fedora23

Comment 12 Oleg Samarin 2016-09-20 18:56:03 UTC
Another hypotese - the red message 'Failed unmounting /var'. May be it marks the raid as not clean?

But in the screenshot from fedora 23 the same message is present (white instead of red), and it did not force resync upon the next startup.

Comment 13 Oleg Samarin 2016-09-20 21:02:43 UTC
I added more debugging to mdadm and repeated shutdown .H.
Monitor.c WaitClean:
------------------------------------------------
		/* wait for array_state to be clean */
		while (1) {
			rv = read(state_fd, buf, sizeof(buf));
			if (rv < 0)
				break;
			if (sysfs_match_word(buf, clean_states) <= 4)
				break;
			rv = sysfs_wait(state_fd, &delay);
			if (rv < 0 && errno != EINTR)
				break;
			lseek(state_fd, 0, SEEK_SET);
		}
		if (rv < 0)
			rv = 1;
		else if (fping_monitor(sock) == 0 ||
			 ping_monitor(mdi->text_version) == 0) {
			/* we need to ping to close the window between array
			 * state transitioning to clean and the metadata being
			 * marked clean
			 */
			rv = 0;
		} else
			rv = 1;
		if (rv && verbose)
			pr_err("Error waiting for %s to be clean\n",
				dev);
-------------------------------------------------


breaks the loop with condution 'if (sysfs_match_word(buf, clean_states) <= 4)', because in my case buf == "clean"

Then it tries to call fping_monitor(sock) and ping_monitor(mdi->text_version). They both returned -1, so the rv variable remain be equal to 1.

So waiting for clean was successful, but (f)ping_monitor - no. Maybe because dracut had killed the remaining processes before. But it made the same in F23.

Comment 14 XiaoNi 2016-09-21 09:09:56 UTC
I think we are close to the key reason. I can't see the message during the shutdown. The function WaitClean is called in misc_scan and misc_list. I don't know who call them at shutdown time. Can you help me to add some log to check this? 

And the reason you guess maybe right. Can you add some logs in ping_monitor to check where it returned -1? We don't need to check fping_monitor, the sock it received always be -1.

Comment 15 Oleg Samarin 2016-09-21 13:28:50 UTC
I've already added logs. ping_monitor returns -1 in my case.

Where should I add logs for  misc_scan and misc_list?

Comment 16 Oleg Samarin 2016-09-22 05:18:11 UTC
Both misc_scan and misc_list are called from mdadm.c::main while processing 'mdadm misc' command

Comment 17 XiaoNi 2016-09-22 05:50:02 UTC
Something like this:
@@ -227,29 +227,50 @@ int main(int argc, char *argv[])
                        break;

                case MiscOpt:
+                       printf("MISC : MiscOpt\n");
                case 'D':
+                       printf("MISC : D\n");
                case 'E':
+                       printf("MISC : E\n");
                case 'X':
+                       printf("MISC : X\n");
                case 'Q':
+                       printf("MISC : Q\n");
                case ExamineBB:
+                       printf("MISC : ExamineBB\n");
                case Dump:
+                       printf("MISC : Dump\n");
                case Restore:
+                       printf("MISC : Restore\n");
                case Action:
+                       printf("MISC : Action\n");
                        newmode = MISC;
                        break;


                case 'R':
+                       printf("MISC : R\n");
                case 'S':
+                       printf("MISC : S\n");
                case 'o':
+                       printf("MISC : o\n");
                case 'w':
+                       printf("MISC : w\n");
                case 'W':
+                       printf("MISC : W\n");
                case WaitOpt:
+                       printf("MISC : WaitOpt\n");
                case Waitclean:
+                       printf("MISC : Waitclean\n");
                case DetailPlatform:
+                       printf("MISC : DetailPlatform\n");
                case KillSubarray:
+                       printf("MISC : KillSubarray\n");
                case UpdateSubarray:
+                       printf("MISC : UpdateSubarray\n");
                case UdevRules:
+                       printf("MISC : UdevRules\n");
                case KillOpt:
+                       printf("MISC : KillOpt\n");

@@ -1511,16 +1532,19 @@ int main(int argc, char *argv[])
                        if (devmode == 'S' && c.scan)
                                rv = stop_scan(c.verbose);
                        else if ((devmode == 'D' || devmode == Waitclean) &&
-                                c.scan)
+                                c.scan) {
+                               printf("Now call misc_scan\n");
                                rv = misc_scan(devmode, &c);
-                       else if (devmode == UdevRules)
+                       } else if (devmode == UdevRules)
                                rv = Write_rules(udev_filename);
                        else {
                                pr_err("No devices given.\n");
                                exit(2);
                        }
-               } else
+               } else {
+                       printf("Now call misc_list\n");
                        rv = misc_list(devlist, &ident, dump_directory, ss, &c);
+               }
                break;

I know it's not a smart way to check the problem. But it's a way to check where calls Waitclean.

Comment 18 Oleg Samarin 2016-09-23 19:43:41 UTC
mdadm was running two time during shutdown.

1.
[ 1474.787226] dracut: Waiting for mdraid devices to be clean.
[ 1474.788096] dracut: MISC : Waitclean
[ 1474.788159] dracut: MISC : DetailPlatform
[ 1474.788214] dracut: MISC : KillSubarray
[ 1474.788266] dracut: MISC : UpdateSubarray
[ 1474.788317] dracut: MISC : UdevRules
[ 1474.788367] dracut: MISC : KillOpt
[ 1474.788418] dracut: Now call misc_scan

2. 
[ 1474.788653] dracut: Disassembling mdraid devices.
[ 1474.789517] dracut: MISC : S
[ 1474.789596] dracut: MISC : o
[ 1474.789664] dracut: MISC : w
[ 1474.789730] dracut: MISC : W
[ 1474.789796] dracut: MISC : WaitOpt
[ 1474.789878] dracut: MISC : Waitclean
[ 1474.789949] dracut: MISC : DetailPlatform
[ 1474.790016] dracut: MISC : KillSubarray
[ 1474.790084] dracut: MISC : UpdateSubarray
[ 1474.790150] dracut: MISC : UdevRules
[ 1474.790216] dracut: MISC : KillOpt

Comment 19 Oleg Samarin 2016-09-23 19:57:50 UTC
Created attachment 1204261 [details]
dmesg with 'setenforce 0'

If I run 'setenforce 0' before shutdown, the raid remains clean across reboot.

There is a dmesg result just before system halt.

The reason of 'Error waiting for %s to be clean' is a wrong selinux context of md127.sock.

The reason of making raid not clean is a wrong selinux context of map.new.

Comment 20 Oleg Samarin 2016-09-23 20:05:42 UTC
So, I'd like to have following:

1. Change an error message from 'Error waiting for %s to be clean' to 'Unable to update metadata: permission denied on md127.sock'

2. Add an error message 'Error modifying map.new: permission denied'

3. Correct selinux policy for md127.sock and map.new

Comment 21 Oleg Samarin 2016-09-23 20:11:15 UTC
These are the command run by dracut during shutdown (from /usr/lib/dracut/modules.d/90mdraid/md-shutdown.sh):

    local _offroot=$(strstr "$(mdadm --help-options 2>&1)" offroot && echo --offroot)
    info "Waiting for mdraid devices to be clean."
    mdadm $_offroot -vv --wait-clean --scan| vinfo
    ret=$?
    info "Disassembling mdraid devices."
    mdadm $_offroot -vv --stop --scan | vinfo
    ret=$(($ret+$?))

Comment 22 Oleg Samarin 2016-09-23 20:36:43 UTC
I saw the following in the dmesg 

[    4.365599] systemd[1]: Successfully loaded SELinux policy in 89.640ms.
[    4.381281] systemd[1]: RTC configured in localtime, applying delta of 180 minutes to system time.
[    4.402737] systemd[1]: Unable to fix SELinux security context of /run/mdadm/md127.sock: Permission denied


Seems it causes the problems with comunication to mdmon at shutdown

Comment 23 Oleg Samarin 2016-09-24 09:21:11 UTC
I've created a separate bug 1379044 for correcting selinux policy. If it solve the issue, this bug will relate only to the incorrect error message upon shutdown

Comment 24 Oleg Samarin 2016-09-25 19:29:18 UTC
A solution was proposed at https://bugzilla.redhat.com/show_bug.cgi?id=1379044

Comment 25 XiaoNi 2016-09-27 07:45:56 UTC
Thanks for the analysis. So now it's a SElinux bug. Can we close this now?

Comment 26 Jes Sorensen 2016-09-27 12:41:09 UTC
(In reply to XiaoNi from comment #25)
> Thanks for the analysis. So now it's a SElinux bug. Can we close this now?

You probably need to reassign it to selinux-policy instead of mdadm

Cheers,
Jes

Comment 27 Oleg Samarin 2016-09-27 17:06:08 UTC
> Thanks for the analysis. So now it's a SElinux bug. Can we close this now?

Could you change the error message 'Error waiting for %s to be clean' to 'Unable to update metadata: permission denied on /run/mdadm/md127.sock' before closing this bug?

The message 'Error waiting for %s to be clean' is not correct, because the array is in the clean state. It directs the troubleshooting to the wrong way.

> You probably need to reassign it to selinux-policy instead of mdadm

The separate bug 1379044 has been already assigned to selinux-policy.

Comment 28 XiaoNi 2016-09-30 04:36:52 UTC
I made a mistake in comment17. I forgot the break; in each case. But you gave me the answer that who calls WaitClean in comment21 :)

And I read the related codes. I think it's better to add the error message in the function ping_monitor like this:

[xni@xiao mdadm]$ git diff
diff --git a/msg.c b/msg.c
index 45cd450..ffde439 100644
--- a/msg.c
+++ b/msg.c
@@ -209,8 +209,10 @@ int ping_monitor(char *devname)
        if (sfd >= 0) {
                err = fping_monitor(sfd);
                close(sfd);
-       } else
+       } else {
+               pr_err("Error connecting mdmonfor %s to be clean\n",
                err = -1;
+       }
 
        return err;
 }

Jes, What's your opinion?

Comment 29 XiaoNi 2016-09-30 04:39:48 UTC
Sorry send the patch again, please ignore the patch in comment28.

diff --git a/msg.c b/msg.c
index 45cd450..dda091f 100644
--- a/msg.c
+++ b/msg.c
@@ -209,8 +209,10 @@ int ping_monitor(char *devname)
        if (sfd >= 0) {
                err = fping_monitor(sfd);
                close(sfd);
-       } else
+       } else {
+               pr_err("Error connecting monitor\n");
                err = -1;
+       }
 
        return err;
 }
 
Thanks
Xiao

Comment 30 Oleg Samarin 2016-10-01 08:35:44 UTC
Could it better to print error message not here, but in connect_monitor with more info?

Something like this:

=============================================================================
	if (connect(sfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
		pr_err("Error connecting monitor with %s: %s\n", addr.sun_path, strerror(errno));
		close(sfd);
		return -1;
	}
=============================================================================

Comment 31 Jes Sorensen 2016-10-24 18:09:14 UTC
Xiao,

I think it's fair to make the error message a little more elaborate.

Jes

Comment 32 Fedora End Of Life 2017-07-25 22:58:32 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 33 XiaoNi 2017-07-27 07:13:44 UTC
Hi Oleg

Sorry for this. I missed this one. We already fix this in f26. You can update to f26 to resolve this problem.

Thanks
Xiao

Comment 34 Fedora End Of Life 2017-08-08 17:17:23 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 35 R P Herrold 2018-01-23 15:01:55 UTC
The absence of this fix blocks:

https://github.com/dracutdevs/dracut/issues/342

Can the patch file please be applied?

Comment 36 R P Herrold 2018-01-23 15:03:21 UTC
Please disregard comment #35 -- applied to wrong bug ;(