Bug 1931610

Summary:	iSCSI Linux RAID disk is always resyncing after reboot
Product:	Red Hat Enterprise Linux 8	Reporter:	Jon Magrini <jmagrini>
Component:	mdadm	Assignee:	Nigel Croxon <ncroxon>
Status:	CLOSED NOTABUG	QA Contact:	Storage QE <storage-qe>
Severity:	high	Docs Contact:
Priority:	high
Version:	8.3	CC:	dledford, heinzm, jbrassow, jdonohue, mhoyer, ncroxon
Target Milestone:	rc
Target Release:	---
Hardware:	x86_64
OS:	Linux
Last Closed:	2021-07-19 19:13:38 UTC	Type:	Bug

Description Jon Magrini 2021-02-22 19:17:53 UTC

Description of problem:
systemd can stop iSCSI before the md safe mode delay timer expires, so the array is not marked clean at shutdown.

Version-Release number of selected component (if applicable):
4.18.0-240.10.1.el8_3.x86_64
iscsi-initiator-utils-6.2.0.878-5.gitd791ce0.el8.x86_64
systemd-239-41.el8_3.1.x86_64

How reproducible:
Very

Steps to Reproduce:
1. Create iSCSI sessions to the targets
2. Create an md RAID1 array from the iSCSI targets
3. Create a filesystem and mount it at a mountpoint
4. Add an fstab entry (systemd-fstab-generator creates the mount unit)
5. Reboot

Actual results:
After every normal reboot, resync occurs: 

---
kernel: md/raid1:md0: active with 2 out of 2 mirrors
kernel: md/raid1:md1: not clean -- starting background reconstruction
kernel: md/raid1:md1: active with 2 out of 2 mirrors
---

md0 : active raid1 sdc1[0] sdd1[1]
      104790016 blocks super 1.2 [2/2] [UU]
      [=========>...........]  resync = 46.8% (49130560/104790016) finish=7.1min speed=130229K/sec
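
To check for this state after each reboot without reading the output by eye, the resync line can be picked out of /proc/mdstat. This is only a minimal sketch: the function name and the sample text are illustrative, and it assumes the "resync = NN.N%" layout shown above.

```python
import re

# Sample text in the same layout as the /proc/mdstat output above.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdc1[0] sdd1[1]
      104790016 blocks super 1.2 [2/2] [UU]
      [=========>...........]  resync = 46.8% (49130560/104790016) finish=7.1min speed=130229K/sec

unused devices: <none>
"""


def resyncing_arrays(mdstat_text):
    """Return {array_name: percent_done} for arrays with a resync in progress."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            # A new "mdX : ..." stanza starts here.
            current = header.group(1)
            continue
        status = re.search(r"resync\s*=\s*([\d.]+)%", line)
        if status and current:
            progress[current] = float(status.group(1))
    return progress


print(resyncing_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of /proc/mdstat; an empty result would mean no array is resyncing.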

Expected results:
Raid is clean

Additional info:

Comment 1 Jon Magrini 2021-02-22 19:20:41 UTC

During a normal shutdown, the system relies solely on the safe mode delay timer to mark the device in sync.

The default minimum safe mode delay is 201 milliseconds. If the filesystem is unmounted manually first, the timer fires and forces md to update its metadata before a later reboot request can stop iSCSI. If the unmount is left to systemd, however, the final writes from unmounting the filesystem leave a race window: systemd can stop iSCSI so soon after the unmount that the timer only fires once the iSCSI devices are no longer usable.
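
For anyone wanting to look at the timer and the array state on a running system, a rough sketch follows. It assumes the md sysfs attributes safe_mode_delay and array_state under /sys/block/<array>/md/ (as described in the kernel md documentation, not in this report); the helper names and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def read_md_attr(array, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<array>/md/<attr>, or None if unreadable."""
    path = Path(sysfs_root) / array / "md" / attr
    try:
        return path.read_text().strip()
    except OSError:
        # Array not present, attribute missing, or not permitted to read it.
        return None


def md_flush_state(array, sysfs_root="/sys/block"):
    """Collect the two attributes relevant to the race: the delay and the current state."""
    return {
        "safe_mode_delay": read_md_attr(array, "safe_mode_delay", sysfs_root),
        "array_state": read_md_attr(array, "array_state", sysfs_root),
    }


# "md0" is an assumed array name; both values are None if it does not exist.
print(md_flush_state("md0"))
```

An array_state other than "clean" shortly before the iSCSI logout would be consistent with the race described here.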

The logs below show this: md reported that it was unable to update its metadata a little over 200 ms after the filesystem was unmounted, with the iSCSI logout already under way.


[  126.523984] XFS (dm-4): Unmounting Filesystem 
...
[  126.535742] systemd[1]: Unmounted /data.
...
[  126.550247] systemd[6813]: iscsi-shutdown.service: Executing: /usr/sbin/iscsiadm -m node --logoutall=all
[  126.560603] systemd[1]: Got cgroup empty notification for: /system.slice/data.mount
[  126.573028] sd 7:0:0:0: [sda] Synchronizing SCSI cache
[  126.580379] sd 7:0:0:1: [sdb] Synchronizing SCSI cache
...
[  126.761855] md: super_written gets error=10   


This looks like the safe mode delay timer fired, but too late for the I/Os that mark the disks in sync to succeed. Nothing in the systemd units appears to try to force it sooner.
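
The timing in the log excerpt above can be worked out directly from the kernel timestamps. The sketch below is only arithmetic on the quoted lines (the helper names are illustrative): the iscsiadm logout starts about 14.5 ms after the unmount, while the failed superblock write is logged about 226 ms after it.

```python
import re

# Lines copied from the log excerpt above (elided lines left out).
LOG = """\
[  126.523984] XFS (dm-4): Unmounting Filesystem
[  126.535742] systemd[1]: Unmounted /data.
[  126.550247] systemd[6813]: iscsi-shutdown.service: Executing: /usr/sbin/iscsiadm -m node --logoutall=all
[  126.761855] md: super_written gets error=10
"""


def timestamp_of(log, needle):
    """Return the kernel timestamp (seconds) of the first line containing needle."""
    for line in log.splitlines():
        match = re.match(r"\[\s*(\d+\.\d+)\]", line)
        if match and needle in line:
            return float(match.group(1))
    raise ValueError(f"no log line contains {needle!r}")


def delta_ms(log, start, end):
    """Milliseconds elapsed between the first lines matching start and end."""
    return round((timestamp_of(log, end) - timestamp_of(log, start)) * 1000, 3)


print("unmount -> iscsiadm logout (ms):", delta_ms(LOG, "Unmounted /data", "iscsiadm"))
print("unmount -> super_written error (ms):", delta_ms(LOG, "Unmounted /data", "super_written"))
```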

Comment 2 Chris Leech 2021-02-23 19:33:19 UTC

We have specifically logged out of iSCSI sessions at shutdown (unless they're used for the root filesystem) because of storage arrays that don't like having resources tied up in iSCSI session and TCP connection state if we just drop the connection. I don't think that's going to change.

In general, it's probably best to manage RAID on the storage target side and expose the set as a single volume over iSCSI.

As a workaround, I suppose you could edit the iscsi-shutdown.service unit file to manually stop the md RAID set before logging out of the iSCSI sessions.  This seems to work for me.

# systemctl edit --full iscsi-shutdown.service
(I could not get multiple ExecStop ordering right with override files, hence the --full)

add an ExecStop line for mdadm before the existing iscsiadm line

  [Service]
  ...
  ExecStop=-/usr/sbin/mdadm --stop --scan
  ExecStop=-/usr/sbin/iscsiadm -m node --logoutall=all

This will write a modified service file to /etc/systemd/system/iscsi-shutdown.service

It's probably also possible to do this with a new service file that orders itself After=iscsi-shutdown.service (a unit ordered After= another unit is stopped before it on shutdown, with the real work being done in ExecStop)

Comment 5 Nigel Croxon 2021-04-14 17:32:43 UTC

Hello Jon,

Could you give this .service file a test?

/usr/lib/systemd/system/mdadm-clean-shutdown.service 

[Unit]
Description=Wait for a update to clean the SB and bitmap before shutdown
DefaultDependencies=no
Requires=local-fs.target
Before=iscsi-shutdown.service
After=unmount.target

[Service]
Type=oneshot
ExecStart=BINDIR/mdadm --stop --scan

Comment 6 Nigel Croxon 2021-04-21 17:57:52 UTC

Jon, Is the initiator and the target on the same machine?

Comment 7 Jon Magrini 2021-04-22 12:25:42 UTC

(In reply to Nigel Croxon from comment #6)
> Jon, Is the initiator and the target on the same machine?

The initiator and target are not the same system. The target is an HPE NAS appliance. I will try and test the unit file.

Comment 9 Nigel Croxon 2021-04-23 16:43:43 UTC

When I put the MD device (/dev/md0) in /etc/fstab to mount automatically, the boot hangs with:
A start job is running for dev-md0.device (xxx min / yyy min).
It eventually times out and drops into emergency mode.
If I edit /etc/fstab and remove the md0 reference, the boot continues.

I think there is a power-up sequence issue.

Comment 10 Nigel Croxon 2021-04-23 16:52:14 UTC

Putting _netdev as an option in /etc/fstab resolved the boot issue described in the above comment.

What does your /etc/fstab look like?

Comment 11 Jon Magrini 2021-04-23 17:39:16 UTC

The fstab entry is as follows. Adding x-systemd.after=iscsi.service addressed a few random shutdown issues.
---
/dev/storeasy/veeamrepo /srv                    xfs     _netdev,x-systemd.after=iscsi.service   0 0
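
Since _netdev turned out to matter for boot ordering, here is a small sketch for spotting fstab entries on network-backed devices that lack it. The function names and the default "/dev/md" prefix are assumptions for illustration; which devices actually sit on iSCSI has to be supplied by the caller.

```python
def fstab_options(line):
    """Return the set of mount options from one fstab line, or None for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        # Not a complete fstab entry (device, mountpoint, type, options).
        return None
    return set(fields[3].split(","))


def missing_netdev(fstab_text, device_prefixes=("/dev/md",)):
    """Return devices matching device_prefixes whose options do not include _netdev."""
    missing = []
    for line in fstab_text.splitlines():
        opts = fstab_options(line)
        if opts is None:
            continue
        device = line.split()[0]
        if device.startswith(device_prefixes) and "_netdev" not in opts:
            missing.append(device)
    return missing


# The entry quoted above, checked against its own device path.
ENTRY = "/dev/storeasy/veeamrepo /srv xfs _netdev,x-systemd.after=iscsi.service 0 0"
print(fstab_options(ENTRY))
print(missing_netdev(ENTRY, device_prefixes=("/dev/storeasy/",)))
```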

Comment 12 Nigel Croxon 2021-04-27 17:25:34 UTC

I'm still unable to reproduce the issue as you have reported.

Comment 13 Nigel Croxon 2021-04-27 17:53:18 UTC

OK, maybe I spoke too soon:
# dmesg |grep md0
[    4.466948] systemd[1]: dev-md0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
[    4.467690] systemd[1]: dev-md0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
[    8.790559] md/raid1:md0: not clean -- starting background reconstruction
[    8.791167] md/raid1:md0: active with 3 out of 3 mirrors
[    8.791756] md0: detected capacity change from 0 to 103809024
[    8.809653] md: resync of RAID array md0
[    9.037098] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[   12.732727] md: md0: resync done.
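
When comparing many reboots, the "not clean" message above is the line to look for. A rough sketch of filtering it out of saved dmesg output follows; the function name is illustrative and the pattern assumes the message format seen in this report.

```python
import re

# Matches e.g. "md/raid1:md0: not clean -- starting background reconstruction".
UNCLEAN = re.compile(r"md/raid\d*:(md\w+): not clean -- starting background reconstruction")


def unclean_arrays(dmesg_text):
    """Return the names of arrays the kernel assembled as not clean, in log order."""
    return [match.group(1) for match in UNCLEAN.finditer(dmesg_text)]


# Two lines copied from the dmesg output above.
SAMPLE_DMESG = """\
[    8.790559] md/raid1:md0: not clean -- starting background reconstruction
[    8.791167] md/raid1:md0: active with 3 out of 3 mirrors
"""

print(unclean_arrays(SAMPLE_DMESG))
```

An empty list for a given boot would correspond to the "clean on entry" case.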

Comment 14 Nigel Croxon 2021-04-27 19:57:31 UTC

I'm getting consistent results (clean on entry) when I don't have your addition to fstab:

[root@virt2 ~]# cat /etc/fstab 

/dev/mapper/rhel-root   /                       xfs     defaults        0 0
/dev/mapper/rhel-swap   none                    swap    defaults        0 0
/dev/md0 /mdtest                                ext4    _netdev         0 0


[root@virt2 ~]# dmesg |grep md0
[    4.533658] systemd[1]: dev-md0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
[    4.534402] systemd[1]: dev-md0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
[    9.099737] md/raid1:md0: active with 1 out of 3 mirrors
[    9.100373] md0: detected capacity change from 0 to 103809024
[    9.247297] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)



[root@virt2 ~]# cat /usr/lib/systemd/system/mdadm-clean-shutdown.service 
[Unit]
Description=Wait for a update to clean the SB and bitmap before shutdown
DefaultDependencies=no
Requires=local-fs.target
Before=iscsi-shutdown.service
After=unmount.target

[Service]
Type=oneshot
ExecStart=/sbin/mdadm --stop --scan

Comment 15 Jon Magrini 2021-06-15 15:29:39 UTC

Nigel,

Your comment #14 is still using the modified shutdown service unit, correct? I will retest, and I will also ask the customer to remove the fstab entry, modify mdadm-clean-shutdown.service, and provide results.

Thanks.

Comment 18 Nigel Croxon 2021-07-19 19:13:38 UTC

As there is a working solution to this problem, I am closing this bz.