1931610 – iSCSI Linux RAID disk is always resyncing after reboot

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	mdadm
Sub Component:
Version:	8.3
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Nigel Croxon
QA Contact:	Storage QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-02-22 19:17 UTC by Jon Magrini
Modified:	2024-03-25 18:12 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-19 19:13:38 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	5830941	0	None	None	None	2021-07-19 14:01:07 UTC

Description Jon Magrini 2021-02-22 19:17:53 UTC

Description of problem:
systemd can stop iscsi before the md safe mode delay timer expires, so the array is never marked clean at shutdown.

Version-Release number of selected component (if applicable):
4.18.0-240.10.1.el8_3.x86_64
iscsi-initiator-utils-6.2.0.878-5.gitd791ce0.el8.x86_64
systemd-239-41.el8_3.1.x86_64

How reproducible:
Very

Steps to Reproduce:
1. Create iscsi sessions to the targets
2. Create an md raid1 from the iscsi targets
3. Create a filesystem and mount it on a mountpoint
4. Add an fstab entry (systemd-fstab-generator creates the mount unit)
5. Reboot
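
The steps above can be sketched as a shell script. This is only an illustration: the portal address, disk names, md device, and mountpoint are hypothetical placeholders, and the _netdev option in the fstab line is an assumption that is not part of the original steps. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Every value below is a hypothetical placeholder.
PORTAL="${PORTAL:-192.0.2.10}"    # assumed iSCSI portal (documentation address range)
DISK_A="${DISK_A:-/dev/sdc}"      # assumed first iSCSI-backed disk
DISK_B="${DISK_B:-/dev/sdd}"      # assumed second iSCSI-backed disk
MD_DEV="${MD_DEV:-/dev/md0}"      # assumed md device name
MOUNTPOINT="${MOUNTPOINT:-/data}" # assumed mountpoint

# Print each command by default; set DRY_RUN=0 to really run it (destructive).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Create iscsi sessions to the targets
    run iscsiadm -m discovery -t sendtargets -p "$PORTAL"
    run iscsiadm -m node --loginall=all
    # 2. Create an md raid1 from the iscsi-backed disks
    run mdadm --create "$MD_DEV" --level=1 --raid-devices=2 "$DISK_A" "$DISK_B"
    # 3. Create a filesystem and mount it
    run mkfs.xfs "$MD_DEV"
    run mkdir -p "$MOUNTPOINT"
    run mount "$MD_DEV" "$MOUNTPOINT"
    # 4. fstab line to add by hand (the _netdev option is an assumption)
    echo "fstab: $MD_DEV $MOUNTPOINT xfs _netdev 0 0"
    # 5. Reboot, then check the kernel log and /proc/mdstat for a resync
}

reproduce
```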

Actual results:
After every normal reboot, resync occurs: 

---
kernel: md/raid1:md0: active with 2 out of 2 mirrors
kernel: md/raid1:md1: not clean -- starting background reconstruction
kernel: md/raid1:md1: active with 2 out of 2 mirrors
---

md0 : active raid1 sdc1[0] sdd1[1]
      104790016 blocks super 1.2 [2/2] [UU]
      [=========>...........]  resync = 46.8% (49130560/104790016) finish=7.1min speed=130229K/sec

Expected results:
The RAID array is clean after a normal reboot

Additional info:

Comment 1 Jon Magrini 2021-02-22 19:20:41 UTC

During normal shutdown, the system is solely relying on the safe mode delay timer to mark the device in sync.

The default minimum safe mode delay is 201 milliseconds.  If the filesystem is unmounted manually first, the timer fires and forces md to update its metadata before a later reboot request can stop iscsi.  If the unmount is left to systemd, however, the final writes from unmounting the filesystem leave a race window: systemd can stop iscsi so soon after the unmount that the timer only fires once the iscsi devices are no longer usable.
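
As a rough aid for inspecting this behaviour, the md sysfs attributes safe_mode_delay and array_state can be read per array. The helper names below are hypothetical, and SYSFS_ROOT exists only so the functions can be pointed at a fake directory tree; this is a sketch, not something used in this bug.

```shell
#!/bin/sh
# SYSFS_ROOT is parameterised only so the helpers can be tried against a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the safe mode delay (in seconds) of an md array, e.g. "md0".
md_safe_mode_delay() {
    cat "$SYSFS_ROOT/block/$1/md/safe_mode_delay"
}

# Print the array state as reported by the kernel (e.g. "clean" or "active").
md_array_state() {
    cat "$SYSFS_ROOT/block/$1/md/array_state"
}

# Succeed only when the array currently reports itself as clean.
md_is_clean() {
    [ "$(md_array_state "$1")" = "clean" ]
}
```

For example, `md_safe_mode_delay md0` prints the current delay, and `md_is_clean md0 || echo "md0 not yet marked clean"` shows whether the superblock has been updated. mdadm also documents a --wait-clean option that asks md to mark an array clean as soon as possible; whether running it before shutdown closes the race described here is untested.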

The logs below show md reporting that it could not update its metadata a little over 200 ms after the filesystem was unmounted, by which time the iscsi logout had already started.


[  126.523984] XFS (dm-4): Unmounting Filesystem 
...
[  126.535742] systemd[1]: Unmounted /data.
...
[  126.550247] systemd[6813]: iscsi-shutdown.service: Executing: /usr/sbin/iscsiadm -m node --logoutall=all
[  126.560603] systemd[1]: Got cgroup empty notification for: /system.slice/data.mount
[  126.573028] sd 7:0:0:0: [sda] Synchronizing SCSI cache
[  126.580379] sd 7:0:0:1: [sdb] Synchronizing SCSI cache
...
[  126.761855] md: super_written gets error=10   


This looks like the safe mode delay timer fired, but too late for the I/Os that mark the disks in sync to succeed.  Nothing in the systemd units appears to force the update any sooner.
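
The timing gap can be checked directly from the log timestamps. The following is a small sketch (the function name dmesg_delta_ms is hypothetical) that reads dmesg-style text on stdin and prints the number of milliseconds between the first line containing one string and the next line containing another.

```shell
#!/bin/sh
# Print the milliseconds between the first line containing $1 and the
# first later line containing $2, reading "[ seconds.micros] text" lines.
dmesg_delta_ms() {
    awk -v from="$1" -v to="$2" '
        # Extract the numeric timestamp between "[" and "]".
        function ts(line,   s) {
            s = line
            sub(/^\[ */, "", s)
            sub(/\].*/, "", s)
            return s + 0
        }
        !seen && index($0, from) { t0 = ts($0); seen = 1; next }
        seen && index($0, to)    { printf "%.0f\n", (ts($0) - t0) * 1000; exit }
    '
}
```

Fed the two log lines quoted above, `dmesg_delta_ms 'Unmounted /data' 'super_written gets error'` prints 226, which matches the "a little over 200 ms" observation.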

Comment 2 Chris Leech 2021-02-23 19:33:19 UTC

We have specifically logged out of iSCSI sessions at shutdown (unless they're used for the root filesystem) because of storage arrays that don't like having resources tied up in iSCSI session and TCP connection state if we just drop the connection. I don't think that's going to change.

In general, it's probably best to manage RAID on the storage target side and expose the set as a single volume over iSCSI.

As a workaround, I suppose you could edit the iscsi-shutdown.service unit file to manually stop the md RAID set before logging out of the iSCSI sessions.  This seems to work for me.

# systemctl edit --full iscsi-shutdown.service
(I could not get multiple ExecStop ordering right with override files, hence the --full)

Add an ExecStop line for mdadm before the existing iscsiadm line:

  [Service]
  ...
  ExecStop=-/usr/sbin/mdadm --stop --scan
  ExecStop=-/usr/sbin/iscsiadm -m node --logoutall=all

This will write a modified service file to /etc/systemd/system/iscsi-shutdown.service

It's probably also possible to do this with a new service file that orders itself After=iscsi-shutdown.service (units are stopped in the reverse of their start order, so a unit ordered After is stopped first at shutdown, with the real work being done in ExecStop).
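
A minimal sketch of such a separate unit, written out by a small shell helper, might look like the following. The unit name md-stop-before-iscsi.service and the helper name are hypothetical, and the Before=remote-fs-pre.target line is an untested assumption intended to make the unit stop only after _netdev mounts have been unmounted.

```shell
#!/bin/sh
# Write a hypothetical md-stop-before-iscsi.service into the directory $1.
write_md_stop_unit() {
    cat > "$1/md-stop-before-iscsi.service" <<'EOF'
[Unit]
Description=Stop md arrays before iSCSI logout at shutdown (hypothetical example)
# Started after iscsi-shutdown.service, therefore stopped before it.
After=iscsi-shutdown.service
# Assumption: started before remote mounts, therefore stopped after them.
Before=remote-fs-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/true
ExecStop=-/usr/sbin/mdadm --stop --scan

[Install]
WantedBy=multi-user.target
EOF
}
```

It could be installed with `write_md_stop_unit /etc/systemd/system`, followed by `systemctl daemon-reload` and `systemctl enable --now md-stop-before-iscsi.service`. This has not been tested against the setup in this bug.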

Comment 5 Nigel Croxon 2021-04-14 17:32:43 UTC

Hello Jon,

Could you give this .service file a test?

/usr/lib/systemd/system/mdadm-clean-shutdown.service 

[Unit]
Description=Wait for a update to clean the SB and bitmap before shutdown
DefaultDependencies=no
Requires=local-fs.target
Before=iscsi-shutdown.service
After=unmount.target

[Service]
Type=oneshot
ExecStart=BINDIR/mdadm --stop --scan

Comment 6 Nigel Croxon 2021-04-21 17:57:52 UTC

Jon, are the initiator and the target on the same machine?

Comment 7 Jon Magrini 2021-04-22 12:25:42 UTC

(In reply to Nigel Croxon from comment #6)
> Jon, are the initiator and the target on the same machine?

The initiator and target are not the same system. The target is an HPE NAS appliance. I will try and test the unit file.

Comment 9 Nigel Croxon 2021-04-23 16:43:43 UTC

When I have the MD device (/dev/md0) listed in /etc/fstab to mount automatically, the boot hangs with:
A start job is running for dev-md0.device (xxx min / yyy min).
It eventually times out and falls into emergency mode.
If I edit /etc/fstab and remove the md0 reference, the boot continues.

I think there is a boot-time ordering issue.

Comment 10 Nigel Croxon 2021-04-23 16:52:14 UTC

Putting _netdev as a mount option in /etc/fstab resolved the boot issue described in the above comment.
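
A quick way to spot entries that lack the option is sketched below. The function name fstab_missing_netdev is hypothetical; it takes the fstab path and a device prefix, and prints the mountpoint of each matching entry whose options do not include _netdev.

```shell
#!/bin/sh
# $1 = path to an fstab file, $2 = device prefix to match (e.g. /dev/md).
# Prints the mountpoint of each matching entry that lacks _netdev.
fstab_missing_netdev() {
    awk -v prefix="$2" '
        /^[ \t]*#/ || NF < 4 { next }          # skip comments and short lines
        index($1, prefix) == 1 {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "_netdev") found = 1
            if (!found) print $2
        }
    ' "$1"
}
```

For example, `fstab_missing_netdev /etc/fstab /dev/md` lists md-backed mountpoints that systemd would treat as local filesystems.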

What does your /etc/fstab look like?

Comment 11 Jon Magrini 2021-04-23 17:39:16 UTC

The fstab entry is as follows; adding x-systemd.after=iscsi.service addressed a few random shutdown issues.
---
/dev/storeasy/veeamrepo /srv                    xfs     _netdev,x-systemd.after=iscsi.service   0 0

Comment 12 Nigel Croxon 2021-04-27 17:25:34 UTC

I'm still unable to reproduce the issue as you have reported.

Comment 13 Nigel Croxon 2021-04-27 17:53:18 UTC

OK, maybe I spoke too soon:
# dmesg |grep md0
[    4.466948] systemd[1]: dev-md0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
[    4.467690] systemd[1]: dev-md0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
[    8.790559] md/raid1:md0: not clean -- starting background reconstruction
[    8.791167] md/raid1:md0: active with 3 out of 3 mirrors
[    8.791756] md0: detected capacity change from 0 to 103809024
[    8.809653] md: resync of RAID array md0
[    9.037098] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[   12.732727] md: md0: resync done.
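
To compare boots quickly, a check like the following could be run after each reboot. The function name is hypothetical and it only matches the raid1 "not clean" message seen in these logs; it is a sketch, not a general md health check.

```shell
#!/bin/sh
# $1 = md array name (e.g. md0); reads kernel log text on stdin.
# Succeeds if the raid1 array was assembled unclean and started a resync.
md_assembled_unclean() {
    grep -qF "md/raid1:$1: not clean -- starting background reconstruction"
}
```

Usage: `dmesg | md_assembled_unclean md0 && echo "md0 was not clean at assembly"`.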

Comment 14 Nigel Croxon 2021-04-27 19:57:31 UTC

I'm getting consistent results (clean on entry) when I don't have your addition to fstab:

[root@virt2 ~]# cat /etc/fstab 

/dev/mapper/rhel-root   /                       xfs     defaults        0 0
/dev/mapper/rhel-swap   none                    swap    defaults        0 0
/dev/md0 /mdtest                                ext4    _netdev         0 0


[root@virt2 ~]# dmesg |grep md0
[    4.533658] systemd[1]: dev-md0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
[    4.534402] systemd[1]: dev-md0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
[    9.099737] md/raid1:md0: active with 1 out of 3 mirrors
[    9.100373] md0: detected capacity change from 0 to 103809024
[    9.247297] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)



[root@virt2 ~]# cat /usr/lib/systemd/system/mdadm-clean-shutdown.service 
[Unit]
Description=Wait for a update to clean the SB and bitmap before shutdown
DefaultDependencies=no
Requires=local-fs.target
Before=iscsi-shutdown.service
After=unmount.target

[Service]
Type=oneshot
ExecStart=/sbin/mdadm --stop --scan

Comment 15 Jon Magrini 2021-06-15 15:29:39 UTC

Nigel,

Your comment #14 is still using the modified shutdown service unit, correct? I will retest, and I will also ask the customer to remove the fstab addition, install the mdadm-clean-shutdown.service unit, and provide results.

Thanks.

Comment 18 Nigel Croxon 2021-07-19 19:13:38 UTC

As there is a working solution to this problem, I am closing this bz.
