Bug 2166237

Summary: systemd unmounts some file systems during boot because device is seen as "dead" for a short time
Product: Red Hat Enterprise Linux 9
Reporter: Renaud Métrich <rmetrich>
Component: systemd
Assignee: systemd-maint
Status: CLOSED ERRATA
QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium
Priority: medium
Version: 9.1
CC: anish.bhatt, dtardon, jamacku, systemd-maint-list, zconnor
Target Milestone: rc
Keywords: TestOnly, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: systemd-252-3.el9
Last Closed: 2023-05-09 08:22:21 UTC
Type: Bug
Bug Depends On: 2138081    
Bug Blocks:    

Description Renaud Métrich 2023-02-01 09:22:33 UTC
Description of problem:

A customer reports that some of their file systems (/boot, /home) are repeatedly mounted at boot and then automatically unmounted shortly afterwards.

Troubleshooting showed that the corresponding devices were seen as "dead" for a short time, causing the systemd logic to unmount the file systems:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
$ egrep -w "(md124|Switching|/home)" sos_commands/logs/journalctl_--no-pager_--catalog_--boot
[...]
Jan 27 22:26:18 ... systemd[1]: dev-md124.device: Changed dead -> plugged
Jan 27 22:26:18 ... systemd[1]: sys-devices-virtual-block-md124.device: Changed dead -> plugged
Jan 27 22:26:19 ... systemd[1]: sys-devices-virtual-block-md124.device: Changed dead -> plugged
Jan 27 22:26:19 ... systemd[1]: dev-md124.device: Changed dead -> plugged
[...]
Jan 27 22:26:19 ... systemctl[1208]: Switching root - root: /sysroot; init: n/a
Jan 27 22:26:19 ... systemd[1]: Switching root.
[...]
Jan 27 22:26:21 ... systemd-tmpfiles[1254]: Entry "/home" does not match any include prefix, skipping.
Jan 27 22:26:22 ... systemd[1]: home.mount: About to execute /usr/bin/mount /dev/disk/by-label/role /home -t xfs
Jan 27 22:26:22 ... systemd[1]: Mounting /home...
Jan 27 22:26:22 ... systemd[1255]: home.mount: Executing: /usr/bin/mount /dev/disk/by-label/role /home -t xfs
Jan 27 22:26:22 ... kernel: XFS (md124): Mounting V5 Filesystem
Jan 27 22:26:22 ... kernel: XFS (md124): Ending clean mount
[...]
Jan 27 22:26:25 ... systemd[1]: Unit blockdev has alias blockdev@.target.
Jan 27 22:26:25 ... systemd[1]: sys-devices-virtual-block-md124.device: Changed dead -> plugged
Jan 27 22:26:25 ... systemd[1]: dev-md124.device: Changed dead -> plugged
Jan 27 22:26:25 ... systemd[1]: sys-devices-virtual-block-md124.device: Changed plugged -> dead
Jan 27 22:26:25 ... systemd[1]: dev-md124.device: Changed plugged -> dead

---> HERE IT'S DEAD, CAUSING THE UNMOUNT TO HAPPEN

Jan 27 22:26:25 ... systemd[1]: sys-devices-virtual-block-md124.device: Collecting.
Jan 27 22:26:25 ... systemd[1]: dev-md124.device: Changed dead -> plugged
Jan 27 22:26:25 ... systemd[1]: sys-devices-virtual-block-md124.device: Changed dead -> plugged
Jan 27 22:26:25 ... systemd[1]: home.mount: About to execute /usr/bin/umount /home -c
Jan 27 22:26:25 ... systemd[1]: Unmounting /home...
Jan 27 22:26:25 ... systemd[1702]: home.mount: Executing: /usr/bin/umount /home -c
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
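The pattern in the excerpt above (a device unit dropping to "dead", followed by systemd running umount) can be searched for mechanically. The following is only a minimal sketch: the function name find_dead_then_umount is hypothetical, and it assumes the journal contains the same state-change messages as the excerpt, which requires the same logging verbosity as on the customer system.

```shell
#!/bin/sh
# Hypothetical helper: flag "plugged -> dead" device transitions and any
# umount that systemd starts afterwards. Reads journal text on stdin.
find_dead_then_umount() {
    awk '
        # A device unit dropped back to "dead": print it and remember it.
        /\.device: Changed plugged -> dead/ {
            print "DEAD    " $0
            seen_dead = 1
            next
        }
        # A mount unit running umount after such a transition was seen.
        seen_dead && /\.mount: About to execute .*umount / {
            print "UMOUNT  " $0
        }
    '
}
```

It could be fed from a saved journal file or from a live system, for example with `journalctl --no-pager --boot | find_dead_then_umount`. The sketch only correlates by ordering, not by device, so an UMOUNT line is a hint to look closer rather than proof of this bug.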

In some cases we even see a partition "going away", which is very unlikely to have actually happened:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jan 27 22:26:25 ... systemd[1]: Invoking unit catchup() handlers…
Jan 27 22:26:25 ... systemd[1]: dev-disk-by\x2did-ata\x2dDISK_ID\x2dpart7.device: Changed plugged -> dead
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We found out that this issue was known Upstream (https://github.com/systemd/systemd/issues/26254) and was fixed by 3 commits:

75d7b59
cf1ac0c
4fc69e8

Version-Release number of selected component (if applicable):

systemd-250-12.el9_1.x86_64

How reproducible:

Regularly on customer systems when deploying nodes from a template; we could not reproduce it internally.

Additional info:

I delivered a test package with these 3 fixes to the customer, and the customer confirmed that it solved the issue.

Comment 2 David Tardon 2023-02-01 15:03:07 UTC
> We found out that this issue was known Upstream
> (https://github.com/systemd/systemd/issues/26254) and was fixed by 3 commits:
> 
> 75d7b59
> cf1ac0c
> 4fc69e8

These commits are part of either systemd 251 or 252, so this bug should already be fixed by the rebase to 252.
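To check whether a given host already carries the rebased package, something along these lines might help. It is a rough sketch: ver_ge is a hypothetical helper, the threshold 252-3.el9 is taken from the "Fixed In Version" field of this bug, and sort -V (GNU coreutils) is assumed as a stand-in for RPM's own version comparison, which it only approximates.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version-release $1 sorts at or after $2.
# sort -V (GNU coreutils) only approximates RPM's own version comparison.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed="252-3.el9"   # "Fixed In Version" from this bug

# Only query the RPM database when rpm is available on this host.
if command -v rpm >/dev/null 2>&1; then
    if installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' systemd 2>/dev/null); then
        if ver_ge "$installed" "$fixed"; then
            echo "systemd $installed is at or above $fixed"
        else
            echo "systemd $installed predates $fixed"
        fi
    else
        echo "systemd package not found in the RPM database"
    fi
fi
```

With the version from the description, `ver_ge 250-12.el9_1 252-3.el9` fails, so that host would be reported as predating the fix.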

Comment 3 Anish Bhatt 2023-02-01 19:10:34 UTC
David, would that be expected as part of RHEL 9.2 or earlier/later ?

Comment 4 David Tardon 2023-02-02 07:15:00 UTC
(In reply to Anish Bhatt from comment #3)
> David, would that be expected as part of RHEL 9.2 or earlier/later ?

9.2

Comment 14 errata-xmlrpc 2023-05-09 08:22:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2531