Description of problem: Performing "mount <mountpoint>" succeeds with rc 0, but systemd automatically unmounts the filesystem immediately: systemd[1]: Unit mnt.mount is bound to inactive unit dev-rhel_vm\x2drhel74-lv1.device. Stopping, too. systemd[1]: Unmounting /mnt... kernel: XFS (dm-3): Unmounting Filesystem systemd[1]: Unmounted /mnt. This happens when a mount point (e.g. /mnt) is reused after a device has been deleted. Typical use case: an admin wants to move a logical volume from one VG to another VG. He will do: 1. create the LV (lv2) on the new VG 2. mount lv2 on /tmplv2 temporarily 3. copy the old LV (lv1) mounted on /mylv to /tmplv2 4. umount /tmplv2 /mylv 5. update /etc/fstab (changing lv1 into lv2) 6. mount /mylv At step 6, the /mylv filesystem gets unmounted automatically, until "systemctl daemon-reload" is executed. Version-Release number of selected component (if applicable): systemd-219-42.el7_4.1.x86_64 How reproducible: Always Steps to Reproduce: 1. Create 2 LVs and format them lvcreate -n lv1 -L 1G vg lvcreate -n lv2 -L 1G vg mkfs.xfs /dev/vg/lv1 mkfs.xfs /dev/vg/lv2 2. Have lv1 be in /etc/fstab and mount it, then unmount it # grep lv1 /etc/fstab /dev/vg/lv1 /mnt xfs defaults 0 0 # mount /mnt # umount /mnt 3. Edit /etc/fstab to use lv2 instead of lv1 # grep lv2 /etc/fstab /dev/vg/lv2 /mnt xfs defaults 0 0 4. Delete lv1 # lvremove /dev/vg/lv1 5. Mount lv2 # mount /mnt; echo $? 0 Actual results: /mnt gets automatically unmounted by systemd. Journal shows: systemd[1]: Unit mnt.mount is bound to inactive unit dev-vg-lv1.device. Stopping, too. systemd[1]: Unmounting /mnt... systemd[1]: Unmounted /mnt. Expected results: No unmount Additional info: This seems to be due to some cache within systemd: even though /mnt has been unmounted, the unit still exists in systemd: # systemctl --all | grep mnt.mount mnt.mount loaded inactive dead /mnt I would expect the unit to be destroyed upon umount, but it is not, presumably because it has been added to /etc/fstab.
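A quick way to confirm the stale dependency before mounting again is to query the unit directly (a sketch; the unit name mnt.mount matches the reproducer above, adjust for other mount points):
# systemctl show mnt.mount -p What -p BindsTo
If BindsTo= still references the removed lv1 device while What= (or /etc/fstab) already points at lv2, the next mount of /mnt will be stopped again by systemd until "systemctl daemon-reload" is run.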
I just got bitten by this as well. Had an iSCSI volume mounted, needed to use different storage, transferred files, unmounted iSCSI and mounted the new storage at the same mount point. So far so good. But then when I removed the iSCSI target volume and logged out of it systemd "failed" the mount and unmounted the NEW storage. And kept unmounting it as soon as I tried to mount it again. Worked around it by commenting the mount in fstab, running "systemctl daemon-reload" and "systemctl reset-failed", un-commenting the mount from fstab and running "systemctl daemon-reload" again (and then re-mounting, of course).
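For anyone needing the same workaround, the sequence boils down to the following (a sketch; /mnt and mnt.mount stand in for the real mount point and unit name):
# vi /etc/fstab                      <- comment out the /mnt entry
# systemctl daemon-reload
# systemctl reset-failed mnt.mount
# vi /etc/fstab                      <- un-comment the /mnt entry again
# systemctl daemon-reload
# mount /mnt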
I just discovered a 3-day data loss because of this bug. Last Friday, I finished a migration of a filesystem which backs a central log server. Then, I removed the old logical volume and did not realize that systemd had unmounted the migrated filesystem and stopped the "rsyslog" daemon (*). ---------------- (*) Note: actually, "rsyslog" stopped because of a custom unit parameter: [Unit] Requires=dev-mapper-volg-lvol.device After=dev-mapper-volg-lvol.device
Created attachment 1413329 [details] Bash script for bug reproduction The attached shell script allows reproduction of the bug. After a data migration simulation, systemd enters a loop that tries to unmount a busy filesystem.
I potentially ran into this issue and have logged a support ticket with Red Hat (CASE 02093462). In my situation, the SysAdmin incorrectly added an entry into /etc/fstab with the wrong logical volume name. They manually mounted the file system successfully. Later, after a reboot, the file system would not mount because systemd associated the mount unit to the inactive logical volume. The systemctl daemon-reload command was executed and the file system mounted properly.
Hi, all, We also have a systemd-related problem in RHEL-7.4 testing. It looks similar to this one. Can you provide a fixed systemd installation package? We want to upgrade and test it. Thanks. Xiaojun.
(In reply to Tan Xiaojun from comment #6) > Hi, all, > > We also have a systemd-related problem in RHEL-7.4 testing. It looks > similar to this one. Can you provide a fixed systemd installation package? We > want to upgrade and test it. > > Thanks. > Xiaojun. Oh, BTW, we need the AArch64 (arm64) version of the package. Thanks a lot. Xiaojun.
There is no fix yet. The workaround is to run "systemctl daemon-reload" after modifying the mount point's entry in /etc/fstab.
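In other words, something like this (a sketch, with /mnt as an example mount point):
# vi /etc/fstab          <- change the device for /mnt
# systemctl daemon-reload
# mount /mnt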
This behavior is unexpected, unnecessary, and frankly dangerous. It is unexpected because ever since the existence of an /etc/fstab file, it was only consulted under two conditions: 1) During boot, it would be read to see what FSes need to be mounted, and 2) the "mount" command would read it if the parameters given to it required that (for instance "mount -a" or "mount /some/mount/point"). These really are the only two times I can think of off the top of my head that this file has ever traditionally been consulted. In fact, since boot generally has always just done a "mount -a", these could be considered one, but I digress. This file was never baby-sat to make sure that the things listed in it are always mounted, or to automatically re-mount them when they are unmounted externally. Certainly at no time has it ever been considered normal for the _former_ contents of this file to be consulted and enforced by the system (which in essence is what happens here). It's unnecessary simply because it has never been anyone's expectation that this file (or some idea of what might be in it) be enforced by the system. It's dangerous because people are just not expecting this; they don't realize they have to change their behavior, and that is reasonable, because the file looks the same as it always has. Here's a "for instance" for you: Let's say someone has a swap device that they feel should be bigger. They add some storage, make that storage into a new swap device, update the fstab, start swapping to the new device, and stop swapping on the old one. They figure they're done: They're swapping to the new one now, and they figure that the next time the machine boots, it'll pick up the change in /etc/fstab, because this is how that has always worked. They go on their merry way. At some point, maybe the next day, maybe a week later, the storage admin sees the sysadmin in the hall and asks if they finished that swap exchange. As far as the sysadmin knows they're done, and says so. They talk about the weather, the game last night, complain about management and whatever else. The next time the storage admin thinks of it, they remove that old swap LUN. Neither of them thought to check if systemd decided it was smarter than everyone else and made the machine start swapping to the old device. At this point, you've just irreversibly corrupted memory. Maybe you've corrupted your database in the process.... The machine crashes and we all have a terrible time trying to figure out what the heck happened. The customer doesn't make the connection between the removed LUN and the crash (perhaps the storage admin tarried for a day or two more before removing it so the sysadmin had no idea there even was a correlation in time with any other event), so they never even mention it to us. What a nightmare for everyone involved. Well, everyone except the developers who thought this was a good idea in the first place. Since we never figure out that this is the reason we lost that customer (because we never figured out this is what happened), the programmers get off scot-free, completely oblivious and emboldened to continue making decisions like this. Well, at least until the company goes under because we keep doing things like this and all our customers just leave when someone better comes along. OK, so maybe I'm having a little fun with the story time, but I think you get the idea: This was just not a good move.
If you're going to leave a well known file hanging around in your distribution, with the same format it's always had, you really should continue to use it in the same way it's always been used. If you want to change behavior in a drastic way like this, the old well-known file needs to go away so that people don't continue to use it the way they always have and expect it to work the way it always has.
My reproducer is broken. After creating /dev/vg/lv1 entry in /etc/fstab, use "systemctl daemon-reload" so that the "mnt.mount" unit gets created. Then, because /etc/fstab has been modified without reloading systemd afterwards, the mnt.mount unit points to invalid (non-existent) device: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- # systemctl status mnt.mount ● mnt.mount - /mnt Loaded: loaded (/etc/fstab; bad; vendor preset: disabled) Active: inactive (dead) since Fri 2018-07-06 13:42:45 CEST; 4s ago Where: /mnt What: /dev/vg/lv1 Docs: man:fstab(5) man:systemd-fstab-generator(8) Process: 1521 ExecUnmount=/bin/umount /mnt (code=exited, status=0/SUCCESS) Jul 06 13:42:45 vm-systemd7-umount systemd[1]: About to execute: /bin/umount /mnt Jul 06 13:42:45 vm-systemd7-umount systemd[1]: Forked /bin/umount as 1521 Jul 06 13:42:45 vm-systemd7-umount systemd[1]: mnt.mount changed mounted -> unmounting Jul 06 13:42:45 vm-systemd7-umount systemd[1]: Unmounting /mnt... Jul 06 13:42:45 vm-systemd7-umount systemd[1521]: Executing: /bin/umount /mnt Jul 06 13:42:45 vm-systemd7-umount systemd[1]: Child 1521 belongs to mnt.mount Jul 06 13:42:45 vm-systemd7-umount systemd[1]: mnt.mount mount process exited, code=exited status=0 Jul 06 13:42:45 vm-systemd7-umount systemd[1]: mnt.mount changed unmounting -> dead Jul 06 13:42:45 vm-systemd7-umount systemd[1]: Job mnt.mount/stop finished, result=done Jul 06 13:42:45 vm-systemd7-umount systemd[1]: Unmounted /mnt. Warning: mnt.mount changed on disk. Run 'systemctl daemon-reload' to reload units. -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- See also the "Warning" which clearly indicates the issue.
I can confirm that this issue has nothing to do with LVM; I can reproduce with iSCSI (see below for the reproducer). ------------------------------------------------------------------------------------------------- Also, the issue only reproduces if the filesystem is in /etc/fstab AND there ISN'T the "noauto" flag. ------------------------------------------------------------------------------------------------- I could also reproduce on Fedora 27. I will update the GitHub issue (https://github.com/systemd/systemd/issues/8596) also, hoping Lennart will reopen it. Reproducer using iSCSI, with 2 targets, 1 LUN in each, so that a target can be deleted to mimic switching between targets (e.g. Disaster Recovery scenario). Server (RHEL7): - copy saveconfig.json to /etc/target - create disks: truncate -s 200M /tmp/disk1.img truncate -s 200M /tmp/disk2.img - restart "target" service: systemctl restart target - stop the firewall for convenience: systemctl stop firewalld Client (Fedora 27 or RHEL7): - copy initiatorname.iscsi to /etc/iscsi/ - discover the targets (XXX == iscsi server): iscsiadm -m discovery -t st -p XXX - attach the targets (replace XXX by what was found above): iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.XXX.x8664:sn.c0e14b4d5602 -l iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.XXX.x8664:sn.f6f77528fc0b -l - format the LUNs: mkfs.xfs -L DISK1 /dev/disk/by-id/scsi-3600140529f7fada8dfd43babba397b96 mkfs.xfs -L DISK2 /dev/disk/by-id/scsi-360014051f1fbab5955b4facafb2a36fc Now, run the usual scenario. For convenience, I did the following: 1. Edit /etc/fstab and add line: LABEL=DISK1 /mnt xfs defaults 0 0 2. Fake a normal boot # systemctl daemon-reload; mount /mnt 3. Unmount /mnt # umount /mnt 4. Edit /etc/fstab and change /mnt line to use second disk: LABEL=DISK2 /mnt xfs defaults 0 0 5. Detach the 1st target with LUN DISK1 (replace XXX by what was found above): # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.XXX.x8664:sn.c0e14b4d5602 -u 6. Check that "systemctl status mnt.mount" complains about reloading (don't reload of course) # systemctl status mnt.mount | grep Warning Warning: mnt.mount changed on disk. Run 'systemctl daemon-reload' to reload units. 7. Mount the new disk: # mount /mnt --> gets automatically unmounted until "systemctl daemon-reload" is performed.
Created attachment 1457011 [details] iSCSI reproducer
Let's try this from a different angle: Will someone from the systemd development team please explain what the design intent was behind this behavior? What problem was this behavior intended to solve, or what does it bring to the user which was not available to the user before? Is anyone from the systemd development team even getting any of this? If you're out there, would you at least drop us a line and let us know that you see this? I just did a quick scan and didn't see any response from anyone in development. Either I missed it, or perhaps there is something wrong with the flags on this bug which make it unseen, or perhaps, well, something else. If we get no response from development soon, I'll assume the best intentions and start trying to figure out why no one from the systemd development team are seeing this.
Hello, I have been testing this in order to understand when it pops up in more detail: [1] The type of device being mounted is of little importance. I have successfully reproduced it even with scsi devices on VMs and with loop devices (in addition to iscsi and lvm mentioned earlier). Details for the reproducers below. [2] Based on my tests, the key factor appears to be that the device described in the mount unit (in "What=") be invalid (non existing). This triggers systemd to take action. There are other secondary artifacts related to this - such as systemd automounting the filesystem as soon as the device re-appears (which, most likely, is not what the administrator expects. Auto-mounting is not necessarily a good idea and it is breaking conventions dating decades back - for example the administrator might wish to run a filesystem check on the device) [3] It is important to note that, as long as the device mentioned in "What=" in the mount systemd unit exists, systemd *will* accept other devices being mounted on the same mount point. [4] This still appears on a fully updated RHEL 7.5 with systemd-219-57.el7.x86_64. [5] Reproducers: A. With loop devices: [A.1] Create 2 disks for the loop devices: truncate -s 10g 10gdisk1 truncate -s 10g 10gdisk2 [A.2] Create the loop devices: losetup -f 10gdisk1 losetup -f 10gdisk2 Confirm they are there: losetup -a [A.3] Format them with mkfs (this is only necessary the first time the loop devices are created): mkfs.xfs /dev/loop0 mkfs.xfs /dev/loop1 [A.4] In /etc/fstab, add the entry: /dev/loop0 /mnt xfs defaults 0 0 and then reload the units: systemctl daemon-reload [A.5] Remove /dev/loop0: losetup -d /dev/loop0 [A.6] Try to mount /dev/loop1 on /mnt mount /dev/loop1 /mnt In the logs the following messages appear: Aug 13 10:46:14 fastvm-rhel-7-5-100 kernel: XFS (loop1): Mounting V5 Filesystem Aug 13 10:46:14 fastvm-rhel-7-5-100 kernel: XFS (loop1): Ending clean mount Aug 13 10:46:14 fastvm-rhel-7-5-100 systemd: Unit mnt.mount is bound to inactive unit dev-loop0.device. Stopping, too. Aug 13 10:46:14 fastvm-rhel-7-5-100 systemd: Unmounting /mnt... Aug 13 10:46:14 fastvm-rhel-7-5-100 kernel: XFS (loop1): Unmounting Filesystem Aug 13 10:46:14 fastvm-rhel-7-5-100 systemd: Unmounted /mnt. -------------------------------------------- B. With plain scsi devices: [B.1] Create a system with (at least) 2 additional scsi devices, apart from the one used for root. (e.g. sdb & sdc). This can be a VM with 2 additional disks. Format the devices with a filesystem (I've been using xfs): mkfs.xfs /dev/sdb mkfs.xfs /dev/sdc [B.2] Add the following entry in fstab and reload systemd units: /dev/sdb /mnt xfs defaults 0 0 And then: systemctl daemon-reload [B.3] Remove /dev/sdb from the system: echo 1 > /sys/block/sdb/device/delete [B.4] Mount /dev/sdc on /mnt: mount /dev/sdc /mnt The following appears in the logs: Aug 13 11:08:09 fastvm-rhel-7-5-100 kernel: XFS (sdc): Mounting V5 Filesystem Aug 13 11:08:09 fastvm-rhel-7-5-100 kernel: XFS (sdc): Ending clean mount Aug 13 11:08:09 fastvm-rhel-7-5-100 systemd: Unit mnt.mount is bound to inactive unit dev-sdb.device. Stopping, too. Aug 13 11:08:09 fastvm-rhel-7-5-100 systemd: Unmounting /mnt... Aug 13 11:08:09 fastvm-rhel-7-5-100 kernel: XFS (sdc): Unmounting Filesystem Aug 13 11:08:09 fastvm-rhel-7-5-100 systemd: Unmounted /mnt. -------------------------------------------- Additional note for B: It is interesting, that after rescanning and re-detecting /dev/sdb, it gets auto mounted. 
To be more specific, as soon as I run: echo '- - -' > /sys/class/scsi_host/host2/scan I see in the logs: Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: scsi 2:0:0:2: Direct-Access QEMU QEMU HARDDISK 1.5. PQ: 0 ANSI: 5 Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: sd 2:0:0:2: Attached scsi generic sg1 type 0 Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: sd 2:0:0:2: [sdb] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB) Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: sd 2:0:0:2: [sdb] Write Protect is off Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: sd 2:0:0:2: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: sd 2:0:0:2: [sdb] Attached SCSI disk Aug 13 11:09:49 fastvm-rhel-7-5-100 systemd: mnt.mount: Directory /mnt to mount over is not empty, mounting anyway. Aug 13 11:09:49 fastvm-rhel-7-5-100 systemd: Mounting /mnt... Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: XFS (sdb): Mounting V5 Filesystem Aug 13 11:09:49 fastvm-rhel-7-5-100 kernel: XFS (sdb): Ending clean mount Aug 13 11:09:49 fastvm-rhel-7-5-100 systemd: Mounted /mnt. Nobody issued a mount command here! And nobody would expect this to be mounted. (In the case of my test system, sdb has host:bus:target:lun: 2:0:0:2 - therefore it is on host2. The host number may need to be adjusted on other systems) Regards, Alexandros
Issue happens as soon as the mount unit is part of the dependency tree (through WantedBy or RequiredBy). Hence, to reproduce with /etc/fstab entry, "noauto" doesn't work (because there is no dep created), but "nofail" is enough (because a WantedBy dep is created). Most simplest reproducer without /etc/fstab: # truncate -s 1G /root/myloop # losetup /dev/loop1 /root/myloop # mkfs.xfs /dev/loop1 # cat /etc/systemd/system/mnt.mount [Mount] What=/dev/loop0 Where=/mnt Type=xfs Options=defaults [Install] WantedBy=multi-user.target # systemctl enable mnt.mount /dev/loop0 doesn't exist. Everytime I try to "mount /dev/loop1 /mnt", "/mnt" gets automatically unmounted. systemd backtrace when this happens: (gdb) break unit_check_binds_to Breakpoint 1 at 0x557d18317e70: file src/core/unit.c, line 1642. (gdb) cont Continuing. Breakpoint 1, unit_notify (u=0x557d19fc9790, os=UNIT_ACTIVE, ns=UNIT_ACTIVE, reload_success=<optimized out>) at src/core/unit.c:1990 1990 unit_check_binds_to(u); (gdb) bt #0 unit_notify (u=0x557d19fc9790, os=UNIT_ACTIVE, ns=UNIT_ACTIVE, reload_success=<optimized out>) at src/core/unit.c:1990 #1 0x0000557d18324a01 in device_update_found_by_name (now=true, found=DEVICE_FOUND_MOUNT, add=true, path=0x557d19efdd90 "\200\313\357\031}U", m=0x557d19efdd90) at src/core/device.c:523 #2 device_found_node (m=m@entry=0x557d19efdd90, node=node@entry=0x557d19f10440 "/dev/loop1", add=add@entry=true, found=found@entry=DEVICE_FOUND_MOUNT, now=now@entry=true) at src/core/device.c:833 #3 0x0000557d18327757 in mount_load_proc_self_mountinfo (m=m@entry=0x557d19efdd90, set_flags=set_flags@entry=true) at src/core/mount.c:1609 #4 0x0000557d18328447 in mount_dispatch_io (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x557d19efdd90) at src/core/mount.c:1756 #5 0x0000557d1830eae0 in source_dispatch (s=s@entry=0x557d19f066e0) at src/libsystemd/sd-event/sd-event.c:2115 #6 0x0000557d1830fb7a in sd_event_dispatch (e=0x557d19efe280) at src/libsystemd/sd-event/sd-event.c:2472 #7 0x0000557d1830fd1f in sd_event_run (e=<optimized out>, timeout=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2501 #8 0x0000557d18270ec3 in manager_loop (m=0x557d19efdd90) at src/core/manager.c:2212 #9 0x0000557d1826546b in main (argc=5, argv=0x7ffdc938e5f8) at src/core/main.c:1773 (gdb)
After the mount occurs, systemd discovers the mount and we fall into this piece of code: 1669 static void retroactively_start_dependencies(Unit *u) { 1670 Iterator i; 1671 Unit *other; 1672 1673 assert(u); 1674 assert(UNIT_IS_ACTIVE_OR_ACTIVATING(unit_active_state(u))); 1675 ... 1681 SET_FOREACH(other, u->dependencies[UNIT_BINDS_TO], i) 1682 if (!set_get(u->dependencies[UNIT_AFTER], other) && 1683 !UNIT_IS_ACTIVE_OR_ACTIVATING(unit_active_state(other))) 1684 manager_add_job(u->manager, JOB_START, other, JOB_REPLACE, true, NULL, NULL) ; Case of 'WantedBy=multi-user.target' as dependency: Line 1681, we have "other" == dev-loop0.device, which is the non-existent device. Hence, this causes the automatic umount. We should have got "dev-loop1.device" instead. Case of no 'WantedBy': Line 1681, we have "other" == dev-loop1.device (i.e. good device). Apparently, the u->dependencies list doesn't get updated with proper device. Below is the backtrace when the device is the expected one (dev-loop1.device): 2204 int unit_add_dependency(Unit *u, UnitDependency d, Unit *other, bool add_reference) { (gdb) p *other $4 = {manager = 0x557d19efdd90, type = UNIT_DEVICE, load_state = UNIT_LOADED, merged_into = 0x0, id = 0x557d19f099a0 "dev-loop1.device", instance = 0x0, names = 0x557d19f85e40, dependencies = { 0x0 <repeats 24 times>}, requires_mounts_for = 0x0, description = 0x557d19fab290 "/dev/loop1", documentation = 0x0, fragment_path = 0x0, source_path = 0x0, dropin_paths = 0x0, fragment_mtime = 0, source_mtime = 0, dropin_mtime = 0, job = 0x0, nop_job = 0x0, job_timeout = 90000000, job_timeout_action = EMERGENCY_ACTION_NONE, job_timeout_reboot_arg = 0x0, refs = 0x0, conditions = 0x0, asserts = 0x0, condition_timestamp = {realtime = 0, monotonic = 0}, assert_timestamp = {realtime = 0, monotonic = 0}, inactive_exit_timestamp = { realtime = 1534170428533938, monotonic = 291516132}, active_enter_timestamp = { realtime = 1534170428533938, monotonic = 291516132}, active_exit_timestamp = {realtime = 0, monotonic = 0}, inactive_enter_timestamp = {realtime = 0, monotonic = 0}, slice = {unit = 0x0, refs_next = 0x0, refs_prev = 0x0}, units_by_type_next = 0x557d19f936c0, units_by_type_prev = 0x557d19f93ce0, has_requires_mounts_for_next = 0x0, has_requires_mounts_for_prev = 0x0, load_queue_next = 0x0, load_queue_prev = 0x0, dbus_queue_next = 0x557d19f394d0, dbus_queue_prev = 0x557d19fc5010, cleanup_queue_next = 0x0, cleanup_queue_prev = 0x0, gc_queue_next = 0x0, gc_queue_prev = 0x0, cgroup_queue_next = 0x0, cgroup_queue_prev = 0x0, pids = 0x0, sigchldgen = 0, gc_marker = 8714, deserialized_job = -1, load_error = 0, unit_file_state = _UNIT_FILE_STATE_INVALID, unit_file_preset = -1, cgroup_path = 0x0, cgroup_realized_mask = 0, cgroup_subtree_mask = 0, cgroup_members_mask = 0, on_failure_job_mode = JOB_REPLACE, stop_when_unneeded = false, default_dependencies = true, refuse_manual_start = false, refuse_manual_stop = false, allow_isolate = false, ignore_on_isolate = true, ignore_on_snapshot = true, condition_result = false, assert_result = false, transient = false, in_load_queue = false, in_dbus_queue = true, in_cleanup_queue = false, in_gc_queue = false, in_cgroup_queue = false, sent_dbus_new_signal = true, no_gc = false, in_audit = false, cgroup_realized = false, cgroup_members_mask_valid = true, cgroup_subtree_mask_valid = true} (gdb) bt #0 unit_add_dependency (u=u@entry=0x557d19fc5010, d=UNIT_AFTER, other=other@entry=0x557d19f93a00, add_reference=add_reference@entry=true) at src/core/unit.c:2204 #1 0x0000557d18313a57 
in unit_add_two_dependencies (u=0x557d19fc5010, d=<optimized out>, e=UNIT_BINDS_TO, other=0x557d19f93a00, add_reference=true) at src/core/unit.c:2314 #2 0x0000557d183154c9 in unit_add_node_link (u=u@entry=0x557d19fc5010, what=<optimized out>, wants=<optimized out>, dep=UNIT_BINDS_TO) at src/core/unit.c:2869 #3 0x0000557d183271d1 in mount_add_device_links (m=0x557d19fc5010) at src/core/mount.c:348 #4 mount_add_extras (m=m@entry=0x557d19fc5010) at src/core/mount.c:521 #5 0x0000557d18328968 in mount_load (u=0x557d19fc5010) at src/core/mount.c:571 #6 0x0000557d18313db0 in unit_load (u=0x557d19fc5010) at src/core/unit.c:1209 #7 0x0000557d1826d00e in manager_dispatch_load_queue (m=m@entry=0x557d19efdd90) at src/core/manager.c:1394 #8 0x0000557d18328457 in mount_dispatch_io (source=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x557d19efdd90) at src/core/mount.c:1768 #9 0x0000557d1830eae0 in source_dispatch (s=s@entry=0x557d19f066e0) at src/libsystemd/sd-event/sd-event.c:2115 #10 0x0000557d1830fb7a in sd_event_dispatch (e=0x557d19efe280) at src/libsystemd/sd-event/sd-event.c:2472 #11 0x0000557d1830fd1f in sd_event_run (e=<optimized out>, timeout=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2501 #12 0x0000557d18270ec3 in manager_loop (m=0x557d19efdd90) at src/core/manager.c:2212 #13 0x0000557d1826546b in main (argc=5, argv=0x7ffdc938e5f8) at src/core/main.c:1773 (gdb)
The difference comes from mount_setup_unit(). When mnt.mount is not known to systemd (not in the dep tree), a new unit is created with the right device, dev-loop1.device. When mnt.mount is known to systemd due to being in the dep tree, the unit is reused, causing the wrong device, dev-loop0.device, to be used in the BindsTo dependency.
Summary of the findings: At the time of the mount, when a mount unit already exists for systemd (typically because systemd-fstab-generator generated it and linked it to other units, such as remote-fs.target for a remote mount), the mount unit is reused, and nothing is changed except the "What" property (which is read from /proc/self/mounts). This leads to an inconsistency between configured dependencies and expected ones: # systemctl show mnt.mount | grep loop What=/dev/loop1 BindsTo=dev-loop0.device WantedBy=multi-user.target dev-loop0.device After=local-fs-pre.target dev-loop0.device systemd-journald.socket system.slice -.mount RequiresMountsFor=/ /dev/loop0 When the mount unit doesn't exist yet, then there is no inconsistency of course, since a new unit is built. The design issue is that a mount point name is considered unique, which explains why the mount unit for a given mount point is built as "<mountpoint>.mount". What is unique is the tuple "<device> + <mountpoint>" instead. If systemd used that as the name instead, then there would be no issue at all. Another scenario where the issue can occur (and that's even freakier!): - Assume /mnt is defined in /etc/fstab with device /dev/loop0 and that device exists. - Unmount /mnt - Mount /dev/loop1 as /mnt instead: mount /dev/loop1 /mnt. - Delete /dev/loop0: losetup -d /dev/loop0 Result: /mnt gets unmounted! (since there is the BindsTo=dev-loop0.device property). I can reproduce with Fedora 28 as well; now preparing to re-open the issue upstream...
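To illustrate the naming: the unit name is derived from the mount point path alone, which can be checked with systemd-escape (shipped with systemd); a sketch for /mnt:
# systemd-escape -p --suffix=mount /mnt
mnt.mount
So two different devices successively mounted on /mnt necessarily map to the very same mnt.mount unit, and the BindsTo= recorded for the first device sticks until a reload.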
I am wondering why this bug report was tagged as "confidential" ( https://imgur.com/a/yHKWGPj ), because: # I was able to subscribe to notifications from this bug report after I had found it using Google Search. Therefore, this bug was originally public. # If this bug is hidden, neither systemd developers nor the Linux community will be able to retrieve additional information which was not forwarded to systemd's bug tracker ( https://github.com/systemd/systemd/issues/8596 and https://github.com/systemd/systemd/issues/9869 ). # I believe this bug is not related to information security, as I do not see any security-related information. A systemd developer wrote me the following comment: "I am sorry, but if you have issues with LVM2, please first contact your downstreams and the LVM community.". I have not contacted the LVM community because I do not believe it is needed; however, how will I prove that previous contact with the distribution maintainers has already been made if this bug report is hidden? Unless there is a valid reason which I have not seen yet, I request that Red Hat make this bug report public again.
I'm also seeing this bug. We have a process where we remove a test filesystem and its volume group, take a snapshot of a production filesystem at the storage array level, and then mount it back to where the original test filesystem was mounted. Basically, just re-using the same mount point with a new device. The only way I can get it to work on a reliable basis is to run "systemctl daemon-reload" before the mount. I don't really understand why there are no updates to this bug, which seems to be easily reproduced.
No solution is known as of now, see https://github.com/systemd/systemd/issues/9869.
Hello, There is one additional artifact resulting from this behaviour of systemd: If the filesystem is in use and systemd cannot unmount it, then systemd gets in an endless loop flooding the logs with messages due to its failure to unmount a busy filesystem. On a test VM, I've got in about 2 minutes 20k lines from systemd (and umount) for these failures. A simple way to reproduce this is: [1] Create the devices (for this example loop devices - but this is just because they can be deleted easily. Any block device will do): truncate -s 10G /tmp/bigfile1 truncate -s 10G /tmp/bigfile2 losetup -f /tmp/bigfile1 losetup -f /tmp/bigfile2 losetup -a /dev/loop0: [64768]:4194370 (/tmp/bigfile1) /dev/loop1: [64768]:4674228 (/tmp/bigfile2) mkfs.xfs /dev/loop0 mkfs.xfs /dev/loop1 mkdir /test [2] Add in /etc/fstab an entry to mount /dev/loop0 on /test, e.g.: /dev/loop0 /test xfs defaults 0 0 [3] Reload systemd units and mount everything from fstab: systemctl daemon-reload mount -a grep test /proc/mounts /dev/loop0 /test xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0 [4] Unmount /dev/loop0 and mount /dev/loop1 on /test umount /dev/loop0 mount /dev/loop1 /test grep test /proc/mounts /dev/loop1 /test xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0 [5] Keep /test busy (e.g. cd /test) and monitor /var/log/messages with tail -f [6] Delete /dev/loop0: losetup -d /dev/loop0 Result: continuous messages from umount and systemd about /test being busy. Regards, Alexandros
Hi, (I also put this update on the upstream project: https://github.com/systemd/systemd/issues/9869). We are seeing this exact behaviour also on our RHEL7 servers. But we are having another use case. This problem is triggered when the backup client starts to backup and when the puppet client (and it's restarting services) run on the same time. We use R1Soft backup software which make a snapshot on a special partition (/dev/hcp1) and when the client is done than it unmounts that partition (check 13:39:10 below), systemd is unmounting the /var partition and renders the server useless. IMHO this is very bad behaviour and for us the only workaround is to stop the backupservers before we roll new updates/restarts (using Puppet) out to our systems Feb 5 13:37:10 xxxxxxxx kernel: hcp: INFO: dec_device_user: 2,9942,cdp-2-6 Feb 5 13:37:10 xxxxxxxx kernel: hcp: INFO: stopping hcp session hcp1. Feb 5 13:37:10 xxxxxxxx kernel: hcp: INFO: hcp session hcp1 stopped. Feb 5 13:37:10 xxxxxxxx systemd: Stopping MariaDB database server... Feb 5 13:37:10 xxxxxxxx systemd: Stopping Avahi mDNS/DNS-SD Stack... Feb 5 13:37:10 xxxxxxxx systemd: Stopping NTP client/server... Feb 5 13:37:10 xxxxxxxx systemd: Stopped target Local File Systems. Feb 5 13:37:10 xxxxxxxx systemd: Stopping Load/Save Random Seed... Feb 5 13:37:10 xxxxxxxx systemd: Stopped This is the timer to set the schedule for automated renewals. Feb 5 13:37:10 xxxxxxxx systemd: Stopping The nginx HTTP and reverse proxy server... Feb 5 13:37:10 xxxxxxxx systemd: Stopping Update UTMP about System Boot/Shutdown... Feb 5 13:37:10 xxxxxxxx systemd: Stopped Flush Journal to Persistent Storage. Feb 5 13:37:10 xxxxxxxx chronyd[2938]: chronyd exiting Feb 5 13:37:10 xxxxxxxx systemd: Stopped Flexible branding. Feb 5 13:37:10 xxxxxxxx systemd: Stopping The PHP FastCGI Process Manager... Feb 5 13:37:10 xxxxxxxx systemd: Stopped The nginx HTTP and reverse proxy server. Feb 5 13:37:10 xxxxxxxx systemd: Stopped Update UTMP about System Boot/Shutdown. Feb 5 13:37:10 xxxxxxxx systemd: Stopped The PHP FastCGI Process Manager. Feb 5 13:37:10 xxxxxxxx systemd: Stopped Load/Save Random Seed. Feb 5 13:37:10 xxxxxxxx avahi-daemon[2887]: Got SIGTERM, quitting. Feb 5 13:37:10 xxxxxxxx avahi-daemon[2887]: Leaving mDNS multicast group on interface eth0.IPv4 with address ip.ip.ip.ip. Feb 5 13:37:10 xxxxxxxx avahi-daemon[2887]: avahi-daemon 0.6.31 exiting. Feb 5 13:37:10 xxxxxxxx systemd: Stopped NTP client/server. Feb 5 13:37:10 xxxxxxxx systemd: Stopped Avahi mDNS/DNS-SD Stack. Feb 5 13:37:10 xxxxxxxx systemd: Closed Avahi mDNS/DNS-SD Stack Activation Socket. Feb 5 13:37:11 xxxxxxxx systemd: Stopped The NGINX part of SMT. Feb 5 13:37:15 xxxxxxxx systemd: Stopped MariaDB database server. Feb 5 13:37:15 xxxxxxxx systemd: Unmounting /var... Feb 5 13:37:15 xxxxxxxx umount: umount: /var: target is busy. Feb 5 13:37:15 xxxxxxxx umount: (In some cases useful info about processes that use Feb 5 13:37:15 xxxxxxxx umount: the device is found by lsof(8) or fuser(1)) Feb 5 13:37:15 xxxxxxxx systemd: var.mount mount process exited, code=exited status=32 Feb 5 13:37:15 xxxxxxxx systemd: Failed unmounting /var. Feb 5 13:37:15 xxxxxxxx systemd: Unit var.mount is bound to inactive unit dev-disk-by\x2duuid-7ec6b55c\x2d5923\x2d4dd2\x2db8aa\x2da821e84f71ee.device. Stopping, too. Feb 5 13:37:15 xxxxxxxx systemd: Unmounting /var... Feb 5 13:37:15 xxxxxxxx umount: umount: /var: target is busy. 
Feb 5 13:37:15 xxxxxxxx umount: (In some cases useful info about processes that use Feb 5 13:37:15 xxxxxxxx umount: the device is found by lsof(8) or fuser(1))
This is not fixed in upstream, so it won't be in rhel-7.7.
This one is hurting us pretty badly :-( I have filed a support ticket, so I hope this speeds things up.
Hello! Do you have any news about this? This bug is really critical when you have multiple partitions. You need to push a fix to the repos ASAP! Best regards, Quentin
Below is yet another scenario where that mighty behaviour occurs: 1. Set up a system with "/var" on EXT4 filesystem (can be another file system of course) By default, Anaconda creates /etc/fstab entries using "UUID=<uuid>" devices 2. Record the UUID for "/var", say "87fda01b-eeaa-4702-806d-ca693f55d6ad" 3. Add a new disk (e.g. "/dev/vdb") to the system and create a file system with same UUID as "/var": "87fda01b-eeaa-4702-806d-ca693f55d6ad" NOTE: in the real world, this is legit in case you backup a partition "byte by byte" for example. Another example is using the third party backup software https://www.r1soft.com which creates a temporary device "/dev/hcp1" with exact same UUID as partition being backed up. 4. At this point "/dev/disk/by-uuid/87fda01b-eeaa-4702-806d-ca693f55d6ad" will point to the new disk (e.g. "/dev/vdb") due to udev rule running 5. Reload systemd 6. Unplug the "/dev/vdb" disk NOTE: in the real world, this is legit in case you remove the backup disk (for example if it's a USB device). Another example is finishing the backup when using third party backup software https://www.r1soft.com which deletes the temporary device "/dev/hcp1" 7. Udev updates the "/dev/disk/by-uuid/87fda01b-eeaa-4702-806d-ca693f55d6ad" link to point back to initial partition hosting "/var" 8. Systemd tries unmounting "/var" in loop (because this partition is busy in our case), **breaking** the system The only workaround to this new reproducer is to mount devices by device path, instead of LABEL or UUID, but device paths may change ... I believe this now needs to be seriously taken into account Upstream, this is a critical issue.
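For step 3, a minimal way to create the duplicate UUID on the new disk is (a sketch matching the EXT4 setup described above; mkfs.ext4 accepts a fixed UUID via -U):
# mkfs.ext4 -U 87fda01b-eeaa-4702-806d-ca693f55d6ad /dev/vdb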
There is still no solution in upstream, this bug will very likely miss 7.8.
Just got bitten by this while rebuilding /boot and rootvg online. Couldn't mount the new /boot, nor would grub2-mkconfig produce a usable config. All of this was solved with a daemon-reload.
This bug can also be replicated with the following scenario, by deactivating the old VG that belongs to the mount point 1). Initial config: vg_T2-T2 mounted on /u05_new & vg_T1-T1 on /u05 [root@rhel77 ~]# df -hT |grep u05 /dev/mapper/vg_T2-T2 ext4 2.0G 6.0M 1.9G 1% /u05_new /dev/mapper/vg_T1-T1 ext4 2.0G 6.0M 1.9G 1% /u05 2). Swap the mount points so vg_T1-T1 goes to /u05_old & vg_T2-T2 to /u05 [root@rhel77 ~]# umount /u05 [root@rhel77 ~]# umount /u05_new [root@rhel77 ~]# cat /etc/fstab |grep u05 /dev/mapper/vg_T1-T1 /u05_old ext4 defaults 0 0 /dev/mapper/vg_T2-T2 /u05 ext4 defaults 0 0 [root@rhel77 ~]# mount -a [root@rhel77 ~]# df -hT |grep u05 /dev/mapper/vg_T1-T1 ext4 2.0G 6.0M 1.9G 1% /u05_old /dev/mapper/vg_T2-T2 ext4 2.0G 6.0M 1.9G 1% /u05 3). Unmount /u05_old (for reclaim) [root@rhel77 ~]# umount /u05_old [root@rhel77 ~]# cat /etc/fstab |grep u05 #/dev/mapper/vg_T1-T1 /u05_old ext4 defaults 0 0 /dev/mapper/vg_T2-T2 /u05 ext4 defaults 0 0 [root@rhel77 ~]# df -hT |grep u05 /dev/mapper/vg_T2-T2 ext4 2.0G 6.0M 1.9G 1% /u05 4). Deactivate vg_T1 [root@rhel77 ~]# vgchange -an vg_T1 0 logical volume(s) in volume group "vg_T1" now active 5). Now /u05 gets unmounted automatically [root@rhel77 ~]# df -hT |grep u05 6). The /run/systemd/generator/ listing [root@rhel77 ~]# ls -l /run/systemd/generator/* -rw-r--r--. 1 root root 217 Aug 28 20:41 /run/systemd/generator/-.mount -rw-r--r--. 1 root root 254 Aug 28 20:41 /run/systemd/generator/boot.mount -rw-r--r--. 1 root root 176 Aug 28 20:41 /run/systemd/generator/dev-mapper-rhel\x2dswap.swap -rw-r--r--. 1 root root 220 Aug 28 20:41 /run/systemd/generator/u05.mount -rw-r--r--. 1 root root 224 Aug 28 20:41 /run/systemd/generator/u05_new.mount /run/systemd/generator/local-fs.target.requires: total 0 lrwxrwxrwx. 1 root root 30 Aug 28 20:41 -.mount -> /run/systemd/generator/-.mount lrwxrwxrwx. 1 root root 33 Aug 28 20:41 boot.mount -> /run/systemd/generator/boot.mount lrwxrwxrwx. 1 root root 32 Aug 28 20:41 u05.mount -> /run/systemd/generator/u05.mount lrwxrwxrwx. 1 root root 36 Aug 28 20:41 u05_new.mount -> /run/systemd/generator/u05_new.mount /run/systemd/generator/swap.target.requires: total 0 lrwxrwxrwx. 1 root root 51 Aug 28 20:41 dev-mapper-rhel\x2dswap.swap -> /run/systemd/generator/dev-mapper-rhel\x2dswap.swap [root@rhel77 ~]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 484M 0 484M 0% /dev tmpfs 496M 0 496M 0% /dev/shm tmpfs 496M 6.8M 489M 2% /run tmpfs 496M 0 496M 0% /sys/fs/cgroup /dev/mapper/rhel-root 6.2G 1.2G 5.1G 20% / /dev/sda1 1014M 137M 878M 14% /boot tmpfs 100M 0 100M 0% /run/user/0 [root@rhel77 ~]#
We were also almost bitten by this bug -- which would have produced significant data loss had I not run 'df' right after attempting to mount a new device on a pre-existing mount point and found that the new device was not mounted even though no error had occurred. Only by checking /var/log/messages did I discover what was going on. systemd is being way too cute here -- if it wants to try to track /etc/fstab, it should perform a "systemctl daemon-reload" itself if it notices that /etc/fstab has changed, rather than thinking its data about mounts should supersede the data in /etc/fstab.
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, so they will now be closed. From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase "During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available." If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product. Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns. [0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7
(In reply to Chris Williams from comment #48) > Feature Requests can re-opened and moved to RHEL 8 if the desired > functionality is not already present in the product. Will do shortly.
I am not allowed to change the product. Can someone from Red Hat please change the product to RHEL 8?
*** Bug 1660761 has been marked as a duplicate of this bug. ***
(In reply to Dariusz Wojewódzki from comment #57) > https://github.com/systemd/systemd/pull/18556 has been closed too. > Are there any other considerations for this BZ? Well, my attempt didn't get merged (the patch was fairly small and "backportable" to older releases). The alternative solution that we will implement will depend on a major rework of systemd's dependency tracking mechanism. That is something we won't be able to backport to RHEL-8 or older. IOW, this won't be an issue anymore in RHEL-9. However, on RHEL-7 and RHEL-8 customers will need to run systemctl daemon-reload after editing /etc/fstab. I am scared to close this as WONTFIX considering the number of cases...
Can you please make public the reasoning for the WONTFIX conclusion? This situation has been an on-going headache in the field under a variety of use case scenarios.
(In reply to rhayden from comment #59) > Can you please make public the reasoning for the WONTFIX conclusion? This > situation has been an on-going headache in the field under a variety of use > case scenarios. I updated https://access.redhat.com/solutions/4212551 ; it should reflect the latest state here. If anybody hits this (especially if on RHEL8, where changes are more likely than on RHEL7), and the workaround is not working/too painful etc., then please share.
(In reply to Christian Horn from comment #60) > (In reply to rhayden from comment #59) > > Can you please make public the reasoning for the WONTFIX conclusion? This > > situation has been an on-going headache in the field under a variety of use > > case scenarios. > > I updated https://access.redhat.com/solutions/4212551 ; > it should reflect the latest state here. > If anybody hits this (especially if on RHEL8, where changes are more > likely than on RHEL7), and the workaround is not working/too painful etc., > then please share. Not all of us have Red Hat subscriber access. Please post the WONTFIX rationale and the proposed workaround here.
I have access. Below is the Resolution section... basically, upstream was not interested in the small patches that could have been backported to RHEL 7/8. ------ start cut-------- Resolution This issue has been investigated in bz1494014, and upstream. Patches which would have been small enough to make it into RHEL7/RHEL8 have not found backing in upstream, so for these versions the workaround applies. Future RHEL version are believed to not be affected by this issue, due to systemd changes. Workaround: Run the systemctl daemon-reload after modifying the mount point in /etc/fstab. ------ end cut-------- It is disappointing that Red Hat's customers, who have been pretty vocal on this issue for 4+ years after numerous reported incidents, are denied a fix and told to just re-train and deal with it. I get that the workaround is simple, if you remember to do it. But, dang, this functionality change introduced by systemd really annoyed me and the whole bug process has been packed with frustration with upstream maintainers. For example, if you search for "systemd daemon-reload" in the Official RHEL 8 documentation on Managing File Systems ( https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/managing_file_systems/index ), you get nothing. Just unbelievably frustrating. I will eventually get over it.
(In reply to rhayden from comment #62) > It is disappointing that Red Hat's customers [..] I see your pain on this, and I do not like it either.. just to give more perspective: - the envisioned solution, which also looks proper to upstream, seems not even to be implemented today, if I understand Michal in comment #58 correctly - it seems like in RHEL8 we are not doing systemd rebases like in RHEL7; these were at times painful. Not doing rebases is great for stability and lowers the chances of regressions, but it also makes porting heavier fixes harder. It seems like the fix for this issue is more of a heavy one. I'm wondering if we could do a small change in RHEL8 only, to hint users performing such an action to run "systemctl daemon-reload". Kind of like the "you might want to execute partprobe" hint which one got in the past when new partition tables were not re-read in some situations. Not sure where best to place that notification to cover all affected cases. We probably would not want to go as far as having the LVM tools check if the volume is in /etc/fstab, and only then issue the warning.
With RHEL8, the /etc/fstab file clearly mentions that for changes to take effect, "systemctl daemon-reload" has to be issued. But I understand that admins may totally forget about this. Hence, as a poor man's reminder, the following path and associated service units may be implemented to remind about that: /etc/systemd/system/fstab-watcher.path: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [Unit] Description=Watches for modifications of /etc/fstab [Path] PathChanged=/etc/fstab [Install] WantedBy=default.target -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- /etc/systemd/system/fstab-watcher.service: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- [Unit] Description=Reacts to watches for modifications of /etc/fstab [Service] ExecStart=/usr/bin/wall -n "/etc/fstab changed, don't forget to execute 'systemctl daemon-reload' for changes to apply" -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Because wall will print to every user terminals, which may be overkill and annoying, it's also possible to restrict to some group only, e.g. wheel: -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- ExecStart=/usr/bin/wall -n -g wheel "/etc/fstab changed, don't forget to execute 'systemctl daemon-reload' for changes to apply" -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< -------- Alternatively such service may be used to automatically trigger a "systemctl daemon-reload".
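For completeness, the "automatic" variant mentioned in the last sentence would simply replace the wall command with a daemon-reload (a sketch only; whether auto-reloading is desirable is debated in the following comments):
/etc/systemd/system/fstab-watcher.service:
-------- 8< ---------------- 8< ----------------
[Unit]
Description=Reacts to modifications of /etc/fstab by reloading systemd
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl daemon-reload
-------- 8< ---------------- 8< ----------------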
Shouldn't caching mechanisms hash the source and periodically cross-check it? Maybe the logic that decided the unmount was required should do the check.
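A trivial illustration of the suggested check (not how systemd is implemented; /run/fstab.sha256 is just an arbitrary example path):
# sha256sum /etc/fstab > /run/fstab.sha256    # record at (re)load time
# sha256sum -c /run/fstab.sha256              # verify before acting on cached units
A failed verification would mean the in-memory view of /etc/fstab is stale and should not be enforced.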
(In reply to Renaud Métrich from comment #64) > > Alternatively such service may be used to automatically trigger a "systemctl > daemon-reload". I like this option of simply going ahead and running the daemon-reload, since the expectation is that the SysAdmin is going to do it anyway. I wish I had thought of that years ago.
I still don't understand why systemd thinks it has to babysit mounts and override things people do externally to it. Your aforementioned workaround only helps the sysadmin who knows about this bizarre behavior beforehand. So, what happens when, for whatever reason, the sysadmin decides a FS needs to be fscked? (S)he unmounts the FS and starts an fsck, which is a perfectly reasonable thing to expect to be allowed to do. Then systemd decides that this should be mounted and mounts it up while fsck is running. Now you've mounted up a device that's being fscked. As far as I know, that's a pretty sure-fire way to really corrupt a FS. Why can't we just make systemd stop babysitting mounts altogether? I can think of no reason for this behavior other than someone thought it seemed like a cool thing to do. Just fire up something to mount up local FSes to start, and network FSes once the network is up (and whatever else), and just ignore them after that. It's the way that's always worked, and people still expect it to work that way.
After discussing the topic with a partner: - Couldn't systemd itself check fstab for modification before unmounting something? Mounts/umounts are not done very often, so having systemd verify whether its idea about the state of /etc/fstab still matches reality might be a good idea. If stat()ing /etc/fstab is considered too heavy, maybe looking at /etc/mtab, which is a symlink to /proc/self/mounts, could be enough. - Seeing Renaud's idea of /etc/systemd/system/fstab-watcher.path, I wonder if that unit could also directly do "systemctl daemon-reload". It seems like at least processes kill-HUP'ing themselves is doable, so maybe this is too.
I don't know that doing an automatic reload when the fstab has been changed is necessarily the right way to go, either. Here's one unintended consequence I can think of off the top of my head: Say some sysadmin is making several changes to their system, editing a lot of unit files and such. They're not done yet, and not really ready for any of these changes to take effect, as they're not done putting everything in place. They modify their fstab as one of the changes, and suddenly the half-baked changes they've made are in force. That could cause a lot of problems, depending on what they were doing. I'm sure if I put more thought into it, I could probably come up with some more, but that alone should be enough. None of this would have ever been a problem if it were expected behavior. The operating system taking actions behind the scenes that it thinks you want just is not expected behavior. Doing an automatic reload is adding another thing that the operating system thinks the user wants behind the scenes without them asking for it. This is adding even more unexpected behavior on top of the last. I'm still a firm believer that babysitting mounts is just bad behavior, and really should stop. I could be wrong about this, but I don't think any customers want systemd to be doing this. As for: --->8------>8------>8------>8------>8------>8------>8------>8------>8--- If anybody hits this (especially if on RHEL8, where changes are more likely than on RHEL7), and the workaround is not working/too painful etc., then please share. --->8------>8------>8------>8------>8------>8------>8------>8------>8--- The workaround (I believe you're talking about "just run a reload" right?) has been painful from day one, and has gotten no less so. The issue here is not that it can't be easily done, it's that it's unexpected to have to do so, and if not done, causes unexpected, potentially destructive behavior. Making the users "find out about it the hard way" by losing data, corrupting FSes, etc., etc. just doesn't seem to me like something Red Hat should be doing to their customers. There are lots of cases on this issue. There are probably more that were simply never attached. So, for your conditions: 1) It's not working, because people don't know about it until it's too late and they've got a messed-up system, lost data, or whatever else (not to mention are now furious at Red Hat). 2) Yes, it's far too painful because it's unexpected behavior that can cause really bad things to happen. Therefore, I think it's warranted to re-open this bug. I know there are some customers attached to this bug report. If you disagree, please chime in. You won't hurt my feelings. It wouldn't be the first time I was wrong, and likely not the last. It's your opinion that should matter, not mine.
(In reply to Thomas Gardner from comment #69) > I'm still a firm believer that babysitting mounts is just bad behavior, and > really should stop. +1 from me, but discussion around that is probably divided, and is also happening (or did happen and is closed? Not sure..) upstream. > --->8------>8------>8------>8------>8------>8------>8------>8------>8--- > If anybody hits this (especially if on RHEL8, where changes are more > likely than on RHEL7), and the workaround is not working/too painful etc., > then please share. > --->8------>8------>8------>8------>8------>8------>8------>8------>8--- I'm all in for reopening and re-evaluating how much pain this causes, and what can be done against it. With the above I was just putting into words what the state was at that point in time.
(In reply to David Tardon from comment #78) > A pull request has been posted upstream recently that claims to fix the > remaining issue. If it's merged, I'll try to get this to 8.7. I double-checked and upstream version v252-rc2 works as expected, i.e. upstream has all the fixes already. But sadly, what I wrote in comment #58 still applies: the main fix landed as part of https://github.com/systemd/systemd/pull/19322, but that PR is 1) huge and 2) completely changes how dependencies (one of systemd's most core concepts) are implemented internally. On top of that, I am certain that since the time the patch was merged there were follow-ups. I've tried to close this once and there was (understandably so) pushback, so we need to address this *somehow* in RHEL-8 as well. I propose we follow the path of warning the users about this corner case instead of attempting a highly risky backport. We should have all the pieces of this "warning solution" in place. We need to backport the following systemd patches, which add the flag file recording the time of the last systemd daemon-reload: https://github.com/systemd/systemd/commit/15b9243c0d7f6d1531fa65dbc01bd11e8e6c12ca https://github.com/systemd/systemd/commit/4b3ad81bfafcd97acb06db463495e348d159d8e6 Next, we should file a backport request for this util-linux change, which adds the hint based on the mtimes of the daemon-reload flag file and /etc/fstab: https://github.com/util-linux/util-linux/commit/1db0715169954a8f3898f7ca9d3902cd6c27084d After these 3 fixes are backported, the original issue still won't be fixed, but users will be warned about the problem. I think this is the best we can do, considering all the risks.
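The util-linux hint boils down to an mtime comparison; roughly this idea (a sketch only; the real flag file path is whatever the commits above introduce, /run/systemd/daemon-reload-stamp is just a placeholder):
# [ /etc/fstab -nt /run/systemd/daemon-reload-stamp ] && \
    echo "mount: /etc/fstab was modified after the last systemd reload, consider 'systemctl daemon-reload'"
mount itself would print such a hint instead of the shell snippet above.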
(In reply to Michal Sekletar from comment #79) > I double checked and upstream version v252-rc2 works as expected, i.e. > upstream has all the fixes already. But sadly, what I wrote in comment #58 > still applies, the main fix landed as part of > https://github.com/systemd/systemd/pull/19322 Ah, right, I had forgotten about that one when I offered a backport. https://github.com/systemd/systemd/pull/23367 by itself looked doable.
The systemd part of comment #79 is being addressed in RHEL 8.8 by https://bugzilla.redhat.com/show_bug.cgi?id=2136869. However, I don't see any BZ for util-linux to backport the hint message part - is this still planned for 8.8? /cc @msekleta
(In reply to Frantisek Sumsal from comment #82) > The systemd part of comment #79 is being addressed in RHEL 8.8 by > https://bugzilla.redhat.com/show_bug.cgi?id=2136869. However, I don't see > any BZ for util-linux to backport the hint message part - is this still > planned for 8.8? > > /cc @msekleta I filed bugs against util-linux now and asked Karel if he is in favor of going through the exception.
(In reply to Michal Sekletar from comment #83) > I filed bugs against util-linux now and asked Karel if he is in favor of > going through the exception. Actually doing 0day would be easier option so let's go with that.
The workaround in the mount utility, warning users about the necessity to run systemctl daemon-reload after editing /etc/fstab, will be delivered in RHEL-8.8. There is nothing more we can do at this time on the systemd side.