Bug 1904008 - livemedia-creator aborts on /mnt/sysimage/run unmount failed
Summary: livemedia-creator aborts on /mnt/sysimage/run unmount failed
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lorax
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Brian Lane
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-12-03 11:20 UTC by Gilles Buloz
Modified: 2021-10-01 22:57 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-01 22:57:47 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
log files produced by livemedia-creator (406.28 KB, application/gzip)
2020-12-03 11:20 UTC, Gilles Buloz
no flags Details
The custom kickstart passed to livemedia-creator (30.27 KB, text/plain)
2020-12-04 09:50 UTC, Gilles Buloz
no flags Details

Description Gilles Buloz 2020-12-03 11:20:15 UTC
Created attachment 1736036 [details]
log files produced by livemedia-creator

Description of problem:
The live media generation fails with the following messages :
2020-11-12 11:01:30,964: Error in atexit._run_exitfuncs:
2020-11-12 11:01:30,966: Traceback (most recent call last):
2020-11-12 11:01:30,967: File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 477, in _get_method_reply
2020-11-12 11:01:30,987: return self._handle_method_error(error)
2020-11-12 11:01:30,989: File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 497, in _handle_method_error
2020-11-12 11:01:30,992: raise exception from None
2020-11-12 11:01:30,993: dasbus.error.DBusError: umount of /mnt/sysimage/run failed (32)

Version-Release number of selected component (if applicable):
livemedia-creator v33.10-1

How reproducible:
I get this error when running livemedia-creator from a Fedora 33 VM run by qemu-kvm on a x86_64 host.
This error is not systematic but SEEMS TO OCCUR ONLY WHEN RUNNING WITH ONE CPU (-smp 1), and no more when using more CPUs (testd with -smp 4)

Steps to Reproduce:
1. Run a Fedora 33 VM from qemu-kvm using one CPU
I use qemu-kvm -m 2048 -smp 1 -cpu host  -hda hda.qcow2 -net nic -net user,hostfwd=tcp::5556-:22 -nographic  # need console=ttyS0 for all messages, or at least ttyS0 login enabled
2. Generate a kickstart file from fedora-kickstarts
I use Xfce spin kickstart
3. Run livemedia-creator
I use : sudo livemedia-creator --ks trace_livecd.ks.tmp --no-virt --resultdir ${PWD}/iso --tmp=${CREATOR_TMP_DIR} --project "TRACe-Live-${FEDORA_RELEASEVER}" --releasever ${BSSID} --make-iso --volid TRACe-Live_FC${FEDORA_RELEASEVER}_ID${BSSID} --iso-only --iso-name TRACe-Live_FC${FEDORA_RELEASEVER}_ID${BSSID}.iso --macboot --extra-boot-args "rhgb $CMDLINE_EXTRAS"

Actual results:
The error message above

Expected results:
No such error and live image created

Additional info:
The log files produced by livemedia-creator when the error occurs : see anaconda_err.log.tgz

Comment 1 Brian Lane 2020-12-03 23:32:22 UTC
From the end of livemedia.log it looks like systemd has grabbed /mnt/sysimage/run/user/1001 for some reason. Could you attach a copy of your kickstart?


These are from Anaconda trying to unmount run:

2020-11-12 11:01:30,964 INFO pylorax: Error in atexit._run_exitfuncs:
2020-11-12 11:01:30,966 INFO pylorax: Traceback (most recent call last):
2020-11-12 11:01:30,967 INFO pylorax: File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 477, in _get_method_reply
2020-11-12 11:01:30,987 INFO pylorax: return self._handle_method_error(error)
2020-11-12 11:01:30,989 INFO pylorax: File "/usr/lib/python3.9/site-packages/dasbus/client/handler.py", line 497, in _handle_method_error
2020-11-12 11:01:30,992 INFO pylorax: raise exception from None
2020-11-12 11:01:30,993 INFO pylorax: dasbus.error.DBusError: umount of /mnt/sysimage/run failed (32)



This is livemedia-creator trying to clean up after Anaconda:

2020-11-12 11:03:33,092 INFO livemedia-creator: Shutting down log processing
2020-11-12 11:03:33,127 WARNING pylorax.imgutils: failed to unmount /mnt/sysimage/run/user/1001. retrying (1/3)...
2020-11-12 11:03:33,155 DEBUG pylorax.imgutils: fuser -vm:
                     USER        PID ACCESS COMMAND
/mnt/sysimage/run/user/1001:
                     root     kernel mount /run/user/1001
                     gbl       28218 F.... systemd


2020-11-12 11:03:34,163 WARNING pylorax.imgutils: failed to unmount /mnt/sysimage/run/user/1001. retrying (2/3)...
2020-11-12 11:03:34,176 DEBUG pylorax.imgutils: fuser -vm:
                     USER        PID ACCESS COMMAND
/mnt/sysimage/run/user/1001:
                     root     kernel mount /run/user/1001
                     gbl       28218 F.... systemd


2020-11-12 11:03:35,183 ERROR pylorax: Cleanup of /mnt/sysimage/run/user/1001 failed. See program.log for details
2020-11-12 11:03:35,186 WARNING pylorax.imgutils: failed to unmount /mnt/sysimage/run. retrying (1/3)...
2020-11-12 11:03:35,199 DEBUG pylorax.imgutils: fuser -vm:
                     USER        PID ACCESS COMMAND
/mnt/sysimage/run:   root     kernel mount /run
                     root          1 F.... systemd
                     root        430 F...m systemd-journal
                     root        440 F.... systemd-udevd
                     root        531 F.... NetworkManager
                     root        534 F.... systemd-homed
                     chrony      537 F.... chronyd
                     dbus        539 F.... dbus-broker-lau
                     dbus        541 F.... dbus-broker
                     root        559 f.... systemd-logind
                     root        595 F.... systemd-userdbd
                     root        920 F.... login
                     root      28213 F.... sshd
                     gbl       28230 F.... sshd
                     root      28444 F.... systemd-userwor
                     root      100293 F.... systemd-userwor
                     root      100294 F.... systemd-userwor

Comment 2 Gilles Buloz 2020-12-04 09:50:40 UTC
Created attachment 1736391 [details]
The custom kickstart passed to livemedia-creator

This is a kickstart file based on official Xfce spin kickstart, but with changes for our remix live image (changes are marked with the word "TRACe").
Two additional %post scripts are added at the end of file to build and install extra custom software to the image.

Comment 3 Brian Lane 2020-12-04 16:58:51 UTC
It looks like your %post is doing quite a bit with rpmbuild and rpm which is unusual for a live build. I would try simplifying your setup to see if you can narrow it down.

But in general, changes and packages should be added to a repo and installed by anaconda, not from a %post, and not built while running inside anaconda.

It looks like this is happening because of user 1001 inside the install, which is odd because there shouldn't be any extra users until the live iso is booted. Figuring out what user 1001 is may shed some light on this.

I'm going to move this to systemd, since the problem is really that it is holding onto the mount. There's nothing livemedia-creator or anaconda can do about that. Maybe they can help you debug it in more detail.

Comment 4 Gilles Buloz 2020-12-07 13:45:10 UTC
Thanks Brian, I'll try the Xfce spin kickstart without my addons.
Additional info :
- the UID 1001 is the one of user "gbl" of by VM from which I run "sudo livemedia-creator ..."
- I haven't seen the problem again with my VM configured with 4 CPUs cores. Only saw it with only one CPU. This looks like a timing related issue.
- I build some packages from %post before some contain extra kernel modules and must be built for the live kernel, so it's easier for me to build and install then from %post instead of building then separately and adding them to my local repo.
- One of these packages has some "systemctl enable xxx" in its RPM postinstall script, so this systemctl is run from the %post. But I see the original Xfce kickstart also has a "systemctl enable tmp.mount" run from a %post, so this should not be a problem.

Comment 5 Zbigniew Jędrzejewski-Szmek 2021-01-31 11:26:30 UTC
The user manager will open sockets under /run/user/UID/. This causes 'fuser -vm' to show 'F systemd' in the listing.
So it is expected that systemd will block /run/user/UID/ while it is running. It is not clear why it is running
at all though. But I think this is something to be figured out from the livemedia-creator side, because this
is a very special environment.

> One of these packages has some "systemctl enable xxx" in its RPM postinstall script, so this systemctl is run from the %post.

This systemctl will talk to the system instance. So even if it had some effect, I don't see how it could cause
the *user* manager to be started.

We had some bug about systemd doing too much from scriptlets executed in a chroot, that might be related.
But I can't find it now. I'll post a link if I do.

Sorry, I don't think I can help with this until there's more info. *Something* causes the user instance to
start, but until we know what and why, I can't say whether systemd is behaving correctly.


Note You need to log in before you can comment on or make changes to this bug.