Bug 1430907

Summary: Workstation live compose failed due to /dev umount error
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: distributionAssignee: Václav Pavlín <vpavlin>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 26CC: dennis, dgilmore, kevin
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-10 23:59:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Adam Williamson 2017-03-09 20:33:28 UTC
This is similar to our old friend https://bugzilla.redhat.com/show_bug.cgi?id=1315541 , but doesn't look to have the same cause.

The Fedora-26-20170309.n.0 Workstation live image compose failed:

https://koji.fedoraproject.org/koji/taskinfo?taskID=18278005

The problem is failure to umount /dev , again. anaconda's log - https://kojipkgs.fedoraproject.org//work/tasks/8005/18278005/anaconda-program.log - shows this:

12:53:24,606 INFO program: Running... umount /mnt/sysimage/dev
12:53:24,620 INFO program: stderr:
12:53:24,620 INFO program: b'umount: /mnt/sysimage/dev: target is busy'
12:53:24,621 INFO program: b'        (In some cases useful info about processes that'
12:53:24,621 INFO program: b'         use the device is found by lsof(8) or fuser(1).)'
12:53:24,622 DEBUG program: Return code: 32
12:53:24,622 INFO program: Running... lsof /mnt/sysimage/dev
12:53:24,796 INFO program: stdout:

then the lsof output lists just about every running process, starting with systemd, indicating that possibly systemd is the culprit. Then, as usual, livemedia-creator tries to clean up the mount that anaconda left in place, and fails, and this is counted as a fatal error. See https://kojipkgs.fedoraproject.org//work/tasks/8005/18278005/root.log :

DEBUG util.py:435:  2017-03-09 12:53:25,067: Error in atexit._run_exitfuncs:
DEBUG util.py:435:  2017-03-09 12:53:25,068: Traceback (most recent call last):
DEBUG util.py:435:  2017-03-09 12:53:25,068: File "/usr/lib/python3.6/site-packages/blivet/threads.py", line 45, in run_with_lock
DEBUG util.py:435:  2017-03-09 12:53:25,068: return m(*args, **kwargs)
DEBUG util.py:435:  2017-03-09 12:53:25,068: File "/usr/lib/python3.6/site-packages/blivet/blivet.py", line 1470, in umount_filesystems
DEBUG util.py:435:  2017-03-09 12:53:25,068: self.fsset.umount_filesystems(swapoff=swapoff)
DEBUG util.py:435:  2017-03-09 12:53:25,069: File "/usr/lib/python3.6/site-packages/blivet/osinstall.py", line 664, in umount_filesystems
DEBUG util.py:435:  2017-03-09 12:53:25,069: device.format.teardown()
DEBUG util.py:435:  2017-03-09 12:53:25,069: File "/usr/lib/python3.6/site-packages/blivet/threads.py", line 45, in run_with_lock
DEBUG util.py:435:  2017-03-09 12:53:25,069: return m(*args, **kwargs)
DEBUG util.py:435:  2017-03-09 12:53:25,069: File "/usr/lib/python3.6/site-packages/blivet/formats/__init__.py", line 645, in teardown
DEBUG util.py:435:  2017-03-09 12:53:25,070: self._teardown(**kwargs)
DEBUG util.py:435:  2017-03-09 12:53:25,070: File "/usr/lib/python3.6/site-packages/blivet/threads.py", line 45, in run_with_lock
DEBUG util.py:435:  2017-03-09 12:53:25,070: return m(*args, **kwargs)
DEBUG util.py:435:  2017-03-09 12:53:25,070: File "/usr/lib/python3.6/site-packages/blivet/formats/fs.py", line 560, in _teardown
DEBUG util.py:435:  2017-03-09 12:53:25,071: raise FSError("umount of %s failed (%d)" % (mountpoint, rc))
DEBUG util.py:435:  2017-03-09 12:53:25,071: blivet.errors.FSError: umount of /mnt/sysimage/dev failed (32)
DEBUG util.py:435:  2017-03-09 12:53:59,498: Shutting down log processing
DEBUG util.py:435:  2017-03-09 12:53:59,522: failed to unmount /mnt/sysimage/dev. retrying (1/3)...
DEBUG util.py:435:  2017-03-09 12:54:00,606: failed to unmount /mnt/sysimage/dev. retrying (2/3)...
DEBUG util.py:435:  2017-03-09 12:54:01,676: Cleanup of /mnt/sysimage/dev failed. See program.log for details
DEBUG util.py:435:  2017-03-09 12:54:01,683: failed to unmount /mnt/sysimage. retrying (1/3)...
DEBUG util.py:435:  2017-03-09 12:54:02,740: failed to unmount /mnt/sysimage. retrying (2/3)...
DEBUG util.py:435:  2017-03-09 12:54:03,799: Cleanup of /mnt/sysimage failed. See program.log for details
DEBUG util.py:435:  2017-03-09 12:54:03,800: Install failed: novirt_install cleanup of anaconda mounts failed.
DEBUG util.py:435:  2017-03-09 12:54:03,801: Removing bad disk image
DEBUG util.py:435:  2017-03-09 12:54:03,801: ERROR: Image creation failed: novirt_install cleanup of anaconda mounts failed.
DEBUG util.py:573:  Child return code was: 1

We need to investigate a bit more to figure out if this is a consistent error, or if it's another lottery situation, or what, but it's at least a potential F26 Alpha blocker, so proposing it as such.

Comment 1 Adam Williamson 2017-03-09 20:34:54 UTC
dgilmore suggests that we may have changed from using chroot to systemd-nspawn as the mock isolation mechanism, and that could be the cause of the problem...

Comment 2 Adam Williamson 2017-03-09 20:53:48 UTC
Two others failed for the 26-20170309.n.0 compose:

https://koji.fedoraproject.org/koji/taskinfo?taskID=18277617 (Python-Classroom-Live)
https://koji.fedoraproject.org/koji/taskinfo?taskID=18277585 (Design_suite)

Dennis says these are both based on Workstation.

Comment 3 Dennis Gilmore 2017-03-09 22:01:22 UTC
gpg-agent was running and keeping /mnt/sysimage/dev/null open in the chroot. after further investigation chroot is still used so there is no chance of it being systemd-nspawn

Comment 4 Adam Williamson 2017-03-09 23:23:59 UTC
Dennis believes https://pagure.io/fedora-kickstarts/pull-request/160 and https://pagure.io/fedora-kickstarts/pull-request/161 should fix this.

Comment 5 Adam Williamson 2017-03-09 23:38:20 UTC
Now merged, will wait for next compose.

Comment 6 Dennis Gilmore 2017-03-10 23:59:03 UTC
Workstation live is back and working today