Bug 1374809 - livemedia creator hanging on package scriptlet errors since anaconda commit "Fix replacement of deprecated DNF method"
Summary: livemedia creator hanging on package scriptlet errors since anaconda commit "...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lorax
Version: 26
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Brian Lane
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-09-09 17:09 UTC by Kevin Fenzi
Modified: 2018-05-17 22:04 UTC (History)
7 users (show)

Fixed In Version: lorax-25.16-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-17 22:04:21 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Kevin Fenzi 2016-09-09 17:09:18 UTC
Starting on 2016-09-08, some few livemedia tasks in rawhide nightly compose are hanging. On 2016-09-08 it was Xfce and Securitylab. On 2016-09-09 it's those two and also Design suite.

Here's todays Xfce x86_64: 

http://koji.fedoraproject.org/koji/taskinfo?taskID=15554203

Looking at root.log we see at the end: 

...
DEBUG util.py:421:  2016-09-09 08:33:53,974: Installing libsemanage.x86_64 (684/1376)
DEBUG util.py:421:  2016-09-09 08:33:53,974: Installing shadow-utils.x86_64 (685/1376)
DEBUG util.py:421:  2016-09-09 08:33:53,974: Installing mlocate.x86_64 (686/1376)
DEBUG util.py:421:  2016-09-09 08:46:06,048: Running in chroot, ignoring request.

So, it was installing and doesn't seem like it finished, but the last message seems like it comes from post after the install is done. 

On the builder we see: 

  ├─kojid,12065 /usr/sbin/kojid --fg --force-lock --verbose
  │   ├─kojid,4622 /usr/sbin/kojid --fg --force-lock --verbose
  │   └─kojid,4623 /usr/sbin/kojid --fg --force-lock --verbose
  │       └─mock,5531 -tt /usr/sbin/mock -r koji/f26-build-6378992-635784 --cwd /tmp --chroot -- ...
  │           └─livemedia-creat,5540 /sbin/livemedia-creator --ks /tmp/koji-image-f26-build-15554203.ks --logfile...
  │               ├─{livemedia-creat},5558
  │               └─anaconda,5559 /usr/sbin/anaconda --kickstart /tmp/koji-image-f26-build-15554203.ks --cmdline ...
  │                   ├─{anaconda},5609
  │                   └─anaconda,5854 /usr/sbin/anaconda --kickstart /tmp/koji-image-f26-build-15554203.ks ...
  │                       └─{anaconda},5886

I do see that last anaconda process has open: 
/var/lib/mock/f26-build-6378992-635784/root/mnt/sysimage/var/lib/rpm/.rpm.lock
along with a bunch of other dnf/rpm files. 

The following mounts are active: 

tmpfs                                                            9.8G   12K  9.8G   1% /var/lib/mock/f26-build-6378992-635784/root/dev/shm
/dev/loop0                                                       4.9G  3.1G  1.8G  63% /var/lib/mock/f26-build-6378992-635784/root/mnt/sysimage
tmpfs                                                            9.8G     0  9.8G   0% /var/lib/mock/f26-build-6378992-635784/root/mnt/sysimage/dev/shm
tmpfs                                                            9.8G     0  9.8G   0% /var/lib/mock/f26-build-6381498-635941/root/dev/shm

It seems like it might be some kind of race as restarting the builds a number of times seems to get them to finish. ;( 

Happy to gather additional info.

Comment 1 Brian Lane 2016-09-12 18:00:17 UTC
I'm pretty sure the 'Running in chroot, ignoring request.' message is from a package install, not the end. Depending on how buffered the packages are it may be mlocate causing problems. If you can look at the logs in /tmp the end of dnf.librepo.log or one of the other dnf logs may show what the actual last package is.

Comment 2 Adam Williamson 2016-09-13 16:15:24 UTC
'Running in chroot, ignoring request.' comes from systemd and will happen any time a package scriptlet tries to do a systemctl action tagged as VERB_NO_CHROOT in https://github.com/systemd/systemd/blob/master/src/systemctl/systemctl.c#L7720 ; it's fairly common (you do see it in successful live composes also) and I don't *think* it's related to the problem. It probably comes from a `systemctl start mlocate-updatedb.timer` call in mlocate %post.

Comment 3 Dennis Gilmore 2016-09-13 16:20:53 UTC
14:50:56 select_next_target: Selecting mirror for: Packages/c/clucene-core-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm
14:50:56 prepare_next_transfer: URL: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/x86_64/os/Packages/c/clucene-core-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm
14:50:56 prepare_next_transfer: Resume ignored, existing file was not originally being downloaded by Librepo
14:50:56 check_transfer_statuses: Transfer finished: Packages/g/gstreamer1-plugins-good-1.9.2-1.fc26.x86_64.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/x86_64/os/Packages/g/gstreamer1-plugins-good-1.9.2-1.fc26.x86_64.rpm)
14:50:56 check_finished_trasfer_checksum: Checksum (sha256) 291a938e2e71d3c58762cb32d40a73c8e95959dd4a5ec7f611d2b22f0ac95811 is OK
14:50:56 lr_headercb: Server returned Content-Length: "94442" (converted 94442/94442 expected)
14:50:56 check_transfer_statuses: Transfer finished: Packages/c/clucene-contribs-lib-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/x86_64/os/Packages/c/clucene-contribs-lib-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm)
14:50:56 check_finished_trasfer_checksum: Checksum (sha256) de98f72d8b77eae1b22bbe5032ef7f60ef33841ee406cd016e94f0bb91355818 is OK
14:50:56 lr_headercb: Server returned Content-Length: "547194" (converted 547194/547194 expected)
14:50:56 check_transfer_statuses: Transfer finished: Packages/c/clucene-core-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/x86_64/os/Packages/c/clucene-core-2.3.3.4-24.20130812.e8e3d20git.fc24.x86_64.rpm)
14:50:56 check_finished_trasfer_checksum: Checksum (sha256) 6756aaf8527204f27638b4142f7e9ee0d8d6b67c30a56dea068632e7a22ce95d is OK
14:50:56 lr_download_packages: Restoring an old SIGINT handler

Comment 4 Dennis Gilmore 2016-09-13 16:40:34 UTC
15:59:27 prepare_next_transfer: URL: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/i386/os/Packages/d/dbus-x11-1.11.4-1.fc26.i686.rpm
15:59:27 prepare_next_transfer: Resume ignored, existing file was not originally being downloaded by Librepo
15:59:27 lr_headercb: Server returned Content-Length: "21414" (converted 21414/21414 expected)
15:59:27 check_transfer_statuses: Transfer finished: Packages/p/perl-Term-Cap-1.17-365.fc25.noarch.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/i386/os/Packages/p/perl-Term-Cap-1.17-365.fc25.noarch.rpm)
15:59:27 check_finished_trasfer_checksum: Checksum (sha256) a6aec1d2667d723c74284de11a91ae972ab07083f2bec405596bbc388b2d0599 is OK
15:59:27 lr_headercb: Server returned Content-Length: "53478" (converted 53478/53478 expected)
15:59:27 check_transfer_statuses: Transfer finished: Packages/d/dbus-x11-1.11.4-1.fc26.i686.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/i386/os/Packages/d/dbus-x11-1.11.4-1.fc26.i686.rpm)
15:59:27 check_finished_trasfer_checksum: Checksum (sha256) c50556bc6e0b282de27d6d8c35876f8868a30902d83d3e31f4dbfdcd3524690f is OK
15:59:27 check_transfer_statuses: Transfer finished: Packages/l/linux-firmware-20160816-67.git7c3dfc0b.fc26.noarch.rpm (Effective url: http://kojipkgs.fedoraproject.org/compose//rawhide/Fedora-Rawhide-20160913.n.1/compose/Everything/i386/os/Packages/l/linux-firmware-20160816-67.git7c3dfc0b.fc26.noarch.rpm)
15:59:27 check_finished_trasfer_checksum: Checksum (sha256) 32c004cee019a457a80665fd1bba6f96d6e918bcb57f86179955da362ce31da9 is OK
15:59:27 lr_download_packages: Restoring an old SIGINT handler

Comment 5 Dennis Gilmore 2016-09-13 16:41:15 UTC
looked at a couple. the packages do not seem consistent, but the 
15:59:27 lr_download_packages: Restoring an old SIGINT handler 

message is

Comment 6 Dennis Gilmore 2016-09-13 16:46:35 UTC
packaging.log shows 
16:02:58,297 INFO packaging: Installed: mlocate-0.26-14.fc25.i686 1458309222 e77c6e8c7434d690194ef4e420ba9544d68cb1860cac8611eaf225f7d58cdab2
16:02:58,432 ERR dnf.rpm: Non-fatal POSTIN scriptlet failure in rpm package mlocate
16:02:58,433 INFO dnf.rpm: Installed: mlocate-0.26-14.fc25.i686
16:02:59,058 INFO dnf.rpm: Installed: libutempter-1.1.6-8.fc24.i686
16:02:59,299 ERR packaging: Installation failed: PayloadInstallError('DNF error: Non-fatal POSTIN scriptlet failure in rpm package mlocate',)

it seems that the non fatal error in mlocate is fatal to anaconda

Comment 7 Fedora Update System 2016-09-13 17:42:47 UTC
mlocate-0.26-15.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-1e6ac5d4c1

Comment 8 Adam Williamson 2016-09-13 17:43:07 UTC
well, we still have a few unanswered questions, but our current theory for the cause of this is simply that mlocate has a packaging error: it uses the systemd scriptlet snippets but does not have the correct requirements. It doesn't have %{systemd_requires} in its spec so it was not Requires(pre) or Requires(post)-ing systemd. And indeed, every time the compose fails, it seems to happen when mlocate gets ordered for installation before systemd; when the compose succeeds, mlocate happens to get ordered after systemd.

Given that it's pretty clear systemd is not installed when the mlocate install runs we're rather confused as to how the 'Running in chroot, ignoring request.' error manages to appear at all, since that error message clearly comes from systemd. Which isn't there. So, what? But we're choosing to studiously not concern ourselves with that for now, and just fix the mlocate package to include the appropriate dependencies, and if that makes the problem go away, we'll just chalk it up to the mysterious ways of the universe and be on our merry way.

http://koji.fedoraproject.org/koji/taskinfo?taskID=15617626
https://bodhi.fedoraproject.org/updates/mlocate-0.26-15.fc25

Comment 9 Adam Williamson 2016-09-14 01:27:04 UTC
So now I think there's a bit more to this story. I suspect this anaconda commit is involved:

https://github.com/rhinstaller/anaconda/commit/a1f36ec7c5f4552a921568f885251a832f7f6387

I think it somehow makes the live compose process less tolerant of non-fatal packaging errors. Which isn't necessarily a *bug* exactly, though the fact that it causes the compose to get stuck (rather than failing) *is* a bug, I would say.

Anyway, that commit is definitely to the relevant anaconda code (specifically, the messages from #c6 are clearly coming from this code), and it landed in Rawhide on 2016-09-06 and Branched on 2016-09-09, which lines up quite precisely with when this problem suddenly started occurring.

Obviously the packaging bugs should be fixed, but arguably lmc and/or anaconda should be changed so at least the compose *fails* rather than hanging when we hit this case.

Since we fixed the bug in mlocate, we found what looks to be another packaging error in wireshark-cli which causes the same result...

Comment 10 Brian Lane 2016-09-14 16:16:06 UTC
The call to _failure_limbo is likely the culprit. It changed the behavior of that loop so that it will hang for a long time instead of just raising the error. I'll leave it to Jiri to decide what to do there.

As far as lmc goes, it should add a check for 'ERR packaging: Installation failed:' to its log monitor code so that it will stop the compose attempt.

Comment 11 Brian Lane 2016-09-14 16:23:18 UTC
https://github.com/rhinstaller/lorax/pull/157

Comment 12 Fedora Update System 2016-09-27 07:20:35 UTC
lorax-25.16-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-b806dfedca

Comment 13 Bruno Wolff III 2016-10-10 23:51:56 UTC
This is also breaking the games spin as reported in bug 1383094. Is this now a feature such that packages should change their scripts to report success for non-fatal errors or should I wait for things to change back?

Comment 14 Adam Williamson 2016-10-11 00:21:31 UTC
I believe that's correct, yeah - anaconda will treat any scriptlet error as fatal, so scriptlets for packages in lives must succeed.

Comment 15 Bruno Wolff III 2016-10-11 00:41:22 UTC
Is the recommended way to do that to just add " || true" to command lines for commands that might fail when building lives? For example: 
/sbin/modprobe joystick || true

Comment 16 Fedora Update System 2016-10-12 18:59:47 UTC
lorax-25.16-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Fedora End Of Life 2017-02-28 10:14:17 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 18 Fedora End Of Life 2018-05-03 08:31:08 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.


Note You need to log in before you can comment on or make changes to this bug.