Description of problem: ======================= During my upgrade from latest F22 to F23, I hit an issue when upgrade to initial-setup-0.3.31-1.fc22.x86_64 got stuck (for hours) in cleanup phase. The process tree showed that dnf ran script and that ran systemctl. I had to kill that systemctl command to continue in my upgrade. Version-Release number of selected component (if applicable): ============================================================= F23, initial-setup-0.3.31-1.fc22.x86_64 How reproducible: ================= I was upgrading only once. Steps to Reproduce: =================== 1. From F22 2. dnf update --releasever=23 Actual results: =============== Update of initial-setup is stucked in cleanup phase. Expected results: ================= Update continues.
This is how the relevant part of the initial-setup's .spec file looks like: %post if [ $1 -ne 2 -a ! -f /etc/sysconfig/initial-setup ]; then platform="$(arch)" if [ "$platform" = "s390" -o "$platform" = "s390x" ]; then echo "RUN_INITIAL_SETUP=YES" > /etc/sysconfig/initial-setup else %systemd_post initial-setup-text.service fi fi %preun %systemd_preun initial-setup-text.service %postun %systemd_postun_with_restart initial-setup-text.service %post gui %systemd_post initial-setup-graphical.service %preun gui %systemd_preun initial-setup-graphical.service %postun gui %systemd_postun_with_restart initial-setup-graphical.service I don't see anything that could got stuck other than the %systemd_* macros so I'm reassigning this to systemd.
Those macros call systemctl to communicate with PID 1. If the upgrade was stuck in one of those macros, then mostly likely the real problem was that PID 1 was stuck itself or had a segmentation fault or similar. To verify that systemctl is at fault, you could look (if the issue happens again) at pstree output for the dnf process. If systemd had crashed (or similar), some kind of information should be visible in journalctl. It *may* have happened during the upgrade or earlier, hard to say. Is there anything interesting in the logs?
In /var/log/dnf.rpm.log: ~~~ Jul 17 23:37:14 INFO Cleanup: initial-setup-0.3.31-1.fc22.x86_64 Jul 17 23:51:07 INFO --- logging initialized --- Jul 18 00:51:08 INFO --- logging initialized --- Jul 18 01:51:22 INFO --- logging initialized --- Jul 18 02:51:23 INFO --- logging initialized --- Jul 18 08:28:17 INFO Cleanup: ibus-typing-booster-1.2.11-1.fc22.noarch ~~~ At the same time in journal log: ~~~ Jul 17 23:37:13 localhost.localdomain systemd[1]: Stopped TCG Core Services Daemon. Jul 17 23:37:13 localhost.localdomain systemd[1]: Reloading. Jul 17 23:37:13 localhost.localdomain systemd[1]: Configuration file /usr/lib/systemd/system/auditd.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway. Jul 17 23:37:13 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/initial-setup-text.service:21] Support for option SysVStartPriority= has been removed and it is ignored Jul 17 23:37:13 localhost.localdomain systemd[1]: Reloading. Jul 17 23:37:13 localhost.localdomain systemd[1]: Configuration file /usr/lib/systemd/system/auditd.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway. Jul 17 23:37:13 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/initial-setup-text.service:21] Support for option SysVStartPriority= has been removed and it is ignored Jul 17 23:37:14 localhost.localdomain systemd[1]: Reloading. Jul 17 23:37:14 localhost.localdomain systemd[1]: Configuration file /usr/lib/systemd/system/auditd.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway. Jul 17 23:37:14 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/initial-setup-text.service:21] Support for option SysVStartPriority= has been removed and it is ignored Jul 17 23:37:14 localhost.localdomain audit[1]: <audit-1130> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=initial-setup-text comm="systemd" exe="/usr/l ib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jul 17 23:37:14 localhost.localdomain audit[1]: <audit-1131> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=initial-setup-text comm="systemd" exe="/usr/l ib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jul 17 23:37:14 localhost.localdomain systemd[1]: Stopping Initial Setup configuration program (text mode)... Jul 17 23:37:14 localhost.localdomain systemd[1]: Starting Initial Setup configuration program (text mode)... Jul 17 23:51:06 localhost.localdomain systemd[1]: Starting dnf makecache... ~~~ And during every boot in journal log: ~~~ Jul 18 08:48:37 localhost.localdomain root[25868]: Running initial_setup Jul 18 08:48:37 localhost.localdomain python[25869]: detected unhandled Python exception in '/usr/lib/python2.7/site-packages/initial_setup/__main__.py' ~~~
Happened again with 'initial-setup' upgrade today. $ pstree --arguments --long --show-pids --ns-changes --uid-changes -Z 28007 dnf,28007,`unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023' -OO /bin/dnf update └─sh,28150,`unconfined_u:unconfined_r:rpm_script_t:s0-s0:c0.c1023' /var/tmp/rpm-tmp.2aTsdl 1 └─systemctl,28173,`unconfined_u:unconfined_r:rpm_script_t:s0-s0:c0.c1023' try-restart initial-setup-text.service $ rpm -q initial-setup initial-setup-0.3.33-2.fc23.x86_64 initial-setup-0.3.35-1.fc23.x86_64 And killing the systemctl process solved the issue. $ sudo kill 28173 $ rpm -q initial-setup initial-setup-0.3.35-1.fc23.x86_64 $ ps aux | grep system root 1 0.0 0.1 194276 9160 ? Ss Aug01 0:14 /usr/lib/systemd/systemd --switched-root --system --deserialize 22 root 628 0.0 0.5 93380 45844 ? Ss Aug01 0:03 /usr/lib/systemd/systemd-journald root 669 0.0 0.0 47492 7688 ? Ss Aug01 0:00 /usr/lib/systemd/systemd-udevd dbus 852 0.0 0.0 66852 6476 ? Ssl Aug01 0:27 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation root 868 0.0 0.0 24480 3156 ? Ss Aug01 0:00 /usr/lib/systemd/systemd-logind gdm 1529 0.0 0.0 45132 4912 ? Ss Aug01 0:00 /usr/lib/systemd/systemd --user root 27856 0.0 0.0 43012 4904 ? Ss 11:08 0:00 /usr/lib/systemd/systemd --user sgraf 28324 0.0 0.0 45260 5152 ? Ss 11:11 0:00 /usr/lib/systemd/systemd --user sgraf 30196 0.0 0.0 117060 2380 pts/0 S+ 11:20 0:00 grep --color=auto system In the mean time I can for example start/stop/status tuned service (using systemctl) without any issues.
(In reply to Stanislav Graf from comment #3) [...] > And during every boot in journal log: > ~~~ > Jul 18 08:48:37 localhost.localdomain root[25868]: Running initial_setup > Jul 18 08:48:37 localhost.localdomain python[25869]: detected unhandled > Python exception in > '/usr/lib/python2.7/site-packages/initial_setup/__main__.py' > ~~~ This error during every boot got fixed with latest update. Possibly due to Bug 1244558. Maybe this BZ would be fixed also now.
*** This bug has been marked as a duplicate of bug 1244558 ***
Reopening as this sounds like a blocker to me. Proposing as Fedora 23 Beta Blocker because it breaks the following criteria for KDE spin: "For each one of the release-blocking package sets, it must be possible to successfully complete an upgrade from a fully updated installation of the previous stable Fedora release with that package set installed." With this bug it is impossible to complete an upgrade.
Well, this is a complete mess: it was closed as a dupe of a bug that was closed and then re-opened without a comment, and now this one has been re-opened. I have no idea what any of this is meant to mean.
I ran into this today on a simple dnf update on a Fedora 23 Cinnamon. Sep 13 17:43:19 INFO Cleanup: initial-setup-gui-0.3.35-1.fc23.x86_64 Sep 13 17:43:19 INFO Cleanup: initial-setup-gui-0.3.35-1.fc23.x86_64 Sep 13 17:43:20 INFO Cleanup: anaconda-gui-23.19.2-1.fc23.x86_64 Sep 13 17:43:20 INFO Cleanup: anaconda-gui-23.19.2-1.fc23.x86_64 Sep 13 17:43:21 INFO Cleanup: initial-setup-0.3.35-1.fc23.x86_64 Sep 13 17:43:21 INFO Cleanup: initial-setup-0.3.35-1.fc23.x86_64 Sep 13 18:03:40 ERROR Non-fatal POSTUN scriptlet failure in rpm package initial-setup Sep 13 18:03:40 ERROR Non-fatal POSTUN scriptlet failure in rpm package initial-setup However, on a nearly identical system I updated less than 15 minutes after that there were no issues. On the system that hung, no further progress was made through the cleanup phase, leaving (on this system) 177 packages in limbo. The only thing the next run of dnf did was uninstall the extra kernels that were still there, as they weren't removed yet. Attempting to resolve any leftover issues to this point has been unseccessful - dnf history redo is hitting too many errors to weed through, and I know of no other method to accomplish this.
Looks to me like the problem is that the initial-setup-text service is running (started) in the background. systemd restarts it when updating the package and gets stuck because the initial-setup-text prompts user for input somewhere and never terminates. However, the initial-setup-text service should never be running when the system is booted to the state where user logs in and does system update. Martin, could you please have a look at this?
Can I suggest that the obvious fix here is simply this? %postun %systemd_postun initial-setup-text.service %postun gui %systemd_postun initial-setup-graphical.service since, as noted, i-s should never be running as a daemon on an installed system, I can't see any situation in which using systemd_postun_with_restart would ever do anything *good*. No?
Discussed at 2015-09-14 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2015-09-14/f23-blocker-review.2015-09-14-16.00.log.txt . We agreed that this is clearly a bad bug, but determining its blocker/FE status is a bit complex. 1) do we actually need to change the frozen package set to fix it? 2) when does the bug actually happen? For 2), we're fairly sure the bug happens when initial-setup is running and you update it, but we're not sure under what circumstances initial-setup is running. It's never *supposed* to be running while the system is being used 'normally' (i.e. installing updates or upgrading). For 1), any fix that involves changing %postun - e.g. the one I suggested in #c11 - benefits from happening in the frozen Beta package set, as %postun is taken from the 'before' version, not the 'after' version. However, it's conceivable that we can fix this elsewhere, e.g. by stopping any running initial-setup in the %pre of the 'after' package. Taking all that together, we decided to grant this issue a freeze exception for now, and leave blocker status open depending on the answer to 2). We will decide whether to actually pull a fix through the freeze on the basis of exactly how this winds up being fixed (whether we change %postun and, if so, how vital it is that that change be included in Beta).
vpodzime: mkolman: could we please have your thoughts on the issues raised in #c12 and the best way to fix this? thanks.
(In reply to awilliam from comment #11) > Can I suggest that the obvious fix here is simply this? > > %postun > %systemd_postun initial-setup-text.service > > %postun gui > %systemd_postun initial-setup-graphical.service > > since, as noted, i-s should never be running as a daemon on an installed > system, I can't see any situation in which using systemd_postun_with_restart > would ever do anything *good*. No? Yeah, that might work - trying it out now. :)
Seems to work! Patch posted for review. :)
mkolman: there's an issue with fixing it in %postun, noted in #c12 again: if we do (only) that, it should be fixed in the frozen Beta package set, because %postun is taken from the *old* version when updating, not the *new* version. So if we ship Beta with the 'broken' %postun, even if we ship an update with the 'fixed' %postun, you may hit the issue the first time you update. This is why someone suggested instead/also having something in %pre, because %pre comes from the new package. The suggestion was to stop any running initial-setup service/process in %pre. WDYT about that?
Discussed at 2015-09-17 Fedora 23 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f23_beta_go_no-go_meeting/f23_beta_go_no-go_meeting.2015-09-17-16.00.log.txt . Rejected as a blocker. The main 'supported' scenario in which this would be encountered is a regular update to an installed F23 system, if somehow i-s is running (the scenarios where i-s is running have not yet been figured out; that's the unknown part of this bug). It follows logically that we don't need to change the frozen Beta tree or media to avoid that happening: we only need to ensure the first post-Beta update to initial-setup avoids the bug. Therefore this bug is not a release blocker, but we are going to state that we require the next Fedora 23 initial-setup update to somehow avoid this issue happening - i.e. we want it to have a %pre fix/workaround or something else that means this bug will not be encountered *even when upgrading from an initial-setup package with the 'bad' %postun, with initial-setup running*. The currently-pending i-s update will be un-submitted to ensure it doesn't go stable without a fix for this.
We should also document this on the 'upgrading with yum' page for people doing (technically unsupported) live upgrades.
(In reply to awilliam from comment #16) > mkolman: there's an issue with fixing it in %postun, noted in #c12 again: if > we do (only) that, it should be fixed in the frozen Beta package set, > because %postun is taken from the *old* version when updating, not the *new* > version. So if we ship Beta with the 'broken' %postun, even if we ship an > update with the 'fixed' %postun, you may hit the issue the first time you > update. > > This is why someone suggested instead/also having something in %pre, because > %pre comes from the new package. The suggestion was to stop any running > initial-setup service/process in %pre. > > WDYT about that? Yeah, %pre should work according to the wiki[0]. BTW, I think I might have a preliminary theory why some people have IS running without knowing about it: * unless a user specification has been found in kickstart IS will run on system startup if it is installed and its unit/-s enabled - it does not actually check if there are user accounts on the system, it only checks the kickstart * the Initial Setup services (initial-setup-graphical and initial-setup-text) have been until recently in the list of services to automatically enable once installed - so it was possible to get IS pulled in & automatically activated after the installation, for example by installing a package group containing it - this has been fixed as part of bug 1215224 about two months ago * what about the users not noticing Initial Setup has been running on their machine ? - the Initial Setup TUI starts on a console and tries to block all other consoles from initializing to prevent user meddling from interfering with initial system re-/configuration - unfortunately this is not very robust and basically requires you to write all console names in the unit file, if you miss one the console won't be blocked and users can log-in on it and cause the mischief we are currently seeing - there is a RFE[1] on systemd to provide this functionality (provide a target IS can use that blocks all consoles) but it does not seem to be getting much traction so far [0] https://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Scriptlet_Ordering [1] http://lists.freedesktop.org/archives/systemd-devel/2015-July/033754.html
mkolman: thanks for the explanation. So just to recap, sending out an update with a fix/workaround in %pre would satisfy the blocker process requirements relating to this bug. Thanks for the i-s theory, that sounds plausible for sure. I suspect the most likely contingent of affected users would be those who installed with Workstation or without a desktop, then added a desktop which uses i-s after installation.
Updated patch with the %pre based old-runnning-IS stopping & disabling code has been posted for review.
initial-setup-0.3.37-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-f23124ced6
initial-setup-0.3.37-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update initial-setup' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-f23124ced6
initial-setup-0.3.37-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.