Bug 1369794
Summary: | anaconda can no longer enable non-systemd services (so current F25 and Rawhide Cloud images don't bring up networking) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Lukas Brabec <lbrabec> |
Component: | anaconda | Assignee: | Anaconda Maintenance Team <anaconda-maint-list> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 25 | CC: | anaconda-maint-list, awilliam, dennis, g.kaviyarasu, jonathan, kevin, kparal, lbrabec, lnykryn, mark, pbrobinson, robatino, sbueno, vanmeeuwen+fedora |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-12-12 10:24:30 UTC | Type: | Bug |
Description
Lukas Brabec
2016-08-24 12:23:44 UTC
Proposed as a Blocker for 25-alpha by Fedora user lbrabec using the blocker tracking app because:

This bug can be a violation of alpha criterion: Supported cloud environments: Release-blocking cloud images must boot in the Fedora OpenStack Cloud and in Amazon EC2.

While I tested this only locally with testcloud, I think this bug should be investigated further.

It failed in autocloud too: https://apps.fedoraproject.org/autocloud/jobs/451/output

per https://apps.fedoraproject.org/autocloud/jobs/?family=b&arch=&image_type=qcow2&status=f , it's been failing in F25 and Rawhide approximately forever. There are only successful results for F24.

Looking at the build logs this might be related to trying to enable 'network' (which is not a systemd unit):

    ...
    00:50:43,595 INFO program: Running... systemctl enable network --root /mnt/sysimage
    00:50:43,619 INFO program: network.service is not a native service, redirecting to systemd-sysv-install.
    00:50:43,619 INFO program: Executing: /usr/lib/systemd/systemd-sysv-install --root=/mnt/sysimage enable network
    00:50:43,620 INFO program: Failed to execute /usr/lib/systemd/systemd-sysv-install: No such file or directory
    00:50:43,620 DEBUG program: Return code: 1
    00:50:43,621 DEBUG anaconda: running handleException
    00:50:43,622 CRIT anaconda: Traceback (most recent call last):#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/threads.py", line 251, in run#012 threading.Thread.run(self, *args, **kwargs)#012#012 File "/usr/lib64/python3.5/threading.py", line 862, in run#012 self._target(*self._args, **self._kwargs)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/install.py", line 77, in doConfiguration#012 ksdata.services.execute(storage, ksdata, instClass)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/kickstart.py", line 1664, in execute#012 iutil.enable_service(svc)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/iutil.py", line 787, in enable_service#012 raise ValueError("Error enabling service %s: %s" % (service, ret))#012#012ValueError: Error enabling service network: 1
    00:50:44,117 DEBUG anaconda: Gtk cannot be initialized
    00:50:44,117 DEBUG anaconda: In a non-main thread, sending a message with exception data
    00:50:44,118 INFO anaconda: Thread Done: AnaConfigurationThread (140307388299008)
    00:50:44,783 DEBUG anaconda: running handleException
    00:50:44,784 CRIT anaconda: Traceback (most recent call last):#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/threads.py", line 251, in run#012 threading.Thread.run(self, *args, **kwargs)#012#012 File "/usr/lib64/python3.5/threading.py", line 862, in run#012 self._target(*self._args, **self._kwargs)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/install.py", line 77, in doConfiguration#012 ksdata.services.execute(storage, ksdata, instClass)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/kickstart.py", line 1664, in execute#012 iutil.enable_service(svc)#012#012 File "/usr/lib64/python3.5/site-packages/pyanaconda/iutil.py", line 787, in enable_service#012 raise ValueError("Error enabling service %s: %s" % (service, ret))#012#012ValueError: Error enabling service network: 1
    00:50:44,786 DEBUG anaconda: Gtk cannot be initialized
    00:50:44,786 DEBUG anaconda: In the main thread, running exception handler
    Waiting for factory-build-288d1c60-0e4a-4bad-a58c-02cf7e73d61d to finish installing, 6910/7200
    ...

So, perhaps some anaconda change related to non systemd unit file enabling?

well:

    00:50:43,620 INFO program: Failed to execute /usr/lib/systemd/systemd-sysv-install: No such file or directory

seems to be the problem.

yeah, it's obvious if you compare to an f24 log.

f24:

    05:20:35,441 INFO program: Running... systemctl enable network
    05:20:35,453 INFO program: network.service is not a native service, redirecting to systemd-sysv-install
    05:20:35,453 INFO program: Executing /usr/lib/systemd/systemd-sysv-install enable network
    05:20:35,454 DEBUG program: Return code: 0

f25:

    00:50:43,595 INFO program: Running... systemctl enable network --root /mnt/sysimage
    00:50:43,619 INFO program: network.service is not a native service, redirecting to systemd-sysv-install.
    00:50:43,619 INFO program: Executing: /usr/lib/systemd/systemd-sysv-install --root=/mnt/sysimage enable network
    00:50:43,620 INFO program: Failed to execute /usr/lib/systemd/systemd-sysv-install: No such file or directory
    00:50:43,620 DEBUG program: Return code: 1

Aha. I think I see it. Note the difference in the commands:

    05:20:35,441 INFO program: Running... systemctl enable network
    00:50:43,595 INFO program: Running... systemctl enable network --root /mnt/sysimage

in F24, this was run without --root (thus, we can presume, in a chroot to the installed system, or else it wouldn't have worked). In F25 it's run with --root . Thus in F24 we'll wind up using systemd-sysv-install from the installed system chroot too, but in F25 we'll be using the one from the installer environment. Only it's not there in the installer environment, because lorax runtime-cleanup.tmpl has this:

    ## no services to turn on/off (keep the /etc/init.d link though)
    removefrom chkconfig --allbut /etc/init.d

and systemd-sysv-install is in chkconfig.

now we can tweak that bit of lorax so it doesn't strip systemd-sysv-install, but then it'll be interesting to see if this redirection from systemctl to systemd-sysv-install really works properly with --root...

https://github.com/rhinstaller/anaconda/commit/412ca74154bef8ac232e5b3be820a182d77c30f6 is the commit that changed anaconda's behaviour, for the record.
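The F24/F25 comparison above was done by eye. As a rough illustration only, the same check can be sketched in Python: pair each "Running..." line with the "Return code" line that follows it and report the commands that exited non-zero. The function name `failed_commands` and the two sample lists are hypothetical helpers made up for this sketch; they are not part of anaconda, and the regular expressions only assume the log line shapes quoted in this bug.

```python
import re

# Line shapes assumed from the program.log excerpts quoted in this bug.
RUNNING = re.compile(r"INFO program: Running\.\.\. (?P<cmd>.+)$")
RETURN = re.compile(r"DEBUG program: Return code: (?P<rc>-?\d+)$")


def failed_commands(lines):
    """Return (command, return_code) pairs for commands that exited non-zero."""
    failures = []
    current = None  # most recent "Running..." command without a return code yet
    for line in lines:
        line = line.rstrip()
        match = RUNNING.search(line)
        if match:
            current = match.group("cmd")
            continue
        match = RETURN.search(line)
        if match and current is not None:
            rc = int(match.group("rc"))
            if rc != 0:
                failures.append((current, rc))
            current = None
    return failures


# Sample lines copied from the F24 and F25 logs quoted above.
F24_LOG = [
    "05:20:35,441 INFO program: Running... systemctl enable network",
    "05:20:35,453 INFO program: Executing /usr/lib/systemd/systemd-sysv-install enable network",
    "05:20:35,454 DEBUG program: Return code: 0",
]
F25_LOG = [
    "00:50:43,595 INFO program: Running... systemctl enable network --root /mnt/sysimage",
    "00:50:43,620 INFO program: Failed to execute /usr/lib/systemd/systemd-sysv-install: No such file or directory",
    "00:50:43,620 DEBUG program: Return code: 1",
]
```

Calling `failed_commands(F25_LOG)` reports the `systemctl enable network --root /mnt/sysimage` invocation, while the F24 sample yields no failures, which matches the difference spotted manually.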
nirik points out that /usr/lib/systemd/systemd-sysv-install is just a symlink to /sbin/chkconfig , so we'd need to keep both of those in the installer environment. But here's a more pressing problem:

    [adamw@adam etc]$ systemctl --root=/tmp/fakesys enable network
    network.service is not a native service, redirecting to systemd-sysv-install.
    Executing: /usr/lib/systemd/systemd-sysv-install --root=/tmp/fakesys enable network
    --root=/tmp/fakesys: unknown option

i.e. I don't think we can rely on `systemctl --root` to work with non-systemd services.

setting back to anaconda, since just fixing the lorax stripping wouldn't be enough here.

I think for short term we may simply have to revert the anaconda commit.

https://www.happyassassin.net/updates/1369794.0.img is an updates.img reverting the anaconda commit, for testing. I tested with https://www.happyassassin.net/ks/testsvc.ks , which just does:

    services --enabled=network

and confirmed that indeed it crashes with a stock F25 installer image, works with the patch reverted via the updates image. It's just barely possible that reverting this would have non-obvious other consequences - for one, it probably renders https://github.com/rhinstaller/anaconda/commit/b35fe094bcd9f792bb8eb9e0ed3679c175f632fa moot - but I think it's probably our best short-term option to fix Cloud for Alpha. The 'proper' fix would, I guess, be to get chkconfig to support --root (so I cc'ed lnykryn), and of course then not strip it out of the installer.

https://github.com/rhinstaller/anaconda/pull/749 should deal with this on the anaconda side, but sbueno isn't sure about doing another anaconda build for Alpha, so I will also see if we can work around this in the kickstart (by taking network out of the `services` line and just manually enabling it in %post).

whoops, forgot to mention, https://www.happyassassin.net/updates/1369794.1.img includes the patch from https://github.com/rhinstaller/anaconda/pull/749 . I tested it and it seems to work fine.
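For illustration only, here is one way an installer could choose between the two invocations discussed above. This is a sketch under stated assumptions, not what anaconda pull request 749 does: it assumes native units can be recognised by a `<name>.service` file under `usr/lib/systemd/system` or `etc/systemd/system` in the target root, and that anything else should be enabled the F24 way, by running plain `systemctl enable` inside a chroot so that the installed system's own systemd-sysv-install is used. The names `has_native_unit` and `enable_command` are hypothetical.

```python
import os

# Assumption for this sketch: only these two unit directories are checked.
UNIT_DIRS = ("usr/lib/systemd/system", "etc/systemd/system")


def has_native_unit(sysroot, service):
    """Return True if a systemd unit file for the service exists under sysroot."""
    name = service if service.endswith(".service") else service + ".service"
    # lexists() so that a unit symlink pointing at an absolute path still counts
    return any(os.path.lexists(os.path.join(sysroot, d, name)) for d in UNIT_DIRS)


def enable_command(sysroot, service):
    """Return (argv, chroot_dir) for enabling a service in the target system.

    chroot_dir is None when the command can run from the installer
    environment, otherwise it is the directory to chroot into first.
    """
    if has_native_unit(sysroot, service):
        # Native unit: the F25-style call seen in the log above.
        return (["systemctl", "enable", service, "--root", sysroot], None)
    # SysV service: fall back to the F24-style call inside the chroot, because
    # systemd-sysv-install rejected --root in the test shown above.
    return (["systemctl", "enable", service], sysroot)
```

With a target root that ships `sshd.service` but only an init script for `network`, `enable_command(root, "sshd")` keeps the `--root` form while `enable_command(root, "network")` asks the caller to chroot first.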
As an alternative to patching anaconda, https://pagure.io/fedora-kickstarts/pull-request/52 should work around this in fedora-kickstarts . I did not test it directly as I don't have a full compose chain set up here, but I did test the concept, with these three kickstarts:

https://www.happyassassin.net/ks/testsvc.ks (has `services --enabled=network`)
https://www.happyassassin.net/ks/testsvc2.ks (has `chkconfig network on` in %post instead)
https://www.happyassassin.net/ks/testsvc3.ks (has neither `services` line nor `chkconfig` in %post)

The first causes a current Fedora 25 installer to crash (unless you use one of the updates images). The second installs clean and has the network service enabled. The third installs clean and does not have the network service enabled. That's all as I'd expect.

oh, obviously I'm +1 blocker on this, it violates the cited criterion.

+1 blocker.

That's +3, setting accepted.

We have applied the fedora-kickstarts workaround for this, but I don't think we should mark the bug as fixed; rather, if that works, we should just drop the blocker status. The anaconda bug is still valid, we are just working around it.

So this might turn out to have been a bit of a hijack...

With the kickstart workaround and a few other things we ran into along the way, we can do a Cloud base image compose where there's no crash of the post-install setup thread, so all the service enablement happens and %post happens:

https://koji.fedoraproject.org/koji/taskinfo?taskID=15365954

unfortunately, it still doesn't freaking *boot*. So it seems like the 'post-install setup thread crashes' bug wasn't actually causing the 'image doesn't boot' bug. (I was kinda figuring that the failure to do the %post workaround for #1147998 was what was causing the image not to be bootable, but apparently not).

So we may need to create another bug for the boot failure. This is still definitely a real anaconda bug, though.

I have a question: why is the onus on us to fix this?
Why doesn't the network service migrate to using systemd after all these years? I'm kind of starting to see this as unearthing the historical relics that need to be updated. systemd isn't new anymore.

well, you'd be best asking lukas I guess, but I believe the network service is basically kinda resistant to systemd conversion. the main reason we still have it is for backward compatibility with all the use cases that rely on its behaviour, and I believe that making it into a systemd service would kind of unavoidably change its behaviour, at which point the value of having it is substantially diminished. but I'm not an expert on that, IMBW.

the onus isn't *necessarily* on anaconda to fix it; probably the 'best' fix would be to make chkconfig handle --root. that's why I filed a bug on that, and marked it as See Also: - it's https://bugzilla.redhat.com/show_bug.cgi?id=1369916 . but it does seem worth tracking this separately as it is possible to fix it on the anaconda side without chkconfig being fixed, if that turns out to be a problem for some reason. However, if that bug does get fixed, we could close this one right away, as anaconda would then work fine without changes.

I understand what you're saying, but in the real world I think we're _probably_ not going to get away with 'you can't work with sysv services any more, sorry'...at least not yet.

oh, I did actually file a new bug for the syslinux boot fail even after this problem is worked around: https://bugzilla.redhat.com/show_bug.cgi?id=1369934 .

(In reply to Adam Williamson from comment #20)
> well, you'd be best asking lukas I guess, but I believe the network service
> is basically kinda resistant to systemd conversion. the main reason we still
> have it is for backward compatibility with all the use cases that rely on
> its behaviour, and I believe that making it into a systemd service would
> kind of unavoidably change its behaviour, at which point the value of having
> it is substantially diminished. but I'm not an expert on that, IMBW.

Ok, that's a fair point -- so then I pose the question to Lukas, about why the network service hasn't been migrated to use systemd. I just haven't kept up with a lot of the mail flying around in fedora-devel, so if that discussion ever took place there, I missed it. I'm not sure which Lukas to needinfo here, since I can think of like three off the top of my head. :-/

> I understand what you're saying, but in the real world I think we're
> _probably_ not going to get away with 'you can't work with sysv services any
> more, sorry'...at least not yet.

Sure, "not yet" is fine and understandable, but I think we should consider deprecating old sysvinit style services then, so there is enough time for people to migrate them to our not-so-new-anymore default.

And I'm still voicing my dissent on this being a blocker. I see the criteria, but I need to bring up a few glaring points:

(a) The blocker criteria are arbitrary. We set those guidelines, therefore we can change them. We've also just ignored them before.
(b) I'm not sure how large the target audience is for Fedora cloud, especially in an alpha release. If that's a small group of people, this isn't worth it.
(c) The cloud images have been completely broken since the end of F24, and nobody noticed until today, the day before go/no-go. Not even the Cloud Working Group noticed.

we never 'just ignore' the criteria. we sometimes change them on the fly, and we sometimes come up with extremely flexible interpretations of them, and we sometimes say "we can't block on this otherwise-blocker-worthy issue because we have no ability to fix it in any remotely reasonable time frame", but we never just *ignore* them. ;)

as the criteria stand this is quite clearly an automatic blocker. Cloud base is a release blocking image, and it completely fails to boot. There's really just no ambiguity there.
Whoever wants to be in charge of these things (it should be Cloud WG, but I hear rumblings that Cloud WG is obsolete or something) could declare that Cloud base is no longer a blocking image, at which point this magically ceases to be a blocker. Or FESCo or the Board or someone could say 'Cloud WG is clearly not doing a great job and thus none of their images are blocking any more'. I don't mind at all, personally, if whoever's responsible wants to do that, though it seems a bit like sharp practice. But so long as it's a blocking image, this bug is clearly a blocker.

We already worked around it in kickstarts, anyway, and have a new compose running which should work.

oh, the Lukas in question is lnykryn. He's already in CC.

There are a couple of historical reasons why we don't want to have a unit file for that. I am also maintaining chkconfig and in the past I tried to implement the --root, but it was "too ugly and nobody complained" so I did not finish that. But I will move it to my todo again.

There was also some move to change the cloud image to use systemd-networkd which would avoid this issue for that case at least.

the workaround via kickstarts does appear to have worked in the Alpha-1.2 compose, so this issue is no longer release blocking.

Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.