Bug 1369930

Summary: When creating cloud images, anaconda does not fully crash on post-install setup errors, so compose process proceeds
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: anaconda
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 26
CC: anaconda-maint-list, g.kaviyarasu, imcleod, jonathan, kevin, vanmeeuwen+fedora
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-29 12:36:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Adam Williamson 2016-08-24 18:47:02 UTC
It seems like, at present, when we build Cloud images, if there is an error during post-install setup, the thread in which it's running crashes, but the main anaconda process does not. The main anaconda process seems to complete 'successfully', so the image compose process continues. This could hide major problems - in fact, it already has.

Here's an example log:

https://kojipkgs.fedoraproject.org//work/tasks/5670/15365670/oz-x86_64.log

Note that the kickstart:

https://kojipkgs.fedoraproject.org//work/tasks/5670/15365670/koji-f25-build-15365670-base.ks

has "%post --erroronfail". In the oz log, we can see that there is an error in that section:

18:10:15,462 INFO program: error reading information on service rsyslog: No such file or directory
18:10:15,463 DEBUG program: Return code: 1
18:10:15,463 ERR anaconda: Error code 1 running the kickstart script at line 37
18:10:16,173 ERR anaconda: There was an error running the kickstart script at line 37.  This is a fatal error and installation will be aborted.  The details of this error are: <snip>

then it posts a bunch of info ending in a traceback. But it seems like control reverts to the main thread, and the main thread doesn't actually crash (this isn't *entirely* clear, but the fact that program.log logs a call to systemctl reboot suggests it carried on with the kickstart's instruction to reboot the system on completion, instead of just dying):

18:10:17,176 DEBUG anaconda: Gtk cannot be initialized
18:10:17,177 DEBUG anaconda: In the main thread, running exception handler
Waiting for factory-build-fff41a09-fc9e-4909-9706-f38178d3aad1 to finish installing, 6860/7200
18:10:27,207 DEBUG packaging: getting release version from tree at None (25)
18:10:27,207 DEBUG packaging: using default release version of 25
18:10:27,209 INFO program: Running... systemctl --no-wall reboot
18:10:27,357 DEBUG program: Return code: 0

Subsequent messages show imagefactory deciding that the install has completed successfully and going on to produce the final image file.
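
To illustrate the suspected mechanism, here is a minimal sketch in generic Python. It is not anaconda's actual code, and the names `PostInstallError`, `failing_step`, `run_unchecked` and `run_checked` are hypothetical. It only shows that an exception raised in a worker thread does not, by itself, change what the main thread reports, unless the main thread explicitly checks for it.

```python
import threading


class PostInstallError(Exception):
    """Hypothetical error type standing in for a failed post-install step."""


def failing_step():
    # Stand-in for a %post script or service setup step that fails.
    raise PostInstallError("error code 1 running the kickstart script")


def run_unchecked(step):
    """Run step in a worker thread without checking how it ended."""
    worker = threading.Thread(target=step)
    worker.start()
    worker.join()  # join() returns normally even if the worker raised
    return 0       # the caller reports success regardless


def run_checked(step):
    """Run step in a worker thread and report a nonzero status on failure."""
    errors = []

    def wrapper():
        try:
            step()
        except Exception as exc:  # capture so the main thread can see it
            errors.append(exc)

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()
    return 1 if errors else 0
```

With `failing_step`, `run_unchecked` prints a traceback from the worker thread and still returns 0, while `run_checked` returns 1. The second form is roughly the behaviour one would want the installer to expose to whatever is driving it.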

The same behaviour is observed if the kickstart `services` line tries to enable a non-existent service. This causes the anaconda thread handling post-install setup to crash, and all subsequent actions it would have taken (including the whole of %post) just do not happen, but the main thread seems to go ahead and reboot the system, and imagefactory goes ahead and produces an image.

This is why we were getting 'successful' composes but entirely broken images for Cloud lately.

I think it would be better if, somehow or other, the image compose failed in these cases.
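
As one possible stopgap on the compose side, the log could be scanned for the fatal-error message quoted above, and the compose failed if it is found. The following is only a sketch under assumptions: the line layout is inferred from the excerpts in this report, the function names `find_fatal_errors` and `compose_exit_status` are hypothetical, and nothing here is part of oz, imagefactory or koji.

```python
import re

# Assumed layout, inferred from the excerpts in this report:
# "HH:MM:SS,mmm LEVEL source: message"
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) (\w+) (\w+): (.*)$")

# Text taken from the anaconda error message quoted in this report.
FATAL_MARKER = "This is a fatal error and installation will be aborted"


def find_fatal_errors(lines):
    """Return (timestamp, message) for each ERR line carrying the fatal marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip interleaved lines such as "Waiting for factory-build-..."
        timestamp, level, _source, message = match.groups()
        if level == "ERR" and FATAL_MARKER in message:
            hits.append((timestamp, message))
    return hits


def compose_exit_status(lines):
    """Return 1 if the log contains a fatal installer error, otherwise 0."""
    return 1 if find_fatal_errors(lines) else 0
```

A caller could pass the lines of the oz log to `compose_exit_status` and abort the compose on a nonzero result. This does not fix the underlying problem in anaconda; it only keeps a broken image from being reported as a successful compose.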

Comment 1 Fedora End Of Life 2017-02-28 10:08:49 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 2 Fedora End Of Life 2018-05-03 08:17:10 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 3 Fedora End Of Life 2018-05-29 12:36:05 UTC
Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26
is no longer maintained, which means that it will not receive any
further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.