Bug 876716

Summary: anaconda crashes after setting root password
Product: [Fedora] Fedora
Reporter: Kamil Páral <kparal>
Component: anaconda
Assignee: Martin Sivák <msivak>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 18
CC: akostadi, awilliam, g.kaviyarasu, gustavold, jonathan, jreiser, jreznik, jstodola, mbanas, mfabian, nalin, pmatilai, robatino, sbueno, tmraz, vanmeeuwen+fedora
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=886995
Whiteboard: AcceptedBlocker
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 893093
Environment:
Last Closed: 2013-01-02 08:39:30 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 893093, 752661, 901510
Attachments (description: flags):
main pane: none
anaconda.log: none
packaging.log: none
program.log: none
storage.log: none
syslog: none
X.log: none
dmesg: none
Logs showing the stat of root directory during password check: none
Complete strace up to the crash point: none

Description Kamil Páral 2012-11-14 14:15:57 EST
Created attachment 645066
main pane

Description of problem:
I started a standard installation. When I set the root password, the whole screen went black. No exception screen (why not?), just a black screen.

Anaconda main screen says:

> /usr/share/cracklib/pw_dict.pwd: No such file or directory
> PWOpen: No such file or directory
> Pane is dead

I have done many installs and have seen this only twice. It is probably a race condition.

Version-Release number of selected component (if applicable):
F18 Beta TC8 DVD x86_64
anaconda 18.28

How reproducible:
rarely

Steps to Reproduce:
1. start the installation
2. enter root password
3. black screen
Comment 1 Kamil Páral 2012-11-14 14:16:44 EST
Created attachment 645067
anaconda.log
Comment 2 Kamil Páral 2012-11-14 14:16:50 EST
Created attachment 645068
packaging.log
Comment 3 Kamil Páral 2012-11-14 14:16:56 EST
Created attachment 645069
program.log
Comment 4 Kamil Páral 2012-11-14 14:16:59 EST
Created attachment 645070
storage.log
Comment 5 Kamil Páral 2012-11-14 14:17:04 EST
Created attachment 645071
syslog
Comment 6 Kamil Páral 2012-11-14 14:17:09 EST
Created attachment 645072
X.log
Comment 7 Kamil Páral 2012-11-14 14:18:28 EST
Nasty bug I would say, proposing as Final blocker.
Comment 8 Kamil Páral 2012-11-14 14:21:10 EST
I have seen this once with English language and keymap, and once with Czech language and keymap, if that helps.
Comment 9 Martin Sivák 2012-11-15 05:56:20 EST
Anaconda uses the pwquality module to check the password. The module is implemented as a binary Python extension in /usr/lib64/python2.7/site-packages/pwquality.so

If a binary extension segfaults, it takes down the whole Python process. This looks just like that case, because an ordinary Python exception starts the ABRT handler and allows the user to file a bug.

Unfortunately I can't prove or debug it without reproducing it or getting more logs: dmesg, a core dump, or systemd logs (if systemd can report segfaults).
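A minimal sketch of the distinction above, assuming a modern Python 3 on Linux (anaconda at the time ran Python 2.7; the helper names `describe_exit` and `run_snippet` are hypothetical, not part of anaconda). It runs code in a child interpreter and classifies how it ended: a segfault in native code shows up as death by a signal, a native `exit()` call as a plain exit status, and in neither case does the caller get a Python exception it could hand to ABRT. `-X faulthandler` asks CPython to print the Python traceback on fatal signals, which gives some evidence even without a core dump.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Classify how a child process ended, from subprocess's returncode."""
    if returncode < 0:
        # Negative values mean "terminated by signal N", e.g. a segfault in a C extension.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "clean exit"
    # Positive values are exit statuses: native code calling exit(), or an uncaught exception.
    return f"exit status {returncode}"


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter so that a crash cannot take down the caller."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,  # faulthandler's traceback ends up in proc.stderr
    )
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Native code reading address 0: the interpreter dies from a signal.
    print(run_snippet("import ctypes; ctypes.string_at(0)"))
    # Native code calling libc exit() directly (simulated via ctypes).
    print(run_snippet("import ctypes; ctypes.CDLL(None).exit(3)"))
    # An ordinary Python exception: catchable in-process, exit status 1 if uncaught.
    print(run_snippet("raise ValueError('weak password')"))
```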
Comment 10 Adam Williamson 2012-11-28 13:07:13 EST
Discussed at 2012-11-28 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-28/f18final-blocker-review-1.2012-11-28-16.59.log.txt . We agreed that this is a worrying bug, but no-one present at the meeting had seen it in numerous installs, and we're worried about the fact that it may be hard to fix without a reliable reproducer. We agreed to delay the decision on this one while we ask the community if anyone else has seen this one, and try to get more data.
Comment 11 Adam Williamson 2012-11-29 12:58:54 EST
Discussed again at 2012-11-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-29/f18final-blocker-review-1.1.2012-11-29-17.01.log.txt . The list inquiry brought the info that several people have also seen this bug, but no-one has a reliable reproducer. We agreed to delay the decision again until we get more concrete information on the bug, and ideally a reliable reproducer (which would make it a blocker).
Comment 12 Gustavo Luiz Duarte 2012-11-29 14:04:47 EST
I was able to reproduce it reliably on ppc64 using RC2.0 + updates.img.

RC2.0: http://ppc.koji.fedoraproject.org/stage/f18-Beta-RC2.0/
updates.img: http://dwa.fedorapeople.org/880277-updates.img

I noticed that to reproduce this bug I have to wait until anaconda starts to install the packages before trying to set the root password.
Also, if I wait until installation is finished I can set the root password with no problem before clicking 'finish'.

I'm attaching dmesg.
Where are the systemd logs? The files in /run/systemd/journal are zeroed.
How can I generate a core dump for this?
Comment 13 Gustavo Luiz Duarte 2012-11-29 14:07:35 EST
Created attachment 654510
dmesg
Comment 14 Gustavo Luiz Duarte 2012-11-29 17:03:15 EST
The issue seems to be caused by cracklib calling exit() when /usr/share/cracklib/pw_dict.pwd is not found (function FascistCheck() in ./lib/fascist.c).

The race exists because that file is present in the installer file system. As long as Anaconda has not chrooted into /mnt/sysimage, you can set the root password (cracklib won't crash). As soon as Anaconda does the chroot, you can't set the root password until cracklib gets installed in the sysimage.

I didn't validate any of this, but it makes sense based on my reading of the code: Anaconda calls python-pwquality.check() -> pwquality_check() -> FascistCheck() -> exit()

I think there are two possible solutions:
1) If the check doesn't need to run inside the chroot, it can be done in a separate process forked before the chroot happens.
2) Disable the root password dialog until cracklib gets installed in the target file system.
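Option 1 could look roughly like the sketch below, assuming Linux and Python 3's `multiprocessing` with the "fork" start method. `IsolatedChecker` and `check_fn` are hypothetical names; in anaconda the check function would wrap the python-pwquality call. Because chroot() changes the root directory only for the calling process, a helper forked before the chroot keeps resolving /usr/share/cracklib/pw_dict.pwd against the installer's root, and if the native library calls exit(), only the helper dies.

```python
import multiprocessing as mp
from typing import Callable, Optional


def _serve(conn, check_fn: Callable[[str], str]) -> None:
    """Helper-process loop: answer check requests until told to stop."""
    while True:
        try:
            password = conn.recv()
        except EOFError:  # the parent closed its end
            break
        if password is None:  # explicit shutdown request
            break
        conn.send(check_fn(password))


class IsolatedChecker:
    """Runs password checks in a helper process forked before any chroot.

    The helper keeps the root directory it had at fork time, and if native
    code inside it calls exit() or crashes, only the helper process dies.
    """

    def __init__(self, check_fn: Callable[[str], str]):
        ctx = mp.get_context("fork")  # fork: the child inherits the current root and cwd
        self._conn, child_end = ctx.Pipe()
        self._proc = ctx.Process(target=_serve, args=(child_end, check_fn), daemon=True)
        self._proc.start()
        child_end.close()  # drop the parent's copy so recv() sees EOF if the helper dies

    def check(self, password: str) -> Optional[str]:
        """Return the helper's verdict, or None if the helper has died."""
        try:
            self._conn.send(password)
            return self._conn.recv()
        except (EOFError, OSError):  # EOF or broken pipe: the helper is gone
            return None

    def close(self) -> None:
        try:
            self._conn.send(None)
        except OSError:
            pass  # helper already gone
        self._proc.join()
        self._conn.close()

    @property
    def exitcode(self) -> Optional[int]:
        return self._proc.exitcode
```

The helper would have to be created at start-up, before package installation begins. A `None` result tells the UI that the check itself failed, which it can report differently from a weak password.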

Additionally, cracklib definitely should be patched to not exit() from its API. There is no reason to do that.
Comment 15 John Reiser 2012-11-30 18:55:49 EST
(In reply to comment #14)
> Additionally, cracklib definitely should be patched to not exit() from its
> API. There is no reason to do that.

Defensive programming calls for anaconda to intercept such a call, thus preventing a subsystem from bringing down the whole installer. In C this would be: override exit() with the installer's own catching function, and use dlsym(RTLD_NEXT, "exit") to reach the original definition.
Comment 16 Kamil Páral 2012-12-03 08:23:24 EST
Petr Schindler and I reproduced the issue.

Reproducer:
1. start installation
2. set "fedorafedorafedora" as root password
3. repeat: open the rootpw dialog, append a character to the password, confirm twice

It should crash in a few attempts (once package installation starts).
Comment 17 Martin Sivák 2012-12-03 08:56:04 EST
Created attachment 656644
Logs showing the stat of root directory during password check

It really looks like something is doing a chroot directly in the anaconda process (or possibly in a forked copy of it).

I updated the source to get verbose logging and got this for a successful attempt:

08:45:10,541 DEBUG anaconda: ValidatePassword PWD: '/root' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64768L, st_nlink=13, st_uid=0, st_gid=0, st_size=1024L, st_atime=1354542117, st_mtime=1354542148, st_ctime=1354542148)' Dict: 'True'

And this for the crashing attempt:

08:45:11,453 DEBUG anaconda: ValidatePassword PWD: '(unreachable)/' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64770L, st_nlink=12, st_uid=0, st_gid=0, st_size=4096L, st_atime=1354542227, st_mtime=1354542308, st_ctime=1354542308)' Dict: 'False'

PWD: should be named cwd; it shows anaconda's current working directory.
Dict: shows whether libpwcheck's dictionary file is present.
Stat: shows the stat of the root directory /.
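In the two log lines above, / has st_ino=2 both times (the usual root inode on ext filesystems), but st_dev changes from 64768 to 64770, so / resolved to a different filesystem during the failing check. A minimal sketch of the same diagnostic, with hypothetical helper names `root_fingerprint` and `check_diagnostics`: record (st_dev, st_ino) of / at start-up and compare it on every check.

```python
import os

CRACKLIB_DICT = "/usr/share/cracklib/pw_dict.pwd"  # path from the error in the description


def root_fingerprint(path: str = "/") -> tuple:
    """(st_dev, st_ino) of the root directory; a change means / now points elsewhere."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def check_diagnostics(expected_root: tuple) -> dict:
    """Gather the same facts as the debug line above, plus a root-changed verdict."""
    try:
        cwd = os.getcwd()
    except OSError as exc:
        # Depending on the glibc version, a cwd outside the current root may be
        # reported as "(unreachable)/..." or as an ENOENT error.
        cwd = f"<getcwd failed: {exc.strerror}>"
    current_root = root_fingerprint()
    return {
        "cwd": cwd,
        "root": current_root,
        "root_changed": current_root != expected_root,
        "dict": os.path.exists(CRACKLIB_DICT),
    }


if __name__ == "__main__":
    startup_root = root_fingerprint()  # capture once, before package installation starts
    print(check_diagnostics(startup_root))
```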
Comment 18 Martin Sivák 2012-12-03 09:19:10 EST
Created attachment 656677
Complete strace up to the crash point

Here is the complete strace of the crashed session. I could not find any chroot call in it, though.
Comment 19 Martin Sivák 2012-12-03 11:05:43 EST
/proc says the root was indeed changed to /mnt/sysimage, but there is no call to chroot in the strace.

The chroot happens after "Starting package installation process".
Comment 20 Martin Sivák 2012-12-04 08:41:32 EST
It is getting weirder and weirder. I added some more debug output to the anaconda packaging log and reproduced this. What I found really confused me.

A debugger started from the Gtk button callback (just before the crashing pwcheck call) sees the /mnt/sysimage contents as / (which means it is chrooted). os.getpid() returns 693.

A debug print in the Yum callback (packaging/yumpayload.py) reports pid 693 (the same!), but sees the ramdisk contents at /.

How can one process have two threads which are chrooted and non-chrooted at the same time?
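On Linux, the root directory is a per-process attribute: threads created by pthreads share it (they are cloned with CLONE_FS), so a chroot() made by any thread applies to every thread in the process. /proc exposes each thread's view as /proc/<pid>/task/<tid>/root. The sketch below (hypothetical helper name `thread_roots`, Python 3) lists those links; within one process they should all show the same path at any given moment.

```python
import os
import threading


def thread_roots(pid: str = "self") -> dict:
    """Map each thread id of a process to the target of its /proc root link."""
    task_dir = f"/proc/{pid}/task"
    roots = {}
    for tid in os.listdir(task_dir):
        try:
            roots[int(tid)] = os.readlink(f"{task_dir}/{tid}/root")
        except FileNotFoundError:
            pass  # the thread exited between listdir() and readlink()
    return roots


if __name__ == "__main__":
    stop = threading.Event()
    # A second thread standing in for the packaging (Yum/RPM) thread.
    worker = threading.Thread(target=stop.wait)
    worker.start()
    try:
        # Every entry shows the same root: it belongs to the process, not the thread.
        print(thread_roots())
    finally:
        stop.set()
        worker.join()
```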
Comment 21 Adam Williamson 2012-12-05 13:20:17 EST
Discussed at 2012-12-05 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-05/f18final-blocker-review-2.2012-12-05-17.01.log.txt .  Accepted as a blocker, now we have a reliable reproducer which indicates something is really wrong here, per criterion "The installer must be able to complete an installation using all supported interfaces" - when you hit this bug, the installation fails.
Comment 22 Martin Sivák 2012-12-10 08:32:14 EST
So, the chroot really is happening in RPM and has to stay there.

Because this is a much bigger issue with regard to how we deal with threads in Anaconda, I am trying to sort everything out on our devel list with the packaging team. Once I have any idea how to fix the situation, I will come back here.
Comment 23 Tomas Mraz 2012-12-14 04:30:03 EST
We can and will remove the exit() call, but the problem will remain, just less pronounced: in that case the root password will be rejected as weak because of the missing dictionary (I am not sure whether anaconda enforces the rejection).
Comment 24 Adam Williamson 2012-12-14 16:17:29 EST
anaconda does handle rejections and lets you re-enter, so that would be less bad, though still pretty confusing.
Comment 25 Jaroslav Reznik 2012-12-17 06:24:55 EST
Btw, what's the reason for omitting the cracklib dictionary?
Comment 26 Martin Sivák 2012-12-17 10:55:59 EST
The dictionary is there; it's just that anaconda can't see it while in the chroot (caused by RPM).
Comment 27 Jaroslav Reznik 2012-12-18 08:07:38 EST
Yep, I forgot the main chroot issue. Sorry. Any progress on this? The cracklib exit() fix seems to be the preferred option now, right?
Comment 28 Jaroslav Reznik 2012-12-18 08:14:52 EST
This build should be picked up for F18 to work around the crash:
https://admin.fedoraproject.org/updates/FEDORA-2012-20331/cracklib-2.8.22-1.fc18

"- update to 2.8.22 (#887461), which now returns an error instead of exiting when there's a failure opening the dictionary in FascistCheck()"
Comment 29 Adam Williamson 2012-12-18 15:57:00 EST
thanks. anaconda team, with that fix in cracklib, can we clean things up any further on anaconda side?
Comment 30 Jaroslav Reznik 2012-12-19 07:49:10 EST
(In reply to comment #29)
> thanks. anaconda team, with that fix in cracklib, can we clean things up any
> further on anaconda side?

I talked to Martin; it seems no further change on the Anaconda side is needed.
Comment 31 Aleksandar Kostadinov 2012-12-19 16:04:03 EST
This might be stupid, but why don't you just move the root password option to a pre- or post-install step? Why should these happen in parallel?
Comment 32 Adam Williamson 2012-12-19 18:17:52 EST
it's late to move spokes around, it's not a simple/safe operation. (it was actually a pre-install step in early f18 builds, then moved to during-install as it was thought a neat way to save time).
Comment 33 Adam Williamson 2012-12-19 18:19:27 EST
jreznik: I was going off of https://bugzilla.redhat.com/show_bug.cgi?id=876716#c23 , which says that you'll get a bogus 'bad password' error if the bug happens. Which is nowhere near as bad as a black screen crash, but still pretty confusing. But I guess we can test and see if that's really the case. Marking as ON_QA. Can someone edit the update and mark it as fixing this bug? It helps with our accounting.
Comment 34 Adam Williamson 2012-12-20 01:34:54 EST
the cracklib update has gone stable now. I tested a couple of times and couldn't trigger the crash, it does seem to give a dictionary error instead. can others confirm before we close this?
Comment 35 Adam Williamson 2012-12-20 01:35:09 EST
oh, test with https://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/
Comment 36 Kamil Páral 2013-01-02 08:39:30 EST
No longer crashes. Closing.