Created attachment 645066
Description of problem:
I started a standard installation. When I set the root password, the whole screen went black. No exception screen (why not?), just a black screen.
The Anaconda main screen says:
> /usr/share/cracklib/pw_dict.pwd: No such file or directory
> PWOpen: No such file or directory
> Pane is dead
I have done many installs and have seen this only twice. It is probably a race condition.
Version-Release number of selected component (if applicable):
F18 Beta TC8 DVD x86_64
Steps to Reproduce:
1. start the installation
2. enter root password
3. black screen
Created attachment 645067
Created attachment 645068
Created attachment 645069
Created attachment 645070
Created attachment 645071
Created attachment 645072
A nasty bug, I would say; proposing as a Final blocker.
I have seen this once with English language and keymap, and once with Czech language and keymap, if that helps.
Anaconda uses the pwquality module to check the password. The module is implemented as a binary Python extension in /usr/lib64/python2.7/site-packages/pwquality.so
If a binary extension segfaults, it crashes the whole Python process. This looks just like that, because an ordinary Python exception would start the ABRT handler and let the user file a bug.
Unfortunately I can't prove or debug this without reproducing it or getting more logs: dmesg, a core dump, or systemd logs (if systemd reports segfaults).
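A hedged diagnostic sketch for the "no exception screen" part: a fatal signal inside native code kills the interpreter before any Python-level exception handling runs, so nothing reaches ABRT. Assuming a Python 3 interpreter (the F18 installer ran Python 2.7, which has no built-in faulthandler module), enabling `faulthandler` at least dumps the Python stack of every thread to stderr at the moment of the crash. The child below sends SIGSEGV to itself as a stand-in for a segfault inside pwquality.so; it is a simulation, not the real crash.

```python
import signal
import subprocess
import sys

# Child program: enable faulthandler, then simulate a segfaulting C extension
# by sending SIGSEGV to itself (stand-in for a crash inside pwquality.so).
CHILD = """
import faulthandler, os, signal
faulthandler.enable(all_threads=True)
os.kill(os.getpid(), signal.SIGSEGV)
"""

def run_crashing_child():
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr

if __name__ == "__main__":
    code, err = run_crashing_child()
    # A negative return code means the child was killed by that signal.
    print("killed by signal:", signal.Signals(-code).name)
    print(err)
```

The same effect is available without code changes via `python -X faulthandler` or the `PYTHONFAULTHANDLER` environment variable.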
Discussed at 2012-11-28 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-28/f18final-blocker-review-1.2012-11-28-16.59.log.txt . We agreed that this is a worrying bug, but no-one present at the meeting had seen it in numerous installs, and we're worried about the fact that it may be hard to fix without a reliable reproducer. We agreed to delay the decision on this one while we ask the community if anyone else has seen this one, and try to get more data.
Discussed again at 2012-11-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-29/f18final-blocker-review-1.1.2012-11-29-17.01.log.txt . The list inquiry brought the info that several people have also seen this bug, but no-one has a reliable reproducer. We agreed to delay the decision again until we get more concrete information on the bug, and ideally a reliable reproducer (which would make it a blocker).
I was able to reproduce it reliably on ppc64 using RC2.0 + updates.img.
I noticed that to reproduce this bug I have to wait until anaconda starts to install the packages before trying to set the root password.
Also, if I wait until the installation is finished, I can set the root password with no problem before clicking 'finish'.
I'm attaching dmesg.
Where are the systemd logs? The files in /run/systemd/journal are zeroed.
How can I generate a core dump for this?
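One common way to get a core file is to raise the core-size resource limit in the process that will crash (the shell equivalent is `ulimit -c unlimited`) and then look at /proc/sys/kernel/core_pattern to see where the kernel writes cores, or which helper it pipes them to. A minimal sketch, assuming the code runs inside the process to be debugged; where exactly to hook it into Anaconda's startup is an assumption, and the function names are illustrative.

```python
import resource

def enable_core_dumps():
    """Raise the soft core-file limit as far as the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit;
    # raising the hard limit itself requires CAP_SYS_RESOURCE.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

def core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel's core file template ('|...' means a pipe helper)."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    print("RLIMIT_CORE (soft, hard):", enable_core_dumps())
    print("core_pattern:", core_pattern())
```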
Created attachment 654510
The issue seems to be caused by cracklib calling exit() when /usr/share/cracklib/pw_dict.pwd is not found (function FascistCheck in ./lib/fascist.c).
The race condition exists because that file is present in the installer file system. So as long as Anaconda hasn't chrooted into /mnt/sysimage, you can set the root password (cracklib won't crash). As soon as Anaconda does the chroot, you can't set the root password until cracklib is installed in the sysimage.
I didn't validate any of this, though it makes sense based on my reading of the code: Anaconda calls python-pwquality.check() -> pwquality_check() -> FascistCheck() -> exit()
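Why an exit() deep inside a C library is so destructive: Python's `sys.exit()` raises `SystemExit`, which a caller can catch, but libc `exit()` ends the process without unwinding the Python stack, so no `except` or `finally` in the caller ever runs. The sketch below shows the contrast, using ctypes to call libc's `exit()` as a stand-in for FascistCheck() (Linux/glibc assumed; this is an illustration, not the pwquality code path).

```python
import subprocess
import sys

# Python-level exit: SystemExit is an exception, so the caller can handle it.
PY_EXIT = """
import sys
try:
    sys.exit(3)
except SystemExit:
    print("handled")
"""

# C-level exit: libc exit() called from native code (stand-in for
# FascistCheck() -> exit()); the try/finally never gets control back.
C_EXIT = """
import ctypes
try:
    ctypes.CDLL(None).exit(3)
finally:
    print("handled")
"""

def run(src):
    proc = subprocess.run([sys.executable, "-c", src],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()

if __name__ == "__main__":
    print("sys.exit :", run(PY_EXIT))  # (0, 'handled')
    print("libc exit:", run(C_EXIT))   # (3, '')
```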
I think there are two possible solutions:
1) If the check doesn't need to run inside the chroot, it can be done in a separate process forked before the chroot happens.
2) Disable the root password dialog until cracklib gets installed in the target file system.
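Option 1 could look roughly like this. A process forked before anything changes the root gets its own copy of the root directory, so a later chroot in the parent does not affect it, and a dictionary lookup inside it still sees the installer's /usr/share/cracklib. The sketch assumes Linux and the `fork` start method; `PasswordCheckService` is a hypothetical name, and the `checker` callable stands in for the real pwquality call. If the helper dies (for example because the library calls exit()), `check()` returns None instead of taking the installer down with it.

```python
import multiprocessing as mp
import os

def _serve(conn, checker):
    # Runs in the helper process, which keeps the root it had when forked.
    for password in iter(conn.recv, None):  # None is the shutdown request
        conn.send(checker(password))
    conn.close()

class PasswordCheckService:
    """Hypothetical helper: run password checks in a process forked early."""

    def __init__(self, checker):
        ctx = mp.get_context("fork")
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_serve, args=(child_conn, checker),
                                 daemon=True)
        self._proc.start()
        # Close our copy of the child's end so recv() sees EOF if it dies.
        child_conn.close()

    def check(self, password):
        """Return the checker's result, or None if the helper is gone."""
        try:
            self._conn.send(password)
            return self._conn.recv()
        except (EOFError, OSError):
            return None

    def close(self):
        try:
            self._conn.send(None)
        except OSError:  # helper already dead
            pass
        self._conn.close()
        self._proc.join(timeout=5)

if __name__ == "__main__":
    def toy_checker(password):
        if password == "crash":
            os._exit(1)  # simulates a library calling exit()
        return len(password) >= 8

    service = PasswordCheckService(toy_checker)
    print(service.check("fedorafedorafedora"))  # True
    print(service.check("short"))               # False
    print(service.check("crash"))               # None: helper died
    service.close()
```

The service must be created before the chroot for this to help; creating it afterwards would just fork a child that is already inside /mnt/sysimage.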
Additionally, cracklib definitely should be patched to not exit() from its API. There is no reason to do that.
(In reply to comment #14)
> Additionally, cracklib definitely should be patched to not exit() from its
> API. There is no reason to do that.
Defensive programming calls for anaconda to intercept such a call, preventing a subsystem from bringing down the whole installer. In C this would mean overriding exit() with the installer's own catching function and using dlsym(RTLD_NEXT, "exit") to reach the original definition.
Petr Schindler and I reproduced the issue.
1. start installation
2. set "fedorafedorafedora" as root password
3. repeat: open the rootpw dialog, append a character to the password, confirm twice
It should crash in a few attempts (once package installation starts).
Created attachment 656644
Logs showing the stat of the root directory during the password check
It really looks like something is doing a chroot directly in the anaconda process (or possibly in a forked copy of it).
I updated the source to get verbose logging and got this for a successful attempt:
08:45:10,541 DEBUG anaconda: ValidatePassword PWD: '/root' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64768L, st_nlink=13, st_uid=0, st_gid=0, st_size=1024L, st_atime=1354542117, st_mtime=1354542148, st_ctime=1354542148)' Dict: 'True'
And this for the crashing attempt:
08:45:11,453 DEBUG anaconda: ValidatePassword PWD: '(unreachable)/' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64770L, st_nlink=12, st_uid=0, st_gid=0, st_size=4096L, st_atime=1354542227, st_mtime=1354542308, st_ctime=1354542308)' Dict: 'False'
PWD: should really be labelled cwd; it shows anaconda's current working directory.
Dict: shows whether libpwcheck's dictionary file is present.
Stat: shows the stat of the root directory /.
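The three fields above can be gathered with a few standard calls. In the two log lines both roots have st_ino=2, so it is st_dev (64768 vs 64770) that reveals a different file system mounted at /. A hedged sketch of such a check, with illustrative function names (not Anaconda's actual debug code): snapshot (st_dev, st_ino) of / at startup and compare it later. The "(unreachable)/" PWD is the kernel's way of reporting a working directory outside the current root; glibc of that era passed it through, while current glibc makes getcwd() fail instead, so both cases are treated as unreachable. The dictionary path is the one from the original error message.

```python
import os

CRACKLIB_DICT = "/usr/share/cracklib/pw_dict.pwd"  # path from the error message

def root_identity():
    """(device, inode) of the directory this process currently sees as /."""
    st = os.stat("/")
    return (st.st_dev, st.st_ino)

STARTUP_ROOT = root_identity()  # snapshot taken before any chroot

def describe_root_state():
    try:
        cwd = os.getcwd()
    except OSError:  # newer glibc: cwd lies outside the current root
        cwd = "(unreachable)"
    return {
        "cwd": cwd,
        "cwd_reachable": not cwd.startswith("(unreachable)"),
        "root_changed": root_identity() != STARTUP_ROOT,
        "dict_present": os.path.exists(CRACKLIB_DICT),
    }

if __name__ == "__main__":
    print(describe_root_state())
```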
Created attachment 656677
Complete strace up to the crash point
Here is the complete strace of the crashed session. I could not find any chroot call in it, though.
/proc says the root was indeed changed to /mnt/sysimage, but there is no call to chroot in the strace.
The chroot happens after "Starting package installation process".
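The /proc check can also be done per thread: /proc/<pid>/task/<tid>/root is a symlink to each thread's root directory, which shows which tasks of a process see /mnt/sysimage as /. A hedged sketch (Linux only; reading another process's entries needs suitable privileges; `thread_roots` is an illustrative name). One thing the comment doesn't state is whether the trace followed threads and children: strace only does that when run with -f.

```python
import os

def thread_roots(pid="self"):
    """Map each thread id of a process to the target of its root symlink."""
    task_dir = f"/proc/{pid}/task"
    roots = {}
    for tid in os.listdir(task_dir):
        try:
            roots[int(tid)] = os.readlink(f"{task_dir}/{tid}/root")
        except OSError:  # thread exited meanwhile, or permission denied
            pass
    return roots

if __name__ == "__main__":
    for tid, root in sorted(thread_roots().items()):
        print(tid, root)
```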
It is getting weirder and weirder. I added more debug output to the anaconda packaging log and reproduced this. What I found really confused me:
A debugger started from the Gtk button callback (just before the crashing pwcheck call) sees the /mnt/sysimage contents as / (which means it is chrooted). os.getpid() returns 693.
A debug print in the Yum callback (packaging/yumpayload.py) reports pid 693 (the same!), but sees the ramdisk contents at /.
How can one process have two threads which are chrooted and non-chrooted at the same time?
Discussed at 2012-12-05 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-05/f18final-blocker-review-2.2012-12-05-17.01.log.txt . Accepted as a blocker, now that we have a reliable reproducer which indicates something is really wrong here, per the criterion "The installer must be able to complete an installation using all supported interfaces" - when you hit this bug, the installation fails.
So, the chroot really is happening in RPM and has to stay there.
Because this is a much bigger issue with regard to how we deal with threads in Anaconda, I am trying to sort everything out with the packaging team on our devel list. Once I have an idea of how to fix the situation, I will come back here.
We can and will remove the exit() call, but the problem will remain, just less pronounced: in that case the root password will be rejected as weak because of the missing dictionary (I'm not sure whether anaconda enforces the rejection or not).
anaconda does handle rejections and lets you re-enter, so that would be less bad, though still pretty confusing.
Btw, what's the reason for omitting the cracklib dictionary?
The dictionary is there; it's just that anaconda can't see it while in the chroot (caused by RPM).
Yep, I forgot the main chroot issue. Sorry. Any progress on this? Cracklib exit fix seems to be the preferred option now, true?
This build should be picked up for F18 to work around the crash:
"- update to 2.8.22 (#887461), which now returns an error instead of exiting when there's a failure opening the dictionary in FascistCheck()"
thanks. anaconda team, with that fix in cracklib, can we clean things up any further on the anaconda side?
(In reply to comment #29)
> thanks. anaconda team, with that fix in cracklib, can we clean things up any
> further on anaconda side?
I talked to Martin, seems like no further change on Anaconda side is needed.
might be stupid, but why don't you just move the root pw option to a pre- or post-install step? Why should these happen in parallel?
it's late in the cycle to move spokes around; it's not a simple/safe operation. (It was actually a pre-install step in early F18 builds, then moved to during-install as it was thought a neat way to save time.)
jreznik: I was going off of https://bugzilla.redhat.com/show_bug.cgi?id=876716#c23 , which says that you'll get a bogus 'bad password' error if the bug happens. Which is nowhere near as bad as a black screen crash, but still pretty confusing. But I guess we can test and see if that's really the case. Marking as ON_QA. Can someone edit the update and mark it as fixing this bug? It helps with our accounting.
the cracklib update has gone stable now. I tested a couple of times and couldn't trigger the crash; it does seem to give a dictionary error instead. can others confirm before we close this?
oh, test with https://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/
No longer crashes. Closing.