Bug 876716 - anaconda crashes after setting root password
Summary: anaconda crashes after setting root password
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Martin Sivák
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F18Blocker, F18FinalBlocker 893093 901510
Reported: 2012-11-14 19:15 UTC by Kamil Páral
Modified: 2013-03-27 06:16 UTC
CC List: 16 users

Fixed In Version:
Clone Of:
Clones: 893093
Environment:
Last Closed: 2013-01-02 13:39:30 UTC
Type: Bug
Embargoed:


Attachments
main pane (12.49 KB, image/png)
2012-11-14 19:15 UTC, Kamil Páral
no flags
anaconda.log (10.21 KB, text/plain)
2012-11-14 19:16 UTC, Kamil Páral
no flags
packaging.log (5.18 KB, text/plain)
2012-11-14 19:16 UTC, Kamil Páral
no flags
program.log (48.15 KB, text/plain)
2012-11-14 19:16 UTC, Kamil Páral
no flags
storage.log (132.21 KB, text/plain)
2012-11-14 19:16 UTC, Kamil Páral
no flags
syslog (64.50 KB, text/plain)
2012-11-14 19:17 UTC, Kamil Páral
no flags
X.log (58.38 KB, text/plain)
2012-11-14 19:17 UTC, Kamil Páral
no flags
dmesg (23.57 KB, text/plain)
2012-11-29 19:07 UTC, Gustavo Luiz Duarte
no flags
Logs showing the stat of root directory during password check (12.99 KB, text/plain)
2012-12-03 13:56 UTC, Martin Sivák
no flags
Complete strace up to the crash point (10.24 MB, text/plain)
2012-12-03 14:19 UTC, Martin Sivák
no flags


Links
Red Hat Bugzilla 886995 (Private: 0, Priority: unspecified, Status: CLOSED, Last Updated: 2021-02-22 00:41:40 UTC): open the cracklib dictionary directly to control how to handle errors

Internal Links: 886995

Description Kamil Páral 2012-11-14 19:15:57 UTC
Created attachment 645066
main pane

Description of problem:
I started a standard installation. When I set the root password, the whole screen went black. No exception screen (why not?), just a black screen.

Anaconda main screen says:

> /usr/share/cracklib/pw_dict.pwd: No such file or directory
> PWOpen: No such file or directory
> Pane is dead

I have done many installs and have seen this only twice. It is probably a race condition.

Version-Release number of selected component (if applicable):
F18 Beta TC8 DVD x86_64
anaconda 18.28

How reproducible:
rarely

Steps to Reproduce:
1. start the installation
2. enter root password
3. black screen

Comment 1 Kamil Páral 2012-11-14 19:16:44 UTC
Created attachment 645067
anaconda.log

Comment 2 Kamil Páral 2012-11-14 19:16:50 UTC
Created attachment 645068
packaging.log

Comment 3 Kamil Páral 2012-11-14 19:16:56 UTC
Created attachment 645069
program.log

Comment 4 Kamil Páral 2012-11-14 19:16:59 UTC
Created attachment 645070
storage.log

Comment 5 Kamil Páral 2012-11-14 19:17:04 UTC
Created attachment 645071
syslog

Comment 6 Kamil Páral 2012-11-14 19:17:09 UTC
Created attachment 645072
X.log

Comment 7 Kamil Páral 2012-11-14 19:18:28 UTC
Nasty bug, I would say. Proposing as a Final blocker.

Comment 8 Kamil Páral 2012-11-14 19:21:10 UTC
I have seen this once with English language and keymap, and once with Czech language and keymap, if that helps.

Comment 9 Martin Sivák 2012-11-15 10:56:20 UTC
Anaconda uses the pwquality module to check the password. The module is implemented as a binary Python extension in /usr/lib64/python2.7/site-packages/pwquality.so

If a binary extension segfaults, it causes the whole Python environment to crash. And this looks just like that, because an ordinary Python exception starts the ABRT handler and allows the user to file a bug.

Unfortunately I can't prove or debug it without reproducing it or getting more logs: dmesg, a core dump or systemd logs (if systemd can report segfaults).
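The difference described above can be seen in a small sketch: an ordinary Python exception reaches `sys.excepthook` (the kind of hook an installer can use to hand errors to a crash reporter such as ABRT), while a crash in native code kills the interpreter before any Python handler runs. In this sketch `os.abort()` stands in for a segfaulting binary extension, and the hook is a hypothetical placeholder, not Anaconda's actual handler. It assumes a POSIX system.

```python
import subprocess
import sys

# Child program: installs a placeholder exception hook, then either raises an
# ordinary Python exception or dies in native code (os.abort() stands in for
# a segfaulting binary extension).
CHILD = r"""
import os
import sys

def hook(exc_type, exc, tb):
    print("handler:", exc_type.__name__, flush=True)

sys.excepthook = hook
if sys.argv[1] == "raise":
    raise RuntimeError("ordinary Python error")
os.abort()
"""


def run_child(mode):
    """Run CHILD in a fresh interpreter; return (exit status, captured stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


print(run_child("raise"))  # the hook runs, exit status 1
print(run_child("abort"))  # no hook output, process killed by SIGABRT
```

Only the first case ever gives a crash reporter a chance to run, which matches the "black screen, no exception dialog" symptom.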

Comment 10 Adam Williamson 2012-11-28 18:07:13 UTC
Discussed at 2012-11-28 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-28/f18final-blocker-review-1.2012-11-28-16.59.log.txt . We agreed that this is a worrying bug, but no-one present at the meeting had seen it in numerous installs, and we're worried about the fact that it may be hard to fix without a reliable reproducer. We agreed to delay the decision on this one while we ask the community if anyone else has seen this one, and try to get more data.

Comment 11 Adam Williamson 2012-11-29 17:58:54 UTC
Discussed again at 2012-11-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-29/f18final-blocker-review-1.1.2012-11-29-17.01.log.txt . The list inquiry brought the info that several people have also seen this bug, but no-one has a reliable reproducer. We agreed to delay the decision again until we get more concrete information on the bug, and ideally a reliable reproducer (which would make it a blocker).

Comment 12 Gustavo Luiz Duarte 2012-11-29 19:04:47 UTC
I was able to reproduce it reliably on ppc64 using RC2.0 + updates.img.

RC2.0: http://ppc.koji.fedoraproject.org/stage/f18-Beta-RC2.0/
updates.img: http://dwa.fedorapeople.org/880277-updates.img

I noticed that to reproduce this bug I have to wait until anaconda starts to install the packages before trying to set the root password.
Also, if I wait until installation is finished I can set the root password with no problem before clicking 'finish'.

I'm attaching dmesg.
Where are the systemd logs? The files in /run/systemd/journal are zeroed.
How can I generate a core dump for this?

Comment 13 Gustavo Luiz Duarte 2012-11-29 19:07:35 UTC
Created attachment 654510
dmesg

Comment 14 Gustavo Luiz Duarte 2012-11-29 22:03:15 UTC
The issue seems to be caused by cracklib calling exit() when /usr/share/cracklib/pw_dict.pwd is not found (function FascistCheck in ./lib/fascist.c).

The race condition exists because that file exists in the installer file system, but not in the target system until cracklib is installed there. So as long as Anaconda hasn't chrooted into /mnt/sysimage, you are able to set the root password (cracklib won't crash). As soon as Anaconda does the chroot, you won't be able to set the root password until cracklib gets installed in the sysimage.

I didn't validate any of this, though it makes sense based on my reading of the code: Anaconda calls python-pwquality.check() -> pwquality_check() -> FascistCheck() -> exit()
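Why an exit() deep inside that call chain is so destructive for a Python program can be shown with a short sketch. Python's own `sys.exit()` raises `SystemExit`, which the calling code can catch; a C library calling `exit()` directly (as `FascistCheck()` is described doing here) ends the process immediately, so no `try`/`except` around `check()` can intercept it. The sketch calls libc `exit()` through `ctypes` as a stand-in for cracklib. It assumes Linux (where `ctypes.CDLL(None)` exposes libc symbols) and runs the demonstration in a child interpreter so it does not terminate itself.

```python
import subprocess
import sys

# Child program: try to "catch" a termination request coming either from
# Python (sys.exit) or from C code (libc exit() called through ctypes).
CHILD = r"""
import ctypes
import sys

try:
    if sys.argv[1] == "python":
        sys.exit(3)                # raises SystemExit: catchable
    ctypes.CDLL(None).exit(3)      # C-level exit(): the process ends here
except BaseException as exc:
    print("caught", type(exc).__name__)
"""


def terminate_in_child(mode):
    """Return (exit status, stdout) of CHILD run in a fresh interpreter."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout
```

`terminate_in_child("python")` shows the handler catching `SystemExit`; `terminate_in_child("c")` exits with status 3 and prints nothing.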

I think there are two possible solutions:
1) In case it doesn't need to run inside the chroot, it can be done in a separate process forked before the chroot is done.
2) Disable the root password dialog until cracklib gets installed in the target file system.
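Option 1 could look roughly like the following sketch. Assumptions: the checker process is forked at startup, before anything changes the root, so it keeps seeing the installer's /usr/share/cracklib; `hypothetical_check()` is a placeholder for the real pwquality call; and passwords are sent one per line over a socket pair. If the child dies (for example because a library calls exit()), the installer gets an error string instead of going down with it. POSIX only, since it uses `os.fork()`.

```python
import contextlib
import os
import socket


def hypothetical_check(password):
    """Placeholder for the real quality check; "" means the password is accepted."""
    if len(password) < 8:
        return "password is too short"
    return ""


class ForkedChecker:
    """Serve password checks from a child process forked before any chroot."""

    def __init__(self, check=hypothetical_check):
        parent_sock, child_sock = socket.socketpair()
        self._pid = os.fork()
        if self._pid == 0:
            # Child: answer one request per line until the parent hangs up.
            try:
                parent_sock.close()
                with child_sock.makefile("rw") as stream:
                    for line in stream:
                        stream.write(check(line.rstrip("\n")) + "\n")
                        stream.flush()
            finally:
                os._exit(0)  # never fall back into the parent's code path
        child_sock.close()
        self._sock = parent_sock
        self._stream = parent_sock.makefile("rw")

    def check(self, password):
        """Return "" if accepted, a message otherwise; never kills the caller."""
        try:
            self._stream.write(password + "\n")
            self._stream.flush()
            reply = self._stream.readline()
        except OSError:  # e.g. BrokenPipeError once the child is gone
            reply = ""
        if not reply:  # EOF: the child died, e.g. a library called exit()
            return "password checker is unavailable"
        return reply.rstrip("\n")

    def close(self):
        with contextlib.suppress(OSError):
            self._stream.close()
        self._sock.close()
        os.waitpid(self._pid, 0)
```

The trade-off is an extra long-lived process and a tiny IPC protocol; in exchange, whatever the library does to its own process can no longer take the installer down.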

Additionally, cracklib definitely should be patched not to call exit() from its API. There is no reason to do that.

Comment 15 John Reiser 2012-11-30 23:55:49 UTC
(In reply to comment #14)
> Additionally, cracklib definitely should be patched to not exit() from its
> API. There is not reason to do that.

Defensive programming calls for anaconda to intercept such a call, thus preventing a subsystem from bringing down the whole installer. In C this would be: override exit() with the installer's own catching function, and use dlsym(RTLD_NEXT, "exit") to reach the original definition.

Comment 16 Kamil Páral 2012-12-03 13:23:24 UTC
Petr Schindler and I reproduced the issue.

Reproducer:
1. start installation
2. set "fedorafedorafedora" as root password
3. repeat: open the rootpw dialog, append a character to the password, confirm twice

It should crash in a few attempts (once package installation starts).

Comment 17 Martin Sivák 2012-12-03 13:56:04 UTC
Created attachment 656644
Logs showing the stat of root directory during password check

It really looks like something is doing a chroot directly in the anaconda process (or possibly in its forked copy).

I updated the source to get verbose logging and got this for a successful attempt:

08:45:10,541 DEBUG anaconda: ValidatePassword PWD: '/root' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64768L, st_nlink=13, st_uid=0, st_gid=0, st_size=1024L, st_atime=1354542117, st_mtime=1354542148, st_ctime=1354542148)' Dict: 'True'

And this for the crashing attempt:

08:45:11,453 DEBUG anaconda: ValidatePassword PWD: '(unreachable)/' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64770L, st_nlink=12, st_uid=0, st_gid=0, st_size=4096L, st_atime=1354542227, st_mtime=1354542308, st_ctime=1354542308)' Dict: 'False'

PWD: (really the cwd) shows anaconda's current working directory
Dict: shows the presence of the cracklib dictionary file
Stat: shows the stat of the root directory /
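A minimal sketch of that kind of debug logging, using a hypothetical helper name `log_root_state()` (the actual debugging patch is not attached here). Comparing `st_dev`/`st_ino` of "/" between two calls is enough to notice a root change: in the two log lines above, `st_dev` moves from 64768 to 64770.

```python
import logging
import os

log = logging.getLogger("anaconda")
DICT_PATH = "/usr/share/cracklib/pw_dict.pwd"  # path from the error message


def log_root_state():
    """Log cwd, stat of "/" and dictionary presence, like the debug lines above."""
    try:
        cwd = os.getcwd()
    except OSError as exc:  # getcwd() may fail if the cwd lies outside the root
        cwd = "<%s>" % exc.strerror
    root_stat = os.stat("/")
    has_dict = os.path.exists(DICT_PATH)
    log.debug("ValidatePassword PWD: %r Stat: %r Dict: %r", cwd, root_stat, has_dict)
    return cwd, root_stat, has_dict
```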

Comment 18 Martin Sivák 2012-12-03 14:19:10 UTC
Created attachment 656677
Complete strace up to the crash point

Here is the complete strace of the crashed session. I could not find any chroot in it, though.

Comment 19 Martin Sivák 2012-12-03 16:05:43 UTC
/proc says the root was indeed changed to /mnt/sysimage, but there is no call to chroot in the strace.

The chroot happens after "Starting package installation process".
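The /proc check mentioned above reads the `root` symlink of a process. A minimal sketch, assuming Linux and a hypothetical helper name `process_root()`; the link is rendered relative to the reader's own root, so it is most informative when run from outside the installer's process.

```python
import os


def process_root(pid="self"):
    """Return a process's root directory as the kernel reports it via /proc."""
    return os.readlink("/proc/%s/root" % pid)
```

Run from a shell outside the installer, `process_root(<installer pid>)` would be expected to show /mnt/sysimage while the chroot is in effect. Reading another process's root link usually requires root privileges.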

Comment 20 Martin Sivák 2012-12-04 13:41:32 UTC
It is getting weirder and weirder. I added some more debug output to the anaconda packaging log and reproduced this. What I found out really confused me.

A debugger started from the GTK button callback (just before the crashing pwcheck call) sees the /mnt/sysimage contents as / (which means it is chrooted). os.getpid() returns 693.

A debug print in the Yum callback (packaging/yumpayload.py) reports pid 693 (the same!), but sees the ramdisk contents at /.

How can one process have two threads which are chrooted and non-chrooted at the same time?

Comment 21 Adam Williamson 2012-12-05 18:20:17 UTC
Discussed at 2012-12-05 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-05/f18final-blocker-review-2.2012-12-05-17.01.log.txt .  Accepted as a blocker, now we have a reliable reproducer which indicates something is really wrong here, per criterion "The installer must be able to complete an installation using all supported interfaces" - when you hit this bug, the installation fails.

Comment 22 Martin Sivák 2012-12-10 13:32:14 UTC
So, the chroot really is happening in RPM and has to stay there.

Because this is a much bigger issue with regard to how we deal with threads in Anaconda, I am trying to sort everything out with the packaging team on our devel list. Once I have an idea how to fix the situation, I will come back here.
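On Linux, threads created with pthreads share one filesystem context (root directory, working directory and umask), so a chroot() performed in the packaging thread also changes what the GUI thread sees as /. chroot() needs root privileges, so the sketch below demonstrates the same process-wide sharing with the working directory, which lives in that same shared state. The helper name is hypothetical.

```python
import os
import tempfile
import threading


def thread_chdir_is_process_wide():
    """Change directory in a worker thread and check what the main thread sees."""
    original = os.getcwd()
    target = os.path.realpath(tempfile.mkdtemp())
    try:
        worker = threading.Thread(target=os.chdir, args=(target,))
        worker.start()
        worker.join()
        # True means the worker's chdir() is visible from this thread too.
        return os.getcwd() == target
    finally:
        os.chdir(original)
        os.rmdir(target)
```

This is why a chroot done on behalf of RPM cannot be confined to the packaging thread inside a single process.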

Comment 23 Tomas Mraz 2012-12-14 09:30:03 UTC
We can and will remove the exit() call, but the problem will remain, just less pronounced: in that case the root password will be rejected as weak because of the missing dictionary (not sure whether anaconda enforces the rejection or not).

Comment 24 Adam Williamson 2012-12-14 21:17:29 UTC
anaconda does handle rejections and let you re-enter, so that would be less bad, though still pretty confusing.
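The behaviour described in the last two comments can be modelled in a few lines. This is a hypothetical Python model, not cracklib's or anaconda's code: the function names, the length rule and the message texts are assumptions. Only the contract (report an error instead of exiting when the dictionary is missing) comes from comment 23.

```python
import os

DICT_PATH = "/usr/share/cracklib/pw_dict.pwd"  # from the original error message


def fascist_check_sketch(password, dict_path=DICT_PATH):
    """Hypothetical model of the fixed contract: None if OK, else a message."""
    if not os.path.exists(dict_path):
        # Old behaviour: print "PWOpen: No such file or directory" and exit().
        # New behaviour: report the problem to the caller instead.
        return "error loading dictionary"
    if len(password) < 6:
        return "it is too short"
    return None


def validate_root_password(password, dict_path=DICT_PATH):
    """Caller side: return (accepted, message); a rejection lets the user re-enter."""
    message = fascist_check_sketch(password, dict_path)
    return (message is None, message or "")
```

With the dictionary hidden by the chroot, a perfectly good password comes back as rejected with a dictionary error, which is confusing but recoverable.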

Comment 25 Jaroslav Reznik 2012-12-17 11:24:55 UTC
Btw, what's the reason for omitting the cracklib dictionary?

Comment 26 Martin Sivák 2012-12-17 15:55:59 UTC
The dictionary is there; it's just that anaconda can't see it while in the chroot (caused by RPM).

Comment 27 Jaroslav Reznik 2012-12-18 13:07:38 UTC
Yep, I forgot the main chroot issue. Sorry. Any progress on this? Cracklib exit fix seems to be the preferred option now, true?

Comment 28 Jaroslav Reznik 2012-12-18 13:14:52 UTC
This build should be picked up for F18 to work around the crash:
https://admin.fedoraproject.org/updates/FEDORA-2012-20331/cracklib-2.8.22-1.fc18

"- update to 2.8.22 (#887461), which now returns an error instead of exiting when there's a failure opening the dictionary in FascistCheck()"

Comment 29 Adam Williamson 2012-12-18 20:57:00 UTC
thanks. anaconda team, with that fix in cracklib, can we clean things up any further on anaconda side?

Comment 30 Jaroslav Reznik 2012-12-19 12:49:10 UTC
(In reply to comment #29)
> thanks. anaconda team, with that fix in cracklib, can we clean things up any
> further on anaconda side?

I talked to Martin, seems like no further change on Anaconda side is needed.

Comment 31 Aleksandar Kostadinov 2012-12-19 21:04:03 UTC
might be stupid, but why don't you just move the root pw option to a post/pre-install step? Why should these happen in parallel?

Comment 32 Adam Williamson 2012-12-19 23:17:52 UTC
it's late to move spokes around, it's not a simple/safe operation. (it was actually a pre-install step in early f18 builds, then moved to during-install as it was thought a neat way to save time).

Comment 33 Adam Williamson 2012-12-19 23:19:27 UTC
jreznik: I was going off of https://bugzilla.redhat.com/show_bug.cgi?id=876716#c23 , which says that you'll get a bogus 'bad password' error if the bug happens. Which is nowhere near as bad as a black screen crash, but still pretty confusing. But I guess we can test and see if that's really the case. Marking as ON_QA. Can someone edit the update and mark it as fixing this bug? It helps with our accounting.

Comment 34 Adam Williamson 2012-12-20 06:34:54 UTC
the cracklib update has gone stable now. I tested a couple of times and couldn't trigger the crash, it does seem to give a dictionary error instead. can others confirm before we close this?

Comment 35 Adam Williamson 2012-12-20 06:35:09 UTC
oh, test with https://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/

Comment 36 Kamil Páral 2013-01-02 13:39:30 UTC
No longer crashes. Closing.

