Red Hat Bugzilla – Bug 901510
Do not let RPM play with (ch)root of the running process if DNF is used as a library
Last modified: 2014-11-19 06:46:19 EST
+++ This bug was initially created as a clone of Bug #893093 +++
+++ This bug was initially created as a clone of Bug #876716 +++
Created attachment 645066 [details]
Description of problem:
I have started a standard installation. When I set the root password, the whole screen went black. No exception screen (why not?), just a black screen.
Anaconda main screen says:
> /usr/share/cracklib/pw_dict.pwd: No such file or directory
> PWOpen: No such file or directory
> Pane is dead
I have done many installs and I have seen this just twice. It is probably a race condition.
Version-Release number of selected component (if applicable):
F18 Beta TC8 DVD x86_64
Steps to Reproduce:
1. start the installation
2. enter root password
3. black screen
--- Additional comment from Kamil Páral on 2012-11-14 14:16:44 EST ---
Created attachment 645067 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:16:50 EST ---
Created attachment 645068 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:16:56 EST ---
Created attachment 645069 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:16:59 EST ---
Created attachment 645070 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:17:04 EST ---
Created attachment 645071 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:17:09 EST ---
Created attachment 645072 [details]
--- Additional comment from Kamil Páral on 2012-11-14 14:18:28 EST ---
Nasty bug I would say, proposing as Final blocker.
--- Additional comment from Kamil Páral on 2012-11-14 14:21:10 EST ---
I have seen this once with English language and keymap, and once with Czech language and keymap, if that helps.
--- Additional comment from Martin Sivák on 2012-11-15 05:56:20 EST ---
Anaconda uses the pwquality module to check the password. The module is implemented as a binary Python extension in /usr/lib64/python2.7/site-packages/pwquality.so
If a binary extension segfaults, it crashes the whole Python environment. And this looks just like that, because a plain Python exception would start the ABRT handler and let the user file a bug.
Unfortunately I can't prove or debug it without reproducing it or getting more logs: dmesg, a core dump or systemd logs (if systemd can report segfaults).
--- Additional comment from Adam Williamson on 2012-11-28 13:07:13 EST ---
Discussed at 2012-11-28 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-28/f18final-blocker-review-1.2012-11-28-16.59.log.txt . We agreed that this is a worrying bug, but no-one present at the meeting had seen it in numerous installs, and we're worried about the fact that it may be hard to fix without a reliable reproducer. We agreed to delay the decision on this one while we ask the community if anyone else has seen this one, and try to get more data.
--- Additional comment from Adam Williamson on 2012-11-29 12:58:54 EST ---
Discussed again at 2012-11-29 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-29/f18final-blocker-review-1.1.2012-11-29-17.01.log.txt . The list inquiry brought the info that several people have also seen this bug, but no-one has a reliable reproducer. We agreed to delay the decision again until we get more concrete information on the bug, and ideally a reliable reproducer (which would make it a blocker).
--- Additional comment from Gustavo Luiz Duarte on 2012-11-29 14:04:47 EST ---
I was able to reproduce it reliably on ppc64 using RC2.0 + updates.img.
I noticed that to reproduce this bug I have to wait until anaconda starts to install the packages before trying to set the root password.
Also, if I wait until installation is finished I can set the root password with no problem before clicking 'finish'.
I'm attaching dmesg.
Where are the systemd logs? The files in /run/systemd/journal are zeroed.
How can I generate a core dump for this?
--- Additional comment from Gustavo Luiz Duarte on 2012-11-29 14:07:35 EST ---
Created attachment 654510 [details]
--- Additional comment from Gustavo Luiz Duarte on 2012-11-29 17:03:15 EST ---
The issue seems to be caused by cracklib calling exit() when /usr/share/cracklib/pw_dict.pwd is not found (function FascistCheck in ./lib/fascist.c).
The race condition exists because that file exists in the installer file system. So as long as Anaconda has not chrooted into /mnt/sysimage, you are able to set the root password (cracklib won't crash). As soon as Anaconda does the chroot, you won't be able to set the root password until cracklib gets installed in the sysimage.
I didn't validate any of this, but it makes sense based on my reading of the code: Anaconda calls python-pwquality.check() -> pwquality_check() -> FascistCheck() -> exit().
I think there are two possible solutions:
1) If the check doesn't need to run inside the chroot, it can be done in a separate process forked before the chroot is done.
2) Disable the root password dialog until cracklib gets installed in the target file system.
Additionally, cracklib definitely should be patched to not exit() from its API. There is no reason to do that.
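For illustration only, solution 1 could be sketched roughly like this in Python 3. The names start_checker and ask are hypothetical, the check function is a stand-in for the real pwquality call, and passwords containing a newline are not handled. The helper is forked before any chroot, so it keeps the original view of the file system, and if the library calls exit() only the helper dies.

```python
import os

def start_checker(check):
    """Fork a helper process now (before any chroot) that answers password checks.

    Returns (pid, requests, replies); 'check' is a stand-in for the real
    password-quality call and must return True or False.
    """
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: serves one request per line until the parent closes the pipe.
        status = 1
        try:
            os.close(req_w)
            os.close(rep_r)
            with os.fdopen(req_r, "r") as requests, os.fdopen(rep_w, "w") as replies:
                for line in requests:
                    ok = check(line.rstrip("\n"))
                    replies.write("ok\n" if ok else "weak\n")
                    replies.flush()
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    # Parent: keep only its own ends of the two pipes.
    os.close(req_r)
    os.close(rep_w)
    return pid, os.fdopen(req_w, "w"), os.fdopen(rep_r, "r")

def ask(requests, replies, password):
    """Return 'ok', 'weak', or 'unavailable' if the helper has died."""
    try:
        requests.write(password + "\n")
        requests.flush()
    except BrokenPipeError:
        return "unavailable"
    answer = replies.readline().strip()
    return answer or "unavailable"
```

With something like this the UI process could treat "unavailable" as "cannot check right now" instead of going down together with the library.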
--- Additional comment from John Reiser on 2012-11-30 18:55:49 EST ---
(In reply to comment #14)
> Additionally, cracklib definitely should be patched to not exit() from its
> API. There is no reason to do that.
Defensive programming calls for anaconda to intercept such a call, thus preventing a subsystem from bringing down the whole installer. In C this would be: override exit() with the installer's own catching function, and use dlsym() with RTLD_NEXT in order to reach the original definition.
--- Additional comment from Kamil Páral on 2012-12-03 08:23:24 EST ---
Petr Schindler and I reproduced the issue.
1. start installation
2. set "fedorafedorafedora" as root password
3. repeat: open the rootpw dialog, append a character to the password, confirm twice
It should crash in a few attempts (once package installation starts).
--- Additional comment from Martin Sivák on 2012-12-03 08:56:04 EST ---
Created attachment 656644 [details]
Logs showing the stat of root directory during password check
It really looks like something is doing a chroot directly in the anaconda process (or possibly its forked copy).
I updated the source to get verbose logging and got this for a successful attempt:
08:45:10,541 DEBUG anaconda: ValidatePassword PWD: '/root' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64768L, st_nlink=13, st_uid=0, st_gid=0, st_size=1024L, st_atime=1354542117, st_mtime=1354542148, st_ctime=1354542148)' Dict: 'True'
And this for the crashing attempt:
08:45:11,453 DEBUG anaconda: ValidatePassword PWD: '(unreachable)/' Stat: 'posix.stat_result(st_mode=16877, st_ino=2L, st_dev=64770L, st_nlink=12, st_uid=0, st_gid=0, st_size=4096L, st_atime=1354542227, st_mtime=1354542308, st_ctime=1354542308)' Dict: 'False'
PWD: should really be labelled cwd; it shows anaconda's current working directory
Dict: shows the presence of libpwcheck's dictionary file.
Stat: shows the stat of the root point /
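A rough sketch of how such a debug line can be produced, in case someone wants to repeat the experiment. This is Python 3, not the actual patch (anaconda runs on Python 2.7 here), and the name root_snapshot is made up. The dictionary path is the one from the error message in the description.

```python
import os

# Path taken from the cracklib error message in the bug description.
DICT_PATH = "/usr/share/cracklib/pw_dict.pwd"

def root_snapshot(dict_path=DICT_PATH):
    """Describe what this process currently sees as cwd, / and the dictionary."""
    try:
        cwd = os.getcwd()
    except OSError as exc:
        # The working directory may be unreachable after a chroot.
        cwd = "<unavailable: %s>" % exc
    st = os.stat("/")
    return "PWD: %r Stat: dev=%d ino=%d Dict: %r" % (
        cwd, st.st_dev, st.st_ino, os.path.exists(dict_path))
```

A change of the device number reported for / between two calls, as in the two log lines above (64768 vs. 64770), indicates that the root directory seen by the process has changed.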
--- Additional comment from Martin Sivák on 2012-12-03 09:19:10 EST ---
Created attachment 656677 [details]
Complete strace up to the crash point
Here is the complete strace of the crashed session. I could not find any chroot in there though.
--- Additional comment from Martin Sivák on 2012-12-03 11:05:43 EST ---
/proc says the root was indeed changed to /mnt/sysimage, but there is no call to chroot in the strace.
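For reference, the /proc check can be done from Python like this. It is a small sketch: process_root is a hypothetical helper name, it assumes a Linux /proc, and reading the link of another process usually needs sufficient privileges.

```python
import os

def process_root(pid="self"):
    """Return the root directory the kernel records for a process, or None.

    Reads the /proc/<pid>/root symlink; returns None when /proc is not
    available or the link cannot be read.
    """
    try:
        return os.readlink("/proc/%s/root" % pid)
    except OSError:
        return None
```

For a process that has been chrooted, the link points at the new root as seen from outside the chroot (here /mnt/sysimage).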
The chroot happens after "Starting package installation process".
--- Additional comment from Martin Sivák on 2012-12-04 08:41:32 EST ---
It is getting weirder and weirder. I added some more debug output to the anaconda packaging log and reproduced this. What I found really confused me.
Debugger started from the Gtk button callback (just before the crashing pwcheck call) sees /mnt/sysimage contents as / (which means it is chrooted). os.getpid() returns 693.
Debug print in the Yum callback (packaging/yumpayload.py) reports pid 693 (the same!), but sees the ramdisk contents at /.
How can one process have two threads which are chrooted and non-chrooted at the same time?
--- Additional comment from Adam Williamson on 2012-12-05 13:20:17 EST ---
Discussed at 2012-12-05 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-05/f18final-blocker-review-2.2012-12-05-17.01.log.txt . Accepted as a blocker, now we have a reliable reproducer which indicates something is really wrong here, per criterion "The installer must be able to complete an installation using all supported interfaces" - when you hit this bug, the installation fails.
--- Additional comment from Martin Sivák on 2012-12-10 08:32:14 EST ---
So, the chroot really is happening in RPM and has to stay there.
Because this is a much bigger issue with regard to how we deal with threads in Anaconda, I am trying to sort everything out on our devel list with the packaging team. Once I have any idea how to fix the situation, I will come back here.
--- Additional comment from Tomas Mraz on 2012-12-14 04:30:03 EST ---
We can and will remove the exit() call, but the problem will stay, just less pronounced: in such a case the root password will be rejected as weak due to the missing dictionary (not sure whether anaconda enforces the rejection or not).
--- Additional comment from Adam Williamson on 2012-12-14 16:17:29 EST ---
anaconda does handle rejections and lets you re-enter, so that would be less bad, though still pretty confusing.
--- Additional comment from Jaroslav Reznik on 2012-12-17 06:24:55 EST ---
Btw. what's the reason to omit the cracklib dictionary?
--- Additional comment from Martin Sivák on 2012-12-17 10:55:59 EST ---
The dictionary is there. It's just that anaconda can't see it while in the chroot (caused by RPM).
--- Additional comment from Jaroslav Reznik on 2012-12-18 08:07:38 EST ---
Yep, I forgot the main chroot issue. Sorry. Any progress on this? Cracklib exit fix seems to be the preferred option now, true?
--- Additional comment from Jaroslav Reznik on 2012-12-18 08:14:52 EST ---
This build should be picked up for F18 to work around the crash:
"- update to 2.8.22 (#887461), which now returns an error instead of exiting when there's a failure opening the dictionary in FascistCheck()"
--- Additional comment from Adam Williamson on 2012-12-18 15:57:00 EST ---
thanks. anaconda team, with that fix in cracklib, can we clean things up any further on anaconda side?
--- Additional comment from Jaroslav Reznik on 2012-12-19 07:49:10 EST ---
(In reply to comment #29)
> thanks. anaconda team, with that fix in cracklib, can we clean things up any
> further on anaconda side?
I talked to Martin, seems like no further change on Anaconda side is needed.
--- Additional comment from Aleksandar Kostadinov on 2012-12-19 16:04:03 EST ---
might be stupid, but why don't you just move the root pw option to a post/pre install step? Why should these happen in parallel?
--- Additional comment from Adam Williamson on 2012-12-19 18:17:52 EST ---
it's late to move spokes around, it's not a simple/safe operation. (it was actually a pre-install step in early f18 builds, then moved to during-install as it was thought a neat way to save time).
--- Additional comment from Adam Williamson on 2012-12-19 18:19:27 EST ---
jreznik: I was going off of https://bugzilla.redhat.com/show_bug.cgi?id=876716#c23 , which says that you'll get a bogus 'bad password' error if the bug happens. Which is nowhere near as bad as a black screen crash, but still pretty confusing. But I guess we can test and see if that's really the case. Marking as ON_QA. Can someone edit the update and mark it as fixing this bug? It helps with our accounting.
--- Additional comment from Adam Williamson on 2012-12-20 01:34:54 EST ---
the cracklib update has gone stable now. I tested a couple of times and couldn't trigger the crash, it does seem to give a dictionary error instead. can others confirm before we close this?
--- Additional comment from Adam Williamson on 2012-12-20 01:35:09 EST ---
oh, test with https://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/
--- Additional comment from Kamil Páral on 2013-01-02 08:39:30 EST ---
No longer crashes. Closing.
We discussed this with Vratislav today. The goal is to have DNF run RPM as a separate process so its chroot() calls do not affect other threads of the client application.
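To illustrate why a separate process is needed: the root directory, like the current working directory, is normally a property of the whole process and is shared by all of its threads. The sketch below uses os.chdir() as a stand-in for chroot(), because it does not require root privileges; the function name is made up and the comparison to chroot() is an analogy, not a test of RPM itself.

```python
import os
import tempfile
import threading

def cwd_change_in_thread_is_visible_to_main():
    """Change the cwd in a worker thread and report whether the main thread sees it."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        worker = threading.Thread(target=os.chdir, args=(tmp,))
        worker.start()
        worker.join()
        seen = os.getcwd()   # read from the main thread
        os.chdir(original)   # restore before the temporary directory is removed
        return os.path.realpath(seen) == os.path.realpath(tmp)
```

A chroot() issued from the packaging thread reaches the UI thread in the same way, which matches what was observed in the original bug.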
Is this really a blocker for f19 (dnf is still not used by anaconda in F19) or rhel7?
The original bug is reported against yum, because anaconda used yum for F18. I'm not sure why this one is reported against dnf.
Because in the future it will be relevant.
Ales, I just added this bug to our Fedora tracker, so that RTT is aware of it.
(removing fedora19rtt blocker, this bug is definitely not going to block F19 in any way)
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
Since I'm starting to work on a DNF payload for Anaconda I will be fixing this in some way sooner or later.
This can not be corrected on the RPM level; one of the reasons is that the package hooks expect to be chrooted when invoked.
After a brief discussion with the Packaging Tools Team today I am proposing to fix this in DNF: have a configuration option in DNF making it run the RPM transaction itself in a forked process. The main problem then is communicating the transaction progress back to the original process. We can not support the current callback interface fully, as that would require us to serialize and deserialize/re-build package header objects, which is not technically possible at the moment. The answer to this will be a simpler interface, yet with enough information so the client app (Anaconda) can report progress/failures.
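A minimal sketch of what such a simpler interface might look like, assuming progress is reduced to plain dictionaries sent as JSON lines over a pipe. None of this is existing DNF API: run_in_child, report and the event fields are hypothetical names chosen for the example.

```python
import json
import os

def run_in_child(transaction, on_event):
    """Run transaction(report) in a forked child and relay its events.

    The child is where a chroot would happen; the parent only receives
    plain dictionaries via on_event(). Returns True if the child exited
    with status 0.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                def report(**event):
                    # Only JSON-serializable data crosses the process boundary.
                    out.write(json.dumps(event) + "\n")
                    out.flush()
                transaction(report)
            status = 0
        finally:
            os._exit(status)
    os.close(write_fd)
    with os.fdopen(read_fd, "r") as events:
        for line in events:
            on_event(json.loads(line))
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0

def fake_transaction(report):
    """Stand-in for the real RPM transaction, reporting two installed packages."""
    packages = ["bash", "cracklib"]
    for done, name in enumerate(packages, start=1):
        report(action="install", package=name, done=done, total=len(packages))
```

The client would call run_in_child(fake_transaction, callback) and update its progress display from the dictionaries it receives; package header objects never have to be serialized.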
(In reply to Ales Kozumplik from comment #8)
> Since I'm starting to work on a DNF payload for Anaconda I will be fixing
> this in some way sooner or later.
Thanks a lot!
> This can not be corrected on the RPM level, one of the reasons is that the
> package hooks expect to be chrooted when invoked.
> After a brief discussion with the Packaging Tools Team today I am proposing
> to fix this in DNF: have a configuration option in DNF making it run the RPM
> transaction itself in a forked process. The main problem then is
> communicating the transaction progress back to the original process. We
> can not support the current callback interface fully, as that would require
> us to serialize and deserialize/re-build package header objects which is not
> technically possible at the moment. The answer to this will be a simpler
> interface, yet with enough information so the client app (Anaconda) can
> report progress/failures.
I think a simpler interface would be enough for the installer. To get an idea of what we need, please have a look at the script we use now to run yum in a separate process and the installation method that processes its output.
We would be really glad to remove these "pieces of magic" from our codebase.
We don't plan to work on this bug now. There will be huge changes across RPM.