Bug 509655

Summary: glibc 2.10.90-2 crashes, system unbootable
Product: [Fedora] Fedora Reporter: Pete Zaitcev <zaitcev>
Component: prelink Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: b1r63r, jakub, jnovy, nicolas.mailhot, panemade, rjones, schwab, scottt.tw, selinux, sg266, yaneti
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-12 20:39:40 UTC Type: ---
Bug Depends On:    
Bug Blocks: 473303    

Description Pete Zaitcev 2009-07-04 15:31:26 UTC
Description of problem:

After an update to 2.10.90-2, all applications crash on launch, so the system
became unbootable.

Version-Release number of selected component (if applicable):

glibc-2.10.90-2.x86_64

How reproducible:

Synchronous

Steps to Reproduce:
1. yum update
  
Actual results:

System unbootable, modprobe and bash crash

Expected results:

System working

Additional info:

Previous working release is 2.10.1-2.

Comment 1 Yanko Kaneti 2009-07-04 15:59:05 UTC
I had a similar catastrophe last night, and I am not sure exactly what to attribute it to. The computer had crashed while unattended at 3:45. I couldn't boot the same root with any kernel that had previously worked; it just hung after the "Switching root ..." messages.
I built a rawhide minimal root with exactly the same rpms that were in the nonfunctional one. The binaries and libs were different, which I can attribute to prelink, I guess. I just copied over all of /lib64 /bin /usr/bin /usr/lib64 from the minimal root to the broken one, and it booted fine.

I am thinking prelink did something horrible...

Comment 2 Pete Zaitcev 2009-07-04 16:40:19 UTC
Indeed it looks like prelink broke something. The first crash on my
affected system was this:

Jul  4 03:21:20 niphredil kernel: <6>find[5330]: segfault at 3c24b2b574 ip 0000003c24a12ff0 sp 00007fff74898770 error 4 in libselinux.so.1[3c24a00000+1c000]

The cron log is:

Jul  4 03:15:15 niphredil run-parts(/etc/cron.daily)[27646]: starting prelink
Jul  4 03:21:20 niphredil run-parts(/etc/cron.daily)[5333]: finished
Jul  4 03:21:20 niphredil run-parts(/etc/cron.daily)[27646]: starting
Jul  4 03:21:20 niphredil run-parts(/etc/cron.daily)[5339]: finished
Jul  4 03:21:20 niphredil run-parts(/etc/cron.daily)[27646]: starting
Jul  4 03:21:20 niphredil run-parts(/etc/cron.daily)[5345]: finished
Jul  4 03:21:20 niphredil anacron[27617]: Job `cron.daily' terminated (mailing output)

(Of course nothing was mailed, because of glibc blow-up)

Strangely, prelink wasn't updated in a while (prelink-0.4.0-7.fc11
is installed here).
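
The kernel segfault line quoted above can be decoded to see where inside the library the crash happened. The following is a minimal sketch, assuming the log format matches that one quoted line; the names SEGFAULT_RE and parse_segfault are hypothetical and not part of any tool mentioned in this bug.

```python
import re

# Matches kernel lines of the form quoted in this comment:
#   segfault at <addr> ip <ip> sp <sp> error <n> in <object>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return details of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    addr = int(m["addr"], 16)
    ip = int(m["ip"], 16)
    return {
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "ip_offset": ip - base,
        # Whether the address being accessed lies inside that same mapping.
        "fault_in_mapping": base <= addr < base + size,
        "error": int(m["err"]),
    }
```

For the line above, the faulting instruction sits at offset 0x12ff0 into the libselinux.so.1 mapping, while the address it tried to access lies outside that mapping.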

Comment 3 Yanko Kaneti 2009-07-04 16:44:30 UTC
I just replayed the scenario
- install a minimal f11 x86_64 (because rawhide anaconda is broken)
- update to rawhide, including the new glibc  (excluding the kernel, the 31 rc-s are kind of broken for me)
- reboot   everything is still working fine
- install prelink
- run prelink -va    and everything goes down the drain..
rebooting has the same effect as the one I found this morning

Comment 4 Pete Zaitcev 2009-07-04 16:58:39 UTC
Jakub e-mailed me to look at bug 509575. We probably want to close this as a dup.

Comment 5 Pete Zaitcev 2009-07-04 16:59:47 UTC
Wait, I take this back, not a dup. The bug 509575 is a build-stopper
for prelink.

Comment 6 Saikat Guha 2009-07-05 21:07:13 UTC
For anyone else burned by this one: boot into LiveCD, mount the root partition, and run prelink -ua * in the mount point. (chroot didn't work for me; may need to set LD_LIBRARY_PATH so dependencies are found)

Comment 7 Tom London 2009-07-05 23:53:22 UTC
So with prelink-0.4.1-1.fc12 (in koji), is it "safe" to update both glibc (and prelink) packages?

Comment 8 birger 2009-07-06 08:57:04 UTC
I tried to first run prelink -au, then yum update.
After the update the system was hosed, but I was still able to run prelink -au again from the same prompt that I used to upgrade. That was enough to fix it.

No, I have not tried to reboot just yet... ;-)

Comment 9 Yanko Kaneti 2009-07-06 10:23:56 UTC
(In reply to comment #7)
> So with prelink-0.4.1-1.fc12 (in koji), is it "safe" to update both glibc (and
> prelink) packages?  

Tried the new prelink with a glibc-2.10.90-2.x86_64 root (prelink -a) and everything works ok so far.

Comment 10 Pete Zaitcev 2009-07-06 16:54:36 UTC
OK, I'm closing it then.

Comment 11 Parag AN(पराग) 2009-07-07 11:14:32 UTC
So what steps should I follow? I cannot chroot /mnt/sysimage

Comment 12 Nicolas Mailhot 2009-07-09 07:48:58 UTC
So, there was another glibc update yesterday (glibc-2.10.90-3.x86_64.rpm) and this morning the system is dead as a brick

*AGAIN*

This is totally un-funny. At this rate there won't be a single rawhide user by the end of the month. The glibc push process needs to be seriously revisited.

Comment 13 Jakub Jelinek 2009-07-09 08:39:20 UTC
It is again not glibc's fault; apparently there is a bug in prelink's new code handling R_*_IRELATIVE relocations in binaries that are reprelinked (to hit it you need to prelink with one glibc, then prelink with another glibc without unprelinking in between).

As a workaround:
export LD_PRELOAD=libSegFault.so
should bring your system to a working state, or /usr/sbin/prelink -ua.

Comment 14 Jakub Jelinek 2009-07-09 09:21:08 UTC
Should be fixed in prelink-0.4.2-1.fc12.

Comment 15 Parag AN(पराग) 2009-07-09 10:11:38 UTC
(In reply to comment #14)
> Should be fixed in prelink-0.4.2-1.fc12.  

This is really not a good thing for rawhide users. I was forced to move back from rawhide to F-11 because of this bug, as no one cared to help me recover my rawhide system.

Comment 16 Tom London 2009-07-09 13:34:07 UTC
After updating to prelink-0.4.2-1.fc12, any need to run '/usr/sbin/prelink -ua'?  Or will it "just work"?

Comment 17 Jakub Jelinek 2009-07-09 13:43:27 UTC
After the prelink upgrade, the next /etc/cron.daily/prelink job will be a forced reprelink of everything.  I believe it should fix up even the crippled binaries, at least in my test chroot, starting from an unprelinked environment:
/usr/sbin/prelink.0.4.1-1 -avmRf
stuff still works (as expected; 0.4.1-1 has problems reprelinking binaries with R_*_IRELATIVE in the .gnu.conflict section)
/usr/sbin/prelink.0.4.1-1 -avmRf
(now after reprelinking everything segfaults).
/usr/sbin/prelink.0.4.2-1 -avmRf
(now everything works again).
That said, /usr/sbin/prelink -ua certainly can't hurt if you ever hit prelink issues (either had glibc-2.10.90-* installed with prelink-0.4.0-* or earlier, or
reprelinked something with prelink-0.4.1-1 while glibc-2.10.90-* was installed even during the first prelink).
For those who have rebooted, just booting a rescue image and running chroot /mnt/sysimage /usr/sbin/prelink -ua
should help.

Comment 18 Pete Zaitcev 2009-07-09 14:08:12 UTC
Gosh, the first thing I did was rpm -e prelink, and voila! no more issues.
The load times of Firefox are dominated by startup times of sqlite anyway.
We Rawhide users have seen worse. I do appreciate this being fixed quickly though.

Comment 19 Nicolas Mailhot 2009-07-09 17:11:50 UTC
(In reply to comment #17)

> For those who have rebooted, just booting a rescue and doing chroot
> /mnt/sysimage /usr/sbin/prelink -ua
> should help. 

Except, as others noted, this problem kills the rescue process: chroot will just fail if the target system is as thoroughly hosed as these bugs leave it.

Comment 20 Jakub Jelinek 2009-07-09 20:38:54 UTC
chroot /mnt/sysimage
is expected not to work, since that executes /bin/sh in the chroot, but
chroot /mnt/sysimage /usr/sbin/prelink -ua
should work (prelink is statically linked).
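
The distinction above (a dynamically linked /bin/sh fails inside the broken root, while a statically linked prelink still runs) can be checked from the rescue environment by looking for a PT_INTERP program header in the binary. This is a rough sketch only: has_interp is a hypothetical helper, and the absence of PT_INTERP is used as a heuristic for "does not need the dynamic loader", not as a definitive test.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interp(data: bytes) -> bool:
    """Return True if the ELF image asks for a dynamic loader (PT_INTERP)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        # ELF64: e_phoff at offset 32, e_phentsize/e_phnum at 54/56.
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        # ELF32: e_phoff at offset 28, e_phentsize/e_phnum at 42/44.
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    for i in range(phnum):
        # p_type is the first 4-byte field of every program header.
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Reading /mnt/sysimage/usr/sbin/prelink in binary mode and passing the bytes to has_interp would be expected to return False if prelink is statically linked as stated, while /mnt/sysimage/bin/sh would be expected to return True.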

Comment 21 Parag AN(पराग) 2009-07-10 04:27:45 UTC
This rescue-process-killing problem and its fix should be documented somewhere on fedorawiki. It is really not good to force rawhide users to move back to a stable release. I have been using rawhide for more than 2 years and this is the first time I have moved back. Also, I see jakub is the owner of both packages, gcc and prelink, so even if this is a prelink bug, jakub should have tested the new prelink before building it in rawhide, or should have untagged it from koji.
I have seen many packages break deps or stop the system from booting, but the rescue CD was always a working fix. This, though, is a real killer problem.

Comment 22 Richard W.M. Jones 2009-07-30 07:38:45 UTC
This happened to me with a glibc upgrade last night - can we *please*
remove the broken packages from Rawhide and any mirrors so no one
else suffers this.

Comment 23 Jakub Jelinek 2009-07-30 07:53:05 UTC
The broken packages haven't been in rawhide for weeks.

Comment 24 Jindrich Novy 2009-08-20 10:58:39 UTC
Happened to me as well when trying to upgrade to rawhide glibc from f11 to test some rawhide packages linked against the new glibc. I needed to reinstall the whole system from scratch because I was not aware of this bug.

Could someone add Conflicts: prelink < 0.4.2 to glibc so that it won't happen to anyone else?
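
The intent of such a Conflicts line can be illustrated with a small pre-update check. This is only a sketch: prelink_is_safe and parse_version are hypothetical names, and the plain numeric tuple comparison is a simplification that does not reproduce RPM's own version comparison rules.

```python
def parse_version(version):
    """Turn 'X.Y.Z' or 'X.Y.Z-release' into a tuple of ints, ignoring the release."""
    upstream = version.split("-")[0]
    return tuple(int(part) for part in upstream.split("."))


def prelink_is_safe(version, minimum="0.4.2"):
    """Return True if this prelink version is at least the fixed one (0.4.2)."""
    return parse_version(version) >= parse_version(minimum)
```

With the versions mentioned in this bug, prelink_is_safe("0.4.0-7.fc11") and prelink_is_safe("0.4.1-1.fc12") are False, and prelink_is_safe("0.4.2-1.fc12") is True.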