Bug 57570 - ext3/unmap kernel panic during glibc upgrade
Summary: ext3/unmap kernel panic during glibc upgrade
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-12-16 07:12 UTC by Alexandre Oliva
Modified: 2008-08-01 16:22 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:39:19 UTC
Embargoed:



Description Alexandre Oliva 2001-12-16 07:12:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.6) Gecko/20011120

Description of problem:
Out of 14 x86 boxes in the lab I run at the university, only one failed to
upgrade glibc: the rpm process died with a segmentation fault and
the machine became unusable from then on. ssh would no longer work, and
switching to vt1 wouldn't give me a new login prompt after pressing Enter.
It would still accept and echo input, but not do anything with it.
Eventually, I reset the machine (C-A-Del wouldn't even start a reboot, even
though it was not disabled).  All of the machines were running the same
kernel (2.4.9-13).  There are 5 other machines, identical to this one in
terms of hardware and almost identical in terms of installed software, on
which the glibc upgrade was seamless, but read on.

Since a similar problem had happened to me a while ago, on another machine,
now known to be defective, and the previous glibc had already been deleted
from that one, I thought rebooting and trying again was pointless.  So I
booted the sysadmin CD-ROM for Red Hat Linux 7.2 (the one that ships with
the European edition; I'm not sure about the kernel version that it runs),
rebuilt the rpm database and tried again.  No luck.  I got a register dump
from the kernel on the console, and the system became unusable again.  I
repeated the process one or two more times, with identical results.

So I booted the machine with the 2.4.9-13debug kernel, in single-user mode,
and tried again.  Same problem, but now I got a longer message in the
console. As in the previous crashes, it was an assertion failure within ext3
code, related to unmap, but I didn't take note of the message or the
stack trace, because I was so sure that the problem was reproducible and
that the message had made it to /var/log/messages.

Suspecting some filesystem inconsistency (since it was the ext3 code that
was crashing), I rebooted the same debug kernel, in single-user mode, and
requested fsck on boot.  No inconsistencies were encountered by fsck.  Hmm,
too bad, that was not it.

Then, I tried to update glibc and glibc-common again.  I'm not sure whether
it already worked this time, or whether I still got one more crash and then
it worked on the next try.  But the fact is that it eventually worked, and
I'm not sure the fsck had anything to do with it.

Unfortunately, when I looked for the message in /var/log/messages, it
wasn't there, which makes a lot of sense now: after the ext3 error message,
no updates were written to the fs any more.  The only thing I remember was
that there were a lot of ext3_-prefixed functions in the stack trace, and
that kdb didn't kick in: there was a segmentation fault and another stack
trace in its stead.  And now that I have successfully updated glibc, I can't
reproduce the problem any longer :-(

/var/log/messages has records of only one boot today, probably because the
other reboots' records were lost on the hard resets that preceded them, or
nothing was logged during the single-user boots :-(

I've looked for ASSERTs in the kernel sources, looking for a message that
resembled the one I saw, and only one of them seemed familiar, but I can't
swear that was the one I saw after the many crashes.  It's from ext3/balloc.c:
!ext3_test_bit(j, bh2jh(bh)->b_committed_data)

I'm pretty sure the assert message started with `!ext3'.  I remember the
complete message contained `unmap' too, but not in the assert message. 
Perhaps in the stack trace or right before?  I don't remember :-(
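
Read literally, the quoted expression asserts that bit `j` is not set in the buffer's `b_committed_data` bitmap. The following is a minimal user-space sketch of that kind of check, not the kernel code: the names `bit_is_set` and `assert_bit_clear`, and the choice of bit 0 as the least significant bit of byte 0, are assumptions made purely for illustration.

```python
def bit_is_set(nr: int, bitmap: bytes) -> bool:
    """Return True if bit number `nr` is set in `bitmap`.

    Assumption: bit 0 is the least significant bit of byte 0.
    """
    return bool(bitmap[nr // 8] & (1 << (nr % 8)))


def assert_bit_clear(nr: int, committed_data: bytes) -> None:
    """Hypothetical stand-in for an assertion of the form !test_bit(nr, data)."""
    if bit_is_set(nr, committed_data):
        raise AssertionError(f"bit {nr} is set in committed data")


if __name__ == "__main__":
    committed = bytes([0b00000101])  # bits 0 and 2 are set
    assert_bit_clear(1, committed)   # passes silently: bit 1 is clear
    try:
        assert_bit_clear(2, committed)
    except AssertionError as exc:
        print("assertion failed:", exc)
```

An assertion of this shape fails when the bit being examined is already marked in the bitmap, which is only a reading of the expression itself and says nothing about which code path triggered it here.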

Sorry that I kept neither the stack trace nor the additional debugging
messages :-(  Hope this helps, even if a little bit.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. rpm -U glibc-2.2.4-19.3.i686.rpm glibc-common-2.2.4-19.3.i386.rpm


Actual Results:  rpm crashes with a segmentation fault.  strace revealed a
lot of munmapping right before the crash.
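
If a saved strace log is available, the observation above can be quantified. The following is a minimal sketch, assuming the log is plain strace text with one call per line, optionally prefixed by a PID, and a signal line beginning with `--- SIGSEGV`. The function name `munmaps_before_segv`, the `window` parameter, and the sample lines are hypothetical and are not taken from the trace in this report.

```python
import re
from collections import deque
from typing import Iterable, Optional

# Optional PID prefix, as printed with "-f" ("[pid 123] ") or "-f -o file" ("123 ").
PREFIX = r"^(?:\[pid\s+\d+\]\s*|\d+\s+)?"
MUNMAP_RE = re.compile(PREFIX + r"munmap\(")
SEGV_RE = re.compile(PREFIX + r"--- SIGSEGV\b")


def munmaps_before_segv(lines: Iterable[str], window: int = 50) -> Optional[int]:
    """Count munmap calls among the last `window` lines before the first SIGSEGV.

    Returns None if the log contains no SIGSEGV line.
    """
    recent = deque(maxlen=window)  # keeps only the most recent lines
    for line in lines:
        if SEGV_RE.match(line):
            return sum(1 for seen in recent if MUNMAP_RE.match(seen))
        recent.append(line)
    return None


if __name__ == "__main__":
    # Made-up sample input for illustration only.
    sample = [
        'open("/lib/libc.so.6", O_RDONLY) = 3',
        "munmap(0x40017000, 4096) = 0",
        "munmap(0x40018000, 8192) = 0",
        "--- SIGSEGV (Segmentation fault) ---",
    ]
    print(munmaps_before_segv(sample))
```

Writing the trace to a file on a filesystem other than the one that fails (strace's `-o` option takes an output path) would give such a script something to read even if the affected filesystem stops accepting writes.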

Expected Results:  A seamless glibc upgrade.

Additional info:

The machine is a PIII 800MHz with 128MB of memory, running Red Hat Linux
7.2, previously with all errata published before glibc-2.2.4-19.3.  Today's
upgrades were glibc-2.2.4-19.3 (as well as -common, -profile, -devel and
nscd) and jadetex-3.11-4.  It's got a single 20GB disk, partitioned as
suggested by the Red Hat Linux 7.2 installer, for a custom installation of
all packages.

This might be a symptom of defective hardware, but the user of this machine
has never complained to me about any instability.  It may also be the
result of some filesystem inconsistency, but then, this would be a bug in
the ext3 code too.  I'll keep my eyes open.

Comment 1 R.K.Aa. 2001-12-19 05:38:44 UTC
P3/500, ASUS P2B-F, RH7.1, ext2, rest of partitions ext3, all Seagate IDE disks.

The upgrade to the new glibc errata ruined my system. Could not ls, su, anything
afterwards, everything coredumped.

Rebooting failed - spawning processes too fast at each runlevel and running out
of them so hard it wasn't even possible to ctrl+alt+del.

It was also impossible to boot into linux single. The whole installation was
effectively trashed; I had to reinstall from scratch on a new disk.
The files were downloaded from a 7.1 update mirror and upgraded manually.
Signatures were OK.

A following upgrade of all errata via up2date then went fine on a pure ext2
system, apart from up2date refusing to upgrade "filesystem" because it was "read
only" - it seemed it was looking for it on a CD??

The old disks are mountable and AOK after reinstall on another disk, but the
installation is broken; I can't boot from them. Nothing to see in the old logs.

Comment 2 Bugzilla owner 2004-09-30 15:39:19 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


