Bug 52891 - kernel 2.4.7-5 ext3 journaling assertion
Summary: kernel 2.4.7-5 ext3 journaling assertion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-08-30 15:51 UTC by Jay Turner
Modified: 2015-01-07 23:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-09-04 17:10:17 UTC
Embargoed:


Attachments
Oops output (recorded by-hand as best I could) (1015 bytes, text/plain)
2001-08-30 16:13 UTC, Glen Foster
ksymoops output from oops-data (8.17 KB, text/plain)
2001-08-30 16:19 UTC, Glen Foster
file MKJ wanted attached (56.31 KB, text/plain)
2001-08-30 16:38 UTC, Glen Foster
Fully decoded oops trace. (3.03 KB, text/plain)
2001-08-30 17:20 UTC, Stephen Tweedie
Correct, i686-based oops decode (2.89 KB, text/plain)
2001-08-30 17:32 UTC, Stephen Tweedie

Description Glen Foster 2001-08-30 15:51:28 UTC
Description of Problem:  Kernel assertion failure after 20+ hours of
mass-rebuild of SRPMS

Version-Release number of selected component (if applicable):
kernel 2.4.7-5 from RC2 candidate tree (re0828.2/i386)

How Reproducible:
Don't know

Steps to Reproduce:
1. Fresh install
2. Mass-srpm rebuild initiated (via TET, if it matters)
3. Another different TET instance was running another test at the same
time.

Actual Results:
I'll post a file with the raw oops data next, then ksymoops output, and
pertinent logs that MKJ says Stephen is gonna want.

Expected Results:


Additional Information:

Comment 1 Glen Foster 2001-08-30 16:13:38 UTC
Created attachment 30197
Oops output (recorded by-hand as best I could)

Comment 2 Glen Foster 2001-08-30 16:19:10 UTC
Created attachment 30198
ksymoops output from oops-data

Comment 3 Glen Foster 2001-08-30 16:38:01 UTC
Created attachment 30199
file MKJ wanted attached

Comment 4 Michael K. Johnson 2001-08-30 16:39:13 UTC
That would be /var/log/ksyms.1, which is the proper ksyms log file for
the boot that oopsed.

Comment 5 Stephen Tweedie 2001-08-30 16:59:01 UTC
Is there _anything_ else fs or driver related in the logs (/var/log/messages)? 
Is this repeatable?

Comment 6 Stephen Tweedie 2001-08-30 17:02:17 UTC
Is normal writeback journaling mode in use (or have you used any other
non-default ext3 options)?

Comment 7 Michael K. Johnson 2001-08-30 17:02:43 UTC
<mkj> sct: /var/log/messages has nothing useful for #52891 (no ide errors,
nothing ext3 but normal mount messages)

Comment 8 Glen Foster 2001-08-30 17:07:17 UTC
Dunno about repeatability. :-(  I don't *see* anything fs-related or
driver-related in /var/log/messages.  Do you want me to put a copy somewhere
for you to take a look?

Comment 9 Michael K. Johnson 2001-08-30 17:16:22 UTC
Filesystem was mounted with only default values.


Comment 10 Stephen Tweedie 2001-08-30 17:20:24 UTC
Created attachment 30200
Fully decoded oops trace.

Comment 11 Stephen Tweedie 2001-08-30 17:31:16 UTC
Never mind that last decode, it was assuming an athlon kernel (which I'd been
told) --- turns out that only an i686 kernel matches the symbols.  Re-decode
coming up.

Comment 12 Stephen Tweedie 2001-08-30 17:32:29 UTC
Created attachment 30201
Correct, i686-based oops decode

Comment 13 Glen Foster 2001-08-30 17:43:20 UTC
Oops, my bad, case of mistaken identity and sufficient short-fall of coffee.

Comment 14 Stephen Tweedie 2001-09-04 10:26:08 UTC
Found a possible cause for this.  It involves large symlinks (symlinks longer
than 60 characters), and is most likely to trigger when there is a high metadata
load on the system.  Mass rpm rebuilds are hence a likely trigger if there are
packages involved which use symlink trees during a build.

Will be coding a fix today.  The underlying cause is subtle but it looks fairly
simple (and safe) to cure.
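
For anyone who wants to exercise the large-symlink path on a test box, here is a minimal sketch of a symlink-tree generator.  It is not the reproducer used for this bug, and it does not generate the concurrent metadata load mentioned above.  The helper names (make_symlink_tree, count_long_symlinks) and the counts are made up for illustration, and the 60-character threshold is simply the figure quoted in this comment.

```python
import os
import tempfile

# Threshold quoted in the comment above: symlinks longer than this many
# characters are the "large" ones.  Assumption: treated here as a plain
# character count on the link target.
LARGE_SYMLINK_THRESHOLD = 60


def make_symlink_tree(root, count, target_len):
    """Create `count` symlinks under `root`, each with a target padded to
    `target_len` characters.  The targets do not need to exist: a dangling
    symlink still stores its full target string."""
    paths = []
    for i in range(count):
        target = ("t%04d_" % i).ljust(target_len, "x")
        link = os.path.join(root, "link%04d" % i)
        os.symlink(target, link)
        paths.append(link)
    return paths


def count_long_symlinks(paths):
    """Count the symlinks whose stored target exceeds the threshold."""
    return sum(
        1 for p in paths if len(os.readlink(p)) > LARGE_SYMLINK_THRESHOLD
    )


if __name__ == "__main__":
    # Hypothetical sizes; run this inside a directory on the ext3
    # filesystem under test to make it meaningful.
    with tempfile.TemporaryDirectory() as root:
        links = make_symlink_tree(root, 200, 100)
        print(count_long_symlinks(links), "of", len(links),
              "symlinks are longer than", LARGE_SYMLINK_THRESHOLD)
```

Note that tempfile normally puts the tree under /tmp, so pass dir= to TemporaryDirectory if /tmp is not on the filesystem you want to test.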

Comment 15 Arjan van de Ven 2001-09-04 17:10:11 UTC
fix is in 2.4.7-6.5 and later

Comment 16 Stephen Tweedie 2001-09-04 21:35:15 UTC
The fix cures the local reproducer I found for the large-symlink case.  If there
are any other routes to the same assert failure then we may need to reopen the
bug, but this looks like the most likely diagnosis for now, and if the diagnosis
is correct then it should now be fixed.

