Bug 52891 - kernel 2.4.7-5 ext3 journaling assertion
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: Stephen Tweedie
QA Contact: Brock Organ
Reported: 2001-08-30 11:51 EDT by Jay Turner
Modified: 2015-01-07 18:51 EST

Doc Type: Bug Fix
Last Closed: 2001-09-04 13:10:17 EDT

Attachments
Oops output (recorded by-hand as best I could) (1015 bytes, text/plain)
2001-08-30 12:13 EDT, Glen Foster
ksymoops output from oops-data (8.17 KB, text/plain)
2001-08-30 12:19 EDT, Glen Foster
file MKJ wanted attached (56.31 KB, text/plain)
2001-08-30 12:38 EDT, Glen Foster
Fully decoded oops trace. (3.03 KB, text/plain)
2001-08-30 13:20 EDT, Stephen Tweedie
Correct, i686-based oops decode (2.89 KB, text/plain)
2001-08-30 13:32 EDT, Stephen Tweedie

Description Glen Foster 2001-08-30 11:51:28 EDT
Description of Problem:  Kernel assertion failure after 20+ hours of
mass-rebuild of SRPMS

Version-Release number of selected component (if applicable):
kernel 2.4.7-5 from RC2 candidate tree (re0828.2/i386)

How Reproducible:
Don't know

Steps to Reproduce:
1. Fresh install
2. Mass-srpm rebuild initiated (via TET, if it matters)
3. Another different TET instance was running another test at the same
time.

Actual Results:
I'll post a file with the raw oops data next, then ksymoops output, and
pertinent logs that MKJ says Stephen is gonna want.

Expected Results:


Additional Information:
Comment 1 Glen Foster 2001-08-30 12:13:38 EDT
Created attachment 30197
Oops output (recorded by-hand as best I could)
Comment 2 Glen Foster 2001-08-30 12:19:10 EDT
Created attachment 30198
ksymoops output from oops-data
Comment 3 Glen Foster 2001-08-30 12:38:01 EDT
Created attachment 30199
file MKJ wanted attached
Comment 4 Michael K. Johnson 2001-08-30 12:39:13 EDT
That would be /var/log/ksyms.1, which is the proper ksyms log file for
the boot that oopsed.
Comment 5 Stephen Tweedie 2001-08-30 12:59:01 EDT
Is there _anything_ else fs or driver related in the logs (/var/log/messages)? 
Is this repeatable?
Comment 6 Stephen Tweedie 2001-08-30 13:02:17 EDT
Is normal writeback journaling mode in use (or have you used any other
non-default ext3 options)?
Comment 7 Michael K. Johnson 2001-08-30 13:02:43 EDT
<mkj> sct: /var/log/messages has nothing useful for #52891 (no ide errors,
nothing ext3 but normal mount messages)
Comment 8 Glen Foster 2001-08-30 13:07:17 EDT
Dunno about repeatability. :-(  I don't *see* anything fs-related or
driver-related in /var/log/messages.  Do you want me to put a copy somewhere
for you to take a look?
Comment 9 Michael K. Johnson 2001-08-30 13:16:22 EDT
Filesystem was mounted with only default values.
Comment 10 Stephen Tweedie 2001-08-30 13:20:24 EDT
Created attachment 30200
Fully decoded oops trace.
Comment 11 Stephen Tweedie 2001-08-30 13:31:16 EDT
Never mind that last decode, it was assuming an athlon kernel (which I'd been
told) --- turns out that only an i686 kernel matches the symbols.  Re-decode
coming up.
Comment 12 Stephen Tweedie 2001-08-30 13:32:29 EDT
Created attachment 30201
Correct, i686-based oops decode
Comment 13 Glen Foster 2001-08-30 13:43:20 EDT
Oops, my bad, case of mistaken identity and sufficient short-fall of coffee.
Comment 14 Stephen Tweedie 2001-09-04 06:26:08 EDT
Found a possible cause for this.  It involves large symlinks (symlinks longer
than 60 characters), and is most likely to trigger when there is a high metadata
load on the system.  Mass rpm rebuilds are hence a likely trigger if there are
packages involved which use symlink trees during a build.

Will be coding a fix today.  The underlying cause is subtle but it looks fairly
simple (and safe) to cure.
Comment 15 Arjan van de Ven 2001-09-04 13:10:11 EDT
fix is in 2.4.7-6.5 and later
Comment 16 Stephen Tweedie 2001-09-04 17:35:15 EDT
The fix cures the local reproducer I found for the large-symlink case.  If there
are any other routes to the same assert failure then we may need to reopen the
bug, but this looks like the most likely diagnosis for now, and if the diagnosis
is correct then it should now be fixed.
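
Editorial illustration (not part of the original comments): the thread does not include the local reproducer, so the following is only a rough sketch of the kind of workload comment 14 describes, namely many symlinks with targets longer than 60 characters being created and removed in quick succession. The function names, the 100-character target length, and the scratch directory are all assumptions made for illustration, and nothing here is claimed to trigger the assertion on any particular kernel.

```python
import os
import tempfile

# Threshold quoted in comment 14; the exact on-disk boundary is not verified here.
LONG_SYMLINK_THRESHOLD = 60


def make_long_symlinks(root, count, target_len=100):
    """Create `count` symlinks under `root` whose targets exceed the threshold.

    The targets need not exist; only the length of the link text matters.
    Returns the list of created link paths.
    """
    if target_len <= LONG_SYMLINK_THRESHOLD:
        raise ValueError("target_len must exceed the long-symlink threshold")
    links = []
    for i in range(count):
        prefix = f"/nonexistent/{i:06d}/"
        # Pad the (hypothetical) target path with filler up to target_len.
        target = prefix + "x" * (target_len - len(prefix))
        link = os.path.join(root, f"link-{i:06d}")
        os.symlink(target, link)
        links.append(link)
    return links


def churn(root, rounds=3, count=200):
    """Repeatedly create, read back, and delete long symlinks (metadata load)."""
    total = 0
    for _ in range(rounds):
        links = make_long_symlinks(root, count)
        for link in links:
            assert len(os.readlink(link)) > LONG_SYMLINK_THRESHOLD
            os.unlink(link)
        total += len(links)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(churn(scratch), "long symlinks created and removed")
```

To exercise a specific filesystem, `churn` would be pointed at a directory on that mount instead of the temporary directory; running several instances in parallel would come closer to the "high metadata load" mentioned above.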
