Description of Problem: Kernel assertion failure after 20+ hours of
mass-rebuild of SRPMS
Version-Release number of selected component (if applicable):
kernel 2.4.7-5 from RC2 candidate tree (re0828.2/i386)
Steps to Reproduce:
1. Fresh install
2. Mass-srpm rebuild initiated (via TET, if it matters)
3. Another different TET instance was running another test at the same
I'll post a file with the raw oops data next, then ksymoops output, and
pertinent logs that MKJ says Steven is gonna want.
Created attachment 30197 [details]
Oops output (recorded by-hand as best I could)
Created attachment 30198 [details]
ksymoops output from oops-data
Created attachment 30199 [details]
file MKJ wanted attached
That would be /var/log/ksyms.1, which is the proper ksyms log file for
the boot that oopsed.
Is there _anything_ else fs or driver related in the logs (/var/log/messages)?
Is this repeatable?
Is normal writeback journaling mode in use (or have you used any other
non-default ext3 options)?
<mkj> sct: /var/log/messages has nothing useful for #52891 (no ide errors,
nothing ext3 but normal mount messages)
Dunno about repeatablity. :-( I don't *see* anything fs-related or
driver-related in /var/log/messages. Do you want me to put a copy somewhere to
take a look?
Filesystem was mounted with only default values.
Created attachment 30200 [details]
Fully decoded oops trace.
Never mind that last decode, it was assuming an athlon kernel (which I'd been
told) --- turns out that only an i686 kernel matches the symbols. Re-decode
Created attachment 30201 [details]
Correct, i686-based oops decode
Oops, my bad, case of mistaken identity and sufficient short-fall of coffee.
Found a possible cause for this. It involves large symlinks (symlinks longer
than 60 characters), and is most likely to trigger when there is a high metadata
load on the system. Mass rpm rebuilds is hence a likely trigger if there are
packages involved which use symlink trees during a build.
Will be coding a fix today. The underlying cause is subtle but it looks fairly
simple (and safe) to cure.
fix is in 2.4.7-6.5 and later
The fix cures the local reproducer I found for the large-symlink case. If there
are any other routes to the same assert failure then we may need to reopen the
bug, but this looks like the most likely diagnosis for now, and if the diagnosis
is correct then it should now be fixed.