Description of Problem: Kernel assertion failure after 20+ hours of mass-rebuild of SRPMS Version-Release number of selected component (if applicable): kernel 2.4.7-5 from RC2 candidate tree (re0828.2/i386) How Reproducible: Don't know Steps to Reproduce: 1. Fresh install 2. Mass-srpm rebuild initiated (via TET, if it matters) 3. Another different TET instance was running another test at the same time. Actual Results: I'll post a file with the raw oops data next, then ksymoops output, and pertinent logs that MKJ says Steven is gonna want. Expected Results: Additional Information:
Created attachment 30197 [details] Oops output (recorded by-hand as best I could)
Created attachment 30198 [details] ksymoops output from oops-data
Created attachment 30199 [details] file MKJ wanted attached
That would be /var/log/ksyms.1, which is the proper ksyms log file for the boot that oopsed.
Is there _anything_ else fs or driver related in the logs (/var/log/messages)? Is this repeatable?
Is normal writeback journaling mode in use (or have you used any other non-default ext3 options)?
<mkj> sct: /var/log/messages has nothing useful for #52891 (no ide errors, nothing ext3 but normal mount messages)
Dunno about repeatablity. :-( I don't *see* anything fs-related or driver-related in /var/log/messages. Do you want me to put a copy somewhere to take a look?
Filesystem was mounted with only default values.
Created attachment 30200 [details] Fully decoded oops trace.
Never mind that last decode, it was assuming an athlon kernel (which I'd been told) --- turns out that only an i686 kernel matches the symbols. Re-decode coming up.
Created attachment 30201 [details] Correct, i686-based oops decode
Oops, my bad, case of mistaken identity and sufficient short-fall of coffee.
Found a possible cause for this. It involves large symlinks (symlinks longer than 60 characters), and is most likely to trigger when there is a high metadata load on the system. Mass rpm rebuilds is hence a likely trigger if there are packages involved which use symlink trees during a build. Will be coding a fix today. The underlying cause is subtle but it looks fairly simple (and safe) to cure.
fix is in 2.4.7-6.5 and later
The fix cures the local reproducer I found for the large-symlink case. If there are any other routes to the same assert failure then we may need to reopen the bug, but this looks like the most likely diagnosis for now, and if the diagnosis is correct then it should now be fixed.