From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
I'm experiencing kernel lockups with the errata kernels on 2 boxes that were previously stable running 2.4.18-10. The kernel seems to deadlock waiting on some IDE event, while the IDE chassis light is stuck in the "solid on" mode. The rest of the system is ok (an existing shell prompt still works, that is, until it needs to touch the disk).

The only similarities that I can see between the two boxes are:
1. Both are running software raid (one is raid5 over 3ware pseudo-scsi disks, the other one raid1 over ide pdc202xx). Probably unrelated, though.
2. Both have IDE system disks, although that's not a distinctive feature within my server farm, by any means.
3. Both run IO-intensive jobs at times, but again this is not a distinctive feature.

One of the machines (the more deadlock-happy of the two) is a Dell PIII/450, piix4 system disk, running the i686 kernel. The other one is a dual Athlon MP/1500+, amd760MP system disk, running the athlon-smp kernel.

I'll attach the sysrq-t (tasks) output from the time when the machines were deadlocked.

This has happened at least 5-6 times over the last month. Also, I don't see anything in the 2.4.18-19 kernel that would address this issue. Lastly, it's not reproducible at will.

Version-Release number of selected component (if applicable):

How reproducible:
Couldn't Reproduce

Additional info:
Created attachment 88945 [details] sysrq output on the PIII/450, first incident
Created attachment 88946 [details] sysrq output on the PIII/450, second incident
Created attachment 88947 [details] sysrq output on the PIII/450, third incident

As a side note, sysrq-r (Show Regs) threw the kernel into an infinite loop of printing out the stack trace. I had to push the reset button (a few hours later, on Christmas Eve, after driving in...) because sysrq had become unresponsive under the printk flood.
Created attachment 88948 [details] sysrq output on the dual athlon
One of the errata kernel releases seems to have fixed this, probably when it moved to 2.4.21pre. Anyway, RH73 is EOL and I can't reproduce this anymore, so there is no point in leaving this report open. I'm closing it with an ERRATA resolution.