One of my dual quad-core x86_64 servers is reproducibly breaking under load with every 2.6.27 kernel released for F9 so far. The last kernel this server runs reliably is kernel-2.6.26.6-79.fc9.x86_64. This keeps me from upgrading to F10.

Symptoms: The kernel boots normally and runs fine as long as it's not under load. After several minutes of a heavy compile-link duty cycle, processes start to hang. The box responds to pings, but stops accepting any more ssh connections. If a login session is active on the system console, bash reissues a prompt in response to empty commands, but attempting to run any command hangs the console and makes it unresponsive.

The load is disk I/O load -- this box has 4GB of RAM, and there's usually a couple of gigs free when it hangs. The SATA controller uses the sata_nv driver (sata_nv.ko), with a pair of SATA disks in a RAID-1 configuration.

On one occasion the system ended up in a partially-frozen state, and I was able to run 'ps' and see a lot of gcc processes stuck in "wait" state.

I managed to set up kexec, and after triggering this reproducible hang, successfully captured a 4GB kdump with alt-sysrq-c. This is what crash's "ps -u" comes back with:

...
11023   8863  3  ffff88010f845b80  IN  0.1   87672  2464  sh
11138  11002  2  ffff8801190cadc0  UN  0.0    3924   516  gcc
11182   8863  6  ffff880116db96e0  IN  0.1   87672  2460  sh
11208   8863  4  ffff88010f920000  IN  0.1   87668  2436  sh
11262  10652  2  ffff880073ce8000  UN  0.0    3924   512  gcc
11268  11023  2  ffff880073ce2dc0  UN  0.0    3924   508  gcc
11287  10668  2  ffff8800708f96e0  UN  0.0    3924   508  gcc
11330  11182  4  ffff88010f9c0000  IN  0.0    3924   480  gcc
11333  11330  4  ffff88010f46db80  IN  0.0    4100   600  gcc
11336  11333  1  ffff88010f468000  RU  0.0   18028  1532  cc1
11347  11208  1  ffff88010f8416e0  RU  0.0   87668  1436  sh
...

Looks like a whole bunch of processes in TASK_UNINTERRUPTIBLE.

14058   3262  2  ffff8800708fadc0  UN  0.0  128956  1472  crond

crond is nailed too :-)

I can make the 4GB vmcore (from 2.6.27.7-53.fc9.x86_64) available for download somewhere (my upstream bandwidth is 1mb/s), or I can run anything else in crash, and respond with the results. I don't really know much about kernel debugging -- just enough to run crash and type commands. I can also generate more vmcore dumps, if this one has nothing useful.
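
For anyone who wants to tally task states from crash's ps output instead of eyeballing it, here is a minimal sketch. It assumes the output was saved as text and that the columns come in the order seen in the rows above (PID, PPID, CPU, task address, state, %MEM, VSZ, RSS, command). The function names are made up for illustration and are not part of crash.

```python
import re
from collections import Counter

# One row of crash "ps" output, in the column order seen in this report.
# Assumption: crash may prefix the active task on each CPU with ">".
ROW = re.compile(
    r"^\s*>?\s*(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<cpu>\d+)\s+"
    r"(?P<task>[0-9a-f]{8,16})\s+(?P<state>[A-Z]{2})\s+"
    r"(?P<mem>\d+\.\d+)\s+(?P<vsz>\d+)\s+(?P<rss>\d+)\s+(?P<comm>.+?)\s*$"
)


def parse_ps(text):
    """Return one dict per recognizable ps row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def count_states(rows):
    """Count tasks per state code (RU, IN, UN, ...)."""
    return Counter(row["state"] for row in rows)


def blocked_commands(rows):
    """Count command names among tasks in UN (uninterruptible sleep)."""
    return Counter(row["comm"] for row in rows if row["state"] == "UN")


# The rows quoted in this report, including the crond one.
SAMPLE = """\
...
11023 8863 3 ffff88010f845b80 IN 0.1 87672 2464 sh
11138 11002 2 ffff8801190cadc0 UN 0.0 3924 516 gcc
11182 8863 6 ffff880116db96e0 IN 0.1 87672 2460 sh
11208 8863 4 ffff88010f920000 IN 0.1 87668 2436 sh
11262 10652 2 ffff880073ce8000 UN 0.0 3924 512 gcc
11268 11023 2 ffff880073ce2dc0 UN 0.0 3924 508 gcc
11287 10668 2 ffff8800708f96e0 UN 0.0 3924 508 gcc
11330 11182 4 ffff88010f9c0000 IN 0.0 3924 480 gcc
11333 11330 4 ffff88010f46db80 IN 0.0 4100 600 gcc
11336 11333 1 ffff88010f468000 RU 0.0 18028 1532 cc1
11347 11208 1 ffff88010f8416e0 RU 0.0 87668 1436 sh
...
14058 3262 2 ffff8800708fadc0 UN 0.0 128956 1472 crond
"""

if __name__ == "__main__":
    rows = parse_ps(SAMPLE)
    for state, count in count_states(rows).most_common():
        print(state, count)
    for comm, count in blocked_commands(rows).most_common():
        print("blocked:", comm, count)
```

Only a dozen rows are quoted here, so the tallies from this sample say nothing about the full dump; the same functions can be pointed at the complete saved ps output.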
This bug can no longer be reproduced in 2.6.27.12; presumably it has been fixed.