Description of problem:
Fedora 20 freezes after a random interval on one of seven machines.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Boot F20
2. Wait a while - a few minutes, a few hours
3. Observe keyboard, mouse, screen inoperative, but pings still work

Actual results:
F20 reliably freezes.

Expected results:
F19 reliably runs without issue. F20 should, too.

Additional info:
On one machine Fedora 20 freezes periodically. (F20 runs fine on six others.) When frozen, the keyboard, mouse and screen are completely dead and I cannot ssh into it from another machine, but I can ping it, so the IP stack, but nothing else, works.

Freezing occurs at random intervals of a few minutes to a couple of hours. Nothing relevant is recorded in /var/log/messages, at least nothing that I can see in the infinite and impenetrable morass of systemd detritus.

Rebooting by pressing Reset is the only possible recovery. Rebooting into Fedora 19 restores normal operation - it has been running perfectly for 6+ days.

I haven't a clue what's causing this, but am attributing it to systemd, since it has given us so many other wonderful new challenges, and it just "smells" right.

I have randomly tried some potential solutions, none of which worked:
- disabled root's crontab entry that produces clock chime sounds
- disabled xscreensaver
- stopped using vncserver/vncclient to access machine headlessly
- disabled two new systemd services that seem completely useless to me - dnf-makecache and dnf-makecache.timer.
- disabled NetworkManager and enabled network using static configuration via /etc/sysconfig/network-scripts/ifcfg-eth0 (as God intended)
- switched from a 1000 Mb/s ethernet card to a 100 Mb/s card.
- suspicious that nouveau might be culpable, I removed an ancient nVidia card that is no longer supported by any nVidia driver (GeForce 4 MX440) and replaced it with a slightly newer one (GeForce 6200 256 MB DDR) that is. The video immediately worked better, even with nouveau, but the freezes continued. Moreover, installing akmod-nvidia-173xx did not stop the freezes.

None of these had the slightest effect on stopping the freezes. Reverting to Fedora 19 fixes it completely.

Now what? What info, more useful than this, can I provide?

FWIW, this machine is
AMD Athlon(tm) Processor, 1000 MHz, 32 bits
MemTotal: 1156756 kB
/dev/sda: 120.0 GB, 1 partition
/dev/sdb: 160.0 GB, 6 partitions
running XFCE, no gdm or other display manager, RHGB removed
selinux disabled
fully yum updated

00:0b.0 Ethernet controller: ADMtek NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
00:0c.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation NV44A [GeForce 6200] (rev a1)
I cannot replicate this bug. After letting F19 run for about 6 days, I switched back to F20, intending to collect a pristine /var/log/messages and post it here after the expected freeze. It never happened. F20 has been running perfectly for several days. I don't know what changed, but I apologize for the noise, and withdraw this BZ. Sorry.
I rescind my apology. :-) I can now produce a freeze reliably and present some actual data.

If I induce the freeze *and* keep the attached CRT monitor unblanked, 23 sec after the freeze occurs the system spews error messages to the "console". Nothing meaningful ever goes to /var/log/messages or to the journal because, surprise, the system is frozen. I have photographed the "console" when it displays the first batch of messages and again after a few minutes to show that the soft lockup repeats every 23 seconds - forever, apparently.

The significant first lines, transcribed from the photo, read:

[ 500.055014] BUG: soft lockup - CPU#0 stuck for 23s! [rsync:2173]
[ 500.055014] CPU: 0 PID: 2173 Comm: rsync Tainted: PF 0 3.13.9-200.fc20.i686+PAE #1

Apparently rsync interacts badly with the kernel, because it is rsync, exclusively, that triggers the freeze.

In this scenario I have allowed the machine to boot up in multiuser mode (init 3) but never started any graphical interface. The console still waits for me to log in. On another machine I start 'gkrellm -s datbird &' just to see when the freeze occurs. Then on that other machine I start a backup script that relies on autofs, nfs, ssh, find, and rsync. All goes well until rsync starts to transfer files - then the freeze.

If I don't run the rsync backup script the machine runs perfectly all day long. Unfortunately, its role in life is to maintain a backup image of my main server, which gets updated at 11:30PM every night.

Freezes are a unique feature of F20; if I reboot to F19 no freezes occur.

I see a large number of other BZs reporting a similar 23s soft lockup; most don't mention rsync. However,

Bug 1081470 - soft lockup - CPU#1 stuck for 23s! [rsync:17305]

does, and may be related.

A Debian user reported similar behaviour as long ago as 2012-11-30:
http://forums.debian.net/viewtopic.php?f=10&t=89166
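For anyone comparing this against the other 23s soft lockup reports, here is a minimal sketch of pulling the CPU, duration, process name and PID out of a soft lockup line. It assumes the console text has already been transcribed or captured to a string; the function name parse_soft_lockups and the regular expression are illustrative only, modelled on the one line quoted above, and may not match every kernel's wording.

```python
import re

# Assumed pattern, modelled on the single soft lockup line quoted in this report.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft lockup line found in the given console text."""
    events = []
    for m in SOFT_LOCKUP.finditer(text):
        events.append({
            "timestamp": float(m.group("ts")),      # seconds since boot
            "cpu": int(m.group("cpu")),             # CPU that was stuck
            "stuck_seconds": int(m.group("secs")),  # reported stall duration
            "comm": m.group("comm"),                # process name, e.g. rsync
            "pid": int(m.group("pid")),
        })
    return events


# The two lines transcribed from the console photo.
sample = (
    "[ 500.055014] BUG: soft lockup - CPU#0 stuck for 23s! [rsync:2173]\n"
    "[ 500.055014] CPU: 0 PID: 2173 Comm: rsync Tainted: PF 0 "
    "3.13.9-200.fc20.i686+PAE #1\n"
)

events = parse_soft_lockups(sample)
```

Grouping the resulting entries by "comm" would show at a glance whether rsync is the only process ever named in the lockups.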
Created attachment 890103 [details] Console photo of first error message
Created attachment 890104 [details] Console photo of later error messages
Apparently this bug has been exterminated. Thanks to all. After finding that the freezing was 100% correlated with rsync running in a backup script, I redirected the backup to another machine. This stopped the freezing. Recently I restored the backup to the original configuration. No FREEZE! Sometime between then and now a kernel update seems to have fixed the problem. Good work!
Thank you for letting us know.