Bug 1088622 - Fedora 20 freezes; Fedora 19 runs correctly
Summary: Fedora 20 freezes; Fedora 19 runs correctly
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: i686
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-04-16 21:39 UTC by David A. De Graaf
Modified: 2014-06-30 16:58 UTC
CC List: 38 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-30 16:58:27 UTC
Type: Bug
Embargoed:


Attachments
Console photo of first error message (2.03 MB, image/jpeg)
2014-04-26 19:54 UTC, David A. De Graaf
Console photo of later error messages (2.31 MB, image/jpeg)
2014-04-26 19:56 UTC, David A. De Graaf

Description David A. De Graaf 2014-04-16 21:39:32 UTC
Description of problem:
Fedora 20 freezes after a random interval on one of seven machines.


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1.  Boot F20
2.  Wait a while - a few minutes, a few hours
3.  Observe keyboard, mouse, screen inoperative, but pings still work

Actual results:
F20 reliably freezes.

Expected results:
F19 reliably runs without issue.  F20 should, too.

Additional info:
On one machine Fedora 20 freezes periodically.  (F20 runs fine on six
others.)  When frozen, the keyboard, mouse and screen are completely
dead, I cannot ssh into it from another machine, but I can ping it,
so the kernel still answers ICMP, but nothing else works.  Freezing
occurs at random intervals of a few minutes to a couple of hours.
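
For anyone who wants to tell this state (pingable, but no ssh) apart from a plain network outage, here is a minimal Python sketch that could be run from another machine.  It is an illustration only, not something used in this report: the function names and the "frozen" label are my own, the ping flags assume Linux iputils, and the banner check assumes sshd listens on port 22.  It reads the SSH banner rather than only opening the port, because the kernel may complete a TCP handshake even when sshd itself never gets to run.

```python
import socket
import subprocess


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Return True if one ICMP echo request is answered (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ssh_banner_ok(host: str, port: int = 22, timeout_s: float = 5.0) -> bool:
    """Return True only if the server sends the start of an "SSH-" banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.settimeout(timeout_s)
            data = b""
            # Read until we have the first four bytes or the peer closes.
            while len(data) < 4:
                chunk = sock.recv(4 - len(data))
                if not chunk:
                    break
                data += chunk
            return data == b"SSH-"
    except OSError:  # includes timeouts and refused connections
        return False


def classify(ping_answered: bool, ssh_answered: bool) -> str:
    """Map the two probe results to a coarse state (labels are hypothetical)."""
    if ssh_answered:
        return "up"        # user space responds, whatever ICMP does
    if ping_answered:
        return "frozen"    # kernel answers ICMP, sshd does not respond
    return "down"          # nothing answers at all
```

A call such as `classify(ping_ok("backup-host"), ssh_banner_ok("backup-host"))`, with `backup-host` standing in for the affected machine, would return "frozen" for the symptoms described above.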

Nothing relevant is recorded in /var/log/messages, at least nothing that
I can see in the infinite and impenetrable morass of systemd detritus.
Rebooting by pressing Reset is the only possible recovery.

Rebooting into Fedora 19 restores normal operation - it has been
running perfectly for 6+ days.

I haven't a clue what's causing this, but am attributing it to
systemd, since it has given us so many other wonderful new challenges,
and it just "smells" right.

I have randomly tried some potential solutions, none of which worked:

- disabled root's crontab entry that produces clock chime sounds

- disabled xscreensaver

- stopped using vncserver/vncclient to access machine headlessly

- disabled two new systemd services that seem completely useless to me
  - dnf-makecache and dnf-makecache.timer.

- disabled NetworkManager and enabled network using static
  configuration via /etc/sysconfig/network-scripts/ifcfg-eth0
  (as God intended)

- switched from a 1000 Mb/s ethernet card to a 100 Mb/s card.

- suspicious that nouveau might be culpable, I removed an ancient nVidia
  card that is no longer supported by any nVidia driver (GeForce 4 MX440)
  and replaced it with a slightly newer one (GeForce 6200 256 MB DDR)
  that is.  The video immediately worked better, even with nouveau,
  but the freezes continued.  Moreover, installing akmod-nvidia-173xx
  did not stop the freezes.

None of these had the slightest effect on stopping the freezes.
Reverting to Fedora 19 fixes it completely.

Now what?
What info, more useful than this, can I provide?

FWIW, this machine is
  AMD Athlon(tm) Processor, 1000 MHz, 32 bits
  MemTotal:        1156756 kB
  /dev/sda: 120.0 GB,   1 partition
  /dev/sdb: 160.0 GB,   6 partitions
  running XFCE, no gdm or other display manager, RHGB removed
  selinux disabled
  fully yum updated

  00:0b.0 Ethernet controller: ADMtek NC100 Network Everywhere Fast
    Ethernet 10/100 (rev 11)
  00:0c.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900
    PCI Fast Ethernet (rev 02)
  01:00.0 VGA compatible controller: NVIDIA Corporation NV44A [GeForce
    6200] (rev a1)

Comment 1 David A. De Graaf 2014-04-22 17:03:42 UTC
I cannot replicate this bug.

After letting F19 run for about 6 days, I switched back to F20, intending to collect a pristine /var/log/messages and post it here after the expected freeze.

It never happened.  F20 has been running perfectly for several days.  I don't know what changed, but I apologize for the noise, and withdraw this BZ.  Sorry.

Comment 2 David A. De Graaf 2014-04-26 19:51:12 UTC
I rescind my apology.   :-)

I can now produce a freeze reliably and present some actual data.
If I induce the freeze *and* keep the attached CRT monitor unblanked,
23 sec after the freeze occurs, the system spews error messages to the
"console".  Nothing meaningful ever goes to /var/log/messages or to the
journal because, surprise, the system is frozen.

I have photographed the "console" when it displays the first batch of
messages and again after a few minutes to show that the soft lockup
repeats every 23 seconds - forever, apparently.  The significant
first lines, transcribed from the photo, read:

  [  500.055014] BUG: soft lockup - CPU#0 stuck for 23s! [rsync:2173]
  [  500.055014] CPU: 0 PID: 2173 Comm: rsync Tainted: PF   0
     3.13.9-200.fc20.i686+PAE #1
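
When comparing against the other soft lockup reports, it can help to pull the fields out of the first line mechanically.  The following Python sketch does that for the line format shown above; the pattern is inferred from this one transcribed message, so treat it as an assumption rather than a description of every kernel version's output.  The names `SoftLockup` and `parse_soft_lockup` are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    uptime: float        # seconds since boot, from the bracketed timestamp
    cpu: int             # CPU number reported as stuck
    stuck_seconds: int   # duration reported by the watchdog
    comm: str            # name of the task that was running
    pid: int             # PID of that task


# Greedy ".+" for the task name so names containing ":" still parse;
# the final ":<digits>]" is taken as the PID.
_PATTERN = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup report."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return SoftLockup(
        uptime=float(m["uptime"]),
        cpu=int(m["cpu"]),
        stuck_seconds=int(m["secs"]),
        comm=m["comm"],
        pid=int(m["pid"]),
    )
```

Fed the first line above, this yields CPU 0, 23 seconds, task rsync, PID 2173, which makes it easy to check whether other reports name the same task.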

Apparently rsync interacts badly with the kernel because it is rsync,
exclusively, that triggers the freeze.  In this scenario I have
allowed the machine to boot up in multiuser mode (init 3) but never
started any graphical interface.  The console still waits for me to
log in.  On another machine I start 'gkrellm -s datbird &' just to
see when the freeze occurs.  Then on that other machine I start a
backup script that relies on autofs, nfs, ssh, find, and rsync.
All goes well until rsync starts to transfer files - then the freeze.

If I don't run the rsync backup script the machine runs perfectly all
day long.  Unfortunately, its role in life is to maintain a backup
image of my main server, which gets updated at 11:30PM every night.

Freezes are a unique feature of F20; if I reboot to F19 no freezes
occur.  I see a large number of other BZs reporting a similar 23s
soft lockup; most don't mention rsync.  However, Bug 1081470 - soft
lockup - CPU#1 stuck for 23s! [rsync:17305] does, and may be related.
A Debian user reports similar behaviour as long ago as 2012-11-30:
  http://forums.debian.net/viewtopic.php?f=10&t=89166

Comment 3 David A. De Graaf 2014-04-26 19:54:42 UTC
Created attachment 890103
Console photo of first error message

Comment 4 David A. De Graaf 2014-04-26 19:56:26 UTC
Created attachment 890104
Console photo of later error messages

Comment 5 David A. De Graaf 2014-06-30 16:03:56 UTC
Apparently this bug has been exterminated.
Thanks to all.

After finding that the freezing was 100% correlated with rsync running in a
backup script, I redirected the backup to another machine.  This stopped the 
freezing.  Recently I restored the backup to the original configuration.
No FREEZE!  Sometime between then and now a kernel update seems to have
fixed the problem.

Good work!

Comment 6 Josh Boyer 2014-06-30 16:58:27 UTC
Thank you for letting us know.

