Bug 832583 - kernel-3.4.0-1.fc17.x86_64 keyboard and network lockups
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2012-06-15 16:26 EDT by Wolfgang Denk
Modified: 2013-03-13 12:59 EDT

Doc Type: Bug Fix
Last Closed: 2013-03-13 12:59:36 EDT
Type: Bug

Attachments
Boot log and lspci output of system 1 (16.51 KB, application/x-gzip)
2012-06-16 03:47 EDT, Wolfgang Denk
Boot log and lspci output of system 2 (15.96 KB, application/x-gzip)
2012-06-16 03:48 EDT, Wolfgang Denk
Boot log with 3.4.4-3 kernel (19.10 KB, application/octet-stream)
2012-07-03 02:58 EDT, Wolfgang Denk
Cut of the relevant part of messages (5.62 KB, application/octet-stream)
2012-07-25 11:57 EDT, Mauricio Silveira
My LSPCI (2.05 KB, application/octet-stream)
2012-07-25 11:58 EDT, Mauricio Silveira

Description Wolfgang Denk 2012-06-15 16:26:50 EDT
Description of problem:

With the 3.3.4-5 kernel, I see similar problems occurring on a number
of systems.

Problem 1: The PS/2 keyboard suddenly goes dead.  I found no way to revive
	it other than a reboot.  When I plug in a USB keyboard I can
	continue to use that, but I also had a case where the USB
	keyboard got stuck and didn't get unstuck even on disconnect
	/ reconnect.  There are absolutely no related messages in the
	system logs or on the console.

Problem 2: The Ethernet interface goes dead for a number of seconds,
	then comes up again, reporting a "link up" event.  In the
	meantime, mounted NFS file systems report errors.  Example log:

[78495.152019] nfs: server castor not responding, timed out
[78497.960032] nfs: server castor not responding, timed out
[78498.725161] r8169 0000:04:00.0: p20p1: link up
[78504.536033] nfs: server castor not responding, timed out
[78508.744052] nfs: server castor not responding, timed out
[78510.725151] r8169 0000:04:00.0: p20p1: link up
[81541.184081] nfs: server castor not responding, timed out
[81543.992096] nfs: server castor not responding, timed out
[81546.725160] r8169 0000:04:00.0: p20p1: link up
[81800.840031] nfs: server castor not responding, timed out
[81804.725167] r8169 0000:04:00.0: p20p1: link up
[81805.048042] nfs: server castor not responding, timed out
[81809.256033] nfs: server castor not responding, timed out
[81816.725156] r8169 0000:04:00.0: p20p1: link up
[81818.128045] nfs: server castor not responding, timed out
[83411.928031] nfs: server castor not responding, timed out
[83412.725167] r8169 0000:04:00.0: p20p1: link up
[86016.048088] nfs: server castor not responding, timed out
[86018.856031] nfs: server castor not responding, timed out
[86021.664037] nfs: server castor not responding, timed out
[86022.725159] r8169 0000:04:00.0: p20p1: link up
[88090.984022] nfs: server castor not responding, timed out
[88093.792022] nfs: server castor not responding, timed out
[88096.600023] nfs: server castor not responding, timed out
[88098.725149] r8169 0000:04:00.0: p20p1: link up
[88700.192021] nfs: server castor not responding, timed out
[88703.000032] nfs: server castor not responding, timed out
[88704.725162] r8169 0000:04:00.0: p20p1: link up
[90249.192026] nfs: server castor not responding, timed out
[90252.725154] r8169 0000:04:00.0: p20p1: link up
[90513.960089] nfs: server castor not responding, timed out
[90516.725161] r8169 0000:04:00.0: p20p1: link up
[90516.768031] nfs: server castor not responding, timed out
[90519.576032] nfs: server castor not responding, timed out
[90522.384031] nfs: server castor not responding, timed out
[90525.192047] nfs: server castor not responding, timed out
[90528.725155] r8169 0000:04:00.0: p20p1: link up
[90773.960022] nfs: server castor not responding, timed out
[90776.768019] nfs: server castor not responding, timed out
[90779.576020] nfs: server castor not responding, timed out
[90780.725149] r8169 0000:04:00.0: p20p1: link up
[91276.168046] nfs: server castor not responding, timed out
[91278.725164] r8169 0000:04:00.0: p20p1: link up
[91281.532036] nfs: server castor not responding, timed out
[91472.392032] nfs: server castor not responding, timed out
[91475.200033] nfs: server castor not responding, timed out
[91476.725162] r8169 0000:04:00.0: p20p1: link up
[92291.184030] nfs: server castor not responding, timed out
[92292.725163] r8169 0000:04:00.0: p20p1: link up
[92470.312075] nfs: server castor not responding, timed out
[92472.724154] r8169 0000:04:00.0: p20p1: link up
[92818.032017] nfs: server castor not responding, timed out
[92819.436021] nfs: server castor not responding, timed out
[92820.725151] r8169 0000:04:00.0: p20p1: link up
[93959.344023] nfs: server castor not responding, timed out
[93962.152048] nfs: server castor not responding, timed out
[93964.960071] nfs: server castor not responding, timed out
[93966.725160] r8169 0000:04:00.0: p20p1: link up
[94422.208030] nfs: server castor not responding, timed out
[94422.725173] r8169 0000:04:00.0: p20p1: link up
[94424.136022] nfs: server castor not responding, timed out
[94519.592028] nfs: server castor not responding, timed out
[94522.400032] nfs: server castor not responding, timed out
[94524.725163] r8169 0000:04:00.0: p20p1: link up
[95066.232032] nfs: server castor not responding, timed out
[95069.040038] nfs: server castor not responding, timed out
[95070.725177] r8169 0000:04:00.0: p20p1: link up
[95071.848029] nfs: server castor not responding, timed out
[95074.656026] nfs: server castor not responding, timed out
[95077.464043] nfs: server castor not responding, timed out
[95082.725156] r8169 0000:04:00.0: p20p1: link up
[96816.056052] nfs: server castor not responding, timed out
[96818.864046] nfs: server castor not responding, timed out
[96821.672037] nfs: server castor not responding, timed out
[96822.725161] r8169 0000:04:00.0: p20p1: link up
[96824.480035] nfs: server castor not responding, timed out
[96827.288044] nfs: server castor not responding, timed out
[96834.072028] nfs: server castor not responding, timed out
[96834.725156] r8169 0000:04:00.0: p20p1: link up
[106594.288031] nfs: server castor not responding, timed out
[106596.725162] r8169 0000:04:00.0: p20p1: link up
[106716.072058] nfs: server castor not responding, timed out
[106718.880032] nfs: server castor not responding, timed out
[106721.688034] nfs: server castor not responding, timed out
[106722.725160] r8169 0000:04:00.0: p20p1: link up
[112116.725157] r8169 0000:04:00.0: p20p1: link up
[112140.200022] nfs: server castor not responding, timed out
[112143.008030] nfs: server castor not responding, timed out
[112145.816031] nfs: server castor not responding, timed out
[112146.725163] r8169 0000:04:00.0: p20p1: link up
[112148.624031] nfs: server castor not responding, timed out
[112151.432034] nfs: server castor not responding, timed out
[112158.232032] nfs: server castor not responding, timed out
[112158.725154] r8169 0000:04:00.0: p20p1: link up
[117507.472038] nfs: server castor not responding, timed out
[117510.725153] r8169 0000:04:00.0: p20p1: link up

etc.
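
For anyone who wants to quantify a log like the one above, here is a minimal Python sketch that counts the NFS timeouts and "link up" events and measures the gaps between consecutive link-up messages. The function names (`classify`, `parse_events`, `link_up_gaps`) and the event labels are made up for illustration; the sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

# Matches dmesg-style lines such as
# "[78498.725161] r8169 0000:04:00.0: p20p1: link up"
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def classify(msg):
    """Return a coarse, illustrative label for a kernel message."""
    if msg.startswith("nfs:") and "not responding" in msg:
        return "nfs_timeout"
    if msg.endswith("link up"):
        return "link_up"
    return "other"


def parse_events(lines):
    """Return (timestamp_seconds, label) pairs for lines that match LINE_RE."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group("ts")), classify(match.group("msg"))))
    return events


def link_up_gaps(events):
    """Return the gaps in seconds between consecutive link-up events."""
    stamps = [ts for ts, label in events if label == "link_up"]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# First six lines of the log excerpt above.
sample = [
    "[78495.152019] nfs: server castor not responding, timed out",
    "[78497.960032] nfs: server castor not responding, timed out",
    "[78498.725161] r8169 0000:04:00.0: p20p1: link up",
    "[78504.536033] nfs: server castor not responding, timed out",
    "[78508.744052] nfs: server castor not responding, timed out",
    "[78510.725151] r8169 0000:04:00.0: p20p1: link up",
]

events = parse_events(sample)
print(Counter(label for _, label in events))
print(link_up_gaps(events))
```

Feeding it the full `dmesg` output instead of the sample should make it easier to compare how often the link drops on different kernel versions.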

On another system:

Version-Release number of selected component (if applicable):

kernel-3.3.4-5.fc17.x86_64

How reproducible:

The keyboard problem happened once each on 2 systems, and 5 times in 2
days so far on a third one.

The network issue is more or less permanent - see log above.

Steps to Reproduce:
1. boot a system with  kernel-3.3.4-5.fc17.x86_64 and use it for a while
  
Actual results:

See logs above.

Expected results:

No problems :-)

Additional info:

It appears all problems go away when I downgrade to kernel version
3.3.4-5.fc17.x86_64
Comment 1 Josh Boyer 2012-06-15 16:56:11 EDT
(In reply to comment #0)
> Description of problem:
> 
> With the 3.3.4-5 kernel, I see similar problems occurring on a number
> of systems.


<snip>

> On another system:
> 
> Version-Release number of selected component (if applicable):
> 
> kernel-3.3.4-5.fc17.x86_64

<snip>

> Steps to Reproduce:
> 1. boot a system with  kernel-3.3.4-5.fc17.x86_64 and use it for a while

<snip>

> Additional info:
> 
> It appears all problems go away when I downgrade to kernel version
> 3.3.4-5.fc17.x86_64

So you've told us that 3.3.4-5.fc17.x86_64 doesn't work, then you tell us it works.  Confused.
Comment 2 Wolfgang Denk 2012-06-15 17:10:38 EDT
(In reply to comment #1)
>
> So you've told us that 3.3.4-5.fc17.x86_64 doesn't work, then you tell us it
> works.  Confused.

Argh... silly me.

It is kernel-3.4.0-1.fc17.x86_64  that has the problems, and
3.3.4-5.fc17.x86_64 appears to be fine.   Sorry.
Comment 3 Josh Boyer 2012-06-15 19:29:24 EDT
Out of curiosity, can you attach the dmesg from a boot with 3.4.0-1?  I'd like to see if you have any ASM108x devices in those machines.
Comment 4 Wolfgang Denk 2012-06-16 03:46:35 EDT
(In reply to comment #3)
> Out of curiosity, can you attach the dmesg from a boot with 3.4.0-1?  I'd
> like to see if you have any ASM108x devices in those machines.

See attachments.  I also included the lspci output.
Comment 5 Wolfgang Denk 2012-06-16 03:47:40 EDT
Created attachment 592287 [details]
Boot log and lspci output of system 1
Comment 6 Wolfgang Denk 2012-06-16 03:48:30 EDT
Created attachment 592288 [details]
Boot log and lspci output of system 2
Comment 7 Wolfgang Denk 2012-06-18 06:22:41 EDT
The same problems are still present with kernel version 3.4.2-4.fc17.x86_64

I had the same keyboard lockup twice, and the network issue still exists, too - now with a bit of additional information in one case:

Jun 18 11:43:50 nyx kernel: [ 1738.248031] nfs: server castor not responding, timed out
Jun 18 11:43:53 nyx kernel: [ 1741.712018] ------------[ cut here ]------------
Jun 18 11:43:53 nyx kernel: [ 1741.712029] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x250/0x260()
Jun 18 11:43:53 nyx kernel: [ 1741.712033] Hardware name: P35-DS3R
Jun 18 11:43:53 nyx kernel: [ 1741.712036] NETDEV WATCHDOG: p20p1 (r8169): transmit queue 0 timed out
Jun 18 11:43:53 nyx kernel: [ 1741.712038] Modules linked in: fuse nfs fscache ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM bridge stp llc xfs nouveau snd_hda_codec_realtek snd_hda_intel mxm_wmi wmi video snd_hda_codec i2c_algo_bit ttm snd_hwdep snd_pcm drm_kms_helper snd_page_alloc drm snd_timer snd coretemp osst st r8169 iTCO_wdt microcode iTCO_vendor_support ch i2c_i801 i2c_core mii soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc uinput binfmt_misc raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor sym53c8xx ata_generic pata_acpi async_tx scsi_transport_spi pata_jmicron [last unloaded: iptable_mangle]
Jun 18 11:43:53 nyx kernel: [ 1741.712114] Pid: 0, comm: swapper/0 Not tainted 3.4.2-4.fc17.x86_64 #1
Jun 18 11:43:53 nyx kernel: [ 1741.712117] Call Trace:
Jun 18 11:43:53 nyx kernel: [ 1741.712119]  <IRQ>  [<ffffffff8105680f>] warn_slowpath_common+0x7f/0xc0
Jun 18 11:43:53 nyx kernel: [ 1741.712130]  [<ffffffff81056906>] warn_slowpath_fmt+0x46/0x50
Jun 18 11:43:53 nyx kernel: [ 1741.712135]  [<ffffffff8108661c>] ? ttwu_do_wakeup+0x2c/0xf0
Jun 18 11:43:53 nyx kernel: [ 1741.712140]  [<ffffffff815017c0>] dev_watchdog+0x250/0x260
Jun 18 11:43:53 nyx kernel: [ 1741.712144]  [<ffffffff81501570>] ? dev_deactivate_queue.constprop.30+0x80/0x80
Jun 18 11:43:53 nyx kernel: [ 1741.712150]  [<ffffffff810659b1>] run_timer_softirq+0x141/0x340
Jun 18 11:43:53 nyx kernel: [ 1741.712154]  [<ffffffff8105dbb0>] __do_softirq+0xc0/0x1e0
Jun 18 11:43:53 nyx kernel: [ 1741.712160]  [<ffffffff815f9cdc>] call_softirq+0x1c/0x30
Jun 18 11:43:53 nyx kernel: [ 1741.712164]  [<ffffffff810151f5>] do_softirq+0x75/0xb0
Jun 18 11:43:53 nyx kernel: [ 1741.712168]  [<ffffffff8105df85>] irq_exit+0xb5/0xc0
Jun 18 11:43:53 nyx kernel: [ 1741.712172]  [<ffffffff815fa61e>] smp_apic_timer_interrupt+0x6e/0x99
Jun 18 11:43:53 nyx kernel: [ 1741.712177]  [<ffffffff815f938a>] apic_timer_interrupt+0x6a/0x70
Jun 18 11:43:53 nyx kernel: [ 1741.712179]  <EOI>  [<ffffffff8101bad2>] ? mwait_idle+0x92/0x1e0
Jun 18 11:43:53 nyx kernel: [ 1741.712187]  [<ffffffff8101c50e>] cpu_idle+0xfe/0x120
Jun 18 11:43:53 nyx kernel: [ 1741.712191]  [<ffffffff815cda5e>] rest_init+0x72/0x74
Jun 18 11:43:53 nyx kernel: [ 1741.712197]  [<ffffffff81cf4c1a>] start_kernel+0x3b7/0x3c4
Jun 18 11:43:53 nyx kernel: [ 1741.712201]  [<ffffffff81cf4662>] ? repair_env_string+0x5e/0x5e
Jun 18 11:43:53 nyx kernel: [ 1741.712205]  [<ffffffff81cf4346>] x86_64_start_reservations+0x131/0x135
Jun 18 11:43:53 nyx kernel: [ 1741.712209]  [<ffffffff81cf444a>] x86_64_start_kernel+0x100/0x10f
Jun 18 11:43:53 nyx kernel: [ 1741.712212] ---[ end trace 933b84f8c20a9beb ]---
Jun 18 11:43:53 nyx kernel: [ 1741.717171] r8169 0000:04:00.0: p20p1: link up
Jun 18 12:00:09 nyx kernel: [ 2717.808048] nfs: server castor not responding, timed out
Jun 18 12:00:11 nyx kernel: [ 2719.717164] r8169 0000:04:00.0: p20p1: link up
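
A small Python sketch for picking the transmit-queue timeouts out of a syslog file like the excerpt above may help when comparing kernels. The regular expression is modelled only on the single "NETDEV WATCHDOG" line shown in this comment, and `find_tx_timeouts` is a hypothetical helper name, so treat it as a starting point rather than a complete parser.

```python
import re

# Modelled on the one watchdog message quoted in this comment:
# "NETDEV WATCHDOG: p20p1 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


# Two lines copied from the log above; only the first is a watchdog message.
sample = [
    "Jun 18 11:43:53 nyx kernel: [ 1741.712036] NETDEV WATCHDOG: p20p1 (r8169): transmit queue 0 timed out",
    "Jun 18 11:43:53 nyx kernel: [ 1741.717171] r8169 0000:04:00.0: p20p1: link up",
]

print(find_tx_timeouts(sample))
```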
Comment 8 Wolfgang Denk 2012-06-19 07:13:01 EDT
Eventually we should split this bug report.  I just had the dead keyboard problem with the 3.3.4-5.fc17.x86_64 kernel, too.  However, the network issue has never happened since with this one.
Comment 9 Josh Boyer 2012-06-19 08:25:52 EDT
I'm fairly confused on this one.  Your attachments in comments #5 and #6 show you're using the nvidia module, which can do weird things on upgrades.  But comment #7 doesn't have anything tainted.

There are a few known NFS issues in 3.4 that 3.4.2/3.4.3 might fix up.  Aside from that, I'm not sure what the keyboard lockup issue would be and comment #7 leads me to believe something is seriously hanging the kernel up.
Comment 10 Wolfgang Denk 2012-06-22 09:24:33 EDT
I have some additional information about the keyboard lockup issue:

1) It seems I always trigger the problem when I'm holding the left
   shift key for some extended time, typically when I'm selecting a
   text region in a window with the mouse for copy & paste.

2) I also see other errors when holding the shift key for a long
   time, for example when I'm typing a long sequence of uppercase
   letters: sometimes, they will start coming out lower case.  For a
   long time I thought this was an unreliable contact in my (old)
   keyboard, but now I realize that this happens on 4 different
   keyboards, so it looks more like a software issue.

3) When the keyboard is dead, I can still log in from another system,
   and I can run for example
   
   	evtest /dev/input/by-path/platform-i8042-serio-0-event-kbd

   which shows that the keyboard is still generating normal input
   events.  So the bug must be in somewhat higher layers.
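
As a rough illustration of the check in point 3, the raw events that evtest displays can also be decoded with a few lines of Python. This is a sketch under assumptions: it assumes the `struct input_event` layout of a 64-bit Linux system (two C longs for the timestamp, then type, code and value), takes the `EV_KEY` and `KEY_LEFTSHIFT` values from `linux/input-event-codes.h`, and uses made-up helper names (`decode_event`, `watch_shift`). Reading the device node normally requires root.

```python
import struct

# Assumed layout of struct input_event on 64-bit Linux:
# timestamp seconds and microseconds (C longs), then type, code, value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01        # event type for key events
KEY_LEFTSHIFT = 42   # key code of the left shift key
KEY_STATES = {0: "release", 1: "press", 2: "autorepeat"}


def decode_event(raw):
    """Decode one raw input_event record into a dict."""
    sec, usec, etype, code, value = struct.unpack(EVENT_FORMAT, raw)
    return {"time": sec + usec / 1e6, "type": etype, "code": code, "value": value}


def watch_shift(device_path):
    """Print every left-shift event read from device_path (blocks while reading)."""
    with open(device_path, "rb") as dev:
        while True:
            raw = dev.read(EVENT_SIZE)
            if len(raw) < EVENT_SIZE:
                break
            event = decode_event(raw)
            if event["type"] == EV_KEY and event["code"] == KEY_LEFTSHIFT:
                print(event["time"], KEY_STATES.get(event["value"], event["value"]))


# Synthetic record standing in for a real left-shift press, for demonstration.
demo = struct.pack(EVENT_FORMAT, 10, 500000, EV_KEY, KEY_LEFTSHIFT, 1)
print(decode_event(demo))

# On the affected machine one would call, for example:
# watch_shift("/dev/input/by-path/platform-i8042-serio-0-event-kbd")
```

If the shift press and release both show up here while the desktop ignores the keyboard, that supports the conclusion that the problem sits above the kernel input layer.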
Comment 11 Wolfgang Denk 2012-07-03 02:56:14 EDT
The network issue is still present with 3.4.4-3.fc17.x86_64; see the attached boot log.
Comment 12 Wolfgang Denk 2012-07-03 02:58:29 EDT
Created attachment 595892 [details]
Boot log with 3.4.4-3 kernel
Comment 13 Philip 2012-07-06 15:54:56 EDT
I've experienced the same (keyboard) problem (Kernel 3.4).
The first time, it happened with a PS/2 keyboard. I then connected a different PS/2 keyboard, but that didn't change anything (this is a mainboard that allows PS/2 hotplugging). I connected a USB keyboard and it worked.
The next day, the same thing happened - with the new USB keyboard. I didn't have the time to try another keyboard, I just rebooted to make it work again.

This happened all of a sudden, while I was working on the computer.
The num lock light was still on. Almost no key presses registered anymore, except backspace and a few other keys, which worked every 10th time or so (but this might very well be random).
Comment 14 Wolfgang Denk 2012-07-06 16:21:46 EDT
(In reply to comment #13)
> The next day, the same thing happened - with the new USB keyboard. I didn't
> have the time to try another keyboard, I just rebooted to make it work again.

With a USB keyboard, I was usually able to recover simply by
unplugging and re-plugging the USB keyboard.  If this didn't work on
first try, I plugged it into another USB port.  I had only a single
case since where nothing helped and I really had to reboot.

Yes, this is a major PITA!
Comment 15 Philip 2012-07-11 11:40:15 EDT
Happened again with USB keyboard, disconnecting and reconnecting it helped.
On 3.4.4-3.fc17.x86_64.
Comment 16 Philip 2012-07-19 08:48:44 EDT
Same with 3.4.4-5.fc17.x86_64...
Comment 17 Bert DeKnuydt 2012-07-25 07:14:17 EDT
As for Wolfgang's problem 2: I have something similar, which seems to be
fixed by switching off the receive-checksum offload option of the NIC.

Test with:

     ethtool -K p2p1 rx off

I think it might be related to BZ#635596 from RHEL (which I'm not allowed to read).

Kernel 3.4.6-2.fc17.x86_64 still suffers from it.
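
To check whether the workaround is actually in effect, the current offload state can be read back with `ethtool -k` (lowercase k). The Python sketch below parses that output. The helper names are hypothetical, the sample text is an illustrative stand-in rather than output captured from any machine in this report, and the interface name `p2p1` is simply the one used in the command above.

```python
import subprocess


def parse_offload_features(text):
    """Parse 'name: value' lines from `ethtool -k` output into a dict."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the heading line, which has nothing after the colon
        # Values look like "on", "off" or "off [fixed]"; keep the first word.
        features[name.strip()] = value.split()[0]
    return features


def rx_checksum_enabled(iface):
    """Return True if receive checksum offload is reported as on for iface."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offload_features(result.stdout).get("rx-checksumming") == "on"


# Illustrative sample only, not captured from an affected system.
sample = (
    "Features for p2p1:\n"
    "rx-checksumming: off\n"
    "tx-checksumming: on\n"
    "scatter-gather: off [fixed]\n"
)
print(parse_offload_features(sample))
```

Note that a setting changed with `ethtool -K` does not survive a reboot by itself, so the check is worth repeating after each boot.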
Comment 18 Mauricio Silveira 2012-07-25 11:56:07 EDT
Same issue here.
FC17 3.4.6-2.fc17.x86_64

Attached messages cut and lspci.
Note: My motherboard has 2 NICs, but I only use 1... I can't remember why. I guess it has something to do with iPXE, since I use this system over iSCSI.

Interestingly enough, NetworkManager seems to try to renew the IP address every 2 hours (also in the attachment).

Everything in the attached messages took place while the system was idle; I was out, not using it.

I got a lockup yesterday. I'm noticing some issues with my server too (CentOS 6), where even Firefox managed to cause a kernel crash! The video was out, but I managed to shut down by pressing the power button. At the same time, the desktop (FC17) locked up.

I have this desktop system overclocked, but it used to work well with FC14.

Let me know if you need any more data.
Comment 19 Mauricio Silveira 2012-07-25 11:57:22 EDT
Created attachment 600338 [details]
Cut of the relevant part of messages

Kernel crash report for 3.4.6-2.fc17.x86_64
Comment 20 Mauricio Silveira 2012-07-25 11:58:29 EDT
Created attachment 600339 [details]
My LSPCI

LSPCI of my system: Mobo: Gigabyte GA-EP45-UD3P with one NIC disabled.
Comment 21 Philip 2012-08-01 10:56:36 EDT
On 3.4.4-5.fc17.x86_64:
It happened again and I've just realized that most of the keys seem to work if I hold them for 1-2 seconds. If I hit num lock, nothing happens (light stays on), but if I hold it for about 2 seconds, then release it, the light switches.

At this point, I'm not sure whether this behavior was always like this (I don't think so).
Comment 22 Wolfgang Denk 2012-08-09 14:56:31 EDT
Both problems are still present with 3.5.0-2.fc17.x86_64
Comment 23 Wolfgang Denk 2013-01-29 01:28:06 EST
At least the keyboard problem is still present with 3.7.4-204.fc18.x86_64
Comment 24 Philip 2013-01-29 11:26:04 EST
Turns out it's a feature, not a bug. It's called "Slow keys". Someone must've thought it would be great to have a feature, that'll randomly disable the keyboard (if keys only work after holding them for like 2 seconds then that counts as "disabling" too). Unfortunately it seems to be turned on by default when using GDM (which is what I use).

$ xkbset q | grep "Accessibility Features"
Accessibility Features (AccessX) = On

Holding the Shift key down for 10 seconds will activate this "feature".
gnome-control-center - Universal Access - Typing - Slow Keys is OFF.

See bug #816764:
https://bugzilla.redhat.com/show_bug.cgi?id=816764

I'm going to change the display manager and be done.

This is not a Kernel bug and whatever's left of this bug report does not affect me.
So I'm outta here.
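
For anyone who wants to script the check above, here is a minimal Python sketch that runs `xkbset q` and reports whether AccessX is switched on. It assumes the output consists of `Name = Value` lines like the one quoted in this comment; the function names are invented for illustration, and xkbset must be installed and run inside the X session.

```python
import subprocess


def parse_xkbset_query(text):
    """Parse 'Name = Value' lines from `xkbset q` output into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def accessx_enabled():
    """Run `xkbset q` and return True if AccessX is reported as On."""
    result = subprocess.run(["xkbset", "q"], capture_output=True, text=True, check=True)
    settings = parse_xkbset_query(result.stdout)
    return settings.get("Accessibility Features (AccessX)") == "On"


# The line quoted in this comment, used as sample input.
sample = "Accessibility Features (AccessX) = On\n"
settings = parse_xkbset_query(sample)
print(settings["Accessibility Features (AccessX)"])
```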
Comment 25 Wolfgang Denk 2013-01-31 03:30:56 EST
(In reply to comment #24)
> Turns out it's a feature, not a bug. It's called "Slow keys". Someone
> must've thought it would be great to have a feature, that'll randomly
> disable the keyboard (if keys only work after holding them for like 2
> seconds then that counts as "disabling" too). Unfortunately it seems to be
> turned on by default when using GDM (which is what I use).

Thanks for pointing that out - you are right.

What a PITA!!
