1082586 – [abrt] BUG: soft lockup - CPU#5 stuck for 22s! [smbd:9182]

Bug 1082586 - [abrt] BUG: soft lockup - CPU#5 stuck for 22s! [smbd:9182]

Summary: [abrt] BUG: soft lockup - CPU#5 stuck for 22s! [smbd:9182]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	19
Hardware:	i686
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	J. Bruce Fields
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://retrace.fedoraproject.org/faf...
Whiteboard:	abrt_hash:0275f34c7c8c1eee3c410db60f1...
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2014-03-31 12:04 UTC by Terry Barnaby
Modified:	2014-05-24 05:33 UTC (History)
CC List:	15 users (show)
Fixed In Version:	kernel-3.14.4-100.fc19
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1088006 1088012 (view as bug list)
Environment:
Last Closed:	2014-05-12 14:16:49 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
File: dmesg (124.88 KB, text/plain) 2014-03-31 12:04 UTC, Terry Barnaby	no flags	Details
patch -- locks: add new i_last_lease_break field to struct inode (3.21 KB, patch) 2014-04-04 17:17 UTC, Jeff Layton	no flags	Details \| Diff
View All

Description Terry Barnaby 2014-03-31 12:04:38 UTC

Description of problem:
Been happening for several weeks now since kernel 3.13.x I think.
Different 3.13.x kernels have this problem.
Happens within 20mins of a reboot.
Seems to happen with MSWindows clients accessing via smb.

Additional info:
reporter:       libreport-2.1.12
BUG: soft lockup - CPU#5 stuck for 22s! [smbd:9182]
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc snd_hda_codec_hdmi iTCO_wdt snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_vendor_support snd_page_alloc ftdi_sio gpio_ich snd_timer coretemp kvm_intel snd soundcore lpc_ich mfd_core kvm e1000e ptp ppdev parport_pc pps_core microcode i2c_i801 serio_raw i5000_edac edac_core i5k_amb ioatdma parport dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc uinput raid1 radeon i2c_algo_bit drm_kms_helper ttm ata_generic drm usb_storage pata_acpi i2c_core analog gameport joydev
CPU: 5 PID: 9182 Comm: smbd Not tainted 3.13.6-100.fc19.i686.PAE #1
Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
task: c8ca5a00 ti: d281c000 task.ti: d281c000
EIP: 0060:[<c09c9042>] EFLAGS: 00200202 CPU: 5
EIP is at _raw_spin_lock+0x12/0x30
EAX: e73dbdcc EBX: 00000000 ECX: 00000001 EDX: 00001c1c
ESI: 00000000 EDI: e0909070 EBP: d281de4c ESP: d281de4c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: b76a4000 CR3: 1fcab000 CR4: 000007f0
Stack:
 d281de9c c05ac559 00000004 efee29a0 c8ca5a00 00000204 00000001 e73dbdcc
 00000000 e090909c 00000001 e73dbd78 e73dbd78 d281de8c c06a641b d7a78d00
 d281de9c d7a78d00 fffffff5 d281ded0 d281df90 c0574092 cb701900 d281ded0
Call Trace:
 [<c05ac559>] __break_lease+0x259/0x330
 [<c06a641b>] ? lockref_put_or_lock+0xb/0x30
 [<c0574092>] SYSC_renameat+0x292/0x370
 [<c05ac1f0>] ? do_lock_file_wait.part.19+0x30/0xb0
 [<c05aac9e>] ? locks_free_lock+0x4e/0x60
 [<c0576bf7>] SyS_rename+0x37/0x40
 [<c09d054d>] sysenter_do_call+0x12/0x28
Code: 66 90 89 d8 f0 81 28 00 00 10 00 74 05 e8 2b cb cd ff 89 c8 5b 5d c3 66 90 55 89 e5 66 66 66 66 90 ba 00 01 00 00 f0 66 0f c1 10 <0f> b6 ce 38 d1 75 04 5d c3 f3 90 0f b6 10 38 ca 75 f7 5d c3 8d

Comment 1 Terry Barnaby 2014-03-31 12:04:45 UTC

Created attachment 880725 [details]
File: dmesg

Comment 2 Terry Barnaby 2014-04-01 14:44:48 UTC

Hi, has been having problems in other areas than smbd as well.

This is an oldish dual Xeon system with 8 CPU's reported (Hyperthreading enabled).

Is doing this a few times a day. Was completly fine for years before a
relatively recent kernel update.

Apr  1 15:38:28 beam kernel: [ 7024.124002] BUG: soft lockup - CPU#3 stuck for 22s! [yum:20897]
Apr  1 15:38:28 beam kernel: [ 7024.124002] Modules linked in: vfat fat fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek iTCO_wdt snd_hda_intel iTCO_vendor_support gpio_ich snd_hda_codec snd_hwdep ppdev snd_seq snd_seq_device cdc_acm snd_pcm snd_page_alloc snd_timer snd lpc_ich i5000_edac e1000e i2c_i801 edac_core ioatdma mfd_core nfsd auth_rpcgss microcode soundcore ptp nfs_acl lockd serio_raw pps_core sunrpc uinput i5k_amb acpi_cpufreq dca parport_pc parport binfmt_misc raid1 radeon i2c_algo_bit drm_kms_helper ttm ata_generic drm usb_storage pata_acpi i2c_core analog gameport joydev
Apr  1 15:38:28 beam kernel: [ 7024.124002] CPU: 3 PID: 20897 Comm: yum Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  1 15:38:28 beam kernel: [ 7024.124002] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  1 15:38:28 beam kernel: [ 7024.124002] task: d2d1ba80 ti: d3a80000 task.ti: d3a80000
Apr  1 15:38:28 beam kernel: [ 7024.124002] EIP: 0060:[<c05ac599>] EFLAGS: 00200246 CPU: 3
Apr  1 15:38:28 beam kernel: [ 7024.124002] EIP is at __break_lease+0x259/0x330
Apr  1 15:38:28 beam kernel: [ 7024.124002] EAX: c6b4767c EBX: 00000000 ECX: 00000001 EDX: 00000101
Apr  1 15:38:28 beam kernel: [ 7024.124002] ESI: 00000000 EDI: d2fc63f0 EBP: d3a81e9c ESP: d3a81e54
Apr  1 15:38:28 beam kernel: [ 7024.124002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  1 15:38:28 beam kernel: [ 7024.124002] CR0: 80050033 CR2: 45f43080 CR3: 0787e000 CR4: 000007f0
Apr  1 15:38:28 beam kernel: [ 7024.124002] Stack:
Apr  1 15:38:28 beam kernel: [ 7024.124002]  00000004 eaa6ccb0 d2d1ba80 00000204 00000001 c6b4767c 00000000 d2fc641c
Apr  1 15:38:28 beam kernel: [ 7024.124002]  00000001 c6b47628 c86a5198 d3a81e8c c06a646b c86dc080 d3a81e9c c86dc080
Apr  1 15:38:28 beam kernel: [ 7024.124002]  fffffff5 d3a81ed0 d3a81f90 c05740a2 c6a69a00 d3a81ed0 ea853000 ea856000
Apr  1 15:38:28 beam kernel: [ 7024.124002] Call Trace:
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c06a646b>] ? lockref_put_or_lock+0xb/0x30
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c05740a2>] SYSC_renameat+0x292/0x370
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c059ffb7>] ? fsnotify+0x1f7/0x2d0
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c0753bc2>] ? tty_write_unlock+0x32/0x40
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c0576c47>] SyS_rename+0x37/0x40
Apr  1 15:38:28 beam kernel: [ 7024.124002]  [<c09d088d>] sysenter_do_call+0x12/0x28
Apr  1 15:38:28 beam kernel: [ 7024.124002] Code: ff 8b 17 85 d2 0f 94 c2 85 db 0f 85 ac 00 00 00 84 d2 0f 85 a2 00 00 00 8b 45 d4 8d 55 e0 e8 7f 32 ee ff 8b 45 cc e8 d7 cd 41 00 <89> f8 e8 b0 eb ff ff 85 db 78 4e 0f 84 a5 00 00 00 8b 45 dc 8b

Comment 3 J. Bruce Fields 2014-04-01 15:49:19 UTC

Thanks for the report.  The most likely culprit is the new delegation code that went into 3.13.  Turning off leases might help.  I'm not sure how to do that with Samba, but "echo 0 >/proc/sys/fs/leases-enable" before starting Samba will probably do the job.

Do you nfs-export this filesystem as well?

Are you seeing other symptoms (e.g., clients hanging?), or are the log warnings the only issue?

Comment 4 Terry Barnaby 2014-04-01 18:51:30 UTC

Thanks for the reply.

Ok, will try turning of leases.

Yes, the system is a server/workstation and exports a number of file systems to a number of clients. NFS is used heavily and I haven't noticed any issues that appear to be related to NFS accesses.

Other, non Xeon, Fedora 19 servers that we have appear ok. All are running the same system. Although samba is not used with these.

Log warning messages during samba accesses from the one MSWindows system we have seems to be the main cause. However it happened during a createrepo on the local file systems today. The createrepo process locked up and the warning message appeared. The createrepo stayed locked up. I killed it with CTRL-C, but it did not fully die for some time.

Comment 5 Terry Barnaby 2014-04-01 19:02:12 UTC

Just done a new search:
Bug 1038929 may be a duplicate.

Comment 6 J. Bruce Fields 2014-04-01 19:39:06 UTC

(In reply to Terry Barnaby from comment #4)
> Ok, will try turning of leases.
>
> Yes, the system is a server/workstation and exports a number of file systems
> to a number of clients. NFS is used heavily and I haven't noticed any issues
> that appear to be related to NFS accesses.

And the problem occurred on a filesystem that is used by both Samba and NFS clients?

> Other, non Xeon, Fedora 19 servers that we have appear ok. All are running
> the same system. Although samba is not used with these.

It's more likely that this is triggered by Samba and/or NFS than by different hardware, so I think that the fact that Samba was not running on those servers is probably the relevant fact.

> Log warning messages during samba accesses from the one MSWindows system we
> have seems to be the main cause. However it happened during a createrepo on
> the local file systems today. The createrepo process locked up and the
> warning message appeared. The createrepo stayed locked up. I killed it with
> CTRL-C, but it did not fully die for some time.

So createrepo was run directly on the server's filesystem, not over NFS or SMB?

Could createrepo have been reading from or writing to files that had also been recently accessed by SMB or NFS clients?

Comment 7 Terry Barnaby 2014-04-01 19:50:07 UTC

(In reply to J. Bruce Fields from comment #6)
> (In reply to Terry Barnaby from comment #4)
> > Ok, will try turning of leases.
> >
> > Yes, the system is a server/workstation and exports a number of file systems
> > to a number of clients. NFS is used heavily and I haven't noticed any issues
> > that appear to be related to NFS accesses.
> 
> And the problem occurred on a filesystem that is used by both Samba and NFS
> clients?
Yes, all these file systems are exported for NFS. Some portions of these
file systems are exported by samba.

> 
> > Other, non Xeon, Fedora 19 servers that we have appear ok. All are running
> > the same system. Although samba is not used with these.
> 
> It's more likely that this is triggered by Samba and/or NFS than by
> different hardware, so I think that the fact that Samba was not running on
> those servers is probably the relevant fact.
> 
> > Log warning messages during samba accesses from the one MSWindows system we
> > have seems to be the main cause. However it happened during a createrepo on
> > the local file systems today. The createrepo process locked up and the
> > warning message appeared. The createrepo stayed locked up. I killed it with
> > CTRL-C, but it did not fully die for some time.
> 
> So createrepo was run directly on the server's filesystem, not over NFS or
> SMB?
Yes.

> 
> Could createrepo have been reading from or writing to files that had also
> been recently accessed by SMB or NFS clients?
The file system that createrepo was running on could be accessed by NFS clients but not samba clients. The yum directory that createrepo was updating is used for locally built RPM's that other systems will update via a yum update (and yumbackend).

Comment 8 Terry Barnaby 2014-04-02 05:52:14 UTC

Just looked through /var/log/messages on my home server (running same Fedora19 with same update level but a basic Intel Core2 CPU 6300) and this shows a
couple of similar lock ups. This has smbd enabled but should not have been in
use at that time.

The system NFS exports a number of file systems to a number of home systems.
In this case it looks like a MythTv mythpreviewgen had problems. mythpreviewgen would have been directly accessing the local file system on the server, but
other clients would have been accessing the same file system via NFS and
mythbackend. Certainly the file mythpreviewgen would have been creating could have been accessed by the clients over NFS ...

I will try and setup a test to get this fault to occur on demand.

Apr  1 17:10:11 king kernel: [373928.040000] BUG: soft lockup - CPU#0 stuck for 22s! [mythpreviewgen:2695]
Apr  1 17:10:11 king kernel: [373928.040000] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc tun rfcomm bnep saa7134_alsa isl6421 cx24116 cx88_dvb cx88_vp3054_i2c rc_hauppauge tda1004x saa7134_dvb videobuf_dvb coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support rc_dib0700_rc5 ppdev microcode i2c_i801 serio_raw ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sanyo_decoder ir_sony_decoder ir_rc6_decoder ir_jvc_decoder ir_nec_decoder ir_rc5_decoder tda827x tda8290 tuner r820t cx88_alsa cx8800 cx8802 cx88xx rtl2832 snd_hda_codec_via saa7134 btcx_risc snd_hda_intel tveeprom dvb_usb_dib0700 videobuf_dma_sg btusb dib8000 videobuf_core snd_hda_codec dib7000m v4l2_common snd_hwdep bluetooth dib0090 dvb_usb_rtl28xxu dib0070 dib7000p dib3000mc dibx000_common dvb_usb rfkill videodev snd_seq snd_seq_device rtl2830 dvb_usb_v2 media snd_pcm dvb_core rc_core lpc_ich r8169 mii i2c_algo_bit parport_pc parport i2c_core snd_page_alloc mfd_core snd_timer snd soundcore acpi_cpufreq asus_atk0110 nfsd binfmt_misc auth_rpcgss nfs_acl lockd uinput sunrpc ata_generic pata_acpi raid1 pata_jmicron analog gameport joydev
Apr  1 17:10:11 king kernel: [373928.040000] CPU: 0 PID: 2695 Comm: mythpreviewgen Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  1 17:10:11 king kernel: [373928.040000] Hardware name: System manufacturer System Product Name/P5QL/EPU, BIOS 0315    11/10/2008
Apr  1 17:10:11 king kernel: [373928.040000] task: e4849680 ti: e7e0c000 task.ti: e7e0c000
Apr  1 17:10:11 king kernel: [373928.040000] EIP: 0060:[<c05ac520>] EFLAGS: 00200286 CPU: 0
Apr  1 17:10:11 king kernel: [373928.040000] EIP is at __break_lease+0x1e0/0x330
Apr  1 17:10:11 king kernel: [373928.040000] EAX: cc4678ec EBX: ef32ee00 ECX: ef32ee0c EDX: ef32ee0c
Apr  1 17:10:11 king kernel: [373928.040000] ESI: 00000000 EDI: ef32e9a0 EBP: e7e0df20 ESP: e7e0ded8
Apr  1 17:10:11 king kernel: [373928.040000]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  1 17:10:11 king kernel: [373928.040000] CR0: 80050033 CR2: 4e394d90 CR3: 280fd000 CR4: 000007f0
Apr  1 17:10:11 king kernel: [373928.040000] Stack:
Apr  1 17:10:11 king kernel: [373928.040000]  00000004 ef32ee00 e4849680 00000204 00000001 cc4678ec 00000000 ef32e9cc
Apr  1 17:10:11 king kernel: [373928.040000]  00000001 cc467898 cc467898 cc4678ec e7e0df20 c057f1fe e7e0df20 e55c6080
Apr  1 17:10:11 king kernel: [373928.040000]  fffffff5 cc467898 e7e0dfa4 c05737e1 00000000 0a0ed0c8 ffffff9c 00000000
Apr  1 17:10:11 king kernel: [373928.040000] Call Trace:
Apr  1 17:10:11 king kernel: [373928.040000]  [<c057f1fe>] ? iput+0x2e/0x130
Apr  1 17:10:11 king kernel: [373928.040000]  [<c05737e1>] do_unlinkat+0x151/0x200
Apr  1 17:10:11 king kernel: [373928.040000]  [<c045af05>] ? irq_exit+0x55/0xb0
Apr  1 17:10:11 king kernel: [373928.040000]  [<c0411f65>] ? do_IRQ+0x45/0xb0
Apr  1 17:10:11 king kernel: [373928.040000]  [<c0576645>] SyS_unlink+0x15/0x20
Apr  1 17:10:11 king kernel: [373928.040000]  [<c09d088d>] sysenter_do_call+0x12/0x28
Apr  1 17:10:11 king kernel: [373928.040000] Code: 00 b9 01 00 00 00 31 f6 b8 00 ef e6 c0 89 4d d8 e8 66 ce 41 00 89 fa 89 d8 e8 8d f3 ff ff 80 05 00 ef e6 c0 01 8b 45 cc 80 00 01 <8b> 0f 85 c9 8b 4d d8 0f 94 c0 84 c9 0f 84 16 01 00 00 84 c0 0f
Apr  1 17:10:13 king sh[656]: abrt-dump-oops: Found oopses: 1
Apr  1 17:10:13 king sh[656]: abrt-dump-oops: Creating problem directories
Apr  1 17:10:14 king abrt-dump-oops: Reported 1 kernel oopses to Abrt
Apr  1 17:10:15 king mythbackend[27408]: 2014-04-01 17:10:15.078274 I  Child PID 2695 killed with Terminated
Apr  1 17:10:17 king mythbackend[27408]: 2014-04-01 17:10:17.079700 I  Child PID 2695 killed with Killed

Comment 9 Terry Barnaby 2014-04-02 08:56:18 UTC

I have managed to trigger some more lockups.
On the server I was running:
while true
do
	createrepo /dist/fedora_19/beamExtra
done

On a client with NFS mounts I was running (/dist/fedora_19/beamExtra repo in repo list):
while true
do
	yum check-update
done

and
while true
do
	for f in /dist/fedora_19/beamExtra/repodata/*
	do
#		echo "Do: $f"
		cat $f > /dev/null
	done
done

There was other work being done as well and other systems connected and being
used and there was a MSWindows system connected via samba but not being used.

Apr  2 09:34:44 beam kernel: [ 5516.161002] BUG: soft lockup - CPU#6 stuck for 23s! [genpkgmetadata.:17353]
Apr  2 09:34:44 beam kernel: [ 5516.161002] Modules linked in: vfat fat fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc snd_hda_codec_hdmi snd_hda_codec_realtek coretemp kvm_intel kvm e1000e snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd cdc_acm ppdev soundcore iTCO_wdt iTCO_vendor_support parport_pc gpio_ich lpc_ich i5000_edac microcode mfd_core ptp serio_raw edac_core ioatdma pps_core parport i2c_i801 acpi_cpufreq i5k_amb dca nfsd auth_rpcgss nfs_acl lockd sunrpc uinput binfmt_misc raid1 radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic usb_storage pata_acpi i2c_core analog gameport joydev
Apr  2 09:34:44 beam kernel: [ 5516.161002] CPU: 6 PID: 17353 Comm: genpkgmetadata. Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:34:44 beam kernel: [ 5516.161002] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:34:44 beam kernel: [ 5516.161002] task: eaf21680 ti: ebc5c000 task.ti: ebc5c000
Apr  2 09:34:44 beam kernel: [ 5516.161002] EIP: 0060:[<c09c9382>] EFLAGS: 00200202 CPU: 6
Apr  2 09:34:44 beam kernel: [ 5516.161002] EIP is at _raw_spin_lock+0x12/0x30
Apr  2 09:34:44 beam kernel: [ 5516.161002] EAX: c0e6ef00 EBX: df3f9f50 ECX: 0000000f EDX: 00007575
Apr  2 09:34:44 beam kernel: [ 5516.161002] ESI: 00000000 EDI: df3f9f50 EBP: ebc5dec4 ESP: ebc5dec4
Apr  2 09:34:44 beam kernel: [ 5516.161002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:34:44 beam kernel: [ 5516.161002] CR0: 80050033 CR2: b76860b0 CR3: 1f409000 CR4: 000007f0
Apr  2 09:34:44 beam kernel: [ 5516.161002] Stack:
Apr  2 09:34:44 beam kernel: [ 5516.161002]  ebc5ded0 c05ab165 00000000 ebc5df20 c05ac5a0 00000004 df3f9a80 eaf21680
Apr  2 09:34:44 beam kernel: [ 5516.161002]  00000204 00000001 c85180dc 00000000 df3f9f7c 00000001 c8518088 c8518088
Apr  2 09:34:44 beam kernel: [ 5516.161002]  c85180dc ebc5df20 c057f1fe ebc5df20 c856fd00 fffffff5 c8518088 ebc5dfa4
Apr  2 09:34:44 beam kernel: [ 5516.161002] Call Trace:
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c05ab165>] locks_delete_block+0x15/0x30
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c05ac5a0>] __break_lease+0x260/0x330
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c057f1fe>] ? iput+0x2e/0x130
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c05737e1>] do_unlinkat+0x151/0x200
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c0576645>] SyS_unlink+0x15/0x20
Apr  2 09:34:44 beam kernel: [ 5516.161002]  [<c09d088d>] sysenter_do_call+0x12/0x28
Apr  2 09:34:44 beam kernel: [ 5516.161002] Code: 66 90 89 d8 f0 81 28 00 00 10 00 74 05 e8 3b c8 cd ff 89 c8 5b 5d c3 66 90 55 89 e5 66 66 66 66 90 ba 00 01 00 00 f0 66 0f c1 10 <0f> b6 ce 38 d1 75 04 5d c3 f3 90 0f b6 10 38 ca 75 f7 5d c3 8d
Apr  2 09:34:45 beam sh[743]: abrt-dump-oops: Found oopses: 1
Apr  2 09:34:45 beam sh[743]: abrt-dump-oops: Creating problem directories
Apr  2 09:34:46 beam abrt-dump-oops: Reported 1 kernel oopses to Abrt
Apr  2 09:34:47 beam abrt-server[17395]: Duplicate: core backtrace
Apr  2 09:34:47 beam abrt-server[17395]: DUP_OF_DIR: /var/tmp/abrt/oops-2014-03-05-16:02:14-24348-0
Apr  2 09:34:47 beam abrt-server[17395]: Deleting problem directory oops-2014-04-02-09:34:45-17394-0 (dup of oops-2014-03-05-16:02:14-24348-0)
Apr  2 09:34:47 beam dbus-daemon[788]: dbus[788]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Apr  2 09:34:47 beam dbus[788]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Apr  2 09:34:47 beam ypserv[827]: refused connect from 192.168.1.1:41294 to procedure ypproc_match (beamnet,shadow.byname;-1)
Apr  2 09:34:47 beam dbus-daemon[788]: dbus[788]: [system] Successfully activated service 'org.freedesktop.problems'
Apr  2 09:34:47 beam dbus[788]: [system] Successfully activated service 'org.freedesktop.problems'
Apr  2 09:34:47 beam ypserv[827]: refused connect from 192.168.1.1:60534 to procedure ypproc_match (beamnet,shadow.byname;-1)
Apr  2 09:35:12 beam kernel: [ 5544.161003] BUG: soft lockup - CPU#6 stuck for 22s! [genpkgmetadata.:17353]
Apr  2 09:35:12 beam kernel: [ 5544.161003] Modules linked in: vfat fat fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc snd_hda_codec_hdmi snd_hda_codec_realtek coretemp kvm_intel kvm e1000e snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd cdc_acm ppdev soundcore iTCO_wdt iTCO_vendor_support parport_pc gpio_ich lpc_ich i5000_edac microcode mfd_core ptp serio_raw edac_core ioatdma pps_core parport i2c_i801 acpi_cpufreq i5k_amb dca nfsd auth_rpcgss nfs_acl lockd sunrpc uinput binfmt_misc raid1 radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic usb_storage pata_acpi i2c_core analog gameport joydev
Apr  2 09:35:12 beam kernel: [ 5544.161003] CPU: 6 PID: 17353 Comm: genpkgmetadata. Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:12 beam kernel: [ 5544.161003] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:12 beam kernel: [ 5544.161003] task: eaf21680 ti: ebc5c000 task.ti: ebc5c000
Apr  2 09:35:12 beam kernel: [ 5544.161003] EIP: 0060:[<c05ac50a>] EFLAGS: 00200246 CPU: 6
Apr  2 09:35:12 beam kernel: [ 5544.161003] EIP is at __break_lease+0x1ca/0x330
Apr  2 09:35:12 beam kernel: [ 5544.161003] EAX: c0e6ef00 EBX: df3f9a80 ECX: 000000d6 EDX: 0000d6d6
Apr  2 09:35:12 beam kernel: [ 5544.161003] ESI: 00000000 EDI: df3f9f50 EBP: ebc5df20 ESP: ebc5ded8
Apr  2 09:35:12 beam kernel: [ 5544.161003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:12 beam kernel: [ 5544.161003] CR0: 80050033 CR2: b76860b0 CR3: 1f409000 CR4: 000007f0
Apr  2 09:35:12 beam kernel: [ 5544.161003] Stack:
Apr  2 09:35:12 beam kernel: [ 5544.161003]  00000004 df3f9a80 eaf21680 00000204 00000001 c85180dc 00000000 df3f9f7c
Apr  2 09:35:12 beam kernel: [ 5544.161003]  00000001 c8518088 c8518088 c85180dc ebc5df20 c057f1fe ebc5df20 c856fd00
Apr  2 09:35:12 beam kernel: [ 5544.161003]  fffffff5 c8518088 ebc5dfa4 c05737e1 00000000 0a4ca200 ffffff9c 00000000
Apr  2 09:35:12 beam kernel: [ 5544.161003] Call Trace:
Apr  2 09:35:12 beam kernel: [ 5544.161003]  [<c057f1fe>] ? iput+0x2e/0x130
Apr  2 09:35:12 beam kernel: [ 5544.161003]  [<c05737e1>] do_unlinkat+0x151/0x200
Apr  2 09:35:12 beam kernel: [ 5544.161003]  [<c0576645>] SyS_unlink+0x15/0x20
Apr  2 09:35:12 beam kernel: [ 5544.161003]  [<c09d088d>] sysenter_do_call+0x12/0x28
Apr  2 09:35:12 beam kernel: [ 5544.161003] Code: 0f 01 00 00 8d 47 2c 89 f3 89 45 d4 8b 43 50 85 c0 0f 85 14 01 00 00 b9 01 00 00 00 31 f6 b8 00 ef e6 c0 89 4d d8 e8 66 ce 41 00 <89> fa 89 d8 e8 8d f3 ff ff 80 05 00 ef e6 c0 01 8b 45 cc 80 00
Apr  2 09:35:13 beam sh[743]: abrt-dump-oops: Found oopses: 1
Apr  2 09:35:13 beam sh[743]: abrt-dump-oops: Creating problem directories
Apr  2 09:35:13 beam abrt-server[17487]: Duplicate: core backtrace
Apr  2 09:35:13 beam abrt-server[17487]: DUP_OF_DIR: /var/tmp/abrt/oops-2014-04-01-15:33:54-18737-0
Apr  2 09:35:13 beam abrt-server[17487]: Deleting problem directory oops-2014-04-02-09:35:13-17485-0 (dup of oops-2014-04-01-15:33:54-18737-0)
Apr  2 09:35:14 beam abrt-dump-oops: Reported 1 kernel oopses to Abrt
Apr  2 09:35:17 beam kernel: [ 5549.754003] INFO: rcu_sched self-detected stall on CPU { 6}  (t=60000 jiffies g=994011 c=994010 q=106923)
Apr  2 09:35:17 beam kernel: [ 5549.754003] sending NMI to all CPUs:
Apr  2 09:35:17 beam kernel: [ 5549.754003] NMI backtrace for cpu 6
Apr  2 09:35:17 beam kernel: [ 5549.754003] CPU: 6 PID: 17353 Comm: genpkgmetadata. Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754003] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754003] task: eaf21680 ti: ebc5c000 task.ti: ebc5c000
Apr  2 09:35:17 beam kernel: [ 5549.754003] EIP: 0060:[<c06a4b21>] EFLAGS: 00200046 CPU: 6
Apr  2 09:35:17 beam kernel: [ 5549.754003] EIP is at __const_udelay+0x1/0x20
Apr  2 09:35:17 beam kernel: [ 5549.754003] EAX: 00418958 EBX: 00002710 ECX: 00000008 EDX: 000000ff
Apr  2 09:35:17 beam kernel: [ 5549.754003] ESI: c0c57240 EDI: f75d1820 EBP: ebc5dd58 ESP: ebc5dd48
Apr  2 09:35:17 beam kernel: [ 5549.754003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754003] CR0: 80050033 CR2: b76860b0 CR3: 1f409000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754003] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754003]  ebc5dd58 c043e4e2 c0b5adc5 c0c57240 ebc5dda0 c04a9938 c0b68014 0000ea60
Apr  2 09:35:17 beam kernel: [ 5549.754003]  000f2adb 000f2ada 0001a1ab 00000000 00000000 00000000 c0c57240 c0cc034c
Apr  2 09:35:17 beam kernel: [ 5549.754003]  f75d1820 c0c57240 00000006 eaf21680 00000000 00000006 ebc5ddb4 c046245c
Apr  2 09:35:17 beam kernel: [ 5549.754003] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c043e4e2>] ? arch_trigger_all_cpu_backtrace+0x72/0x90
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c04a9938>] rcu_check_callbacks+0x388/0x5a0
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c046245c>] update_process_times+0x3c/0x60
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c04b4236>] tick_sched_handle.isra.12+0x26/0x60
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c04b42a7>] tick_sched_timer+0x37/0x70
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c0475c18>] ? __remove_hrtimer+0x38/0x90
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c0475e5d>] __run_hrtimer+0x6d/0x190
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c04b4270>] ? tick_sched_handle.isra.12+0x60/0x60
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c0476a58>] hrtimer_interrupt+0x1e8/0x2a0
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c043c4ff>] local_apic_timer_interrupt+0x2f/0x60
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c045ae7e>] ? irq_enter+0x3e/0x70
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c043c6d3>] smp_apic_timer_interrupt+0x33/0x50
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c09c9b6c>] apic_timer_interrupt+0x34/0x3c
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c05a007b>] ? fsnotify+0x2bb/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c09c938a>] ? _raw_spin_lock+0x1a/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c05ac50a>] ? __break_lease+0x1ca/0x330
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c057f1fe>] ? iput+0x2e/0x130
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c05737e1>] do_unlinkat+0x151/0x200
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c0576645>] SyS_unlink+0x15/0x20
Apr  2 09:35:17 beam kernel: [ 5549.754003]  [<c09d088d>] sysenter_do_call+0x12/0x28
Apr  2 09:35:17 beam kernel: [ 5549.754003] Code: 00 00 00 8d bc 27 00 00 00 00 48 75 fd 48 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 66 66 66 66 90 ff 15 90 3c c8 c0 5d c3 55 <89> e5 64 8b 15 1c 2e d8 c0 c1 e0 02 69 d2 fa 00 00 00 f7 e2 8d
Apr  2 09:35:17 beam kernel: [ 5549.754224] NMI backtrace for cpu 4
Apr  2 09:35:17 beam kernel: [ 5549.754229] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754232] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754235] task: f705ba80 ti: f70b8000 task.ti: f70b8000
Apr  2 09:35:17 beam kernel: [ 5549.754238] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 4
Apr  2 09:35:17 beam kernel: [ 5549.754242] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754245] EAX: ffffffed EBX: 53ea0dc0 ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754248] ESI: 00000000 EDI: 00000004 EBP: f70b9f50 ESP: f70b9f50
Apr  2 09:35:17 beam kernel: [ 5549.754250]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754253] CR0: 8005003b CR2: b776c000 CR3: 267ae000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754255] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754257]  f70b9f64 c0417cbc 53ea0dc0 00000000 f70b8000 f70b9f6c c04183e6 f70b9f90
Apr  2 09:35:17 beam kernel: [ 5549.754265]  c04a0ee9 f70b9f88 c043bfdb af5e7923 53ea0dc0 1ae265da 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754271]  f70b9fb4 c043a6ca 943bfe68 cc7bbef3 aa9bd741 deba0abb 442bf33a 1ae265da
Apr  2 09:35:17 beam kernel: [ 5549.754278] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754284]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.754288]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754294]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.754298]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.754303]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.754305] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.754346] NMI backtrace for cpu 2
Apr  2 09:35:17 beam kernel: [ 5549.754357] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754360] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754364] task: f705b180 ti: f70b4000 task.ti: f70b4000
Apr  2 09:35:17 beam kernel: [ 5549.754368] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 2
Apr  2 09:35:17 beam kernel: [ 5549.754375] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754378] EAX: ffffffed EBX: 43923243 ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754381] ESI: 00000000 EDI: 00000002 EBP: f70b5f50 ESP: f70b5f50
Apr  2 09:35:17 beam kernel: [ 5549.754385]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754388] CR0: 8005003b CR2: b39a0000 CR3: 21df1000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754391] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754393]  f70b5f64 c0417cbc 43923243 00000000 f70b4000 f70b5f6c c04183e6 f70b5f90
Apr  2 09:35:17 beam kernel: [ 5549.754403]  c04a0ee9 f70b5f88 c043bfdb 6f61d708 43923243 5f072e3f 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754410]  f70b5fb4 c043a6ca a5320739 a18da4af 7e2ef0de 249aa942 a8c6b58e 5f072e3f
Apr  2 09:35:17 beam kernel: [ 5549.754418] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754427]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.754432]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754438]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.754445]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.754450]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.754452] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.754497] NMI backtrace for cpu 0
Apr  2 09:35:17 beam kernel: [ 5549.754503] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754506] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754509] task: c0c3ea00 ti: c0c32000 task.ti: c0c32000
Apr  2 09:35:17 beam kernel: [ 5549.754513] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 0
Apr  2 09:35:17 beam kernel: [ 5549.754517] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754520] EAX: ffffffed EBX: 8513c1aa ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754524] ESI: 0008f800 EDI: 00000000 EBP: c0c33f80 ESP: c0c33f80
Apr  2 09:35:17 beam kernel: [ 5549.754527]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754530] CR0: 8005003b CR2: b47ff208 CR3: 21df1000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754532] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754534]  c0c33f94 c0417cbc 8513c1aa 0008f800 c0c32000 c0c33f9c c04183e6 c0c33fc0
Apr  2 09:35:17 beam kernel: [ 5549.754543]  c04a0ee9 00000000 00200282 aef34454 8513c1aa c0d223a0 0008f800 c0d8f800
Apr  2 09:35:17 beam kernel: [ 5549.754551]  c0c33fc8 c09bbae2 c0c33fec c0cc5af2 000000fb ffffffff ffffffff c0cc556c
Apr  2 09:35:17 beam kernel: [ 5549.754559] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754564]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.754569]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754574]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.754579]  [<c09bbae2>] rest_init+0x62/0x70
Apr  2 09:35:17 beam kernel: [ 5549.754586]  [<c0cc5af2>] start_kernel+0x3bd/0x3c3
Apr  2 09:35:17 beam kernel: [ 5549.754590]  [<c0cc556c>] ? repair_env_string+0x51/0x51
Apr  2 09:35:17 beam kernel: [ 5549.754594]  [<c0cc5393>] i386_start_kernel+0x12e/0x131
Apr  2 09:35:17 beam kernel: [ 5549.754596] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.754644] NMI backtrace for cpu 3
Apr  2 09:35:17 beam kernel: [ 5549.754655] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754658] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754662] task: f705b600 ti: f70b6000 task.ti: f70b6000
Apr  2 09:35:17 beam kernel: [ 5549.754666] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 3
Apr  2 09:35:17 beam kernel: [ 5549.754673] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754677] EAX: ffffffed EBX: 27ad2caf ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754680] ESI: 00000000 EDI: 00000003 EBP: f70b7f50 ESP: f70b7f50
Apr  2 09:35:17 beam kernel: [ 5549.754683]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754686] CR0: 8005003b CR2: b496a000 CR3: 26583000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754689] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754691]  f70b7f64 c0417cbc 27ad2caf 00000000 f70b6000 f70b7f6c c04183e6 f70b7f90
Apr  2 09:35:17 beam kernel: [ 5549.754701]  c04a0ee9 f70b7f88 c043bfdb 2874cfbe 27ad2caf e21fcd79 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754709]  f70b7fb4 c043a6ca 4fece81f b07d9ec7 5f9feb03 284e9064 3ec34a08 e21fcd79
Apr  2 09:35:17 beam kernel: [ 5549.754717] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754725]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.754731]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754737]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.754743]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.754748]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.754751] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.754795] NMI backtrace for cpu 1
Apr  2 09:35:17 beam kernel: [ 5549.754801] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754804] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754807] task: f705ad00 ti: f70b2000 task.ti: f70b2000
Apr  2 09:35:17 beam kernel: [ 5549.754811] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 1
Apr  2 09:35:17 beam kernel: [ 5549.754815] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754818] EAX: ffffffed EBX: fe901306 ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754821] ESI: 00000000 EDI: 00000001 EBP: f70b3f50 ESP: f70b3f50
Apr  2 09:35:17 beam kernel: [ 5549.754824]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754828] CR0: 8005003b CR2: b7312000 CR3: 267ae000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754830] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754832]  f70b3f64 c0417cbc fe901306 00000000 f70b2000 f70b3f6c c04183e6 f70b3f90
Apr  2 09:35:17 beam kernel: [ 5549.754840]  c04a0ee9 f70b3f88 c043bfdb e5cfb025 fe901306 a38e2d61 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754848]  f70b3fb4 c043a6ca c145ad8a bd81511b 6d9d931e 1bd278dc 97e53bc6 a38e2d61
Apr  2 09:35:17 beam kernel: [ 5549.754856] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.754862]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.754867]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.754871]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.754877]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.754881]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.754884] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.754931] NMI backtrace for cpu 7
Apr  2 09:35:17 beam kernel: [ 5549.754941] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.754944] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.754948] task: f705c800 ti: f70be000 task.ti: f70be000
Apr  2 09:35:17 beam kernel: [ 5549.754952] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 7
Apr  2 09:35:17 beam kernel: [ 5549.754959] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.754963] EAX: ffffffed EBX: 0f13f92d ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754966] ESI: 00000000 EDI: 00000007 EBP: f70bff50 ESP: f70bff50
Apr  2 09:35:17 beam kernel: [ 5549.754969]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.754972] CR0: 8005003b CR2: b50a7000 CR3: 2cc3a000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.754975] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.754977]  f70bff64 c0417cbc 0f13f92d 00000000 f70be000 f70bff6c c04183e6 f70bff90
Apr  2 09:35:17 beam kernel: [ 5549.754987]  c04a0ee9 f70bff88 c043bfdb 49aa38fe 0f13f92d a3b7e692 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.754995]  f70bffb4 c043a6ca 65d35ad1 f89db0ba 6f0e2a73 c78cfe1f 342ab19c a3b7e692
Apr  2 09:35:17 beam kernel: [ 5549.755003] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.755003] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.755003] NMI backtrace for cpu 5
Apr  2 09:35:17 beam kernel: [ 5549.755003] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 1.211 msecs
Apr  2 09:35:17 beam kernel: [ 5549.755003] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.13.7-100.fc19.i686.PAE #1
Apr  2 09:35:17 beam kernel: [ 5549.755003] Hardware name: Supermicro X7DA8/X7DA8, BIOS 2.1a 01/06/2009
Apr  2 09:35:17 beam kernel: [ 5549.755003] task: f705bf00 ti: f70ba000 task.ti: f70ba000
Apr  2 09:35:17 beam kernel: [ 5549.755003] EIP: 0060:[<c0447865>] EFLAGS: 00200286 CPU: 5
Apr  2 09:35:17 beam kernel: [ 5549.755003] EIP is at native_safe_halt+0x5/0x10
Apr  2 09:35:17 beam kernel: [ 5549.755003] EAX: ffffffed EBX: 5ec57104 ECX: 00000000 EDX: 00000000
Apr  2 09:35:17 beam kernel: [ 5549.755003] ESI: 00000000 EDI: 00000005 EBP: f70bbf50 ESP: f70bbf50
Apr  2 09:35:17 beam kernel: [ 5549.755003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  2 09:35:17 beam kernel: [ 5549.755003] CR0: 8005003b CR2: b50a7000 CR3: 2ad09000 CR4: 000007f0
Apr  2 09:35:17 beam kernel: [ 5549.755003] Stack:
Apr  2 09:35:17 beam kernel: [ 5549.755003]  f70bbf64 c0417cbc 5ec57104 00000000 f70ba000 f70bbf6c c04183e6 f70bbf90
Apr  2 09:35:17 beam kernel: [ 5549.755003]  c04a0ee9 f70bbf88 c043bfdb 41fb9fe2 5ec57104 f3634b4d 00000000 00000000
Apr  2 09:35:17 beam kernel: [ 5549.755003]  f70bbfb4 c043a6ca 6fa0f530 7f7988b6 a1948175 7787977b f48909c3 f3634b4d
Apr  2 09:35:17 beam kernel: [ 5549.755003] Call Trace:
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c0417cbc>] default_idle+0x1c/0xa0
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c04183e6>] arch_cpu_idle+0x26/0x30
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c04a0ee9>] cpu_startup_entry+0x1c9/0x210
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c043bfdb>] ? setup_APIC_timer+0xab/0x130
Apr  2 09:35:17 beam kernel: [ 5549.755003]  [<c043a6ca>] start_secondary+0x20a/0x2d0
Apr  2 09:35:17 beam kernel: [ 5549.755003] Code: bc 27 00 00 00 00 55 89 e5 fa 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 fb f4 <5d> c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 f4 5d c3 8d 76 00 8d
Apr  2 09:35:17 beam kernel: [ 5549.755003] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 1.352 msecs

Comment 10 Jeff Layton 2014-04-02 15:02:08 UTC

My suspicion is that __break_lease is getting stuck continually doing a "goto retry;". 

Looking at this code in time_out_leases():

                if (past_time(fl->fl_break_time))
                        lease_modify(before, F_UNLCK);

...and past_time uses time_after() macro. Should that be using time_after_eq? It seems like if the fl_break_time == jiffies then we should go ahead and break the lease, right?

That doesn't explain the softlockups, but it doesn't quite look right to me...

Comment 11 Jeff Layton 2014-04-03 16:44:52 UTC

Pulled down the kernel that spewed the backtraces in comment #9:


0xc059cc80 is in __break_lease (fs/locks.c:1390).
1385		spin_unlock(&inode->i_lock);
1386		error = wait_event_interruptible_timeout(new_fl->fl_wait,
1387							!new_fl->fl_next, break_time);
1388		spin_lock(&inode->i_lock);
1389		locks_delete_block(new_fl);
1390		if (error >= 0) {
1391			if (error == 0)
1392				time_out_leases(inode);
1393			/*
1394			 * Wait for the next conflicting lease that has not been

(gdb) list *(__break_lease+0x1ca)
0xc059cbea is in __break_lease (fs/locks.c:603).
598	/* Must be called with i_lock held. */
599	static void locks_insert_block(struct file_lock *blocker,
600						struct file_lock *waiter)
601	{
602		spin_lock(&blocked_lock_lock);
603		__locks_insert_block(blocker, waiter);
604		spin_unlock(&blocked_lock_lock);
605	}
606	
607	/*

(gdb) list *(__break_lease+0x1e0)
0xc059cc00 is in __break_lease (fs/locks.c:1386).
1381			if (break_time == 0)
1382				break_time++;
1383		}
1384		locks_insert_block(flock, new_fl);
1385		spin_unlock(&inode->i_lock);
1386		error = wait_event_interruptible_timeout(new_fl->fl_wait,
1387							!new_fl->fl_next, break_time);
1388		spin_lock(&inode->i_lock);
1389		locks_delete_block(new_fl);
1390		if (error >= 0) {

...so it does seem somewhat likely that we're continually cycling in __break_lease via the "goto retry;" loop. How that occurs, I'm not sure.

One hypothesis: perhaps this is a livelock where some caller is continually getting a delegation, we try to break it. Then by the time we get around to rescanning the i_lock list again, another lease has appeared?

Comment 12 Jeff Layton 2014-04-04 12:22:40 UTC

Thanks for the reproducer. I was able to trigger the problem with that. FWIW, the yum command on the client is unnecessary, just a job that reads all of the files is sufficient.

After discussing this with Bruce, I have a possible approach to fix it, but it means adding a field to the generic inode structure. We'll have to see if that's acceptable. If not, then maybe we can stick it in the union with i_pipe field or something.

Comment 13 Jeff Layton 2014-04-04 12:25:06 UTC

Here's the stacktrace from the one I reproduced. I'm fairly sure the problem is as we suspect. The server is just too aggressive about giving out delegations. If we've had one broken recently, we should back off on giving out others on that inode for a bit.

[76822.564007] INFO: rcu_sched self-detected stall on CPU { 1}  (t=65000 jiffies g=67021 c=67020 q=15460)
[76822.564007] sending NMI to all CPUs:
[76822.564007] NMI backtrace for cpu 1
[76822.564007] CPU: 1 PID: 6301 Comm: genpkgmetadata. Not tainted 3.15.0-0.rc0.git2.1.fc21.x86_64 #1
[76822.564007] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[76822.564007] task: ffff8801181f1ae0 ti: ffff880098698000 task.ti: ffff880098698000
[76822.564007] RIP: 0010:[<ffffffff81057724>]  [<ffffffff81057724>] flat_send_IPI_all+0x94/0xc0
[76822.564007] RSP: 0018:ffff88011b003dd0  EFLAGS: 00010006
[76822.564007] RAX: 0000000000000000 RBX: 0000000000000046 RCX: 000000000f000000
[76822.564007] RDX: 0000000000000c00 RSI: ffffffff81c28b80 RDI: 0000000000000300
[76822.564007] RBP: ffff88011b003de8 R08: 0000000000000001 R09: 0000000000000001
[76822.564007] R10: 0000000000000000 R11: ffff88011b003b0e R12: 0000000000000800
[76822.564007] R13: 000000000000000f R14: ffffffff81c7c980 R15: 0000000000000001
[76822.564007] FS:  00007f3d5b7f1740(0000) GS:ffff88011b000000(0000) knlGS:0000000000000000
[76822.564007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76822.564007] CR2: 00007f3d5b792210 CR3: 00000000a3d5d000 CR4: 00000000000006e0
[76822.564007] Stack:
[76822.564007]  0000000000002710 ffffffff81c7c980 ffffffff81d864a0 ffff88011b003e00
[76822.564007]  ffffffff81052424 ffff88011b1cede0 ffff88011b003e58 ffffffff8111a4d4
[76822.564007]  ffff88011b003e58 ffffffff810de7bc 0000000000000046 0000000000003c64
[76822.564007] Call Trace:
[76822.564007]  <IRQ> 

[76822.564007]  [<ffffffff81052424>] arch_trigger_all_cpu_backtrace+0x64/0xa0
[76822.564007]  [<ffffffff8111a4d4>] rcu_check_callbacks+0x584/0x850
[76822.564007]  [<ffffffff810de7bc>] ? account_process_tick+0x11c/0x1b0
[76822.564007]  [<ffffffff810a89a7>] update_process_times+0x47/0x70
[76822.564007]  [<ffffffff81126855>] tick_sched_handle.isra.19+0x25/0x60
[76822.564007]  [<ffffffff81127061>] tick_sched_timer+0x41/0x60
[76822.564007]  [<ffffffff810c85b6>] __run_hrtimer+0x86/0x440
[76822.564007]  [<ffffffff81127020>] ? tick_sched_do_timer+0x40/0x40
[76822.564007]  [<ffffffff810c9147>] hrtimer_interrupt+0x107/0x260
[76822.564007]  [<ffffffff81050337>] local_apic_timer_interrupt+0x37/0x60
[76822.564007]  [<ffffffff817e9d6f>] smp_apic_timer_interrupt+0x3f/0x60
[76822.564007]  [<ffffffff817e86b2>] apic_timer_interrupt+0x72/0x80
[76822.564007]  <EOI> 

[76822.564007]  [<ffffffff810fca1a>] ? lock_acquire+0xba/0x1d0
[76822.564007]  [<ffffffff81291249>] ? locks_delete_block+0x19/0x30
[76822.564007]  [<ffffffff817dcace>] _raw_spin_lock+0x3e/0x80
[76822.564007]  [<ffffffff81291249>] ? locks_delete_block+0x19/0x30
[76822.564007]  [<ffffffff81291249>] locks_delete_block+0x19/0x30
[76822.564007]  [<ffffffff81292978>] __break_lease+0x378/0x400
[76822.564007]  [<ffffffff810fa18d>] ? trace_hardirqs_on+0xd/0x10
[76822.564007]  [<ffffffff812448fa>] do_unlinkat+0x1da/0x2c0
[76822.564007]  [<ffffffff8111cfc9>] ? current_kernel_time+0x69/0xd0
[76822.564007]  [<ffffffff810fa0b5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[76822.564007]  [<ffffffff810fa18d>] ? trace_hardirqs_on+0xd/0x10
[76822.564007]  [<ffffffff81248196>] SyS_unlink+0x16/0x20
[76822.564007]  [<ffffffff817e79a9>] system_call_fastpath+0x16/0x1b
[76822.564007] Code: 48 8b 35 f0 c0 d2 00 44 8b 66 3c ff 96 50 01 00 00 44 89 e9 c1 e1 18 89 0c 25 10 a3 57 ff 44 89 e2 80 ce 04 89 14 25 00 a3 57 ff <f6> c7 02 74 16 e8 52 2a 0a 00 48 89 df 57 9d 0f 1f 44 00 00 5b

Comment 14 Jeff Layton 2014-04-04 17:17:39 UTC

Created attachment 882803 [details]
patch -- locks: add new i_last_lease_break field to struct inode

I've been experimenting with this patch today. It does seem to help some -- once the initial round of leases is cleared you stop seeing the softlockups.

It doesn't seem to be sufficient though -- that initial round of lease breaks seems to take a very long time, and you still see:

    [  452.044004] BUG: soft lockup - CPU#0 stuck for 22s! [genpkgmetadata.:1618]


...so, I need to get to the bottom of why that is. The delegation recalls shouldn't take so long. I generally only see 4-5 in /proc/locks, but it can take many seconds for them to be cleared.

Also, I think looking at the last_lease_break time isn't sufficient. If there hasn't been a delegation set on it, but there were recently active writers then you probably still don't want to give out a delegation either. We should probably be setting that value whenever we would have broken the lease, even if there isn't one presently.

Comment 15 Jeff Layton 2014-04-04 18:06:07 UTC

Looking at network traces, the callbacks get replies very quickly, so it's not a case of the client taking too long to respond.

One possibility is that the server is taking too long to process the CB_RECALL reply?

Comment 16 Jeff Layton 2014-04-04 18:54:09 UTC

No, I had it wrong -- the CB_RECALL replies come in quickly, but the client is taking a long time to issue a DELEGRETURN. Very odd...

Comment 17 Jeff Layton 2014-04-07 15:02:10 UTC

For some strange reason I'm unable to reproduce the softlockup today. I'm not sure what changed. Still, I can see that the client is, in some cases, taking excessively long to return the delegation. That's at least part of the problem I suspect.

There are some tracepoints in the delegreturn code that might be helpful. For my own notes:

trace_nfs4_recall_delegation
trace_nfs4_delegreturn
trace_nfs4_delegreturn_exit

Comment 18 Terry Barnaby 2014-04-07 15:57:02 UTC

Glad to sea you have recreated the fault and are working on it, thanks.

Haven't seen the issue since running:
echo 0 >/proc/sys/fs/leases-enable

If I can test anything here let me know.

Comment 19 Jeff Layton 2014-04-08 00:53:01 UTC

Ok, I've figured out part of the problem on the client. The 4.0 backchannel code seems to be broken, in particular if:

- you have the client and server set up for krb5
- you mount a non-canonical name for the server

The problem seems to be in check_gss_callback_principal, which does this:

        /* Expect a GSS_C_NT_HOSTBASED_NAME like "nfs@serverhostname" */

        if (memcmp(p, "nfs@", 4) != 0)
                return 0;
        p += 4;
        if (strcmp(p, clp->cl_hostname) != 0)
                return 0;

...but the principal and the clp->cl_hostname are not guaranteed to be the same if you don't mount the exact hostname that's in the SPN. I'm not certain what the fix is for this.

Terry, do you have secure NFS set up at all? It's probably still possible to hit the softlockups even with AUTH_SYS on the backchannel, but this does seem to help exacerbate the problem.

Comment 20 Terry Barnaby 2014-04-08 07:32:25 UTC

Not using any form of secure NFS. We use a basic NFS setup although we do use "bind" on the server/workstation. On the one site we have one main server and about 6 clients all but one running Fedora 19 (One MSWIndows system) and all
generally lightly loaded on a lightly loaded Gigabit local network.

Server fstab portion:
UUID=848bc851-ea5a-4f30-b3c5-0e6fc8a8e6cd /data                   ext4    defaults        1 2
UUID=34d22e5f-402c-429d-b132-2a4ac6e22079 /data1                  ext4    defaults        1 2

/data/home		/home			auto	bind		0 0
/data/beam		/usr/beam		auto	bind		0 0
/data/src		/src			auto	bind		0 0
/data1/opt		/opt			auto	bind		0 0
/data1/dist		/dist			auto	bind		0 0
/data1/scratch		/scratch		auto	bind		0 0

Client fstab portions:
beam.beamnet:/home		/home		nfs	exec,suid,async,rw	0 0
beam.beamnet:/usr/beam		/usr/beam	nfs	exec,suid,async,rw	0 0
beam.beamnet:/src		/src		nfs	exec,suid,async,rw	0 0
beam.beamnet:/scratch		/scratch	nfs	exec,suid,async,rw	0 0
beam.beamnet:/dist		/dist		nfs	exec,suid,async,rw	0 0
beam.beamnet:/var/cache/yum	/var/cache/yum	nfs	exec,suid,async,rw	0 0
beam.beamnet:/opt		/opt		nfs	exec,suid,async,rw	0 0

Comment 21 Jeff Layton 2014-04-14 19:15:49 UTC

Ok, I've made some progress. The issue seems to be that we're making this call:

        error = wait_event_interruptible_timeout(new_fl->fl_wait,
                                                !new_fl->fl_next, break_time);

...with a break_time of 0. That prevents the task from ever going to sleep, and
that causes it to spin in a tight loop. The logic that sets the break_time is pretty convoluted, so I'm still working out the best way to fix it.

Comment 22 Jeff Layton 2014-04-14 20:16:39 UTC

Ah-ha! In nfsd_break_deleg_cb:

        /*
         * We don't want the locks code to timeout the lease for us;
         * we'll remove it ourself if a delegation isn't returned
         * in time:
         */
        fl->fl_break_time = 0;

...so that line is the culprit.

Unfortunately, it was added there to plug another race, so before we can remove it we'll have to see if we need to take steps to mitigate this in other ways. The commit that added it was this one:

commit 0272e1fd9f4fa8a43357c168e081744f99e67195
Author: J. Bruce Fields <bfields.edu>
Date:   Wed Sep 12 18:56:12 2007 -0400

    knfsd: let nfsd manage timing out its own leases

..it's probably obsolete now, but we'll need to look carefully.

Comment 23 Jeff Layton 2014-04-15 12:45:46 UTC

Ok, I think the right thing to do for this is to just ensure that we don't pass
a timeout of 0 to wait_event_interruptible. Patch sent for that upstream. I'll merge it into my linux-next branch today and send it on to Linus next week unless someone objects.

Comment 24 Jeff Layton 2014-04-16 15:26:05 UTC

Terry, I have a test kernel building with this patch now:

    http://koji.fedoraproject.org/koji/taskinfo?taskID=6746470

...would you be able to test it out and see if it prevents the softlockups? Please don't forget to reenable leases after you install it. Disabling them shouldn't be required with this patched kernel.

Comment 25 Terry Barnaby 2014-04-17 07:38:43 UTC

I am on holiday so will try this next week. Thanks.

Comment 26 Terry Barnaby 2014-04-26 04:26:20 UTC

Have been testing this kernel for 4 days now with both test examples and the original MSWindows smb shares. There has been no problems so I consider the fix you have applied works.

Many thanks, any idea when this may make it into the Fedora release kernels ?

Comment 27 Jeff Layton 2014-04-28 12:05:43 UTC

Not sure yet. Linus pulled this in over the weekend so it's in mainline now. It's marked for stable kernels, so F19 should get it within the next few weeks?

Comment 28 Terry Barnaby 2014-04-28 12:11:05 UTC

Many thanks and thanks for your work and excellent communications on what has been happening. If all of the bugs I have reported to various bugs lists had such an excellent response.
Thanks again.

Comment 29 George Joseph 2014-05-05 00:14:09 UTC

Just piling on here...

I started seeing this issue within a few minutes of a reboot after doing a yum update on a F20 system yesterday.  

"BUG: soft lockup - CPU#5 stuck for 23s! [10.0.0.254-mana:3582]" where 10.0.0.254 is the address of a NFSv4 server.  

Funny thing is I was already at 3.13.10-200.fc20.x86_64 and did not see the problem.  After yum installed 3.14.2-200.fc20.x86_64 (and a bunch of other things) I got the issue but rebooting back to 3.13.10-200.fc20.x86_64 didn't fix it.  Something other than the kernel update must have triggered the issue but I don't see anything in the update package list that would have obviously triggered it.

Anyway, turning off "leases-enable" does fix the soft lockup so I'll keep an eye out for the patch.  Any info about where to find it?  linux.git commit id perhaps?

Thanks.

Comment 30 Jeff Layton 2014-05-05 00:34:15 UTC

The commit id is f1c6bb2cb8b81013e8979806f8e15e3d53efb96d. This is a long-standing bug, so the patch is applicable to a wide range of kernels, but there were some changes merged into 3.13 that might make this occur more frequently so it's not surprising that we have more reports of this.

Comment 31 George Joseph 2014-05-05 01:52:21 UTC

Thanks Jeff!

Applied the patch to 3.14.2 and rebooted with fs.leases-enable=1 and not seeing any issues.

Comment 32 bjsjunk1 2014-05-05 02:31:16 UTC

Im a bit of a newbie, how do I apply this patch?  I get this error on my fedora 19 box when copying files from a windows client and was hoping this fixes it.

Thanks

Comment 33 Josh Boyer 2014-05-06 12:24:58 UTC

I'll backport the patch today.  The upstream stable maintainer is behind and it will be a release or two before it hits 3.14.y stable (3.13.y is EOL already).

Comment 34 Josh Boyer 2014-05-06 12:29:09 UTC

Patch applied in Fedora git for F20.  F19 should pick it up when it rebases to 3.14.y later this week.

Comment 35 Jeff Layton 2014-05-06 12:35:04 UTC

Thanks Josh. FWIW, this patch should apply just fine to even very old kernels if you want to pull it in for v3.13 as well. This bug was there for a long time...

Comment 36 Josh Boyer 2014-05-06 12:43:39 UTC

I would, but we aren't doing another build of 3.13.y in F19.  It's being rebased today I think, so applying it before the rebase wouldn't result in anything productive.  Thanks much for working through the issue.

Comment 37 Fedora Update System 2014-05-07 20:52:59 UTC

kernel-3.14.3-200.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.14.3-200.fc20

Comment 38 Fedora Update System 2014-05-08 10:11:05 UTC

Package kernel-3.14.3-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.14.3-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-6122/kernel-3.14.3-200.fc20
then log in and leave karma (feedback).

Comment 39 Fedora Update System 2014-05-10 03:19:38 UTC

kernel-3.14.3-200.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 40 Robby 2014-05-10 07:55:01 UTC

kernel-3.14.3-200.fc20 does not fix the issue for me, I'm having issues booting Fedora Xen DomUs with that kernel version:
BUG: soft lockup - CPU#1 stuck for 22s! [systemd-udevd:157]
INFO: rcu_sched self-detected stall on CPU { 1}

This happens right after grub attempts to load the kernel. However, the Dom0 itself boots fine with it.

kernel-3.13.10-200.fc20.x86_64 is the last version that boots up in the DomUs.

Comment 41 Terry Barnaby 2014-05-12 06:38:22 UTC

Looks like this patch did not make it into F19's next kernel kernel-3.14.3-100.fc19.
Any idea when it will be available for F19 ?

Comment 42 Jeff Layton 2014-05-12 12:35:30 UTC

Yeah, reopening bug. I'm guessing that Josh didn't notice that this was an F19 bug. This is fixed in F20, but I don't think F19 has the patch yet.

Comment 43 Jeff Layton 2014-05-12 12:36:51 UTC

(In reply to Robby from comment #40)
> kernel-3.14.3-200.fc20 does not fix the issue for me, I'm having issues
> booting Fedora Xen DomUs with that kernel version:
> BUG: soft lockup - CPU#1 stuck for 22s! [systemd-udevd:157]
> INFO: rcu_sched self-detected stall on CPU { 1}
> 
> This happens right after grub attempts to load the kernel. However, the Dom0
> itself boots fine with it.
> 
> kernel-3.13.10-200.fc20.x86_64 is the last version that boots up in the
> DomUs.

Do you have the stack trace? I suspect that this is unrelated to this bug (which is rather specific to the lease handling code).

Comment 44 Josh Boyer 2014-05-12 12:57:36 UTC

(In reply to Jeff Layton from comment #42)
> Yeah, reopening bug. I'm guessing that Josh didn't notice that this was an
> F19 bug. This is fixed in F20, but I don't think F19 has the patch yet.

It's there in git and in the 3.14.3 rebase for F19 in updates-testing.

Comment 45 Josh Boyer 2014-05-12 13:01:23 UTC

(In reply to Terry Barnaby from comment #41)
> Looks like this patch did not make it into F19's next kernel
> kernel-3.14.3-100.fc19.
> Any idea when it will be available for F19 ?

Do you say that because you tested the f19 3.14.3 rebase and you still hit the issue?  The patch is included:

#rhbz 1082586
Patch25075: locks-allow-__break_lease-to-sleep-even-when-break_t.patch

#rhbz 1082586
ApplyPatch locks-allow-__break_lease-to-sleep-even-when-break_t.patch

and went in with Fedora commit:

commit 511f256d9fb8703f883adfe92f4a0d46289a14b8
Author: Justin M. Forbes <jforbes>
Date:   Thu May 8 07:24:27 2014 -0500

    Linux v3.14.3

as I said it would in comment #34.

Comment 46 Terry Barnaby 2014-05-12 13:59:52 UTC

No I haven't tested it. I looked at the info for kernel-3.14.3-100.fc19 at:
https://admin.fedoraproject.org/updates/FEDORA-2014-6167/kernel-3.14.3-100.fc19?_csrf_token=4cd1047c552a95b179371477782e0c98f5f2225a

But there was no mention in the Bugs Fixed section or the change log on this bug.
The same info for kernel-3.14.3-200.fc20 (mentioned above) does have info on the
bug in the Bugs Fixed section and the change log.

So I assumed that the fix has not been included in kernel-3.14.3-100.fc19 which is updates-testing as yet.

Comment 47 Josh Boyer 2014-05-12 14:14:46 UTC

OK, well the patch is included.  Please test the f19 3.14.3 to make sure it fixes your issue.

Comment 48 Jeff Layton 2014-05-12 14:16:49 UTC

My mistake then. Let's reclose this bug since the patch is apparently in, and just reopen it if it turns out not to fix Terry's issue for some reason.

Comment 49 Fedora Update System 2014-05-13 12:16:29 UTC

kernel-3.14.3-100.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/FEDORA-2014-6167/kernel-3.14.3-100.fc19

Comment 50 Fedora Update System 2014-05-13 22:11:48 UTC

kernel-3.14.4-100.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.14.4-100.fc19

Comment 51 Terry Barnaby 2014-05-18 08:20:31 UTC

This does fix the "stuck for 22s!" problem.
However I have found a problem with this kernel that "may" be related to the bug fix.

Scenario: Two Fedora19 systems both running kernel-PAE-3.14.3-100.fc19.i686

The client has mounted directories using NFSv4 from the server.
There is a directory "dir1" with a few files in it (5 say).

On the client run: cp -a dir1 dir2
On the client run: cd dir2
On the client run: cp filea filea1

The copy locks up. ps -alx gives:
0  1002  2978  1633  20   0   5124   576 nfs4_h D+   pts/4      0:00 cp kingnet.conf kingnet1.conf

While the cp is going on, on the client run: ls
The ls locks up. ps -alx gives
0  1002  3054  2981  20   0   5136   588 iterat D+   pts/11     0:00 ls

While the cp is going on, on the server: run: cp fileb fileb1
The copy returns the error:
cp: cannot create regular file ‘beamnet1.conf’: Cannot allocate memory

There are no messages in /var/log/messages on either system.
There is plenty of memory available on both systems (> 3G free).
There is plenty of disk space available.

Rebooting the client has no effect (cp still locks up).
Restarting nfs-server on the server has no effect.
A reboot of the server clears the problem.

Instantly repeatable.
A "cp -r" does not have this problem, so something todo with file metadata.

Comment 52 Fedora Update System 2014-05-21 23:21:21 UTC

kernel-3.14.4-100.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 53 Terry Barnaby 2014-05-24 04:52:15 UTC

The problem in comment 51 is still in kernel-3.14.4-100.fc19.

Comment 54 Terry Barnaby 2014-05-24 05:33:01 UTC

Seems to still be present with: "/proc/sys/fs/leases-enable = 0" so assume it is
not related.
Opened bug 1100941 for this one: https://bugzilla.redhat.com/show_bug.cgi?id=1100941

Note You need to log in before you can comment on or make changes to this bug.