Bug 1095436 - May 7 01:32:30 r1epi kernel: BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:673] [NEEDINFO]
Summary: May 7 01:32:30 r1epi kernel: BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:673]
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: J. Bruce Fields
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-05-07 16:54 UTC by g. artim
Modified: 2015-06-30 01:00 UTC
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-30 01:00:50 UTC
Type: Bug
Embargoed:
kernel-team: needinfo?


Attachments
mini dump from messages. (12.66 KB, text/plain)
2014-05-07 16:54 UTC, g. artim
pattern of dumps before lockup or crash. (26.50 KB, text/plain)
2014-05-12 16:15 UTC, g. artim
all the mini-Dumps before the hang or freeze. (1.69 MB, text/plain)
2014-05-12 16:16 UTC, g. artim

Description g. artim 2014-05-07 16:54:14 UTC
Created attachment 893391 [details]
mini dump from messages.

Description of problem:

System locks up under high NFS server load.


Version-Release number of selected component (if applicable):

kernel-3.14.2-200.fc20.x86_64


How reproducible:

Reliably under heavy load: rsyncing a 19 TB RAID array to another 19 TB RAID array.


Steps to Reproduce:
1. Boot the NFS server on kernel-3.14.2-200.fc20.x86_64.
2. Put it under heavy load (e.g. rsync a 19 TB RAID array to another 19 TB RAID array).
3. Wait; soft lockups appear and the system hangs.

Actual results:

"BUG: soft lockup - CPU#N stuck for 22s!" messages from nfsd, followed by a hang.


Expected results:

No crash. I switched back to kernel-3.13.9-200.fc20.x86_64, which seemed more stable: no crashes, no runs to the datacenter. Hope this helps.


Additional info:

Comment 1 g. artim 2014-05-09 17:19:16 UTC
The last 2 nights there were heavy loads on this server, with big rsync and batch jobs, and

kernel-3.13.9-200.fc20.x86_64

did _not_ lock up. Food for a git diff. g.

Comment 2 J. Bruce Fields 2014-05-09 22:06:39 UTC
The first warning is in d_obtain_alias called as part of filehandle lookup.

The other two are in shrink_dentry_list->__d_drop.

git log v3.13.9..v3.14.2 fs/dcache.c doesn't turn up anything suspicious.

I don't see any interesting changes to filehandle lookup code either.

On a quick look I'm stumped.  Further warnings might be interesting, or experiments with kernels in between those two.
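The range check above can be wrapped in a small helper. A minimal sketch, assuming a local git clone of the kernel tree; the repo path in the usage comment is a placeholder:

```shell
# range_log <repo> <old>..<new> [path...]
# Lists commits in the tag range that touched the given paths,
# as in the v3.13.9..v3.14.2 fs/dcache.c check above.
range_log() {
  repo=$1; range=$2; shift 2
  git -C "$repo" log --oneline "$range" -- "$@"
}

# Against a kernel clone, the check above would be roughly:
#   range_log ~/src/linux v3.13.9..v3.14.2 fs/dcache.c
```

The same helper narrows down other suspects, e.g. fs/exportfs/ for the filehandle lookup path.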

Comment 3 g. artim 2014-05-12 16:15:14 UTC
Created attachment 894805 [details]
pattern of dumps before lockup or crash.

Just a 'grep -a BUG messages.log' shows the BUG entries for nfsd, rsync (what was running), and the kswapd kernel process.
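That grep can be taken one step further to tally which tasks were stuck. A sketch; the log file path is whatever holds your kernel messages:

```shell
# lockup_tasks <logfile>: count soft-lockup reports per task name.
# -a treats any embedded binary garbage in the log as text,
# matching the reporter's grep above.
lockup_tasks() {
  grep -a 'BUG: soft lockup' "$1" \
    | sed 's/.*stuck for [0-9]*s! \[\([^]:]*\).*/\1/' \
    | sort | uniq -c | sort -rn
}

# e.g. lockup_tasks /var/log/messages
# prints a count per task name (nfsd, kswapd0, ...), most frequent first
```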

Comment 4 g. artim 2014-05-12 16:16:24 UTC
Created attachment 894806 [details]
all the mini-Dumps before the hang or freeze.

All the mini-dumps... FYI. g.

Comment 5 g. artim 2014-05-12 16:18:32 UTC
Interesting: it's only happening on 6 CPUs. It's an i7, so maybe the work is just being scheduled on 6 of them, and the other 2 aren't hitting it? (I don't know my stuff on this, though.)

Comment 6 J. Bruce Fields 2014-05-12 20:32:16 UTC
In theory I guess this could be another consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1082586.  Could you confirm whether it's still reproducible on v3.14.3?

Comment 7 g. artim 2014-05-13 15:17:13 UTC
Yes, I could, but it's a production server, so I can only do it on the weekend when there's an opening, and I'm not clear on what produces the result. Let me know if you have an NFS or rsync test tool to hammer the server (a little howto would help with this fried brain).
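In the absence of a dedicated tool, a crude load generator can be sketched: write and re-read many files under a target directory, which on a client would be the NFS mount. The directory, file count, and file size here are all parameters, not anything from the bug:

```shell
# hammer <dir> <count> <size_mb>: write <count> files of <size_mb> MB
# under <dir>, then read them all back, to mix reads with writes.
hammer() {
  dir=$1; count=$2; size_mb=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    dd if=/dev/zero of="$dir/stress.$i" bs=1M count="$size_mb" 2>/dev/null
    i=$((i + 1))
  done
  cat "$dir"/stress.* > /dev/null
}

# e.g. on an NFS client: hammer /mnt/nfs/scratch 500 8
```

Running several of these in parallel (`hammer ... &`) better approximates an rsync-style load; fio or bonnie++ would be the more serious tools for this.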

Comment 8 g. artim 2014-05-31 21:40:45 UTC
I somehow did a yum update and switched back to 3.14.2-200,
and 4 days later it crashed again; see the info below. I've
since installed the 3.14.4-200 kernel and am seeing what
happens. If it fails I'll return to 3.13.9-200, which seems
not to have the problem. The doc below shows the NFS options
I was using; I switched them all to the defaults for this
test. BTW, the afternoon before it hung I did poke the
RPCNFSDCOUNT up, like so: echo 16 > /proc/fs/nfsd/threads
Not clear if this means much.

    -- gary.


#------------------------------------------------------#
kernels installed 
#------------------------------------------------------#
vmlinuz-3.13.9-200.fc20.x86_64 (may not have the problem)
vmlinuz-3.14.2-200.fc20.x86_64 (crashed on)
vmlinuz-3.14.4-200.fc20.x86_64 (switched to)

#------------------------------------------------------#
in /etc/sysconfig/nfs:
#------------------------------------------------------#

#RQUOTAD="/usr/sbin/rpc.rquotad"
# Port rquotad should listen on.
#RQUOTAD_PORT=875
# Optional options passed to rquotad
RPCRQUOTADOPTS=""
#
# Optional arguments passed to in-kernel lockd
#LOCKDARG=
# TCP port rpc.lockd should listen on.
#LOCKD_TCPPORT=32803
# UDP port rpc.lockd should listen on.
#LOCKD_UDPPORT=32769
LOCKD_UDPPORT=30001
LOCKD_TCPPORT=30001
#
# Optional arguments passed to rpc.nfsd. See rpc.nfsd(8)
RPCNFSDARGS=""
# Number of nfs server processes to be started.
# The default is 8. 
RPCNFSDCOUNT=8
# Set V4 grace period in seconds
#NFSD_V4_GRACE=90
#
# Optional arguments passed to rpc.mountd. See rpc.mountd(8)
RPCMOUNTDOPTS=""
#
# Optional arguments passed to rpc.statd. See rpc.statd(8)
STATDARG=""
#
# Optional arguments passed to rpc.idmapd. See rpc.idmapd(8)
RPCIDMAPDARGS=""
#
# Optional arguments passed to rpc.gssd. See rpc.gssd(8)
RPCGSSDARGS=""
#
# Optional arguments passed to rpc.svcgssd. See rpc.svcgssd(8)
RPCSVCGSSDARGS=""
#
# To enable RDMA support on the server by setting this to
# the port the server should listen on
#RDMA_PORT=20049 
#
# Optional arguments passed to blkmapd. See blkmapd(8)
BLKMAPDARGS=""

#------------------------------------------------------#
in /etc/rc.d/rc.local
#------------------------------------------------------#

echo Applying NFS performance options...
echo 16 > /proc/fs/nfsd/threads
echo "120" >  /sys/block/sdb/device/timeout 
echo "120" >  /sys/block/sdc/device/timeout 
echo 262144 > /proc/sys/net/core/rmem_max
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/wmem_max
echo 262144 > /proc/sys/net/core/wmem_default
echo 0 > /sys/block/sdb/queue/read_ahead_kb
echo noop > /sys/block/sdb/queue/scheduler

#------------------------------------------------------#
in /var/log/messages at hangup or crash:
#------------------------------------------------------#

May 31 01:01:01 r1epi systemd: Started Session 56 of user root.
May 31 01:36:47 r1epi kernel: BUG: soft lockup - CPU#2 stuck for 22s! [nfsd:22675]
May 31 01:36:47 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv4 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support btrfs x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel raid6_pq eeepc_wmi asus_wmi btusb sparse_keymap mxm_wmi bluetooth xor ghash_clmulni_intel snd_hda_intel 6lowpan_iphc snd_hda_codec microcode rfkill snd_hwdep snd_pcm e1000 serio_raw e1000e lpc_ich i2c_i801 mfd_core snd_timer snd mei_me ptp soundcore mei pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core crc_itu_t aacraid i2c_core video
May 31 01:36:47 r1epi kernel: CPU: 2 PID: 22675 Comm: nfsd Not tainted 3.14.2-200.fc20.x86_64 #1
May 31 01:36:47 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
May 31 01:36:47 r1epi kernel: task: ffff88000e595580 ti: ffff88008cd2a000 task.ti: ffff88008cd2a000
May 31 01:36:47 r1epi kernel: RIP: 0010:[<ffffffff81202607>]  [<ffffffff81202607>] d_obtain_alias+0x1b7/0x1d0
May 31 01:36:47 r1epi kernel: RSP: 0018:ffff88008cd2bb00  EFLAGS: 00000202
May 31 01:36:47 r1epi kernel: RAX: ffff88041623d000 RBX: ffffffff812039a3 RCX: ffff88022f9c9e49
May 31 01:36:47 r1epi kernel: RDX: ffff88041623d0b0 RSI: ffffffffa0916140 RDI: ffff88017de7eb98
May 31 01:36:47 r1epi kernel: RBP: ffff88008cd2bb18 R08: 0000000000017bf0 R09: ffffffff812021b5
May 31 01:36:47 r1epi kernel: R10: fd6265d59ea69203 R11: ffffea000fed4f00 R12: 0000000000000000
May 31 01:36:47 r1epi kernel: R13: ffff88027f9ac9a8 R14: ffffffff810d1cb5 R15: ffff88008cd2ba78
May 31 01:36:47 r1epi kernel: FS:  0000000000000000(0000) GS:ffff88042fa80000(0000) knlGS:0000000000000000
May 31 01:36:47 r1epi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 01:36:47 r1epi kernel: CR2: 00007f1369f2ab94 CR3: 0000000001c0c000 CR4: 00000000000407e0
May 31 01:36:47 r1epi kernel: Stack:
May 31 01:36:47 r1epi kernel: ffff88027f9ac9a8 ffffffffffffff8c 000000000004027d ffff88008cd2bb88
May 31 01:36:47 r1epi kernel: ffffffffa08d6375 00000000008f9f6b ffff880417b10000 000000000e595580
May 31 01:36:48 r1epi kernel: 6bff88000e595a90 0100000000008f9f 0000000000000000 00000000c08b1472
May 31 01:36:48 r1epi kernel: Call Trace:
May 31 01:36:48 r1epi kernel: [<ffffffffa08d6375>] btrfs_get_dentry+0x115/0x140 [btrfs]
May 31 01:36:48 r1epi kernel: [<ffffffffa01f9a40>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffffa08d6662>] btrfs_fh_to_dentry+0x32/0x60 [btrfs]
May 31 01:36:48 r1epi kernel: [<ffffffff812cc842>] exportfs_decode_fh+0x72/0x2e0
May 31 01:36:48 r1epi kernel: [<ffffffffa01ffa2b>] ? exp_find+0x10b/0x1c0 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffff810c2885>] ? sched_clock_cpu+0x85/0xc0
May 31 01:36:48 r1epi kernel: [<ffffffff811cc925>] ? kmem_cache_alloc+0x35/0x1f0
May 31 01:36:48 r1epi kernel: [<ffffffffa01fa796>] fh_verify+0x316/0x600 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffff810f0c50>] ? getboottime+0x30/0x40
May 31 01:36:48 r1epi kernel: [<ffffffffa01bbf7e>] ? cache_check+0x12e/0x380 [sunrpc]
May 31 01:36:48 r1epi kernel: [<ffffffffa0208ac9>] nfsd4_putfh+0x49/0x50 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffffa020ad1a>] nfsd4_proc_compound+0x56a/0x7b0 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffffa01f6dbb>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffffa01b1d00>] svc_process_common+0x480/0x6f0 [sunrpc]
May 31 01:36:48 r1epi kernel: [<ffffffffa01b2077>] svc_process+0x107/0x170 [sunrpc]
May 31 01:36:48 r1epi kernel: [<ffffffffa01f674f>] nfsd+0xbf/0x130 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffffa01f6690>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 31 01:36:48 r1epi kernel: [<ffffffff810ae211>] kthread+0xe1/0x100
May 31 01:36:48 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:48 r1epi kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
May 31 01:36:48 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:48 r1epi kernel: Code: 00 66 41 83 45 58 01 66 83 83 88 00 00 00 01 48 89 de 4c 89 ef e8 ca 81 0e 00 4c 89 e8 e9 a1 fe ff ff f3 90 48 8b 88 b0 00 00 00 <80> e1 01 75 f2 e9 75 ff ff ff 48 89 f8 e9 86 fe ff ff 0f 0b 0f 
May 31 01:36:48 r1epi kernel: BUG: soft lockup - CPU#3 stuck for 22s! [kswapd0:82]
May 31 01:36:48 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv4 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support btrfs x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel raid6_pq eeepc_wmi asus_wmi btusb sparse_keymap mxm_wmi bluetooth xor ghash_clmulni_intel snd_hda_intel 6lowpan_iphc snd_hda_codec microcode rfkill snd_hwdep snd_pcm e1000 serio_raw e1000e lpc_ich i2c_i801 mfd_core snd_timer snd mei_me ptp soundcore mei pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core crc_itu_t aacraid i2c_core video
May 31 01:36:48 r1epi kernel: CPU: 3 PID: 82 Comm: kswapd0 Not tainted 3.14.2-200.fc20.x86_64 #1
May 31 01:36:48 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
May 31 01:36:48 r1epi kernel: task: ffff880417468000 ti: ffff880417ba4000 task.ti: ffff880417ba4000
May 31 01:36:48 r1epi kernel: RIP: 0010:[<ffffffff812003b5>]  [<ffffffff812003b5>] __d_drop+0x95/0xc0
May 31 01:36:48 r1epi kernel: RSP: 0018:ffff880417ba5b60  EFLAGS: 00000202
May 31 01:36:48 r1epi kernel: RAX: ffff88022f9c9e49 RBX: 0000000000000000 RCX: 0000000000060005
May 31 01:36:48 r1epi kernel: RDX: ffff88041623d0b0 RSI: 0000000000000000 RDI: ffff880106f19a80
May 31 01:36:48 r1epi kernel: RBP: ffff880417ba5b88 R08: ffff880106f19b00 R09: 000000018015000a
May 31 01:36:48 r1epi kernel: R10: ffffffff811ff55f R11: ffffea00065f45c0 R12: ffff88041865ba80
May 31 01:36:48 r1epi kernel: R13: 00000000000003f0 R14: 0000000000000032 R15: 0000000000000069
May 31 01:36:48 r1epi kernel: FS:  0000000000000000(0000) GS:ffff88042fac0000(0000) knlGS:0000000000000000
May 31 01:36:48 r1epi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 01:36:48 r1epi kernel: CR2: 00007fce0a1d2000 CR3: 0000000001c0c000 CR4: 00000000000407e0
May 31 01:36:48 r1epi kernel: Stack:
May 31 01:36:48 r1epi kernel: ffffffff812004f4 ffff880106f19b00 ffff880417ba5be0 ffff880106f19a80
May 31 01:36:48 r1epi kernel: ffff880106f19a80 ffff880417ba5bc8 ffffffff81200881 ffff880106f19b00
May 31 01:36:48 r1epi kernel: 00000000bbfef780 ffff880417ba5be0 0000000000000079 ffff88041623d000
May 31 01:36:48 r1epi kernel: Call Trace:
May 31 01:36:48 r1epi kernel: [<ffffffff812004f4>] ? dentry_kill+0xa4/0x210
May 31 01:36:48 r1epi kernel: [<ffffffff81200881>] shrink_dentry_list+0xa1/0x100
May 31 01:36:48 r1epi kernel: [<ffffffff81201f76>] prune_dcache_sb+0x56/0x80
May 31 01:36:48 r1epi kernel: [<ffffffff811ecfb7>] super_cache_scan+0xe7/0x160
May 31 01:36:48 r1epi kernel: [<ffffffff81185688>] shrink_slab_node+0x138/0x290
May 31 01:36:48 r1epi kernel: [<ffffffff811db34b>] ? mem_cgroup_iter+0x16b/0x2d0
May 31 01:36:48 r1epi kernel: [<ffffffff81187c6b>] shrink_slab+0x8b/0x170
May 31 01:36:48 r1epi kernel: [<ffffffff8118a7ca>] kswapd_shrink_zone+0x14a/0x1f0
May 31 01:36:48 r1epi kernel: [<ffffffff8118bc36>] kswapd+0x476/0x860
May 31 01:36:48 r1epi kernel: [<ffffffff8118b7c0>] ? mem_cgroup_shrink_node_zone+0x160/0x160
May 31 01:36:48 r1epi kernel: [<ffffffff810ae211>] kthread+0xe1/0x100
May 31 01:36:48 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:48 r1epi kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
May 31 01:36:48 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:48 r1epi kernel: Code: 10 00 00 00 00 0f ba 32 00 8b 47 58 89 c2 c1 ea 10 66 39 d0 74 28 83 47 04 02 c3 0f 1f 00 f3 c3 66 0f 1f 44 00 00 f3 90 48 8b 02 <a8> 01 75 f7 eb a2 48 8b 47 68 48 8d 90 b0 00 00 00 eb 95 55 48 
May 31 01:36:51 r1epi kernel: BUG: soft lockup - CPU#4 stuck for 23s! [nfsd:676]
May 31 01:36:51 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv4 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support btrfs x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel raid6_pq eeepc_wmi asus_wmi btusb sparse_keymap mxm_wmi bluetooth xor ghash_clmulni_intel snd_hda_intel 6lowpan_iphc snd_hda_codec microcode rfkill snd_hwdep snd_pcm e1000 serio_raw e1000e lpc_ich i2c_i801 mfd_core snd_timer snd mei_me ptp soundcore mei pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core crc_itu_t aacraid i2c_core video
May 31 01:36:51 r1epi kernel: CPU: 4 PID: 676 Comm: nfsd Not tainted 3.14.2-200.fc20.x86_64 #1
May 31 01:36:51 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
May 31 01:36:51 r1epi kernel: task: ffff8803fd80b900 ti: ffff8800c05e2000 task.ti: ffff8800c05e2000
May 31 01:36:51 r1epi kernel: RIP: 0010:[<ffffffff81202607>]  [<ffffffff81202607>] d_obtain_alias+0x1b7/0x1d0
May 31 01:36:51 r1epi kernel: RSP: 0018:ffff8800c05e3b30  EFLAGS: 00000202
May 31 01:36:51 r1epi kernel: RAX: ffff88041623d000 RBX: ffffffff812039a3 RCX: ffff88022f9c9e49
May 31 01:36:51 r1epi kernel: RDX: ffff88041623d0b0 RSI: ffffffffa0916140 RDI: ffff88031f144f58
May 31 01:36:51 r1epi kernel: RBP: ffff8800c05e3b48 R08: 0000000000017bf0 R09: ffffffff812021b5
May 31 01:36:51 r1epi kernel: R10: fe6f22b26f5e9203 R11: ffffea0003027140 R12: 0000000000000000
May 31 01:36:51 r1epi kernel: R13: ffff880172dda9a8 R14: ffffffff810d1cb5 R15: ffff8800c05e3aa8
May 31 01:36:51 r1epi kernel: FS:  0000000000000000(0000) GS:ffff88042fb00000(0000) knlGS:0000000000000000
May 31 01:36:51 r1epi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 01:36:51 r1epi kernel: CR2: 00007f2ec792359c CR3: 0000000001c0c000 CR4: 00000000000407e0
May 31 01:36:51 r1epi kernel: Stack:
May 31 01:36:51 r1epi kernel: ffff880172dda9a8 ffffffffffffff8c 000000000005d12a ffff8800c05e3bb8
May 31 01:36:51 r1epi kernel: ffffffffa08d6375 0000000000a9f19e ffff880417b10000 0000000000000001
May 31 01:36:51 r1epi kernel: 9e00000000000000 010000000000a9f1 0000000000000000 00000000ae4e30da
May 31 01:36:51 r1epi kernel: Call Trace:
May 31 01:36:51 r1epi kernel: [<ffffffffa08d6375>] btrfs_get_dentry+0x115/0x140 [btrfs]
May 31 01:36:51 r1epi kernel: [<ffffffffa01f9a40>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffffa08d6662>] btrfs_fh_to_dentry+0x32/0x60 [btrfs]
May 31 01:36:51 r1epi kernel: [<ffffffff812cc842>] exportfs_decode_fh+0x72/0x2e0
May 31 01:36:51 r1epi kernel: [<ffffffffa01ffa2b>] ? exp_find+0x10b/0x1c0 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffff810c8c4e>] ? dequeue_task_fair+0x42e/0x640
May 31 01:36:51 r1epi kernel: [<ffffffff810c2885>] ? sched_clock_cpu+0x85/0xc0
May 31 01:36:51 r1epi kernel: [<ffffffff811cc925>] ? kmem_cache_alloc+0x35/0x1f0
May 31 01:36:51 r1epi kernel: [<ffffffff810b3606>] ? prepare_creds+0x26/0x1c0
May 31 01:36:51 r1epi kernel: [<ffffffffa01fa796>] fh_verify+0x316/0x600 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffffa0204c0c>] nfsd3_proc_getattr+0x7c/0x110 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffffa01f6dbb>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffffa01b1d00>] svc_process_common+0x480/0x6f0 [sunrpc]
May 31 01:36:51 r1epi kernel: [<ffffffffa01b2077>] svc_process+0x107/0x170 [sunrpc]
May 31 01:36:51 r1epi kernel: [<ffffffffa01f674f>] nfsd+0xbf/0x130 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffffa01f6690>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 31 01:36:51 r1epi kernel: [<ffffffff810ae211>] kthread+0xe1/0x100
May 31 01:36:51 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:51 r1epi kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
May 31 01:36:51 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:51 r1epi kernel: Code: 00 66 41 83 45 58 01 66 83 83 88 00 00 00 01 48 89 de 4c 89 ef e8 ca 81 0e 00 4c 89 e8 e9 a1 fe ff ff f3 90 48 8b 88 b0 00 00 00 <80> e1 01 75 f2 e9 75 ff ff ff 48 89 f8 e9 86 fe ff ff 0f 0b 0f 
May 31 01:36:55 r1epi kernel: BUG: soft lockup - CPU#1 stuck for 22s! [nfsd:682]
May 31 01:36:55 r1epi kernel: BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:22673]
May 31 01:36:55 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv4 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support btrfs x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel raid6_pq eeepc_wmi asus_wmi btusb sparse_keymap mxm_wmi bluetooth xor ghash_clmulni_intel snd_hda_intel 6lowpan_iphc snd_hda_codec microcode rfkill snd_hwdep snd_pcm e1000 serio_raw e1000e lpc_ich i2c_i801 mfd_core snd_timer snd mei_me ptp soundcore mei pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core crc_itu_t aacraid i2c_core video
May 31 01:36:55 r1epi kernel: CPU: 0 PID: 22673 Comm: nfsd Not tainted 3.14.2-200.fc20.x86_64 #1
May 31 01:36:55 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
May 31 01:36:55 r1epi kernel: task: ffff88000e595f00 ti: ffff880003c18000 task.ti: ffff880003c18000
May 31 01:36:55 r1epi kernel: RIP: 0010:[<ffffffff81202607>]  [<ffffffff81202607>] d_obtain_alias+0x1b7/0x1d0
May 31 01:36:55 r1epi kernel: RSP: 0018:ffff880003c19a80  EFLAGS: 00000202
May 31 01:36:55 r1epi kernel: RAX: ffff88041623d000 RBX: ffffffff812039a3 RCX: ffff88022f9c9e49
May 31 01:36:55 r1epi kernel: RDX: ffff88041623d0b0 RSI: ffffffffa0916140 RDI: ffff880172c09a18
May 31 01:36:55 r1epi kernel: RBP: ffff880003c19a98 R08: 0000000000017bf0 R09: ffffffff812021b5
May 31 01:36:55 r1epi kernel: R10: fe6f2c3664fd9403 R11: ffffea001012cd00 R12: 0000000000000000
May 31 01:36:55 r1epi kernel: R13: ffff880172d425b0 R14: ffffffff810d1cb5 R15: ffff880003c199f8
May 31 01:36:55 r1epi kernel: FS:  0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
May 31 01:36:55 r1epi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 01:36:55 r1epi kernel: CR2: 00007f85fef93000 CR3: 0000000001c0c000 CR4: 00000000000407f0
May 31 01:36:55 r1epi kernel: Stack:
May 31 01:36:55 r1epi kernel: ffff880172d425b0 ffffffffffffff8c 000000000005d7a4 ffff880003c19b08
May 31 01:36:55 r1epi kernel: ffffffffa08d6375 0000000000aa6405 ffff880417b10000 00000000ff38d8cc
May 31 01:36:55 r1epi kernel: 05ff8803ff38d400 010000000000aa64 0000000000000000 000000003f519fc4
May 31 01:36:55 r1epi kernel: Call Trace:
May 31 01:36:55 r1epi kernel: [<ffffffffa08d6375>] btrfs_get_dentry+0x115/0x140 [btrfs]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f9a40>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa08d6662>] btrfs_fh_to_dentry+0x32/0x60 [btrfs]
May 31 01:36:55 r1epi kernel: [<ffffffff812cc842>] exportfs_decode_fh+0x72/0x2e0
May 31 01:36:55 r1epi kernel: [<ffffffffa01ffa2b>] ? exp_find+0x10b/0x1c0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff811cc925>] ? kmem_cache_alloc+0x35/0x1f0
May 31 01:36:55 r1epi kernel: [<ffffffffa01fa796>] fh_verify+0x316/0x600 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01fbb50>] nfsd_open+0x40/0x1d0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff810a4f69>] ? try_to_grab_pending+0xa9/0x150
May 31 01:36:55 r1epi kernel: [<ffffffffa01fe59b>] nfsd_write+0xbb/0x110 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa0204af0>] nfsd3_proc_write+0xc0/0x160 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f6dbb>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01b1d00>] svc_process_common+0x480/0x6f0 [sunrpc]
May 31 01:36:55 r1epi kernel: [<ffffffffa01b2077>] svc_process+0x107/0x170 [sunrpc]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f674f>] nfsd+0xbf/0x130 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f6690>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff810ae211>] kthread+0xe1/0x100
May 31 01:36:55 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:55 r1epi kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
May 31 01:36:55 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:55 r1epi kernel: Code: 00 66 41 83 45 58 01 66 83 83 88 00 00 00 01 48 89 de 4c 89 ef e8 ca 81 0e 00 4c 89 e8 e9 a1 fe ff ff f3 90 48 8b 88 b0 00 00 00 <80> e1 01 75 f2 e9 75 ff ff ff 48 89 f8 e9 86 fe ff ff 0f 0b 0f 
May 31 01:36:55 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv4 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support btrfs x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel raid6_pq eeepc_wmi asus_wmi btusb sparse_keymap mxm_wmi bluetooth xor ghash_clmulni_intel snd_hda_intel 6lowpan_iphc snd_hda_codec microcode rfkill snd_hwdep snd_pcm e1000 serio_raw e1000e lpc_ich i2c_i801 mfd_core snd_timer snd mei_me ptp soundcore mei pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core crc_itu_t aacraid i2c_core video
May 31 01:36:55 r1epi kernel: CPU: 1 PID: 682 Comm: nfsd Not tainted 3.14.2-200.fc20.x86_64 #1
May 31 01:36:55 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
May 31 01:36:55 r1epi kernel: task: ffff8803fd80f200 ti: ffff8803fd1c0000 task.ti: ffff8803fd1c0000
May 31 01:36:55 r1epi kernel: RIP: 0010:[<ffffffff81202607>]  [<ffffffff81202607>] d_obtain_alias+0x1b7/0x1d0
May 31 01:36:55 r1epi kernel: RSP: 0018:ffff8803fd1c1a80  EFLAGS: 00000202
May 31 01:36:55 r1epi kernel: RAX: ffff88041623d000 RBX: ffffffff812039a3 RCX: ffff88022f9c9e49
May 31 01:36:55 r1epi kernel: RDX: ffff88041623d0b0 RSI: ffffffffa0916140 RDI: ffff880172d5d658
May 31 01:36:55 r1epi kernel: RBP: ffff8803fd1c1a98 R08: 0000000000017bf0 R09: ffffffff812021b5
May 31 01:36:55 r1epi kernel: R10: fe6f2bbe751b9003 R11: ffffea0010462480 R12: 0000000000000000
May 31 01:36:55 r1epi kernel: R13: ffff880172d49da0 R14: ffffffff810d1cb5 R15: ffff8803fd1c19f8
May 31 01:36:55 r1epi kernel: FS:  0000000000000000(0000) GS:ffff88042fa40000(0000) knlGS:0000000000000000
May 31 01:36:55 r1epi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 01:36:55 r1epi kernel: CR2: 00007fce0a1d2000 CR3: 0000000001c0c000 CR4: 00000000000407e0
May 31 01:36:55 r1epi kernel: Stack:
May 31 01:36:55 r1epi kernel: ffff880172d49da0 ffffffffffffff8c 000000000005d7a7 ffff8803fd1c1b08
May 31 01:36:55 r1epi kernel: ffffffffa08d6375 0000000000aa6445 ffff880417b10000 00000000ff38b5cc
May 31 01:36:55 r1epi kernel: 45ff8803ff38b100 010000000000aa64 0000000000000000 00000000c9f4e379
May 31 01:36:55 r1epi kernel: Call Trace:
May 31 01:36:55 r1epi kernel: [<ffffffffa08d6375>] btrfs_get_dentry+0x115/0x140 [btrfs]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f9a40>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa08d6662>] btrfs_fh_to_dentry+0x32/0x60 [btrfs]
May 31 01:36:55 r1epi kernel: [<ffffffff812cc842>] exportfs_decode_fh+0x72/0x2e0
May 31 01:36:55 r1epi kernel: [<ffffffffa01ffa2b>] ? exp_find+0x10b/0x1c0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff811cc925>] ? kmem_cache_alloc+0x35/0x1f0
May 31 01:36:55 r1epi kernel: [<ffffffff810b3606>] ? prepare_creds+0x26/0x1c0
May 31 01:36:55 r1epi kernel: [<ffffffffa01fa796>] fh_verify+0x316/0x600 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01fbb50>] nfsd_open+0x40/0x1d0 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff810a4f69>] ? try_to_grab_pending+0xa9/0x150
May 31 01:36:55 r1epi kernel: [<ffffffffa01fe59b>] nfsd_write+0xbb/0x110 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa0204af0>] nfsd3_proc_write+0xc0/0x160 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f6dbb>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01b1d00>] svc_process_common+0x480/0x6f0 [sunrpc]
May 31 01:36:55 r1epi kernel: [<ffffffffa01b2077>] svc_process+0x107/0x170 [sunrpc]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f674f>] nfsd+0xbf/0x130 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffffa01f6690>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 31 01:36:55 r1epi kernel: [<ffffffff810ae211>] kthread+0xe1/0x100
May 31 01:36:55 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:55 r1epi kernel: [<ffffffff816fef7c>] ret_from_fork+0x7c/0xb0
May 31 01:36:55 r1epi kernel: [<ffffffff810ae130>] ? insert_kthread_work+0x40/0x40
May 31 01:36:55 r1epi kernel: Code: 00 66 41 83 45 58 01 66 83 83 88 00 00 00 01 48 89 de 4c 89 ef e8 ca 81 0e 00 4c 89 e8 e9 a1 fe ff ff f3 90 48 8b 88 b0 00 00 00 <80> e1 01 75 f2 e9 75 ff ff ff 48 89 f8 e9 86 fe ff ff 0f 0b 0f 

Comment 9 J. Bruce Fields 2014-06-03 17:24:19 UTC
There were also reports of soft lockups in shrink_dentry_list on lkml recently: https://lkml.org/lkml/2014/5/26/125

Apparently fixed in 3.15-rc7, but I'm unclear when the problem was introduced--possibly too recently to explain your issue.

Comment 10 g. artim 2014-07-16 15:10:34 UTC
just another fyi...these are the kernels installed, vanilla Fedora fc20:

-rwxr-xr-x  1 root root  5329128 Apr  4 05:17 vmlinuz-3.13.9-200.fc20.x86_64
-rwxr-xr-x  1 root root  5514584 Apr 28 07:47 vmlinuz-3.14.2-200.fc20.x86_64
-rwxr-xr-x  1 root root  5514392 May 13 06:56 vmlinuz-3.14.4-200.fc20.x86_64

this seems to be the only stable one -- no "BUG: soft lockup - CPU#0 stuck for..":

Linux r1epi 3.13.9-200.fc20.x86_64 #1 SMP Fri Apr 4 12:13:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Has not crashed for a few weeks. 

I could try 3.15-rc7 or just stay on 3.13.9-200 for a bit; please advise. Thanks!

Comment 11 J. Bruce Fields 2014-07-16 20:33:09 UTC
Actually, taking a quick look at the logs, I think the fixes for the soft lockup mentioned above are in -rc8, not -rc7 (b2b80195d882 "dealing with the rest of shrink_dentry_list() livelock").

Anyway, it looks like the latest Fedora 20 kernel is 3.15.4-200; could you just try that? It should have a fix for the known shrink_dentry_list soft lockup.
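Whether a given kernel tag already carries a fix like b2b80195d882 can be checked against a local clone of the kernel git tree. A sketch; the repo path in the usage comment is a placeholder:

```shell
# contains_fix <repo> <commit> <tag>: report whether <tag> already
# includes <commit>, using git's ancestry test.
contains_fix() {
  if git -C "$1" merge-base --is-ancestor "$2" "$3"; then
    echo "$3 contains $2"
  else
    echo "$3 does NOT contain $2"
  fi
}

# e.g. contains_fix ~/src/linux b2b80195d882 v3.15.4
```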

Comment 12 g. artim 2014-08-13 17:28:15 UTC
3.13.9-200.fc20.x86_64

Note: I got a list_del corruption this week on the above kernel (FYI). I thought it would be more stable. I have now moved to

3.15.8-200.fc20.x86_64 

hoping for no more crashing at 9pm! ha, thanks, Gary

Aug 11 19:44:00 r1epi kernel: [1083825.700024] ------------[ cut here ]------------
Aug 11 19:44:00 r1epi kernel: [1083825.700031] WARNING: CPU: 1 PID: 18206 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Aug 11 19:44:00 r1epi kernel: [1083825.700033] list_del corruption, ffff88027735e160->next is LIST_POISON1 (dead000000100100)
Aug 11 19:44:00 r1epi kernel: [1083825.700034] Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_
ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek btrfs raid6_pq libcrc32c eeepc_wmi asus_wmi sparse
_keymap xor mxm_wmi x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel btusb bluetooth lpc_ich microcode serio_raw mfd_core rfkill s
nd_hda_intel e1000 i2c_i801 snd_hda_codec snd_hwdep snd_pcm e1000e snd_page_alloc snd_timer mei_me snd ptp mei soundcore pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc
 i915 i2c_algo_bit drm_kms_helper drm firewire_ohci firewire_core crc_itu_t aacraid i2c_core video
Aug 11 19:44:00 r1epi kernel: [1083825.700068] CPU: 1 PID: 18206 Comm: nfsd Not tainted 3.13.9-200.fc20.x86_64 #1
Aug 11 19:44:00 r1epi kernel: [1083825.700070] Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
Aug 11 19:44:00 r1epi kernel: [1083825.700072]  0000000000000009 ffff88001ab3bce8 ffffffff81687dac ffff88001ab3bd30
Aug 11 19:44:00 r1epi kernel: [1083825.700074]  ffff88001ab3bd20 ffffffff8106d4dd ffff88027735e160 ffff8800c3faa000
Aug 11 19:44:00 r1epi kernel: [1083825.700076]  0000000000000002 000000000000006c 000000000000006c ffff88001ab3bd80
Aug 11 19:44:00 r1epi kernel: [1083825.700078] Call Trace:
Aug 11 19:44:00 r1epi kernel: [1083825.700083]  [<ffffffff81687dac>] dump_stack+0x45/0x56
Aug 11 19:44:00 r1epi kernel: [1083825.700086]  [<ffffffff8106d4dd>] warn_slowpath_common+0x7d/0xa0
Aug 11 19:44:00 r1epi kernel: [1083825.700088]  [<ffffffff8106d54c>] warn_slowpath_fmt+0x4c/0x50
Aug 11 19:44:00 r1epi kernel: [1083825.700090]  [<ffffffff8132cd93>] __list_del_entry+0x63/0xd0
Aug 11 19:44:00 r1epi kernel: [1083825.700097]  [<ffffffffa01f8c41>] lru_put_end+0x21/0x60 [nfsd]
Aug 11 19:44:00 r1epi kernel: [1083825.700102]  [<ffffffffa01f95d5>] nfsd_cache_update+0x85/0x150 [nfsd]
Aug 11 19:44:00 r1epi kernel: [1083825.700107]  [<ffffffffa01ede02>] nfsd_dispatch+0x192/0x200 [nfsd]
Aug 11 19:44:00 r1epi kernel: [1083825.700117]  [<ffffffffa01b931d>] svc_process_common+0x46d/0x6d0 [sunrpc]
Aug 11 19:44:00 r1epi kernel: [1083825.700126]  [<ffffffffa01b9687>] svc_process+0x107/0x170 [sunrpc]
Aug 11 19:44:00 r1epi kernel: [1083825.700131]  [<ffffffffa01ed71f>] nfsd+0xbf/0x130 [nfsd]
Aug 11 19:44:00 r1epi kernel: [1083825.700135]  [<ffffffffa01ed660>] ? nfsd_destroy+0x80/0x80 [nfsd]
Aug 11 19:44:00 r1epi kernel: [1083825.700138]  [<ffffffff8108f2f2>] kthread+0xd2/0xf0
Aug 11 19:44:00 r1epi kernel: [1083825.700141]  [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
Aug 11 19:44:00 r1epi kernel: [1083825.700144]  [<ffffffff81696cbc>] ret_from_fork+0x7c/0xb0
Aug 11 19:44:00 r1epi kernel: [1083825.700146]  [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
Aug 11 19:44:00 r1epi kernel: [1083825.700147] ---[ end trace 4ca77c7dc9ca19d2 ]---
Aug 11 19:44:00 r1epi kernel: ------------[ cut here ]------------
Aug 11 19:44:00 r1epi kernel: ------------[ cut here ]------------
Aug 11 19:44:00 r1epi kernel: WARNING: CPU: 1 PID: 18206 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Aug 11 19:44:00 r1epi kernel: list_del corruption, ffff88027735e160->next is LIST_POISON1 (dead000000100100)
Aug 11 19:44:00 r1epi kernel: Modules linked in: binfmt_misc sch_sfq xt_iprange bonding ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek btrfs raid6_pq libcrc32c eeepc_wmi asus_wmi sparse_keymap xor mxm_wmi x86_pkg_temp_thermal coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel btusb bluetooth lpc_ich microcode serio_raw mfd_core rfkill snd_hda_intel e1000 i2c_i801 snd_hda_codec snd_hwdep snd_pcm e1000e snd_page_alloc snd_timer mei_me snd ptp mei soundcore pps_core shpchp wmi nfsd auth_rpcgss nfs_acl lockd sunrpc i915 i2c_algo_bit drm_kms_helper drm firewire_ohci firewire_core crc_itu_t aacraid i2c_core video
Aug 11 19:44:00 r1epi kernel: CPU: 1 PID: 18206 Comm: nfsd Not tainted 3.13.9-200.fc20.x86_64 #1
Aug 11 19:44:00 r1epi kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
Aug 11 19:44:00 r1epi kernel: 0000000000000009 ffff88001ab3bce8 ffffffff81687dac ffff88001ab3bd30
Aug 11 19:44:00 r1epi kernel: ffff88001ab3bd20 ffffffff8106d4dd ffff88027735e160 ffff8800c3faa000
Aug 11 19:44:00 r1epi kernel: 0000000000000002 000000000000006c 000000000000006c ffff88001ab3bd80
Aug 11 19:44:00 r1epi kernel: Call Trace:
Aug 11 19:44:00 r1epi kernel: [<ffffffff81687dac>] dump_stack+0x45/0x56
Aug 11 19:44:00 r1epi kernel: [<ffffffff8106d4dd>] warn_slowpath_common+0x7d/0xa0
Aug 11 19:44:00 r1epi kernel: [<ffffffff8106d54c>] warn_slowpath_fmt+0x4c/0x50
Aug 11 19:44:00 r1epi kernel: [<ffffffff8132cd93>] __list_del_entry+0x63/0xd0
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01f8c41>] lru_put_end+0x21/0x60 [nfsd]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01f95d5>] nfsd_cache_update+0x85/0x150 [nfsd]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01ede02>] nfsd_dispatch+0x192/0x200 [nfsd]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01b931d>] svc_process_common+0x46d/0x6d0 [sunrpc]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01b9687>] svc_process+0x107/0x170 [sunrpc]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01ed71f>] nfsd+0xbf/0x130 [nfsd]
Aug 11 19:44:00 r1epi kernel: [<ffffffffa01ed660>] ? nfsd_destroy+0x80/0x80 [nfsd]
Aug 11 19:44:00 r1epi kernel: [<ffffffff8108f2f2>] kthread+0xd2/0xf0
Aug 11 19:44:00 r1epi kernel: [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
Aug 11 19:44:00 r1epi kernel: [<ffffffff81696cbc>] ret_from_fork+0x7c/0xb0
Aug 11 19:44:00 r1epi kernel: [<ffffffff8108f220>] ? insert_kthread_work+0x40/0x40
Aug 11 19:44:00 r1epi kernel: ---[ end trace 4ca77c7dc9ca19d2 ]---
Aug 11 19:44:04 r1epi salt-minion: [WARNING ] SaltReqTimeoutError: Waited 60 seconds
Aug 11 19:44:04 r1epi salt-minion: [INFO    ] Waiting for minion key to be accepted by the master.

Comment 13 J. Bruce Fields 2014-08-13 18:36:39 UTC
(In reply to g. artim from comment #12)
> 3.13.9-200.fc20.x86_64
> 
> note I got list_del corruption this week on the above kernel (fyi) -- I
> thought it would be more stable, I have now moved to 
> 
> 3.15.8-200.fc20.x86_64 
> 
> hoping for no more crashing at 9pm! ha, thanks, Gary

Thanks, yes that should have the fix; let us know either way.

Comment 14 g. artim 2014-10-19 19:56:35 UTC
Did more testing; this combo creates the soft lockup on CPU#n, n=0..7:
run:
===
- tree command on nfs client on a big (20TB) raid, nfs4 mounted, 1000s of files
- tree on another client, same dir
- scp from a client to home server of 3TB file
- rsync backups of all servers across the net, output to nfs server
- rsync on the server, raid to raid (no nfs required); the source is a 20TB raid on a 5805 Adaptec, the destination the same.
- on server watch `dmesg | tail`
 
With 6 console terminals open on the raid, after about 15 mins it hangs and I get:

self detectable stall on cpu (1)
bug soft lockup cpu#0
..thru cpu#7
stuck for 22s! [NFSD: nnn]

in dmesg, and funny thing, I keep getting them about every 22 seconds.
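For reference, the six-terminal workload described above can be sketched as a dry run. Every path, hostname, and file name below is an illustrative placeholder I've added, not taken from the report; printing the commands instead of executing them keeps the sketch safe to run anywhere.

```shell
# Dry-run sketch of the parallel NFS/rsync stress workload from comment 14.
# Paths and hosts are placeholders, not from the original report.
workload() {
    cat <<'EOF'
tree /mnt/raid/data                  # NFS client 1, nfs4 mount, 1000s of files
tree /mnt/raid/data                  # NFS client 2, same directory
scp big.img homeserver:/tmp/         # ~3 GB copy to the home server
rsync -av backups/ server:/mnt/raid/ # cross-net backups onto the NFS server
rsync -av /mnt/raid1/ /mnt/raid2/    # server-local raid-to-raid copy
watch 'dmesg | tail'                 # monitor the server for lockup messages
EOF
}

workload
```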

OK, frustrated as hell, I tried the following to see if it's ME, something I did (I built the server):

I flashed the MB BIOS, retested: lockup
I went from an 850W PSU to a 1300W PSU, retested: lockup
I replaced the memory, ran memtest, retested: lockup
I pulled the MB/CPU/memory from my desktop, installed them, then did a complete reinstall and update of the O/S to fc20, retested: lockup

hw now:
======
 mb asus Z87-PRO
 cpu i7-4771 @ 3.50GHz
 raid cards adaptec 5805

software:
========
 3.16.6-200.fc20.x86_64 #1 SMP Wed Oct 15 13:06:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 btrfs (with lzo compression) on a hw raid 5 and hw raid 0
 
My research group is away at a conference; let me know ASAP if there is something more I could do to help fix this. Also, would I get more stability if I ran Red Hat? I've run Fedora for about 16 years, production and test, and have never had a problem I couldn't get fixed. If running a stable version would help, let me know. Thanks for ANY feedback!! -- gary

Comment 15 g. artim 2014-10-19 20:02:09 UTC
Correction: a 3GB file was scp'ed... oops!

Comment 16 J. Bruce Fields 2014-10-20 21:36:16 UTC
Could you post the soft lockup warnings from the new soft lockups? Or are the backtraces really identical to the ones you were seeing before?

Hard to judge whether you would have seen this on another distro.  I haven't seen this on RHEL but there could just be something unusual about your setup or workload.

(Also, btrfs is unsupported on RHEL (except as a "tech preview"), and probably isn't what would be recommended for production when stability's the priority.)

Comment 17 g. artim 2014-10-22 18:26:33 UTC
Couldn't find the soft lockup errors in the log, and I didn't capture them through the terminal (watch 'dmesg|tail') I had open. Didn't see anything more than in the past.

food for thought:
I rebooted after the lockup and _just_ ran:

rsync -av /my2 /backup   

(both btrfs filesystems) and got a lockup, but not with CPU soft lockup errors -- a backtrace in btrfs instead. I can't locate the messages in the log... maybe they never made it to the log; I need to get the serial port working for a console capture.

At this point I'm compressing the source so I can easily copy from the btrfs to the xfs filesystem, in hopes of ridding myself of this instability (switching between the two to get _only_ xfs filesystems). I have to face the consequence of having used lzo compression... more disk space. I did run for quite some time (a year) with this config without as many lockups (but not zero); something happened to make it much more unstable. It could be software, but none of the raid cards or smartd report errors on the drives. I'm thinking btrfs is not production-ready yet.

Comment 18 J. Bruce Fields 2014-10-22 18:37:57 UTC
I guess we should cc: Josef if we think btrfs might be at fault.

Comment 19 g. artim 2014-10-30 21:05:32 UTC
okay more interesting (voodoo) stuff:

I converted both raids, within the same computer, to xfs. Seemed to run clean, but then I got:

[176363.245588] aacraid: Host adapter abort request (3,0,0,0)
[176363.245600] aacraid: Host adapter abort request (3,0,0,0)
[176363.245612] aacraid: Host adapter abort request (3,0,0,0)
[176363.245624] aacraid: Host adapter abort request (3,0,0,0)
[176363.245690] aacraid: Host adapter reset request. SCSI hang ?
[193002.981921] aacraid: Host adapter abort request (3,0,0,0)
[193002.981938] aacraid: Host adapter abort request (3,0,0,0)
[193002.981952] aacraid: Host adapter abort request (3,0,0,0)
[193002.981966] aacraid: Host adapter abort request (3,0,0,0)
[193002.981979] aacraid: Host adapter abort request (3,0,0,0)
[193002.981991] aacraid: Host adapter abort request (3,0,0,0)
[193003.956322] aacraid: Host adapter reset request. SCSI hang ?

So I did some browsing on these errors and the Adaptec 5805, and someone said not to use the cfq algorithm with these cards. The machine was throwing these errors every so often, and the df and dmesg commands were slow. So for laughs I did

echo noop > /sys/block/sdb/queue/scheduler
echo noop > /sys/block/sdc/queue/scheduler

and the system came back, or started responding as expected. Now, after all I did to try and solve this issue, it ends up being a scheduling issue??? I don't know...
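A minimal sketch of reading and switching the I/O scheduler this way (the device name is an example, and the `active_sched` parsing helper is mine, added for illustration -- it is not from the report):

```shell
# The sysfs file lists all available schedulers with the active one in
# brackets, e.g. "noop deadline [cfq]". active_sched (an illustrative
# helper, not from the report) extracts the bracketed name.
active_sched() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# On a live system (requires root; sdb is an example device):
#   cat /sys/block/sdb/queue/scheduler          # show available/active
#   echo noop > /sys/block/sdb/queue/scheduler  # switch, as in the comment
# Note the change does not survive a reboot; a boot-time kernel parameter
# or a udev rule is the usual way to make it persistent.

echo 'noop deadline [cfq]' | active_sched
```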

Comment 20 Justin M. Forbes 2014-11-13 15:58:41 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 21 Justin M. Forbes 2014-12-10 14:59:00 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 22 J. Bruce Fields 2015-04-24 20:21:06 UTC
Sorry for the automated closing....

(In reply to g. artim from comment #19)
> and the system came back, or started responding as expected. Now, after all
> I did to try and solve this issue, it ends up being a scheduling issue??? I
> don't know...

So since then have you seen any recurrence of the problem?

It would be interesting to know if switching to xfs helped, in which case it's more likely to be something btrfs-specific.

Comment 23 g. artim 2015-04-24 20:53:46 UTC
After switching to xfs the problem seems to have stopped... but because I had 2 raid configs in one system, I turned one off at the same time, so it could have been the PCI bus. I moved the second raid to a separate system. I've recently moved to LSI cards and 24 3TB drives, and it seems stable also -- the new config is xfs. I miss the lzo option of btrfs, but it (btrfs-lzo) cornered me when I tried to switch to xfs and copy all the data. gary

Comment 24 Fedora Kernel Team 2015-04-28 18:31:18 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.19.5-100.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 25 Fedora End Of Life 2015-05-29 11:47:22 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 26 Fedora End Of Life 2015-06-30 01:00:50 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.