Bug 1008029 - BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1H:268]
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: GFS
Version: 18
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Robert Peterson
Reported: 2013-09-13 16:25 EDT by Tom M
Modified: 2014-02-05 17:23 EST
Doc Type: Bug Fix
Last Closed: 2014-02-05 17:23:24 EST
Type: Bug

Description Tom M 2013-09-13 16:25:58 EDT
Description of problem:
While running a gfs2 filesystem, syslog reports a CPU soft lockup on node #2 of a 2-node cluster roughly every 30 seconds.  The logs show a trace from gfs2 involving glock_workqueue and glock_work_func.

Message from syslogd@b5-25 at Sep 13 15:50:37 ...
 kernel:[17577.138155] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1H:268]

Version-Release number of selected component (if applicable):

uname -a
Linux b5-25 3.10.10-100.fc18.x86_64 #1 SMP Thu Aug 29 20:13:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Happens every 30 seconds.
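
The interval can be checked against the log itself by pulling the kernel timestamps out of the soft lockup lines. The following is only a minimal Python sketch: the function names and the regular expression are illustrative assumptions written against the message format shown in this report, not part of any existing tool.

```python
import re

# Matches kernel lines such as:
#   kernel: [18988.278028] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1H:268]
# The pattern is an assumption based on the messages quoted in this report.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def parse_soft_lockups(lines):
    """Return (uptime_seconds, cpu, stuck_seconds, task) for each lockup line."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            events.append(
                (float(m["ts"]), int(m["cpu"]), int(m["secs"]), m["task"])
            )
    return events


def intervals(events):
    """Return the gaps, in seconds of kernel uptime, between consecutive events."""
    stamps = [e[0] for e in events]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Feeding it the lines of /var/log/messages (for example `parse_soft_lockups(open("/var/log/messages"))`) and passing the result to `intervals` gives the spacing between reports.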

Steps to Reproduce:
1.  Install and configure a 2-node cluster using pacemaker + corosync.
2.  Start the dlm and clvmd services.
3.  Create volumes from multipathed SAN shared storage
 (pvcreate, vgcreate, lvcreate on /dev/mapper/mpathl).
4.  Create a gfs2 filesystem on the volume:
  mkfs -t gfs2 -p lock_dlm -t testcluster:gfs_mpathl -j 2 /dev/mapper/vgfiles_l-lvfiles_l
5.  Mount the gfs2 filesystem on both nodes and write 1 file.

Actual results:
  A kernel oops and a CPU stuck message are printed to syslog.

Additional info:

From /var/log/messages:

Sep 13 16:13:42 b5-25 sh[718]: abrt-dump-oops: Found oopses: 1
Sep 13 16:13:42 b5-25 sh[718]: abrt-dump-oops: Creating problem directories
Sep 13 16:13:42 b5-25 abrtd: Directory 'oops-2013-09-13-16:13:42-3096-0' creation detected
Sep 13 16:13:42 b5-25 abrt-dump-oops: Reported 1 kernel oopses to Abrt
Sep 13 16:13:42 b5-25 abrtd: Duplicate: core backtrace
Sep 13 16:13:42 b5-25 abrtd: DUP_OF_DIR: /var/tmp/abrt/oops-2013-09-13-15:47:47-2566-0
Sep 13 16:13:42 b5-25 abrtd: Deleting problem directory oops-2013-09-13-16:13:42-3096-0 (dup of oops-2013-09-13-15:47:47-2566-0)
Sep 13 16:14:09 b5-25 kernel: [18988.278028] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1H:268]
Sep 13 16:14:09 b5-25 kernel: [18988.278032] Modules linked in: gfs2 dlm sctp drbd lru_cache libcrc32c bnep bluetooth rfkill iTCO_wdt iTCO_vendor_support acpi_cpufreq mperf coretemp kvm dcdbas serio_raw joydev microcode lpc_ich mfd_core i7core_edac edac_core acpi_power_meter bnx2 uinput mgag200 i2c_algo_bit drm_kms_helper ttm lpfc crc32c_intel drm i2c_core scsi_transport_fc scsi_tgt
Sep 13 16:14:09 b5-25 kernel: [18988.278065] CPU: 0 PID: 268 Comm: kworker/0:1H Not tainted 3.10.10-100.fc18.x86_64 #1
Sep 13 16:14:09 b5-25 kernel: [18988.278067] Hardware name: Dell Inc. PowerEdge R610/0J352H, BIOS 2.2.11 01/11/2011
Sep 13 16:14:09 b5-25 kernel: [18988.278079] Workqueue: glock_workqueue glock_work_func [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278081] task: ffff880323a996e0 ti: ffff88032454a000 task.ti: ffff88032454a000
Sep 13 16:14:09 b5-25 kernel: [18988.278083] RIP: 0010:[<ffffffff8165c172>]  [<ffffffff8165c172>] _raw_spin_lock+0x22/0x30
Sep 13 16:14:09 b5-25 kernel: [18988.278089] RSP: 0018:ffff88032454bb88  EFLAGS: 00000297
Sep 13 16:14:09 b5-25 kernel: [18988.278091] RAX: 00000000000000a4 RBX: ffffffff8113cc57 RCX: ffff8801a5ef6000
Sep 13 16:14:09 b5-25 kernel: [18988.278093] RDX: 00000000000000a5 RSI: 0000000000000000 RDI: ffff8801a6f058a8
Sep 13 16:14:09 b5-25 kernel: [18988.278095] RBP: ffff88032454bb88 R08: ffff8801a6f05808 R09: 0000000000000008
Sep 13 16:14:09 b5-25 kernel: [18988.278097] R10: ffff8801a6f05820 R11: 0000000002000000 R12: ffff8801a50681b0
Sep 13 16:14:09 b5-25 kernel: [18988.278099] R13: ffff88032454bca0 R14: ffffffff8113f2d5 R15: ffff88032454baf8
Sep 13 16:14:09 b5-25 kernel: [18988.278102] FS:  0000000000000000(0000) GS:ffff88032fc00000(0000) knlGS:0000000000000000
Sep 13 16:14:09 b5-25 kernel: [18988.278104] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 13 16:14:09 b5-25 kernel: [18988.278106] CR2: 00007fc6ef2af2c0 CR3: 0000000001c0c000 CR4: 00000000000007f0
Sep 13 16:14:09 b5-25 kernel: [18988.278108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 13 16:14:09 b5-25 kernel: [18988.278110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 13 16:14:09 b5-25 kernel: [18988.278111] Stack:
Sep 13 16:14:09 b5-25 kernel: [18988.278113]  ffff88032454bbf8 ffffffffa03c9074 ffff88032454bc08 0000000000000000
Sep 13 16:14:09 b5-25 kernel: [18988.278117]  ffff880323a996e0 ffffffff81083290 00ff88032454bbb8 ffff8801a50680e8
Sep 13 16:14:09 b5-25 kernel: [18988.278121]  ffff88032454bbe8 ffff8801a5068000 ffff88032454bc08 ffff8801a6f05000
Sep 13 16:14:09 b5-25 kernel: [18988.278124] Call Trace:
Sep 13 16:14:09 b5-25 kernel: [18988.278134]  [<ffffffffa03c9074>] __gfs2_ail_flush+0x44/0x1a0 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278139]  [<ffffffff81083290>] ? wake_up_bit+0x40/0x40
Sep 13 16:14:09 b5-25 kernel: [18988.278149]  [<ffffffffa03c9297>] gfs2_ail_empty_gl+0xc7/0x110 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278158]  [<ffffffffa03c9391>] ? inode_go_sync+0xb1/0x140 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278168]  [<ffffffffa03c9391>] inode_go_sync+0xb1/0x140 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278177]  [<ffffffffa03c7815>] do_xmote+0x135/0x270 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278186]  [<ffffffffa03c7f18>] run_queue+0x138/0x280 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278195]  [<ffffffffa03c85dc>] glock_work_func+0x6c/0x160 [gfs2]
Sep 13 16:14:09 b5-25 kernel: [18988.278200]  [<ffffffff8107b54a>] process_one_work+0x17a/0x400
Sep 13 16:14:09 b5-25 kernel: [18988.278204]  [<ffffffff8107c9ac>] worker_thread+0x11c/0x370
Sep 13 16:14:09 b5-25 kernel: [18988.278208]  [<ffffffff8107c890>] ? manage_workers.isra.21+0x2b0/0x2b0
Sep 13 16:14:09 b5-25 kernel: [18988.278211]  [<ffffffff81082ab0>] kthread+0xc0/0xd0
Sep 13 16:14:09 b5-25 kernel: [18988.278216]  [<ffffffff81010000>] ? perf_trace_xen_mmu_flush_tlb_others+0x50/0x110
Sep 13 16:14:09 b5-25 kernel: [18988.278219]  [<ffffffff810829f0>] ? kthread_create_on_node+0x120/0x120
Sep 13 16:14:09 b5-25 kernel: [18988.278224]  [<ffffffff81664c6c>] ret_from_fork+0x7c/0xb0
Sep 13 16:14:09 b5-25 kernel: [18988.278227]  [<ffffffff810829f0>] ? kthread_create_on_node+0x120/0x120
Sep 13 16:14:09 b5-25 kernel: [18988.278229] Code: 90 90 90 90 90 90 90 90 90 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 <0f> b6 07 38 d0 75 f7 5d c3 0f 1f 44 00 00 66 66 66 66 90 55 48
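
To read the trace above more easily, the frames can be split into those the kernel reports as part of the call chain and those marked with "?", which are addresses found on the stack that the unwinder could not confirm. The following is a minimal Python sketch; the function name, the dictionary keys and the regular expression are illustrative assumptions based on the oops format shown here.

```python
import re

# Matches frames such as:
#   [<ffffffffa03c9074>] __gfs2_ail_flush+0x44/0x1a0 [gfs2]
#   [<ffffffff81083290>] ? wake_up_bit+0x40/0x40
# The pattern is an assumption based on the trace quoted in this report.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def parse_call_trace(lines):
    """Collect frames between the 'Call Trace:' line and the 'Code:' line."""
    frames = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace and "Code:" in line:
            break
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                {
                    "symbol": m["sym"],
                    "module": m["mod"],  # None for core kernel symbols
                    "reliable": m["unreliable"] is None,
                }
            )
    return frames
```

Applied to this report and filtered on `reliable`, the gfs2 frames read, innermost first: __gfs2_ail_flush, gfs2_ail_empty_gl, inode_go_sync, do_xmote, run_queue, glock_work_func. The RIP line shows the CPU spinning in _raw_spin_lock at the time of the report.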
Comment 1 Fedora End Of Life 2013-12-21 09:35:06 EST
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 2 Fedora End Of Life 2014-02-05 17:23:24 EST
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
