Bug 1290140

Summary: deadlock while configuring storage (pvcreate) with looping display (pvs) operations running
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2Assignee: David Teigland <teigland>
lvm2 sub component: Clustering / clvmd QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: unspecified CC: agk, ccaulfie, cluster-maint, heinzm, jbrassow, jfriesse, msnitzer, prajnoha, thornber, zkabelac
Version: 7.2   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-21 21:19:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log from mckinley-01
none
log from mckinley-02
none
log from mckinley-04 none

Description Corey Marthaler 2015-12-09 17:57:49 UTC
Description of problem:
I was attempting to reproduce bug 1290129 when my cluster deadlocked.

On all four nodes in the cluster I was running pvs commands every second.

creating lvm devices...
mckinley-01: pvcreate /dev/mapper/mpatha1 /dev/mapper/mpathf1 /dev/mapper/mpathd1 /dev/mapper/mpathe1
[hang]



Dec  9 17:13:21 mckinley-01 crmd[3544]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origi]
Dec  9 17:13:21 mckinley-01 pengine[3543]:  notice: Calculated Transition 172: /var/lib/pacemaker/pengine/pe-input-44.bz2
Dec  9 17:13:21 mckinley-01 crmd[3544]:  notice: Transition 172 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lie
Dec  9 17:13:21 mckinley-01 crmd[3544]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNA]


Dec  9 17:12:59 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:12:59 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:12:59 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:13:00 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:13:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:14:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:14:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:15:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:15:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:15:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:16:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)
Dec  9 17:17:03 mckinley-02 corosync[3001]: [QB    ] conn (3001-3281-30) Nothing in q but got POLLIN on fd:30 (res2:1)



  PV                         VG               Fmt  Attr PSize   PFree  PV Tags
  /dev/mapper/mpatha1                              ---       0      0         
  /dev/mapper/mpathb1                              ---       0      0         
  /dev/mapper/mpathc1                              ---       0      0         
  /dev/mapper/mpathd1                              ---       0      0         
  /dev/mapper/mpathe1                              ---       0      0         
  /dev/mapper/mpathf1                              ---       0      0         
  /dev/mapper/mpathg1                              ---       0      0         
  /dev/mapper/mpathh1                              ---       0      0         
  /dev/rhel_mckinley-02/home                       ---       0      0         
  /dev/rhel_mckinley-02/root                       ---       0      0         
  /dev/rhel_mckinley-02/swap                       ---       0      0         
  /dev/sda1                                        ---       0      0         
  /dev/sda2                  rhel_mckinley-02 lvm2 a--  278.97g 60.00m        
  /dev/sdb1                  rhel_mckinley-02 lvm2 a--  279.46g     0         
  Error locking on node 4: Command timed out
  Error locking on node 1: Command timed out
  Error locking on node 3: Command timed out
  Error locking on node 4: Command timed out
  Error locking on node 1: Command timed out
  Error locking on node 3: Command timed out
  Can't get lock for #orphans_pool
  Cannot process volume group #orphans_pool
  Error locking on node 4: Command timed out
  Error locking on node 1: Command timed out
  Error locking on node 3: Command timed out
  Can't get lock for #orphans_lvm2
  Cannot process volume group #orphans_lvm2
  PV                         VG               Fmt  Attr PSize   PFree  PV Tags
  /dev/mapper/mpatha1                              ---       0      0         
  /dev/mapper/mpathb1                              ---       0      0         
  /dev/mapper/mpathc1                              ---       0      0         
  /dev/mapper/mpathd1                              ---       0      0         
  /dev/mapper/mpathe1                              ---       0      0         
  /dev/mapper/mpathf1                              ---       0      0         
  /dev/mapper/mpathg1                              ---       0      0         
  /dev/mapper/mpathh1                              ---       0      0         
  /dev/rhel_mckinley-02/home                       ---       0      0         
  /dev/rhel_mckinley-02/root                       ---       0      0         
  /dev/rhel_mckinley-02/swap                       ---       0      0         
  /dev/sda1                                        ---       0      0         
  /dev/sda2                  rhel_mckinley-02 lvm2 a--  278.97g 60.00m        
  /dev/sdb1                  rhel_mckinley-02 lvm2 a--  279.46g     0         
  Error locking on node 4: Command timed out
  Error locking on node 1: Command timed out
  Error locking on node 3: Command timed out
  Error locking on node 4: Command timed out
  Error locking on node 1: Command timed out
  Error locking on node 3: Command timed out
  Unable to obtain global lock.



## MCKINLEY-01
Dec  9 17:29:43 mckinley-01 kernel: pvcreate        S ffff880fe5fc8ea0     0 48243  48242 0x00000080
Dec  9 17:29:43 mckinley-01 kernel: ffff880036797ac0 0000000000000082 ffff880fdd21f300 ffff880036797fd8
Dec  9 17:29:43 mckinley-01 kernel: ffff880036797fd8 ffff880036797fd8 ffff880fdd21f300 7fffffffffffffff
Dec  9 17:29:43 mckinley-01 kernel: ffff880fdd21f300 0000000000000000 ffff880fe5fc8b40 ffff880fe5fc8ea0
Dec  9 17:29:43 mckinley-01 kernel: Call Trace:
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8163a8e9>] schedule+0x29/0x70
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff816385d9>] schedule_timeout+0x209/0x2d0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b5d85>] ? check_preempt_curr+0x75/0xa0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b5dc9>] ? ttwu_do_wakeup+0x19/0xd0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811c3bb8>] ? __kmalloc_node_track_caller+0x58/0x270
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810a6806>] ? prepare_to_wait+0x56/0x90
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815d274f>] unix_stream_read_generic+0x30f/0x8e0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810c34b7>] ? enqueue_entity+0x237/0x890
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff812888a5>] ? sock_has_perm+0x75/0x90
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815d2e14>] unix_stream_recvmsg+0x54/0x70
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815cff30>] ? unix_state_double_unlock+0x50/0x50
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8150ffb6>] sock_aio_read.part.7+0x146/0x160
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811ddccd>] do_sync_read+0x8d/0xd0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811de4d5>] vfs_read+0x145/0x170
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811def7f>] SyS_read+0x7f/0xe0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b


Dec  9 17:29:43 mckinley-01 kernel: pvs             S ffff881fe5f49da0     0 48140   3447 0x00000080
Dec  9 17:29:43 mckinley-01 kernel: ffff881fd09b3ac0 0000000000000082 ffff880fe234a280 ffff881fd09b3fd8
Dec  9 17:29:43 mckinley-01 kernel: ffff881fd09b3fd8 ffff881fd09b3fd8 ffff880fe234a280 7fffffffffffffff
Dec  9 17:29:43 mckinley-01 kernel: ffff880fe234a280 0000000000000000 ffff881fe5f49a40 ffff881fe5f49da0
Dec  9 17:29:43 mckinley-01 kernel: Call Trace:
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8163a8e9>] schedule+0x29/0x70
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff816385d9>] schedule_timeout+0x209/0x2d0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b5d85>] ? check_preempt_curr+0x75/0xa0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810a6806>] ? prepare_to_wait+0x56/0x90
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815d274f>] unix_stream_read_generic+0x30f/0x8e0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b0ddf>] ? __wake_up_sync_key+0x4f/0x60
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff812888a5>] ? sock_has_perm+0x75/0x90
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815d2e14>] unix_stream_recvmsg+0x54/0x70
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff815cff30>] ? unix_state_double_unlock+0x50/0x50
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8150ffb6>] sock_aio_read.part.7+0x146/0x160
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff810b8c22>] ? default_wake_function+0x12/0x20
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811ddccd>] do_sync_read+0x8d/0xd0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811de4d5>] vfs_read+0x145/0x170
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff811def7f>] SyS_read+0x7f/0xe0
Dec  9 17:29:43 mckinley-01 kernel: [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b


## MCKINLEY-02
Dec  9 17:30:05 mckinley-02 kernel: corosync        S ffffc9001ca65dc0     0  3002      1 0x00000080
Dec  9 17:30:05 mckinley-02 kernel: ffff881fe752fcd0 0000000000000086 ffff881fe5989700 ffff881fe752ffd8
Dec  9 17:30:05 mckinley-02 kernel: ffff881fe752ffd8 ffff881fe752ffd8 ffff881fe5989700 ffff881fe5989700
Dec  9 17:30:05 mckinley-02 kernel: ffff881fe5989700 0000000000000000 ffff881fe752fde0 ffffc9001ca65dc0
Dec  9 17:30:05 mckinley-02 kernel: Call Trace:
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff8163a8e9>] schedule+0x29/0x70
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810e2464>] futex_wait_queue_me+0xc4/0x120
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810e2fd9>] futex_wait+0x179/0x280
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810e506e>] do_futex+0xfe/0x5b0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810e55a0>] SyS_futex+0x80/0x180
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
Dec  9 17:30:05 mckinley-02 kernel: pacemakerd      S ffff880fe61bbbc4     0  3020      1 0x00000080
Dec  9 17:30:05 mckinley-02 kernel: ffff880fe61bba30 0000000000000082 ffff881fe4684500 ffff880fe61bbfd8
Dec  9 17:30:05 mckinley-02 kernel: ffff880fe61bbfd8 ffff880fe61bbfd8 ffff881fe4684500 ffff880fe61bbb88
Dec  9 17:30:05 mckinley-02 kernel: 000000000007a11f 0000000000000000 0000000000000000 ffff880fe61bbbc4
Dec  9 17:30:05 mckinley-02 kernel: Call Trace:
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff8163a8e9>] schedule+0x29/0x70
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff81639a02>] schedule_hrtimeout_range_clock+0xb2/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810a95c0>] ? hrtimer_get_res+0x50/0x50
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff816399f6>] ? schedule_hrtimeout_range_clock+0xa6/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff81639ab3>] schedule_hrtimeout_range+0x13/0x20
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ba5>] poll_schedule_timeout+0x55/0xb0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f412d>] do_sys_poll+0x4cd/0x580
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff81213b38>] ? generic_write_end+0x78/0xb0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffffa03a47b9>] ? xfs_vm_write_end+0x29/0x80 [xfs]
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff81169e06>] ? generic_file_buffered_write+0x1b6/0x290
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ac0>] ? poll_select_copy_remaining+0x150/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ac0>] ? poll_select_copy_remaining+0x150/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ac0>] ? poll_select_copy_remaining+0x150/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ac0>] ? poll_select_copy_remaining+0x150/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f2ac0>] ? poll_select_copy_remaining+0x150/0x150
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811ddd9d>] ? do_sync_write+0x8d/0xd0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff8128bc6e>] ? file_has_perm+0xae/0xc0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff8122bb27>] ? eventfd_ctx_read+0x67/0x210
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff811f42e4>] SyS_poll+0x74/0x110
Dec  9 17:30:05 mckinley-02 kernel: [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b



Version-Release number of selected component (if applicable):
3.10.0-325.el7.x86_64
corosync-2.3.4-7.el7.x86_64   BUILT: Mon 22 Jun 2015 09:32:50 AM CDT
pacemaker-1.1.13-10.el7.x86_64   BUILT: Thu 08 Oct 2015 06:06:55 PM CDT

lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015

Comment 1 Corey Marthaler 2015-12-09 21:25:12 UTC
Created attachment 1104138 [details]
log from mckinley-01

Comment 2 Corey Marthaler 2015-12-09 21:25:49 UTC
Created attachment 1104139 [details]
log from mckinley-02

Comment 3 Corey Marthaler 2015-12-09 21:26:20 UTC
Created attachment 1104140 [details]
log from mckinley-04

Comment 5 Jonathan Earl Brassow 2016-06-30 16:33:49 UTC
Possible duplicate of Bug 1194667 ?

Comment 6 Jonathan Earl Brassow 2017-07-27 20:37:01 UTC
(In reply to Jonathan Earl Brassow from comment #5)
> Possible duplicate of Bug 1194667 ?

and possibly bug 1263356 ?