Bug 328241 - clvm deadlock in locking/cluster_locking.c:413
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: lvm2-cluster
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2007-10-11 15:38 EDT by Corey Marthaler
Modified: 2010-01-11 23:06 EST

Doc Type: Bug Fix
Last Closed: 2009-08-26 03:27:28 EDT


Attachments
Here's the sysrq-T from the link-08 where the cmd was hung. (99.66 KB, text/plain)
2007-10-12 11:08 EDT, Corey Marthaler
Description Corey Marthaler 2007-10-11 15:38:16 EDT
Description of problem:
Not sure whether this is related to the regression in bz 318851. After
installing the latest dlm-kernel, which fixed 318851, I tested the latest
cmirror bug fixes and ended up with clvmd deadlocked just after cmirror leg
recovery took place and appeared to have finished properly.


================================================================================
Iteration 0.4 started at Thu Oct 11 09:26:46 CDT 2007
================================================================================
Senario: Kill secondary leg of synced 2 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:      syncd_secondary_2legs
* sync:      1
* mirrors:   1
* disklog:   1
* failpv:    /dev/sdb1
* legs:      2
* pvs:       /dev/sda1 /dev/sdb1 /dev/sdg1
************************************************

Creating mirror(s) on link-08...
qarsh root@link-08 lvcreate -m 1 -n syncd_secondary_2legs_1 -L 800M helter_skelter /dev/sda1:0-1000 /dev/sdb1:0-1000 /dev/sdg1:0-150

Waiting for mirror(s) to sync
Verifying fully syncd mirror(s), currently at
 ...5.50% ...15.00% ...24.50% ...34.50% ...44.00% ...54.50% ...64.50% ...74.50% ...84.50% ...94.50% ...100.00%

Creating gfs on top of mirror(s) on link-08...
Mounting mirrored gfs filesystems on link-02...
Mounting mirrored gfs filesystems on link-07...
Mounting mirrored gfs filesystems on link-08...

Writing verification files (checkit) to mirror(s) on...
        link-02:
checkit starting with:
CREATE
Num files:          100
Random Seed:        8311
Verify XIOR Stream: /tmp/checkit_syncd_secondary_2legs_1
Working dir:        /mnt/syncd_secondary_2legs_1/checkit
        link-07:
checkit starting with:
CREATE
Num files:          100
Random Seed:        20206
Verify XIOR Stream: /tmp/checkit_syncd_secondary_2legs_1
Working dir:        /mnt/syncd_secondary_2legs_1/checkit
        link-08:
checkit starting with:
CREATE
Num files:          100
Random Seed:        22230
Verify XIOR Stream: /tmp/checkit_syncd_secondary_2legs_1
Working dir:        /mnt/syncd_secondary_2legs_1/checkit

Starting the io load (collie/xdoio) on mirror(s)
Sleeping 15 seconds to get some I/O locks outstanding before the failure

Disabling device sdb on link-02
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion(s) on link-08
10+0 records in
10+0 records out
Verifying the down conversion of the failed mirror(s)
  /dev/sdb1: open failed: No such device or address

[HERE AN LVS CMD HUNG]

Oct 11 11:11:25 link-08 kernel: lvs           S 0000000000000012     0 22317  22316                     (NOTLB)
Oct 11 11:11:25 link-08 kernel: 0000010011139bb8 0000000000000002 0000010038dad7f0 0000000000000000
Oct 11 11:11:25 link-08 kernel:        0000000000000016 ffffffff80132f7e 0000000000000001 0000000001008c60
Oct 11 11:11:25 link-08 kernel:        0000010031cdf7f0 0000000000000b68
Oct 11 11:11:25 link-08 kernel: Call Trace:<ffffffff80132f7e>{try_to_wake_up+876} <ffffffff8030f927>{schedule_timeout+257}
Oct 11 11:11:25 link-08 kernel:        <ffffffff80135edc>{prepare_to_wait+21} <ffffffff8030aa6e>{unix_stream_recvmsg+592}
Oct 11 11:11:25 link-08 kernel:        <ffffffff80135fe0>{autoremove_wake_function+0} <ffffffff80135fe0>{autoremove_wake_function+0}
Oct 11 11:11:25 link-08 kernel:        <ffffffff802ab90e>{sock_aio_read+297} <ffffffff802aba54>{sock_aio_write+306}
Oct 11 11:11:25 link-08 kernel:        <ffffffff8017ab85>{do_sync_read+178} <ffffffff80188f9b>{__user_walk+94}
Oct 11 11:11:25 link-08 kernel:        <ffffffff8018353c>{vfs_stat64+24} <ffffffff8030ee73>{thread_return+0}
Oct 11 11:11:25 link-08 kernel:        <ffffffff8030eecb>{thread_return+88} <ffffffff80135fe0>{autoremove_wake_function+0}
Oct 11 11:11:25 link-08 kernel:        <ffffffff8017ac93>{vfs_read+226} <ffffffff8017aedc>{sys_read+69}
Oct 11 11:11:25 link-08 kernel:        <ffffffff8011026a>{system_call+126}


lvs -vvvvvv
[...]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol00
#ioctl/libdm-iface.c:1572         dm version   OF   [16384]
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62L8NgJLsAPdn2xis4KU1whY1E1FJqScoMV OF   [16384]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol00
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62L8NgJLsAPdn2xis4KU1whY1E1FJqScoMV NF   [16384]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol00
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62L8NgJLsAPdn2xis4KU1whY1E1FJqScoMV NF   [16384]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol01
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62LNwih5UmH5f6NXox5no7dn9QvpYIwLxCo OF   [16384]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol01
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62LNwih5UmH5f6NXox5no7dn9QvpYIwLxCo NF   [16384]
#activate/activate.c:440         Getting device info for VolGroup00-LogVol01
#ioctl/libdm-iface.c:1572         dm info LVM-STM37j70AgnyxGIok8z6Do2lnDqib62LNwih5UmH5f6NXox5no7dn9QvpYIwLxCo NF   [16384]
#locking/cluster_locking.c:413       Locking V_VolGroup00 at 0x6
#locking/cluster_locking.c:413       Locking V_helter_skelter at 0x1
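A note for reading the last two trace lines: the hex value after "at" is a lock flag word. The sketch below decodes it under the assumption that the flags follow the layout in LVM2's lib/locking/locking.h (low three bits are the lock type, bit 0x8 selects LV scope). The constant values are recalled from that header and are not confirmed by this report, so check them against the lvm2-cluster-2.02.27 sources. The helper name decode_lock_flags is made up for illustration.

```python
# Assumed flag layout, modelled on LVM2's lib/locking/locking.h.
# These values are an assumption; verify against the installed sources.
LCK_TYPE_MASK = 0x7   # low three bits: lock type
LCK_SCOPE_MASK = 0x8  # set for LV scope, clear for VG scope

LOCK_TYPES = {
    0x0: "NULL",
    0x1: "READ",
    0x2: "PREAD",
    0x4: "WRITE",
    0x5: "EXCL",
    0x6: "UNLOCK",
}


def decode_lock_flags(flags):
    """Split a lock flag word into (scope, lock type) names."""
    scope = "LV" if flags & LCK_SCOPE_MASK else "VG"
    lock_type = LOCK_TYPES.get(flags & LCK_TYPE_MASK, "UNKNOWN")
    return scope, lock_type


# The two values taken from the lvs -vvvvvv output above.
for name, flags in (("V_VolGroup00", 0x6), ("V_helter_skelter", 0x1)):
    scope, lock_type = decode_lock_flags(flags)
    print(f"{name}: scope={scope} type={lock_type}")
```

If that layout holds, 0x6 is the unlock of VolGroup00 and 0x1 is a VG read-lock request on helter_skelter, so the last message logged before the hang is a lock request rather than an unlock.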


Looks like the downconvert worked fine:
[root@link-08 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
helter_skelter-syncd_secondary_2legs_1  (253, 5)
VolGroup00-LogVol00     (253, 0)
[root@link-08 ~]# dmsetup status
VolGroup00-LogVol01: 0 4063232 linear
helter_skelter-syncd_secondary_2legs_1: 0 1638400 linear
VolGroup00-LogVol00: 0 151781376 linear
[root@link-08 ~]# dmsetup ls --tree
VolGroup00-LogVol01 (253:1)
 └─ (3:2)
helter_skelter-syncd_secondary_2legs_1 (253:5)
 └─ (8:1)
VolGroup00-LogVol00 (253:0)
 └─ (3:2)


Oct 11 09:40:27 link-08 kernel: dm-cmirror: Recovery halted due to error on GGVE3AkS
Oct 11 09:40:27 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:27 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:28 link-08 kernel: dm-cmirror: Recovery halted due to error on GGVE3AkS
Oct 11 09:40:28 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:29 link-08 last message repeated 221 times
Oct 11 09:40:29 link-08 kernel: dm-cmirror: LOG INFO:
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   uuid: LVM-irCTD2KHMrA95wunOHBHGaAmIgHdVvcDTd7OSlyrm42PnyoT9Dl81peVGGVE3AkS
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   log type    : disk
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   region_count: 1600
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   sync_count  : 1600
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   sync_search : 1600
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   in_sync     : YES
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   suspended   : NO
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   recovery_halted : YES
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   server_id   : 1
Oct 11 09:40:29 link-08 kernel: dm-cmirror:   server_valid: YES
Oct 11 09:40:29 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:29 link-08 last message repeated 43 times
Oct 11 09:40:29 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1072/GGVE3AkS
Oct 11 09:40:29 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:29 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:29 link-08 last message repeated 2 times
Oct 11 09:40:29 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1072/GGVE3AkS
Oct 11 09:40:29 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:29 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1072/GGVE3AkS
Oct 11 09:40:29 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:29 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1072/GGVE3AkS
Oct 11 09:40:29 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
[...]
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1211/GGVE3AkS
Oct 11 09:40:34 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1211/GGVE3AkS
Oct 11 09:40:34 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 1211/GGVE3AkS
Oct 11 09:40:34 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror: cluster_presuspend: recovery halted on GGVE3AkS(1)
Oct 11 09:40:34 link-08 kernel: scsi3 (0:2): rejecting I/O to offline device
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Notifying server(1) of sync change: 855/GGVE3AkS
Oct 11 09:40:34 link-08 kernel: dm-cmirror: server_complete_resync_work - Setting recovery_halted = 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror: cluster_postsuspend
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Server for GGVE3AkS still busy, waiting for others
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Telling everyone I'm suspending (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_MASTER_LEAVING(13): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 0
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 0
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_MASTER_LEAVING(13): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 0
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 3
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_ELECTION(10): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 57005
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 0
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_ELECTION(10): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 57005
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 3
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_SELECTION(11): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 57005
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 3
Oct 11 09:40:34 link-08 kernel: dm-cmirror: LRT_MASTER_ASSIGN(12): (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   starter     : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 57005
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   node_count  : 1
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Suspending now (GGVE3AkS)
Oct 11 09:40:34 link-08 kernel: dm-cmirror: Removing GGVE3AkS (1)
Oct 11 09:40:34 link-08 kernel: dm-cmirror: 0 region user structures freed
Oct 11 09:40:36 link-08 qarshd[22231]: Nothing to do
Oct 11 09:40:36 link-08 qarshd[22261]: Nothing to do
Oct 11 09:40:37 link-08 kernel: dm-cmirror: stop_server called
Oct 11 09:40:39 link-08 qarshd[22231]: Nothing to do
Oct 11 09:40:39 link-08 qarshd[22261]: Nothing to do
Oct 11 09:40:42 link-08 kernel: dm-cmirror: Closing socket on server side
Oct 11 09:40:42 link-08 qarshd[22231]: Nothing to do
Oct 11 09:40:42 link-08 qarshd[22261]: That's enough
Oct 11 09:40:45 link-08 qarshd[22231]: Nothing to do
Oct 11 09:40:48 link-08 qarshd[22231]: Nothing to do
Oct 11 09:40:48 link-08 qarshd[22316]: Talking to peer 10.15.80.47:49792
Oct 11 09:40:48 link-08 qarshd[22316]: Running cmdline: lvs
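
One detail in the election messages above: the co-ordinator value 57005 is 0xDEAD in hexadecimal. That looks like a placeholder rather than a real node ID, but this is only a guess from the number itself; nothing in this report confirms how dm-cmirror uses it. A minimal sketch that pulls the co-ordinator values out of log text and prints them in hex (the function name coordinator_ids and the SAMPLE string are illustrative, with the two lines copied from the log above):

```python
import re

# Two lines copied from the syslog excerpt above.
SAMPLE = """\
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 0
Oct 11 09:40:34 link-08 kernel: dm-cmirror:   co-ordinator: 57005
"""


def coordinator_ids(log_text):
    """Return every co-ordinator value found in the log text, in order."""
    return [int(value) for value in re.findall(r"co-ordinator:\s*(\d+)", log_text)]


for node_id in coordinator_ids(SAMPLE):
    print(node_id, hex(node_id))
```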


Version-Release number of selected component (if applicable):
[root@link-08 ~]# uname -ar
Linux link-08 2.6.9-60.ELsmp #1 SMP Tue Sep 25 22:55:08 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.27-2.el4
[root@link-08 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-38.1
[root@link-08 ~]# rpm -q dlm-kernel
dlm-kernel-2.6.9-50.1
Comment 1 Christine Caulfield 2007-10-12 03:32:09 EDT
I suspect it's unrelated to that patch, which only affected the unlock path.
This hang seems to be in the lock path.

Any chance of a clvmd log, a dlm lock dump, and a sysrq-T?
Comment 2 Corey Marthaler 2007-10-12 11:08:41 EDT
Created attachment 225631
Here's the sysrq-T from the link-08 where the cmd was hung.
Comment 5 Christine Caulfield 2009-08-26 03:27:28 EDT
This bug has been in NEEDINFO for ages now so I'll close it.

Feel free to re-open it if you see the problem again.
