Bug 245799 - cmirror/clvmd deadlock during simultaneous cmirror operations
Status: CLOSED DEFERRED
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2007-06-26 15:11 EDT by Corey Marthaler
Modified: 2013-09-23 11:32 EDT (History)

Doc Type: Bug Fix
Last Closed: 2013-09-23 11:32:02 EDT

Attachments
lvm backtraces (4.71 KB, text/plain), 2007-06-26 15:14 EDT, Corey Marthaler
lvm backtraces (4.44 KB, text/plain), 2007-06-26 15:15 EDT, Corey Marthaler
lvm backtraces (5.06 KB, text/plain), 2007-06-26 15:16 EDT, Corey Marthaler
Description Corey Marthaler 2007-06-26 15:11:57 EDT
Description of problem:
I was doing looping cmirror creates and deletes on each node in a 3-node
cluster, in order to verify bz 217895, and after a few iterations clvmd hung.

I'll attach the backtraces from the 3 nodes; here are the messages:


[root@link-02 tmp]# cat messages.txt
Jun 26 13:57:18 link-02 kernel: dm-cmirror: LOG INFO:
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   uuid:
LVM-0TTcBDxtFhUgecwRivnHcce7rYD55adOyEpgsNT3ucbtIJEFk5pY4lfrwRsNvELD
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Jun 26 13:57:18 link-02 kernel: dm-cmirror:  ?region_count: 4096
Jun 26 13:57:18 link-02 kernel: dm-cmirror:  ?sync_count  : 0
Jun 26 13:57:18 link-02 kernel: dm-cmirror:  ?sync_search : 0
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   in_sync     : YES
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   suspended   : NO
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   server_id   : 3
Jun 26 13:57:18 link-02 kernel: dm-cmirror:   server_valid: YES
Jun 26 13:58:48 link-02 kernel: dm-cmirror: LOG INFO:
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   uuid:
LVM-0TTcBDxtFhUgecwRivnHcce7rYD55adOyEpgsNT3ucbtIJEFk5pY4lfrwRsNvELD
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   uuid_ref    : 1
Jun 26 13:58:48 link-02 kernel: dm-cmirror:  ?region_count: 4096
Jun 26 13:58:48 link-02 kernel: dm-cmirror:  ?sync_count  : 0
Jun 26 13:58:48 link-02 kernel: dm-cmirror:  ?sync_search : 0
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   in_sync     : YES
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   suspended   : NO
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   server_id   : 3
Jun 26 13:58:48 link-02 kernel: dm-cmirror:   server_valid: YES


[root@link-04 tmp]# cat messages.txt
Jun 26 13:59:45 link-04 kernel: dm-cmirror: LOG INFO:
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   uuid:
LVM-0TTcBDxtFhUgecwRivnHcce7rYD55adOyEpgsNT3ucbtIJEFk5pY4lfrwRsNvELD
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   uuid_ref    : 1
Jun 26 13:59:45 link-04 kernel: dm-cmirror:  ?region_count: 4096
Jun 26 13:59:45 link-04 kernel: dm-cmirror:  ?sync_count  : 0
Jun 26 13:59:45 link-04 kernel: dm-cmirror:  ?sync_search : 0
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   in_sync     : YES
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   suspended   : NO
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   server_id   : 3
Jun 26 13:59:45 link-04 kernel: dm-cmirror:   server_valid: YES



[root@link-08 tmp]# cat messages.txt
Jun 26 14:03:24 link-08 kernel: dm-cmirror: LOG INFO:
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   uuid:
LVM-0TTcBDxtFhUgecwRivnHcce7rYD55adOyEpgsNT3ucbtIJEFk5pY4lfrwRsNvELD
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Jun 26 14:03:24 link-08 kernel: dm-cmirror:  ?region_count: 4096
Jun 26 14:03:24 link-08 kernel: dm-cmirror:  ?sync_count  : 4096
Jun 26 14:03:24 link-08 kernel: dm-cmirror:  ?sync_search : 4096
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   in_sync     : YES
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   suspended   : NO
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   server_id   : 3
Jun 26 14:03:24 link-08 kernel: dm-cmirror:   server_valid: YES
Jun 26 14:04:54 link-08 kernel: dm-cmirror: LOG INFO:
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   uuid:
LVM-0TTcBDxtFhUgecwRivnHcce7rYD55adOyEpgsNT3ucbtIJEFk5pY4lfrwRsNvELD
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   uuid_ref    : 1
Jun 26 14:04:54 link-08 kernel: dm-cmirror:  ?region_count: 4096
Jun 26 14:04:54 link-08 kernel: dm-cmirror:  ?sync_count  : 4096
Jun 26 14:04:54 link-08 kernel: dm-cmirror:  ?sync_search : 4096
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   in_sync     : YES
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   suspended   : NO
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   server_id   : 3
Jun 26 14:04:54 link-08 kernel: dm-cmirror:   server_valid: YES



Version-Release number of selected component (if applicable):
2.6.9-55.8.ELsmp
cmirror-kernel-2.6.9-32.0
lvm2-cluster-2.02.21-7.el4
Comment 1 Corey Marthaler 2007-06-26 15:14:48 EDT
Created attachment 157941 [details]
lvm backtraces
Comment 2 Corey Marthaler 2007-06-26 15:15:47 EDT
Created attachment 157942 [details]
lvm backtraces
Comment 3 Corey Marthaler 2007-06-26 15:16:11 EDT
Created attachment 157943 [details]
lvm backtraces
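For reference, per-process backtraces like the ones attached above can be gathered roughly as sketched below. The process names and the availability of gdb on the nodes are assumptions on my part, not taken from this report; adjust to whatever commands are actually hanging.

```shell
# Hedged sketch: briefly attach gdb to any stuck LVM commands and dump
# their stacks. With no matching processes the loop is simply a no-op.
for pid in $(pidof lvs vgchange lvcreate lvremove 2>/dev/null); do
    echo "=== backtrace for pid $pid ==="
    gdb --batch -p "$pid" -ex bt 2>/dev/null
done
```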
Comment 4 Corey Marthaler 2007-06-26 17:13:42 EDT
This was fairly easy to reproduce... I smell a regression.
Comment 5 Corey Marthaler 2007-06-26 17:21:25 EDT
Just a note that a write to the mirror with dd during the clvmd deadlock did
succeed.
Comment 6 Nate Straz 2007-06-28 12:53:16 EDT
I think I hit this bug while running activator on a cluster using UP kernels.
The first time I hit it on the 6th iteration; the second time, on the 20th.

Backtrace for vgchange-anactivator4 (13510):
#1  0x008f55a3 in __read_nocancel () from /lib/tls/libpthread.so.0
#2  0x080a00b0 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags" is not available.) at locking/cluster_locking.c:115
#3  0x080a04a7 in _lock_resource (cmd=0x8e82a88, resource=Variable "resource" is not available.) at locking/cluster_locking.c:410
#4  0x0808a1f5 in _lock_vol (cmd=0x8e82a88, resource=0xbff6d920 "activator4", flags=6) at locking/locking.c:237
        #######################################
        # _lock_vol flags = LCK_VG | LCK_UNLOCK
        #######################################
#5  0x0808a414 in lock_vol (cmd=0x8e82a88, vol=0x8e99e00 "activator4", flags=6) at locking/locking.c:270
#6  0x08066599 in _process_one_vg (cmd=0x8e82a88, vg_name=0x8e99e00 "activator4", vgid=0x0, tags=0xbff6dad0, arg_vgnames=0xbff6dac8, lock_type=33, consistent=1, handle=0x0, ret_max=1, process_single=0x80697e8 <vgchange_single>) at toollib.c:487
#7  0x0806698c in process_each_vg (cmd=0x8e82a88, argc=1, argv=0xbff716cc, lock_type=33, consistent=0, handle=0x0, process_single=0x80697e8 <vgchange_single>) at toollib.c:568
#8  0x0806a8fe in vgchange (cmd=0x8e82a88, argc=-512, argv=0xfffffe00) at vgchange.c:617
#9  0x0805b148 in lvm_run_command (cmd=0x8e82a88, argc=1, argv=0xbff716cc) at lvmcmdline.c:935
#10 0x0805c147 in lvm2_main (argc=3, argv=0xbff716c4, is_static=0) at lvmcmdline.c:1423

Backtrace for lvs (12930):
#1  0x008f55a3 in __read_nocancel () from /lib/tls/libpthread.so.0
#2  0x080a00b0 in _lock_for_cluster (cmd=51 '3', flags=Variable "flags" is not available.) at locking/cluster_locking.c:115
#3  0x080a04a7 in _lock_resource (cmd=0x9b6da40, resource=Variable "resource" is not available.) at locking/cluster_locking.c:410
#4  0x0808a1f5 in _lock_vol (cmd=0x9b6da40, resource=0xbfdff1b0 "activator4", flags=33) at locking/locking.c:237
        #######################################
        # _lock_vol flags = LCK_VG | LCK_HOLD | LCK_READ
        #######################################
#5  0x0808a414 in lock_vol (cmd=0x9b6da40, vol=0x9b86ff8 "activator4", flags=33) at locking/locking.c:270
#6  0x08067404 in process_each_lv (cmd=0x9b6da40, argc=0, argv=0xbfe02fa8, lock_type=33, handle=0x9b873d8, process_single=0x806500e <_lvs_single>) at toollib.c:324
#7  0x080659d5 in _report (cmd=0x9b6da40, argc=0, argv=0xbfe02fa8, report_type=LVS) at reporter.c:329
#8  0x0805b148 in lvm_run_command (cmd=0x9b6da40, argc=0, argv=0xbfe02fa8) at lvmcmdline.c:935
#9  0x0805c147 in lvm2_main (argc=1, argv=0xbfe02fa4, is_static=0) at lvmcmdline.c:1423
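The _lock_vol flag words in the two backtraces (6 for vgchange, 33 for lvs) can be decoded with a minimal sketch like the one below. The bit values follow LVM2's lib/locking/locking.h as I remember it (lock type in the low three bits: LCK_READ=1, LCK_WRITE=4, LCK_EXCL=5, LCK_UNLOCK=6; LCK_HOLD=0x20); treat them as assumptions, not verified against the exact 2.02.21 tree.

```shell
# Decode an LVM2 lock flags word into its type name plus the HOLD bit.
# Bit values are assumptions based on LVM2's locking.h, not verified.
decode_lck() {
    f=$1
    case $((f & 0x07)) in
        0) t=LCK_NULL ;;
        1) t=LCK_READ ;;
        4) t=LCK_WRITE ;;
        5) t=LCK_EXCL ;;
        6) t=LCK_UNLOCK ;;
        *) t=UNKNOWN ;;
    esac
    if [ $((f & 0x20)) -ne 0 ]; then t="$t|LCK_HOLD"; fi
    echo "$t"
}

decode_lck 6    # the vgchange frame: a VG unlock
decode_lck 33   # the lvs frame: a held read lock
```

Read this way, vgchange is blocked trying to drop the VG lock while lvs is blocked acquiring a held read lock on the same VG, with both stuck reading a reply from clvmd.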
Comment 7 Jonathan Earl Brassow 2007-08-01 15:14:53 EDT
I think I've reproduced this using:

#from each node
while true ; do
 lvcreate -m1 -L 500M -n `hostname -s` vg
 lvchange -an vg/`hostname -s`
 lvremove -f vg/`hostname -s`
done

kernel: 2.6.9-55.16.ELsmp
Comment 8 Corey Marthaler 2007-08-24 09:13:53 EDT
I reproduced this bug while running cmirror_lock_stress on the latest code.

2.6.9-56.ELsmp
cmirror-kernel-2.6.9-33.2
lvm2-cluster-2.02.27-1.el4
Comment 9 Corey Marthaler 2007-09-06 09:39:06 EDT
Just a note that with the latest code, I wasn't able to reproduce this deadlock
while running cmirror lock stress tests all night. I'll continue testing, however...

2.6.9-56.ELsmp
lvm2-cluster-2.02.27-2.el4
lvm2-2.02.27-2.el4
cmirror-kernel-2.6.9-34.1
Comment 10 Jonathan Earl Brassow 2007-09-28 11:34:59 EDT
assigned -> modified.
Comment 11 Corey Marthaler 2007-11-08 12:10:46 EST
Marking this verified as it hasn't been seen with any of the latest
cmirror-kernel versions.
Comment 13 Lon Hohberger 2013-09-23 11:32:02 EDT
The Red Hat Cluster Suite product is past end-of-life; closing.
