Bug 625609 - simultaneous cmirror operations fail due to locking issues
Summary: simultaneous cmirror operations fail due to locking issues
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jonathan Earl Brassow
QA Contact: Corey Marthaler
URL:
Whiteboard:
Keywords: Regression, TestBlocker
Depends On:
Blocks: 653628 682649
Reported: 2010-08-19 22:30 UTC by Corey Marthaler
Modified: 2011-03-07 06:03 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Attempting to run multiple LVM commands in quick succession might cause a backlog of these commands. Consequently, some of the operations requested might time-out, and subsequently, fail.
Story Points: ---
Clone Of:
: 653628 682649
Environment:
Last Closed: 2010-11-22 23:14:56 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rlerch: needinfo+



Description Corey Marthaler 2010-08-19 22:30:47 UTC
Description of problem:
I've reproduced this problem on two clusters.

[cmarthal@silver bin]$ ./cmirror_lock_stress -l /home/msp/cmarthal/work/rhel6/sts-root -r /usr/tests/sts-rhel6.0 -R ../../var/share/resource_files/grant.xml

[...]

creating lvm devices...
Create 7 PV(s) for lock_stress on grant-01
Create VG lock_stress on grant-01
Creating herd file /tmp/cmirror_lock_stress.201008191629/lock_stress.h2 containing lock_ops cmds for collie
Starting lock operations on all cluster nodes
lock operations either failed or timed out, check in /tmp/cmirror_lock_stress.201008191629


Individual cmd output:
Creating a 4 redundant legged cmirror named grant-03.5104
  Logical volume "grant-03.5104" created

Down converting cmirror from 4 legs to 1 on grant-03
  Error locking on node grant-03: Command timed out
  Problem reactivating grant-03.5104
couldn't down convert cmirror on grant-03

Aug 19 16:33:30 grant-03 qarshd[2632]: Running cmdline: lvconvert -m 1 lock_stress/grant-03.5104
Aug 19 16:33:33 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--02.5101 for events.
Aug 19 16:33:45 grant-03 lvm[2145]: Monitoring mirror device lock_stress-grant--02.5101 for events.
Aug 19 16:33:55 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--02.5101 for events.
Aug 19 16:34:12 grant-03 lvm[2145]: Monitoring mirror device lock_stress-grant--02.5101 for events.
Aug 19 16:34:24 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--01.5105 for events.
Aug 19 16:34:39 grant-03 lvm[2145]: Monitoring mirror device lock_stress-grant--01.5105 for events.
Aug 19 16:35:01 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--01.5105 for events.
Aug 19 16:35:08 grant-03 lvm[2145]: Monitoring mirror device lock_stress-grant--01.5105 for events.
Aug 19 16:35:18 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--03.5104 for events.
Aug 19 16:36:43 grant-03 lvm[2145]: Monitoring mirror device lock_stress-grant--03.5104 for events.
Aug 19 16:36:43 grant-03 xinetd[1664]: EXIT: qarsh status=0 pid=2632 duration=193(sec)
Aug 19 16:36:48 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--02.5101 for events.
Aug 19 16:37:20 grant-03 lvm[2145]: No longer monitoring mirror device lock_stress-grant--01.5105 for events.
Aug 19 16:45:21 grant-03 lvm[2145]: lock_stress-grant--03.5104 is now in-sync.


[root@grant-03 ~]# lvs -a -o +devices
  LV                       VG          Attr   LSize   Log                Copy%  Devices
  grant-01.5105            lock_stress -wi-a- 500.00m                           /dev/sdc4(0)
  grant-02.5101            lock_stress -wi-a- 500.00m                           /dev/sdc4(125)
  grant-03.5104            lock_stress mwi-a- 500.00m grant-03.5104_mlog 100.00 grant-03.5104_mimage_0(0),grant-03.5104_mimage_1(0)
  [grant-03.5104_mimage_0] lock_stress iwi-ao 500.00m                           /dev/sdc4(250)
  [grant-03.5104_mimage_1] lock_stress iwi-ao 500.00m                           /dev/sdc3(250)
  grant-03.5104_mimage_2   lock_stress -wi-a- 500.00m                           /dev/sdc2(250)
  grant-03.5104_mimage_3   lock_stress -wi-a- 500.00m                           /dev/sdc1(125)
  grant-03.5104_mimage_4   lock_stress -wi-a- 500.00m                           /dev/sdb4(125)
  [grant-03.5104_mlog]     lock_stress lwi-ao   4.00m                           /dev/sdb2(2)


Version-Release number of selected component (if applicable):
2.6.32-59.1.el6.x86_64

lvm2-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-libs-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
lvm2-cluster-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
device-mapper-event-libs-1.02.53-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010
cmirror-2.02.72-8.el6    BUILT: Wed Aug 18 10:41:52 CDT 2010


How reproducible:
Every time

Comment 1 Corey Marthaler 2010-08-19 22:32:35 UTC
Marking this a regression since this test (cmirror_lock_stress) used to pass in RHEL5.5.

Comment 2 Corey Marthaler 2010-08-24 21:23:01 UTC
There should be a 6.0 release note for this issue.

Comment 3 Corey Marthaler 2010-08-26 21:46:14 UTC
FWIW, I'm able to hit this bug with mirror_sanity as well.

SCENARIO - [verify_sync_completions]
Create 8 mirrors and verify that their copy percents complete
hayes-03: lvcreate -m 1 -n sync_check_1 -L 500M mirror_sanity
hayes-01: lvcreate -m 1 -n sync_check_2 -L 500M mirror_sanity
hayes-02: lvcreate -m 1 -n sync_check_3 -L 500M mirror_sanity
hayes-01: lvcreate -m 1 -n sync_check_4 -L 500M mirror_sanity
hayes-01: lvcreate -m 1 -n sync_check_5 -L 500M mirror_sanity
hayes-02: lvcreate -m 1 -n sync_check_6 -L 500M mirror_sanity
  Error locking on node hayes-03: Command timed out
  Aborting. Failed to activate new LV to wipe the start of it.
couldn't create mirror:
        hayes-02 lvcreate -m 1 -n sync_check_6 -L 500M mirror_sanity

Comment 5 Denise Dumas 2010-09-15 19:05:35 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Do not attempt to flood a production system with LVM commands, as the backlog of commands to be processed may increase to such a level that some operations fail due to timeouts.

Comment 7 Ryan Lerch 2010-10-13 03:54:44 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Do not attempt to flood a production system with LVM commands, as the backlog of commands to be processed may increase to such a level that some operations fail due to timeouts.
+Attempting to run multiple LVM commands in quick succession might cause a backlog of these commands. Consequently, some of the operations requested might time-out, and subsequently, fail.
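The backlog effect described in this note can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a description of how clvmd or cmirror actually queue lock requests: every request arrives at the same moment, a single hypothetical lock server handles them strictly one at a time (FIFO), each takes the same fixed time, the server keeps working through the queue even after a client gives up, and each client waits a fixed time before reporting "Command timed out". The function name `simulate_serialized_locks` and the 15 s / 60 s figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    start_s: float   # when the hypothetical server begins this request
    finish_s: float  # when the hypothetical server completes it
    timed_out: bool  # True if the client gave up before completion


def simulate_serialized_locks(names, service_s, timeout_s):
    """Toy FIFO model (an assumption, not clvmd's real algorithm).

    All requests arrive at t=0 and are served one at a time, each taking
    service_s seconds. A client reports a timeout if its request has not
    finished within timeout_s seconds. Requests whose clients already timed
    out are still processed, so the backlog never shrinks early.
    """
    results = []
    clock = 0.0
    for name in names:
        start = clock
        finish = start + service_s
        clock = finish  # the next request cannot start until this one is done
        results.append(Outcome(name, start, finish, finish > timeout_s))
    return results


if __name__ == "__main__":
    # Six hypothetical operations issued at once.
    ops = [f"op{i}" for i in range(1, 7)]
    for o in simulate_serialized_locks(ops, service_s=15.0, timeout_s=60.0):
        status = "timed out" if o.timed_out else "ok"
        print(f"{o.name}: finishes at {o.finish_s:.0f}s -> {status}")
```

With these made-up numbers, op5 and op6 finish at 75 s and 90 s and are reported as timed out, even though each operation takes only 15 s on its own. In this model nothing about an individual command changes; only its position in the queue decides whether it fails, which is why spacing out the commands or lengthening the timeout shortens the failing tail.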

Comment 8 RHEL Product and Program Management 2010-11-22 23:14:56 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

