Bug 506843 - cmirror operations can result in deadlocked suspended devices
Summary: cmirror operations can result in deadlocked suspended devices
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-06-18 21:10 UTC by Corey Marthaler
Modified: 2010-01-12 02:10 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:05:45 UTC
Target Upstream Version:
Embargoed:


Attachments
log from hayes-01 (946.66 KB, text/plain), 2009-06-18 21:18 UTC, Corey Marthaler
log from hayes-02 (521.40 KB, text/plain), 2009-06-18 21:19 UTC, Corey Marthaler
log from hayes-03 (516.71 KB, text/plain), 2009-06-18 21:21 UTC, Corey Marthaler
log from grant-01 (907.80 KB, text/plain), 2009-07-21 18:51 UTC, Corey Marthaler
log from grant-02 (73.23 KB, text/plain), 2009-07-21 18:52 UTC, Corey Marthaler
log from grant-03 (906.73 KB, text/plain), 2009-07-21 18:52 UTC, Corey Marthaler


Links
Red Hat Product Errata RHEA-2009:1340 (normal, SHIPPED_LIVE): cmirror bug-fix update, last updated 2009-09-01 10:42:38 UTC

Description Corey Marthaler 2009-06-18 21:10:07 UTC
Description of problem:
The errors seen here in the syslog are nothing new, but I couldn't find an open bug that resembles this issue, though it looks similar to "fixed" bug 501473.

This was on the hayes cluster which uses AOE devices, but that shouldn't affect communication with the clog server.

pv_shuffle output:

[lvm_cluster_pvmove] [lvm_cluster_pvmove] === Iteration 1 of 10 started on hayes-01 at Thu Jun 18 15:21:40 CDT 2009 ===      
[lvm_cluster_pvmove] [lvm_cluster_pvmove] FREE PVS: /dev/etherd/e1.1p1 /dev/etherd/e1.1p10 /dev/etherd/e1.1p11 /dev/etherd/e1.1p12                                                                                                                        
[lvm_cluster_pvmove] [lvm_cluster_pvmove] INUSE PVS:                                                                         
[lvm_cluster_pvmove] [lvm_cluster_pvmove]       pv_shuffle_A=/dev/etherd/e1.1p6 /dev/etherd/e1.1p7 /dev/etherd/e1.1p8 /dev/etherd/e1.1p9                                                                                                                  
[lvm_cluster_pvmove] [lvm_cluster_pvmove]       pv_shuffle_B=/dev/etherd/e1.1p14 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4                                                                                                                 
[lvm_cluster_pvmove] [lvm_cluster_pvmove] NOT INUSE PVS:                                                                     
[lvm_cluster_pvmove] [lvm_cluster_pvmove]       pv_shuffle_A=/dev/etherd/e1.1p5                                              
[lvm_cluster_pvmove] [lvm_cluster_pvmove]       pv_shuffle_B=/dev/etherd/e1.1p13                                             
[lvm_cluster_pvmove] [lvm_cluster_pvmove]                                                                                    
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Adding /dev/etherd/e1.1p10 to pv_shuffle_A                                         
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Adding /dev/etherd/e1.1p11 to pv_shuffle_B                                         
[lvm_cluster_pvmove] [lvm_cluster_pvmove]                                                                                    
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Moving data from /dev/etherd/e1.1p9 to /dev/etherd/e1.1p10 on hayes-01             
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Moving data from /dev/etherd/e1.1p4 to /dev/etherd/e1.1p11 on hayes-01             
[lvm_cluster_pvmove] [lvm_cluster_pvmove] hayes-01: pvmove -v /dev/etherd/e1.1p9 /dev/etherd/e1.1p10                         
[lvm_cluster_pvmove] [lvm_cluster_pvmove] hayes-01: pvmove -v /dev/etherd/e1.1p4 /dev/etherd/e1.1p11                         
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Finding volume group "pv_shuffle_A"                                            
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Executing: /sbin/modprobe dm-log-clustered                                     
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Archiving volume group "pv_shuffle_A" metadata (seqno 4).                      
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Finding volume group "pv_shuffle_B"                                            
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Executing: /sbin/modprobe dm-log-clustered                                     
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Archiving volume group "pv_shuffle_B" metadata (seqno 4).                      
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Creating logical volume pvmove0                                                
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Moving 896 extents of logical volume pv_shuffle_A/linear                       
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Creating logical volume pvmove0                                                
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Moving 896 extents of logical volume pv_shuffle_B/linear                       
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Updating volume group metadata                                                 
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Updating volume group metadata                                                 
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Error locking on node hayes-03: Command timed out                              
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out                                
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out                                
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Temporary pvmove mirror activation failed.                                       
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Error locking on node hayes-03: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Temporary pvmove mirror activation failed.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Error locking on node hayes-03: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Unable to drop cached metadata for VG pv_shuffle_A.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   ABORTING: Failed to write new data locations to disk.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   ABORTING: Restoring original configuration before pvmove failed. Run pvmove --abort.
[lvm_cluster_pvmove] [lvm_cluster_pvmove] Error locking on node hayes-03: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Unable to drop cached metadata for VG pv_shuffle_B.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   ABORTING: Failed to write new data locations to disk.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   ABORTING: Restoring original configuration before pvmove failed. Run pvmove --abort.
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-03: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Creating volume group backup "/etc/lvm/backup/pv_shuffle_B" (seqno 5).
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-03: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-02: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]   Error locking on node hayes-01: Command timed out
[lvm_cluster_pvmove] [lvm_cluster_pvmove]     Creating volume group backup "/etc/lvm/backup/pv_shuffle_A" (seqno 5).


Jun 18 15:21:16 hayes-01 qarshd[17393]: Running cmdline: pvmove -v /dev/etherd/e1.1p9 /dev/etherd/e1.1p10
Jun 18 15:21:16 hayes-01 xinetd[2835]: START: qarsh pid=17395 from=10.15.80.47
Jun 18 15:21:16 hayes-01 qarshd[17395]: Talking to peer 10.15.80.47:42875
Jun 18 15:21:16 hayes-01 qarshd[17395]: Running cmdline: dmsetup status pv_shuffle_A-pvmove0 2>&1 | grep -q core
Jun 18 15:21:16 hayes-01 xinetd[2835]: EXIT: qarsh status=0 pid=17395 duration=0(sec)
Jun 18 15:21:16 hayes-01 xinetd[2835]: START: qarsh pid=17399 from=10.15.80.47
Jun 18 15:21:16 hayes-01 qarshd[17399]: Talking to peer 10.15.80.47:42876
Jun 18 15:21:16 hayes-01 qarshd[17399]: Running cmdline: pvmove -v /dev/etherd/e1.1p4 /dev/etherd/e1.1p11
Jun 18 15:21:17 hayes-01 xinetd[2835]: START: qarsh pid=17460 from=10.15.80.47
Jun 18 15:21:17 hayes-01 qarshd[17460]: Talking to peer 10.15.80.47:42877
Jun 18 15:21:17 hayes-01 qarshd[17460]: Running cmdline: dmsetup status pv_shuffle_A-pvmove0 2>&1 | grep -q core
Jun 18 15:21:32 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_RESUME/49] - retrying
Jun 18 15:21:32 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/50] - retrying
Jun 18 15:21:32 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/51] - retrying
Jun 18 15:21:47 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_RESUME/52] - retrying
[...]
Jun 18 15:42:33 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_RESUME/301] - retrying
Jun 18 15:42:33 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/302] - retrying
Jun 18 15:42:33 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/303] - retrying
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_RESUME/304] - retrying
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_RESUME] to server: -3
Jun 18 15:42:48 hayes-01 kernel: device-mapper: raid1: log resume failed
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_GET_RESYNC_WORK] to server: -3
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered:  Userspace cluster log server not found.
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_CTR] to server: -3
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered:  Userspace cluster log server not found
Jun 18 15:42:48 hayes-01 kernel: device-mapper: table: 253:5: mirror: Error creating mirror dirty log
Jun 18 15:42:48 hayes-01 kernel: device-mapper: ioctl: error adding target to table
Jun 18 15:42:48 hayes-01 xinetd[2835]: EXIT: qarsh status=0 pid=17399 duration=1292(sec)
Jun 18 15:42:48 hayes-01 xinetd[2835]: EXIT: qarsh status=0 pid=17393 duration=1292(sec)
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/305] - retrying
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_GET_SYNC_COUNT] to server: -3
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered:  Userspace cluster log server not found.
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: [gFnfLkER] Request timed out: [DM_CLOG_GET_SYNC_COUNT/306] - retrying
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_GET_SYNC_COUNT] to server: -3
Jun 18 15:42:48 hayes-01 kernel: device-mapper: dm-log-clustered:  Userspace cluster log server not found.
Jun 18 15:42:50 hayes-01 kernel: device-mapper: dm-log-clustered: Attempting to contact cluster log server...
Jun 18 15:42:50 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_CTR] to server: -3
Jun 18 15:42:50 hayes-01 kernel: device-mapper: dm-log-clustered: Attempting to contact cluster log server...
Jun 18 15:42:50 hayes-01 kernel: device-mapper: dm-log-clustered: Unable to send cluster log request [DM_CLOG_CTR] to server: -3
Jun 18 15:42:50 hayes-01 kernel: device-mapper: dm-log-clustered: Attempting to contact cluster log server...

clogd is still running on all nodes in the cluster:
[root@hayes-01 ~]# ps -ef | grep clogd
root      5275     1  0 10:11 ?        00:00:00 clogd
[root@hayes-02 ~]# ps -ef | grep clogd
root      5286     1  0 10:11 ?        00:00:00 clogd
[root@hayes-03 ~]# ps -ef | grep clogd
root      5336     1  0 10:12 ?        00:00:00 clogd

[root@hayes-01 ~]# dmsetup ls
pv_shuffle_A-pvmove0    (253, 4)
pv_shuffle_A-linear     (253, 2)
pv_shuffle_B-pvmove0    (253, 5)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
pv_shuffle_B-linear     (253, 3)

[root@hayes-01 ~]# dmsetup status
[HANG]

[root@hayes-03 ~]# dmsetup info
Name:              pv_shuffle_A-pvmove0
State:             ACTIVE
/dev/mapper/pv_shuffle_A-pvmove0: open failed: No such file or directory
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-K55oFeoo5Edjvy3BTskg5LC8XSxkhpeNw1df0vzuhJ3jalSNWQZb502BgFnfLkER

Name:              pv_shuffle_A-linear
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-K55oFeoo5Edjvy3BTskg5LC8XSxkhpeNUhfUM7yV8V9QNEZWbkQYd0x2lSHzlcQn

Name:              pv_shuffle_B-pvmove0
State:             ACTIVE
/dev/mapper/pv_shuffle_B-pvmove0: open failed: No such file or directory
Tables present:    None
Open count:        0
Event number:      0
Major, minor:      253, 5
Number of targets: 0
UUID: LVM-3BcsvJ7TmsqCqrpn6gDjvSGxgmLTA95VvEt8Zq6Y2B1t6PSCReD6bWVy8QoLGbWK

Name:              VolGroup00-LogVol01
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-daGgA9bZOYmMtbX3yGBmK0mVSb2pdnLo6TdzQIlAXLtrUnMriWHC8M8JGZaBsJak

Name:              VolGroup00-LogVol00
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-daGgA9bZOYmMtbX3yGBmK0mVSb2pdnLoM2U5LYl3xqXXNYTL2VKyTgHTuv23RgMt

Name:              pv_shuffle_B-linear
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-3BcsvJ7TmsqCqrpn6gDjvSGxgmLTA95VxPP5scFn23HjHe48yKILXA4onUVdwj8J


I'll attach the logs and dumps of what's currently running.

Version-Release number of selected component (if applicable):
[root@hayes-03 ~]# /usr/tests/sts-rhel5.4/lvm2/bin/lvm_rpms
2.6.18-150.el5

lvm2-2.02.46-8.el5    BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5    BUILT: Thu Jun 18 08:05:27 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.37-1.el5    BUILT: Tue May  5 11:46:05 CDT 2009
kmod-cmirror-0.1.21-14.el5    BUILT: Thu May 21 08:28:17 CDT 2009

Comment 1 Corey Marthaler 2009-06-18 21:18:53 UTC
Created attachment 348558 [details]
log from hayes-01

Comment 2 Corey Marthaler 2009-06-18 21:19:37 UTC
Created attachment 348559 [details]
log from hayes-02

Comment 3 Corey Marthaler 2009-06-18 21:21:42 UTC
Created attachment 348560 [details]
log from hayes-03

Comment 4 Corey Marthaler 2009-06-19 22:00:50 UTC
This may be a bigger deal than just pvmoves. I'm seeing this during other operations as well like lvreduce.

--- ALTERATION ITERATION 2 ---------------------------

VOLUME REDUCTION from 33072 to 22789 on taft-04
  WARNING: Reducing active logical volume to 89.02 GB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)

[DEADLOCK]

Name:              mirror_1_4509-mirror_1_45090
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      1
Major, minor:      253, 75
Number of targets: 1
UUID: LVM-mLiEzEQbpF3luJ6Z2sZz6PA53YR9sAlvHh7pPeGHEezzgW2sNMUP6aq0uFJf0t4H

Appears to be a regression as I've now seen this quite a few times.

Comment 5 Jonathan Earl Brassow 2009-07-21 15:20:50 UTC
Are you still seeing this with the latest build?

The log server looks to be hung.  Have you tried getting information from 'killall -USR1 clogd'?  An strace?  Attaching gdb?  We need to find out where/why it is hung.  I could check out your machines if you can repro the problem too.
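
For context, 'killall -USR1 clogd' works by asking the still-running daemon to dump its internal state when it receives SIGUSR1. Below is a minimal sketch of that signal-handler-plus-flag pattern, assuming a conventional event-loop daemon; it is not the clogd source, and the counter and output format are hypothetical.

/*
 * Minimal sketch of the signal-triggered state-dump pattern that the
 * 'killall -USR1 clogd' suggestion relies on.  NOT the clogd source;
 * the handler, flag, and counter below are hypothetical.
 */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t dump_requested;

static void usr1_handler(int sig)
{
	(void)sig;
	dump_requested = 1;	/* defer the real work to the main loop */
}

int main(void)
{
	struct sigaction sa;
	unsigned long requests_seen = 0;	/* stand-in for real daemon state */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = usr1_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);

	for (;;) {
		/* ... normal event loop: wait for and service log requests ... */
		requests_seen++;
		sleep(1);

		if (dump_requested) {
			dump_requested = 0;
			/* A real daemon would dump its request/checkpoint queues
			 * here; a request stuck in a queue would show up in the
			 * output even though the process itself is still alive. */
			fprintf(stderr, "state dump: requests_seen=%lu\n",
				requests_seen);
		}
	}
}

If the daemon is too wedged to get back to its main loop, the dump never appears; that is when 'strace -p <pid>' or attaching gdb for a backtrace is the better route.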

Comment 7 Corey Marthaler 2009-07-21 18:46:41 UTC
Easily reproduced, attaching logs containing debug info...

Comment 8 Corey Marthaler 2009-07-21 18:51:24 UTC
Created attachment 354550 [details]
log from grant-01

Comment 9 Corey Marthaler 2009-07-21 18:52:23 UTC
Created attachment 354552 [details]
log from grant-02

Comment 10 Corey Marthaler 2009-07-21 18:52:57 UTC
Created attachment 354553 [details]
log from grant-03

Comment 13 Jonathan Earl Brassow 2009-07-27 18:10:24 UTC
commit fa15cb16f0de9ddf06ddb50f2fcf26dd9c257931
Author: Jonathan Brassow <jbrassow>
Date:   Mon Jul 27 13:07:36 2009 -0500

    clogd/dm-log-clustered.ko: Fix for bugs 506843, 479749, 507400

    Device-mapper userspace logs (like the clustered log) are
    identified by a universally unique identifier (UUID).  This
    identifier is used to associate requests from the kernel to
    a specific log in userspace.  The UUID must be unique everywhere,
    since multiple machines may use this identifier when sharing a
    log, as is the case for cluster logs.

    Sometimes, device-mapper/LVM may re-use a UUID.  This is the
    case during 'pvmove's, when moving from one segment of an LV
    to another, or when resizing a mirror, etc.  In these cases,
    a new log is created with the same UUID and loaded in the
    "inactive" slot.  When a device-mapper "resume" is issued,
    the "live" talbe is deactivated and the new "inactive" table
    becomes "live".  (The "inactive" table can also be removed
    via a device-mapper 'clear' command.)

    The above two issues were colliding in the 3 bugs (possibly
    more) mentioned in the title.  More than one log was being
    created with the same UUID, and there was no way to distinguish
    between them.  So, sometimes the wrong log would be swapped
    out during the exchange.

    The solution is to create a 'uuid_instance', or perhaps a
    'luid'.  A local unique identifier to go along with the
    UUID.  This new identifier is used to determine exactly which
    log is being referenced by the kernel when the log exchange
    is made.  The identifier is not universally safe, but it does
    not need to be, since create/destroy/suspend/resume operations
    are bound to a specific machine; and these are the operations
    that make up the exchange.
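
To make the failure mode above concrete: during a pvmove two log instances can exist with the same UUID (the live table and the newly loaded inactive table), so a lookup keyed on the UUID alone may pick the wrong one at swap time, while pairing the UUID with a locally unique id removes the ambiguity. The sketch below is a simplified model of that idea only, with hypothetical structure and function names; it is not the clogd/kmod-cmirror code.

/*
 * Simplified model of the UUID vs. (UUID, luid) lookup described in the
 * commit message above.  Not the actual clogd code; all names are
 * hypothetical.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct log_instance {
	char     uuid[64];	/* cluster-wide identifier, may be re-used */
	uint64_t luid;		/* local unique id introduced by the fix   */
	int      live;		/* 1 = live table, 0 = inactive table      */
};

/* Before the fix: the first UUID match wins, which may be the wrong table. */
static struct log_instance *find_by_uuid(struct log_instance *logs, int n,
					 const char *uuid)
{
	for (int i = 0; i < n; i++)
		if (strcmp(logs[i].uuid, uuid) == 0)
			return &logs[i];
	return NULL;
}

/* After the fix: the caller supplies both ids, so the exact instance
 * (live or inactive) is always the one found. */
static struct log_instance *find_by_uuid_luid(struct log_instance *logs, int n,
					      const char *uuid, uint64_t luid)
{
	for (int i = 0; i < n; i++)
		if (strcmp(logs[i].uuid, uuid) == 0 && logs[i].luid == luid)
			return &logs[i];
	return NULL;
}

int main(void)
{
	/* pvmove re-uses the UUID: one live instance, one inactive one. */
	struct log_instance logs[] = {
		{ "shared-uuid", 1, 1 },
		{ "shared-uuid", 2, 0 },
	};

	/* Ambiguous: the UUID-only lookup returns the live table even when
	 * the caller meant the newly loaded inactive one. */
	printf("uuid only        -> live=%d\n",
	       find_by_uuid(logs, 2, "shared-uuid")->live);

	/* Unambiguous: (uuid, luid) identifies the intended instance. */
	printf("uuid + luid == 2 -> live=%d\n",
	       find_by_uuid_luid(logs, 2, "shared-uuid", 2)->live);
	return 0;
}

As the commit notes, the luid only has to be unique per machine, since the create/destroy/suspend/resume exchanges for a given device never cross nodes.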

Comment 15 Corey Marthaler 2009-08-06 15:02:37 UTC
Marking fix verified in cmirror-1.1.39-2.el5.

Ran pv_shuffle and lvm_config all night without seeing this issue, however we did end up hitting bug 506986.

Comment 17 errata-xmlrpc 2009-09-02 11:05:45 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1340.html

