Bug 538571 - intermittent mirror creation failures after pvmove operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2-cluster
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Milan Broz
QA Contact: Cluster QE
URL:
Whiteboard: FailsQA
Depends On:
Blocks:
 
Reported: 2009-11-18 20:36 UTC by Corey Marthaler
Modified: 2013-03-01 04:07 UTC (History)
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 09:02:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log from clvmd -d on taft-01 (4.41 MB, text/plain) - 2009-11-18 23:39 UTC, Corey Marthaler
log from clvmd -d on taft-02 (3.52 MB, text/plain) - 2009-11-18 23:39 UTC, Corey Marthaler
log from clvmd -d on taft-03 (3.26 MB, text/plain) - 2009-11-18 23:40 UTC, Corey Marthaler
log from clvmd -d on taft-04 (3.14 MB, text/plain) - 2009-11-18 23:41 UTC, Corey Marthaler
clvmd -d output on grant-01 (478.75 KB, text/plain) - 2009-12-02 20:39 UTC, Corey Marthaler
clvmd -d output on grant-02 (377.85 KB, text/plain) - 2009-12-02 20:40 UTC, Corey Marthaler
clvmd -d output on grant-03 (249.13 KB, text/plain) - 2009-12-02 20:41 UTC, Corey Marthaler
log from grant-03 (21.94 KB, text/plain) - 2009-12-03 16:40 UTC, Corey Marthaler
log from grant-02 (24.73 KB, text/plain) - 2009-12-04 17:12 UTC, Corey Marthaler
core from grant-02 (2.26 MB, application/octet-stream) - 2009-12-22 20:39 UTC, Corey Marthaler
clvmd -d output on hayes-01 (176.56 KB, text/plain) - 2010-01-22 17:28 UTC, Corey Marthaler
clvmd -d output on hayes-02 (199.04 KB, text/plain) - 2010-01-22 17:29 UTC, Corey Marthaler
clvmd -d output on hayes-03 (125.15 KB, text/plain) - 2010-01-22 17:29 UTC, Corey Marthaler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2010:0299 0 normal SHIPPED_LIVE lvm2-cluster bug fix and enhancement update 2010-03-29 14:26:30 UTC

Description Corey Marthaler 2009-11-18 20:36:48 UTC
Description of problem:
I've been noticing random creation failures during regression test runs that all look similar to this. They are blocking the QA mirror tests from running for long periods of time.

Making mirror block_mirror100M of origin volume
grant-03: lvcreate -m 1 -n block_mirror100M -L 100M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  /dev/mirror_sanity/block_mirror100M_mlog: not found: device not cleared
  Aborting. Failed to wipe mirror log.
  Failed to create mirror log.
couldn't create mirror:
        grant-03 lvcreate -m 1 -n block_mirror100M -L 100M --nosync mirror_sanity

Version-Release number of selected component (if applicable):
2.6.18-164.el5

lvm2-2.02.54-1.el5    BUILT: Thu Nov 12 05:28:35 CST 2009
lvm2-cluster-2.02.54-1.el5    BUILT: Thu Nov 12 05:27:29 CST 2009
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

How reproducible:
every so often

Comment 1 Corey Marthaler 2009-11-18 23:38:20 UTC
Hit this again with 'clvmd -d' running. Attaching that output from all four taft nodes...

Comment 2 Corey Marthaler 2009-11-18 23:39:08 UTC
Created attachment 370226 [details]
log from clvmd -d on taft-01

Comment 3 Corey Marthaler 2009-11-18 23:39:47 UTC
Created attachment 370227 [details]
log from clvmd -d on taft-02

Comment 4 Corey Marthaler 2009-11-18 23:40:26 UTC
Created attachment 370228 [details]
log from clvmd -d on taft-03

Comment 5 Corey Marthaler 2009-11-18 23:41:24 UTC
Created attachment 370229 [details]
log from clvmd -d on taft-04

Comment 6 Corey Marthaler 2009-11-30 21:46:18 UTC
Saw this on the latest rpms as well.

taft-04: lvcreate -m 3 -n syncd_secondary_4legs_1 -L 600M helter_skelter /dev/sdb1:0-1000 /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdc1:0-1000 /dev/sde1:0-150
 /dev/helter_skelter/syncd_secondary_4legs_1_mlog: not found: device not cleared
 Aborting. Failed to wipe mirror log.
 Failed to create mirror log.
 couldn't create mirror syncd_secondary_4legs_1 on taft-04

Version:
2.6.18-160.el5

lvm2-2.02.56-1.el5    BUILT: Tue Nov 24 13:24:02 CST 2009
lvm2-cluster-2.02.56-1.el5    BUILT: Tue Nov 24 13:27:05 CST 2009
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

Comment 7 Milan Broz 2009-12-02 15:42:38 UTC
I am not able to reproduce this (the same rpms except kernel - I am using 2.6.18-164.6.1.el5).

Are there any error messages in syslog? Can we get -vvvv output of the failed command? (I see nothing wrong in the clvmd logs, though.)

Maybe some udev rule interference (the famous mpath_wait), but even when simulating that failure here, it fails differently...
(Is device-mapper-multipath installed - is /etc/udev/rules.d/40-multipath.rules present?)

Comment 8 Corey Marthaler 2009-12-02 20:38:58 UTC
Posting the 'clvmd -d' log that was running during a creation failure. I'll attempt to get a -vvvv of the cmd as well.

From mirror_sanity:
SCENARIO - [vgcfgrestore_mirror_with_missing_pv]
Create a mirror, force remove a leg, and then restore its VG
grant-02: lvcreate -m 1 -n missing_pv_mirror -L 100M --nosync mirror_sanity
  /dev/mirror_sanity/suspended: open failed: No such device or address
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  /dev/mirror_sanity/missing_pv_mirror_mlog: not found: device not cleared
  Aborting. Failed to wipe mirror log.
  Failed to create mirror log.
couldn't create mirror:
        grant-02 lvcreate -m 1 -n missing_pv_mirror -L 100M --nosync mirror_sanity

Comment 9 Corey Marthaler 2009-12-02 20:39:48 UTC
Created attachment 375587 [details]
clvmd -d output on grant-01

Comment 10 Corey Marthaler 2009-12-02 20:40:26 UTC
Created attachment 375588 [details]
clvmd -d output on grant-02

Comment 11 Corey Marthaler 2009-12-02 20:41:17 UTC
Created attachment 375589 [details]
clvmd -d output on grant-03

Comment 12 Milan Broz 2009-12-03 10:30:52 UTC
CLVMD[41a86940]: Dec  2 13:56:02 process_local_command: LOCK_LV (0x32) msg=0x1b6990d0, msglen =85, client=0x1b694830
CLVMD[41a86940]: Dec  2 13:56:02 do_lock_lv: resource 'EcwG3KnPURoZ8SWdkBEcanwE3gGYpn124lhah6lyV0iwGveAXHjRv1gKM0Pg3gzH', cmd = 0x19 LCK_LV_ACTIVATE (READ|LV|NONBLOCK), flags = 0x84 (DMEVENTD_MONITOR )
CLVMD[41a86940]: Dec  2 13:56:02 sync_lock: 'EcwG3KnPURoZ8SWdkBEcanwE3gGYpn124lhah6lyV0iwGveAXHjRv1gKM0Pg3gzH' mode:1 flags=1
CLVMD[41a86940]: Dec  2 13:56:02 sync_lock: returning lkid 520001
  device-mapper: create ioctl failed: Device or resource busy
CLVMD[41a86940]: Dec  2 13:56:02 Command return is 5

Comment 13 Milan Broz 2009-12-03 10:34:58 UTC
 device-mapper: create ioctl failed: Device or resource busy

This seems wrong. Can you post syslog from that time?
Also please lvs -o+uuid  (I am really not sure why we do not print LV name also in log)

Comment 14 Milan Broz 2009-12-03 10:46:02 UTC
Ah, well. It probably fails correctly, because some earlier command kept the log volume active - something similar to this:

# dmsetup create vg_bar-lv1_mlog --table "0 8192 linear 8:80 384"
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  /dev/vg_bar/lv1_mlog: not found: device not cleared
  Aborting. Failed to wipe mirror log.
  Error locking on node bar-01: Input/output error
  Unable to deactivate mirror log LV. Manual intervention required.
  Failed to create mirror log.

Please can you add to the log:
 - lvs -a -o+uuid before that lvcreate command, so we are sure the log volume is not in the metadata
 - and dmsetup table, to check that it is not active in kernel dm

(and obviously, if it is there, the preceding command which did not remove that log device)

Comment 15 Corey Marthaler 2009-12-03 16:26:35 UTC
It looks like there are some entries in /dev that are still lying around on one of the nodes in the cluster (grant-03), but they don't appear to be active.

[root@grant-03 tmp]# lvs -a -o+uuid
  /dev/mapper/mirror_sanity-rename_mirrorA_mlog: open failed: No such device or address
  /dev/mapper/mirror_sanity-rename_mirrorA_mimage_0: open failed: No such device or address
  /dev/mapper/mirror_sanity-rename_mirrorA_mimage_1: open failed: No such device or address
  /dev/mapper/mirror_sanity-rename_mirrorA: open failed: No such device or address
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Convert LV UUID                               
  LogVol00 VolGroup00 -wi-ao 64.56G                                       3wecgf-Hx4k-oDJs-eeiV-VM5T-ARji-7NY3Bc
  LogVol01 VolGroup00 -wi-ao  9.81G                                       kudnYJ-UTPJ-ZHVZ-vJIO-s5dy-Va8P-s7oDE5


[root@grant-03 tmp]# dmsetup table
VolGroup00-LogVol01: 0 20578304 linear 8:2 135397760
VolGroup00-LogVol00: 0 135397376 linear 8:2 384

Comment 16 Corey Marthaler 2009-12-03 16:40:09 UTC
Looks like openais might have gone out to lunch just before those removes occurred. I'll attach the log from grant-03.

Comment 17 Corey Marthaler 2009-12-03 16:40:51 UTC
Created attachment 375830 [details]
log from grant-03

Comment 18 Milan Broz 2009-12-03 16:46:31 UTC
(In reply to comment #15)
> It looks like there are some entries in /dev that are still lying around on one
> of the nodes in the cluster (grant-03), but they don't appear to be active.

Is grant-03 always a remote node, or the node where the tests run?

Comment 19 Corey Marthaler 2009-12-03 17:10:48 UTC
grant-03 is just another node in the cluster. The tests are run from my desktop, and the cmds are randomly distributed to the different nodes in the cluster. So one node may create the mirror, another does something with it, and another deletes it.

Comment 20 Corey Marthaler 2009-12-04 17:11:39 UTC
Posting the log from grant-02 which was the node responsible for deactivating and removing the mirror /dev/mirror_sanity/block_mirror100M.

Comment 21 Corey Marthaler 2009-12-04 17:12:22 UTC
Created attachment 376133 [details]
log from grant-02

Comment 22 Corey Marthaler 2009-12-11 20:01:46 UTC
I'm seeing this with the latest rpms as well, and again, there's an openais config change just before the test fails.

# Note how in the testcase before the failure, there are locking errors around the same time openais claims a new configuration exists, even though no nodes have left or joined the cluster.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
Attempting pvmove of /dev/etherd/e1.1p1 on hayes-03
  Error locking on node hayes-03: device-mapper: create ioctl failed: Device or resource busy
  Temporary pvmove mirror activation failed.
  Error locking on node hayes-03: WARNING: dev_open(/dev/etherd/e1.1p5) called while suspended
  ABORTING: Unable to deactivate temporary logical volume "pvmove0"
  ABORTING: Restoring original configuration before pvmove failed. Run pvmove --abort.
Verifying the linear's dm state
Deactivating mirror suspended... and removing


SCENARIO - [recover_corrupt_metadata]
Create a mirror on hayes-03, corrupt its metadata, and then restore the volume
hayes-03: lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  /dev/mirror_sanity/corrupt_meta_mirror_mlog: not found: device not cleared
  Aborting. Failed to wipe mirror log.
  Failed to initialise mirror log.
couldn't create mirror:
        hayes-03 lvcreate -m 1 -n corrupt_meta_mirror -L 300M --nosync mirror_sanity

Comment 23 Corey Marthaler 2009-12-21 20:33:42 UTC
This is in the latest.

SCENARIO - [down_convert_2_legs_w_only_1_listed]
Create a 4-way mirror and attempt to down convert it to a 2-way mirror, specifying only one PV
grant-03: lvcreate -m 3 --corelog -n four_to_two_convert -L 300M mirror_sanity
  /dev/mirror_sanity/four_to_two_convert: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
couldn't create mirror:

Dec 21 14:15:18 grant-03 qarshd[31459]: Running cmdline: lvcreate -m 3 --corelog -n four_to_two_convert -L 300M mirror_sanity
Dec 21 14:15:19 grant-03 lvm[28847]: Monitoring mirror device mirror_sanity-four_to_two_convert for events
Dec 21 14:15:19 grant-03 lvm[28847]: No longer monitoring mirror device mirror_sanity-four_to_two_convert for events

# No issues talking to the other nodes:
[root@grant-03 ~]# netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0   434959      0      0      0   354563      0      0      0 BMRU
lo        16436   0       62      0      0      0       62      0      0      0 LRU


2.6.18-164.el5

lvm2-2.02.56-3.el5    BUILT: Mon Dec 21 11:30:40 CST 2009
lvm2-cluster-2.02.56-3.el5    BUILT: Mon Dec 21 11:31:56 CST 2009
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

Comment 24 Milan Broz 2009-12-22 11:21:41 UTC
This is strange. Please can we try adding clvmd -R before these failing commands? If it helps, we are back to the lvmcache & precommitted metadata problem.

Comment 25 Corey Marthaler 2009-12-22 19:36:42 UTC
I added a 'clvmd -R' before every lvcreate and that didn't seem to help. Also, I just noticed a dmeventd segfault before the failure, which I presume wasn't there earlier when I filed this bug.

Dec 22 13:27:45 grant-03 openais[10966]: [TOTEM] Retransmit List: 1273
Dec 22 13:27:45 grant-03 openais[10966]: [TOTEM] Retransmit List: 12a1
Dec 22 13:27:46 grant-03 openais[10966]: [TOTEM] Retransmit List: 130b
Dec 22 13:27:47 grant-03 restorecond: Reset file context /etc/mtab: system_u:object_r:etc_t:s0->system_u:object_r:etc_runtime_t:s0
Dec 22 13:27:47 grant-03 xinetd[9128]: EXIT: qarsh status=0 pid=25825 duration=7(sec)
Dec 22 13:27:47 grant-03 lvm[25676]: No longer monitoring mirror device mirror_sanity-fs_mirror for events
Dec 22 13:27:47 grant-03 kernel: dmeventd[25702]: segfault at 0000000040bb2ff0 rip 000000304ea5ff50 rsp 0000000040bb3008 error 6

I'll attempt to grab a core file.

Comment 26 Corey Marthaler 2009-12-22 20:36:28 UTC
[root@grant-02 core_files]# ls -l core.* | grep 'Dec 22' | wc -l
77

yikes, that's a lot of segfaults!

Comment 27 Corey Marthaler 2009-12-22 20:39:23 UTC
Created attachment 379916 [details]
core from grant-02

Comment 28 Corey Marthaler 2010-01-05 17:56:56 UTC
I appear to have reproduced this again with mbroz's debug build. All three grant nodes had dmeventd segfaults while performing the following operations.

Jan  4 19:11:31 grant-01 kernel: dmeventd[18273]: segfault at 000000001b9d5000 rip 00000036e627bfdb rsp 00007fff5cd005a8 error 4


Creating a 1 redundant legged cmirror named grant-03.30611
  Logical volume "grant-03.30611" created

Down converting cmirror from 1 legs to linear on grant-03
  Logical volume grant-03.30611 converted.
  /dev/mapper/lock_stress-grant--01.30613_mlog: stat failed: No such file or directory
  Path /dev/mapper/lock_stress-grant--01.30613_mlog no longer valid for device(253,6)
  /dev/mapper/lock_stress-grant--01.30613_mimage_0: stat failed: No such file or directory
  Path /dev/mapper/lock_stress-grant--01.30613_mimage_0 no longer valid for device(253,7)
  /dev/mapper/lock_stress-grant--01.30613_mimage_1: stat failed: No such file or directory
  Path /dev/mapper/lock_stress-grant--01.30613_mimage_1 no longer valid for device(253,8)
  /dev/mapper/lock_stress-grant--02.30617_mimage_3: stat failed: No such file or directory
  Path /dev/mapper/lock_stress-grant--02.30617_mimage_3 no longer valid for device(253,14)

Up converting linear to a 4 redundant legged cmirror on grant-03
  lock_stress/grant-03.30611: Converted: 12.0%
  lock_stress/grant-03.30611: Converted: 21.6%
  lock_stress/grant-03.30611: Converted: 29.6%
  lock_stress/grant-03.30611: Converted: 39.2%
  lock_stress/grant-03.30611: Converted: 47.2%
  lock_stress/grant-03.30611: Converted: 58.4%
  lock_stress/grant-03.30611: Converted: 68.8%
  lock_stress/grant-03.30611: Converted: 80.0%
  lock_stress/grant-03.30611: Converted: 91.2%
  lock_stress/grant-03.30611: Converted: 100.0%
  Logical volume grant-03.30611 converted.

Corelog converting cmirror on grant-03
  Logical volume grant-03.30611 converted.

Disklog converting cmirror on grant-03
  /dev/mapper/lock_stress-grant--02.30617_mlog: stat failed: No such file or directory
  Path /dev/mapper/lock_stress-grant--02.30617_mlog no longer valid for device(253,16)
  Logical volume grant-03.30611 converted.

Deactivating cmirror on grant-03
  Error locking on node grant-02: LV lock_stress/grant-03.30611 in use: not deactivating
couldn't deactivate cmirror on grant-03

Comment 29 Milan Broz 2010-01-05 20:45:59 UTC
core from comment #27:

(gdb) info threads
  2 process 10074  0x00000038846cced2 in select () from /lib64/libc.so.6
* 1 process 10078  0x000000388466cd30 in _IO_new_file_xsputn () from /lib64/libc.so.6
(gdb) bt
#0  0x000000388466cd30 in _IO_new_file_xsputn () from /lib64/libc.so.6
#1  0x00000038846428a0 in vfprintf () from /lib64/libc.so.6
#2  0x00000038846e719b in __fprintf_chk () from /lib64/libc.so.6
#3  0x00002b0d0de2ff18 in _write_array (pf=0xceca2d0, fp=0x2aaaac0008c0, path=0x2b0d0de892e8 "valid_devices", data=0x2)
    at filters/filter-persistent.c:169
#4  0x00002b0d0de30558 in persistent_filter_dump (f=<value optimized out>) at filters/filter-persistent.c:247
#5  0x00002b0d0de23429 in destroy_toolcontext (cmd=0xceb34a0) at commands/toolcontext.c:1303
#6  0x00002b0d0dc13df4 in unregister_device (device=0xceae7d0 "mirror_sanity-non_contig_prim_expand",
    uuid=<value optimized out>, major=<value optimized out>, minor=<value optimized out>, unused=<value optimized out>)
    at dmeventd_mirror.c:291
#7  0x0000000000402ff9 in _monitor_unregister (arg=0x2aaaac0008c0) at dmeventd.c:682
#8  0x0000000000404bd5 in _monitor_thread (arg=<value optimized out>) at /usr/include/pthread.h:582
#9  0x0000003884e064a7 in __deallocate_stack () from /lib64/libpthread.so.0
#10 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 10074)]#0  0x00000038846cced2 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x00000038846cced2 in select () from /lib64/libc.so.6
#1  0x00000000004029bd in _client_read (fifos=0x7fff9f704880, msg=0x7fff9f7048a0) at dmeventd.c:1282
#2  0x0000000000403f4d in main (argc=<value optimized out>, argv=<value optimized out>) at dmeventd.c:1425

...
(gdb) frame 3
#3  0x00002b0d0de2ff18 in _write_array (pf=0xceca2d0, fp=0x2aaaac0008c0, path=0x2b0d0de892e8 "valid_devices", data=0x2)
    at filters/filter-persistent.c:169
169                     fprintf(fp, "\t\t\"%s\"", escape_double_quotes(buf, str));

(gdb) p buf
$6 = 0

163                     str = dm_hash_get_key(pf->devices, n);
164                     if (!(buf = alloca(escaped_len(str)))) {
165                             log_error("persistent filter device path stack "
166                                       "allocation failed");
167                             return;
168                     }
169                     fprintf(fp, "\t\t\"%s\"", escape_double_quotes(buf, str));

Comment 30 Milan Broz 2010-01-06 12:27:39 UTC
(gdb) p *pf->devices
$12 = {num_nodes = 6158, num_slots = 128, slots = 0xceca340}

6158 entries in cache!

alloca can cause a stack overflow here, and testing its return value for NULL is also wrong (alloca does not signal failure by returning NULL).

Comment 31 Milan Broz 2010-01-06 19:20:02 UTC
Marking this as fixed in lvm2-cluster-2.02.56-4, but maybe there is another problem.
At least dmeventd should not segfault here.

Comment 34 Corey Marthaler 2010-01-14 23:57:40 UTC
I reproduced this bug with the latest rpms. I don't see the segfault, but the creation failed in exactly the same way.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
Attempting pvmove of /dev/etherd/e1.1p1 on hayes-02
  Error locking on node hayes-02: device-mapper: create ioctl failed: Device or resource busy
  Temporary pvmove mirror activation failed.
  Error locking on node hayes-02: WARNING: dev_open(/dev/etherd/e1.1p5) called while suspended
  ABORTING: Unable to deactivate temporary logical volume "pvmove0"
  ABORTING: Restoring original configuration before pvmove failed. Run pvmove --abort.
Verifying the linear's dm state
Deactivating mirror suspended... and removing
  /dev/mirror_sanity/suspended: open failed: No such device or address


SCENARIO - [nosync_mirror_up_convert]
Create a nosync mirror then resync it so that it can be up converted
hayes-03: lvcreate -m 1 -n nosync_mirror_up_convert -L 100M --corelog --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  /dev/mirror_sanity/nosync_mirror_up_convert: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
couldn't create mirror:
        hayes-03 lvcreate -m 1 -n nosync_mirror_up_convert -L 100M --corelog --nosync mirror_sanity


2.6.18-162.el5

lvm2-2.02.56-4.el5    BUILT: Wed Jan  6 14:46:02 CST 2010
lvm2-cluster-2.02.56-4.el5    BUILT: Wed Jan  6 14:47:19 CST 2010
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

Comment 35 Corey Marthaler 2010-01-15 00:02:47 UTC
Nothing looks too strange in the log.

Jan 14 16:08:33 hayes-03 qarshd[7162]: Running cmdline: lvcreate -m 1 -n nosync_mirror_up_convert -L 100M --corelog --nosync mirror_sanity
Jan 14 16:08:36 hayes-03 lvm[4078]: Monitoring mirror device mirror_sanity-nosync_mirror_up_convert for events
Jan 14 16:08:36 hayes-03 lvm[4078]: mirror_sanity-nosync_mirror_up_convert is now in-sync
Jan 14 16:08:36 hayes-03 lvm[4078]: No longer monitoring mirror device mirror_sanity-nosync_mirror_up_convert for events

Comment 36 Milan Broz 2010-01-20 11:08:58 UTC
Let's try build lvm2[-cluster]-2.02.56-5.el5

Comment 38 Corey Marthaler 2010-01-21 21:34:06 UTC
This bug still exists in lvm2[-cluster]-2.02.56-5.el5.

I'll try upgrading the kernel to 2.6.18-185.el5 and see if that helps.

Comment 39 Corey Marthaler 2010-01-22 16:50:54 UTC
Still exists on the -185 kernel as well. I'll attempt to attach the clvmd -d output...

Comment 40 Corey Marthaler 2010-01-22 17:00:05 UTC
This issue appears to happen after a pvmove operation.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
Attempting pvmove of /dev/etherd/e1.1p1 on hayes-01
  Error locking on node hayes-01: device-mapper: create ioctl failed: Device or resource busy
  Temporary pvmove mirror activation failed.
  Error locking on node hayes-01: WARNING: dev_open(/dev/etherd/e1.1p5) called while suspended
  ABORTING: Unable to deactivate temporary logical volume "pvmove0"
  ABORTING: Restoring original configuration before pvmove failed. Run pvmove --abort.
Verifying the linear's dm state
Deactivating mirror suspended... and removing
  /dev/mirror_sanity/suspended: open failed: No such device or address

SCENARIO - [looping_mirror_log_converts]
Create a core log mirror and then convert its log from core to disk 20 times
  /dev/mirror_sanity/mirror_log_convert: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
couldn't create mirror mirror_log_convert
[FAILED]

Comment 41 Corey Marthaler 2010-01-22 17:27:28 UTC
Attaching the clvmd -d log during one of these failures.

Comment 42 Corey Marthaler 2010-01-22 17:28:41 UTC
Created attachment 386203 [details]
clvmd -d output on hayes-01

Comment 43 Corey Marthaler 2010-01-22 17:29:08 UTC
Created attachment 386205 [details]
clvmd -d output on hayes-02

Comment 44 Corey Marthaler 2010-01-22 17:29:36 UTC
Created attachment 386206 [details]
clvmd -d output on hayes-03

Comment 45 Milan Broz 2010-01-22 17:47:07 UTC
What happens here is probably:

 - a node simulates a pvmove in progress by using a fake "pvmove0" device
 - the pvmove is expected to fail

But the memlock count never drops back to zero, which affects all later commands.
The pvmove fix for that situation is probably not cluster-aware.

Comment 46 Milan Broz 2010-01-27 13:41:17 UTC
Fix for the pvmove problem is in upstream cvs -> POST.

Comment 47 Milan Broz 2010-01-27 15:01:40 UTC
Patch in lvm2/lvm2-cluster-2_02_56-6_el5.

Comment 49 Corey Marthaler 2010-01-27 17:58:16 UTC
Fix verified in:

lvm2-2.02.56-6.el5    BUILT: Wed Jan 27 09:04:06 CST 2010
lvm2-cluster-2.02.56-6.el5    BUILT: Wed Jan 27 09:04:25 CST 2010

Comment 51 errata-xmlrpc 2010-03-30 09:02:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0299.html

