Bug 919604

Summary:	thinpool stacked on mirror volume fails to recover from device failure
Product:	Red Hat Enterprise Linux 6	Reporter:	Corey Marthaler <cmarthal>
Component:	lvm2	Assignee:	Zdenek Kabelac <zkabelac>
Status:	CLOSED ERRATA	QA Contact:	Cluster QE <mspqa-list>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.4	CC:	agk, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, slevine, thornber, zkabelac
Target Milestone:	rc
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	lvm2-2.02.100-4.el6	Doc Type:	Bug Fix
Doc Text:	Users who wish to have device fault tolerance for their thinpool logical volumes should use the RAID segment types for this purpose (e.g. "raid1"). This is especially encouraged for thinpool metadata. The 'lvconvert' command can be used for this purpose. Here is an example of converting the metadata portion of a thinpool named "my_thinpool" to the "raid1" segment type:

~> lvconvert --type raid1 -m 1 my_vg/my_thinpool_tmeta

The 'raid1' segment type is the new implementation of mirroring in LVM. The legacy mirror segment type is called 'mirror'.

Conversions that result in thinpools layered on logical volumes of 'mirror' segment type are no longer allowed. That is, it is no longer possible to create thinpools on top of logical volumes of 'mirror' segment type. This is due to the possibility of I/O hangs and a failure to complete repairs during failure events. Users can still gain the desired fault tolerance by using the 'raid1' segment type which does not suffer from the same limitations.

Users who have already created thinpools with data or metadata areas of 'mirror' segment type will still be able to activate those logical volumes. However, they should convert them to the 'raid1' segment type as soon as possible. This can be quickly accomplished via the 'lvconvert' command. For example, the following command would convert the data portion of a thinpool named 'my_thinpool' in the volume group 'my_vg' from the 'mirror' segment type to the newer 'raid1' segment type:

~> lvconvert --type raid1 my_vg/my_thinpool_tdata

Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-11-21 23:21:48 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	960054

Description Corey Marthaler 2013-03-08 21:35:32 UTC

Description of problem:
This test case works when it's a thinpool stacked on a raid volume, but not when it's a thinpool stacked on a mirror.

./helter_skelter -e kill_primary_synced_2_legs -i 1 -w POOL

Create 7 PV(s) for helter_skelter on taft-02
Create VG helter_skelter on taft-02
================================================================================
Iteration 0.1 started at Fri Mar  8 15:03:25 CST 2013
================================================================================
Scenario kill_primary_synced_2_legs: Kill primary leg of synced 2 leg mirror(s)

********* Mirror hash info for this scenario *********
* names:              syncd_primary_2legs_1
* sync:               1
* striped:            0
* leg devices:        /dev/sde1 /dev/sdb1
* log devices:        /dev/sdh1
* no MDA devices:     
* failpv(s):          /dev/sde1
* failnode(s):        taft-02
* lvmetad:            0
* thinpool stack:      1
* leg fault policy:   allocate
* log fault policy:   allocate
******************************************************

Creating mirror(s) on taft-02...
taft-02: lvcreate -m 1 -n syncd_primary_2legs_1 -L 500M helter_skelter /dev/sde1:0-1000 /dev/sdb1:0-1000 /dev/sdh1:0-150

Current mirror/raid device structure(s):
  LV                               Attr      LSize   Cpy%Sync Devices                                                            
   syncd_primary_2legs_1            mwi-a-m-- 500.00m     5.60 syncd_primary_2legs_1_mimage_0(0),syncd_primary_2legs_1_mimage_1(0)
   [syncd_primary_2legs_1_mimage_0] Iwi-aom-- 500.00m          /dev/sde1(0)                                                       
   [syncd_primary_2legs_1_mimage_1] Iwi-aom-- 500.00m          /dev/sdb1(0)                                                       
   [syncd_primary_2legs_1_mlog]     lwi-aom--   4.00m          /dev/sdh1(0)                                                       


PV=/dev/sde1
        syncd_primary_2legs_1_mimage_0: 5.1
PV=/dev/sde1
        syncd_primary_2legs_1_mimage_0: 5.1

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Convert mirror/raid volume(s) to Thinpool volume(s) on taft-02...
lvcreate -n meta_syncd_primary_2legs_1 -L 200M helter_skelter /dev/sdb1
lvconvert --thinpool helter_skelter/syncd_primary_2legs_1 --poolmetadata meta_syncd_primary_2legs_1
lvcreate --virtualsize 200M --thinpool helter_skelter/syncd_primary_2legs_1 -n virt_syncd_primary_2legs_1

Creating ext on top of mirror(s) on taft-02...
mke2fs 1.41.12 (17-May-2010)
Mounting mirrored ext filesystems on taft-02...


Current mirror/raid device structure(s):
  LV                                     Attr      LSize   Cpy%Sync Devices                                                                        
   syncd_primary_2legs_1                  twi-a-tz- 500.00m          syncd_primary_2legs_1_tdata(0)                                                 
   [syncd_primary_2legs_1_tdata]          mwi-aot-- 500.00m   100.00 syncd_primary_2legs_1_tdata_mimage_0(0),syncd_primary_2legs_1_tdata_mimage_1(0)
   [syncd_primary_2legs_1_tdata_mimage_0] iwi-aom-- 500.00m          /dev/sde1(0)                                                                   
   [syncd_primary_2legs_1_tdata_mimage_1] iwi-aom-- 500.00m          /dev/sdb1(0)                                                                   
   [syncd_primary_2legs_1_tdata_mlog]     lwi-aom--   4.00m          /dev/sdh1(0)                                                                   
   [syncd_primary_2legs_1_tmeta]          ewi-aot-- 200.00m          /dev/sdb1(125)                                                                 
   virt_syncd_primary_2legs_1             Vwi-aotz- 200.00m                                                                                         


Writing verification files (checkit) to mirror(s) on...
        ---- taft-02 ----

Sleeping 15 seconds to get some outsanding EXT I/O locks before the failure 
lvcreate -s /dev/helter_skelter/virt_syncd_primary_2legs_1 -n snap1_syncd_primary_2legs_1
lvcreate -s /dev/helter_skelter/virt_syncd_primary_2legs_1 -n snap2_syncd_primary_2legs_1
lvcreate -s /dev/helter_skelter/virt_syncd_primary_2legs_1 -n snap3_syncd_primary_2legs_1

Verifying files (checkit) on mirror(s) on...
        ---- taft-02 ----

Disabling device sde on taft-02

Getting recovery check start time from /var/log/messages: Mar  8 15:04
Attempting I/O to cause mirror down conversion(s) on taft-02
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.538989 s, 77.8 MB/s
[DEADLOCK]


[root@taft-02 ~]# lvs -a -o +devices
 /dev/sde1: read failed after 0 of 512 at 145669554176: Input/output error
 [...]
 /dev/sde1: read failed after 0 of 2048 at 0: Input/output error
 Couldn't find device with uuid Nu1sCw-E2uJ-qeSj-Dr0Z-3ruF-irrO-8Si0e7.
 LV                                     Attr      LSize   Pool                  Origin                     Data%  Log                              Cpy%Sync Devices
 snap1_syncd_primary_2legs_1            Vwi-a-tzp 200.00m syncd_primary_2legs_1 virt_syncd_primary_2legs_1  20.59
 snap2_syncd_primary_2legs_1            Vwi-a-tzp 200.00m syncd_primary_2legs_1 virt_syncd_primary_2legs_1  20.59
 snap3_syncd_primary_2legs_1            Vwi-a-tzp 200.00m syncd_primary_2legs_1 virt_syncd_primary_2legs_1  20.59
 syncd_primary_2legs_1                  twi-a-tzp 500.00m                                                    8.77                                           syncd_primary_2legs_1_tdata(0)
 [syncd_primary_2legs_1_tdata]          mwi-aot-p 500.00m                                                         syncd_primary_2legs_1_tdata_mlog    99.20 syncd_primary_2legs_1_tdata_mimage_0(0),syncd_primary_2legs_1_tdata_mimage_1(0)
 [syncd_primary_2legs_1_tdata_mimage_0] Iwi-aom-p 500.00m                                                                                                   unknown device(0)
 [syncd_primary_2legs_1_tdata_mimage_1] Iwi-aom-- 500.00m                                                                                                   /dev/sdb1(0)
 [syncd_primary_2legs_1_tdata_mlog]     lwi-aom--   4.00m                                                                                                   /dev/sdh1(0)
 [syncd_primary_2legs_1_tmeta]          ewi-aot-- 200.00m                                                                                                   /dev/sdb1(125)
 virt_syncd_primary_2legs_1             Vwi-aotzp 200.00m syncd_primary_2legs_1                             20.59


LOG:
qarshd[6632]: Running cmdline: echo offline > /sys/block/sde/device/state &
xinetd[1816]: EXIT: qarsh status=0 pid=6632 duration=0(sec)
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: device-mapper: raid1: Mirror read failed from 253:4. Trying alternative device.
lvm[3518]: Primary mirror device 253:4 read failed.
lvm[3518]: helter_skelter-syncd_primary_2legs_1_tdata is now in-sync.
kernel: sd 3:0:0:4: rejecting I/O to offline device
lvm[3518]: Primary mirror device 253:4 has failed (D).
lvm[3518]: Device failure in helter_skelter-syncd_primary_2legs_1_tdata.
kernel: sd 3:0:0:4: rejecting I/O to offline device
lvm[3518]: /dev/sde1: read failed after 0 of 512 at 145669554176: Input/output error
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: sd 3:0:0:4: rejecting I/O to offline device
lvm[3518]: Converting segment type for helter_skelter/syncd_primary_2legs_1_tdata to mirror is not yet supported.
lvm[3518]: Repair of mirrored device helter_skelter-syncd_primary_2legs_1_tdata failed.
lvm[3518]: Failed to remove faulty devices in helter_skelter-syncd_primary_2legs_1_tdata.
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: sd 3:0:0:4: rejecting I/O to offline device
kernel: INFO: task kjournald:6498 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kjournald     D 0000000000000003     0  6498      2 0x00000080
kernel: ffff880219673bb0 0000000000000046 0000000000000000 ffffffffa00043ec
kernel: ffff88021206ab80 ffff8802182a3408 0000000000000001 000000000fd0000a
kernel: ffff880219b30638 ffff880219673fd8 000000000000fb88 ffff880219b30638
kernel: Call Trace:
kernel: [<ffffffffa00043ec>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
kernel: [<ffffffff811b5db0>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff8150d9c3>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b5df0>] sync_buffer+0x40/0x50
kernel: [<ffffffff8150e37f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffff811b5db0>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff8150e428>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffff811b5da6>] __wait_on_buffer+0x26/0x30
kernel: [<ffffffff811b6dc1>] __sync_dirty_buffer+0x71/0xf0
kernel: [<ffffffff811b6e53>] sync_dirty_buffer+0x13/0x20
kernel: [<ffffffffa04889f0>] journal_update_superblock+0x120/0x210 [jbd]
kernel: [<ffffffffa04825d8>] journal_commit_transaction+0x48/0x1310 [jbd]
kernel: [<ffffffff8150d230>] ? thread_return+0x4e/0x76e
kernel: [<ffffffff81080fac>] ? lock_timer_base+0x3c/0x70
kernel: [<ffffffff81081a3b>] ? try_to_del_timer_sync+0x7b/0xe0
kernel: [<ffffffffa0488768>] kjournald+0xe8/0x250 [jbd]
kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffffa0488680>] ? kjournald+0x0/0x250 [jbd]
kernel: [<ffffffff81096916>] kthread+0x96/0xa0
kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
kernel: INFO: task flush-253:10:6510 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: flush-253:10  D 0000000000000003     0  6510      2 0x00000080
kernel: ffff880201b87870 0000000000000046 ffff880201b87830 ffffffffa00043ec
kernel: ffff880201b877e0 ffffffff81012b69 ffff880201b87820 ffffffff810a1aa9
kernel: ffff8802198e9058 ffff880201b87fd8 000000000000fb88 ffff8802198e9058
kernel: Call Trace:
kernel: [<ffffffffa00043ec>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
kernel: [<ffffffff81012b69>] ? read_tsc+0x9/0x20
kernel: [<ffffffff810a1aa9>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff810a1aa9>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff811b5db0>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff8150d9c3>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b5df0>] sync_buffer+0x40/0x50
kernel: [<ffffffff8150e22a>] __wait_on_bit_lock+0x5a/0xc0
kernel: [<ffffffff811670fb>] ? cache_alloc_refill+0x15b/0x240
kernel: [<ffffffff811b5db0>] ? sync_buffer+0x0/0x50
kernel: [<ffffffff811b6120>] ? end_buffer_async_write+0x0/0x190
kernel: [<ffffffff8150e308>] out_of_line_wait_on_bit_lock+0x78/0x90
kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
[...]


Version-Release number of selected component (if applicable):
2.6.32-354.el6.x86_64

lvm2-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-libs-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-cluster-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
cmirror-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013


How reproducible:
Every time

Comment 5 Jonathan Earl Brassow 2013-08-28 22:34:45 UTC

A manual test (i.e. non-dmeventd initiated) seems to work for the repair.  If the device is then reinstated, the messages are very confusing and lead to disastrous results!

[root@bp-xen-01 lvm2]# devices vg
  LV                    Attr       Cpy%Sync Devices                                      
  [lvol0_pmspare]       ewi-------          /dev/sda1(257)                               
  pool                  twi-a-tz--          pool_tdata(0)                                
  [pool_tdata]          mwi-aom---   100.00 pool_tdata_mimage_0(0),pool_tdata_mimage_1(0)
  [pool_tdata_mimage_0] iwi-aom---          /dev/sda1(0)                                 
  [pool_tdata_mimage_1] iwi-aom---          /dev/sdb1(0)                                 
  [pool_tdata_mlog]     lwi-aom---          /dev/sdg1(0)                                 
  [pool_tmeta]          ewi-ao----          /dev/sda1(256)                               

[root@bp-xen-01 lvm2]# off.sh sda
Turning off sda

[root@bp-xen-01 lvm2]# lvconvert --repair vg/pool
  /dev/sda1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sda1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid RA4cDd-471I-9HM1-PJG8-Db1t-X2gg-gmicT3.
  Only inactive pool can be repaired.

[root@bp-xen-01 lvm2]# lvconvert --repair vg/pool_tdata
  /dev/sda1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sda1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid RA4cDd-471I-9HM1-PJG8-Db1t-X2gg-gmicT3.
  Mirror status: 1 of 2 images failed.
Attempt to replace failed mirror images (requires full device resync)? [y/n]: y
  Trying to up-convert to 2 images, 1 logs.
  vg/pool_tdata: Converted: 2.0%
  vg/pool_tdata: Converted: 71.9%
  vg/pool_tdata: Converted: 100.0%

[root@bp-xen-01 lvm2]# devices vg
  /dev/sda1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sda1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid RA4cDd-471I-9HM1-PJG8-Db1t-X2gg-gmicT3.
  LV                    Attr       Cpy%Sync Devices                                      
  [lvol0_pmspare]       ewi-----p-          unknown device(257)                          
  pool                  twi-a-tzp-          pool_tdata(0)                                
  [pool_tdata]          mwi-aom---   100.00 pool_tdata_mimage_0(0),pool_tdata_mimage_1(0)
  [pool_tdata_mimage_0] iwi-aom---          /dev/sdb1(0)                                 
  [pool_tdata_mimage_1] iwi-aom---          /dev/sdc1(0)                                 
  [pool_tdata_mlog]     lwi-aom---          /dev/sdg1(1)                                 
  [pool_tmeta]          ewi-ao--p-          unknown device(256)                          

[root@bp-xen-01 lvm2]# on.sh sda
Turning on sda

[root@bp-xen-01 lvm2]# lvs
  WARNING: Inconsistent metadata found for VG vg - updating to use version 13
  Missing device /dev/sda1 reappeared, updating metadata for VG vg to version 13.
  Device still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
  Failed to parse thin pool params: Fail.
  Failed to parse thin pool params: Fail.
  dm_report_object: report function failed for field data_percent
  LV      VG         Attr       LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  pool    vg         twi-a-tzp- 1.00g            
  lv_root vg_bpxen01 -wi-ao---- 5.54g                                             
  lv_swap vg_bpxen01 -wi-ao---- 1.97g                                             

[root@bp-xen-01 lvm2]# vgreduce --removemissing vg
  WARNING: Partial LV pool needs to be repaired or removed. 
  WARNING: Partial LV pool_tmeta needs to be repaired or removed. 
  WARNING: Partial LV lvol0_pmspare needs to be repaired or removed. 
  There are still partial LVs in VG vg.
  To remove them unconditionally use: vgreduce --removemissing --force.
  Proceeding to remove empty missing PVs.

[root@bp-xen-01 lvm2]# vgreduce --removemissing --force vg
  Removing partial LV pool.
  Logical volume "pool" successfully removed
  Wrote out consistent volume group vg

[root@bp-xen-01 lvm2]# devices vg
  WARNING: Inconsistent metadata found for VG vg - updating to use version 18
  Removing PV /dev/sda1 (RA4cDd-471I-9HM1-PJG8-Db1t-X2gg-gmicT3) that no longer belongs to VG vg

[root@bp-xen-01 lvm2]# devices vg

[root@bp-xen-01 lvm2]# vgs
  VG         #PV #LV #SN Attr   VSize VFree
  vg           6   0   0 wz--n- 4.90t 4.90t
  vg_bpxen01   1   2   0 wz--n- 7.51g    0

Comment 6 Jonathan Earl Brassow 2013-08-28 22:45:30 UTC

(In reply to Jonathan Earl Brassow from comment #5)
> A manual test (i.e. non-dmeventd initiated) seems to work for the repair. 
> If the device is then reinstated, the messages are very confusing and lead
> to disastrous results!
> 

Yeah, the mirror repair worked fine, but the rest of the test is potentially bogus because I killed the one device that was part of the mirrored tdata and the only device in tmeta.

Comment 7 Jonathan Earl Brassow 2013-08-30 13:29:53 UTC

Easier to gather logs without using dmeventd.

1) Create mirror
2) convert it to thinpool
3) create thinlv
4) wait for mirror sync
5) killall -9 dmeventd (we want to do the repair, not dmeventd)
6) kill any mirror device (but not one that is also _tmeta)
** 7) write a small amount to thinlv to make kernel notice failed dev **
8) Attempt repair - it will hang.

Note that if #7 is not done, the repair will complete just fine.  This is odd: in both cases any write I/O done to the mirror will hang, yet the case where #7 is performed (the case where the kernel has even more info that the device is dead) is the case that fails.
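
For anyone wanting to script the steps above, here is a rough sketch. It is not the script used for this report. The VG name, LV names, sizes, and the FAIL_DISK value are placeholders, and PV arguments are left out, so the metadata LV has to be kept off the disk that gets failed by hand. It prints the commands by default (DRY_RUN=1) since running them is destructive, and step 2 is expected to be refused on builds that contain the fix.

```shell
#!/bin/sh
# Sketch of the reproduction steps; all names below are placeholders.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
VG=${VG:-vg}
FAIL_DISK=${FAIL_DISK:-sdX}   # disk holding one mirror leg, but NOT _tmeta

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

reproduce_hang() {
    # 1) create a legacy mirror
    run "lvcreate --type mirror -m 1 -n pool -L 500M $VG"
    # 2) convert it to a thinpool (separate linear LV as pool metadata)
    run "lvcreate -n pool_meta -L 200M $VG"
    run "lvconvert --thinpool $VG/pool --poolmetadata pool_meta"
    # 3) create a thin LV in the pool
    run "lvcreate --virtualsize 200M --thinpool $VG/pool -n thinlv"
    # 4) check mirror sync; repeat by hand until pool_tdata reports 100.00
    run "lvs -a -o lv_name,copy_percent $VG"
    # 5) stop dmeventd so that the repair is ours, not dmeventd's
    run "killall -9 dmeventd"
    # 6) fail one mirror leg
    run "echo offline > /sys/block/$FAIL_DISK/device/state"
    # 7) small buffered write so the kernel notices the failed device
    #    (the size here is arbitrary)
    run "dd if=/dev/zero of=/dev/$VG/thinlv bs=1M count=4"
    # 8) attempt the repair; on affected builds this is where it hangs
    run "lvconvert --repair $VG/pool_tdata"
}
```

Sourcing the file and calling reproduce_hang prints the command sequence for review; setting DRY_RUN=0 executes it.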

Comment 8 Jonathan Earl Brassow 2013-08-30 13:39:04 UTC

(In reply to Jonathan Earl Brassow from comment #7)
> Easier to gather logs without using dmeventd.
> 
> 1) Create mirror
> 2) convert it to thinpool
> 3) create thinlv
> 4) wait for mirror sync
> 5) killall -9 dmeventd (we want to do the repair, not dmeventd)
> 6) kill any mirror device (but not one that is also _tmeta)
> ** 7) write a small amount to thinlv to make kernel notice failed dev **
> 8) Attempt repair - it will hang.
> 
> Note that if #7 is not done, the repair will complete just fine.  This is
> odd because in both cases any write I/O done to the mirror will hang and the
> case where #7 is performed (the case where there is even more info that the
> device is dead) is the case that fails.

Using this method to hang the 'lvconvert --repair', attaching gdb to the process, and then replacing the mirror with an error target allows us to get the following backtrace:
(gdb) bt
#0  0x00000038154db400 in __open_nocancel () from /lib64/libc.so.6
#1  0x000000000045ec8e in dev_open_flags (dev=0x8928e8, flags=278528, direct=1, quiet=1)
    at device/dev-io.c:470
#2  0x000000000045f163 in dev_open_readonly_quiet (dev=0x8928e8) at device/dev-io.c:553
#3  0x0000000000468623 in _passes_partitioned_filter (f=0x842950, dev=0x8928e8)
    at filters/filter-partitioned.c:27
#4  0x00000000004650a7 in _and_p (f=0x83c830, dev=0x8928e8) at filters/filter-composite.c:24
#5  0x00000000004661d7 in _lookup_p (f=0x838770, dev=0x8928e8)
    at filters/filter-persistent.c:295
#6  0x000000000045d921 in dev_iter_get (iter=0x836740) at device/dev-cache.c:1011
#7  0x000000000044e081 in lvmcache_label_scan (cmd=0x7ff0f0, full_scan=0)
    at cache/lvmcache.c:691
#8  0x000000000049ce06 in _vg_read (cmd=0x7ff0f0, vgname=0x879c82 "vg", vgid=0x0, 
    warnings=1, consistent=0x7fff1433f9e4, precommitted=0) at metadata/metadata.c:2997
#9  0x000000000049e1d3 in vg_read_internal (cmd=0x7ff0f0, vgname=0x879c82 "vg", vgid=0x0, 
    warnings=1, consistent=0x7fff1433f9e4) at metadata/metadata.c:3413
#10 0x000000000049f9ef in _vg_lock_and_read (cmd=0x7ff0f0, vg_name=0x879c82 "vg", vgid=0x0, 
    lock_flags=36, status_flags=514, misc_flags=1048576) at metadata/metadata.c:4112
#11 0x000000000049fd6a in vg_read (cmd=0x7ff0f0, vg_name=0x879c82 "vg", vgid=0x0, 
    flags=1048576) at metadata/metadata.c:4216
#12 0x000000000049fdab in vg_read_for_update (cmd=0x7ff0f0, vg_name=0x879c82 "vg", vgid=0x0, 
    flags=0) at metadata/metadata.c:4227
#13 0x000000000041cb9a in _get_lvconvert_vg (cmd=0x7ff0f0, name=0x879c82 "vg", uuid=0x0)
    at lvconvert.c:562
#14 0x0000000000423e32 in get_vg_lock_and_logical_volume (cmd=0x7ff0f0, 
    vg_name=0x879c82 "vg", lv_name=0x7fff143408a0 "lv_tdata") at lvconvert.c:2649
#15 0x0000000000424071 in lvconvert_single (cmd=0x7ff0f0, lp=0x7fff1433fbf0)
    at lvconvert.c:2687
#16 0x0000000000424646 in lvconvert (cmd=0x7ff0f0, argc=1, argv=0x7fff1433fee8)
    at lvconvert.c:2796
#17 0x000000000042e4a4 in lvm_run_command (cmd=0x7ff0f0, argc=1, argv=0x7fff1433fee8)
    at lvmcmdline.c:1168
#18 0x000000000042f9bd in lvm2_main (argc=7, argv=0x7fff1433feb8) at lvmcmdline.c:1604
#19 0x0000000000447690 in main (argc=7, argv=0x7fff1433feb8) at lvm.c:21

Comment 9 Jonathan Earl Brassow 2013-09-04 18:53:29 UTC

I am going to disallow thin* on top of mirror logical volumes. Users will have to use the "raid1" segment type if they want this.

This bug has come down to a choice between:
1) Disallowing thin-LVs from being used as PVs.
2) Disallowing thinpools on top of mirrors.

The problem is that the code in dev_manager.c:device_is_usable() is unable to tell whether there is a mirror device lower in the stack than the device being checked. Pretty much anything layered on top of a mirror will suffer from this problem. (Snapshots are a good example of this, and option #1 above has been chosen to deal with them. This can also be seen in dev_manager.c:device_is_usable().) When a mirror failure occurs, the kernel blocks all I/O to it. If an LVM command comes along to do the repair (or a different operation that requires label reading), it would normally avoid the mirror when it sees that it is blocked. However, if there is a snapshot or a thin-LV on top of a mirror, the above code will not detect the mirror underneath and will issue label-reading I/O. This causes the command to hang.
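
To illustrate the kind of check that is missing, here is a small sketch that walks a device stack and reports whether a mirror sits anywhere below a given device. This is not lvm2 code and not how device_is_usable() works. The input format (one line per device: name, device-mapper target, then the devices directly beneath it) and the function name has_mirror_below are made up for the example.

```shell
#!/bin/sh
# has_mirror_below DEVICE
# stdin: one line per device: "<device> <target> [<device directly below>...]"
# Exit status 0 if any device underneath DEVICE uses the "mirror" target,
# 1 otherwise. Devices without a line of their own (e.g. disks) are leaves.
has_mirror_below() {
    awk -v top="$1" '
        {
            target[$1] = $2
            ndeps[$1] = NF - 2
            for (i = 3; i <= NF; i++)
                dep[$1, i - 2] = $i
        }
        # Depth-first walk down the stack; i is a local variable.
        function below(dev,    i) {
            for (i = 1; i <= ndeps[dev]; i++) {
                if (target[dep[dev, i]] == "mirror")
                    return 1
                if (below(dep[dev, i]))
                    return 1
            }
            return 0
        }
        END { exit (below(top) ? 0 : 1) }
    '
}
```

Fed a description of the stack from this bug (thin LV on thin-pool on mirrored _tdata), the check succeeds for the thin LV even though the thin LV itself is not a mirror, which is exactly the case that goes undetected today.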

Choosing #1 would mean that thin-LVs could never be used as PVs - even if they are stacked on something other than mirrors.

Choosing #2 means that thinpools can never be placed on mirrors. This is probably better than we think, since it is preferred that people use the "raid1" segment type in the first place. However, RAID* cannot currently be used in a cluster volume group - even in EX-only mode. Thus, a complete solution for option #2 must include the ability to activate RAID logical volumes (and perform RAID operations) in a cluster volume group. I've already begun working on this.

Comment 13 Jonathan Earl Brassow 2013-09-11 21:04:59 UTC

This bug has been addressed by better integration of RAID + thinpool and disallowing mirror + thinpool.

The necessary commit IDs are:
ca514351536c2dd8929944bb6b01a64587cb0a46
2691f1d764182722195cda80be1f511e968480aa
82228acfc95fa4dbe9acca2d3bfc5a89087fd5e4

Users will have to make use of RAID rather than mirror.  Perhaps that qualifies this bug as a "WONTFIX".  However, there were necessary changes to RAID and the fact that mirror has been disallowed requires a bug to pull in the changes.  This is as good a bug as any to use for that purpose.
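
For pools that were created before this change, something along these lines could be used to find the sub-LVs that still need converting. It is only a sketch: the helper name suggest_raid1_conversions is made up, it assumes the default whitespace-separated lvs output with hidden LVs shown in square brackets, and it only prints the lvconvert commands rather than running them.

```shell
#!/bin/sh
# Print (do not run) an lvconvert command for every thin pool sub-LV
# (_tdata or _tmeta) that still uses the legacy 'mirror' segment type.
# stdin: output of  lvs --noheadings -a -o vg_name,lv_name,segtype
suggest_raid1_conversions() {
    awk '
        {
            vg = $1; lv = $2; segtype = $3
            # lvs -a prints hidden sub-LVs as [name]; strip the brackets.
            gsub(/\[|\]/, "", lv)
            if (segtype == "mirror" && lv ~ /_(tdata|tmeta)$/)
                printf "lvconvert --type raid1 %s/%s\n", vg, lv
        }
    '
}
```

Typical use would be `lvs --noheadings -a -o vg_name,lv_name,segtype | suggest_raid1_conversions`, reviewing the printed commands before running any of them.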

Comment 15 Corey Marthaler 2013-09-30 13:56:18 UTC

Fix verified (in that this op is no longer allowed) in the latest rpms.

2.6.32-410.el6.x86_64
lvm2-2.02.100-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
lvm2-libs-2.02.100-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
lvm2-cluster-2.02.100-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
udev-147-2.48.el6    BUILT: Fri Aug  9 06:09:50 CDT 2013
device-mapper-1.02.79-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
device-mapper-libs-1.02.79-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
device-mapper-event-1.02.79-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
device-mapper-event-libs-1.02.79-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013
cmirror-2.02.100-4.el6    BUILT: Fri Sep 27 09:05:32 CDT 2013

[root@taft-01 ~]# lvs -a -o +devices
 LV                              Attr       LSize   Log                       Cpy%Sync Devices
 to_pool_convert                 mwi-a-m--- 100.00m to_pool_convert_mlog        100.00 to_pool_convert_mimage_0(0),to_pool_convert_mimage_1(0)          
 [to_pool_convert_mimage_0]      iwi-aom--- 100.00m                                    /dev/sdc1(0)
 [to_pool_convert_mimage_1]      iwi-aom--- 100.00m                                    /dev/sdd1(0)
 [to_pool_convert_mlog]          lwi-aom---   4.00m                                    /dev/sde1(0)
 to_pool_meta_convert            mwi-a-m--- 100.00m to_pool_meta_convert_mlog   100.00 to_pool_meta_convert_mimage_0(0),to_pool_meta_convert_mimage_1(0)
 [to_pool_meta_convert_mimage_0] iwi-aom--- 100.00m                                    /dev/sdc1(25)
 [to_pool_meta_convert_mimage_1] iwi-aom--- 100.00m                                    /dev/sdd1(25)
 [to_pool_meta_convert_mlog]     lwi-aom---   4.00m                                    /dev/sde1(1)
[root@taft-01 ~]# lvconvert --thinpool snapper_thinp/to_pool_convert --poolmetadata to_pool_meta_convert
 Mirror logical volumes cannot be used as thinpools.
Try "raid1" segment type instead.

Comment 16 errata-xmlrpc 2013-11-21 23:21:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html