Bug 803271
| Summary: | lvmetad udev rules deadlocks pvmove in cluster | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Nenad Peric <nperic> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | urgent | Priority: | urgent |
| Version: | 6.3 | Target Milestone: | rc |
| Hardware: | x86_64 | OS: | Linux |
| Keywords: | Regression, Reopened | Doc Type: | Bug Fix |
| Fixed In Version: | lvm2-2.02.98-3.el6 | Last Closed: | 2013-02-21 08:09:14 UTC |
| CC: | agk, cmarthal, coughlan, dwysocha, heinzm, jbrassow, mbroz, mcsontos, msnitzer, prajnoha, prockai, thornber, tlavigne, zkabelac | | |

Doc Text:

Cause: Using lvmetad (the global/use_lvmetad=1 LVM2 configuration option) when processing LVM2 commands in a cluster environment (global/locking_type=3).

Consequence: LVM2 commands work incorrectly and issue various error messages.

Fix: If clustered locking is set, lvmetad use is disabled in LVM2 automatically, as this configuration is not yet supported.

Result: LVM2 now falls back to non-lvmetad operation if clustered locking is used, and a warning message is issued: "WARNING: configuration setting use_lvmetad overriden to 0 due to locking_type 3. Clustered environment not supported by lvmetad yet."
Can you please retest with the 6.3 packages from the errata? Thanks. Also note bug #799071, which could be related.

I have tested it today with the following:
lvm2-libs-2.02.96-0.82.el6.x86_64
lvm2-2.02.96-0.82.el6.x86_64
lvm2-cluster-2.02.96-0.82.el6.x86_64
device-mapper-libs-1.02.75-0.82.el6.x86_64
device-mapper-1.02.75-0.82.el6.x86_64
device-mapper-event-1.02.75-0.82.el6.x86_64
device-mapper-event-libs-1.02.75-0.82.el6.x86_64
cmirror-2.02.96-0.82.el6.x86_64
kernel: 2.6.32-251.el6.x86_64
(09:34:25) [root@node01:~]$ vgcreate pv_shuffle_A /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
No physical volume label read from /dev/sdb1
No physical volume label read from /dev/sdc1
No physical volume label read from /dev/sdd1
Physical volume "/dev/sdb1" successfully created
Physical volume "/dev/sdc1" successfully created
Physical volume "/dev/sdd1" successfully created
Clustered volume group "pv_shuffle_A" successfully created
(09:36:01) [root@node01:~]$ lvcreate -n linear -L 3.5G pv_shuffle_A
Logical volume "linear" created
(09:36:50) [root@node01:~]$ lvcreate -i 3 -n stripe -L 3.5G pv_shuffle_A
Using default stripesize 64.00 KiB
Rounding size (896 extents) up to stripe boundary size (897 extents)
Logical volume "stripe" created
(09:37:06) [root@node01:~]$ lvs
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
lv_root VolGroup -wi-ao-- 7.54g
lv_swap VolGroup -wi-ao-- 1.97g
linear pv_shuffle_A -wi-a--- 3.50g
stripe pv_shuffle_A -wi-a--- 3.50g
(09:37:19) [root@node01:~]$ vgextend pv_shuffle_A /dev/sde1
No physical volume label read from /dev/sde1
Physical volume "/dev/sde1" successfully created
Volume group "pv_shuffle_A" successfully extended
(09:37:37) [root@node01:~]$ vgchange -an pv_shuffle_A
0 logical volume(s) in volume group "pv_shuffle_A" now active
(09:38:31) [root@node01:~]$ pvs
PV VG Fmt Attr PSize PFree
/dev/sda1 pv_shuffle_A lvm2 a-- 9.76g 5.09g
/dev/sdb1 pv_shuffle_A lvm2 a-- 9.76g 8.59g
/dev/sdc1 pv_shuffle_A lvm2 a-- 9.76g 8.59g
/dev/sdd1 pv_shuffle_A lvm2 a-- 9.76g 9.76g
/dev/sde1 pv_shuffle_A lvm2 a-- 9.76g 9.76g
/dev/vda2 VolGroup lvm2 a-- 9.51g 0
(09:38:39) [root@node01:~]$ lvs -o +devices
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert Devices
lv_root VolGroup -wi-ao-- 7.54g /dev/vda2(0)
lv_swap VolGroup -wi-ao-- 1.97g /dev/vda2(1930)
linear pv_shuffle_A -wi----- 3.50g /dev/sda1(0)
stripe pv_shuffle_A -wi----- 3.50g /dev/sda1(896),/dev/sdb1(0),/dev/sdc1(0)
pvmove -v /dev/sda1 /dev/sde1
Finding volume group "pv_shuffle_A"
Executing: /sbin/modprobe dm-log-userspace
Archiving volume group "pv_shuffle_A" metadata (seqno 4).
Creating logical volume pvmove0
Moving 896 extents of logical volume pv_shuffle_A/linear
Moving 299 extents of logical volume pv_shuffle_A/stripe
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle_A" (seqno 5).
Checking progress before waiting every 15 seconds
/dev/sda1: Moved: 0.6%
/dev/sda1: Moved: 24.5%
/dev/sda1: Moved: 43.3%
/dev/sda1: Moved: 45.9%
/dev/sda1: Moved: 54.6%
/dev/sda1: Moved: 63.7%
/dev/sda1: Moved: 64.4%
/dev/sda1: Moved: 68.9%
/dev/sda1: Moved: 75.0%
Updating volume group metadata
Error locking on node node03: Command timed out
Error locking on node node02: Command timed out
Error locking on node node01: Command timed out
And again in /var/log/messages:
Mar 22 09:47:28 node01 kernel: INFO: task clvmd:3350 blocked for more than 120 seconds.
Mar 22 09:47:28 node01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 09:47:28 node01 kernel: clvmd D 0000000000000000 0 3350 1 0x00000000
Mar 22 09:47:28 node01 kernel: ffff88003b69db18 0000000000000086 ffff8800ffffffff 00000730195c24c8
Mar 22 09:47:28 node01 kernel: ffff88003b69dae8 ffff880037485bb0 0000000000050250 ffffffffb094ced0
Mar 22 09:47:28 node01 kernel: ffff880038f61058 ffff88003b69dfd8 000000000000fb48 ffff880038f61058
Mar 22 09:47:28 node01 kernel: Call Trace:
Mar 22 09:47:28 node01 kernel: [<ffffffff8109b5d9>] ? ktime_get_ts+0xa9/0xe0
Mar 22 09:47:28 node01 kernel: [<ffffffff814f81b3>] io_schedule+0x73/0xc0
Mar 22 09:47:28 node01 kernel: [<ffffffff811b473e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
Mar 22 09:47:28 node01 kernel: [<ffffffff8125b7bd>] ? get_disk+0x7d/0xf0
Mar 22 09:47:28 node01 kernel: [<ffffffff811b4c2e>] __blockdev_direct_IO+0x5e/0xd0
Mar 22 09:47:28 node01 kernel: [<ffffffff811b14f0>] ? blkdev_get_blocks+0x0/0xc0
Mar 22 09:47:28 node01 kernel: [<ffffffff8127065a>] ? kobject_get+0x1a/0x30
Mar 22 09:47:28 node01 kernel: [<ffffffff811b2357>] blkdev_direct_IO+0x57/0x60
Mar 22 09:47:28 node01 kernel: [<ffffffff811b14f0>] ? blkdev_get_blocks+0x0/0xc0
Mar 22 09:47:28 node01 kernel: [<ffffffff8111491b>] generic_file_aio_read+0x6bb/0x700
Mar 22 09:47:28 node01 kernel: [<ffffffff81216d11>] ? avc_has_perm+0x71/0x90
Mar 22 09:47:28 node01 kernel: [<ffffffff8121080f>] ? security_inode_permission+0x1f/0x30
Mar 22 09:47:28 node01 kernel: [<ffffffff8117901a>] do_sync_read+0xfa/0x140
Mar 22 09:47:28 node01 kernel: [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
Mar 22 09:47:28 node01 kernel: [<ffffffff811b192c>] ? block_ioctl+0x3c/0x40
Mar 22 09:47:28 node01 kernel: [<ffffffff8118bfc2>] ? vfs_ioctl+0x22/0xa0
Mar 22 09:47:28 node01 kernel: [<ffffffff8121c2fb>] ? selinux_file_permission+0xfb/0x150
Mar 22 09:47:28 node01 kernel: [<ffffffff8120f6a6>] ? security_file_permission+0x16/0x20
Mar 22 09:47:28 node01 kernel: [<ffffffff81179a15>] vfs_read+0xb5/0x1a0
Mar 22 09:47:28 node01 kernel: [<ffffffff81179b51>] sys_read+0x51/0x90
Mar 22 09:47:28 node01 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
I am not sure if this is the same case as bug #799071, since it does not actually say that any device is busy. It just gets stuck and then crashes.
The process still waits:
(09:51:23) [root@node01:/var/log]$ ps -ef | grep pvmo
root 3602 3382 0 09:41 pts/1 00:00:00 pvmove -v /dev/sda1 /dev/sde1
(09:54:30) [root@node01:/var/log]$ clustat
Cluster Status for STSRHTS7541 @ Thu Mar 22 09:55:07 2012
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node01 1 Online, Local
node02 2 Online
node03 3 Online
(10:02:34) [root@node01:/var/log]$ grep -e '^\s*locking_type ' /etc/lvm/lvm.conf
locking_type = 3
If you want me to test it with some specific versions of packages, please let me know and I can do that.
I'm able to reproduce this issue with the latest errata rpms as well. This is a regression from the RHEL 6.2 rpms.

2.6.32-251.el6.x86_64
lvm2-2.02.95-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
lvm2-libs-2.02.95-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
lvm2-cluster-2.02.95-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
udev-147-2.40.el6 BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-libs-1.02.74-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-event-1.02.74-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-event-libs-1.02.74-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012
cmirror-2.02.95-2.el6 BUILT: Fri Mar 16 08:39:54 CDT 2012

Mar 22 14:38:41 grant-01 kernel: INFO: task clvmd:7411 blocked for more than 120 seconds.
Mar 22 14:38:41 grant-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 14:38:41 grant-01 kernel: clvmd D 0000000000000000 0 7411 1 0x00000080
Mar 22 14:38:41 grant-01 kernel: ffff88011b7c5b18 0000000000000082 0000000000000000 ffffffffa00041fc
Mar 22 14:38:41 grant-01 kernel: ffff88011b7c5ae8 0000000012bfce9b ffff88011b7c5b08 ffff880102c33ec0
Mar 22 14:38:41 grant-01 kernel: ffff88011b80f098 ffff88011b7c5fd8 000000000000fb48 ffff88011b80f098
Mar 22 14:38:41 grant-01 kernel: Call Trace:
Mar 22 14:38:41 grant-01 kernel: [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Mar 22 14:38:41 grant-01 kernel: [<ffffffff814f81b3>] io_schedule+0x73/0xc0
Mar 22 14:38:41 grant-01 kernel: [<ffffffff811b473e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
Mar 22 14:38:41 grant-01 kernel: [<ffffffff8125b7bd>] ? get_disk+0x7d/0xf0
Mar 22 14:38:41 grant-01 kernel: [<ffffffff811b4c2e>] __blockdev_direct_IO+0x5e/0xd0
Mar 22 14:38:41 grant-01 kernel: [<ffffffff811b14f0>] ? blkdev_get_blocks+0x0/0xc0
[...]

Can you try to remove (you can just delete it) the lvmetad udev file
/lib/udev/rules.d/69-dm-lvm-metad.rules
and retry?
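For instance (a sketch; moving the rule aside is equivalent to deleting it, and the reload step is my addition, not part of the original request):

# disable the rule by moving it out of the rules directory
mv /lib/udev/rules.d/69-dm-lvm-metad.rules /root/69-dm-lvm-metad.rules.disabled
# make udev re-read its rule set
udevadm control --reload-rules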
What happens here:

This rule always runs on every change event (without any check whether lvmetad is running):

ENV{ID_FS_TYPE}=="LVM2_member|LVM1_member", RUN+="$env{LVM_SBIN_PATH}/lvm pvscan --cache --major $major --minor $minor"

For some reason, this code always tries to take the VG_GLOBAL read lock; this is propagated across the cluster and deadlocks clvmd.
This code is completely broken. It MUST NOT take any lock here, especially when lvmetad is not running (default).
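For illustration, here is roughly what that RUN key executes for one change event, together with the guard it is missing; a minimal sketch, assuming the lvmetad socket lives at /var/run/lvm/lvmetad.socket (the 8:80 major:minor pair for /dev/sdf is taken from the reproducer below):

#!/bin/sh
# what the rule effectively runs on a change uevent for an LVM2_member device
# hypothetical guard the rule lacks: do nothing when lvmetad is not running
[ -S /var/run/lvm/lvmetad.socket ] || exit 0
exec /sbin/lvm pvscan --cache --major 8 --minor 80

Even with such a guard, pvscan --cache itself would still have to avoid taking cluster locks, since the deadlock comes from the VG_GLOBAL lock described above.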
Created attachment 573019 [details]
pvmove + clvmd logs
To reproduce, you need two volumes active on the one PV being moved:
[root@bar-01 ~]# vgchange -a y vg_test
2 logical volume(s) in volume group "vg_test" now active
[root@bar-01 ~]# lvs -o +devices vg_test
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert Devices
lv1 vg_test -wi-a--- 4.00m /dev/sdf(0)
lv2 vg_test -wi-a--- 4.00m /dev/sdf(1)
and now run
#pvmove -i1 -vvvv /dev/sdf
and of course pvscan shows up in ps ax:

bar-01 (local)
13116 ?        S<     0:00 /sbin/lvm pvscan --cache --major 8 --minor 64
13127 ?        S<     0:00 /sbin/lvm pvscan --cache --major 8 --minor 80

8:80 is /dev/sdf
8:64 is /dev/sde

bar-02 (remote)
18243 ?        S<     0:00 /sbin/lvm pvscan --cache --major 8 --minor 80

Here is the description of what happens; the new udev rule is just the trigger, the bug was introduced into pvmove by Alasdair's fixes...
- you have a segmented pvmove (2 or more LVs active on the one PV being moved)
- you have clvmd running (no need for a clustered pvmove; an exclusive pvmove on one node is enough)
- pvmove suspends the top-level devices and configures the additional pvmove mirror device,
  then resumes all -> the pvmove starts
- when the first segment (LV) finishes its move, the poll daemon updates the metadata (vg_write)
- vg_write triggers the watch rule
- shortly after, pvmove suspends the temporary mirror (but not the top-level devices)
- vg_write asynchronously triggers multiple pvscan --cache runs on PVs from udev
- the pvscan(s) take the VG_GLOBAL read lock
- clvmd receives the VG_GLOBAL lock and refreshes its cache (the expected effect of VG_GLOBAL)
- inside the cache refresh, clvmd tries to read a top-level device (which is not suspended) but waits,
  because the underlying pvmove layer device IS suspended
- clvmd is now blocked inside the lvm lock, queuing all operations (including the pvmove resume itself)
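A quick way to confirm this state on the hanging node is to inspect the device-mapper suspend flags; a sketch (the exact column layout varies between dmsetup versions):

# long form: each device reports State: ACTIVE or SUSPENDED
dmsetup info | grep -E '^(Name|State)'

# compact form referenced later in this report; the Stat column
# marks suspended devices with an 's'
dmsetup info -c

Here the temporary pvmove mirror shows up as suspended while the top-level LVs remain active.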
The logic in pvmove was changed by this patch:
commit 76564e3168b61b79c7486a034b2b53819ee4b28f
Author: Alasdair Kergon <agk>
Date: Sat Jun 11 00:03:06 2011 +0000
Major pvmove fix to issue ioctls in the correct order when multiple LVs
are affected by the move. (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.)
The current fix was selected so as to minimise the testing surface. I
hope eventually to replace it with a cleaner one that extends the
deptree code.
Some lvconvert scenarios still suffer from related problems.
This patch DOES NOT suspend the top-level devices on the next segment update, only the temporary mirror in between.
So the usual logic (ignoring mirrors and suspended devices) doesn't work here to filter the top-level devices out.

Previously (before the patch), both the top-level devices and the temporary mirror layer were suspended, so the cache scan ignored these devices.

Not suspending the top-level device seems to conflict with lvm's internal principles, so without an explanation from Alasdair we cannot fix this - it could reintroduce some bug which was perhaps fixed by this patch.
I see 3 problems there:

1) pvscan --cache from the udev rule for lvmetad should NOT be getting a clvmd VG_GLOBAL lock: we do not yet support lvmetad in combination with clvmd. If pvscan --cache is run and locking_type is not 0, 1 or 5, the code should exit without getting any locks or updating anything.

2) the 'is this dev suspended?' check needs fixing to also check recursively whether any device this dev is using is suspended, if it's not doing that already

3) the vg_write is not changing anything that udev cares about, so it should be exempt from watch rules if we can find a way to do that

"Not suspending top level device seems to be in conflict with lvm internal principles" - the table of the top-level device is not changing, so there is no point in suspending it. Dev A uses Dev B, which uses Dev C. Only the table for Dev B is changing, so Dev B needs reloading; Dev A is unaffected. This becomes important if there are, e.g., 100 Dev As using Dev B: after the patch, only 1 dev is suspended and resumed; before the patch, 101 devs got suspended and resumed - a considerable amount of extra work for no reason.

I think change 2) is very expensive compared with the occasional need to have 100 devs suspended and resumed - and it would have to be done for every tree operation - while those potentially unreasonable suspends/resumes happen only on very few occasions (and probably with a significantly smaller number of devs). Moreover, it makes it FAR more obvious from a plain check of dmsetup info -c why a device has stopped accepting any bio and is waiting for something (a suspended device below). I'd keep things simpler and would rather stay with starting the suspend from the top-level devices (and there is still my acceleration patch for avoiding the suspend of devices deeper in the tree - which is IMHO a much better optimization to focus on).

3) avoiding the watch rule for vg_write would be hard - you can't control them - a user may add as many watches as they want...

1) pvscan is accessing the same device which is locked by another VG lock while only holding the global lock - IMHO this breaks some internal assumption in the lvm code. It is also quite questionable from a performance point of view that each modified device invokes an immediate parallel scan from the udev rule - i.e. with 1 VG on 300 PVs this is highly inefficient...

(In reply to comment #9)
> 3) the vg_write is not changing anything that udev cares about so it should be
> exempt from watch rules if we can find a way to do that

(In reply to comment #10)
> 3) avoiding watch rule for vg_write would be hard - you can't control them -
> user may add as many watches as they want...

Also, unfortunately, we can't control the watch rule behaviour for *non-dm devices* (no cookie). And this one is exactly the case - the "pvscan --cache" is triggered based on the watch rule set in 60-persistent-storage.rules, where it's set for underlying block devices in general. So the solution can't count on 3).

(In reply to comment #8)
> - when the first segment (LV) finishes move, poll daemon updates metadata
> (vg_write)
> - vg_write triggers watch rule
> - shortly after pvmove suspends temporary mirror (but not top-level devices)
> - vg_write, asynchronously, triggers multiple pvscan --cache on PVs from udev

The watch udev rule has been removed in the lvm2-2.02.95-3.el6 build, so we should not run into this concrete problem anymore. But still, we need to think about how to avoid such situations and to consider all the consequences in general when the global lock is taken and it fires a rescan like we saw here...
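Problem 1) amounts to an early exit before any locking. A sketch of the intended behaviour, emulated in shell around the lvm CLI (the dumpconfig invocation and its output format are assumptions here; the real check belongs inside pvscan itself):

#!/bin/sh
# run as: pvscan-guard MAJOR MINOR
# with clustered or other unsupported locking, exit cleanly without
# taking any locks or updating anything
locking_type=$(lvm dumpconfig global/locking_type | cut -d= -f2)
case "$locking_type" in
    0|1|5) exec /sbin/lvm pvscan --cache --major "$1" --minor "$2" ;;
    *)     exit 0 ;;   # lvmetad + cluster locking: not yet supported
esac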
That bug disappeared with the watch rule removal. lvmetad & cluster is not a supported configuration yet; closing this, fixed in the current release.

Reopening this now that lvmetad is supported in 6.4. What is the solution - to have a document telling everyone that they need to remove the udev rules? Furthermore, as soon as I kill lvmetad and turn it off in lvm.conf, I can start creating cluster volumes again. Then, turn it back on, start it again, and I can no longer create them.

2.6.32-330.el6.x86_64
lvm2-2.02.98-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-libs-2.02.98-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
lvm2-cluster-2.02.98-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-libs-1.02.77-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-1.02.77-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
device-mapper-event-libs-1.02.77-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012
cmirror-2.02.98-2.el6 BUILT: Tue Oct 16 05:15:59 CDT 2012

I did a scratch build with a change that I believe addresses the cause of the problem (the problem might be using the global lock in pvscan --cache while it is called from the udev rules at the same time). Please give it a try and see whether you can still hit the reported issues. Thanks.

https://brewweb.devel.redhat.com/taskinfo?taskID=5014490

The thing with cluster is that lvmetad *should* detect that locking_type=3 is used and, if it is, do everything without lvmetad - a fallback to the classical original way (as we don't support lvmetad + cluster now). However, 69-dm-lvmetad.rules does not know whether the device being processed is clustered or not (and doesn't need to!) - it calls pvscan --cache, and that takes the global lock. Either we must remove the global lock (which is the scratch build attached in comment #22), OR we must check whether locking_type=3 is used and, if so, exit from pvscan cleanly, effectively doing no operation until cluster is supported with pvscan --cache. And I'm afraid we're not doing that right now.

In the end, the bug is still present with the rules in place, though the reproducibility is lower than it used to be. IOW, we need locking_type=3 => use_lvmetad=0, no matter what the current setting of use_lvmetad is! So, it seems, we don't have such a check yet. This must be documented as a "Known Issue" in the 6.4 documentation. Mornfall is preparing a simple patch that will cover this hole (a fallback to use_lvmetad=0 in case cluster locking is used). Patched; I'll add that in the next 6.4 build. See also bug #814779 comment #9.

So here's a 6.4 scratch build with the patch from the previous comment:

https://brewweb.devel.redhat.com/taskinfo?taskID=5015211

There's now a warning message and a fallback to locking type 1 (file-based locking) if lvmetad is used with locking type 3 (clustered), which is not yet supported - bug #814779.
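The manual toggle described in the reopening comment above can be scripted; a sketch, where the lvmetad service name and the exact lvm.conf line format are assumptions about the local setup:

# disable lvmetad: stop the daemon and switch the option off
service lvm2-lvmetad stop
sed -i 's/^\( *use_lvmetad *=\).*/\1 0/' /etc/lvm/lvm.conf
# ...clustered VG operations succeed here...

# re-enable it and the failure returns
sed -i 's/^\( *use_lvmetad *=\).*/\1 1/' /etc/lvm/lvm.conf
service lvm2-lvmetad start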
Is it supposed to turn off use_lvmetad or to move to locking_type 1? So far, any command executed under cluster with lvmetad enabled and running gives out:

(09:16:38) [root@r6-node01:~]$ lvs
WARNING: configuration setting use_lvmetad overriden to 0 due to locking_type 3. Clustered environment not supported by lvmetad yet.
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
LV      VG       Attr      LSize Pool Origin Data%  Move Log Cpy%Sync Convert
lv_root VolGroup -wi-ao--- 7.54g
lv_swap VolGroup -wi-ao--- 1.97g

So it turns off the use_lvmetad setting from lvm.conf and keeps clustered locking. This in its own right is not an issue for me; I am just asking whether this is what was intended, since it differs from what is said in comment #31.

It's supposed to turn off lvmetad.

The pvmove tests pass now with the latest rpms, giving the warning that Nenad mentioned in comment #33:

WARNING: configuration setting use_lvmetad overriden to 0 due to locking_type 3. Clustered environment not supported by lvmetad yet.
WARNING: configuration setting use_lvmetad overriden to 0 due to locking_type 3. Clustered environment not supported by lvmetad yet.

Marking Verified.

2.6.32-338.el6.x86_64
lvm2-2.02.98-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
lvm2-libs-2.02.98-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
lvm2-cluster-2.02.98-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
device-mapper-libs-1.02.77-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
device-mapper-event-1.02.77-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
device-mapper-event-libs-1.02.77-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012
cmirror-2.02.98-3.el6 BUILT: Mon Nov 5 06:45:48 CST 2012

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html
Description of problem:

CLVMD crashed at the end of pvmove while updating the LVM metadata.

Version-Release number of selected component (if applicable):

lvm2-2.02.96-0.75.el6.x86_64
lvm2-libs-2.02.96-0.75.el6.x86_64
lvm2-cluster-2.02.96-0.75.el6.x86_64
cmirror-2.02.96-0.75.el6.x86_64
device-mapper-1.02.75-0.75.el6.x86_64
device-mapper-libs-1.02.75-0.75.el6.x86_64
device-mapper-event-libs-1.02.75-0.75.el6.x86_64
device-mapper-event-1.02.75-0.75.el6.x86_64
kernel-2.6.32-220 and greater (last tried 251)

How reproducible:

Every time

Steps to Reproduce:

Here's what's been done:

[root@node02:~]$ lvcreate -n linear -L 3.5G pv_shuffle_A
Logical volume "linear" created
[root@node02:~]$ lvcreate -i 3 -n stripe -L 3.5G pv_shuffle_A
Using default stripesize 64.00 KiB
Rounding size (896 extents) up to stripe boundary size (897 extents)
Logical volume "stripe" created
[root@node02:~]$ lvcreate -n linear -L 3.5G pv_shuffle_B
Logical volume "linear" created
[root@node02:~]$ lvcreate -i 3 -n stripe -L 3.5G pv_shuffle_B
Using default stripesize 64.00 KiB
Rounding size (896 extents) up to stripe boundary size (897 extents)
Logical volume "stripe" created
[root@node02:~]$ vgextend pv_shuffle_A /dev/sdf1
Volume group "pv_shuffle_A" successfully extended
[root@node02:~]$ vgextend pv_shuffle_B /dev/sdg2
Volume group "pv_shuffle_B" successfully extended
[root@node02:~]$ vgchange -an pv_shuffle_A
0 logical volume(s) in volume group "pv_shuffle_A" now active
[root@node02:~]$ vgchange -an pv_shuffle_B
0 logical volume(s) in volume group "pv_shuffle_B" now active
[root@node02:~]$ pvs
PV         VG           Fmt  Attr PSize PFree
/dev/sda1  pv_shuffle_B lvm2 a--  4.88g 212.00m
/dev/sda2  pv_shuffle_A lvm2 a--  4.88g 4.88g
/dev/sdb1  pv_shuffle_A lvm2 a--  4.88g 3.71g
/dev/sdb2  pv_shuffle_A lvm2 a--  4.88g 212.00m
/dev/sdd1  pv_shuffle_B lvm2 a--  4.88g 3.71g
/dev/sdd2  pv_shuffle_B lvm2 a--  4.88g 3.71g
/dev/sde1  pv_shuffle_A lvm2 a--  4.88g 4.88g
/dev/sde2  pv_shuffle_A lvm2 a--  4.88g 3.71g
/dev/sdf1  pv_shuffle_A lvm2 a--  4.88g 4.88g
/dev/sdf2               lvm2 a--  4.88g 4.88g
/dev/sdg1               lvm2 a--  4.88g 4.88g
/dev/sdg2  pv_shuffle_B lvm2 a--  4.88g 4.88g
/dev/sdi1  pv_shuffle_B lvm2 a--  4.88g 4.88g
/dev/sdi2  pv_shuffle_B lvm2 a--  4.88g 4.88g
/dev/vda2  VolGroup     lvm2 a--  9.51g 0
[root@node02:~]$ pvmove -v /dev/sdb2 /dev/sdf1

Actual results:

[root@node02:~]$ pvmove -v /dev/sdb2 /dev/sdf1
Finding volume group "pv_shuffle_A"
Executing: /sbin/modprobe dm-log-userspace
Archiving volume group "pv_shuffle_A" metadata (seqno 4).
Creating logical volume pvmove0
Moving 896 extents of logical volume pv_shuffle_A/linear
Moving 299 extents of logical volume pv_shuffle_A/stripe
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle_A" (seqno 5).
Checking progress before waiting every 15 seconds
/dev/sdb2: Moved: 0.3%
/dev/sdb2: Moved: 10.3%
/dev/sdb2: Moved: 19.2%
/dev/sdb2: Moved: 21.8%
/dev/sdb2: Moved: 28.9%
/dev/sdb2: Moved: 38.7%
/dev/sdb2: Moved: 44.4%
/dev/sdb2: Moved: 50.9%
/dev/sdb2: Moved: 62.2%
/dev/sdb2: Moved: 71.2%
/dev/sdb2: Moved: 72.6%
/dev/sdb2: Moved: 75.0%
Updating volume group metadata
Error locking on node node03: Command timed out
Error locking on node node02: Command timed out
Error locking on node node01: Command timed out

Mar 14 05:54:39 node02 kernel: INFO: task clvmd:2018 blocked for more than 120 seconds.
Mar 14 05:54:39 node02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 14 05:54:39 node02 kernel: clvmd D 0000000000000000 0 2018 1 0x00000000
Mar 14 05:54:39 node02 kernel: ffff880037439b18 0000000000000082 ffff8800ffffffff 00000402c6b7f594
Mar 14 05:54:39 node02 kernel: ffffffffa0002720 ffff88003748d980 0000000000057db0 ffffffffb09f8e19
Mar 14 05:54:39 node02 kernel: ffff88003d0e1ab8 ffff880037439fd8 000000000000fb48 ffff88003d0e1ab8
Mar 14 05:54:39 node02 kernel: Call Trace:
Mar 14 05:54:39 node02 kernel: [<ffffffffa0002720>] ? dm_unplug_all+0x0/0x70 [dm_mod]
Mar 14 05:54:39 node02 kernel: [<ffffffff8109b5d9>] ? ktime_get_ts+0xa9/0xe0
Mar 14 05:54:39 node02 kernel: [<ffffffff814f81b3>] io_schedule+0x73/0xc0
Mar 14 05:54:39 node02 kernel: [<ffffffff811b473e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
Mar 14 05:54:39 node02 kernel: [<ffffffff8125b7bd>] ? get_disk+0x7d/0xf0
Mar 14 05:54:39 node02 kernel: [<ffffffff811b4c2e>] __blockdev_direct_IO+0x5e/0xd0
Mar 14 05:54:39 node02 kernel: [<ffffffff811b14f0>] ? blkdev_get_blocks+0x0/0xc0
Mar 14 05:54:39 node02 kernel: [<ffffffff8127065a>] ? kobject_get+0x1a/0x30
Mar 14 05:54:39 node02 kernel: [<ffffffff811b2357>] blkdev_direct_IO+0x57/0x60
Mar 14 05:54:39 node02 kernel: [<ffffffff811b14f0>] ? blkdev_get_blocks+0x0/0xc0
Mar 14 05:54:39 node02 kernel: [<ffffffff8111491b>] generic_file_aio_read+0x6bb/0x700
Mar 14 05:54:39 node02 kernel: [<ffffffff81216d11>] ? avc_has_perm+0x71/0x90
Mar 14 05:54:39 node02 kernel: [<ffffffff8121080f>] ? security_inode_permission+0x1f/0x30
Mar 14 05:54:39 node02 kernel: [<ffffffff8117901a>] do_sync_read+0xfa/0x140
Mar 14 05:54:39 node02 kernel: [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
Mar 14 05:54:39 node02 kernel: [<ffffffff811b192c>] ? block_ioctl+0x3c/0x40
Mar 14 05:54:39 node02 kernel: [<ffffffff8118bfc2>] ? vfs_ioctl+0x22/0xa0
Mar 14 05:54:39 node02 kernel: [<ffffffff8121c2fb>] ? selinux_file_permission+0xfb/0x150
Mar 14 05:54:39 node02 kernel: [<ffffffff8120f6a6>] ? security_file_permission+0x16/0x20
Mar 14 05:54:39 node02 kernel: [<ffffffff81179a15>] vfs_read+0xb5/0x1a0
Mar 14 05:54:39 node02 kernel: [<ffffffff81179b51>] sys_read+0x51/0x90
Mar 14 05:54:39 node02 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

clvmd gets stuck and times out.

Expected results:

pvmove should complete successfully

Additional info: