Bug 736509 - pvmove now fails to revert changes when left over pvmove target remains
Summary: pvmove now fails to revert changes when left over pvmove target remains
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 756082 765981
Reported: 2011-09-07 21:18 UTC by Corey Marthaler
Modified: 2012-03-27 17:16 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 765981
Environment:
Last Closed: 2012-03-27 17:16:33 UTC



Description Corey Marthaler 2011-09-07 21:18:21 UTC
Description of problem:
This is a regression of the test case for bug 501473.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
grant-01: lvcreate -n suspended -L 50M mirror_sanity
grant-01: dmsetup create mirror_sanity-pvmove0 --notable
Attempting pvmove of /dev/sdc3 on grant-01
grant-01: pvmove /dev/sdc3
  Error locking on node grant-01: device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended
Verifying the linear's dm state
grant-01: dmsetup info mirror_sanity-suspended | grep ACTIVE
grant-01: dmsetup info mirror_sanity-suspended | grep SUSPENDED
grant-01: dmsetup remove mirror_sanity-pvmove0
Deactivating mirror suspended...
[DEADLOCK]

qarshd[18179]: Running cmdline: lvchange -an /dev/mirror_sanity/suspended
udevd[540]: worker [18156] unexpectedly returned with status 0x0100
udevd[540]: worker [18156] failed while handling '/devices/virtual/block/dm-3'
kernel: INFO: task lvchange:18180 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: lvchange      D 0000000000000002     0 18180  18179 0x00000080
kernel: ffff88021c4f3b18 0000000000000086 ffff88021c4f3ad8 ffffffffa00041cc
kernel: ffff88021c4f3ae8 00000000996a433b ffff88021c4f3b08 ffff88011a3ea240
kernel: ffff88021cafa5f8 ffff88021c4f3fd8 000000000000f508 ffff88021cafa5f8
kernel: Call Trace:
kernel: [<ffffffffa00041cc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
kernel: [<ffffffff8109b769>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff814ec413>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b15be>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
kernel: [<ffffffff811b1aae>] __blockdev_direct_IO+0x5e/0xd0
kernel: [<ffffffff811ae3b0>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811af217>] blkdev_direct_IO+0x57/0x60
kernel: [<ffffffff811ae3b0>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811126bb>] generic_file_aio_read+0x6bb/0x700
kernel: [<ffffffff81213181>] ? avc_has_perm+0x71/0x90
kernel: [<ffffffff8120cc7f>] ? security_inode_permission+0x1f/0x30
kernel: [<ffffffff81175f3a>] do_sync_read+0xfa/0x140
kernel: [<ffffffff81090b70>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff811ae7ec>] ? block_ioctl+0x3c/0x40
kernel: [<ffffffff81188ed2>] ? vfs_ioctl+0x22/0xa0
kernel: [<ffffffff8121877b>] ? selinux_file_permission+0xfb/0x150
kernel: [<ffffffff8120bb16>] ? security_file_permission+0x16/0x20
kernel: [<ffffffff81176935>] vfs_read+0xb5/0x1a0
kernel: [<ffffffff810d4602>] ? audit_syscall_entry+0x272/0x2a0
kernel: [<ffffffff81176a71>] sys_read+0x51/0x90
kernel: [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b


The problem is that instead of failing "gracefully" like it used to:

  device-mapper: create ioctl failed: Device or resource busy
  Temporary pvmove mirror activation failed.

It now attempts the pvmove and leaves pvmove targets behind on all nodes, which causes any other LVM commands to deadlock. The new output is:

  Error locking on node grant-01: device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended


Version-Release number of selected component (if applicable):
2.6.32-192.el6.x86_64

lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011


How reproducible:
Every time
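
The scenario above can be sketched as a reusable shell function. This is a hypothetical reproducer, not the actual QA harness: the function names `run` and `pvmove_suspend_verification` and the `DRY_RUN` switch are illustrative assumptions. It assumes it runs as root, that the VG already exists, and that the given PV holds the new LV's extents. It treats the pvmove abort as the expected outcome and checks the state actually left behind, stopping before `lvchange` if the linear LV is still suspended (that is where the hung task above blocks).

```sh
#!/bin/sh
# Hypothetical reproducer for the pvmove_suspend_verification scenario.
# Assumes: run as root, VG already exists, PV holds the LV's extents.
# DRY_RUN=1 prints each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pvmove_suspend_verification() {
    vg=$1
    pv=$2

    run lvcreate -n suspended -L 50M "$vg" || return 1
    # Fake left-over pvmove target: a dm device named <vg>-pvmove0 with no table.
    run dmsetup create "$vg-pvmove0" --notable --addnodeoncreate || return 1

    # pvmove is expected to abort because <vg>-pvmove0 already exists.
    run pvmove "$pv" || echo "pvmove aborted (expected)"

    # The real check: the linear LV must not be left suspended.
    # Bail out before lvchange, which blocks on a suspended device.
    if [ "${DRY_RUN:-0}" != 1 ] &&
       dmsetup info "$vg-suspended" | grep -q SUSPENDED; then
        echo "FAIL: $vg/suspended left suspended" >&2
        return 1
    fi

    run dmsetup remove "$vg-pvmove0"
    run lvchange -an "/dev/$vg/suspended"
}
```

Running `pvmove_suspend_verification mirror_sanity /dev/sdc3` first with `DRY_RUN=1` shows the exact command sequence without touching any devices.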

Comment 1 Corey Marthaler 2011-09-08 14:27:31 UTC
Here's the same test case run in single-machine mode on the current 6.2 RPMs versus the stable 6.1 RPMs.


### SINGLE NODE (Current 6.2 RPMS)

[root@grant-03 ~]# lvs -a -o +devices
  LV        VG            Attr   LSize  Devices         
  suspended mirror_sanity -wi-a- 52.00m /dev/sdc3(0)    
[root@grant-03 ~]# dmsetup create mirror_sanity-pvmove0 --notable
[root@grant-03 ~]# pvmove /dev/sdc3
  device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended

2.6.32-192.el6.x86_64
lvm2-2.02.87-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6    BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6    BUILT: Fri Aug 12 06:11:57 CDT 2011


### SINGLE NODE (6.1 RPMS)

[root@grant-03 ~]# lvs -a -o +devices
  LV        VG            Attr   LSize  Devices         
  suspended mirror_sanity -wi-a- 52.00m /dev/sdb1(0)    
[root@grant-03 ~]# dmsetup create mirror_sanity-pvmove0 --notable
[root@grant-03 ~]# dmsetup ls
mirror_sanity-suspended (253, 2)
mirror_sanity-pvmove0   (253, 4)
[root@grant-03 ~]# pvmove /dev/sdb1
  device-mapper: create ioctl failed: Device or resource busy
  Temporary pvmove mirror activation failed.

2.6.32-131.0.15.el6.x86_64
lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011

Comment 3 Milan Broz 2011-09-15 12:09:29 UTC
Corey, can you try to reproduce it with the latest scratch build (lvm2-2.02.87-2.1.el6.x86_64), which contains retry on remove?

Comment 4 Corey Marthaler 2011-09-15 14:47:08 UTC
This still fails with the latest scratch build.

[root@taft-01 ~]# vgcreate mirror_sanity /dev/sd[bcdefgh]1
  Volume group "mirror_sanity" successfully created

[root@taft-01 ~]# lvcreate -n suspended -L 50M mirror_sanity
  Rounding up size to full physical extent 52.00 MiB

[root@taft-01 ~]# lvs -a -o +devices
  LV        VG            Attr   LSize  Devices         
  suspended mirror_sanity -wi-a- 52.00m /dev/sdb1(0)    

[root@taft-01 ~]# dmsetup create mirror_sanity-pvmove0 --notable

[root@taft-01 ~]# dmsetup ls
mirror_sanity-suspended (253, 4)
mirror_sanity-pvmove0   (253, 3)

[root@taft-01 ~]# pvmove /dev/sdb1
  device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended


2.6.32-195.el6.x86_64

lvm2-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-libs-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-cluster-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
cmirror-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011

Comment 5 Milan Broz 2011-09-16 15:00:12 UTC
Well, there is a slight incompatibility in dmsetup; you should fix the test to use
dmsetup create mirror_sanity-pvmove0 --notable --addnodeoncreate

(but that's not the real problem)

Comment 7 Corey Marthaler 2011-10-07 16:15:31 UTC
I've updated the test to create with the '--addnodeoncreate' flag.

grant-01: dmsetup create mirror_sanity-pvmove0 --notable --addnodeoncreate
Attempting pvmove of /dev/sdc6 on grant-01
  device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended

Comment 8 Corey Marthaler 2011-10-17 16:22:45 UTC
Still fails with the latest RPMs as well.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
grant-02: lvcreate -n suspended -L 50M mirror_sanity
grant-02: dmsetup create mirror_sanity-pvmove0 --notable --addnodeoncreate
Attempting pvmove of /dev/sdb1 on grant-02
Failed messages found, possible regression of 736509
  Error locking on node grant-02: device-mapper: create ioctl failed: Device or resource busy
  Failed to suspend suspended
Verifying the linear's dm state
grant-02: dmsetup remove mirror_sanity-pvmove0


qarshd[11716]: Running cmdline: lvchange -an /dev/mirror_sanity/suspended
kernel: INFO: task lvchange:11717 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: lvchange      D 0000000000000001     0 11717  11716 0x00000080
kernel: ffff880102061b18 0000000000000082 ffff880102061ad8 ffffffffa00041cc
kernel: ffff880102061ae8 00000000f187fa8f ffff880102061b08 ffff88021d169c80
kernel: ffff88011ad07af8 ffff880102061fd8 000000000000f508 ffff88011ad07af8
kernel: Call Trace:
kernel: [<ffffffffa00041cc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
kernel: [<ffffffff8109b779>] ? ktime_get_ts+0xa9/0xe0
kernel: [<ffffffff814ecc33>] io_schedule+0x73/0xc0
kernel: [<ffffffff811b17de>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
kernel: [<ffffffff811b1cce>] __blockdev_direct_IO+0x5e/0xd0
kernel: [<ffffffff811ae5d0>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811af437>] blkdev_direct_IO+0x57/0x60
kernel: [<ffffffff811ae5d0>] ? blkdev_get_blocks+0x0/0xc0
kernel: [<ffffffff811127db>] generic_file_aio_read+0x6bb/0x700
kernel: [<ffffffff81213741>] ? avc_has_perm+0x71/0x90
kernel: [<ffffffff8120d23f>] ? security_inode_permission+0x1f/0x30
kernel: [<ffffffff811761ca>] do_sync_read+0xfa/0x140
kernel: [<ffffffff81090b60>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff811aea0c>] ? block_ioctl+0x3c/0x40
kernel: [<ffffffff811890f2>] ? vfs_ioctl+0x22/0xa0
kernel: [<ffffffff81218d3b>] ? selinux_file_permission+0xfb/0x150
kernel: [<ffffffff8120c0d6>] ? security_file_permission+0x16/0x20
kernel: [<ffffffff81176bc5>] vfs_read+0xb5/0x1a0
kernel: [<ffffffff810d4612>] ? audit_syscall_entry+0x272/0x2a0
kernel: [<ffffffff81176d01>] sys_read+0x51/0x90
kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b


2.6.32-207.el6.x86_64

lvm2-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
lvm2-libs-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
lvm2-cluster-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-libs-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-event-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
device-mapper-event-libs-1.02.66-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011
cmirror-2.02.87-5.el6    BUILT: Wed Oct 12 10:47:46 CDT 2011

Comment 9 Corey Marthaler 2011-10-21 21:39:20 UTC
Still fails with the latest build as well:

lvm2-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-cluster-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011

Comment 12 Alasdair Kergon 2012-03-13 20:16:28 UTC
So, the current code in a non-clustered setup gives me:

  device-mapper: create ioctl on vg2-pvmove0 failed: Device or resource busy
  Failed to suspend lvol0
  ABORTING: Volume group metadata update failed. (first_time: 1)

Checking the code, this is correct behaviour, even though the last message is bogus (the metadata update did not fail).

Comment 13 Alasdair Kergon 2012-03-13 20:23:41 UTC
Error message fixed upstream:

  device-mapper: create ioctl on vg2-pvmove0 failed: Device or resource busy
  Failed to suspend lvol0
  ABORTING: Temporary pvmove mirror activation failed.

http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/tools/pvmove.c.diff?r1=1.93&r2=1.94&cvsroot=lvm2

Comment 14 Alasdair Kergon 2012-03-13 20:31:47 UTC
So it's worth re-testing this with the 6.3 RPMs in a proper cluster now: apart from the cosmetic error message problem, I can't get it to leave things in a mess here.

Comment 15 Corey Marthaler 2012-03-14 22:28:37 UTC
This test case still fails with the latest rpms/kernel.

SCENARIO - [pvmove_suspend_verification]
Create a linear and a fake left over pvmove target and verify
that doesn't cause a pvmove attempt to leave the linear suspended
grant-02: lvcreate -n suspended -L 50M mirror_sanity
grant-02: dmsetup create mirror_sanity-pvmove0 --notable --addnodeoncreate

[root@grant-02 ~]# lvs -a -o +devices
  LV        VG            Attr     LSize  Devices         
  suspended mirror_sanity -wi-a--- 52.00m /dev/sdc6(0)    

[root@grant-02 ~]# dmsetup ls
mirror_sanity-suspended (253:2)
mirror_sanity-pvmove0   (253:4)

[root@grant-02 ~]# pvscan
  PV /dev/sdc6   VG mirror_sanity   lvm2 [54.49 GiB / 54.44 GiB free]
  PV /dev/sdc5   VG mirror_sanity   lvm2 [54.48 GiB / 54.48 GiB free]
  PV /dev/sdc3   VG mirror_sanity   lvm2 [54.49 GiB / 54.49 GiB free]
  PV /dev/sdc2   VG mirror_sanity   lvm2 [54.48 GiB / 54.48 GiB free]
  PV /dev/sdc1   VG mirror_sanity   lvm2 [54.49 GiB / 54.49 GiB free]
  PV /dev/sdb6   VG mirror_sanity   lvm2 [40.87 GiB / 40.87 GiB free]
  PV /dev/sdb5   VG mirror_sanity   lvm2 [40.86 GiB / 40.86 GiB free]
  PV /dev/sdb3   VG mirror_sanity   lvm2 [40.87 GiB / 40.87 GiB free]
  PV /dev/sdb2   VG mirror_sanity   lvm2 [40.87 GiB / 40.87 GiB free]
  PV /dev/sdb1   VG mirror_sanity   lvm2 [40.86 GiB / 40.86 GiB free]
  PV /dev/sda2   VG vg_grant02      lvm2 [74.01 GiB / 0    free]
  Total: 11 [550.79 GiB] / in use: 11 [550.79 GiB] / in no VG: 0 [0   ]


Attempting pvmove of /dev/sdc6 on grant-02

[root@grant-02 ~]# pvmove /dev/sdc6
  Error locking on node grant-02: device-mapper: create ioctl on mirror_sanity-pvmove0 failed: Device or resource busy
  Failed to suspend suspended
  ABORTING: Volume group metadata update failed. (first_time: 1)


2.6.32-251.el6.x86_64

lvm2-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
lvm2-libs-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
lvm2-cluster-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-libs-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-event-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
device-mapper-event-libs-1.02.74-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012
cmirror-2.02.95-1.el6    BUILT: Tue Mar  6 10:00:33 CST 2012

Comment 16 Zdenek Kabelac 2012-03-23 14:33:15 UTC
I think the current lvm2 package works properly for the original report.

No device should be left in the 'suspend' state, as was reported in the description of this bugzilla.

The tool now properly 'aborts', since it finds a conflicting device whose name ends in -pvmove0, and pvmove currently doesn't try to use any other name (we may think about smarter behavior in the future).
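
A test could detect this conflict before running pvmove instead of relying on the abort. The sketch below is a hypothetical pre-flight check, not part of lvm2: `leftover_pvmove_targets` and `pvmove0_taken` are illustrative names. It reads `dmsetup ls` output on stdin and assumes the VG name contains no hyphens (LVM doubles hyphens when it builds dm device names).

```sh
# Hypothetical pre-flight check for a left-over pvmove target.
# Prints every dm device name of the form <vg>-pvmoveN read from
# `dmsetup ls` output on stdin.
leftover_pvmove_targets() {
    awk '$1 ~ /-pvmove[0-9]+$/ { print $1 }'
}

# Succeeds if <vg>-pvmove0 already exists, i.e. the name pvmove would
# use is taken and pvmove is expected to abort.
# Assumes the VG name has no hyphens (dm names double them).
pvmove0_taken() {
    leftover_pvmove_targets | grep -qxF -- "$1-pvmove0"
}
```

Usage: `dmsetup ls | pvmove0_taken mirror_sanity && echo "expect pvmove to abort"`.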

So I think the test needs to be fixed.

Abort is to be expected, but no suspended devices should be left.

So, are there any devices left suspended on any of the cluster nodes?
(Since in our local test environment we do not get them)

If not, I think this bug could be closed and possibly replaced with a new bz requesting smarter behavior. (Currently lvm does not expect the user to modify dm tables and take away a device that lvm tries to use.)
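
The question of whether any device is left suspended can be answered mechanically on each node. The function below is a hypothetical helper (the name `list_suspended` is illustrative); it parses the text output of `dmsetup info`, which with no device argument prints a block per device including `Name:` and `State:` lines, and prints the names whose state is `SUSPENDED`. This is the same signal the original test grepped for.

```sh
# Hypothetical helper: reads `dmsetup info` output on stdin and prints
# the name of every device whose State line reports SUSPENDED.
list_suspended() {
    awk '
        /^Name:/ { name = $2 }
        /^State:/ && $2 == "SUSPENDED" { print name }
    '
}
```

On a cluster, run `dmsetup info | list_suspended` on every node (for example over ssh); any output means a device was left suspended.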

