Bug 742607 - Add wipe_table command to dmsetup to release underlying devices held open
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.1
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Alasdair Kergon
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 756082
Reported: 2011-09-30 18:13 UTC by Aleksandr Brezhnev
Modified: 2018-11-29 21:43 UTC (History)

Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Bug Fix
Doc Text:
dmsetup has a new command "wipe_table" to wipe the table of the device. Any subsequent I/O sent to the device returns errors. Any devices used by the table (i.e. devices to which the I/O is forwarded) are closed. This could be useful, for example, if a long-running process keeps a device open after it has finished using it and you need to release the underlying devices before that process exits.
Clone Of:
Environment:
Last Closed: 2012-06-20 15:00:08 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:0962 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2012-06-19 21:12:11 UTC

Description Aleksandr Brezhnev 2011-09-30 18:13:23 UTC
Description of problem:

If a device-mapper device is held open by some process, it cannot be removed with the "dmsetup [--force] remove device_name" command. Even with the "--force" option, the command fails, reporting that the device is busy. This behaviour seems natural, but it limits flexibility in disk handling because the underlying block device cannot be detached from the system.

The real use case where this limitation shows itself is related to Oracle RAC with ASM but without ASMLib.


Version-Release number of selected component (if applicable):
device-mapper-1.02.62-3.el6
kernel-2.6.32-131.12.1.el6

The same issue exists in RHEL5.

How reproducible:

Set up a virtual machine with RHEL5 or RHEL6.
Add 6 local SCSI disks (seen as sdb to sdg).
Prepare the disks as follows:

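for i in sdb sdc sdd sde sdf sdg ; do
        # Label the disk and create one partition spanning it (-s: no prompts)
        parted -s /dev/$i mklabel msdos
        parted -s /dev/$i mkpart primary ext3 64s 100%
        SIZE=$(blockdev --getsz /dev/$i)    # disk size in 512-byte sectors
        ID=$(cat /sys/block/$i/dev)         # major:minor of the disk
        # Map the whole disk 1:1 through a linear target
        echo "0 $SIZE linear $ID 0" | /sbin/dmsetup create ASM-$i
        # Create the partition mapping on top of the dm device
        kpartx -a /dev/mapper/ASM-$i
        chown grid:asmadmin /dev/mapper/ASM-${i}1
done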

Note: In real life, an Oracle ASM instance would use multipath devices, but here we just create simple linear maps on top of the SCSI disks so that we have device-mapper devices to apply dmsetup commands to.

3 disks are used to setup the ASM diskgroup holding the clusterware OCR and voting devices (sdb,sdc,sdd).

The Oracle Grid Infrastructure installer was used to create a single-node cluster with this diskgroup.

Once the installation has completed and the cluster is running, you can see the open devices from the ASM instance and also the file descriptors from the OS:

ASMCMD> lsod
Instance Process                       OSPID Path                 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sdd1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sdd1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sdd1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdd1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sdd1 
1        oracle@nlprcn1256 (TNS V1-V3) 20658 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (TNS V1-V3) 20666 /dev/mapper/ASM-sdb1 
1        oracle@nlprcn1256 (TNS V1-V3) 20658 /dev/mapper/ASM-sdc1 
1        oracle@nlprcn1256 (TNS V1-V3) 20658 /dev/mapper/ASM-sdd1

# lsof /dev/mapper/ASM-sd*1
COMMAND     PID USER   FD   TYPE DEVICE SIZE       NODE NAME
ocssd.bin 18994 grid  256u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
ocssd.bin 18994 grid  257u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
ocssd.bin 18994 grid  258u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
ocssd.bin 18994 grid  259u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
ocssd.bin 18994 grid  260u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
ocssd.bin 18994 grid  261u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
ocssd.bin 18994 grid  262u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
ocssd.bin 18994 grid  263u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
ocssd.bin 18994 grid  264u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20510 grid  256u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle    20510 grid  257u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20510 grid  258u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
oracle    20512 grid  256u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20512 grid  257u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
oracle    20512 grid  258u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle    20518 grid  256u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle    20518 grid  257u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
oracle    20518 grid  258u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20520 grid  256u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20520 grid  257u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
oracle    20520 grid  258u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle    20658 grid  256u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle    20658 grid  257u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle    20658 grid  258u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1
oracle    20666 grid  256u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1

Now migrate the data from these disks to the 3 unused disks to free the original disks from the ASM instance.

Confirm new candidate disks:

ASMCMD> lsdsk --candidate
Path
/dev/mapper/ASM-sde1
/dev/mapper/ASM-sdf1
/dev/mapper/ASM-sdg1

Add the new and drop the old disks in one ASM rebalance operation:

$ sqlplus / as sysasm
SQL> alter diskgroup DG_SYS_NLSMCL1602 add disk '/dev/mapper/ASM-sde1','/dev/mapper/ASM-sdf1','/dev/mapper/ASM-sdg1' rebalance power 0 ;
Diskgroup altered.

SQL> alter diskgroup DG_SYS_NLSMCL1602 drop disk 'DG_SYS_NLSMCL1602_0001','DG_SYS_NLSMCL1602_0002','DG_SYS_NLSMCL1602_0003' rebalance power 11 ;
Diskgroup altered.

Wait for the rebalance operation to complete (lsop shows no pending operations):

ASMCMD> lsop
Group_Name  Dsk_Num  State  Power  
ASMCMD>

We can now confirm that the ASM instance no longer uses the old disks.

ASMCMD> lsod
Instance Process                       OSPID Path                 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sdf1 
1        oracle@nlprcn1256 (DBW0)      20510 /dev/mapper/ASM-sdg1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sdf1 
1        oracle@nlprcn1256 (GMON)      20520 /dev/mapper/ASM-sdg1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sdf1 
1        oracle@nlprcn1256 (LGWR)      20512 /dev/mapper/ASM-sdg1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (RBAL)      20518 /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (TNS V1-V3) 8647  /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (TNS V1-V3) 8647  /dev/mapper/ASM-sdf1 
1        oracle@nlprcn1256 (TNS V1-V3) 8647  /dev/mapper/ASM-sdg1 
1        oracle@nlprcn1256 (X000)      8362  /dev/mapper/ASM-sde1 
1        oracle@nlprcn1256 (X000)      8362  /dev/mapper/ASM-sdf1 
1        oracle@nlprcn1256 (X000)      8362  /dev/mapper/ASM-sdg1 

ASMCMD> lsdsk --candidate
Path
/dev/mapper/ASM-sdb1
/dev/mapper/ASM-sdc1
/dev/mapper/ASM-sdd1
ASMCMD>

According to ASMCMD, the disks ASM-sdb, ASM-sdc and ASM-sdd are no longer used by the ASM instance, but at the OS level they are still open:

# lsof /dev/mapper/ASM-sd[b,c,d]1
COMMAND   PID USER   FD   TYPE DEVICE SIZE       NODE NAME
oracle  20658 grid  256u   BLK 253,10      1002747817 /dev/mapper/ASM-sdd1
oracle  20658 grid  257u   BLK  253,6      1002747487 /dev/mapper/ASM-sdb1
oracle  20658 grid  258u   BLK  253,8      1002747717 /dev/mapper/ASM-sdc1

The PID 20658 is the Oracle grid process oracle+ASM1_ocr.
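
To see at a glance which processes still hold the old devices, the lsof output above can be condensed to one line per device and PID. This is only a minimal sketch: it assumes the column layout shown above (PID in the second column, device path in the last), and the function name holders_by_device is made up for this example.

```shell
# Condense `lsof` output (read from stdin) to "device pid fd_count" lines.
# Assumes the layout shown above: PID in column 2, path in the last column.
holders_by_device() {
    awk '$1 != "COMMAND" && NF > 1 { n[$NF " " $2]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# Example (run as root to see descriptors of other users):
#   lsof /dev/mapper/ASM-sd[bcd]1 | holders_by_device
```

Each output line names a device, the PID holding it, and how many descriptors that PID has open on it.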

With these open file descriptors remaining, the device-mapper device cannot be dropped:

# dmsetup info ASM-sdb1
Name:              ASM-sdb1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1

# kpartx -d /dev/mapper/ASM-sdb
device-mapper: remove ioctl failed: Device or resource busy
# dmsetup -f remove /dev/mapper/ASM-sdb
device-mapper: remove ioctl failed: Device or resource busy 
Command failed
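
The failure can be predicted from the "Open count" field of "dmsetup info" before attempting the removal. A small sketch, assuming the plain (non-columnar) output format shown above; the helper name open_count is hypothetical.

```shell
# Extract the "Open count" value from `dmsetup info` output on stdin.
# Assumes the "Field:   value" layout shown above.
open_count() {
    awk -F': *' '$1 == "Open count" { print $2 + 0 }'
}

# Example:
#   if [ "$(dmsetup info ASM-sdb1 | open_count)" -gt 0 ]; then
#       echo "ASM-sdb1 is still open; remove will report the device as busy"
#   fi
```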

If the disk group contained RDBMS files, you would see the same issue, but with other types of processes: front-end processes (one per client connection) and back-end processes (dbwr, lgwr, etc.).

Actual results:
# dmsetup -f remove /dev/mapper/ASM-sdb
device-mapper: remove ioctl failed: Device or resource busy 
Command failed

Expected results:
The kernel and device-mapper utilities should allow the removal of open devices to be forced. The mapping table should be replaced with a special map that fails all I/O requests. The existing device-mapper device and the underlying block devices should be released. The corresponding device-mapper device should be removed from the system so that it can be reused later without a reboot.

Additional info:

Comment 2 Alasdair Kergon 2011-09-30 22:45:22 UTC
Currently, we cannot remove a device while it is still open.
We cannot remove a device while there is 'stuck' I/O in layers below us.
These kernel restrictions are unlikely to be removed.  (Would be major upstream work - not really a dm matter.)

The dmsetup man page says for --force:

  if a device can't be removed because
  an uninterruptible process is waiting for I/O to return from it,
  adding  --force  will  replace the table with one that fails all
  I/O, which might allow the process to be killed.


Now if the device is still open, but has no I/O and is effectively 'dead', the best you can do is rename it out of the way.  However, if you then create a 'new' version of the dm device you will need to give it a different dm uuid as uuids cannot be changed and must be unique.

(Reminds me of an old 'chfd' program I used to use, which could have solved this problem - it let you exchange arbitrary fds between processes!)

Comment 6 Alasdair Kergon 2012-01-18 00:10:04 UTC
If there is no i/o in flight, a 'dmsetup load' followed by 'dmsetup resume' can be used to replace the table that points to the disk with an 'error' table that would return -EIO for any further I/O sent to the device.  This would release the underlying devices, but the dm device itself remains for as long as it remains open.

If that solution fits the problem, then it would be easy to provide a new dmsetup command to perform it, saving having to work out (or script) the exact parameters for 'dmsetup load' every time.
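
For illustration, the manual 'dmsetup load' plus 'dmsetup resume' sequence described above might be scripted roughly as follows. This is a sketch under assumptions, not the code that was later committed: it takes the device size from blockdev --getsz on the /dev/mapper node, and the function names error_table and replace_with_error_table are invented here.

```shell
# Build a one-line table that sends every sector of the device to the
# "error" target. $1 is the device size in 512-byte sectors.
error_table() {
    printf '0 %s error\n' "$1"
}

# Replace the live table of dm device $1 with an error table.
# Must run as root; assumes no I/O is in flight, as described above.
replace_with_error_table() {
    name=$1
    size=$(blockdev --getsz "/dev/mapper/$name") || return 1
    dmsetup load "$name" --table "$(error_table "$size")" || return 1
    dmsetup resume "$name"
}

# Example:
#   replace_with_error_table ASM-sdb
```

After the resume, any further I/O to the device should fail, while the dm device itself stays in place for as long as something holds it open.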

Comment 7 Alasdair Kergon 2012-01-19 00:42:08 UTC
That command would be 'dmsetup wipe_table' which I've committed upstream.

http://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=5c9bebee8a0410baebce675c928673e3fb565488

Is it sufficient for your needs?

Comment 9 Peter Rajnoha 2012-02-10 15:24:30 UTC
OK, marking this one as resolved. You can still use "dmsetup remove -f", but this will always return an error if the device is still open.

The new "dmsetup wipe_table" just replaces the table with an error target without trying to remove the device itself, so it ends with success.

("remove -f" is actually "wipe_table" + "remove")
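
Spelled out as two separate steps, that relationship could be sketched like this. The wrapper name wipe_then_remove and its messages are made up for illustration; only the two dmsetup commands come from this report.

```shell
# Wipe the table first, then try to remove the device.
# Reports which of the two outcomes described above occurred.
wipe_then_remove() {
    name=$1
    dmsetup wipe_table "$name" || return 1
    if dmsetup remove "$name"; then
        echo "$name removed"
    else
        echo "$name still open; table wiped, device remains"
    fi
}

# Example (as root):
#   wipe_then_remove ASM-sdb
```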

Comment 12 Alasdair Kergon 2012-04-25 23:58:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
dmsetup has a new command "wipe_table" to wipe the table of the device.  Any subsequent I/O sent to the device returns errors.  Any devices used by the table (i.e. devices to which the I/O is forwarded) are closed.

This could be useful, for example, if a long-running process keeps a device open after it has finished using it and you need to release the underlying devices before that process exits.

Comment 13 Corey Marthaler 2012-05-10 21:55:46 UTC
Fix verified in the latest rpms.
2.6.32-269.el6.x86_64
lvm2-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-libs-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-cluster-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
udev-147-2.41.el6    BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-libs-1.02.74-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012
cmirror-2.02.95-8.el6    BUILT: Wed May  9 03:33:32 CDT 2012


[root@hayes-01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/hayes-lv   97M  1.6M   91M   2% /mnt/foo

[root@hayes-01 ~]# dmsetup ls
hayes-lv        (253:3)

[root@hayes-01 ~]# dmsetup remove hayes-lv
device-mapper: remove ioctl on hayes-lv failed: Device or resource busy
Command failed

[root@hayes-01 ~]# dmsetup remove -f hayes-lv
device-mapper: remove ioctl on hayes-lv failed: Device or resource busy
Command failed

[root@hayes-01 ~]# dmsetup wipe_table hayes-lv

[root@hayes-01 ~]# dmsetup status
hayes-lv: 0 204800 error
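
The same check can be automated by scanning the status output for targets other than "error". A minimal sketch, assuming the name-prefixed "name: start length target" layout that "dmsetup status" prints above when listing all devices; the function name all_targets_error is hypothetical.

```shell
# Succeed only if stdin contains at least one status line and every
# line's target type (4th field) is "error".
all_targets_error() {
    awk '{ seen = 1; if ($4 != "error") bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# Example:
#   dmsetup status | grep '^hayes-lv:' | all_targets_error && echo wiped
```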

Comment 15 errata-xmlrpc 2012-06-20 15:00:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html

