Bug 1520972

Summary: vdoformat does not clean up metadata properly after fail
Product: Red Hat Enterprise Linux 7 Reporter: Jakub Krysl <jkrysl>
Component: vdo Assignee: corwin <corwin>
Status: CLOSED ERRATA QA Contact: Jakub Krysl <jkrysl>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.5 CC: awalsh, bjohnsto, corwin, jkrysl, limershe, rskvaril, salmy, sweettea
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 6.1.0.123 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 15:48:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jakub Krysl 2017-12-05 14:45:19 UTC
Description of problem:
When creating a VDO volume, vdoformat is called to set up the volume. I found a situation where it fails and leaves new metadata on the device, which prevents a new VDO volume from being created there. The approach taken in BZ 1512127 does not prevent this behaviour, as vdo remove does not know it should clean this device.
There might be more ways to hit this, but the one I found is using --vdoSlabSize 128M on a 5.5T disk to hit the slab count limit (8192):


# hexdump -C /dev/sdc -n 8192
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000

# vdo create --name vdo --device /dev/sdc --verbose --vdoSlabSize 128M
Creating VDO vdo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 --slab-bits=15 /dev/sdc
vdo: ERROR - vdoformat: formatVDO failed on '/dev/sdc': VDO Status: Exceeds maximum number of slabs supported
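
The arithmetic behind this error can be sketched roughly as follows. This is only an estimate: it assumes the 11721045168-sector length in the dmsetup table later in this report reflects the size of /dev/sdc, that --slab-bits counts 4096-byte blocks, and it ignores any space VDO reserves for its own metadata. The function names are hypothetical.

```python
SECTOR_BYTES = 512   # dmsetup tables count 512-byte sectors
BLOCK_BYTES = 4096   # block size shown in the dmsetup table
MAX_SLABS = 8192     # slab count limit quoted in this report


def slab_count(device_sectors: int, slab_bits: int) -> int:
    """Upper bound on whole slabs that fit, ignoring metadata overhead."""
    device_blocks = device_sectors * SECTOR_BYTES // BLOCK_BYTES
    return device_blocks >> slab_bits


def min_slab_bits(device_sectors: int, max_slabs: int = MAX_SLABS) -> int:
    """Smallest slab_bits whose estimated slab count fits within max_slabs."""
    bits = 0
    while slab_count(device_sectors, bits) > max_slabs:
        bits += 1
    return bits


if __name__ == "__main__":
    sectors = 11721045168  # assumed size of /dev/sdc (from the dmsetup table)
    print("slabs at --slab-bits=15:", slab_count(sectors, 15))
    print("smallest slab bits that fit:", min_slab_bits(sectors))
```

Under these assumptions, 128M slabs (2^15 blocks) give about 44712 slabs, well over the 8192 limit, and the estimate only drops under the limit at 18 slab bits (1 GiB slabs).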

# hexdump -C /dev/sdc -n 8192
00000000  64 6d 76 64 6f 30 30 31  05 00 00 00 04 00 00 00  |dmvdo001........|
00000010  00 00 00 00 5d 00 00 00  00 00 00 00 09 01 02 00  |....]...........|
00000020  59 a2 aa fe 9c 5f 05 00  00 b4 8f db 69 0d 46 a1  |Y...._......i.F.|
00000030  be a2 a6 6c 66 c8 ee 8f  00 00 00 00 01 00 00 00  |...lf...........|
00000040  00 00 00 00 01 00 00 00  d8 5c 0a 00 00 00 00 00  |.........\......|
00000050  00 ff ff ff 00 00 00 00  00 fe 96 a1 43 00 00 00  |............C...|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000

# vdo create --name vdo --device /dev/sdc --verbose
Creating VDO vdo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/sdc
vdo: ERROR - vdoformat: Cannot format device already containing a valid VDO!
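
The second create is refused presumably because the device now starts with the bytes dmvdo001, visible at offset 0 in the hexdump above. A minimal sketch of that check, using only the signature seen in the hexdump (the helper name is hypothetical and this is not vdoformat's actual detection logic):

```python
VDO_MAGIC = b"dmvdo001"  # bytes at offset 0 in the hexdump after the failed format


def has_vdo_magic(path: str) -> bool:
    """Return True if the file or block device starts with the VDO signature."""
    with open(path, "rb") as dev:
        return dev.read(len(VDO_MAGIC)) == VDO_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_magic.py /dev/sdc  (reading a device needs root)
    print(has_vdo_magic(sys.argv[1]))
```

The check only reads the first 8 bytes, so it is safe to run against a device that is in use.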


vdo remove cannot fix this:
# vdo remove --all
# vdo remove --all --force
# vdo remove --device /dev/sdc --all
# vdo remove --device /dev/sdc --all --force
# vdo create --name vdo --device /dev/sdc --verbose
Creating VDO vdo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/sdc
vdo: ERROR - vdoformat: Cannot format device already containing a valid VDO!


vdo create --force can:
# vdo create --name vdo --device /dev/sdc --verbose --force
Creating VDO vdo
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 --force /dev/sdc
    vdodumpconfig /dev/sdc
Starting VDO vdo
    dmsetup status vdo
    modprobe kvdo
    vdodumpconfig /dev/sdc
    dmsetup create vdo --uuid VDO-708f6f14-3b68-4e07-9131-c1549c450661 --table '0 11721045168 dedupe /dev/sdc 4096 disabled 0 32768 16380 on sync vdo ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1'
    dmsetup status vdo
Starting compression on VDO vdo
    dmsetup message vdo 0 compression on
    dmsetup status vdo
VDO instance 25 volume is ready at /dev/mapper/vdo


But the user gets no hint to use vdo create --force, nor do I believe it is the best approach, as this situation can be avoided by cleaning up when vdoformat fails (checking the superblock for metadata and erasing it).
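
The cleanup proposed here could look roughly like the sketch below. It assumes the stale metadata is confined to the first 4096-byte block and is identified by the dmvdo001 signature from the hexdump; the helper name is hypothetical. It would also wipe a valid VDO volume, so it only makes sense right after a failed format on a device that was verified empty beforehand.

```python
import os

VDO_MAGIC = b"dmvdo001"  # signature seen at offset 0 in the hexdump
BLOCK_BYTES = 4096       # assumption: stale metadata lives in the first block


def wipe_stale_vdo_header(path: str, block_bytes: int = BLOCK_BYTES) -> bool:
    """Zero the first block if it carries the VDO signature.

    Returns True if the block was wiped, False if no signature was found.
    """
    with open(path, "r+b") as dev:
        if dev.read(len(VDO_MAGIC)) != VDO_MAGIC:
            return False
        dev.seek(0)
        dev.write(bytes(block_bytes))  # overwrite with zeros
        dev.flush()
        os.fsync(dev.fileno())         # push the zeros to the device
        return True
```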

Version-Release number of selected component (if applicable):
vdo-6.1.0.72-12

How reproducible:
100%

Steps to Reproduce:
1. Run vdo create --name vdo --device /dev/sdc --verbose --vdoSlabSize 128M on a disk large enough to exceed 8192 slabs.

Actual results:
some metadata created on the device

Expected results:
no metadata created on the device

Additional info:

Comment 2 bjohnsto 2017-12-05 18:34:16 UTC
I think this is a vdo manager issue.

Comment 3 corwin 2017-12-06 18:02:05 UTC
This can be fixed in vdoformat.

Comment 5 Jakub Krysl 2017-12-14 11:13:53 UTC
Tested with vdo-6.1.0.98-13:

# vdo create --name vdo --device /dev/sdc --verbose --vdoSlabSize 128M
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 --slab-bits=15 /dev/sdc
vdo: ERROR - vdoformat: formatVDO failed on '/dev/sdc': VDO Status: Exceeds maximum number of slabs supported

# hexdump -C /dev/sdc -n 8192
00000000  64 6d 76 64 6f 30 30 31  05 00 00 00 04 00 00 00  |dmvdo001........|
00000010  00 00 00 00 5d 00 00 00  00 00 00 00 09 01 02 00  |....]...........|
00000020  51 04 22 23 4f 60 05 00  31 15 6f a5 0a e7 46 46  |Q."#O`..1.o...FF|
00000030  91 79 b2 b7 4a e6 82 de  00 00 00 00 01 00 00 00  |.y..J...........|
00000040  00 00 00 00 01 00 00 00  d8 5c 0a 00 00 00 00 00  |.........\......|
00000050  00 ff ff ff 00 00 00 00  00 91 39 37 8e 00 00 00  |..........97....|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000

# vdo create --name vdo --device /dev/sdc --verbose
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 /dev/sdc
vdo: ERROR - vdoformat: Cannot format device already containing a valid VDO!


This bug is still reproducible; giving it back.

Comment 6 Radka Brychtova 2018-01-08 15:47:55 UTC
Hi, I hit a similar problem while testing the blkid command on a VDO device; I had never used VDO before. I added a SCSI device and tried to create a VDO device on it. The command failed with the message "Out of space".
When I then ran blkid, the vdo type was set even though the vdo command had failed.

# modprobe scsi_debug dev_size_mb=1000
# blkid /dev/sda
<nothing>
# vdo create --name=vdo1 --device=/dev/sda --vdoLogicalSize=1M
Creating VDO vdo1
vdo: ERROR - vdoformat: formatVDO failed on '/dev/sda': VDO Status: Out of space
<return code 2>
# blkid /dev/sda
/dev/sda: UUID="801b85ef-bc0f-40f4-ae82-3a1cbe0f715c" TYPE="vdo" 

I assume it should NOT set the type if the vdo command failed.

# rpm -q util-linux vdo
util-linux-2.23.2-49.el7.x86_64
vdo-6.1.0.98-13.x86_64

Comment 8 corwin 2018-02-06 21:42:53 UTC
This was fixed in version 6.1.0.123 and should have been in the January 19th push.
The fix is to zero the geometry block at the start of format, and then only write a valid geometry block after a valid super block has been written.

The steps to reproduce listed above should now show that the problem does not recur.
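
The ordering described in this comment can be illustrated with a small simulation on an in-memory buffer. This is a sketch of the idea only, not the vdoformat code: the block layout (geometry in block 0, super block in block 1), the SUPER placeholder contents, and all names are assumptions made for illustration.

```python
import io

BLOCK = 4096
GEOMETRY_MAGIC = b"dmvdo001"  # signature seen in the hexdumps in this report


class FormatError(Exception):
    """Stand-in for a formatVDO failure such as too many slabs."""


def format_volume(dev, layout_ok: bool) -> None:
    # Step 1: zero the geometry block first, so a later failure leaves
    # nothing that looks like a valid VDO.
    dev.seek(0)
    dev.write(bytes(BLOCK))

    # Step 2: validate the layout and write the super block (placeholder
    # contents, assumed here to live in block 1).
    if not layout_ok:
        raise FormatError("Exceeds maximum number of slabs supported")
    dev.seek(BLOCK)
    dev.write(b"SUPER".ljust(BLOCK, b"\0"))

    # Step 3: only after the super block is in place, write a valid
    # geometry block.
    dev.seek(0)
    dev.write(GEOMETRY_MAGIC.ljust(BLOCK, b"\0"))


if __name__ == "__main__":
    dev = io.BytesIO(bytes(2 * BLOCK))
    try:
        format_volume(dev, layout_ok=False)
    except FormatError as err:
        print("format failed:", err)
    print("geometry block all zero:", dev.getvalue()[:BLOCK] == bytes(BLOCK))
```

With this ordering a failed format leaves block 0 zeroed, which matches the all-zero hexdump reported after the fix.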

Comment 10 Jakub Krysl 2018-02-12 15:22:23 UTC
Now the superblock is deleted even if vdoformat fails, so no metadata remains:
# vdo create --name vdo --device /dev/sdc --verbose --vdoSlabSize 128M
Creating VDO vdo
    grep MemAvailable /proc/meminfo
    pvcreate -qq --test /dev/sdc
    modprobe kvdo
    vdoformat --uds-checkpoint-frequency=0 --uds-memory-size=0.25 --slab-bits=15 /dev/sdc
vdo: ERROR - vdoformat: formatVDO failed on '/dev/sdc': VDO Status: Exceeds maximum number of slabs supported

# hexdump -C /dev/sdc -n 4K
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Because the device is checked before vdoformat runs (using pvcreate --test) to make sure nothing is already on it, this should not be an issue unless the user passes --force, in which case they should know what they are doing.
Regression testing found no issues with this new approach.

Comment 13 errata-xmlrpc 2018-04-10 15:48:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0871