Bug 1373711
Summary:           pool-build with --overwrite doesn't work for logical pool
Product:           Red Hat Enterprise Linux 7
Component:         libvirt
Version:           7.4
Hardware:          x86_64
OS:                Linux
Status:            CLOSED ERRATA
Severity:          medium
Priority:          medium
Reporter:          yisun
Assignee:          John Ferlan <jferlan>
QA Contact:        Pei Zhang <pzhang>
CC:                dyuan, dzheng, jferlan, rbalakri, xuzhang, yanyang, ydu, yisun
Target Milestone:  rc
Fixed In Version:  libvirt-3.2.0-4.el7
Doc Type:          Enhancement
Doc Text:
  Feature: Add support for OVERWRITE flags when building a logical pool.
  Reason: The logical pool had the build functionality, but had no way to force an overwrite if something existed on the target volumes.
  Result: It is possible to use the OVERWRITE flags when building a logical pool.
Last Closed:       2017-08-01 17:14:13 UTC
Type:              Bug
Bug Depends On:    1439132
Description — yisun, 2016-09-07 03:14:58 UTC
The logical pool didn't support --overwrite or --no-overwrite, so supplying either produced an error. It was easy enough to add them, although mostly useless, since part of the build process for a logical backend is to overwrite the first sector of the disk with zeros (a requirement for pvcreate to even work). So even if --no-overwrite is supplied, we'd blast away the previous partition table. However, without --overwrite you're stuck in a strange scenario: you have to run your own "pvcreate -ff -y /dev/sdX" in order to allow a second pool-build to succeed.

NB: This also affects the "pool-create $file [--build]" command (it too can use the --[no-]overwrite option).

I think the reason the flags weren't supported is that they didn't seem to make sense, since "lvm2" is the only valid format. Running pool-build twice in a row probably doesn't make practical sense, but for testing purposes I can certainly see why this could be a problem. In any case, a patch was posted upstream: http://www.redhat.com/archives/libvir-list/2016-November/msg00783.html

After review, I decided to take a different approach and disallow the second pool-build entirely. The changes search the existing list of VGs and, if the one to be built already exists, reject the build. Compare the behavior of the pv/vg commands themselves:

    # dd if=/dev/zero of=/dev/sdX bs=512 count=1
    # pvcreate /dev/sdX
    # vgcreate logical /dev/sdX
      Volume group "logical" successfully created
    # echo $?
    0
    # vgcreate logical /dev/sdX
      A volume group called logical already exists.
    # echo $?
    5

A v2 was posted: http://www.redhat.com/archives/libvir-list/2016-November/msg00810.html

New approach, now at v3: add --overwrite/--no-overwrite to the logical pool to mimic the disk pool. That is, if nothing or --no-overwrite is provided, check whether something exists on the target volume*s* before executing the wipe of the first 512 bytes, followed by pvcreate, followed by vgcreate.
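The intended flag semantics (if nothing or --no-overwrite is given, probe the target device first; only --overwrite proceeds to the wipe and pvcreate/vgcreate) can be sketched as follows. This is an illustrative Python sketch of the decision logic, not libvirt's actual C code; the function name and flag constants are hypothetical:

```python
# Hypothetical sketch of the logical-backend build decision described above.
# Names and constants are illustrative, not libvirt's actual source.
NO_OVERWRITE = 1
OVERWRITE = 2

def build_logical_pool(flags, probed_format):
    """probed_format: the format found on the target device (e.g. 'ext4',
    'LVM2_member'), or None if the device carries no recognizable label."""
    if flags & OVERWRITE and flags & NO_OVERWRITE:
        raise ValueError("overwrite flags are mutually exclusive")
    if not flags & OVERWRITE:
        # Default and --no-overwrite behave the same: refuse to touch a
        # device that already carries any recognizable format.
        if probed_format == "LVM2_member":
            raise RuntimeError("Device already formatted using 'LVM2_member'")
        if probed_format is not None:
            raise RuntimeError("forced overwrite is necessary")
    # --overwrite (or a clean device): the wipe of the first bytes,
    # then pvcreate and vgcreate, would run here.
    return "built"
```

Note this models the intended semantics only; the rest of the thread shows cases where the real implementation initially fell short of them.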
The updated patch is found in patch 10/11 of the series: http://www.redhat.com/archives/libvir-list/2016-December/msg00683.html (direct link: http://www.redhat.com/archives/libvir-list/2016-December/msg00693.html), but it doesn't make much sense without the rest of the series, because a bit of setup is needed to be able to probe the target path. Going to use code similar to that found in the fs and disk backends (probe via the blkid APIs if available; if not, fall back to the parted code to scan).

Patches are now pushed upstream:

    $ git describe f573f84eb732d143ca3cd963f180ec1ef7d1076f
    v2.5.0-368-gf573f84

Hi John,

A small issue here. As shown in step 3.2 below, it seems that --overwrite executes pvcreate without -f, so the build fails if the block device already has a filesystem on it.

Versions:

    libvirt-3.0.0-2.el7.x86_64
    qemu-kvm-rhev-2.8.0-5.el7.x86_64

Steps:

1. Prepare a block device:

    # lsblk
    sdc      8:32   1 28.9G  0 disk

2. Define a logical pool as follows:

    # cat logical.xml
    <pool type='logical'>
      <name>logical</name>
      <uuid>04e3685b-137b-4f13-8eec-31949a9ee9f5</uuid>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <device path='/dev/sdc'/>
        <name>logical</name>
        <format type='lvm2'/>
      </source>
      <target>
        <path>/dev/logical</path>
        <permissions>
          <mode>0755</mode>
        </permissions>
      </target>
    </pool>

3. Define and build this logical pool:

    # virsh pool-define logical

3.1 Build with no flags; it reports an error, as expected:

    # virsh pool-build logical
    error: Failed to build pool logical
    error: Storage pool already built: Format of device '/dev/sdc' does not match the expected format 'LVM2_member', forced overwrite is necessary

3.2 Build with --overwrite; it fails to build:
    # virsh pool-build logical --overwrite
    error: Failed to build pool logical
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdc) unexpected exit status 5:
    2017-03-01 08:00:10.348+0000: 3247: debug : virFileClose:109 : Closed fd 25
    2017-03-01 08:00:10.348+0000: 3247: debug : virFileClose:109 : Closed fd 27
    2017-03-01 08:00:10.348+0000: 3247: debug : virFileClose:109 : Closed fd 23
    WARNING: ext4 signature detected on /dev/sdc at offset 1080. Wipe it? [y/n]: [n]
      Aborted wiping of ext4.
      1 existing signature left on the device.

If you follow the response to the patch from comment 1 (http://www.redhat.com/archives/libvir-list/2016-November/msg00797.html), you'll note that adding '-ff' to the command line was not accepted. Still, --overwrite should work; however, it seems that the 'ext4' type already on /dev/sdc has a "header" larger than the 512 bytes that the clearing code wipes, so 'entrails' are left behind. The message above indicates finding something at "offset 1080". I forget what type was on the device I used during testing, but it must have had its header within the 512 bytes that are wiped.

My suggestion: create a new bz indicating that we need to increase that size from 512 to, say, 2048 or 4096. Of course, as soon as one chooses a value, some other fs type that can be on the device will have something just a bit larger! I think 512 was chosen since it is the smallest unit, and care needs to be taken since it's possible to have very small disks, and wiping more bytes than the disk holds would be disastrous!

Thank you so much for your info. Filed Bug 1430679 to track the issue, and re-verified this bug as follows.

Versions:

    libvirt-3.1.0-2.el7.x86_64
    qemu-kvm-rhev-2.8.0-6.el7.x86_64

Steps:

1. Check the documentation for the --(no-)overwrite flags:

    # man virsh

Find pool-build and check the description "For a logical pool...".

2.
Build a logical pool when there is no label on the device.

2.1 Check the device label:

    # parted /dev/sdb p
    Error: /dev/sdb: unrecognised disk label
    ......
    Disk Flags:

2.2 Define a logical pool as follows:

    # virsh pool-dumpxml logical
    <pool type='logical'>
      <name>logical</name>
      <uuid>04e3685b-137b-4f13-8eec-31949a9ee9f5</uuid>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <device path='/dev/sdb'/>
        <name>logical</name>
        <format type='lvm2'/>
      </source>
      <target>
        <path>/dev/logical</path>
        <permissions>
          <mode>0755</mode>
        </permissions>
      </target>
    </pool>

2.3 Check existing PVs:

    # pvs
    (no output)

2.4 Build the logical pool with no flags; it should build successfully:

    # virsh pool-build logical
    Pool logical built
    # pvs
      PV         VG      Fmt  Attr PSize PFree
      /dev/sdb   logical lvm2 a--  3.73g 3.73g
    # vgs
      VG      #PV #LV #SN Attr   VSize VFree
      logical   1   0   0 wz--n- 3.73g 3.73g

2.5 pool-build should report an error if the PV and VG already exist:

    # virsh pool-build logical
    error: Failed to build pool logical
    error: Storage pool already built: Device '/dev/sdb' already formatted using 'LVM2_member'

2.6 pool-build should report an error if the PV and VG already exist, even with --overwrite:

    # virsh pool-build logical --overwrite
    error: Failed to build pool logical
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdb) unexpected exit status 5: Can't initialize physical volume "/dev/sdb" of volume group "logical" without -ff

3. Build a logical pool with --no-overwrite when there is no label on the device.

Repeat 2.1 ~ 2.6; building with --no-overwrite should give the same results as building with no flags.

4. Build a logical pool when the device already has a label.
4.1 Check the label, e.g. xfs:

    # blkid /dev/sdb
    /dev/sdb: UUID="37976053-79d0-4d55-8e34-f2b6a5651f57" TYPE="xfs"

4.2 Define a pool as follows:

    # virsh pool-dumpxml logical
    <pool type='logical'>
      <name>logical</name>
      <uuid>04e3685b-137b-4f13-8eec-31949a9ee9f5</uuid>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <device path='/dev/sdb'/>
        <name>logical</name>
        <format type='lvm2'/>
      </source>
      <target>
        <path>/dev/logical</path>
        <permissions>
          <mode>0755</mode>
        </permissions>
      </target>
    </pool>

4.3 Build the pool with no flags or with --no-overwrite; both should report an error:

    # virsh pool-build logical
    error: Failed to build pool logical
    error: Storage pool already built: Format of device '/dev/sdb' does not match the expected format 'LVM2_member', forced overwrite is necessary
    # virsh pool-build logical --no-overwrite
    error: Failed to build pool logical
    error: Storage pool already built: Format of device '/dev/sdb' does not match the expected format 'LVM2_member', forced overwrite is necessary

4.4 Build the pool with --overwrite; it builds successfully:

    # virsh pool-build logical --overwrite
    Pool logical built

Check the created PV and VG:

    # pvs
      PV         VG      Fmt  Attr PSize PFree
      /dev/sdb   logical lvm2 a--  3.73g 3.73g
    # vgs
      VG      #PV #LV #SN Attr   VSize VFree
      logical   1   0   0 wz--n- 3.73g 3.73g

4.5 pool-build should report an error if the PV and VG already exist:

    # virsh pool-build logical
    error: Failed to build pool logical
    error: Storage pool already built: Device '/dev/sdb' already formatted using 'LVM2_member'

4.6 pool-build should report an error if the PV and VG already exist, even with --overwrite:

    # virsh pool-build logical --overwrite
    error: Failed to build pool logical
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdb) unexpected exit status 5: Can't initialize physical volume "/dev/sdb" of volume group "logical" without -ff

Hi John,

I found that pool-build --overwrite fails/succeeds every other time...
and the reason seems to be that when it fails, the VG is removed ~_~

    # virsh pool-dumpxml lpool
    <pool type='logical'>
      <name>lpool</name>
      <uuid>99bfb2be-fbdd-4af2-9b1f-e92003e138e3</uuid>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <device path='/dev/sdb'/>
        <name>logical</name>
        <format type='lvm2'/>
      </source>
      <target>
        <path>/dev/logical</path>
        <permissions>
          <mode>0755</mode>
        </permissions>
      </target>
    </pool>
    # blkid /dev/sdb
    /dev/sdb: UUID="62801a9a-476d-4d8f-ab0c-f2ea63ec48c1" TYPE="ext4"
    # virsh pool-build lpool
    error: Failed to build pool lpool
    error: Storage pool already built: Format of device '/dev/sdb' does not match the expected format 'LVM2_member', forced overwrite is necessary
    # virsh pool-build lpool --overwrite
    Pool lpool built
    # blkid /dev/sdb
    /dev/sdb: UUID="4LyV7h-fvhq-2HsB-iumU-xL2r-3dsD-4iua9e" TYPE="LVM2_member"

**** Now, it will be successful every other time: ****

    # virsh pool-build lpool --overwrite
    error: Failed to build pool lpool
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdb) unexpected exit status 5: Can't initialize physical volume "/dev/sdb" of volume group "logical" without -ff
    # virsh pool-build lpool --overwrite
    Pool lpool built
    # virsh pool-build lpool --overwrite
    error: Failed to build pool lpool
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdb) unexpected exit status 5: Can't initialize physical volume "/dev/sdb" of volume group "logical" without -ff
    # virsh pool-build lpool --overwrite
    Pool lpool built
    # virsh pool-build lpool --overwrite
    error: Failed to build pool lpool
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdb) unexpected exit status 5: Can't initialize physical volume "/dev/sdb" of volume group "logical" without -ff

*** And when it fails, the VG is removed! ***

    # vgdisplay | grep logical    <=== nothing here
    # blkid /dev/sdb              <=== nothing here
    # virsh pool-start lpool
    error: Failed to start pool lpool
    error: unsupported configuration: cannot find logical volume group name 'logical'

I believe the fix for bz 1430679 will handle this case as well. It seems you tested using 3.1, and that fix will be in 3.2. The LVM2_member header is also larger than the 512 bytes that were being cleared before 1430679 was fixed.

FYI: Setting qa_ack to ? is what set off the chain of events from the ack bot.

(In reply to John Ferlan from comment #11)

Hi John, comment 9 is based on the latest downstream libvirt-3.2.0-1.el7.x86_64, and it is also reproduced on upstream libvirt-3.2.0-1.fc24.x86_64. Please double-check. Thx.

I rebuilt using the libvirt-3.2.0 tag, but couldn't reproduce the every-other-time behavior in my environment. I wonder if your environment has sector sizes larger than 512 bytes for the device; in that case, I know what the problem would be. The added code only clears 2048 (4 * 512) bytes; with 4096-byte sectors, that obviously won't be nearly enough. This is being discussed in relation to patches posted upstream for bz 1439132 (see the patch 3/3 discussion): https://www.redhat.com/archives/libvir-list/2017-April/msg00402.html, which led to a subsequent posting: https://www.redhat.com/archives/libvir-list/2017-April/msg00461.html that would also alter code used by the logical backend.

I logged in a little while ago and it seems someone may be using /dev/sdd for a different test (LUKS) - I don't want to disrupt that.
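The sector-size arithmetic in that explanation can be made concrete. The following is an illustrative Python sketch, not libvirt code: a fixed 2048-byte wipe covers 4 sectors only when the logical sector size is 512 bytes, so scaling by the actual sector size (and clamping to the device size, per the earlier caution about very small disks) is what a corrected wipe length would look like:

```python
def bytes_cleared(logical_sector_size, device_size, sectors=4):
    """Illustrative wipe-length arithmetic for the discussion above.
    4 * 512 = 2048 bytes covers 4 sectors only on 512-byte-sector
    devices; on 4096-byte sectors the same sector count needs 16384
    bytes. Clamping to device_size ensures a tiny device is never
    wiped past its end."""
    return min(logical_sector_size * sectors, device_size)
```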
In any case, using a similar process on my system, I was able to reproduce what you saw, but not reliably.

    # fdisk -l /dev/sdk
    Disk /dev/sdk: 1 GiB, 1073741824 bytes, 2097152 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    # lsscsi | grep sdk
    [7:0:0:6]    disk    IET      VIRTUAL-DISK     0001  /dev/sdk
    # mkfs -t ext4 /dev/sdk
    mke2fs 1.43.3 (04-Sep-2016)
    /dev/sdk contains a LVM2_member file system
    Proceed anyway? (y,n) y
    Creating filesystem with 262144 4k blocks and 65536 inodes
    Filesystem UUID: 9e899056-de2d-42bc-8a5e-2611d5075bef
    Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

    Allocating group tables: done
    Writing inode tables: done
    Creating journal (8192 blocks): done
    Writing superblocks and filesystem accounting information: done

    # parted /dev/sdk p
    Model: IET VIRTUAL-DISK (scsi)
    Disk /dev/sdk: 1074MB
    Sector size (logical/physical): 512B/4096B
    Partition Table: loop
    Disk Flags:

    Number  Start  End     Size    File system  Flags
     1      0.00B  1074MB  1074MB  ext4

    # virsh pool-define logical.xml
    Pool logical defined from logical.xml

    # virsh pool-build logical
    error: Failed to build pool logical
    error: Storage pool already built: Format of device '/dev/sdk' does not match the expected format 'LVM2_member', forced overwrite is necessary

    # virsh pool-build logical --overwrite
    Pool logical built

    # virsh pool-build logical --overwrite
    error: Failed to build pool logical
    error: internal error: Child process (/usr/sbin/pvcreate /dev/sdk) unexpected exit status 5: Can't initialize physical volume "/dev/sdk" of volume group "logical" without -ff

    # virsh pool-build logical --overwrite
    Pool logical built

    # virsh pool-build logical --overwrite
    Pool logical built

    # virsh pool-undefine logical
    Pool logical has been undefined

If I went back and repeated those same steps, I didn't get the error.
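The unreliable reproduction is consistent with the wipe-coverage arithmetic discussed in this thread: whether a stale signature survives depends on whether its on-disk offset falls inside the wiped prefix. A tiny illustrative sketch (not libvirt code; the 1080-byte offset is the one pvcreate's ext4 warning reported earlier in this thread):

```python
EXT4_SIG_OFFSET = 1080  # offset pvcreate's warning reported for ext4 above

def signature_survives(wiped_bytes, sig_offset=EXT4_SIG_OFFSET):
    # A signature survives the header wipe when it starts at or past
    # the end of the wiped region (signature length ignored for brevity).
    return sig_offset >= wiped_bytes
```

So a 512-byte wipe leaves the ext4 signature at offset 1080 intact, while a 2048-byte wipe removes it, matching the behavior observed before and after the 1430679 fix.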
I even tried changing things up a bit, using xfs (mkfs -t xfs -f /dev/sdk) and gpt (parted /dev/sdk mklabel gpt --script). It makes me begin to wonder if there's some sort of "algorithm" that "delays" things a bit, not expecting someone to run a format "so quickly" multiple times in a row.

I wouldn't know with 100% certainty, without running it on your test system, whether the fix for bz 1439132 to clear both ends of the device would resolve the issue for the specific device type you have. I suppose one way to "verify" that would be to make this bz depend on that bz. At least that way we can be sure this gets tested again if/when the other one is pushed.

(In reply to John Ferlan from comment #15)

Thx John, I'll make this one depend on 1439132. BTW, that machine is only used by me; you can use it anytime you like, no urgent jobs are running on it :)

Verified versions:

    libvirt-3.2.0-9.el7.x86_64
    qemu-kvm-rhev-2.9.0-9.el7.x86_64

Steps:

1. Define a logical pool as follows:

    # virsh pool-dumpxml lpool
    <pool type='logical'>
      <name>lpool</name>
      <uuid>99bfb2be-fbdd-4af2-9b1f-e92003e138e3</uuid>
      <source>
        <device path='/dev/sdc'/>
        <name>logical</name>
        <format type='lvm2'/>
      </source>
      <target>
        <path>/dev/logical</path>
      </target>
    </pool>

Check device sdc:

    # blkid /dev/sdc
    /dev/sdc: UUID="9ab1817a-ac53-43c1-b624-00d986b140ce" TYPE="ext4"

2.
Build this logical pool:

    # virsh pool-build lpool
    error: Failed to build pool lpool
    error: Storage pool already built: Format of device '/dev/sdc' does not match the expected format 'LVM2_member', forced overwrite is necessary
    # blkid /dev/sdc
    /dev/sdc: UUID="9ab1817a-ac53-43c1-b624-00d986b140ce" TYPE="ext4"

Build with --overwrite:

    # virsh pool-build lpool --overwrite
    Pool lpool built
    # vgs
      VG      #PV #LV #SN Attr   VSize  VFree
      logical   1   0   0 wz--n- 30.00g 30.00g

Build again without --overwrite:

    # virsh pool-build lpool
    error: Failed to build pool lpool
    error: Storage pool already built: Device '/dev/sdc' already formatted using 'LVM2_member'

Build with --overwrite:

    # virsh pool-build lpool --overwrite
    Pool lpool built
    # vgs
      VG      #PV #LV #SN Attr   VSize  VFree
      logical   1   0   0 wz--n- 30.00g 30.00g
    # virsh pool-build lpool --overwrite
    Pool lpool built
    # vgs
      VG      #PV #LV #SN Attr   VSize  VFree
      logical   1   0   0 wz--n- 30.00g 30.00g
    # virsh pool-build lpool --overwrite
    Pool lpool built
    # virsh pool-build lpool --overwrite
    Pool lpool built

Start the pool:

    # virsh pool-start lpool
    Pool lpool started

Now it builds successfully every time. Moving this bug to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846