Bug 1415849 - Using sparse_logical_volumes=true for Nova ephemeral leads to disk access errors
Summary: Using sparse_logical_volumes=true for Nova ephemeral leads to disk access errors
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-23 23:26 UTC by David Hill
Modified: 2021-03-11 14:55 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-19 15:37:45 UTC
Target Upstream Version:




Links:
Launchpad 1591084 (last updated 2017-01-23 23:26:52 UTC)

Description David Hill 2017-01-23 23:26:01 UTC
Description of problem:
Attempting to thin-provision ephemeral disks with images_type=lvm results in a broken volume chain and disk read errors in the instances.

Applicable settings from nova.conf:
images_type=lvm
images_volume_group=<vgname>
volume_clear=none             # bypasses the heavy dd on an instance delete
sparse_logical_volumes=true   # this is what causes the disk errors
force_raw_images=false        # avoids the heavy qemu-img convert
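
A quick way to see what Nova actually created on the compute node is to dump the LVs in the images volume group. The following is a minimal sketch (not part of the original report); 'nova_vg' is a placeholder for the images_volume_group value, and it assumes the standard lvs report fields:

# Minimal sketch: list each LV in the Nova images volume group with its
# attributes and fill level, so a sparse/thin volume that has silently
# filled up is easy to spot.  'nova_vg' is a placeholder VG name.
import subprocess

VG = 'nova_vg'

out = subprocess.check_output(
    ['lvs', '--noheadings', '--separator', '|',
     '-o', 'lv_name,lv_attr,lv_size,data_percent,pool_lv', VG],
    universal_newlines=True)

for line in out.splitlines():
    name, attr, size, data_pct, pool = [f.strip() for f in line.split('|')]
    print('%-40s attr=%s size=%s data%%=%s pool=%s'
          % (name, attr, size, data_pct or '-', pool or '-'))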

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Configure images_type=lvm and sparse_logical_volumes=true
2. Boot a VM using a flavor with a 10 GB disk and the RHEL 7.3 KVM guest image (a reproduction sketch follows below)
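
A minimal reproduction sketch using python-novaclient follows; the auth values, FLAVOR_ID, and IMAGE_ID are placeholders for the affected environment, not values from this report:

# Minimal reproduction sketch with python-novaclient; all credentials
# and IDs below are placeholders.  Use a flavor with a 10 GB disk and
# the RHEL 7.3 KVM guest image.
from keystoneauth1.identity import v3
from keystoneauth1.session import Session
from novaclient import client

FLAVOR_ID = '<flavor-with-a-10GB-disk>'
IMAGE_ID = '<rhel-7.3-kvm-guest-image>'

auth = v3.Password(auth_url='http://controller:5000/v3',
                   username='admin', password='secret',
                   project_name='admin',
                   user_domain_id='default',
                   project_domain_id='default')
nova = client.Client('2', session=Session(auth=auth))

# Boot against a compute node configured with images_type=lvm and
# sparse_logical_volumes=true; with the bug present the guest fails
# to boot with disk read errors.
server = nova.servers.create('sparse-lvm-test', IMAGE_ID, FLAVOR_ID)
print('%s %s' % (server.id, server.status))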

Actual results:
Fails to boot 

Expected results:
Boots

Additional info:
This is easily reproducible, and if this feature is not supported, perhaps it should be removed from OpenStack entirely.

Comment 1 Justin Rackliffe 2017-01-24 13:47:15 UTC
There are some additional configuration checks that could be done when sparse=true.

One key issue I found was that in the OSP9 RHEV install, lvm.conf had thin_pool_autoextend_threshold set to 100, so the initially sized thin pool LV would max out, yet the qemu-img convert call would still return 0.

I also found that on delete the thin LV would be removed (e.g. uuid_disk), but the autogenerated thin pool (the data LV and metadata LV) would be left behind, so the lvol### volumes accumulate within LVM; I found no garbage-collection routine.
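
A quick way to spot these leftovers is to list LVs whose names match the autogenerated lvol### pattern. A minimal sketch, again assuming the placeholder VG name 'nova_vg':

# Minimal sketch: flag autogenerated lvol### volumes (the orphaned
# thin pools described above) left behind after instance deletion.
# 'nova_vg' is a placeholder for images_volume_group.
import re
import subprocess

VG = 'nova_vg'

out = subprocess.check_output(
    ['lvs', '--noheadings', '-o', 'lv_name', VG],
    universal_newlines=True)

leftovers = [name.strip() for name in out.splitlines()
             if re.match(r'lvol\d+', name.strip())]
print('Autogenerated lvol### volumes still present: %s'
      % (', '.join(leftovers) or 'none'))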

The autoextend (I tested with a threshold of 70 and a 20% increase) is seemingly not quick enough to keep up with either qemu-img or a dd-sourced data write; I would see a variety of sizes in LSize for the thin pool after a copy completed. It seems the writes are being ACK'd but not written back correctly.
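
To confirm whether autoextend keeps pace, one option is to poll the pool's size and fill level while the copy runs. A minimal sketch, assuming the placeholder names 'nova_vg' and 'lvol0_pool' for the VG and the autogenerated thin pool:

# Minimal sketch: poll the thin pool's LSize and data% once a second
# while qemu-img convert (or dd) writes into it, to see whether
# autoextend keeps up.  VG and POOL names are placeholders.
import subprocess
import time

VG = 'nova_vg'
POOL = 'lvol0_pool'

while True:
    out = subprocess.check_output(
        ['lvs', '--noheadings', '-o', 'lv_size,data_percent',
         '%s/%s' % (VG, POOL)],
        universal_newlines=True)
    print('%s %s' % (time.strftime('%H:%M:%S'), out.strip()))
    time.sleep(1)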

As a note, with LVM sparse or thick volumes I don't see an alternative to the raw conversion, so that is an expected overhead of using LVM. I do find it a bit curious that LVM snapshots of a thick- or thin-hosted _base image don't seem to be used when cow is true.

Comment 2 Matthew Booth 2017-01-27 13:58:33 UTC
For reference: not reproducible on devstack. Will try with packstack RHOS 9 and LVM.

In case I can't reproduce, could you please share a sosreport from an affected machine with a failure in the logs?

