Bug 1415849 - Using sparse_logical_volumes=true for Nova ephemeral leads to disk access errors
Summary: Using sparse_logical_volumes=true for Nova ephemeral leads to disk access errors
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
Depends On:
Reported: 2017-01-23 23:26 UTC by David Hill
Modified: 2021-03-11 14:55 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-01-19 15:37:45 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Launchpad 1591084 0 None None None 2017-01-23 23:26:52 UTC

Description David Hill 2017-01-23 23:26:01 UTC
Description of problem:
Attempting to thin provision when images_type=lvm results in a broken volume chain and disk read errors in the instances.

Applicable configs from nova.conf:
volume_clear=none # this should bypass the heavy dd on an instance delete
sparse_logical_volumes=true # causes disk errors
force_raw_images=false # avoids the heavy qemu-img convert
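For reference, a minimal sketch of how these options would sit in nova.conf on Mitaka, assuming the libvirt driver (section placement follows the Mitaka option layout; the volume group name is a placeholder, not taken from the reporter's file):

```ini
[DEFAULT]
# Skip converting the base image to raw (avoids the heavy qemu-img convert).
force_raw_images = false

[libvirt]
images_type = lvm
# Hypothetical VG name; substitute the actual images volume group.
images_volume_group = nova-vg
# The option under discussion: thin-provision the per-instance LVs.
sparse_logical_volumes = true
# Skip the dd wipe of the LV on instance delete.
volume_clear = none
```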

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure images_type=lvm and sparse_logical_volumes=true
2. Boot a VM using a flavor with 10GB and the RHEL 7.3 KVM guest

Actual results:
Fails to boot 

Expected results:

Additional info:
This is easily reproducible, and if this feature is not supported, perhaps it should be removed from OpenStack altogether.

Comment 1 Justin Rackliffe 2017-01-24 13:47:15 UTC
There are some additional checks that could be done for config testing when sparse=true.

One key issue I found was that in the OSP9 RHEV install, lvm.conf had thin_pool_autoextend_threshold set to 100, so the initially sized thinpool LV would max out, yet the qemu-img convert call would still return 0.

I also found that on delete the thin LV (e.g. uuid_disk) would be removed, but the autogenerated thinpool (datalv and metalv) would be left behind, so the lvol### instances accumulate within LVM, and I found no garbage collection routine.
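The leftover-thinpool observation above can be spot-checked with a small filter. This is a sketch, not an existing Nova routine: the `filter_leftover` helper and the `nova-vg` VG name are made up for illustration, and it assumes the orphaned pools keep LVM's autogenerated `lvolNNN` names as described above.

```shell
#!/bin/sh
# Print LV names matching LVM's autogenerated lvolNNN pattern, i.e. the
# thin pools left behind after the per-instance thin LV itself is deleted.
filter_leftover() {
    awk '/^[[:space:]]*lvol[0-9]+($|[[:space:]])/ { print $1 }'
}

# Usage against a live system ("nova-vg" is a placeholder VG name):
#   lvs --noheadings -o lv_name nova-vg | filter_leftover
```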

The autoextend (I tested with a threshold of 70 and a 20% increase) is seemingly not quick enough to keep pace with either qemu-img or a dd-sourced data copy. I would see a variety of LSize values for the thinpool after a copy completed. It seems the writes are being ACK'd but not written back correctly.
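For reference, the autoextend knobs from that test live in the activation section of lvm.conf; this fragment mirrors the 70% threshold / 20% step described above (the option names are standard LVM settings). Note that the default threshold of 100 effectively disables autoextend, which matches the pool maxing out in the RHEV-installed config.

```ini
# /etc/lvm/lvm.conf
activation {
    # Start growing a thin pool once it is 70% full...
    thin_pool_autoextend_threshold = 70
    # ...and grow it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```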

As a note, with LVM sparse or thick volumes I don't see an alternative to the raw conversion, so that is expected overhead for using LVM. I do find it a bit curious that LVM snapshots of a thick- or thin-hosted _base image don't seem to be used when cow is true.

Comment 2 Matthew Booth 2017-01-27 13:58:33 UTC
For reference: not reproducible on devstack. Will try with packstack RHOS 9 and LVM.

In case I can't reproduce, could you please share a sosreport from an affected machine with a failure in the logs?
