Bug 1415849

Summary: Using sparse_logical_volumes=true for Nova ephemeral leads to disk access errors
Product: Red Hat OpenStack
Component: openstack-nova
Version: 9.0 (Mitaka)
Status: CLOSED WONTFIX
Severity: low
Priority: low
Hardware: Unspecified
OS: Unspecified
Keywords: ZStream
Type: Bug
Reporter: David Hill <dhill>
Assignee: Eoghan Glynn <eglynn>
QA Contact: Prasanth Anbalagan <panbalag>
CC: berrange, dasmith, dhill, eglynn, justin.rackliffe, kchamart, mbooth, sbauza, sferdjao, sgordon, srevivo, vromanso
Last Closed: 2018-01-19 15:37:45 UTC

Description David Hill 2017-01-23 23:26:01 UTC
Description of problem:
Attempting to thin provision when images_type=lvm results in a broken volume chain and disk read errors in the instances.

Applicable configs from nova.conf:
images_type=lvm
images_volume_group=<vgname>
volume_clear=none             # this should bypass the heavy dd on an instance delete
sparse_logical_volumes=true   # causes the disk errors
force_raw_images=false        # avoids the heavy qemu-img convert
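
For reference, a minimal sketch of where these options live in nova.conf on Mitaka (section placement assumed; <vgname> is a placeholder):

  [DEFAULT]
  force_raw_images=false

  [libvirt]
  images_type=lvm
  images_volume_group=<vgname>
  sparse_logical_volumes=true
  volume_clear=none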

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Configure images_type=lvm and sparse_logical_volumes=true in nova.conf
2. Boot a VM using a flavor with 10 GB and the RHEL 7.3 KVM guest image (see the command sketch below)
3. Observe that the instance fails to boot with disk read errors
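
A minimal reproduction sketch (flavor, image and server names are illustrative placeholders):

  # flavor with 10 GB, assumed here to be the root disk size
  openstack flavor create --vcpus 1 --ram 2048 --disk 10 m1.repro
  # boot from the RHEL 7.3 KVM guest image
  openstack server create --flavor m1.repro --image rhel-guest-image-7.3 repro-vm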

Actual results:
Fails to boot 

Expected results:
Boots

Additional info:
This is easily reproducible, and if this feature is not supported, perhaps it should be removed from OpenStack altogether.

Comment 1 Justin Rackliffe 2017-01-24 13:47:15 UTC
There are some additional configuration checks that could be done when sparse_logical_volumes=true.

One key issue I found: in the OSP9 RHEV install, lvm.conf had thin_pool_autoextend_threshold set to 100, so the initially-sized thin pool LV would max out, but the qemu-img convert call would still return 0.
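
For illustration, the effective threshold can be checked with lvmconfig on RHEL 7:

  lvmconfig activation/thin_pool_autoextend_threshold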

I also found that on delete the thin LV would be removed (e.g. uuid_disk), but the autogenerated thin pool (data LV and metadata LV) would be left behind, so the lvol### volumes accumulate within LVM; I found no garbage-collection routine.
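
The leftovers can be seen with something like the following (VG name is a placeholder); hidden internal LVs such as the pool's _tdata/_tmeta show up in brackets:

  lvs -a <vgname>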

The autoextend (I tested with a threshold of 70 and a 20% increase) is seemingly not quick enough to keep up with either qemu-img or a dd-sourced data copy. I would see a variety of LSize values for the thin pool after a copy completed. It seems like the writes are being ACK'd but not written back correctly.
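
For reference, a sketch of the lvm.conf settings involved (values are the ones tested above; standard lvm.conf placement assumed):

  # /etc/lvm/lvm.conf
  activation {
      thin_pool_autoextend_threshold = 70
      thin_pool_autoextend_percent = 20
  }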

As a note, with LVM sparse or thick volumes I don't see an alternative to the raw conversion, so that is an expected overhead of using LVM. I do find it a bit curious that LVM snapshots of a thick- or thin-hosted _base image don't seem to be used when cow is true.

Comment 2 Matthew Booth 2017-01-27 13:58:33 UTC
For reference: not reproducible on devstack. Will try with packstack RHOS 9 and LVM.

In case I can't reproduce, could you please share a sosreport from an affected machine with a failure in the logs?
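
For example (plugin selection is only a suggestion; a full default sosreport is also fine):

  sosreport -o openstack_nova,libvirt,lvm2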