Bug 1369097

Summary: After re-sizing an instance the backing image is converted from raw to qcow2 and doesn't boot
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: openstack-nova Assignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG QA Contact: Prasanth Anbalagan <panbalag>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: berrange, dasmith, eglynn, jthomas, kchamart, mbooth, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: --- Keywords: ZStream
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-02 08:38:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
nova.conf none

Description Jeremy 2016-08-22 13:34:52 UTC
Description of problem:
instance id=a50d59f1-a999-4d15-9030-8af49047f550
###Compute where the instance starts out
CMD "rsync --sparse --compress --dry-run /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550_resize/disk 10.161.0.73:/var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk" returned: 0 in 0.314s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:254
2016-08-12 09:39:27.984 38016 DEBUG oslo_concurrency.processutils [req-09f4b3ca-3af6-471a-8578-8e706f3a2e5d 995cad06191b44fca6271fa72e8f96ed 2c05ab7895b5476d8c390571cdeacafa - - -] Running cmd (subprocess): rsync --sparse --compress /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550_resize/disk 10.161.0.73:/var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:223

2016-08-12 11:41:38.721 38016 DEBUG nova.compute.manager [req-a4d9de28-32bc-472f-b377-f616e759ee41 995cad06191b44fca6271fa72e8f96ed 30d0f54bf4a74562b6e58d86b2665dc9 - - -] [instance: a50d59f1-a999-4d15-9030-8af49047f550] Stopping instance; current vm_state: resized, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2804

2016-08-12 14:06:27.200 38016 DEBUG oslo_concurrency.processutils [req-84fa6a1b-f783-4c88-a003-7837aa08c19d 995cad06191b44fca6271fa72e8f96ed 30d0f54bf4a74562b6e58d86b2665dc9 - - -] CMD "rm -rf /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550_resize" returned: 0 in 5.869s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:254

2016-08-12 14:06:27.756 38016 DEBUG oslo_concurrency.lockutils [req-84fa6a1b-f783-4c88-a003-7837aa08c19d 995cad06191b44fca6271fa72e8f96ed 30d0f54bf4a74562b6e58d86b2665dc9 - - -] Lock "a50d59f1-a999-4d15-9030-8af49047f550" released by "do_confirm_resize" :: held 6.889s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:456



###new compute node
2016-08-12 09:49:12.182 4840 DEBUG nova.virt.disk.api [req-09f4b3ca-3af6-471a-8578-8e706f3a2e5d 995cad06191b44fca6271fa72e8f96ed 2c05ab7895b5476d8c390571cdeacafa - - -] Checking if we can resize image /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk. size=42949672960 can_resize_image /usr/lib/python2.7/site-packages/nova/virt/disk/api.py:213
2016-08-12 09:50:18.264 4840 INFO nova.compute.manager [-] [instance: a50d59f1-a999-4d15-9030-8af49047f550] During sync_power_state the instance has a pending task (resize_finish). Skip.

###Why is the image being converted to qcow2? This seems to cause the instance not to boot...

2016-08-12 09:49:12.375 4840 DEBUG oslo_concurrency.processutils [req-09f4b3ca-3af6-471a-8578-8e706f3a2e5d 995cad06191b44fca6271fa72e8f96ed 2c05ab7895b5476d8c390571cdeacafa - - -] Running cmd (subprocess): qemu-img convert -f raw -O qcow2 /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk_qcow execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:223


Version-Release number of selected component (if applicable):
openstack-nova-api-12.0.4-4.el7ost.noarch 
openstack-nova-compute-12.0.4-4.el7ost.noarch 

How reproducible:
100%

Steps to Reproduce:
1. Resize an instance so that it is migrated to a new compute host and resized.
2. Observe that the backing file is converted from raw to qcow2 during the process.

Actual results:
converted instance backing file

Expected results:
Instance backing file not converted

Additional info:
Seems like it may be similar to: https://bugzilla.redhat.com/show_bug.cgi?id=1314031

Comment 2 Jon Thomas 2016-08-26 16:53:27 UTC
I couldn't repro this on OSP8. Entirely different code path in 8. In OSP8 it bails out in compute/manager.py, line 3876, in resize_instance.

Below is the OSP7 log snippet and code. Note that they are trying to resize down and can_resize_image fails. Evidently use_cow_images was set to false and the problem still happened. I need to find an OSP7 install to try to repro.


2016-08-12 09:49:12.375 4840 DEBUG nova.virt.disk.api [req-09f4b3ca-3af6-471a-8578-8e706f3a2e5d 995cad06191b44fca6271fa72e8f96ed 2c05ab7895b5476d8c390571cdeacafa - - -] Cannot resize image /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk to a smaller size. can_resize_image /usr/lib/python2.7/site-packages/nova/virt/disk/api.py:219
2016-08-12 09:49:12.375 4840 DEBUG oslo_concurrency.processutils [req-09f4b3ca-3af6-471a-8578-8e706f3a2e5d 995cad06191b44fca6271fa72e8f96ed 2c05ab7895b5476d8c390571cdeacafa - - -] Running cmd (subprocess): qemu-img convert -f raw -O qcow2 /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk /var/lib/nova/instances/a50d59f1-a999-4d15-9030-8af49047f550/disk_qcow execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:223


def can_resize_image(image, size):
    """Check whether we can resize the container image file."""
    LOG.debug('Checking if we can resize image %(image)s. '
              'size=%(size)s', {'image': image, 'size': size})

    # Check that we're increasing the size
    virt_size = get_disk_size(image)
    if virt_size >= size:
        LOG.debug('Cannot resize image %s to a smaller size.',
                  image)
        return False
    return True



    def _disk_resize(self, info, size):
        """Attempts to resize a disk to size

        Attempts to resize a disk by checking the capabilities and
        preparing the format, then calling disk.api.extend.

        Note: Currently only support disk extend.
        """
        # If we have a non partitioned image that we can extend
        # then ensure we're in 'raw' format so we can extend file system.
        fmt, org = [info['type']] * 2
        pth = info['path']
        if (size and fmt == 'qcow2' and
                disk.can_resize_image(pth, size) and
                disk.is_image_extendable(pth, use_cow=True)):
            self._disk_qcow2_to_raw(pth)
            fmt = 'raw'

        if size:
            use_cow = fmt == 'qcow2'
            disk.extend(pth, size, use_cow=use_cow)

        if fmt != org:
            # back to qcow2 (no backing_file though) so that snapshot
            # will be available
            self._disk_raw_to_qcow2(pth)

    def finish_migration(self, context, migration, instance, disk_info,
                         network_info, image_meta, resize_instance,
                         block_device_info=None, power_on=True):
        LOG.debug("Starting finish_migration", instance=instance)

        # resize disks. only "disk" and "disk.local" are necessary.
        disk_info = jsonutils.loads(disk_info)
        for info in disk_info:
            size = self._disk_size_from_instance(instance, info)
            if resize_instance:
                self._disk_resize(info, size)
            if info['type'] == 'raw' and CONF.use_cow_images:
                self._disk_raw_to_qcow2(info['path'])
...
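
As a rough illustration of the can_resize_image check quoted above, here is a standalone sketch. It is not the nova function: it takes the current virtual size as a plain argument (virt_size stands in for what get_disk_size(image) returns), and every size other than the 42949672960 bytes seen in the log is made up. It shows that the check rejects an equal size as well as a smaller one, so the "Cannot resize image ... to a smaller size" message on its own may not prove that a shrink was requested.

```python
GIB = 1024 ** 3


def can_resize_image(virt_size, size):
    """Simplified stand-in: True only when size is strictly larger."""
    return virt_size < size


requested = 42949672960  # size= value from the log above (40 GiB)

# Hypothetical current virtual sizes, for illustration only.
for virt_size in (20 * GIB, 40 * GIB, 80 * GIB):
    verdict = "extend" if can_resize_image(virt_size, requested) else "refuse"
    print("%d GiB -> %d GiB: %s"
          % (virt_size // GIB, requested // GIB, verdict))
```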

Comment 3 Jon Thomas 2016-08-26 16:55:08 UTC
Created attachment 1194417 [details]
nova.conf

Comment 4 Jon Thomas 2016-08-29 21:45:42 UTC

I'm still not able to reproduce the exact code path on OSP7. A resize down fails, as in OSP8, with no conversion. However, a resize up does convert from raw to qcow2 unless you set use_cow_images=false. With use_cow_images=false it works as expected.

So I think at this stage, we need to go back to the customer to verify the commands used and the config.
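
The behaviour described here matches the last check in the finish_migration loop quoted in comment 2. A minimal sketch of that one condition, assuming a disk whose type is reported as 'raw' and ignoring everything else the driver does. The function name and its arguments are hypothetical: they stand in for info['type'] and for CONF.use_cow_images as read on the destination host.

```python
def raw_disk_gets_converted(disk_type, dest_use_cow_images):
    """Model of: info['type'] == 'raw' and CONF.use_cow_images."""
    return disk_type == 'raw' and dest_use_cow_images


# Illustrative combinations only; not taken from a real deployment.
for disk_type, use_cow in (('raw', True), ('raw', False), ('qcow2', True)):
    action = ('convert to qcow2'
              if raw_disk_gets_converted(disk_type, use_cow)
              else 'leave as is')
    print("type=%s use_cow_images=%s -> %s" % (disk_type, use_cow, action))
```

Under this model only the destination host's setting matters, so a raw disk is converted whenever that host has use_cow_images enabled.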

Comment 5 Jon Thomas 2016-09-01 17:40:49 UTC
They resized from cpt13 to cpt21. I see that use_cow_images is set differently on the two hosts. Asked the customer to set use_cow_images=false on cpt21 and retry. Waiting for feedback.

major differences from config:

nova.conf_ops-lin-vol3-cpt13

snapshot_image_format=raw
images_type=raw
use_cow_images=false

nova.conf_ops-lin-cpt-21

snapshot_image_format=raw
images_type=rbd
use_cow_images=true
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = b72604f8-eb06-44f9-a16f-f90aa502d131
inject_password = false
inject_key = false
inject_partition = -2
hw_disk_discard = unmap
libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
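
One way to spot this kind of mismatch up front is to diff the relevant options between hosts. A minimal sketch, assuming the option values listed above and, for simplicity, that they all sit under a single [DEFAULT] section (in a real nova.conf some of them may live in another section such as [libvirt]). The helper names are hypothetical.

```python
import configparser


def load_defaults(text):
    """Parse config text and return the [DEFAULT] options as a dict."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return dict(parser.defaults())


def diff_options(first, second, keys):
    """Return {key: (first_value, second_value)} for keys that differ."""
    return {key: (first.get(key), second.get(key))
            for key in keys if first.get(key) != second.get(key)}


# Values copied from the two configs above; the section header is assumed.
CPT13_CONF = """[DEFAULT]
snapshot_image_format=raw
images_type=raw
use_cow_images=false
"""

CPT21_CONF = """[DEFAULT]
snapshot_image_format=raw
images_type=rbd
use_cow_images=true
"""

KEYS = ("snapshot_image_format", "images_type", "use_cow_images")
mismatches = diff_options(load_defaults(CPT13_CONF),
                          load_defaults(CPT21_CONF), KEYS)
for key, (src, dst) in sorted(mismatches.items()):
    print("%s: cpt13=%s cpt21=%s" % (key, src, dst))
```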

Comment 6 Matthew Booth 2016-09-02 08:38:30 UTC
(In reply to Jon Thomas from comment #5)
> They resized from cpt13 to cpt 21. I see that use_cow_images is set
> differently on both hosts.  Asked cust to set use_cow_images=false on cpt21
> and retry. Waiting for feedback.

I was going to ask you to check this exact thing. Thanks for beating me to it.

Unfortunately this isn't a supported or supportable configuration (yet). Maybe in a few releases' time we'll be able to support this. We certainly can't address this in RHOS 7.

I'm going to close this BZ as it's a configuration issue.

> 
> major differences from config:
> 
> nova.conf_ops-lin-vol3-cpt13
> 
> snapshot_image_format=raw
> images_type=raw
> use_cow_images=false
> 
> nova.conf_ops-lin-cpt-21
> 
> snapshot_image_format=raw
> images_type=rbd
> use_cow_images=true
> images_rbd_pool = vms
> images_rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_user = cinder
> rbd_secret_uuid = b72604f8-eb06-44f9-a16f-f90aa502d131
> inject_password = false
> inject_key = false
> inject_partition = -2
> hw_disk_discard = unmap
> libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,
> VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,
> VIR_MIGRATE_TUNNELLED"

Comment 7 awaugama 2017-08-30 17:56:54 UTC
WONTFIX/NOTABUG, therefore QE won't automate.