+++ This bug was initially created as a clone of Bug #1545324 +++
+++ This bug was initially created as a clone of Bug #1463897 +++
Description of problem:
The I/O latency of a Cinder volume increases significantly after live migration of the instance it is attached to, and it stays increased until the VM is stopped and started again. [The VM is booted from a Cinder volume.]
This is not the case when using a disk from the Nova store backend [without a Cinder volume], or at least the difference is not nearly as large after a live migration.
The backend is Ceph 2.0.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a VM with a Cinder volume attached and live migrate it.
2. Check the I/O latency of the volume before and after the migration using ioping from inside the guest (e.g. `ioping -c 10 /dev/vdb`, adjusting the device name to the attached volume).
Actual results:
Expected results:
Additional info:
The I/O latency of a Cinder volume after live migration of the instance to which it is attached increases significantly, and stays increased until the VM is stopped and started again.
--- Additional comment from Kashyap Chamarthy on 2017-07-29 11:21:04 EDT ---
The patch for Git master has been merged.
The upstream stable/newton backport is in progress here:
https://review.openstack.org/#/c/488959/
--- Additional comment from Kashyap Chamarthy on 2017-09-18 09:37:30 EDT ---
Verification notes for this bug:
*Without* this bug fix (i.e. with a version older than openstack-nova-14.0.8-2.el7ost), when you
migrate a Nova instance with a Cinder volume -- where both the Nova
instance's disk and the Cinder volume are on Ceph -- the cache value for
the Cinder volume (erroneously) changes from 'writeback' to 'none':
[Check by doing `ps -ef | grep qemu`, and look for the relevant QEMU
process associated with the Nova instance.]
Pre-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=writeback
Post-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=none
*With* the bug fix (from openstack-nova-14.0.8-2.el7ost), the cache
value for the Cinder volume should remain 'writeback':
Pre-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=writeback
Post-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=writeback
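As an aside, the same check can be scripted instead of eyeballing the `ps -ef | grep qemu` output. The snippet below is only an illustrative sketch (it is not part of the fix or of any product tooling); it assumes a Linux compute node where the QEMU processes are visible to the user running it, and prints the cache= option of every -drive argument of every qemu process so the values can be compared before and after the migration.

# Sketch only: list the cache= setting of each -drive argument of every
# running QEMU process (alternative to `ps -ef | grep qemu`).
import glob

for path in glob.glob('/proc/[0-9]*/cmdline'):
    try:
        with open(path, 'rb') as f:
            args = f.read().split(b'\0')
    except (IOError, OSError):
        continue  # the process exited while we were scanning
    if not args or b'qemu' not in args[0]:
        continue
    pid = path.split('/')[2]
    for flag, value in zip(args, args[1:]):
        if flag != b'-drive':
            continue
        opts = value.decode(errors='replace').split(',')
        cache = next((o for o in opts if o.startswith('cache=')),
                     'cache=<not set>')
        print(pid, opts[0], cache)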
--- Additional comment from Martin Schuppert on 2018-02-14 11:25:30 EST ---
OSP8 is also affected by this:
# rpm -q openstack-nova-compute
openstack-nova-compute-12.0.6-21.el7ost.noarch
* before migration:
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='475b69d9-9ea3-4356-ac22-762b17a875e3'/>
      </auth>
      <source protocol='rbd' name='osp8-vms/9715a493-60be-4d76-9d4c-34b37dad7366_disk'>
        <host name='192.168.122.5' port='6789'/>
        <host name='192.168.122.6' port='6789'/>
        <host name='192.168.122.7' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='475b69d9-9ea3-4356-ac22-762b17a875e3'/>
      </auth>
      <source protocol='rbd' name='osp8-volumes/volume-ce556e6c-dab1-40c2-b186-762d1f8afd4e'>
        <host name='192.168.122.5' port='6789'/>
        <host name='192.168.122.6' port='6789'/>
        <host name='192.168.122.7' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>ce556e6c-dab1-40c2-b186-762d1f8afd4e</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
* after migration:
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='475b69d9-9ea3-4356-ac22-762b17a875e3'/>
      </auth>
      <source protocol='rbd' name='osp8-vms/9715a493-60be-4d76-9d4c-34b37dad7366_disk'>
        <host name='192.168.122.5' port='6789'/>
        <host name='192.168.122.6' port='6789'/>
        <host name='192.168.122.7' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='cinder'>
        <secret type='ceph' uuid='475b69d9-9ea3-4356-ac22-762b17a875e3'/>
      </auth>
      <source protocol='rbd' name='osp8-volumes/volume-ce556e6c-dab1-40c2-b186-762d1f8afd4e'>
        <host name='192.168.122.5' port='6789'/>
        <host name='192.168.122.6' port='6789'/>
        <host name='192.168.122.7' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>ce556e6c-dab1-40c2-b186-762d1f8afd4e</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
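The same before/after comparison can also be scripted against the domain XML rather than the process list. Below is a minimal sketch, assuming the output of `virsh dumpxml <instance>` has been saved to a file; the file name domain.xml is only a placeholder.

# Sketch only: print target device, source name and cache mode of every
# disk in a libvirt domain XML dump, e.g. `virsh dumpxml <instance> > domain.xml`.
import xml.etree.ElementTree as ET

tree = ET.parse('domain.xml')  # placeholder file name
for disk in tree.findall('.//devices/disk'):
    driver = disk.find('driver')
    source = disk.find('source')
    target = disk.find('target')
    print(target.get('dev') if target is not None else '?',
          source.get('name') if source is not None else '?',
          driver.get('cache') if driver is not None else '?')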
Works with the following change to nova/virt/libvirt/driver.py:
# diff -u driver.py.org driver.py
--- driver.py.org 2018-02-14 11:00:23.986251918 -0500
+++ driver.py 2018-02-14 11:12:07.310126939 -0500
@@ -1074,8 +1074,10 @@
         driver.disconnect_volume(connection_info, disk_dev)
 
     def _get_volume_config(self, connection_info, disk_info):
-        driver = self._get_volume_driver(connection_info)
-        return driver.get_config(connection_info, disk_info)
+        vol_driver = self._get_volume_driver(connection_info)
+        conf = vol_driver.get_config(connection_info, disk_info)
+        self._set_cache_mode(conf)
+        return conf
 
     def _get_volume_encryptor(self, connection_info, encryption):
         encryptor = encryptors.get_volume_encryptor(connection_info,
@@ -1119,7 +1121,6 @@
             instance, CONF.libvirt.virt_type, image_meta, bdm)
         self._connect_volume(connection_info, disk_info)
         conf = self._get_volume_config(connection_info, disk_info)
-        self._set_cache_mode(conf)
 
         try:
             state = guest.get_power_state(self._host)
@@ -3489,9 +3490,6 @@
             vol['connection_info'] = connection_info
             vol.save()
 
-        for d in devices:
-            self._set_cache_mode(d)
-
         if image_meta.properties.get('hw_scsi_model'):
             hw_scsi_model = image_meta.properties.hw_scsi_model
             scsi_controller = vconfig.LibvirtConfigGuestController()
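For context on why relocating the _set_cache_mode() call helps: _get_volume_config() is also used when the guest XML is rebuilt for the destination host during a live migration, whereas the removed call sites only covered volume attach and the initial guest definition. In nova's libvirt driver, _set_cache_mode() overrides the disk's cache mode with the value configured for its source type in CONF.libvirt.disk_cachemodes (rbd-backed Cinder volumes are 'network' disks); when the call is skipped, the disk keeps its default and ends up with cache='none'. The snippet below is a simplified, self-contained sketch of that behaviour, not the literal nova code; names mirror nova/virt/libvirt/driver.py but the real implementation differs between releases.

# Simplified sketch of the behaviour described above.

# Parsed form of CONF.libvirt.disk_cachemodes, e.g. ["network=writeback"].
DISK_CACHEMODES = {'network': 'writeback'}


class GuestDiskConfig(object):
    """Stand-in for nova's LibvirtConfigGuestDisk, for illustration only."""
    def __init__(self, source_type, driver_cache='none'):
        self.source_type = source_type    # 'network' for rbd volumes
        self.driver_cache = driver_cache  # default when nothing overrides it


def set_cache_mode(conf):
    """Apply the configured cache mode for this disk's source type."""
    conf.driver_cache = DISK_CACHEMODES.get(conf.source_type,
                                            conf.driver_cache)


def get_volume_config(conf):
    # With the fix, the override is applied here, so every caller --
    # attach_volume, the initial guest config, and the XML generated for
    # the live-migration destination -- produces the same cache setting.
    set_cache_mode(conf)
    return conf


if __name__ == '__main__':
    disk = GuestDiskConfig(source_type='network')
    print(get_volume_config(disk).driver_cache)  # -> 'writeback'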
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:2855