Bug 1476830

Summary: IO limits set to flavor are ignored
Product: Red Hat OpenStack
Reporter: Siggy Sigwald <ssigwald>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX
QA Contact: Joe H. Rahme <jhakimra>
Severity: medium
Priority: medium
Version: 9.0 (Mitaka)
CC: aludwar, dasmith, eglynn, jjoyce, kchamart, mbooth, mmethot, owalsh, sbauza, sgordon, sinan.polat, srevivo, ssigwald, stephenfin, vromanso
Keywords: Reopened, Triaged, ZStream
Target Release: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Last Closed: 2019-07-18 15:07:49 UTC
Type: Bug
Attachments: RHOS-9 reproducer (confirmation)

Description Siggy Sigwald 2017-07-31 14:45:44 UTC
Description of problem:
Flavor metadata includes disk I/O quotas that are not honored by the hypervisor.
openstack flavor show 4892675a-9c77-43af-86bf-c1fcb9dbe299
+----------------------------+--------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                    |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                        |
| disk                       | 100                                                                                                                      |
| id                         | 4892675a-9c77-43af-86bf-c1fcb9dbe299                                                                                     |
| name                       | Limit_2VCPU_2GBRAM                                                                                                       |
| os-flavor-access:is_public | True                                                                                                                     |
| properties                 | quota:disk_io_limit='100', quota:disk_read_iops_sec='100', quota:disk_total_iops_sec='100',                              |
|                            | quota:disk_write_iops_sec='100', quota:vif_inbound_average='1024', quota:vif_outbound_average='1024'                     |
| ram                        | 2048                                                                                                                     |
| rxtx_factor                | 1.0                                                                                                                      |
| swap                       |                                                                                                                          |
| vcpus                      | 2                                                                                                                        |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------+
[root@linux-test fio-2.0.14]# fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=512M --numjobs=8 --runtime=240 --group_reporting
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.0.14
Starting 8 processes
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
randwrite: Laying out IO file(s) (1 file(s) / 512MB)
Jobs: 1 (f=1): [__w_____] [31.0% done] [0K/4K/0K /s] [0 /1 /0  iops] [eta 10m:03s]       
randwrite: (groupid=0, jobs=8): err= 0: pid=8122: Mon Jul 10 18:00:40 2017
  write: io=1592.8MB, bw=6030.2KB/s, iops=1507 , runt=270357msec
    slat (usec): min=2 , max=42866K, avg=5111.46, stdev=292359.25
    clat (usec): min=0 , max=20251 , avg= 0.81, stdev=53.50
     lat (usec): min=2 , max=42866K, avg=5112.61, stdev=292359.32
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
     | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    1],
     | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
     | 99.00th=[    2], 99.50th=[    5], 99.90th=[   18], 99.95th=[   20],
     | 99.99th=[   52]
    bw (KB/s)  : min=    0, max=45085, per=15.55%, avg=937.47, stdev=2804.24
    lat (usec) : 2=98.53%, 4=0.92%, 10=0.11%, 20=0.38%, 50=0.04%
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=0.02%, sys=0.30%, ctx=129562, majf=0, minf=216
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=407571/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=1592.8MB, aggrb=6030KB/s, minb=6030KB/s, maxb=6030KB/s, mint=270357msec, maxt=270357msec

Disk stats (read/write):
  vda: ios=1/349631, merge=0/126306, ticks=0/20672451, in_queue=20672399, util=99.98%

Comment 2 Sahid Ferdjaoui 2017-08-02 09:24:04 UTC
The sosreport does not include the Nova logs at DEBUG level. In order to investigate, we need to see the domain XML generated by Nova and passed to libvirt.

In /etc/nova/nova.conf, please set debug to True. Restart the compute services and start an instance with the related flavor. Instead of a sosreport, you can just copy/paste the domain XML generated in nova-compute.log.
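
For reference, a minimal sketch of those steps (the unit name and log path assume a standard RHOS compute node; adjust to your deployment):

  # /etc/nova/nova.conf
  [DEFAULT]
  debug = True

  # restart the compute service, then boot an instance with the flavor
  $ systemctl restart openstack-nova-compute
  # the generated domain XML then shows up in /var/log/nova/nova-compute.log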

Comment 6 Sahid Ferdjaoui 2017-08-08 11:18:03 UTC
Effectively, we should see an 'iotune' element under the 'disk' element. Can you test setting integers instead of strings in the flavor extra specs?

| os-flavor-access:is_public | True                                                                                                                     |
| properties                 | quota:disk_io_limit='100', quota:disk_read_iops_sec='100', quota:disk_total_iops_sec='100',                              |
|                            | quota:disk_write_iops_sec='100', quota:vif_inbound_average='1024', quota:vif_outbound_average='1024'                     |
| ram                        | 2048    


That should look something like:

quota:disk_write_iops_sec=100, quota:vif_inbound_average=1024...
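
For example, a sketch of re-setting those extra specs without quoted values, using the flavor name from the output above:

  $ openstack flavor set Limit_2VCPU_2GBRAM \
      --property quota:disk_read_iops_sec=100 \
      --property quota:disk_write_iops_sec=100 \
      --property quota:disk_total_iops_sec=100 \
      --property quota:vif_inbound_average=1024 \
      --property quota:vif_outbound_average=1024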

Comment 7 Sahid Ferdjaoui 2017-08-25 09:48:46 UTC
Please feel free to reopen if necessary.

Comment 10 Stephen Finucane 2017-09-08 14:31:49 UTC
I'm going to take a look into this. To start, it's worth pointing out that 'quota:disk_io_limit' is not a valid option for the libvirt driver, so you can drop it. As for the rest, could I ask you to provide the following information? I know you've submitted some of it before, but I'd like to sanity check it.

1. The commands you're using to configure the flavor
2. The output of 'openstack flavor show FLAVOR-ID' for the flavor, once configured
3. The commands you're using to boot the instance
4. The output of 'openstack server show SERVER-ID' for the instance, once booted
5. The output of 'virsh dumpxml DOMAIN' for the instance, once booted
6. The nova.conf file you're using
7. A complete sosreport with logging at DEBUG level

At present, I really think this is a silly misconfiguration issue, but I'm willing to be proven wrong :)
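
For items 2, 4 and 5, a quick command sketch (FLAVOR-ID, SERVER-ID and DOMAIN are placeholders):

  $ openstack flavor show FLAVOR-ID > flavor.txt
  $ openstack server show SERVER-ID > server.txt
  $ virsh dumpxml DOMAIN            > domain.xml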

Comment 19 Stephen Finucane 2017-10-11 15:44:45 UTC
Dropping the triaged keyword, as we're no longer sure whether this is a nova issue or a libvirt issue.

Comment 24 Stephen Finucane 2017-11-10 00:19:46 UTC
Would it be possible to get the XML for this flavor too?

Comment 25 Siggy Sigwald 2017-11-10 00:37:53 UTC
(In reply to Stephen Finucane from comment #24)
> Would it be possible to get the XML for this flavor too?

I've never been asked to provide this before. Can you please tell us how to get that from the system?

Comment 26 Stephen Finucane 2017-11-10 05:09:13 UTC
Ah, sorry. It's step 5 in comment 10.

  5. The output of 'virsh dumpxml DOMAIN' for the instance, once booted

Comment 27 Kashyap Chamarthy 2017-11-10 14:30:53 UTC
(In reply to Stephen Finucane from comment #26)
> Ah, sorry. It's step 5 in comment 10.
> 
>   5. The output of 'virsh dumpxml DOMAIN' for the instance, once booted

Yes, along with that, also please get the complete QEMU command-line (and attach it as a text file to this bug) for that guest from comment#25.

It is located at the following path (on the compute node where the instance is running):

   $ /var/log/libvirt/qemu/$instance-XXXXXXXX.log
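
A rough sketch of pulling the most recent QEMU command line out of that log ($instance-XXXXXXXX is the libvirt domain name, left as a placeholder here):

   $ grep qemu-kvm /var/log/libvirt/qemu/$instance-XXXXXXXX.log | tail -n 1 > qemu-cmdline.txt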

Comment 28 Kashyap Chamarthy 2017-11-10 15:48:21 UTC
(In reply to Siggy Sigwald from comment #22)

[...]

> The test was previously sent with the flavor with the configuration
> indicated:
> 
> [root@limit-io fio-2.0.14]# fio --name=randwrite --ioengine=libaio
> --iodepth=1 --rw=randwrite --bs=4k --direct=0 --size=512M --numjobs=8
> --runtime=240 --group_reporting
> randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio,
> iodepth=1

[...]

I talked to a QEMU I/O layer expert who is also well versed with `fio`,
Stefan Hajnoczi (thank you).  Some comments.

Your goal seems to be to throttle disk read and write IOPS.

Your existing `fio` benchmark in comment#22 uses "--direct=0", so it
goes through the page cache and the I/O may never reach the disk.  Your
benchmark is therefore stressing the VFS and page cache, _not_ the disk.

Also, with "--direct=0" there is no way to predict how QEMU I/O
throttling will perform, since many requests don't even hit the disk.

Therefore, we suggest trying "--direct=1".

IOW, if you are trying to benchmark the disk, make sure requests are
being sent directly to the disk and not the software's page cache.

A couple of related recommendations from Stefan:

  - If you want to benchmark the disk it also helps to have a dedicated
    block device (e.g. /dev/vdb) and use fio filename=/dev/vdb to avoid
    going through a file system.

  - When using a whole block device with filename=/dev/vdb you can drop
    size=512M and let the random writes hit any part of the device.  
    This is a little fairer because most randomly chosen writes will be
    far away from each other.

    What this is hinting at is: if you set size=8M or something too
    small, the benchmark may just be writing to the disk's write
    cache.

  - You might also want to use 'ramp_time=30' with `fio` to let the
    workload run for a while before measurement begins (so that the
    system is in a steady state).  A combined invocation is sketched
    below.
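
A sketch putting those suggestions together (this assumes a dedicated /dev/vdb test device attached to the guest; adjust names to your setup):

  # same workload as comment#22, but with O_DIRECT, a raw block device,
  # no size cap, and a 30s ramp-up before measurement starts
  $ fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --bs=4k \
        --direct=1 --filename=/dev/vdb --numjobs=8 --runtime=240 \
        --ramp_time=30 --group_reporting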

Comment 30 Matthew Booth 2017-11-30 16:35:24 UTC
I have closed this bug as it has been waiting for more info for at least 2 weeks. We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information, please feel free to re-open this bug.

Comment 42 Kashyap Chamarthy 2018-01-12 09:23:49 UTC
Meta-comment:

I spent many hours this week trying to set up a reproducer environment
for this.  I tried local and remote test machines, but I couldn't get
RHOS-9 running on RHEL 7.4.

The methods I tried:

  - Upstream Mitaka (EOL) on Fedora-25 -- no luck; fails in many ways:
    requirements conflicts, etc.

  - Upstream Mitaka on CentOS 7.4 -- no luck; fails miserably (even the
    upstream CI job is not tended to) with conflicting requirements,
    causing you to play whack-a-mole.

  - RHOS-9 with PackStack 'all-in-one' with essential services: also 
    fails in different ways.

    * * *

TripleO / RHOS Director -- I don't have the requisite hardware to set
this up, so I'm in the process of getting a temporary environment from
our fine CI folks (thanks: Migi).

Comment 43 Kashyap Chamarthy 2018-01-12 16:57:16 UTC
Created attachment 1380486 [details]
RHOS-9 reproducer (confirmation)

It seems I managed to put together my own reproducer, where I don't even see the 'iotune' element in the guest XML (refer to the attachment).

Now to examine what's in upstream Git/master, and then work from there.
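
For reference, a quick check against a running guest (DOMAIN is a placeholder): with the quota:disk_*_iops_sec extra specs applied, libvirt's <disk> element should contain an <iotune> child carrying the configured *_iops_sec values.

  $ virsh dumpxml DOMAIN | grep -A 5 '<iotune>'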

Comment 49 Stephen Finucane 2019-07-18 15:07:49 UTC
I'm going to close this. While we were able to reproduce the issue, it is against OSP 9, which is rapidly approaching EOL, and there haven't been any updates in over a year.