Bug 1977582

Summary: NBD+tls performance regression
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
Sub component: NBD
Version: 9.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Keywords: Regression, Triaged
Target Milestone: rc
Reporter: zixchen
Assignee: Eric Blake <eblake>
QA Contact: Tingting Mao <timao>
CC: coli, eblake, jinzhao, juzhang, kkiwi, rjones, timao, virt-maint
Last Closed: 2022-02-08 06:46:38 UTC

Description zixchen 2021-06-30 06:47:06 UTC
Description of problem:
Booting a guest whose system disk is accessed over NBD+TLS takes more than 10 minutes. Running a performance test in the guest, the IOPS on RHEL 8.5 are about half of those on RHEL 8.4.

Version-Release number of selected component (if applicable):
8.5 version:
kernel-4.18.0-310.el8.x86_64
qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64

8.4 version:
qemu-kvm-5.2.0-13.module+el8.4.0+10397+65cef07b.x86_64
kernel-4.18.0-305.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
The performance test is run on the same host (1 Gb network, SSD disk), reinstalled with RHEL 8.4 and RHEL 8.5 in turn.
1. Boot a guest on the host (see the TLS client sketch after step 2):
...
-m 15360 \
-object memory-backend-ram,size=15360M,id=mem-machine_mem  \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
-object tls-creds-x509,id=image1_access,endpoint=client,dir=/etc/pki/qemu-kvm \
-blockdev node-name=nbd_image1,driver=nbd,auto-read-only=on,discard=unmap,server.type=inet,server.host=hp-dl385g10-15.lab.eng.pek2.redhat.com,server.port=10809,tls-creds=image1_access,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=raw,read-only=off,cache.direct=on,cache.no-flush=off,file=nbd_image1 \
...

2. Log in to the guest and run the performance test:
#  fio --filename=sda --direct=1 --bs=64K --size=2G --name=test --iodepth=4 --ioengine=libaio --rw=randrw
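
For reference, the tls-creds-x509,endpoint=client object in step 1 expects ca-cert.pem (plus client-cert.pem and client-key.pem when the server verifies client certificates) in the dir= directory. A minimal sketch for probing the TLS-enabled export from the client host before booting the guest, reusing the server and credential directory from step 1 (the legacy host/port spelling addresses the same server as the -blockdev line above):

# check that the NBD export is reachable over TLS without starting the guest
qemu-img info \
    --object tls-creds-x509,id=tls0,endpoint=client,dir=/etc/pki/qemu-kvm \
    --image-opts driver=nbd,host=hp-dl385g10-15.lab.eng.pek2.redhat.com,port=10809,tls-creds=tls0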


Actual results:
On RHEL 8.5, the read/write IOPS are only about half of those on RHEL 8.4.

RHEL 8.5: read IOPS = 572, write IOPS = 572
# fio --filename=sda --direct=1 --bs=64K --size=2G --name=test --iodepth=4 --ioengine=libaio --rw=randrw
test: (g=0): rw=randrw, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=46.6MiB/s,w=46.1MiB/s][r=745,w=737 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=7228: Tue Jun 29 22:18:44 2021
  read: IOPS=572, BW=35.8MiB/s (37.5MB/s)(1024MiB/28614msec)
    slat (usec): min=18, max=511, avg=43.31, stdev=21.98
    clat (usec): min=760, max=181541, avg=2376.29, stdev=7533.66
     lat (usec): min=816, max=181625, avg=2420.27, stdev=7533.88
    clat percentiles (usec):
     |  1.00th=[   881],  5.00th=[  1029], 10.00th=[  1106], 20.00th=[  1237],
     | 30.00th=[  1352], 40.00th=[  1483], 50.00th=[  1631], 60.00th=[  1795],
     | 70.00th=[  1958], 80.00th=[  2147], 90.00th=[  2376], 95.00th=[  2573],
     | 99.00th=[  6652], 99.50th=[ 64226], 99.90th=[112722], 99.95th=[129500],
     | 99.99th=[181404]
   bw (  KiB/s): min=17188, max=55441, per=100.00%, avg=36992.57, stdev=9226.01, samples=56
   iops        : min=  268, max=  866, avg=577.80, stdev=144.20, samples=56
  write: IOPS=572, BW=35.8MiB/s (37.5MB/s)(1024MiB/28614msec); 0 zone resets
    slat (usec): min=21, max=587, avg=5

Comment 1 Klaus Heinrich Kiwi 2021-07-01 21:14:39 UTC
Assigning to Eric, but I'll adjust the priority to 'medium' for the time being since it shouldn't be a functional / blocking issue. Still, needs to be fixed before GA.

Comment 2 Richard W.M. Jones 2021-07-02 07:32:28 UTC
Is qemu-nbd used as the server?  Is it the same version of qemu-nbd
both times, or did the server get upgraded from 8.4->8.5 as well?

Comment 3 zixchen 2021-07-02 08:21:59 UTC
(In reply to Richard W.M. Jones from comment #2)
> Is qemu-nbd used as the server?  Is it the same version of qemu-nbd
> both times, or did the server get upgraded from 8.4->8.5 as well?

Yes, the NBD server and the host are the same machine; qemu-nbd is used to export the images on the host:
 710737 ?        Ssl    0:00 qemu-nbd -t -p 10819 -e 3 --fork -f raw --object tls-creds-x509,id=nbd_stg_raw_server,endpoint=server,dir=/etc/pki/qemu-nbd --tls-creds nbd_stg_raw_server /home/nbd/nbd_stg.raw
 710744 ?        Ssl    0:00 qemu-nbd -t -p 10820 -e 3 --fork -f raw --object tls-creds-x509,id=nbd_stg_qcow2_server,endpoint=server,dir=/etc/pki/qemu-nbd --tls-creds nbd_stg_qcow2_server /home/nbd/nbd_stg.qcow2
 710751 ?        Ssl    0:00 qemu-nbd -t -p 10821 -e 3 --fork -f raw --object tls-creds-x509,id=nbd_stg_luks_server,endpoint=server,dir=/etc/pki/qemu-nbd --tls-creds nbd_stg_luks_server /home/nbd/nbd_stg.luks
 710758 ?        Ssl    1:58 qemu-nbd -t -e 3 --fork -f raw --object tls-creds-x509,id=nbd_system_raw_server,endpoint=server,dir=/etc/pki/qemu-nbd --tls-creds nbd_system_raw_server /home/nbd/nbd_system.raw
 710765 ?        Ssl    0:00 qemu-nbd -t -p 10822 -e 3 --fork -f raw --object tls-creds-x509,id=nbd_data_raw_server,endpoint=server,dir=/etc/pki/qemu-nbd --tls-creds nbd_data_raw_server /home/nbd/nbd_data.raw

When running the performance test on RHEL 8.4 or RHEL 8.5, I reinstall both the host and the guest, so the qemu-nbd version is not the same (it matches the host release).
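
For reference, a minimal sketch of how a tls-creds-x509 server directory such as /etc/pki/qemu-nbd above can be populated: QEMU expects the files ca-cert.pem, server-cert.pem and server-key.pem there. The commands below use GnuTLS certtool; the CA/organization names are placeholders, and the cn must match the host name the client connects to (hp-dl385g10-15.lab.eng.pek2.redhat.com in this setup).

# CA key and self-signed CA certificate (placeholder names)
certtool --generate-privkey > ca-key.pem
cat > ca.info <<EOF
cn = NBD test CA
ca
cert_signing_key
EOF
certtool --generate-self-signed --load-privkey ca-key.pem --template ca.info --outfile ca-cert.pem

# server key and certificate signed by the CA
certtool --generate-privkey > server-key.pem
cat > server.info <<EOF
organization = NBD test
cn = hp-dl385g10-15.lab.eng.pek2.redhat.com
tls_www_server
encryption_key
signing_key
EOF
certtool --generate-certificate --load-privkey server-key.pem \
    --load-ca-certificate ca-cert.pem --load-ca-privkey ca-key.pem \
    --template server.info --outfile server-cert.pem

# install the three files qemu-nbd looks for in the dir= directory
install -m 0600 ca-cert.pem server-cert.pem server-key.pem /etc/pki/qemu-nbd/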

Comment 6 John Ferlan 2021-09-08 21:46:46 UTC
Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 8 Klaus Heinrich Kiwi 2022-01-27 17:02:49 UTC
(In reply to Klaus Heinrich Kiwi from comment #1)
> Assigning to Eric, but I'll adjust the priority to 'medium' for the time
> being since it shouldn't be a functional / blocking issue. Still, needs to
> be fixed before GA.

The chances for us to fix this before GA are slim now. @zixchen can you try to reproduce this with the latest versions, just so that we know if this could be a problem or not? We are not aware of Layered Products using 9.0 GA, so it's down to how much this could affect end-users realistically.

Comment 9 zixchen 2022-01-28 02:44:40 UTC
(In reply to Klaus Heinrich Kiwi from comment #8)
> (In reply to Klaus Heinrich Kiwi from comment #1)
> > Assigning to Eric, but I'll adjust the priority to 'medium' for the time
> > being since it shouldn't be a functional / blocking issue. Still, needs to
> > be fixed before GA.
> 
> The chances for us to fix this before GA are slim now. @zixchen
> can you try to reproduce this with the latest versions, just so that we know
> if this could be a problem or not? We are not aware of Layered Products
> using 9.0 GA, so it's down to how much this could affect end-users
> realistically.

Tingting handles NBD now, I'll move the needinfo to her, thanks.

Comment 10 Tingting Mao 2022-02-08 06:46:38 UTC
Tried with the latest RHEL 9; the performance issue is no longer present, so I am closing this bug as CURRENTRELEASE.


Tested with:
qemu-kvm-6.2.0-7.el9
kernel-5.14.0-54.kpq0.el9.x86_64


Steps:
1. Boot up a guest whose system disk is served over NBD
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem  \
    -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Server',+kvm_pv_unhalt \
    -chardev socket,server=on,id=qmp_id_qmpmonitor1,path=/tmp/avocado_inyzqjyp/monitor-qmpmonitor1-20220207-031531-Zi3WK3kR,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server=on,id=qmp_id_catch_monitor,path=/tmp/avocado_inyzqjyp/monitor-catch_monitor-20220207-031531-Zi3WK3kR,wait=off  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idBIssWG \
    -chardev socket,server=on,id=chardev_serial0,path=/tmp/avocado_inyzqjyp/serial-serial0-20220207-031531-Zi3WK3kR,wait=off \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20220207-031531-Zi3WK3kR,path=/tmp/avocado_inyzqjyp/seabios-20220207-031531-Zi3WK3kR,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220207-031531-Zi3WK3kR,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object tls-creds-x509,id=image1_access,endpoint=client,dir=/etc/pki/qemu-kvm \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=nbd_image1,driver=nbd,auto-read-only=on,discard=unmap,server.type=inet,server.host=ibm-x3650m5-09.lab.eng.pek2.redhat.com,server.port=10809,tls-creds=image1_access,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=raw,read-only=off,cache.direct=on,cache.no-flush=off,file=nbd_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:2c:8b:39:d4:f3,id=idLCvXi5,netdev=id9JZrqK,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=id9JZrqK,vhost=on \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=d,strict=off  \
    -no-shutdown \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

2. Run a fio test in the guest
(guest)# fio --filename=sda --direct=1 --bs=64K --size=2G --name=test --iodepth=4 --ioengine=libaio --rw=randrw
test: (g=0): rw=randrw, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=4
fio-3.19
Starting 1 process
test: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=178MiB/s,w=174MiB/s][r=2849,w=2778 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1822: Tue Feb  8 14:11:32 2022
  read: IOPS=2231, BW=139MiB/s (146MB/s)(1024MiB/7346msec)
    slat (usec): min=5, max=130, avg=14.25, stdev= 7.26
    clat (usec): min=181, max=41673, avg=644.49, stdev=2976.26
     lat (usec): min=194, max=41684, avg=658.96, stdev=2976.17
    clat percentiles (usec):
     |  1.00th=[  227],  5.00th=[  258], 10.00th=[  281], 20.00th=[  314],
     | 30.00th=[  347], 40.00th=[  383], 50.00th=[  412], 60.00th=[  445],
     | 70.00th=[  482], 80.00th=[  529], 90.00th=[  594], 95.00th=[  652],
     | 99.00th=[  865], 99.50th=[40633], 99.90th=[41681], 99.95th=[41681],
     | 99.99th=[41681]
   bw (  KiB/s): min=95425, max=193405, per=100.00%, avg=144890.36, stdev=28095.45, samples=14
   iops        : min= 1491, max= 3021, avg=2263.36, stdev=438.90, samples=14
  write: IOPS=2229, BW=139MiB/s (146MB/s)(1024MiB/7346msec); 0 zone resets
    slat (usec): min=5, max=760, avg=15.59, stdev= 9.38
    clat (usec): min=202, max=41912, avg=1115.29, stdev=5119.20
     lat (usec): min=227, max=41941, avg=1131.12, stdev=5119.18
    clat percentiles (usec):
     |  1.00th=[  269],  5.00th=[  310], 10.00th=[  338], 20.00th=[  371],
     | 30.00th=[  396], 40.00th=[  420], 50.00th=[  445], 60.00th=[  469],
     | 70.00th=[  498], 80.00th=[  545], 90.00th=[  627], 95.00th=[  709],
     | 99.00th=[41157], 99.50th=[41157], 99.90th=[41681], 99.95th=[41681],
     | 99.99th=[41681]
   bw (  KiB/s): min=87888, max=191361, per=100.00%, avg=144356.57, stdev=27268.90, samples=14
   iops        : min= 1373, max= 2990, avg=2255.07, stdev=426.09, samples=14
  lat (usec)   : 250=2.15%, 500=70.47%, 750=24.63%, 1000=1.32%
  lat (msec)   : 2=0.35%, 50=1.07%
  cpu          : usr=1.62%, sys=6.59%, ctx=26761, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=16390,16378,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=139MiB/s (146MB/s), 139MiB/s-139MiB/s (146MB/s-146MB/s), io=1024MiB (1074MB), run=7346-7346msec
  WRITE: bw=139MiB/s (146MB/s), 139MiB/s-139MiB/s (146MB/s-146MB/s), io=1024MiB (1073MB), run=7346-7346msec

Disk stats (read/write):
    dm-0: ios=16171/16137, merge=0/0, ticks=10393/17841, in_queue=28234, util=98.57%, aggrios=16390/16378, aggrmerge=0/0, aggrticks=10554/18065, aggrin_queue=28619, aggrutil=98.41%
  sda: ios=16390/16378, merge=0/0, ticks=10554/18065, in_queue=28619, util=98.41%


Results:
After step 1, the guest boots up successfully within 1 minute.
After step 2, as shown in the output above, the read IOPS are 2231 and the write IOPS are 2229, which is much higher than the results in comment 0.
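
For a quick side-by-side of the two runs, the IOPS summary lines can be pulled out of saved fio outputs (the log file names below are hypothetical examples):

# extract the read/write IOPS summary lines from the saved fio outputs
grep -E 'read: IOPS=|write: IOPS=' fio-rhel85-qemu-6.0.log fio-rhel9-qemu-6.2.log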

Comment 11 Tingting Mao 2022-03-07 08:30:20 UTC
*** Bug 1903137 has been marked as a duplicate of this bug. ***