Bug 1721434

Summary: virsh command delay issue on DPDK node
Product: Red Hat Enterprise Linux 8
Reporter: Meiyan Zheng <mzheng>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Luyao Huang <lhuang>
Severity: high
Docs Contact:
Priority: urgent
Version: 8.0
CC: apevec, bdobreli, cchen, cfields, cfontain, chhu, chrisw, dyuan, dzheng, fbaudin, fhallal, jdenemar, jlibosva, jraju, jsuchane, knoha, ktraynor, lhuang, maxime.coquelin, mkim, mprivozn, rbalakri, rhodain, rhos-maint, xuzhang, yalzhang
Target Milestone: rc
Keywords: Automation
Target Release: 8.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-4.5.0-31.el8
Doc Type: Bug Fix
Doc Text:
Cause: When creating a child process (e.g. to run another binary), libvirt closes file descriptors right after the child is created and keeps only a handful of them open. This is done for security, because libvirt cannot blindly trust that the binary will not misuse any FD passed to it. Previously, this so-called mass close of FDs iterated through every possible FD in the range from 3 to the open-files limit and either called close() on it or marked it to survive exec(). The problem is that only a handful of FDs need to be preserved and only very few need to be closed; the vast majority of FDs in that range were never open in the first place. While this is not a correctness problem, every close() is a syscall and therefore adds overhead, which becomes noticeable when the open-files limit is much larger than the default (1024). In that case, spawning a binary in a child process becomes very expensive time-wise.
Consequence: Fetching statistics for an OVS interface is significantly delayed.
Fix: The closing of FDs was changed. libvirt now uses the /proc filesystem to learn which FDs are actually open and iterates only over those instead of over the full range of possible FDs. Because the open FDs are a tiny fraction of that range, this takes far less time and achieves the same result.
Result: Fetching statistics is now smooth.
Story Points: ---
Clone Of:
: 1759904 (view as bug list)
Environment:
Last Closed: 2019-11-05 20:50:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1759904    

Comment 17 Jaison Raju 2019-06-25 15:41:14 UTC
I am able to reproduce this issue internally to some extent. Both a setup with 2 non-isolated cores and one with 4 isolated cores gave the same results.

[root@overcloud-computeovsdpdksriov-1 ~]# time virsh domstats
Domain: 'instance-00000004'
  state.state=1
  state.reason=5
  cpu.time=44862369137
  cpu.user=1400000000
  cpu.system=7430000000
  balloon.current=4194304
  balloon.maximum=4194304
  balloon.swap_in=0
  balloon.swap_out=0
  balloon.major_fault=544
  balloon.minor_fault=4175010
  balloon.unused=3564744
  balloon.available=3880392
  balloon.usable=3534000
  balloon.last-update=1561477060
  balloon.rss=51572
  vcpu.current=2
  vcpu.maximum=2
  vcpu.0.state=1
  vcpu.0.time=24750000000
  vcpu.0.wait=0
  vcpu.1.state=1
  vcpu.1.time=15360000000
  vcpu.1.wait=0
  net.count=1
  net.0.name=vhu407e8f22-fb
  net.0.rx.bytes=33432
  net.0.rx.pkts=329
  net.0.rx.drop=0
  net.0.tx.bytes=35683
  net.0.tx.pkts=391
  net.0.tx.errs=0
  net.0.tx.drop=0
  block.count=1
  block.0.name=vda
  block.0.path=/var/lib/nova/instances/0d9c7fc8-f038-4d32-b737-f18f8f560df3/disk
  block.0.rd.reqs=8418
  block.0.rd.bytes=207540736
  block.0.rd.times=11627362440
  block.0.wr.reqs=460
  block.0.wr.bytes=30191616
  block.0.wr.times=13813001989
  block.0.fl.reqs=134
  block.0.fl.times=11489690
  block.0.allocation=38666240
  block.0.capacity=10737418240
  block.0.physical=38608896


real	0m13.735s
user	0m0.013s
sys	0m0.017s

With ceilometer stopped, this was reduced to 7.8 seconds.

[heat-admin@overcloud-computeovsdpdksriov-1 ~]$ rpm -qa | egrep "libvirt-4|glibc|openvswitch"
openvswitch-selinux-extra-policy-1.0-9.el7fdp.noarch
openvswitch-ovn-host-2.9.0-103.el7fdp.x86_64
glibc-common-2.17-260.el7_6.4.x86_64
python-openvswitch-2.9.0-103.el7fdp.x86_64
libvirt-4.5.0-10.el7_6.6.x86_64
openvswitch-2.9.0-103.el7fdp.x86_64
openstack-neutron-openvswitch-12.0.5-11.el7ost.noarch
openvswitch-ovn-central-2.9.0-103.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-103.el7fdp.x86_64
glibc-2.17-260.el7_6.4.x86_64

Comment 26 Jaroslav Suchanek 2019-06-27 07:13:01 UTC
Can we get libvirt debug logs please?
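
For reference, one way to enable the requested debug logging in /etc/libvirt/libvirtd.conf (a sketch only; the filter set below is a suggestion and can be tuned, see the libvirt logging documentation):

# cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_filters="1:util 1:qemu 1:libvirt"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
EOF
# systemctl restart libvirtd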

Comment 30 Maxime Coquelin 2019-06-27 07:42:13 UTC
Dave Gilbert suggests attaching to the QEMU process with gdb while the issue is happening and dumping the backtraces of all its threads using:

gdb> thread apply all bt full

Do you think you can arrange that, Jaison?
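
A sketch of the full sequence, assuming the QEMU process can be found by matching the instance name on its command line (adjust the pattern to the affected guest):

# gdb -p "$(pgrep -f guest=instance-00000004 | head -1)"
gdb> set pagination off
gdb> thread apply all bt full
gdb> detach
gdb> quit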

Comment 53 Michal Privoznik 2019-07-01 06:08:20 UTC
I'm wondering whether (at least on Linux) libvirt can use /proc to find which FDs it holds open and then close only those, instead of iterating over 0..MAX. Note that in specific cases (like this one), libvirt deliberately wants to 'leak' some FDs into the child, for instance because it attaches one end of a pipe to the child process' stdio. However, if a library libvirt links with has opened a file because some other thread is using that library, then we definitely do not want to leak that FD into the child.
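
As a quick illustration of the disproportion this targets (the commands are generic and the counts will vary per host), the FDs libvirtd actually holds open can be listed from /proc and compared against its open-files limit; typically only a few dozen FDs are open versus a limit in the thousands or more:

# ls /proc/`pidof libvirtd`/fd | wc -l
# prlimit -p `pidof libvirtd` | grep NOFILE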

Comment 60 Michal Privoznik 2019-07-08 09:03:41 UTC
Patches posted upstream:

  https://www.redhat.com/archives/libvir-list/2019-July/msg00089.html

Here's a scratch build with those three patches applied:

  https://mprivozn.fedorapeople.org/ovs2/

With this build, the idea from comment 53 is implemented, so the workaround from comment 59 should no longer be needed.

Comment 64 Michal Privoznik 2019-07-16 12:05:22 UTC
Pushed upstream:

432faf259b virCommand: use procfs to learn opened FDs
c1a9bfbbba vircommand: Separate mass FD closing into a function
cc34260f5a test: Introduce virnetdevopenvswitchtest
c297eab525 virNetDevOpenvswitchInterfaceStats: Optimize for speed

v5.5.0-173-g432faf259b

Comment 69 Luyao Huang 2019-08-28 02:36:57 UTC
Verified this bug with libvirt-4.5.0-32.module+el8.1.0+4010+d6842f29.x86_64:

1. prepare an openvswitch + DPDK environment for vhost-user VIFs

# ovs-vsctl show
70ad1586-5ac7-4a4f-9898-2c239eb9886b
    Bridge "ovsbr0"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
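
For reference, a bridge and vhost-user ports like the above can be created roughly as follows, assuming OVS has already been initialized with DPDK support (dpdk-init enabled, hugepages configured):

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser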

2. check the default NOFILE limit of libvirtd

# prlimit -p `pidof libvirtd` |grep NOFILE
NOFILE     max number of open files                8192      8192 files

3. start a guest which has 2 vhost-user interfaces

# virsh dumpxml vm1

    <interface type='vhostuser'>
      <mac address='52:54:00:93:51:dd'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user2' mode='client'/>
      <target dev='vhost-user2'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:a8:55:a1'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <target dev='vhost-user1'/>
      <model type='virtio'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>

# virsh start vm1
Domain vm1 started

4. run the domstats command:

# time virsh domstats
Domain: 'vm1'
...

real	0m0.065s
user	0m0.021s
sys	0m0.006s

5. check the libvirtd debug log and the ovs-vsctl command run times, and verify that libvirtd uses only one ovs-vsctl command for each vhost-user interface:

2019-08-28 02:14:59.997+0000: 23696: debug : virCommandRunAsync:2555 : About to run ovs-vsctl --timeout=5 --if-exists --format=list --data=json --no-headings --columns=statistics list Interface vhost-user2
2019-08-28 02:14:59.999+0000: 23696: debug : virCommandRunAsync:2558 : Command result 0, with PID 23760
2019-08-28 02:15:00.013+0000: 23696: debug : virCommandRun:2406 : Result status 0, stdout: '["map",[["rx_1024_to_1522_packets",0],["rx_128_to_255_packets",0],["rx_1523_to_max_packets",0],["rx_1_to_64_packets",0],["rx_256_to_511_packets",0],["rx_512_to_1023_packets",0],["rx_65_to_127_packets",0],["rx_bytes",0],["rx_dropped",0],["rx_errors",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_packets",0]]]
' stderr: ''
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1024_to_1522_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_128_to_255_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1523_to_max_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1_to_64_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_256_to_511_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_512_to_1023_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_65_to_127_packets val=0
2019-08-28 02:15:00.013+0000: 23696: debug : virCommandRunAsync:2555 : About to run ovs-vsctl --timeout=5 --if-exists --format=list --data=json --no-headings --columns=statistics list Interface vhost-user1
2019-08-28 02:15:00.014+0000: 23696: debug : virCommandRunAsync:2558 : Command result 0, with PID 23761
2019-08-28 02:15:00.026+0000: 23696: debug : virCommandRun:2406 : Result status 0, stdout: '["map",[["rx_1024_to_1522_packets",0],["rx_128_to_255_packets",0],["rx_1523_to_max_packets",0],["rx_1_to_64_packets",0],["rx_256_to_511_packets",0],["rx_512_to_1023_packets",0],["rx_65_to_127_packets",0],["rx_bytes",0],["rx_dropped",0],["rx_errors",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_packets",0]]]
' stderr: ''

6. retest steps 4 and 5 with libvirtd running under a large NOFILE limit

# cat /usr/lib/systemd/system/libvirtd.service |grep LimitNOFILE
LimitNOFILE=1048576

# prlimit -p `pidof libvirtd` |grep NOFILE
NOFILE     max number of open files             1048576   1048576 files
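
The raised limit can be configured with a systemd drop-in instead of editing the shipped unit file; a sketch assuming standard systemd paths (the drop-in file name is arbitrary):

# mkdir -p /etc/systemd/system/libvirtd.service.d
# printf '[Service]\nLimitNOFILE=1048576\n' > /etc/systemd/system/libvirtd.service.d/nofile.conf
# systemctl daemon-reload
# systemctl restart libvirtd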

# time virsh domstats
...

real	0m0.081s
user	0m0.023s
sys	0m0.004s


# libvirtd log

2019-08-28 02:30:19.281+0000: 24272: debug : virCommandRunAsync:2555 : About to run ovs-vsctl --timeout=5 --if-exists --format=list --data=json --no-headings --columns=statistics list Interface vhost-user2
2019-08-28 02:30:19.284+0000: 24272: debug : virCommandRunAsync:2558 : Command result 0, with PID 24351
2019-08-28 02:30:19.297+0000: 24272: debug : virCommandRun:2406 : Result status 0, stdout: '["map",[["rx_1024_to_1522_packets",0],["rx_128_to_255_packets",0],["rx_1523_to_max_packets",0],["rx_1_to_64_packets",0],["rx_256_to_511_packets",0],["rx_512_to_1023_packets",0],["rx_65_to_127_packets",0],["rx_bytes",0],["rx_dropped",0],["rx_errors",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_packets",0]]]
' stderr: ''
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1024_to_1522_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_128_to_255_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1523_to_max_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_1_to_64_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_256_to_511_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_512_to_1023_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virNetDevOpenvswitchInterfaceParseStats:376 : Unused ovs-vsctl stat key=rx_65_to_127_packets val=0
2019-08-28 02:30:19.297+0000: 24272: debug : virCommandRunAsync:2555 : About to run ovs-vsctl --timeout=5 --if-exists --format=list --data=json --no-headings --columns=statistics list Interface vhost-user1
2019-08-28 02:30:19.299+0000: 24272: debug : virCommandRunAsync:2558 : Command result 0, with PID 24352
2019-08-28 02:30:19.312+0000: 24272: debug : virCommandRun:2406 : Result status 0, stdout: '["map",[["rx_1024_to_1522_packets",0],["rx_128_to_255_packets",0],["rx_1523_to_max_packets",0],["rx_1_to_64_packets",0],["rx_256_to_511_packets",0],["rx_512_to_1023_packets",0],["rx_65_to_127_packets",0],["rx_bytes",0],["rx_dropped",0],["rx_errors",0],["rx_packets",0],["tx_bytes",0],["tx_dropped",0],["tx_packets",0]]]
' stderr: ''

Comment 74 errata-xmlrpc 2019-11-05 20:50:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345

Comment 75 Dan Zheng 2021-02-10 03:41:34 UTC
Covered in automated test:
# avocado run --vt-type libvirt virtual_network.iface_options.iface_type.type_vhostuser
JOB ID     : 23e2ff8128d5fa6c8b2ab2cab0835e04c57e986d
JOB LOG    : /root/avocado/job-results/job-2020-09-22T00.37-23e2ff8/job.log
 (1/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.queue_size_check: PASS (56.37 s)
 (2/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.multi_queue.hotplug: PASS (56.44 s)
 (3/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.multi_queue.passthrough: PASS (179.81 s)
 (4/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.multi_guests.statistics_check: PASS (233.26 s)
 (5/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.multi_guests.interface_check.with_share_mem: PASS (224.36 s)
 (6/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.multi_guests.interface_check.without_share_mem: PASS (216.73 s)
 (7/7) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_type.type_vhostuser.check_performance: PASS (187.43 s)
RESULTS    : PASS 7 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 1155.91 s

# avocado run --vt-type libvirt virtual_network.iface_options.iface_driver.driver_vhost_rx_tx
JOB ID     : 3046a4772a9102f36050ee91cbf6aaef0445a09e
JOB LOG    : /root/avocado/job-results/job-2020-09-22T03.01-3046a47/job.log
 (1/1) type_specific.io-github-autotest-libvirt.virtual_network.iface_options.iface_driver.driver_vhost_rx_tx: PASS (48.04 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 49.24 s