Bug 1683175

Summary: Don't emit a log error message if ovs-vsctl doesn't support a particular statistic type on a vif
Product: Red Hat Enterprise Linux 7 Reporter: Aviv Guetta <aguetta>
Component: libvirtAssignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Jing Qi <jinqi>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.5CC: akaris, berrange, chhu, dyuan, jdenemar, jkalliya, jsuchane, laine, lmen, xuzhang, yocha
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-4.5.0-16.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 13:14:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Aviv Guetta 2019-02-26 11:59:23 UTC
Description of problem:
We see the following error recurring in messages.log:

Jan 30 13:50:26 hostname ovs-vsctl[33518]: ovs|00001|db_ctl_base|ERR|no key "tx_errors" in Interface record "vhu6111cccc-e9" column statistics
Jan 30 13:50:26 hostname libvirtd[5829]: 2019-01-30 12:50:26.318+0000: 5849: error : virCommandWait:2601 : internal error: Child process (ovs-vsctl --timeout=5 get Interface vhu6111cccc-e9 statistics:tx_errors) unexpected exit status 1: ovs-vsctl: no key "tx_errors" in Interface record "vhu6111cccc-e9" column statistics

It looks like a regression after this issue was fixed (RHEL 7.4) in the following Bugzilla:

  https://bugzilla.redhat.com/1461270
(it has a duplicate which is closer to the actual bug: https://bugzilla.redhat.com/show_bug.cgi?id=1461270)

sosreports are available and will be shared in a private comment.


Version-Release number of selected component (if applicable):

Red Hat OpenStack Platform 10
Red Hat Enterprise Linux Server release 7.5
kernel-3.10.0-862.6.3.el7.x86_64
libvirt-3.9.0-14.el7_5.6.x86_64

Comment 2 Daniel Berrangé 2019-02-26 12:30:39 UTC
What, if any, real functional impact are you seeing ?

The previous bug 1461270 was a problem where libvirt would ask  ovs-vsctl for all statistics at once & report a fatal error if any are missing. This was a problem as it meant no stats would be reported by libvirt to the calling application.

The patch for that bug changed libvirt to ask ovs-vsctl for each stat in turn, and if any of them generate an error, it will ignore that stat & carry on with the next stat.

When this happens a log messge will appear still as shown, but the other stats should still be reported back to the application.

IOW, the quoted log messages are harmless & I don't think there's a real regression here.

Comment 3 Aviv Guetta 2019-02-26 13:15:36 UTC
You're right regarding the impact (this is why this BZ severity is medium) as currently, it's just flooding the log, as such, it can't be ignored.

Comment 5 Daniel Berrangé 2019-02-26 13:20:37 UTC
Libvirt could probably be improved to avoid the log message it produces (the libvirtd one).  

The log message produced by ovs-vsctl is outside libvirt's control though. Probably ovs-vsctl needs a separate bug to request that it does not emit this log message at all, or at least only log it if "debug" type messages are requested.

Comment 8 Jing Qi 2019-03-06 03:06:19 UTC
Reproduced in rhos13 & rhel7.6 env -

2019-02-19 09:32:28.826+0000: 1021156: info : libvirt version: 4.5.0, package: 10.el7_6.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-11-08-09:31:37, x86-020.build.eng.bos.redhat.com)
2019-02-19 09:32:28.826+0000: 1021156: info : hostname: overcloud-computeovsdpdk-0
2019-02-19 09:32:28.826+0000: 1021156: error : virCommandWait:2600 : internal error: Child process (ovs-vsctl --timeout=5 get Interface vhu992a8944-00 statistics:tx_errors) unexpected exit status 1: ovs-vsctl: no key "tx_errors" in Interface record "vhu992a8944-00" column statistics

Comment 9 Jijesh Kalliyat 2019-03-26 17:54:07 UTC
Daniel, 

FYI, there is a comment Aaron Conole in https://bugzilla.redhat.com/show_bug.cgi?id=1684061#c4 that ovs-vsctl already can be told not to emit the message "ovs-vsctl: ovs|00001|db_ctl_base|ERR|no key "tx_errors" in Interface record "vhud1a9329f-78" column statistics" by using --if-exists flag. 

As pet the logs, the command issued by libvirt is "ovs-vsctl --timeout=5 get Interface vhu6111cccc-e9 statistics:tx_errors"
Simply add the --if-exists flag.
ovs-vsctl --timeout=5 --if-exists get Interface vhu6111cccc-e9 statistics:tx_errors

This will not log an error re: missing keys if the key isn't found.

Daniel/Jing, do we know in what situation libvirt poll the vhu statistics information? How we can modify the libvirt to include "--if-exists" option?

Regards,
Jijesh Kalliyat

Comment 10 Laine Stump 2019-03-27 19:08:47 UTC
I posted a patch upstream:

https://www.redhat.com/archives/libvir-list/2019-March/msg01897.html

(BTW, the error would occur whenever someone called virDomainGetInterfaceStats() (e.g., when "virsh domifstat guest ifname" is run). Presumably some part of OpenStack is calling that periodically.)

Comment 12 Laine Stump 2019-03-28 15:22:04 UTC
The patch was pushed upstream. It will be included in libvirt-5.2.0.

commit 3f7cba3f5ea1731b9028b89b671cbd7e7d5e0421 (HEAD -> master, origin/master, origin/HEAD)
Author: Laine Stump <laine>
Date:   Wed Mar 27 14:58:45 2019 -0400

    util: suppress unimportant ovs-vsctl errors when getting interface stats

Comment 19 Jing Qi 2019-06-17 07:52:41 UTC
Verified with 
libvirt-4.5.0-22.virtcov.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7.x86_64

Start two guests with one vhostuser interface each
1. <interface type='vhostuser'>
      <mac address='52:54:00:c6:3b:96'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>

2.  <interface type='vhostuser'>
      <mac address='52:54:12:c6:3b:93'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface>

the two guests can ping each other


# virsh domifstat 13 vhost-user2
vhost-user2 rx_bytes 15282
vhost-user2 rx_packets 111
vhost-user2 rx_drop 0
vhost-user2 tx_bytes 13642
vhost-user2 tx_packets 105
vhost-user2 tx_errs 0
vhost-user2 tx_drop 0

# virsh domifstat 14 vhost-user1
vhost-user1 rx_bytes 13642
vhost-user1 rx_packets 105
vhost-user1 rx_drop 0
vhost-user1 tx_bytes 15282
vhost-user1 tx_packets 111
vhost-user1 tx_errs 0
vhost-user1 tx_drop 0


# ovs-vsctl get Interface vhost-user2 statistics
{"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0, "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=36, "rx_256_to_511_packets"=22, "rx_512_to_1023_packets"=0, "rx_65_to_127_packets"=47, rx_bytes=13642, rx_dropped=0, rx_errors=0, rx_packets=105, tx_bytes=15282, tx_dropped=0, tx_packets=111}
# ovs-vsctl get Interface vhost-user1 statistics
{"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0, "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=30, "rx_256_to_511_packets"=25, "rx_512_to_1023_packets"=0, "rx_65_to_127_packets"=56, rx_bytes=15282, rx_dropped=0, rx_errors=0, rx_packets=111, tx_bytes=13642, tx_dropped=0, tx_packets=105}

[# ovs-vsctl get Interface vhost-user1 statistics:tx_errors
ovs-vsctl: no key "tx_errors" in Interface record "vhost-user1" column statistics

# ovs-vsctl --if-exists get Interface vhost-user1 statistics:tx_errors

#

There is no return from the command and no record in the log file.

Comment 23 Jing Qi 2019-07-18 08:32:24 UTC
The bug has been fixed in rhel7.7 (comment 19  libvirt-4.5.0-22.virtcov.el7.x86_64) & libvirt-5.2.0 (from comment 12).

Comment 25 errata-xmlrpc 2019-08-06 13:14:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294