Bug 2172574

Summary: Networking receive/transmit_bytes metrics values are swapped
Product: Red Hat Enterprise Linux 9
Reporter: Michal Privoznik <mprivozn>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
libvirt sub component: General
QA Contact: yalzhang <yalzhang>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: akalenyu, chhu, dzheng, haizhao, ibezukh, jdenemar, jsuchane, jvilaca, lmen, mprivozn, ngavrilo, sradco, stirabos, virt-maint, ymankad
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-8.10.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2169168
Environment:
Last Closed: 2023-05-09 07:27:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2169168, 2172578, 2192909

Description Michal Privoznik 2023-02-22 15:10:16 UTC
+++ This bug was initially created as a clone of Bug #2169168 +++

Description of problem:
Networking receive/transmit_bytes metrics values are swapped

Version-Release number of selected component (if applicable):
CNV 4.12.1

How reproducible:
100%

Steps to Reproduce:
1. Create VM
2. Download large image

Actual results:
The transmit counter goes up.

Expected results:
The receive counter goes up.

Additional info:
Originally reported in issue https://github.com/kubevirt/kubevirt/issues/9129
Note that this is already fixed in main, since libvirt was bumped to a version that contains the fix:
https://github.com/kubevirt/kubevirt/issues/9129#issuecomment-1415844928


[fedora@simple-vm ~]$ curl -O https://download.fedoraproject.org/pub/fedora/linux/releases/37/Cloud/x86_64/images/Fedora-Cloud-Base-37-1.7.x86_64.qcow2 -L
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
[fedora@simple-vm ~]$ ls -l Fedora-Cloud-Base-37-1.7.x86_64.qcow2 
-rw-r--r--. 1 fedora fedora 492830720 Feb 12 12:36 Fedora-Cloud-Base-37-1.7.x86_64.qcow2

Observe metrics:
kubevirt_vmi_network_transmit_bytes_total: 518623611
kubevirt_vmi_network_receive_bytes_total: 1030624



--- Additional comment from Igor Bezukh on 2023-02-16 17:31:37 CET ---

Hi Michal,

Can you please assist us with backporting https://gitlab.com/libvirt/libvirt/-/commit/0862cb3ce46253a58ca02d36b2b6a6397a60bfc7 
to the libvirt release that is in use in 4.12?

TIA
Igor

--- Additional comment from Jaroslav Suchanek on 2023-02-17 10:53:59 CET ---

(In reply to Igor Bezukh from comment #2)
> Hi,
> 
> There are plans to move to UBI 9 (RHEL 9 based) in 4.12
> But currently 4.12 consumes UBI 8.6. So we will have to ask Libvirt folks to
> do 2 backports for us - one to libvirt 8.0.0 for RHEL 8.6 and one to libvirt
> 8.5.0 for RHEL 9

Can you please elaborate on why you need it in RHEL-9.0, which is not being used by CNV? Backporting to RHEL-8.6.0 should be fine. RHEL-9.2 already contains the requested fix.

Thanks.

--- Additional comment from Igor Bezukh on 2023-02-21 10:49:31 CET ---

Hi Jaroslav,

Let me correct myself: we need the backport for RHEL-8.6.0. We don't plan to switch our base containers to RHEL-9 in OCPV 4.12.

Is there any action needed from our side for the backport to RHEL-8.6.0?

Comment 1 Michal Privoznik 2023-02-22 15:15:42 UTC
Merged upstream as:

commit 0862cb3ce46253a58ca02d36b2b6a6397a60bfc7
Author:     Michal Prívozník <mprivozn>
AuthorDate: Thu Nov 24 10:28:59 2022 +0100
Commit:     Michal Prívozník <mprivozn>
CommitDate: Thu Nov 24 15:51:34 2022 +0100

    conf: Make VIR_DOMAIN_NET_TYPE_ETHERNET not share 'host view'
    
    When setting up QoS for a domain <interface/>, or when reporting
    its statistics we may need to swap TX/RX values. This is all
    explained in comment to virDomainNetTypeSharesHostView().
    However, this function claims that VIR_DOMAIN_NET_TYPE_ETHERNET
    also shares the 'host view', meaning the TX/RX values must be
    swapped. But that's not true.
    
    An easy reproducer is to start a domain with two <interface/>-s:
    one type of network, the other of type ethernet and configure the
    same <bandwidth/> for both. Reversed setting can then be observed
    (e.g. via tc).
    
    Reported-by: Oleg Vasilev <oleg.vasilev>
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Jiri Denemark <jdenemar>

v8.10.0-rc1~2

Comment 3 yalzhang@redhat.com 2023-02-23 05:11:43 UTC
Reproduced the bug with libvirt-8.9.0-2.el9.x86_64:
1. Create a Linux bridge named "br0".

2. As root, create a tap device and attach it to the bridge:
# ip tuntap add mode tap user test group test name mytap0
# ip link set mytap0 up
# ip link set mytap0 master br0

3. Switch to the "test" user and start the VM with an ethernet type interface:
$ virsh dumpxml rhel --xpath //interface 
<interface type="ethernet">
  <mac address="52:54:00:cb:c5:f8"/>
  <target dev="mytap0" managed="no"/>
  <model type="virtio"/>
  <alias name="net0"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>

4. Log in to the VM and download a large file, then check the statistics on both the host and the guest:
On guest:
# curl -O https://download.fedoraproject.org/pub/fedora/linux/releases/37/Cloud/x86_64/images/Fedora-Cloud-Base-37-1.7.x86_64.qcow2 -L
# ip -s l show enp1s0 
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:cb:c5:f8 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast           
     516977202  342991      0     147       0       0 
    TX:  bytes packets errors dropped carrier collsns           
      19457383  241088      0       0       0       0

On host:
$ virsh domifstat rhel mytap0
mytap0 rx_bytes 19457547
mytap0 rx_packets 241090
mytap0 rx_errs 0
mytap0 rx_drop 0
mytap0 tx_bytes 516979678
mytap0 tx_packets 343025
mytap0 tx_errs 0
mytap0 tx_drop 0

The domifstat command reports the statistics from the host's view: its tx_bytes value (516979678) corresponds to the guest's RX bytes (516977202).
However, domifstat should report the statistics from the guest's view, so the values are reversed.
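
The relationship between the two views can be sketched in Python. This is only an illustration, not libvirt's code: the helper name to_guest_view and the dictionary layout are hypothetical, and the numbers are the byte and packet counters from the domifstat output above. Whatever the host transmits on the tap device is what the guest receives, so the guest view is the host view with rx and tx exchanged.

```python
def to_guest_view(host_stats):
    """Exchange rx_* and tx_* keys of host-side tap counters (hypothetical helper)."""
    guest_stats = {}
    for key, value in host_stats.items():
        if key.startswith("rx_"):
            guest_stats["tx_" + key[3:]] = value
        elif key.startswith("tx_"):
            guest_stats["rx_" + key[3:]] = value
        else:
            # Keys without an rx_/tx_ prefix are passed through unchanged.
            guest_stats[key] = value
    return guest_stats


# Host-view counters as printed by domifstat in this comment.
host_view = {
    "rx_bytes": 19457547, "rx_packets": 241090,
    "tx_bytes": 516979678, "tx_packets": 343025,
}
guest_view = to_guest_view(host_view)
print(guest_view["rx_bytes"], guest_view["tx_bytes"])
```

After the exchange, rx_bytes is the large value, which is what the guest's own "ip -s l" output shows for RX.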

Comment 4 yalzhang@redhat.com 2023-02-23 05:34:00 UTC
Updated to libvirt-9.0.0-7.el9.x86_64 and tried the same scenario as in comment 3:

1. Create the tap device and start the VM:
# ip tuntap add mode tap user test group test name mytap1
# ip link set mytap1 up
# ip link set mytap1 master br0
$ virsh dumpxml rhel --xpath //interface 
<interface type="ethernet">
  <mac address="52:54:00:cb:c5:f8"/>
  <target dev="mytap1" managed="no"/>
  <model type="virtio"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
$ virsh start rhel

2. Log in to the VM and download a file.

3. Check the statistics on the host and the guest:
# ip -s l show enp1s0 
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:cb:c5:f8 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast           
     517319752  347819      0     383       0       0 
    TX:  bytes packets errors dropped carrier collsns           
      19697969  244505      0       0       0       0 

$ virsh domifstat rhel mytap1
mytap1 rx_bytes 517319902
mytap1 rx_packets 347821
mytap1 rx_errs 0
mytap1 rx_drop 0
mytap1 tx_bytes 19697969
mytap1 tx_packets 244505
mytap1 tx_errs 0
mytap1 tx_drop 0

The issue is fixed: the domifstat rx_bytes value (517319902) now corresponds to the guest's RX bytes (517319752).
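
The comparison made in comments 3 and 4 can be automated roughly as follows. This is a sketch under assumptions: parse_domifstat and reported_view are hypothetical helper names, the parser only expects the three-column "device counter value" lines seen in the output above, and the guest byte counters are taken by hand from the "ip -s l" output.

```python
def parse_domifstat(text):
    """Parse 'virsh domifstat' lines of the form '<dev> <counter> <value>'."""
    stats = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 3:
            _dev, name, value = parts
            stats[name] = int(value)
    return stats


def reported_view(stats, guest_rx_bytes, guest_tx_bytes):
    """Return 'guest' if rx/tx line up with the guest counters, else 'host'."""
    straight = (abs(stats["rx_bytes"] - guest_rx_bytes)
                + abs(stats["tx_bytes"] - guest_tx_bytes))
    swapped = (abs(stats["rx_bytes"] - guest_tx_bytes)
               + abs(stats["tx_bytes"] - guest_rx_bytes))
    return "guest" if straight <= swapped else "host"


# Byte counters from comment 4 (libvirt-9.0.0-7.el9).
fixed_output = """
mytap1 rx_bytes 517319902
mytap1 tx_bytes 19697969
"""
# Byte counters from comment 3 (libvirt-8.9.0-2.el9).
buggy_output = """
mytap0 rx_bytes 19457547
mytap0 tx_bytes 516979678
"""
print(reported_view(parse_domifstat(fixed_output), 517319752, 19697969))  # guest
print(reported_view(parse_domifstat(buggy_output), 516977202, 19457383))  # host
```

The small differences between host and guest counters are expected, since the two commands are not run at the same instant; the check only asks which orientation is closer.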

Comment 5 yalzhang@redhat.com 2023-02-23 08:47:33 UTC
Another scenario, reproduced with libvirt-8.9.0-1.el9.x86_64:
# virsh dumpxml rhel --xpath //interface 
<interface type="ethernet">
  <mac address="52:54:00:cb:c5:f8"/>
  <bandwidth>
    <inbound average="256" peak="200" burst="20"/>
    <outbound average="64" peak="100" burst="30"/>
  </bandwidth>
  <target dev="mytap1" managed="no"/>
  <model type="virtio"/>
  <alias name="net0"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>

With these settings, guest inbound traffic should stay below 256 kilobytes/second (about 2 * 10^6 bits/sec), and guest outbound traffic below 64 kilobytes/second (about 0.5 * 10^6 bits/sec).

Run netperf to measure the guest's inbound and outbound bandwidth:
For guest inbound:
[guest] # netserver 
[host] # netperf -H ${guest_ip} -l 120
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ${guest_ip} () port 0 AF_INET : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  
131072  16384  16384    124.00      0.49

For guest outbound:
[host] # netserver 
[guest] # netperf -H ${host_ip} -l 120
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.73.178.71 () port 0 AF_INET : histogram : interval : dirty data : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  
131072  16384  16384    120.61      0.75 
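
To compare the netperf figures with the configured limits, the averages from the XML can be converted to the unit netperf prints. The sketch below assumes 1 kilobyte = 1000 bytes, which agrees with the 256 * 8 = 2048Kbit arithmetic used in comment 6; the function names are hypothetical.

```python
def kbytes_to_mbits(kilobytes_per_sec):
    """Convert kilobytes/second to 10^6 bits/second, assuming 1 kB = 1000 bytes."""
    return kilobytes_per_sec * 1000 * 8 / 1_000_000


def exceeds_cap(measured_mbits, cap_kilobytes_per_sec):
    """True if a netperf throughput figure is above the configured average."""
    return measured_mbits > kbytes_to_mbits(cap_kilobytes_per_sec)


inbound_cap, outbound_cap = 256, 64               # <bandwidth> averages from the XML
measured_inbound, measured_outbound = 0.49, 0.75  # netperf results, 10^6 bits/sec

print(kbytes_to_mbits(inbound_cap), kbytes_to_mbits(outbound_cap))
print(exceeds_cap(measured_inbound, inbound_cap))
print(exceeds_cap(measured_outbound, outbound_cap))
```

Under that assumption the outbound result (0.75) is above the configured outbound average, while the inbound result (0.49) sits just under the outbound figure rather than the inbound one, which is consistent with the two limits having been exchanged.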

Check for inbound:
# tc class show dev mytap1
class htb 1:1 root leaf 2: prio 0 rate 512Kbit ceil 800Kbit burst 30Kb cburst 1600b

Check for outbound:
# tc filter show dev mytap1 parent ffff:
filter protocol all pref 49152 u32 chain 0 
filter protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1 
filter protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid :1 not_in_hw 
  match 00000000/00000000 at 0
 police 0x1 rate 2048Kbit burst 20Kb mtu 64Kb action drop overhead 0b 
	ref 1 bind 1 

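Both the tc settings and the measured bandwidth show that TX and RX are reversed: the htb class carries the outbound rate (512Kbit = 64 * 8) and the police filter carries the inbound rate (2048Kbit = 256 * 8).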
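
The tc check can be scripted along these lines. This is a rough sketch, not part of any libvirt test suite: tc_rate_kbit and check_direction are hypothetical names, the regular expression only handles rates printed in Kbit as in the output above, and it follows the labelling used in this bug (htb class for guest inbound, police filter for guest outbound).

```python
import re


def tc_rate_kbit(tc_output):
    """Return the first 'rate <N>Kbit' value found in tc output, as an int."""
    match = re.search(r"\brate (\d+)Kbit\b", tc_output)
    if match is None:
        raise ValueError("no Kbit rate found")
    return int(match.group(1))


def check_direction(class_output, filter_output, inbound_kb, outbound_kb):
    """Compare tc rates with the XML averages (kilobytes/second * 8 = Kbit)."""
    observed = (tc_rate_kbit(class_output), tc_rate_kbit(filter_output))
    if observed == (inbound_kb * 8, outbound_kb * 8):
        return "correct"
    if observed == (outbound_kb * 8, inbound_kb * 8):
        return "reversed"
    return "unknown"


# tc output lines copied from this comment (libvirt-8.9.0-1.el9).
class_line = "class htb 1:1 root leaf 2: prio 0 rate 512Kbit ceil 800Kbit burst 30Kb cburst 1600b"
police_line = " police 0x1 rate 2048Kbit burst 20Kb mtu 64Kb action drop overhead 0b"
print(check_direction(class_line, police_line, inbound_kb=256, outbound_kb=64))  # reversed
```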

Comment 6 yalzhang@redhat.com 2023-02-23 08:56:00 UTC
Upgraded libvirt to libvirt-9.0.0-7.el9.x86_64 and retried the scenario from comment 5; the issue is fixed.
# virsh dumpxml rhel --xpath //interface 
<interface type="ethernet">
  <mac address="52:54:00:cb:c5:f8"/>
  <bandwidth>
    <inbound average="256" peak="200" burst="20"/>
    <outbound average="64" peak="100" burst="30"/>
  </bandwidth>
  <target dev="mytap1" managed="no"/>
  <model type="virtio"/>
  <alias name="net0"/>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>

Check guest inbound:
# tc class show dev mytap1
class htb 1:1 root leaf 2: prio 0 rate 2048Kbit ceil 1600Kbit burst 20Kb cburst 1600b

rate 2048Kbit = inbound average of 256 kilobytes/second (256 * 8 = 2048)

Check guest outbound:
# tc filter show dev mytap1 parent ffff:
filter protocol all pref 49152 u32 chain 0 
filter protocol all pref 49152 u32 chain 0 fh 800: ht divisor 1 
filter protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid :1 not_in_hw 
  match 00000000/00000000 at 0
 police 0x1 rate 512Kbit burst 30Kb mtu 64Kb action drop overhead 0b 
	ref 1 bind 1

rate 512Kbit = outbound average of 64 kilobytes/second (64 * 8 = 512)

Comment 11 yalzhang@redhat.com 2023-02-27 09:15:01 UTC
Verified per the above comments.

Comment 17 errata-xmlrpc 2023-05-09 07:27:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171