Bug 2009935

Summary: virtio-vsock: Uperf fails to exchange goodbyes with client when many threads of size 1 are used
Product: Red Hat Enterprise Linux 9
Reporter: Lukáš Doktor <ldoktor>
Component: kernel
Sub Component: KVM
Assignee: Stefano Garzarella <sgarzare>
QA Contact: Qinghua Cheng <qcheng>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
CC: coli, jinzhao, juzhang, sgarzare, virt-maint
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-11-26 15:11:17 UTC
Attachments: uperf profile which is frequently failing

Description Lukáš Doktor 2021-10-02 06:10:34 UTC
Created attachment 1828263 [details]
uperf profile which is frequently failing

Description of problem:
While reviewing https://github.com/uperf/uperf/pull/47 I tried using multiple threads (64) and a small packet size (1), which resulted in logged errors and missing client-throughput information for the execution. Note that the exit code is still 0.

Version-Release number of selected component (if applicable):
* RHEL-9.0.0-20210930.4
* uperf f0cbb1f (from https://github.com/uperf/uperf/pull/47 )
* kernel-5.14.0-1.5.1.el9.x86_64
* on host: qemu-kvm-core-6.0.0-13.el9_b.4.x86_64

How reproducible:
* 9/10 - RHEL8->F33
* 10/10 - RHEL9->RHEL9

Steps to Reproduce:
1. Install RHEL 9 on the host, fetch a RHEL 9 cloud-init image, and inject ssh keys into it
2. virt-install --import --disk '/var/lib/libvirt/images/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2' --memory '20480' --name 'virtlab722.DefaultLibvirt0' --os-variant 'rhel8.0' --vcpus '10' --serial file,path='/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2__serial.log' --vsock cid.address=42 --dry-run --print-xml > '/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2_.xml'
3. virsh create '/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2_.xml'
4. Fetch and deploy uperf from https://github.com/uperf/uperf/pull/47 on host and guest
5. Add the attached vsock-big.xml (a sketch of such a profile is shown after these steps)
6. On host run: ./uperf -s -v -S vsock -P20000
7. Inside guest run: h=2 ./uperf -v -S vsock -m ../workloads/vsock-big.xml
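
For reference (the attachment itself is not shown inline), a profile with the characteristics described above (64 threads, 1-byte transfers, 96 s duration) might look roughly like this. It is only a sketch assuming the standard uperf profile schema; the protocol=vsock option is an assumption based on the vsock support added in the PR, and the actual attached vsock-big.xml may differ:

<?xml version="1.0"?>
<profile name="vsock-big">
    <group nthreads="64">
        <transaction iterations="1">
            <!-- $h is the remote endpoint set on the command line (h=2, the host CID) -->
            <flowop type="connect" options="remotehost=$h protocol=vsock"/>
        </transaction>
        <transaction duration="96s">
            <!-- 1-byte writes and reads: the small size that triggers the failure -->
            <flowop type="write" options="size=1"/>
            <flowop type="read" options="size=1"/>
        </transaction>
        <transaction iterations="1">
            <flowop type="disconnect"/>
        </transaction>
    </group>
</profile>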

Actual results:
Hostname            Time       Data   Throughput   Operations      Errors
----------------------------------------------------------------------------------------------------------------------
Error exchanging goodbye's with client 
Error saying goodbye with 2
master            12.28s    11.16MB     7.62Mb/s     11700192        0.00
----------------------------------------------------------------------------------------------------------------------
Difference(%)      0.00%      0.00%        -nan%        0.00%       -nan%


Expected results:
Run Statistics
Hostname            Time       Data   Throughput   Operations      Errors
----------------------------------------------------------------------------------------------------------------------
[2] Success2                 23.59s   902.99KB   313.60Kb/s       925742        0.00
master            12.22s    11.31MB     7.77Mb/s     11859680        0.00
----------------------------------------------------------------------------------------------------------------------
Difference(%)    -93.10%     92.20%       95.96%       92.19%       0.00%


Additional info:
* The reproducibility rates were tested using a 96 s duration, but even with 10 s it fails regularly. With a 1 or 2 s duration it did not fail for me in 5 iterations.
* Similarly, using a greater packet size (16) resulted in no errors.
* I noticed that the throughput slows down during the execution. Executing it with '-R -a -i 1' uncovered periods with no transactions (output attached). Maybe the goodbye-exchange issue is just a filled buffer waiting to be read out by the other side.

Comment 3 Stefano Garzarella 2021-10-05 09:01:01 UTC
It could be an issue in the socket (AF_VSOCK) shutdown protocol implementation.
I'll investigate.

Moved to kernel/KVM component.

Comment 4 Qinghua Cheng 2021-10-19 08:17:17 UTC
Reproduced on RHEL 9 using the vsock-big.xml workload

Host:
kernel: 5.14.0-5.el9.x86_64
qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64

Guest: 
kernel: 5.14.0-6.el9.x86_64
qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64

uperf: https://github.com/uperf/uperf

On host run: ./uperf -s -v -S vsock -P20000
Inside guest run: h=2 ./uperf -v -S vsock -m ../workloads/vsock-big.xml


Run Statistics
Hostname            Time       Data   Throughput   Operations      Errors
-------------------------------------------------------------------------------------------------------------------------------
Error exchanging goodbye's with client 
Error saying goodbye with 2
master            98.33s    40.37MB     3.44Mb/s     42336578        0.00
-------------------------------------------------------------------------------------------------------------------------------
Difference(%)      0.00%      0.00%        -nan%        0.00%       -nan%

Comment 5 Stefano Garzarella 2021-10-21 15:24:52 UTC
Thanks Qinghua for providing an environment in which to replicate this.

After several checks, it seems to be a timeout firing in uperf.

Basically, the vsock-big.xml profile creates many threads generating a lot of traffic. After 96 seconds the master starts the disconnection phase, stopping the local threads and sending a request to the slave for its statistics.

It waits for 15 seconds (a timeout hard-coded in the uperf code) and then fails if the slave doesn't respond.
In some cases there are still a lot of packets queued for transmission, so the request asking the slave to send its statistics is processed late, after the timeout.

Increasing UPERF_GOODBYE_TIMEOUT in the uperf code to 30 seconds helps, but to avoid code changes we can put the following lines just before the "disconnect" transaction in the profile to wait a bit (placement shown in context below):

<transaction iterations="0">
    <flowop type="think" options="duration=10s idle"/>
</transaction>
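
In context, the tail of the profile then looks roughly like this (again only a sketch; the surrounding transaction structure is assumed rather than taken from the attached profile):

<transaction iterations="0">
    <flowop type="think" options="duration=10s idle"/>
</transaction>
<transaction iterations="1">
    <flowop type="disconnect"/>
</transaction>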

I'm still not sure whether the general problem is in uperf or whether the high traffic in vsock starves the control socket, so the goodbye message is delayed too much. I will investigate in the future, perhaps comparing with a TCP socket, but I think we can close this BZ and open a new one later with a specific test case.

Comment 6 Stefano Garzarella 2021-11-26 15:11:17 UTC
Analyzing the code and running some tests shows that there is indeed a fairness problem in vsock.
The current implementation uses a single packet list, so if one socket queues a lot of packets, another socket can observe a long delay.

The solution is not easy and since we have a workaround, I'm closing this BZ, but I opened an issue[1] in the upstream project to keep track of this problem and solve it when we have time.

[1] https://gitlab.com/vsock/vsock/-/issues/1