Bug 2009935

Summary: virtio-vsock: Uperf fails to exchange goodbyes with client when many threads of size 1 are used
Product: Red Hat Enterprise Linux 9
Reporter: Lukáš Doktor <ldoktor>
Component: kernel
Sub Component: KVM
Assignee: Stefano Garzarella <sgarzare>
QA Contact: Qinghua Cheng <qcheng>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
CC: coli, jinzhao, juzhang, sgarzare, virt-maint
Version: 9.0
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-11-26 15:11:17 UTC
Attachments: uperf profile which is frequently failing

Description Lukáš Doktor 2021-10-02 06:10:34 UTC
Created attachment 1828263 [details]
uperf profile which is frequently failing

Description of problem:
While reviewing https://github.com/uperf/uperf/pull/47 I tried using multiple threads (64) and a small packet size (1), which resulted in logged errors and missing client-throughput information for the execution. Note that the exit code is still 0.

Version-Release number of selected component (if applicable):
* RHEL-9.0.0-20210930.4
* uperf f0cbb1f (from https://github.com/uperf/uperf/pull/47 )
* kernel-5.14.0-1.5.1.el9.x86_64
* on host: qemu-kvm-core-6.0.0-13.el9_b.4.x86_64

How reproducible:
* 9/10 - RHEL8->F33
* 10/10 - RHEL9->RHEL9

Steps to Reproduce:
1. Install RHEL 9 on the host, fetch a RHEL 9 cloud-init image, and inject ssh keys into it
2. virt-install --import --disk '/var/lib/libvirt/images/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2' --memory '20480' --name 'virtlab722.DefaultLibvirt0' --os-variant 'rhel8.0' --vcpus '10' --serial file,path='/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2__serial.log' --vsock cid.address=42 --dry-run --print-xml > '/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2_.xml'
3. virsh create '/var/log/libvirt/RHEL-9.0.0-20210930.4-virtlab722.DefaultLibvirt0.qcow2_.xml'
4. Fetch and deploy uperf from https://github.com/uperf/uperf/pull/47 on host and guest
5. Add the attached vsock-big.xml (a sketch of such a profile is shown after these steps)
6. On host run: ./uperf -s -v -S vsock -P20000
7. Inside guest run: h=2 ./uperf -v -S vsock -m ../workloads/vsock-big.xml
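
For reference (the attachment itself is not shown inline), a profile with the characteristics described above (64 threads, 1-byte transfers, 96 s duration) might look roughly like this. It is only a sketch assuming the standard uperf profile schema; the protocol=vsock option is an assumption based on the vsock support added in the PR, and the actual attached vsock-big.xml may differ:

<?xml version="1.0"?>
<profile name="vsock-big">
    <group nthreads="64">
        <transaction iterations="1">
            <!-- $h is the remote endpoint set on the command line (h=2, the host CID) -->
            <flowop type="connect" options="remotehost=$h protocol=vsock"/>
        </transaction>
        <transaction duration="96s">
            <!-- 1-byte writes and reads: the small size that triggers the failure -->
            <flowop type="write" options="size=1"/>
            <flowop type="read" options="size=1"/>
        </transaction>
        <transaction iterations="1">
            <flowop type="disconnect"/>
        </transaction>
    </group>
</profile>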

Actual results:
Hostname            Time       Data   Throughput   Operations      Errors
----------------------------------------------------------------------------------------------------------------------
Error exchanging goodbye's with client 
Error saying goodbye with 2
master            12.28s    11.16MB     7.62Mb/s     11700192        0.00
----------------------------------------------------------------------------------------------------------------------
Difference(%)      0.00%      0.00%        -nan%        0.00%       -nan%


Expected results:
Run Statistics
Hostname            Time       Data   Throughput   Operations      Errors
----------------------------------------------------------------------------------------------------------------------
[2] Success2                 23.59s   902.99KB   313.60Kb/s       925742        0.00
master            12.22s    11.31MB     7.77Mb/s     11859680        0.00
----------------------------------------------------------------------------------------------------------------------
Difference(%)    -93.10%     92.20%       95.96%       92.19%       0.00%


Additional info:
* The reproducibility rates were tested using a 96 s duration, but even with 10 s it fails regularly. With a 1 or 2 s duration it did not fail for me in 5 iterations.
* Similarly, using a greater packet size (16) resulted in no errors.
* I noticed that the throughput slows down during the execution. Executing it with '-R -a -i 1' uncovered periods with no transactions (output attached). Maybe the goodbye-exchange issue is just a filled buffer waiting to be read out by the other side.

Comment 3 Stefano Garzarella 2021-10-05 09:01:01 UTC
It could be an issue in the socket (AF_VSOCK) shutdown protocol implementation.
I'll investigate.

Moved to kernel/KVM component.

Comment 4 Qinghua Cheng 2021-10-19 08:17:17 UTC
Reproduced on RHEL 9 using the vsock-big.xml workload

Host:
kernel: 5.14.0-5.el9.x86_64
qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64

Guest: 
kernel: 5.14.0-6.el9.x86_64
qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64

uperf: https://github.com/uperf/uperf

On host run: ./uperf -s -v -S vsock -P20000
Inside guest run: h=2 ./uperf -v -S vsock -m ../workloads/vsock-big.xml


Run Statistics
Hostname            Time       Data   Throughput   Operations      Errors
-------------------------------------------------------------------------------------------------------------------------------
Error exchanging goodbye's with client 
Error saying goodbye with 2
master            98.33s    40.37MB     3.44Mb/s     42336578        0.00
-------------------------------------------------------------------------------------------------------------------------------
Difference(%)      0.00%      0.00%        -nan%        0.00%       -nan%

Comment 5 Stefano Garzarella 2021-10-21 15:24:52 UTC
Thanks Qinghua for providing an environment in which to replicate this.

After several checks, it seems to be a timeout firing in uperf.

Basically, the vsock-big.xml profile creates many threads generating a lot of traffic. After 96 seconds the master starts the disconnection phase, stopping the local threads and sending a request to the slave for its statistics.

It waits for 15 seconds (a timeout hard-coded in the uperf code) and then fails if the slave doesn't respond.
In some cases there are still a lot of packets queued for transmission, so the request asking the slave to send its statistics is processed late, after the timeout.

Increasing UPERF_GOODBYE_TIMEOUT in the uperf code to 30 seconds helps, but to avoid code changes we can put the following lines just before the "disconnect" transaction in the profile to wait a bit (placement shown in context below):

<transaction iterations="0">
    <flowop type="think" options="duration=10s idle"/>
</transaction>
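
In context, the tail of the profile then looks roughly like this (again only a sketch; the surrounding transaction structure is assumed rather than taken from the attached profile):

<transaction iterations="0">
    <flowop type="think" options="duration=10s idle"/>
</transaction>
<transaction iterations="1">
    <flowop type="disconnect"/>
</transaction>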

I'm still not sure whether the general problem is in uperf or whether the high traffic in vsock starves the control socket, so the goodbye message is delayed too much. I will investigate in the future, perhaps comparing with a TCP socket, but I think we can close this BZ and open a new one later with a specific test case.

Comment 6 Stefano Garzarella 2021-11-26 15:11:17 UTC
Analyzing the code and running some tests shows that there is indeed a fairness problem in vsock.
The current implementation uses a single packet list, so if one socket queues a lot of packets, another socket can observe a long delay.

The solution is not easy and since we have a workaround, I'm closing this BZ, but I opened an issue[1] in the upstream project to keep track of this problem and solve it when we have time.

[1] https://gitlab.com/vsock/vsock/-/issues/1