Bug 2009935
| Field | Value |
|---|---|
| Summary | virtio-vsock: Uperf fails to exchange goodbyes with client when many threads of size 1 are used |
| Product | Red Hat Enterprise Linux 9 |
| Component | kernel |
| kernel sub component | KVM |
| Version | 9.0 |
| Status | CLOSED UPSTREAM |
| Severity | medium |
| Priority | medium |
| Keywords | Triaged |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Reporter | Lukáš Doktor <ldoktor> |
| Assignee | Stefano Garzarella <sgarzare> |
| QA Contact | Qinghua Cheng <qcheng> |
| CC | coli, jinzhao, juzhang, sgarzare, virt-maint |
| Last Closed | 2021-11-26 15:11:17 UTC |

Description
Lukáš Doktor
2021-10-02 06:10:34 UTC
It could be an issue in the socket (AF_VSOCK) shutdown protocol implementation. I'll investigate. Moved to kernel/KVM component.

Reproduced on RHEL 9 using the vsock-big.xml workload.

Host:
    kernel: 5.14.0-5.el9.x86_64
    qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64
Guest:
    kernel: 5.14.0-6.el9.x86_64
    qemu-kvm: qemu-kvm-6.1.0-3.el9.x86_64
uperf: https://github.com/uperf/uperf

On the host run:
    ./uperf -s -v -S vsock -P20000

Inside the guest run:
    h=2 ./uperf -v -S vsock -m ../workloads/vsock-big.xml

Run Statistics
Hostname        Time       Data     Throughput   Operations   Errors
----------------------------------------------------------------------
Error exchanging goodbye's with client
Error saying goodbye with 2
master          98.33s     40.37MB  3.44Mb/s     42336578     0.00
----------------------------------------------------------------------
Difference(%)   0.00%      0.00%    -nan%        0.00%        -nan%

Thanks Qinghua for providing an environment in which to reproduce this.

After several checks, the failure appears to be a timeout firing in uperf. The vsock-big.xml profile creates many threads that generate a lot of traffic. After 96 seconds the master starts the disconnection phase: it stops its local threads and sends the slave a request for its statistics. It then waits for 15 seconds (a timeout hard-coded in uperf) and fails if the slave does not respond. In some cases many packets are still queued for transmission at that point, so the slave processes the statistics request late, after the timeout has already expired.

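The timing argument above can be put into a small back-of-the-envelope model. This is only a sketch: the function names, the backlog size, and the drain rate are hypothetical values chosen for illustration, not measurements from this bug and not uperf code. It assumes the statistics request is handled only after everything queued ahead of it has been transmitted.

```python
def goodbye_delay_s(queued_packets: int, drain_rate_pps: float) -> float:
    """Seconds until a request queued behind `queued_packets` packets is handled.

    Assumes a constant drain rate (packets per second); both inputs are
    hypothetical illustration values.
    """
    return queued_packets / drain_rate_pps


def goodbye_succeeds(queued_packets: int, drain_rate_pps: float,
                     timeout_s: float = 15.0) -> bool:
    """True if the request is handled before the master's timeout expires."""
    return goodbye_delay_s(queued_packets, drain_rate_pps) <= timeout_s


if __name__ == "__main__":
    backlog = 2_000_000   # hypothetical packets still queued at disconnect
    rate = 100_000.0      # hypothetical packets drained per second
    print(goodbye_delay_s(backlog, rate))         # delay in seconds
    print(goodbye_succeeds(backlog, rate))        # with the 15 s timeout
    print(goodbye_succeeds(backlog, rate, 30.0))  # with a longer timeout
```

With these made-up numbers the request would be served after 20 seconds, which misses a 15-second timeout but fits a longer one. The real delay depends on the actual backlog and on how fast the transport drains it.
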
Increasing UPERF_GOODBYE_TIMEOUT in the uperf code to 30 seconds helps, but to avoid code changes we can instead put the following lines just before "disconnect" in the profile so that uperf waits a bit:

    <transaction iterations="0">
      <flowop type="think" options="duration=10s idle"/>
    </transaction>

I'm still not sure whether the general problem is in uperf or whether the high traffic in vsock starves the control socket, so that the goodbye message is delayed too much. I will investigate in the future, maybe by comparing with a TCP socket, but I think we can close this BZ and, if needed, open a new one with a specific test case.

After analyzing the code and running some tests, I can confirm there is indeed a fairness problem in vsock. The current implementation uses a single list, so if one socket queues a lot of packets, another socket can observe a long delay. The solution is not easy and, since we have a workaround, I'm closing this BZ. I opened an issue [1] in the upstream project to keep track of this problem and solve it when we have time.

[1] https://gitlab.com/vsock/vsock/-/issues/1
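
The fairness problem described above can be illustrated with a toy queue model. This is a sketch under stated assumptions, not the kernel's vsock code: the function names and the two-socket setup are hypothetical, and the round-robin variant is shown only as a contrast, not as the fix planned upstream. It counts how many data packets are sent before a single control packet when all sockets share one list, versus when each socket has its own queue served in turn.

```python
from collections import deque


def control_delay_single_fifo(data_backlog: int) -> int:
    """Data packets sent before the control packet when all sockets share one list."""
    queue = deque(["data"] * data_backlog)
    queue.append("control")  # control message arrives behind the whole backlog
    sent = 0
    while queue:
        if queue.popleft() == "control":
            return sent
        sent += 1
    raise RuntimeError("control packet was never queued")


def control_delay_round_robin(data_backlog: int) -> int:
    """Same count with one queue per socket, served one packet at a time in turn."""
    queues = {
        "data_sock": deque(["data"] * data_backlog),  # hypothetical busy socket
        "ctrl_sock": deque(["control"]),              # hypothetical control socket
    }
    sent = 0
    while any(queues.values()):
        for q in queues.values():
            if not q:
                continue
            if q.popleft() == "control":
                return sent
            sent += 1
    raise RuntimeError("control packet was never queued")


if __name__ == "__main__":
    print(control_delay_single_fifo(1000))   # control waits for the full backlog
    print(control_delay_round_robin(1000))   # control waits for at most one packet
```

In the shared-list model the control packet waits behind the entire backlog of the busy socket, which matches the long delay observed for the goodbye message. With per-socket queues the wait no longer grows with the other socket's backlog.
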