Bug 2101375
Summary: | passt: failed to transfer file through ipv4 from host to guest | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Quan Wenli <wquan> |
Component: | passt | Assignee: | Stefano Brivio <sbrivio> |
Status: | CLOSED ERRATA | QA Contact: | Lei Yang <leiyang> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 9.1 | CC: | aadam, leiyang, lvivier, sbrivio, yalzhang |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | passt-0^20230222.g4ddbcb9-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-05-09 07:43:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2122788 | ||
Bug Blocks: |
Description
Quan Wenli
2022-06-27 10:20:40 UTC
Could you please collect a guest-side packet capture of the failing case by starting passt with the -p (--pcap) argument, such as "-p file_transfer.pcap"?

(In reply to Stefano Brivio from comment #1)
> Could you please collect a guest-side packet capture of the failing case by
> starting passt with the -p (--pcap) argument, such as "-p
> file_transfer.pcap"?

Please check the file_transfer.pcap. I reduced the file to 2M, otherwise it's too large to upload in bugzilla.

(In reply to Quan Wenli from comment #3)
> (In reply to Stefano Brivio from comment #1)
> > Could you please collect a guest-side packet capture of the failing case by
> > starting passt with the -p (--pcap) argument, such as "-p
> > file_transfer.pcap"?
>
> Please check the file_transfer.pcap. I reduced the file to 2M, otherwise
> it's too large to upload in bugzilla.

Thanks! And sorry for the delay, I finally had a chance to look at the captured traffic.

In the initial part of the file transfer I couldn't spot any particular issue with sequences, acknowledgements, windows, etc. I guess there might be rather a problem toward the end of the transfer, with the connection not closing or something like that. Two requests from my side:

- you can reproduce this only using the version (I guess) I installed on your test system (under /usr/local/bin) a while ago. I don't remember exactly what I was trying to debug with that. Note that the package would install binaries under /usr/bin, not /usr/local/bin. Could you try to see if this still happens with the latest RPM I published on my Copr repository?

- if yes, would it be possible for you to share the last part of the capture (the last two megabytes or so), instead of the beginning of it?
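One way to cut a capture down to its last part, as requested above, is to work out the frame range for the tail and hand it to editcap(1). The following is only a sketch: `tail_range` is a hypothetical helper, the frame counts (5000 total, 1500 kept) are made-up values, and selecting by frame count only approximates "the last two megabytes or so". The `-r` option tells editcap to keep the listed frames instead of deleting them.

```shell
#!/bin/sh
# tail_range TOTAL KEEP (hypothetical helper): print the frame range
# "START-TOTAL" covering the last KEEP frames of a capture that holds
# TOTAL frames. If KEEP exceeds TOTAL, the whole capture is selected.
tail_range() {
    total=$1
    keep=$2
    start=$((total - keep + 1))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    printf '%s-%s\n' "$start" "$total"
}

# Example with assumed numbers: the frame count would come from
# Wireshark or a similar tool, it is not taken from this report.
if command -v editcap >/dev/null 2>&1 && [ -f file_transfer.pcap ]; then
    # Keep only the last 1500 of 5000 frames in a smaller file.
    editcap -r file_transfer.pcap file_transfer_tail.pcap "$(tail_range 5000 1500)"
fi
```

The guard around editcap means the snippet does nothing when the tool or the capture file is missing, so it is safe to paste into a shell before adjusting the numbers.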
(In reply to Stefano Brivio from comment #4)
> (In reply to Quan Wenli from comment #3)
> > (In reply to Stefano Brivio from comment #1)
> > > Could you please collect a guest-side packet capture of the failing case by
> > > starting passt with the -p (--pcap) argument, such as "-p
> > > file_transfer.pcap"?
> >
> > Please check the file_transfer.pcap. I reduced the file to 2M, otherwise
> > it's too large to upload in bugzilla.
>
> Thanks! And sorry for the delay, I finally had a chance to look at the
> captured traffic.
>
> In the initial part of the file transfer I couldn't spot any particular
> issue with sequences, acknowledgements, windows, etc. I guess there might be
> rather a problem toward the end of the transfer, with the connection not
> closing or something like that. Two requests from my side:
>
> - you can reproduce this only using the version (I guess) I installed on
> your test system (under /usr/local/bin) a while ago. I don't remember
> exactly what I was trying to debug with that. Note that the package would
> install binaries under /usr/bin, not /usr/local/bin. Could you try to see if
> this still happens with the latest RPM I published on my Copr repository?
>

Yes, I can still reproduce it with the latest passt-0.git.2022_07_20.9af2e5d-0.el9.x86_64

> - if yes, would it be possible for you to share the last part of the capture
> (the last two megabytes or so), instead of the beginning of it?

I found no problem with a 1M file transfer, but there is still a problem with a 2M file transfer. Since I use "-p file_transfer.pcap_1" with the passt command, I am not sure how to reduce the capture file.

Checking the new

(In reply to Quan Wenli from comment #6)
> Created attachment 1902781 [details]
> 2M file transfer capture

Actually I see more than 4 MiB transferred there. If that's the full capture, it ends (frame #187) with an ACK segment from the guest, and nothing else.
Thinking about it, I wonder if the reason is an issue in virtio-net or qemu that (I couldn't really find out) I'm currently working around with the x-txburst parameter passed for the virtio-net-pci device. Specifically, the issue I observed was that at some point packets would stop being queued by qemu, until virtio_net is unloaded and reloaded. I haven't filed a ticket about that, yet.

There's a simple check that should tell us if that's the case: when you hit this failure, you could check if the guest is now completely unable to reach the network (for example with a ping). If it is, and:

    rmmod virtio_net; modprobe virtio_net

restores connectivity, I guess that might be the case. The workaround is not entirely robust, but I observed that with higher values of x-txburst, for example:

    -device virtio-net-pci,netdev=hostnet0,x-txburst=524288

failures are quite unlikely to occur. Could you please give that a try?

By the way,

(In reply to Quan Wenli from comment #5)
> since " -p file_transfer.pcap_1 " with passt command, I am not sure how to
> reduce capture file.

you can do it once the capture is ready with a tool such as editcap(1). As I never remember the syntax, actually, I open the capture file with Wireshark, select a part of the packets, and save only that as a separate file (File -> Export Specified Packets).

(In reply to Stefano Brivio from comment #7)
> Checking the new (In reply to Quan Wenli from comment #6)
> > Created attachment 1902781 [details]
> > 2M file transfer capture
>
> Actually I see more than 4 MiB transferred there. If that's the full
> capture, it ends (frame #187) with an ACK segment from the guest, and
> nothing else.
>
> Thinking about it, I wonder if the reason is an issue in virtio-net or qemu

But as noted in the "Additional info" part of comment #0, I cannot reproduce this with upstream passt, so I think it's not a qemu issue.

> (I couldn't really find out) I'm currently working around with the
> -x-txburst parameter passed for the virtio-net-pci device. Specifically, the
> issue I observed was that at some point packets would stop to be queued by
> qemu, until virtio_net is unloaded and reloaded. I haven't filed a ticket
> about that, yet.
>
> There's a simple check that should tell us if that's the case: when you hit
> this failure, you could check if the guest is now completely unable to reach
> the network (for example with a ping). If it is, and:
>
> rmmod virtio_net; modprobe virtio_net
>
> restores connectivity, I guess that might be the case. The workaround is not
> entirely robust, but I observed that with higher values of x-txburst, for
> example:
>
> -device virtio-net-pci,netdev=hostnet0,x-txburst=524288
>

Yes, I tried, but I can still reproduce the issue.

> failures are quite unlikely to occur. Could you please give that a try?
>
> By the way,
>
> (In reply to Quan Wenli from comment #5)
> > since " -p file_transfer.pcap_1 " with passt command, I am not sure how to
> > reduce capture file.

Ok, I split it into 4 files like file_transfer_00000_xxx.pcap_1_output

>
> you can do it once the capture is ready with a tool such as editcap(1). As I
> never remember the syntax, actually, I open the capture file with Wireshark,
> select a part of the packets, and save only that as a separate file (File ->
> Export Specified Packets).

Wenli,

BZ 2122788 (Depends-on) has been fixed in qemu-kvm-7.2.0-1.el9

Could you re-test?

(In reply to Laurent Vivier from comment #13)
> Wenli,
>
> BZ 2122788 (Depends-on) has been fixed in qemu-kvm-7.2.0-1.el9
>
> Could you re-test?

I have tested it with the packages below, following the steps in comment 0, and the issue can still be reproduced.

qemu-kvm-7.2.0-7.el9.x86_64
passt-0^20221110.g4129764-1.el9.x86_64

I found that the data transfer only happens in the first several seconds and then stops. The file on the guest stays at 99M:

    # ll test_big.bin
    -rw-r--r--. 1 root root 103588817 Feb 7 19:10 test_big.bin

file on host:

    # ll /tmp/tmp.jPM9PLbjTq
    -rw-r--r--. 1 test test 104857600 Feb 7 05:31 /tmp/tmp.jPM9PLbjTq

It works well with two QEMUs back-to-back, so I think the problem is really in passt. Should we move it to RHEL 9.3.0?

==> Reproduced this bug on passt-0^20221110.g4129764-1.el9.x86_64

Test Version:
passt-0^20221110.g4129764-1.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-281.el9.x86_64
libvirt-9.0.0-6.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-6.el9.noarch

Test Steps:
1. Create passt pid

    # passt -f -t 10001 -u 10001 -P passt.pid
    Outbound interface (IPv4): switch
    Outbound interface (IPv6): switch
    MAC:
        host: ec:2a:72:30:86:32
    DHCP:
        assign: 10.73.212.78
        mask: 255.255.254.0
        router: 10.73.213.254
    DNS:
        10.72.17.5
        10.68.5.26
    DNS search list:
        lab.eng.pek2.redhat.com
    NDP/DHCPv6:
        assign: 2620:52:0:49d4:daa3:c9ff:27c7:4e06
        router: fe80::52c7:903:543b:88e1
        our link-local: fe80::c3e8:6ed9:7dbc:ba55
    DNS search list:
        lab.eng.pek2.redhat.com
    UNIX domain socket bound at /tmp/passt_1.socket

    You can now start qemu (>= 7.2, with commit 13c6be96618c):
        kvm ... -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket
    or qrap, for earlier qemu versions:
        ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio
    accepted connection from PID 3038
    NDP: received NS, sending NA
    DHCP: ack to request from 9a:36:de:f2:81:a1
    NDP: received RS, sending RA
    DHCPv6: received SOLICIT, sending ADVERTISE
    DHCPv6: received REQUEST/RENEW/CONFIRM, sending REPLY

2.
Boot a guest

    $ PATH=$PATH:/usr/libexec
    $ qrap 5 qemu-kvm -m 16059 -smp 6 \
        -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
        -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
        -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/home/test/avocado-vt-vm1_rhel920-64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' \
        -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
        -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
        -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
        -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' \
        -nodefaults \
        -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
        -m 62464 \
        -object '{"size": 65498251264, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' \
        -smp 28,maxcpus=28,cores=14,threads=1,dies=1,sockets=2 \
        -cpu 'Icelake-Server',ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,kvm_pv_unhalt=on \
        -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
        -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
        -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/test/rhel920-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' \
        -blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
        -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
        -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
        -device '{"driver": "virtio-net-pci", "mac": "9a:36:de:f2:81:a1", "id": "net0", "netdev": "hostnet0", "x-txburst": 16384, "bus": "pcie-root-port-3", "addr": "0x0"}' \
        -netdev socket,fd=5,id=hostnet0 \
        -boot menu=off,order=cdn,once=c,strict=off \
        -vnc :0 \
        -boot menu=off,order=cdn,once=c,strict=off \
        -monitor stdio

3. Disable firewall on both guest and host

    # systemctl stop firewalld.service || service iptables stop || iptables -F || nft flush ruleset

4. Create a 100M file on host:

    $ dd if=/dev/urandom bs=1M count=100 > /tmp/tmp.jPM9PLbjTq
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB, 100 MiB) copied, 0.347883 s, 301 MB/s

5. Transfer file from host to guest

on guest:

    # nc -l 10001 > test_big.bin

on host:

    $ cat /tmp/tmp.jPM9PLbjTq |nc 127.0.0.1 10001

6. Wait for more than 10 minutes: the nc command does not exit. The data transfer only happens in the first several seconds and then stops.

    # ll test_big.bin
    -rw-r--r--. 1 root root 103758237 Feb 23 08:45 test_big.bin

==> So reproduced this problem on passt-0^20221110.g4129764-1.el9.x86_64

==> Update the passt version to the latest version: passt-0^20230222.g4ddbcb9-1.el9.x86_64

Test Version:
passt-0^20230222.g4ddbcb9-1.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-281.el9.x86_64
libvirt-9.0.0-6.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-6.el9.noarch

Test Steps:
1. Update the passt version to the latest version: passt-0^20230222.g4ddbcb9-1.el9.x86_64

    # yum -y install passt-0^20230222.g4ddbcb9-1.el9.x86_64.rpm
    # reboot (on the host)

2. After the host is powered on, repeat the above test steps. The data is transferred within seconds and the nc command exits.

    # ll test_big.bin
    -rw-r--r--. 1 root root 104857600 Feb 23 09:08 test_big.bin

==> So based on the above test result, this bug has been fixed on passt-0^20230222.g4ddbcb9-1.el9.x86_64.

Moving to modified, thanks Lei Yang for checking! Yes, we found a separate TCP stall issue which is now fixed in passt-0^20230222.g4ddbcb9-1.el9.

Based on the Comment 18 test result, move to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (passt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2292
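The pass/fail criterion used in the test steps above comes down to comparing the byte count of test_big.bin on the guest with the 104857600 bytes created on the host. A minimal sketch of that arithmetic follows; `file_size` and `shortfall` are hypothetical helper names, and the two sizes still have to be collected separately on host and guest, since the files live on different machines.

```shell
#!/bin/sh
# file_size FILE (hypothetical helper): print the size of FILE in bytes.
file_size() {
    # The arithmetic expansion strips padding that some wc
    # implementations put in front of the number.
    echo $(( $(wc -c < "$1") ))
}

# shortfall EXPECTED ACTUAL (hypothetical helper): print how many bytes
# of the transfer never arrived. Zero means the transfer completed.
shortfall() {
    echo $(( $1 - $2 ))
}

# Sizes from the failing run reported above:
# 104857600 bytes on the host, 103758237 bytes on the guest.
shortfall 104857600 103758237    # prints 1099363
```

With the fixed package the guest file reaches 104857600 bytes, so the same call prints 0.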