2101375 – passt: failed to transfer file through ipv4 from host to guest

Bug 2101375 - passt: failed to transfer file through ipv4 from host to guest

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	passt
Sub Component:
Version:	9.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Stefano Brivio
QA Contact:	Lei Yang
Docs Contact:
URL:
Whiteboard:
Depends On:	2122788
Blocks:

Reported:	2022-06-27 10:20 UTC by Quan Wenli
Modified:	2023-05-09 08:55 UTC (History)
CC List:	5 users (show)
Fixed In Version:	passt-0^20230222.g4ddbcb9-1.el9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-05-09 07:43:36 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-126342	0	None	None	None	2022-06-27 10:37:44 UTC
Red Hat Product Errata	RHBA-2023:2292	0	None	None	None	2023-05-09 07:43:51 UTC

Description Quan Wenli 2022-06-27 10:20:40 UTC

Description of problem:

Failed to transfer file through ipv4 from host to guest with passt

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.[test@dell-per440-18 ~]$ /usr/local/bin/passt -f -t 10001 -u 10001 -P passt.pid
Outbound interface: eno1
ARP:
    address: 2c:ea:7f:71:b6:ee
DHCP:
    assign: 10.73.114.95
    mask: 255.255.254.0
    router: 10.73.115.254
DNS:
    10.73.2.107
    10.73.2.108
    10.66.127.10
DNS search list:
    lab.eng.pek2.redhat.com
NDP/DHCPv6:
    assign: 2620:52:0:4972:2eea:7fff:fe71:b6ee
    router: fe80::cee1:9402:8b35:be41
    our link-local: fe80::2eea:7fff:fe71:b6ee
DNS search list:
    lab.eng.pek2.redhat.com
UNIX domain socket bound at /tmp/passt_1.socket

You can now start qrap:
    ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio
or directly qemu, patched with:
    qemu/0001-net-Allow-also-UNIX-domain-sockets-to-be-used-as-net.patch
as follows:
    kvm ... -net socket,connect=/tmp/passt_1.socket -net nic,model=virtio
DHCP: ack to request
    from 52:54:00:12:34:56
NDP: received RS, sending RA
DHCPv6: received SOLICIT, sending ADVERTISE
DHCPv6: received REQUEST/RENEW/CONFIRM, sending REPLY
2. boot up guest
[test@dell-per440-18 ~]$ PATH=$PATH:/usr/libexec
[test@dell-per440-18 ~]$ qrap 5 qemu-kvm -m 16059 -cpu host -smp 6 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=./rhel900-64-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0 -nographic -serial stdio -nodefaults -device virtio-net-pci,netdev=hostnet0,x-txburst=16384 -netdev socket,fd=5,id=hostnet0
3. Disable firewall on both guest and host

on guest:
[root@dell-per440-18 ~]# systemctl stop firewalld.service || service iptables stop || iptables -F || nft flush ruleset

on host:
[test@dell-per440-18 ~]$ systemctl stop firewalld.service || service iptables stop || iptables -F || nft flush ruleset

4.
on host:
[test@dell-per440-18 ~]$ dd if=/dev/urandom bs=1M count=100 > /tmp/tmp.jPM9PLbjTq
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.632863 s, 166 MB/s

on guest:
[root@dell-per440-18 ~]# nc -l 10001 > test_big.bin 

on host:
[test@dell-per440-18 ~]$ cat /tmp/tmp.jPM9PLbjTq |nc 127.0.0.1 10001 


5. Wait for more than 10 minutes: the nc command does not stop.
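
One way to make the hang in step 5 easier to detect is to bound the wait and then compare digests of the two files. The following is only a sketch and not part of the original report: the helper name `same_content` and the 600-second limit are assumptions, and `timeout` and `sha256sum` are assumed to be the GNU coreutils tools.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeed only if both files
# have the same SHA-256 digest. Returns 2 if either file is unreadable.
same_content() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    # sha256sum prints "<digest>  -" for stdin; keep only the digest.
    [ "${a%% *}" = "${b%% *}" ]
}

# Host side, sketch only: give up after 600 seconds instead of waiting forever.
#   timeout 600 nc 127.0.0.1 10001 < /tmp/tmp.jPM9PLbjTq
# When both files are reachable from one machine:
#   same_content /tmp/tmp.jPM9PLbjTq test_big.bin && echo "transfer intact"
```

Since the two files live on different machines, the digest of each can also be computed separately with sha256sum and compared by eye.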

Actual results:


Expected results:


Additional info:

1. There is no problem with the upstream build from "$ make valgrind":

1.1 The test passes with $ valgrind --max-stackframe=4194304 --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt  -f -t 10001 -u 10001 -P passt.pid

1.2 The test also passes with "./passt  -f -t 10001 -u 10001 -P passt.pid"


2. The problem exists with the upstream build from "$ make"

Comment 1 Stefano Brivio 2022-07-13 06:23:52 UTC

Could you please collect a guest-side packet capture of the failing case by
starting passt with the -p (--pcap) argument, such as "-p file_transfer.pcap"?

Comment 3 Quan Wenli 2022-07-13 12:09:57 UTC

(In reply to Stefano Brivio from comment #1)
> Could you please collect a guest-side packet capture of the failing case by
> starting passt with the -p (--pcap) argument, such as "-p
> file_transfer.pcap"?

Please check the file_transfer.pcap. I reduced the file to 2M, otherwise it's too large to upload in bugzilla.

Comment 4 Stefano Brivio 2022-07-29 19:40:49 UTC

(In reply to Quan Wenli from comment #3)
> (In reply to Stefano Brivio from comment #1)
> > Could you please collect a guest-side packet capture of the failing case by
> > starting passt with the -p (--pcap) argument, such as "-p
> > file_transfer.pcap"?
> 
> Please check the file_transfer.pcap. I reduced the file to 2M, otherwise
> it's too large to upload in bugzilla.

Thanks! And sorry for the delay, I finally had a chance to look at the captured traffic.

In the initial part of the file transfer I couldn't spot any particular issue with sequences, acknowledgements, windows, etc. I guess there might be rather a problem toward the end of the transfer, with the connection not closing or something like that. Two requests from my side:

- you can reproduce this only using the version (I guess) I installed on your test system (under /usr/local/bin) a while ago. I don't remember exactly what I was trying to debug with that. Note that the package would install binaries under /usr/bin, not /usr/local/bin. Could you try to see if this still happens with the latest RPM I published on my Copr repository?

- if yes, would it be possible for you to share the last part of the capture (the last two megabytes or so), instead of the beginning of it?

Comment 5 Quan Wenli 2022-08-02 07:22:50 UTC

(In reply to Stefano Brivio from comment #4)
> (In reply to Quan Wenli from comment #3)
> > (In reply to Stefano Brivio from comment #1)
> > > Could you please collect a guest-side packet capture of the failing case by
> > > starting passt with the -p (--pcap) argument, such as "-p
> > > file_transfer.pcap"?
> > 
> > Please check the file_transfer.pcap. I reduced the file to 2M, otherwise
> > it's too large to upload in bugzilla.
> 
> Thanks! And sorry for the delay, I finally had a chance to look at the
> captured traffic.
> 
> In the initial part of the file transfer I couldn't spot any particular
> issue with sequences, acknowledgements, windows, etc. I guess there might be
> rather a problem toward the end of the transfer, with the connection not
> closing or something like that. Two requests from my side:
> 
> - you can reproduce this only using the version (I guess) I installed on
> your test system (under /usr/local/bin) a while ago. I don't remember
> exactly what I was trying to debug with that. Note that the package would
> install binaries under /usr/bin, not /usr/local/bin. Could you try to see if
> this still happens with the latest RPM I published on my Copr repository?
> 

Yes, I can still reproduce it with the latest passt-0.git.2022_07_20.9af2e5d-0.el9.x86_64


> - if yes, would it be possible for you to share the last part of the capture
> (the last two megabytes or so), instead of the beginning of it?

I found no problem with a 1M file transfer, but there is still a problem with a 2M file transfer.

since " -p file_transfer.pcap_1 " with passt command, I am not sure how to reduce capture file.

Comment 7 Stefano Brivio 2022-08-05 17:55:51 UTC

(In reply to Quan Wenli from comment #6)
> Created attachment 1902781 [details]
> 2M file transfer capture

Actually I see more than 4 MiB transferred there. If that's the full capture, it ends (frame #187) with an ACK segment from the guest, and nothing else.

Thinking about it, I wonder if the reason is an issue in virtio-net or qemu (I couldn't really find out) I'm currently working around with the -x-txburst parameter passed for the virtio-net-pci device. Specifically, the issue I observed was that at some point packets would stop to be queued by qemu, until virtio_net is unloaded and reloaded. I haven't filed a ticket about that, yet.

There's a simple check that should tell us if that's the case: when you hit this failure, you could check if the guest is now completely unable to reach the network (for example with a ping). If it is, and:

  rmmod virtio_net; modprobe virtio_net

restores connectivity, I guess that might be the case. The workaround is not entirely robust, but I observed that with higher values of x-txburst, for example:

  -device virtio-net-pci,netdev=hostnet0,x-txburst=524288

failures are quite unlikely to occur. Could you please give that a try?
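
The check described above can be sketched as a small script for the guest. This is an illustration under stated assumptions rather than a tested procedure: `check_stall`, `probe_router` and `reload_virtio` are hypothetical names, and the router address is the one printed by passt in the description.

```shell
#!/bin/sh
# check_stall PROBE RELOAD
# PROBE and RELOAD are names of commands or shell functions (hypothetical
# parameters for this sketch). Prints a one-line verdict.
check_stall() {
    if "$1"; then
        echo "reachable: no stall"
        return 0
    fi
    "$2"
    if "$1"; then
        echo "restored by reload: matches the queueing issue"
    else
        echo "still unreachable: reload did not help"
    fi
}

# Example wiring for the guest (run as root). 10.73.115.254 is the router
# address from the passt output in the description.
probe_router()  { ping -c 3 -W 2 10.73.115.254 >/dev/null 2>&1; }
reload_virtio() { rmmod virtio_net && modprobe virtio_net; }
# check_stall probe_router reload_virtio
```

After the module is reloaded the interface may need a moment to be configured again, so a probe that runs immediately could fail for unrelated reasons.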

By the way,

(In reply to Quan Wenli from comment #5)
> since " -p file_transfer.pcap_1 " with passt command, I am not sure how to
> reduce capture file.

you can do it once the capture is ready with a tool such as editcap(1). As I never remember the syntax, actually, I open the capture file with Wireshark, select a part of the packets, and save only that as a separate file (File -> Export Specified Packets).
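
For the editcap route, a rough sketch follows. The helper `tail_range` and the output file name `tail.pcap` are assumptions made for illustration. The `-r` option of editcap keeps the listed packets instead of deleting them, and `-c` splits a capture into files of a fixed packet count.

```shell
#!/bin/sh
# tail_range TOTAL KEEP
# Print the packet range "FIRST-LAST" covering the last KEEP packets of a
# capture holding TOTAL packets (hypothetical helper for this sketch).
tail_range() {
    total=$1
    keep=$2
    first=$((total - keep + 1))
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    echo "${first}-${total}"
}

# Usage sketch, assuming editcap (shipped with Wireshark) is installed and
# the capture holds 187 frames:
#   editcap -r file_transfer.pcap_1 tail.pcap "$(tail_range 187 50)"
# Alternatively, split the capture into chunks of 50 packets each:
#   editcap -c 50 file_transfer.pcap_1 chunk.pcap
```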

Comment 8 Quan Wenli 2022-09-07 06:40:13 UTC

(In reply to Stefano Brivio from comment #7)
> (In reply to Quan Wenli from comment #6)
> > Created attachment 1902781 [details]
> > 2M file transfer capture
> 
> Actually I see more than 4 MiB transferred there. If that's the full
> capture, it ends (frame #187) with an ACK segment from the guest, and
> nothing else.
> 
> Thinking about it, I wonder if the reason is an issue in virtio-net or qemu

But as noted in the "Additional info" part of comment #0, I cannot reproduce this with upstream passt, so I think it's not a qemu issue.
 
> (I couldn't really find out) I'm currently working around with the
> -x-txburst parameter passed for the virtio-net-pci device. Specifically, the
> issue I observed was that at some point packets would stop to be queued by
> qemu, until virtio_net is unloaded and reloaded. I haven't filed a ticket
> about that, yet.
> 
> There's a simple check that should tell us if that's the case: when you hit
> this failure, you could check if the guest is now completely unable to reach
> the network (for example with a ping). If it is, and:
> 
>   rmmod virtio_net; modprobe virtio_net
> 
> restores connectivity, I guess that might be the case. The workaround is not
> entirely robust, but I observed that with higher values of x-txburst, for
> example:
> 
>   -device virtio-net-pci,netdev=hostnet0,x-txburst=524288
> 

Yes, I tried, but I can still reproduce the issue.

> failures are quite unlikely to occur. Could you please give that a try?
> 
> By the way,
> 
> (In reply to Quan Wenli from comment #5)
> > since " -p file_transfer.pcap_1 " with passt command, I am not sure how to
> > reduce capture file.

OK, I split it into 4 files named like file_transfer_00000_xxx.pcap_1_output

> 
> you can do it once the capture is ready with a tool such as editcap(1). As I
> never remember the syntax, actually, I open the capture file with Wireshark,
> select a part of the packets, and save only that as a separate file (File ->
> Export Specified Packets).

Comment 13 Laurent Vivier 2023-01-23 16:04:34 UTC

Wenli,

BZ 2122788 (Depends-on) has been fixed in qemu-kvm-7.2.0-1.el9

Could you re-test?

Comment 15 yalzhang@redhat.com 2023-02-07 11:22:16 UTC

(In reply to Laurent Vivier from comment #13)
> Wenli,
> 
> BZ 2122788 (Depends-on) has been fixed in qemu-kvm-7.2.0-1.el9
> 
> Could you re-test?

I have tested it with the packages below, following the steps in comment 0, and the issue can still be reproduced.
qemu-kvm-7.2.0-7.el9.x86_64
passt-0^20221110.g4129764-1.el9.x86_64

I found that the data transfer only happens in the first several seconds and then stops.
The file on the guest stays at 99M:
# ll test_big.bin 
-rw-r--r--. 1 root root 103588817 Feb  7 19:10 test_big.bin

file on host:
#  ll /tmp/tmp.jPM9PLbjTq
-rw-r--r--. 1 test test 104857600 Feb  7 05:31 /tmp/tmp.jPM9PLbjTq
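
As a quick illustration of how much data is missing, the two sizes listed above can be subtracted. This is a minimal sketch, and the helper name `shortfall` is hypothetical.

```shell
#!/bin/sh
# shortfall SENT RECEIVED
# Print how many bytes never arrived (hypothetical helper for this sketch).
shortfall() {
    echo $(( $1 - $2 ))
}

sent=104857600      # size of /tmp/tmp.jPM9PLbjTq on the host
received=103588817  # size of test_big.bin on the guest
echo "missing: $(shortfall "$sent" "$received") bytes"
```

With these values the script prints "missing: 1268783 bytes", that is, roughly the last 1.2 MiB of the file never reached the guest.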

Comment 17 Laurent Vivier 2023-02-14 18:03:01 UTC

It works well with two QEMUs back-to-back, so I think the problem is really in passt.

Should we move it to RHEL 9.3.0?

Comment 18 Lei Yang 2023-02-23 01:49:12 UTC

==> Reproduced this bug on passt-0^20221110.g4129764-1.el9.x86_64

Test Version:
passt-0^20221110.g4129764-1.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-281.el9.x86_64
libvirt-9.0.0-6.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-6.el9.noarch

Test Steps:
1. Start passt with a PID file
# passt -f -t 10001 -u 10001 -P passt.pid
Outbound interface (IPv4): switch
Outbound interface (IPv6): switch
MAC:
    host: ec:2a:72:30:86:32
DHCP:
    assign: 10.73.212.78
    mask: 255.255.254.0
    router: 10.73.213.254
DNS:
    10.72.17.5
    10.68.5.26
DNS search list:
    lab.eng.pek2.redhat.com
NDP/DHCPv6:
    assign: 2620:52:0:49d4:daa3:c9ff:27c7:4e06
    router: fe80::52c7:903:543b:88e1
    our link-local: fe80::c3e8:6ed9:7dbc:ba55
DNS search list:
    lab.eng.pek2.redhat.com
UNIX domain socket bound at /tmp/passt_1.socket

You can now start qemu (>= 7.2, with commit 13c6be96618c):
    kvm ... -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket
or qrap, for earlier qemu versions:
    ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio
accepted connection from PID 3038
NDP: received NS, sending NA
DHCP: ack to request
    from 9a:36:de:f2:81:a1
NDP: received RS, sending RA
DHCPv6: received SOLICIT, sending ADVERTISE
DHCPv6: received REQUEST/RENEW/CONFIRM, sending REPLY

2. Boot a guest
$ PATH=$PATH:/usr/libexec
$ qrap 5 qemu-kvm -m 16059 -smp 6 -blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' -blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/home/test/avocado-vt-vm1_rhel920-64-virtio-scsi_qcow2_filesystem_VARS.fd", "auto-read-only": true, "discard": "unmap"}' -blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars -device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' -device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}' -nodefaults -device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' -m 62464 -object '{"size": 65498251264, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}' -smp 28,maxcpus=28,cores=14,threads=1,dies=1,sockets=2 -cpu 'Icelake-Server',ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,kvm_pv_unhalt=on  -device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' -device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' -blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/test/rhel920-64-virtio-scsi.qcow2", "cache": {"direct": true, "no-flush": false}}' -blockdev 
'{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' -device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' -device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' -device '{"driver": "virtio-net-pci", "mac": "9a:36:de:f2:81:a1", "id": "net0", "netdev": "hostnet0", "x-txburst": 16384, "bus": "pcie-root-port-3", "addr": "0x0"}' -netdev socket,fd=5,id=hostnet0 -boot menu=off,order=cdn,once=c,strict=off -vnc :0 -boot menu=off,order=cdn,once=c,strict=off -monitor stdio

3. Disable firewall on both guest and host
# systemctl stop firewalld.service || service iptables stop || iptables -F || nft flush ruleset

4. Create a 100M file
on host:
$ dd if=/dev/urandom bs=1M count=100 > /tmp/tmp.jPM9PLbjTq
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.347883 s, 301 MB/s

5. Transfer file from host to guest
on guest:
# nc -l 10001 > test_big.bin 

on host:
$ cat /tmp/tmp.jPM9PLbjTq |nc 127.0.0.1 10001 


6. Wait for more than 10 minutes: the nc command does not stop. The data transfer only happens in the first several seconds and then stops.
# ll test_big.bin 
-rw-r--r--. 1 root root 103758237 Feb 23 08:45 test_big.bin

==>So reproduced this problem on passt-0^20221110.g4129764-1.el9.x86_64

==>Update the passt version to the latest version: passt-0^20230222.g4ddbcb9-1.el9.x86_64

Test Version:
passt-0^20230222.g4ddbcb9-1.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-281.el9.x86_64
libvirt-9.0.0-6.el9.x86_64
edk2-ovmf-20221207gitfff6d81270b5-6.el9.noarch

Test Steps:
1. Update the passt version to the latest version: passt-0^20230222.g4ddbcb9-1.el9.x86_64
# yum -y install passt-0^20230222.g4ddbcb9-1.el9.x86_64.rpm
# reboot (on the host)

2. After the host is powered on, repeat the above test steps. The data is transferred within seconds and the nc command stops.
# ll test_big.bin 
-rw-r--r--. 1 root root 104857600 Feb 23 09:08 test_big.bin

==> So based on the above test result this bug has been fixed very well on passt-0^20230222.g4ddbcb9-1.el9.x86_64.

Comment 19 Stefano Brivio 2023-02-23 08:41:26 UTC

Moving to modified, thanks Lei Yang for checking! Yes, we found a separate TCP stall issue which is now fixed in passt-0^20230222.g4ddbcb9-1.el9.

Comment 24 Lei Yang 2023-02-23 23:36:07 UTC

Based on the Comment 18 test result, move to "VERIFIED".

Comment 26 errata-xmlrpc 2023-05-09 07:43:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (passt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2292
