Bug 2129713

Summary: passt: can not find IPV6 gateway on vm
Product: Red Hat Enterprise Linux 9 Reporter: Quan Wenli <wquan>
Component: passt Assignee: Stefano Brivio <sbrivio>
Status: CLOSED COMPLETED QA Contact: Lei Yang <leiyang>
Severity: high Docs Contact:
Priority: high    
Version: 9.2 CC: chayang, coli, jinzhao, juzhang
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 0^20220929.g06aa26f-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-20 10:32:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Quan Wenli 2022-09-26 06:23:40 UTC
Description of problem:

The IPv6 gateway cannot be found on the VM. This appears to be a regression between passt-0.git.2022_06_08.d7d467f-0.el9.x86_64 and passt-0.git.2022_08_29.60ffc5b-1.el9.x86_64. The issue does not occur with the latest upstream passt (git clone https://passt.top/passt && cd passt && make && ./passt).


Version-Release number of selected component (if applicable):

passt-0.git.2022_08_29.60ffc5b-1.el9.x86_64

How reproducible:
always

Steps to Reproduce:
1. [test@dell-per440-18 ~]$ /usr/bin/passt
2. [test@dell-per440-18 ~]$ PATH=$PATH:/usr/libexec
3. [test@dell-per440-18 ~]$ qrap 5 qemu-kvm -m 16059 -cpu host -smp 6 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=./rhel910-64-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0 -nographic -serial stdio -nodefaults -device virtio-net-pci,netdev=hostnet0 -netdev socket,fd=5,id=hostnet0
4. On the VM:

[root@localhost ~]# dhclient -6 eth0
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
[root@localhost ~]#  ip -j -6 ro sh|jq -rM '.[] | select(.dst == "default").gateway'

Actual results:

The IPv6 gateway cannot be found on the VM.

Expected results:

The IPv6 gateway should be found on the VM.

Additional info:

Comment 1 Stefano Brivio 2022-09-26 08:33:03 UTC
From a capture shared offline: router advertisements are sent with an incorrect checksum, whose value is 58 (decimal, same as ICMPv6 protocol number) plus the expected checksum. It's as if in some builds, this snippet from ndp.c:

	ip6hr->hop_limit = IPPROTO_ICMPV6;
	ihr->icmp6_cksum = 0;
	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
						 sizeof(*ihr) + len, 0);

	ip6hr->version = 6;
	ip6hr->nexthdr = IPPROTO_ICMPV6;
	ip6hr->hop_limit = 255;

where, for convenience, hop_limit is first set to IPPROTO_ICMPV6 to match the IPv6 pseudo-header for ICMPv6 checksum, and later set to its intended value, happened to be equivalent to:

	ihr->icmp6_cksum = 0;
	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
						 sizeof(*ihr) + len, 0);

	ip6hr->version = 6;
	ip6hr->nexthdr = IPPROTO_ICMPV6;
	ip6hr->hop_limit = 255;

At first glance, I don't see any justification for the compiler to elide the initial assignment of hop_limit, though.
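
For illustration, the arithmetic behind the "expected checksum plus 58" symptom can be reproduced outside passt. The sketch below is not the ndp.c code: csum16() is a hypothetical stand-in for csum_unaligned(), ra_csum() and the addresses fe80::1 and ff02::1 are made up for the example, and the 16-byte payload is a bare router advertisement header with no options. It sums a zeroed IPv6 header plus payload once with the hop_limit byte set to 58 and once with it left at 0, which is what a missing store would amount to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IP6_HLEN = 40, ICMP6_PROTO = 58, RA_LEN = 16 };

/* Hypothetical stand-in for csum_unaligned(): Internet checksum over
 * len bytes, summing big-endian 16-bit words (RFC 1071 style). */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Checksum an IPv6 header and ICMPv6 payload in a single pass.
 * Version, flow label and nexthdr are still zero at this point, so the
 * header sums like the IPv6 pseudo-header as long as the hop_limit byte
 * (offset 7) temporarily holds the protocol number. */
static uint16_t ra_csum(uint8_t hop_limit_for_csum)
{
	uint8_t pkt[IP6_HLEN + RA_LEN] = { 0 };

	assert(RA_LEN <= 0xff);		/* length fits the low byte below */
	pkt[5] = RA_LEN;		/* payload_len, network order */
	pkt[7] = hop_limit_for_csum;	/* 58 if the store happens, 0 if not */
	pkt[8] = 0xfe;  pkt[9] = 0x80;  pkt[23] = 1;	/* source: fe80::1 */
	pkt[24] = 0xff; pkt[25] = 0x02; pkt[39] = 1;	/* destination: ff02::1 */
	pkt[IP6_HLEN] = 134;		/* ICMPv6 type: router advertisement */
	/* checksum field (pkt[42], pkt[43]) stays zero while summing */

	return csum16(pkt, sizeof(pkt));
}
```

With these bytes, ra_csum(0) comes out exactly 58 higher than ra_csum(ICMP6_PROTO): the one's complement sum is 58 lower when the store is absent, so its complement is 58 higher, which matches the capture.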

Comment 3 Stefano Brivio 2022-09-27 21:37:08 UTC
From a second (still quick) look:

- the gcc version used in passt-0.git.2022_06_08.d7d467f-0.el9.x86_64 is 11.2.1-9.4.el9 (https://download.copr.fedorainfracloud.org/results/sbrivio/passt/epel-9-x86_64/04776284-passt/builder-live.log.gz) -- this is the same as the one used for the most recent EPEL 9 build, 0^20220924.g8978f65-1.el9.x86_64

- comparing that part of the NDP implementation between the most recent EPEL 9 build and the most recent Fedora 36 build (using gcc 12.2.1-2.fc36.x86_64), it looks like there are some notable differences -- something makes me think that the hop_limit store is actually missing in the EPEL 9 build, but I couldn't grasp enough of it, yet

- if that store is really missing, this would be similar in nature to the issue described at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101354, which however applies only to functions marked as "naked", not the case here

It might be worth checking for differences in intermediate files (passing -save-temps in CFLAGS) between the two gcc versions.

Comment 4 Stefano Brivio 2022-09-27 22:13:09 UTC
Weird:

$ gcc --version
gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ (make clean; CFLAGS='-g' make) >/dev/null
$ objdump -DSdslrx passt|grep -A15 "ndp.c\:192" -m1
/root/passt_gcc_debug/passt/ndp.c:192
	ihr->icmp6_cksum = 0;
    b8bc:	31 c9                	xor    %ecx,%ecx
/root/passt_gcc_debug/passt/ndp.c:193
	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
    b8be:	31 d2                	xor    %edx,%edx
/root/passt_gcc_debug/passt/ndp.c:191
	ip6hr->hop_limit = IPPROTO_ICMPV6;
    b8c0:	c6 44 24 35 3a       	movb   $0x3a,0x35(%rsp)
__bswap_16():
/usr/include/bits/byteswap.h:37
  return __builtin_bswap16 (__bsx);
    b8c5:	66 c1 c0 08          	rol    $0x8,%ax
ndp():
/root/passt_gcc_debug/passt/ndp.c:193
	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +

Here, 0x3a (IPPROTO_ICMPV6) is stored before the call to csum_unaligned(). But not if I build with -flto=auto (that's a default flag for at least EPEL 9 packages):

$ (make clean; CFLAGS='-g -flto=auto' make) >/dev/null
$ objdump -DSdslrx passt|grep -A15 "ndp.c\:192" -m1
/root/passt_gcc_debug/passt/ndp.c:192
	ihr->icmp6_cksum = 0;
    bb0c:	45 31 c0             	xor    %r8d,%r8d
__bswap_16():
/usr/include/bits/byteswap.h:37
  return __builtin_bswap16 (__bsx);
    bb0f:	66 c1 c0 08          	rol    $0x8,%ax
ndp():
/root/passt_gcc_debug/passt/ndp.c:192
    bb13:	66 44 89 44 24 58    	mov    %r8w,0x58(%rsp)
/root/passt_gcc_debug/passt/ndp.c:190
	ip6hr->payload_len = htons(sizeof(*ihr) + len);
    bb19:	66 89 44 24 32       	mov    %ax,0x32(%rsp)
/root/passt_gcc_debug/passt/ndp.c:193
	ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) +
    bb1e:	48 8d 46 28          	lea    0x28(%rsi),%rax

I also tested on a recent gcc 12.2.0, same behaviour.

Comment 5 Stefano Brivio 2022-09-30 00:26:26 UTC
I introduced a workaround, that is, declaring csum_unaligned() as "noipa" for the affected gcc versions, depending on CFLAGS:
  https://passt.top/passt/commit/?id=06aa26fcf398f5d19ab46e42996190d7f95e837a

and it's now available in the 0^20220929.g06aa26f-1 EPEL 9 build (Copr repository only at the moment).
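
A minimal sketch of what such a workaround can look like, assuming GCC 8 or later, where the noipa function attribute exists. CSUM_NOIPA, csum_sketch() and fill_and_sum() are hypothetical names, and the version gate here is an assumption for illustration; the commit linked above gates the attribute on the affected compiler versions and CFLAGS, and its checksum routine is not the simple loop shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: disable interprocedural analysis for one function
 * on compilers that know the attribute, expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define CSUM_NOIPA __attribute__((noipa))
#else
#define CSUM_NOIPA
#endif

/* Simplified checksum routine (big-endian 16-bit words, even len only).
 * With noipa, callers cannot be optimised based on what this body does,
 * so stores made to buf before the call have to stay in place. */
CSUM_NOIPA
static uint16_t csum_sketch(const void *buf, size_t len, uint32_t init)
{
	const uint8_t *p = buf;
	uint32_t sum = init;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Same pattern as the ndp.c snippet: a temporary value in the hop_limit
 * byte (offset 7) for the checksum, then the value that goes on the wire. */
static uint16_t fill_and_sum(uint8_t *hdr, size_t len)
{
	uint16_t csum;

	hdr[7] = 58;			/* IPPROTO_ICMPV6, for the checksum only */
	csum = csum_sketch(hdr, len, 0);
	hdr[7] = 255;			/* final hop limit */

	return csum;
}
```

The attribute costs inlining of the checksum routine into its callers, which is presumably why the workaround is limited to the affected compiler versions rather than applied unconditionally.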

Comment 6 Quan Wenli 2022-11-18 04:22:02 UTC
Verified with passt-0^20221104.ge308018-1.el9.x86_64: the test passed and the VM can get the IPv6 gateway.

on host: 
[root@dell-per440-18 ~]# rpm -qa |grep passt
passt-0^20221104.ge308018-1.el9.x86_64
[root@dell-per440-18 ~]# gcc --version
gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)

on VM:
[root@dell-per440-18 ~]# dhclient -6 eth0
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
[root@dell-per440-18 ~]# ip -j -6 ro sh|jq -rM '.[] | select(.dst == "default").gateway'
fe80::cee1:9402:8b35:be41

Set it to Verified.