Bug 2252550 - systemd-resolved DNS-over-TLS doesn't work (with /proc/sys/net/ipv4/tcp_fastopen = 0).
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-12-02 13:21 UTC by Maciej Żenczykowski
Modified: 2024-03-12 09:48 UTC
24 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description Maciej Żenczykowski 2023-12-02 13:21:44 UTC
This appears to be some sort of failure wrt. systemd-resolved's TCP fast open implementation.

I'm trying to debug utterly failing DNS resolution.

# resolvectl status
Global
         Protocols: -LLMNR -mDNS +DNSOverTLS DNSSEC=yes/supported
  resolv.conf mode: stub
Current DNS Server: 8.8.8.8#dns.google
       DNS Servers: 8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google

Link 2 (eth0)
    Current Scopes: none
         Protocols: -DefaultRoute -LLMNR -mDNS +DNSOverTLS DNSSEC=yes/supported


I tried to disable fastopen (as it looks to be misbehaving based on tcpdump):

[root@f38vm ~]# cat /proc/sys/net/ipv4/tcp_fastopen
1
[root@f38vm ~]# echo 0 > /proc/sys/net/ipv4/tcp_fastopen

and now I see the following from strace on the systemd-resolved process:

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 20
setsockopt(20, SOL_TCP, TCP_NODELAY, [1], 4) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 20, {events=EPOLLIN, data={u32=3532790272, u64=93990302128640}}) = 0
setsockopt(20, SOL_TCP, TCP_ULP, [7564404], 4) = -1 ENOENT (No such file or directory) // 7564404 == "tls\0"
getpid()                                = 1286
getpid()                                = 1286
getpid()                                = 1286
read(20, 0x557bd2934133, 5)             = -1 ENOTCONN (Transport endpoint is not connected)
sendmsg(20, {msg_name={sa_family=AF_INET, sin_port=htons(853), sin_addr=inet_addr("8.8.8.8")}, msg_)
// presumably kernel's tcp_sendmsg_fastopen() returns -EOPNOTSUPP -- userspace should fallback, but:
connect(20, 0x557bd2383f48, 0)          = -1 EINVAL (Invalid argument)
epoll_ctl(4, EPOLL_CTL_DEL, 20, NULL)   = 0
close(20)                               = 0

and tcpdump no longer shows any traffic.
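
The fallback that the comment in the trace alludes to can be sketched roughly as follows. This is a minimal illustration in Python, not systemd-resolved's actual code: `send_with_tfo_fallback` is a hypothetical helper, it assumes a blocking socket, and it assumes (as the trace comment presumes) that the kernel reports EOPNOTSUPP when client-side fast open is disabled. Unlike the connect() call in the trace, which was given an address length of 0, the sketch passes the full destination address.

```python
import errno
import socket

# Linux value of MSG_FASTOPEN; Python exposes it as socket.MSG_FASTOPEN on Linux.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)


def send_with_tfo_fallback(sock, data, addr):
    """Try to send `data` in the SYN; fall back to a plain connect().

    Hypothetical helper for illustration only. Returns "tfo" or "fallback"
    depending on which path was taken.
    """
    try:
        sent = sock.sendto(data, MSG_FASTOPEN, addr)
        path = "tfo"
    except OSError as exc:
        if exc.errno != errno.EOPNOTSUPP:
            raise
        # Fast open unavailable: do an ordinary three-way handshake,
        # passing the full destination address to connect().
        sock.connect(addr)
        sent = 0
        path = "fallback"
    # Send whatever the first call did not accept.
    if sent < len(data):
        sock.sendall(data[sent:])
    return path
```

With a non-blocking socket, as in the trace, the same idea applies, but the connect() would return EINPROGRESS and the data would be written once the socket becomes writable.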


Reproducible: Always

Comment 1 Maciej Żenczykowski 2023-12-02 16:13:58 UTC
btw. here's what things look like with tcp_fastopen = 1

08:09:15.400834 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [S], seq 548744628:548744942, win 32120, options [mss 1460,sackOK,TS val 3477859318 ecr 0,nop,wscale 7,tfo  cookie d2f9ee39dc952129,nop,nop], length 314
08:09:15.443714 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [S.], seq 466193132, ack 548744943, win 65535, options [mss 1220,sackOK,TS val 1350708716 ecr 3477859318,nop,wscale 8], length 0
08:09:15.443737 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [.], ack 1, win 251, options [nop,nop,TS val 3477859361 ecr 1350708716], length 0
08:09:15.470591 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [P.], seq 1:4440, ack 1, win 256, options [nop,nop,TS val 1350708733 ecr 3477859318], length 4439
08:09:15.470615 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 3477859388 ecr 1350708733], length 0
08:09:15.476528 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:81, ack 4440, win 249, options [nop,nop,TS val 3477859394 ecr 1350708733], length 80
08:09:15.478857 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 81:182, ack 4440, win 249, options [nop,nop,TS val 3477859396 ecr 1350708733], length 101
08:09:15.479763 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 182:283, ack 4440, win 249, options [nop,nop,TS val 3477859397 ecr 1350708733], length 101
08:09:15.573810 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 182:283, ack 4440, win 249, options [nop,nop,TS val 3477859491 ecr 1350708733], length 101
08:09:15.821777 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:283, ack 4440, win 249, options [nop,nop,TS val 3477859739 ecr 1350708733], length 282
08:09:15.821857 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [S.], seq 466193132, ack 548744943, win 65535, options [mss 1220,sackOK,TS val 1350709018 ecr 3477859318,nop,wscale 8], length 0
08:09:15.821877 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 3477859739 ecr 1350708733], length 0
08:09:16.309829 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:283, ack 4440, win 249, options [nop,nop,TS val 3477860227 ecr 1350708733], length 282
08:09:17.333589 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:283, ack 4440, win 249, options [nop,nop,TS val 3477861251 ecr 1350708733], length 282
08:09:17.871462 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [S.], seq 466193132, ack 548744943, win 65535, options [mss 1220,sackOK,TS val 1350711058 ecr 3477859318,nop,wscale 8], length 0
08:09:17.871522 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 3477861789 ecr 1350708733], length 0
08:09:19.317536 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:283, ack 4440, win 249, options [nop,nop,TS val 3477863235 ecr 1350708733], length 282
08:09:21.863237 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [S.], seq 466193132, ack 548744943, win 65535, options [mss 1220,sackOK,TS val 1350715090 ecr 3477859318,nop,wscale 8], length 0
08:09:21.863299 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 3477865780 ecr 1350708733], length 0
08:09:23.221582 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [P.], seq 1:283, ack 4440, win 249, options [nop,nop,TS val 3477867139 ecr 1350708733], length 282
08:09:25.556585 IP 8.8.4.4.853 > 192.168.10.2.43064: Flags [F.], seq 4440, ack 1, win 256, options [nop,nop,TS val 1350718723 ecr 3477859318], length 0
08:09:25.560972 IP 192.168.10.2.43064 > 8.8.4.4.853: Flags [FP.], seq 283:307, ack 4441, win 249, options [nop,nop,TS val 3477869478 ecr 1350718723], length 24

The above certainly doesn't look like a healthy TCP connection to me... note the SYN-ACK retransmits, which suggest the server never gets the ACK for it.

I'm guessing there's some firewall that gets utterly confused by the fastopen...
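
To spot this pattern without reading the capture by eye, something like the following could be run over tcpdump's text output. It is only a sketch: `retransmitted_synacks` is a made-up helper name, and the regular expression assumes the one-line summary format shown in the capture above.

```python
import re
from collections import Counter

# Matches enough of tcpdump's one-line TCP summary to pull out the
# endpoints, the flags, and the first sequence number.
_LINE = re.compile(
    r"IP6? (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
)


def retransmitted_synacks(lines):
    """Return {(src, dst, seq): count} for SYN-ACKs seen more than once."""
    seen = Counter()
    for line in lines:
        m = _LINE.search(line)
        if m and m.group("flags") == "S.":
            key = (m.group("src"), m.group("dst"), int(m.group("seq")))
            seen[key] += 1
    return {key: n for key, n in seen.items() if n > 1}
```

Fed the capture above, it would flag the SYN-ACK with seq 466193132, which the server sends four times.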

Comment 2 Maciej Żenczykowski 2023-12-02 16:26:28 UTC
For what it's worth, the Fedora 38 VM is:

# uname -r
6.6.2-101.fc38.x86_64

This is a VM on a Debian-ish host:
$ uname -r
6.5.6-1rodete4-amd64

behind an OpenWrt NAT64 router:
# uname -r
5.10.176

which itself is on a cellular link to an IPv6-only cellular provider (Orange PL).

I'm not sure where the fastopen is breaking...

Comment 3 Maciej Żenczykowski 2023-12-02 16:27:40 UTC
Oh, also interesting is that things seem to work for a little bit of time after rebooting the VM...

Comment 4 Maciej Żenczykowski 2023-12-02 16:59:57 UTC
I've upgraded the VM to Fedora 39 (6.6.2-201.fc39.x86_64) and the problem has *not* gone away.

DNS resolution works for a bit immediately after VM reboot, and breaks soon thereafter.
(I'm guessing the initial DNS-over-TLS connection happens without fastopen and thus succeeds.)

Comment 5 Maciej Żenczykowski 2023-12-02 17:14:36 UTC
This is indeed a problem caused by TCP fast open.

I can 'fix' it by dropping outbound TCP SYNs that carry the TCP Fast Open Cookie option (option 34).


Immediately after reboot DNS works:

# uname -r
6.6.2-201.fc39.x86_64

# ping -c 1 gmail.com
PING gmail.com (142.250.185.69) 56(84) bytes of data.
64 bytes from fra16s48-in-f5.1e100.net (142.250.185.69): icmp_seq=1 ttl=110 time=46.4 ms

(wait a minute or two, and DNS resolution starts failing)

# ping -c 1 youtube.com
^C

(install a blanket rule that drops outbound SYNs carrying the TCP Fast Open Cookie option;
I believe this should make the kernel fall back to a non-fast-open handshake on the 2nd SYN)

# iptables -t filter -A OUTPUT -p tcp --dport 853 --syn --tcp-option 34 -j DROP

(wait ~10 seconds for old in progress attempts to time out, and now DNS works)

# ping -c 1 youtube.com
PING youtube.com (142.250.186.46) 56(84) bytes of data.
64 bytes from fra24s04-in-f14.1e100.net (142.250.186.46): icmp_seq=1 ttl=110 time=39.4 ms

Comment 6 Maciej Żenczykowski 2023-12-02 17:17:27 UTC
My guess is that some firewall in my setup allows the outbound fast open SYN with payload but fails to keep track of the payload correctly.
It then allows the inbound SYN-ACK, but (presumably) fails to allow the outbound ACK, which is no longer at the expected spot in the sequence space (because the firewall missed the outbound SYN data bytes).

Comment 7 Maciej Żenczykowski 2023-12-02 17:34:46 UTC
tcpdump on the router:

root@mf286a:~# tcpdump -nn -i wwan0 ip6 and tcp and port 853
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wwan0, link-type RAW (Raw IP), capture size 262144 bytes
17:30:45.148024 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [S], seq 1782320173:1782320487, win 32120, options [mss 1400,sackOK,TS val 982463275 ecr 0,nop,wscale 7,tfo  cookie 2d5d3592b2d9fc8d,nop,nop], length 314
17:30:45.193124 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [S.], seq 2895784947, ack 1782320488, win 65535, options [mss 1220,sackOK,TS val 2260016671 ecr 982463275,nop,wscale 8], length 0
17:30:45.195901 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 1, win 251, options [nop,nop,TS val 982463323 ecr 2260016671], length 0
17:30:45.207128 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [.], seq 1:1209, ack 1, win 256, options [nop,nop,TS val 2260016689 ecr 982463275], length 1208
17:30:45.208154 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [.], seq 1209:2417, ack 1, win 256, options [nop,nop,TS val 2260016689 ecr 982463275], length 1208
17:30:45.208157 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [.], seq 2417:3625, ack 1, win 256, options [nop,nop,TS val 2260016689 ecr 982463275], length 1208
17:30:45.209014 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [P.], seq 3625:4440, ack 1, win 256, options [nop,nop,TS val 2260016689 ecr 982463275], length 815
17:30:45.209737 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 1209, win 249, options [nop,nop,TS val 982463337 ecr 2260016689], length 0
17:30:45.211921 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 982463339 ecr 2260016689], length 0
17:30:45.215645 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:81, ack 4440, win 249, options [nop,nop,TS val 982463343 ecr 2260016689], length 80
17:30:45.216111 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 81:172, ack 4440, win 249, options [nop,nop,TS val 982463343 ecr 2260016689], length 91
17:30:45.216200 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 172:263, ack 4440, win 249, options [nop,nop,TS val 982463343 ecr 2260016689], length 91
17:30:45.315991 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 172:263, ack 4440, win 249, options [nop,nop,TS val 982463443 ecr 2260016689], length 91
17:30:45.499106 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [S.], seq 2895784947, ack 1782320488, win 65535, options [mss 1220,sackOK,TS val 2260016975 ecr 982463275,nop,wscale 8], length 0
17:30:45.501271 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 982463628 ecr 2260016689], length 0
17:30:45.571668 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:263, ack 4440, win 249, options [nop,nop,TS val 982463699 ecr 2260016689], length 262
17:30:46.076266 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:263, ack 4440, win 249, options [nop,nop,TS val 982464203 ecr 2260016689], length 262
17:30:47.115842 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:263, ack 4440, win 249, options [nop,nop,TS val 982465243 ecr 2260016689], length 262
17:30:47.546016 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [S.], seq 2895784947, ack 1782320488, win 65535, options [mss 1220,sackOK,TS val 2260019023 ecr 982463275,nop,wscale 8], length 0
17:30:47.547546 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 982465675 ecr 2260016689], length 0
17:30:49.163699 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:263, ack 4440, win 249, options [nop,nop,TS val 982467291 ecr 2260016689], length 262
17:30:51.581058 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [S.], seq 2895784947, ack 1782320488, win 65535, options [mss 1220,sackOK,TS val 2260023055 ecr 982463275,nop,wscale 8], length 0
17:30:51.583231 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [.], ack 4440, win 249, options [nop,nop,TS val 982469711 ecr 2260016689], length 0
17:30:53.195605 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [P.], seq 1:263, ack 4440, win 249, options [nop,nop,TS val 982471323 ecr 2260016689], length 262
17:30:55.191067 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832: Flags [F.], seq 4440, ack 1, win 256, options [nop,nop,TS val 2260026676 ecr 982463275], length 0
17:30:55.195543 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.44832 > 64:ff9b::808:808.853: Flags [FP.], seq 263:287, ack 4441, win 249, options [nop,nop,TS val 982473323 ecr 2260026676], length 24
^C
26 packets captured
26 packets received by filter
0 packets dropped by kernel

This seems to suggest the problems are deeper in the network...
(i.e. perhaps a bad firewall config at the cellular ISP?)

Comment 8 Maciej Żenczykowski 2023-12-02 17:43:58 UTC
I wasn't certain whether the relative sequence numbers count the SYN data or not.
Based on the following, it looks like tcpdump ignores the SYN data when computing relative ACK numbers (seems like a bug in tcpdump...)

root@mf286a:~# tcpdump -nn -i wwan0 --absolute-tcp-sequence-numbers ip6 and tcp and port 853
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wwan0, link-type RAW (Raw IP), capture size 262144 bytes
17:39:51.367093 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [S], seq 3046078332:3046078646, win 32120, options [mss 1400,sackOK,TS val 983009491 ecr 0,nop,wscale 7,tfo  cookie 2d5d3592b2d9fc8d,nop,nop], length 314
17:39:51.403444 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [S.], seq 898413103, ack 3046078647, win 65535, options [mss 1220,sackOK,TS val 529949334 ecr 983009491,nop,wscale 8], length 0
17:39:51.405584 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898413104, win 251, options [nop,nop,TS val 983009529 ecr 529949334], length 0
17:39:51.568468 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [.], seq 898413104:898414312, ack 3046078647, win 256, options [nop,nop,TS val 529949499 ecr 983009491], length 1208
17:39:51.568470 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [.], seq 898414312:898415520, ack 3046078647, win 256, options [nop,nop,TS val 529949499 ecr 983009491], length 1208
17:39:51.569478 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [.], seq 898415520:898416728, ack 3046078647, win 256, options [nop,nop,TS val 529949499 ecr 983009491], length 1208
17:39:51.569480 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [P.], seq 898416728:898417542, ack 3046078647, win 256, options [nop,nop,TS val 529949499 ecr 983009491], length 814
17:39:51.571595 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898415520, win 249, options [nop,nop,TS val 983009695 ecr 529949499], length 0
17:39:51.572106 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898417542, win 249, options [nop,nop,TS val 983009696 ecr 529949499], length 0
17:39:51.575491 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078727, ack 898417542, win 249, options [nop,nop,TS val 983009699 ecr 529949499], length 80
17:39:51.575911 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078727:3046078818, ack 898417542, win 249, options [nop,nop,TS val 983009699 ecr 529949499], length 91
17:39:51.576273 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078818:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983009700 ecr 529949499], length 91
17:39:51.663058 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078818:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983009787 ecr 529949499], length 91
17:39:51.705444 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [S.], seq 898413103, ack 3046078647, win 65535, options [mss 1220,sackOK,TS val 529949636 ecr 983009491,nop,wscale 8], length 0
17:39:51.707741 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898417542, win 249, options [nop,nop,TS val 983009831 ecr 529949499], length 0
17:39:51.905000 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983010027 ecr 529949499], length 262
17:39:52.383061 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983010507 ecr 529949499], length 262
17:39:53.361414 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983011483 ecr 529949499], length 262
17:39:53.737449 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [S.], seq 898413103, ack 3046078647, win 65535, options [mss 1220,sackOK,TS val 529951668 ecr 983009491,nop,wscale 8], length 0
17:39:53.739432 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898417542, win 249, options [nop,nop,TS val 983011863 ecr 529949499], length 0
17:39:55.278951 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983013403 ecr 529949499], length 262
17:39:57.779481 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [S.], seq 898413103, ack 3046078647, win 65535, options [mss 1220,sackOK,TS val 529955700 ecr 983009491,nop,wscale 8], length 0
17:39:57.824496 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [.], ack 898417542, win 249, options [nop,nop,TS val 983015948 ecr 529949499], length 0
17:39:59.120688 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [P.], seq 3046078647:3046078909, ack 898417542, win 249, options [nop,nop,TS val 983017243 ecr 529949499], length 262
17:40:01.418499 IP6 64:ff9b::808:808.853 > 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048: Flags [F.], seq 898417542, ack 3046078647, win 256, options [nop,nop,TS val 529959349 ecr 983009491], length 0
17:40:01.421716 IP6 2a00:f41:58ee:f7da:62d2:4e12:7e76:441b.50048 > 64:ff9b::808:808.853: Flags [FP.], seq 3046078909:3046078933, ack 898417543, win 249, options [nop,nop,TS val 983019545 ecr 529959349], length 24
^C
26 packets captured
26 packets received by filter
0 packets dropped by kernel

This is looking like a bug in the Orange PL ISP's PLAT nat64 handling of tcp fast open enabled syn packets.
The SYN goes out, but I think all following outbound packets are dropped by the PLAT's NAT.
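
The sequence arithmetic behind this reading can be checked by hand. The snippet below is just a worked example using numbers from the absolute-sequence capture above; `expected_synack_ack` is an illustrative name, not anything from tcpdump or the kernel.

```python
def expected_synack_ack(isn, syn_payload_len):
    """ACK number a server sends when it accepts all data carried in the SYN.

    The SYN flag itself consumes one sequence number, and each payload
    byte consumes one more. Sequence numbers wrap at 2**32.
    """
    return (isn + 1 + syn_payload_len) % 2**32


# Values from the absolute-sequence capture above.
isn, syn_len = 3046078332, 314
ack = expected_synack_ack(isn, syn_len)
print(ack)        # 3046078647, the ack carried by every SYN-ACK
print(ack - isn)  # 315, offset of the first post-SYN byte from the ISN
```

So the server did acknowledge the SYN payload, and the client's first post-handshake segment correctly starts at 3046078647. In the relative capture in Comment 7, the equivalent position is printed as 1 rather than 315, which matches the observation that tcpdump leaves the SYN data out of its relative numbering.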

Comment 9 Maciej Żenczykowski 2023-12-02 17:58:06 UTC
(and the reason this is only affecting the VM: the VM only has IPv4 via NAT on the host, so *must* use the PLAT.  The host itself and other things in the network can use native IPv6 which presumably works fine)

Comment 10 Maciej Żenczykowski 2023-12-02 18:28:34 UTC
Anyway, I'll try to get Orange PL's PLAT fixed.

But that still leaves things to fix:

(a) echo 0 > /proc/sys/net/ipv4/tcp_fastopen

appears to break systemd-resolved's DNS-over-TLS.
I think this is a code bug: lack of a fallback path to 'normal' TCP.

Although I guess it could be argued it's a kernel bug with the fastopen APIs not working???

(b) the kernel fails to detect that tcp fastopen is busted and doesn't fall back to non fastopen mode
- this may be very very hard or even impossible to fix...
I think we'd likely need to retransmit the initial syn data payload (without the SYN flag)
in spite of the data having already been ACK-ed.  [and of course this is just a hypothesis,
it might not fix things...]

The only thing I can think of as a 'hacky workaround' would be to simply disable tcp fast open on some small fraction of tcp connections.
Or perhaps the 'retransmitted SYN-ACK reception' could serve as a signal that tcp fast open is not actually working???
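
To make the two workaround ideas concrete, here is a toy model of such a policy. Everything in it is hypothetical (the `TfoPolicy` class, its method names, the 5% default); it is not how the kernel or systemd-resolved behaves, only a sketch of the heuristic: skip fast open on a small random fraction of connections, and stop using it for a destination once a retransmitted SYN-ACK has been seen.

```python
import random


class TfoPolicy:
    """Toy per-destination fast open policy; all names are hypothetical."""

    def __init__(self, probe_fraction=0.05, rng=None):
        # Share of connections deliberately made without fast open.
        self.probe_fraction = probe_fraction
        self.rng = rng or random.Random()
        # Destinations where fast open looked broken.
        self.suspect = set()

    def use_fastopen(self, dest):
        if dest in self.suspect:
            return False
        # Occasionally skip fast open so a plain handshake is still exercised.
        return self.rng.random() >= self.probe_fraction

    def on_synack_retransmit(self, dest):
        # A repeated SYN-ACK suggests our ACK never reached the server.
        self.suspect.add(dest)
```

A real implementation would also need some way to expire entries in `suspect`, since the broken middlebox may only be on one of several paths.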

Comment 11 David Tardon 2024-03-12 09:48:12 UTC
(In reply to Maciej Żenczykowski from comment #10)
> But that still leaves things to fix:
> 
> (a) echo 0 > /proc/sys/net/ipv4/tcp_fastopen
> 
> appears to break systemd-resolved's dns-over-tls.
> I think this is a code bug: lack of a fall back path to 'normal' tcp.
> 
> Although I guess it could be argued it's a kernel bug with the fastopen apis
> not working???
> 
> (b) the kernel fails to detect that tcp fastopen is busted and doesn't fall
> back to non fastopen mode
> - this may be very very hard or even impossible to fix...
> I think we'd likely need to retransmit the initial syn data payload (without
> the SYN flag)
> in spite of the data having already been ACK-ed.  [and of course this is
> just a hypothesis,
> it might not fix things...]

-> moving to kernel

