Bug 2122641
| Summary: | conntrack -D with filters sometimes ignores the filters and deletes all connection state | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Bharath M V <bmv> | ||||
| Component: | libnetfilter_conntrack | Assignee: | Phil Sutter <psutter> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jiri Peska <jpeska> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 9.0 | CC: | alifshit, bdobreli, bmv, chrisw, dasmith, egarciar, egarver, eglynn, jeckersb, jhakimra, jpeska, kchamart, lmiccini, michele, pgm-rhel-tools, sbauza, scohen, sdodson, sgordon, shuali, skaplons, vromanso, yiche | ||||
| Target Milestone: | rc | Keywords: | AutomationBlocker, OtherQA, Regression, Triaged, ZStream | ||||
| Target Release: | 9.2 | Flags: | pm-rhel:
mirror+
|
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libnetfilter_conntrack-1.0.8-5.el9 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 2151586 2151587 (view as bug list) | Environment: | |||||
| Last Closed: | 2023-02-28 08:19:21 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2151586, 2151587 | ||||||
| Attachments: |
|
||||||
|
Comment 2
John Eckersberg
2022-08-30 19:05:29 UTC
After a few debugging sessions, we've finally narrowed this down. In short, something in the iptables_hybrid firewall driver seems to be flushing all of the conntrack state on the compute host, which causes the AMQP connections to temporarily drop. First, a recap of the relevant iptables rules: [root@compute-1 ~]# ip6tables -nL INPUT --line-number Chain INPUT (policy ACCEPT) num target prot opt source destination 1 neutron-openvswi-INPUT all ::/0 ::/0 2 ACCEPT all ::/0 ::/0 ctstate RELATED,ESTABLISHED /* 000 accept related established rules ipv6 */ 3 ACCEPT icmpv6 ::/0 ::/0 ctstate NEW /* 001 accept all ipv6-icmp ipv6 */ 4 ACCEPT all ::/0 ::/0 ctstate NEW /* 002 accept all to lo interface ipv6 */ 5 ACCEPT udp ::/0 fe80::/64 udp dpt:546 ctstate NEW /* 004 accept ipv6 dhcpv6 ipv6 */ 6 ACCEPT udp ::/0 ::/0 udp dpt:123 ctstate NEW /* 105 ntp ipv6 */ 7 ACCEPT tcp fd00:fd00:fd00:2000::/64 ::/0 tcp dpt:2022 ctstate NEW /* 113 nova_migration_target accept api subnet fd00:fd00:fd00:2000::/64 ipv6 */ 8 ACCEPT tcp fd00:fd00:fd00:2000::/64 ::/0 tcp dpt:2022 ctstate NEW /* 113 nova_migration_target accept libvirt subnet fd00:fd00:fd00:2000::/64 ipv6 */ 9 ACCEPT udp ::/0 ::/0 udp dpt:4789 ctstate NEW /* 118 neutron vxlan networks ipv6 */ 10 ACCEPT 47 ::/0 ::/0 /* 136 neutron gre networks ipv6 */ 11 ACCEPT tcp ::/0 ::/0 tcp dpts:5900:6923 ctstate NEW /* 200 nova_libvirt ipv6 */ 12 ACCEPT tcp ::/0 ::/0 tcp dpts:61152:61215 ctstate NEW /* 200 nova_libvirt ipv6 */ 13 ACCEPT tcp ::/0 ::/0 tcp dpt:16514 ctstate NEW /* 200 nova_libvirt ipv6 */ 14 LOG all ::/0 ::/0 ctstate NEW limit: avg 20/min burst 15 /* 998 log all ipv6 */ LOG flags 0 level 4 15 DROP all ::/0 ::/0 ctstate NEW /* 999 drop all ipv6 */ Key points being: - (rule 2) accepts packets from conntrack in ESTABLISHED - (rule 14) logs traffic that falls through and gets dropped - (rule 15) does the actual drop In the journal/dmesg output during VM deletion, we are falling through to rule 14 and seeing many (limited by the "burst 15") entries where amqp traffic is being dropped: [46506.310001] IN=vlan20 OUT= MAC=52:54:00:39:66:f7:52:54:00:13:5d:23:86:dd:60:06:2b:a0 SRC=fd00:fd00:fd00:2000:0000:0000:0000:02e4 DST=fd00:fd00:fd00:2000:0000:0000:0000:01fd LEN=2202 TC=0 HOPLIMIT=64 FLOWLBL=404384 PROTO=TCP SPT=5672 DPT=60130 WINDOW=501 RES=0x00 ACK PSH URGP=0 So we ran through the tempest test again, this time running `conntrack -E` to watch conntrack events in realtime. During VM deletion, we saw many lines like: [1662726775.884023] [DESTROY] tcp 6 CLOSE src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=40986 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=40986 [ASSURED] Key point being the [DESTROY] bit, where it looks like the entire conntrack table is being flushed. (It's not just amqp traffic; it looks like *all* traffic on the compute node is being flushed). Now we're in the situation that was described in comment #2, where the connection is stuck on retransmit. Here's a newer pcap of the same situation, except this one is long enough to see the connection recover. I want to speak to this one to explain how the connection eventually recovers: No. Time Source Destination Protocol Length Arrival Time Source Info 14439 779.837878 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd AMQP/JSON 3076 Sep 8, 2022 10:07:28.589464000 EDT Basic.Deliver x= rk=reply_2b379f50657b45be92eb29881f397fc5 Content-Header type=application/json Content-Body , JavaScript Object Notation (application/json) 14440 779.838160 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 TCP 92 Sep 8, 2022 10:07:28.589746000 EDT 33214 → 5672 [ACK] Seq=6259 Ack=438208 Win=24555 Len=0 TSval=2519032027 TSecr=1544072671 14441 779.838860 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 AMQP 113 Sep 8, 2022 10:07:28.590446000 EDT Basic.Ack 14443 779.839027 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 92 Sep 8, 2022 10:07:28.590613000 EDT 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=0 TSval=1544072672 TSecr=2519032027 14482 781.542455 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd AMQP 4376 Sep 8, 2022 10:07:30.294041000 EDT Basic.Deliver x= rk=reply_2b379f50657b45be92eb29881f397fc5 Content-Header type=application/json 14484 781.542458 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd AMQP/JSON 2341 Sep 8, 2022 10:07:30.294044000 EDT Content-Body , JavaScript Object Notation (application/json) 14486 781.557132 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 913 Sep 8, 2022 10:07:30.308718000 EDT [TCP Retransmission] 5672 → 33214 [PSH, ACK] Seq=443920 Ack=6280 Win=501 Len=821 TSval=1544074381 TSecr=2519032027 14488 781.751866 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:30.503452000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544074585 TSecr=2519032027 14502 782.159925 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:30.911511000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544074993 TSecr=2519032027 14519 782.975859 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:31.727445000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544075809 TSecr=2519032027 14559 784.639831 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:33.391417000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544077473 TSecr=2519032027 14689 787.903949 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:36.655535000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544080737 TSecr=2519032027 14817 794.623894 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:43.375480000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544087457 TSecr=2519032027 14901 807.935916 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:07:56.687502000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544100769 TSecr=2519032027 15054 834.047945 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:08:22.799531000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6280 Win=501 Len=1428 TSval=1544126881 TSecr=2519032027 15119 840.324325 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 TCP 92 Sep 8, 2022 10:08:29.075911000 EDT [TCP Keep-Alive] 33214 → 5672 [ACK] Seq=6279 Ack=438208 Win=24569 Len=0 TSval=2519092513 TSecr=1544072672 15122 840.324546 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 92 Sep 8, 2022 10:08:29.076132000 EDT [TCP Keep-Alive ACK] 5672 → 33214 [ACK] Seq=444741 Ack=6280 Win=501 Len=0 TSval=1544133158 TSecr=2519032027 15143 840.850599 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 AMQP 100 Sep 8, 2022 10:08:29.602185000 EDT Heartbeat 15145 840.850894 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 92 Sep 8, 2022 10:08:29.602480000 EDT 5672 → 33214 [ACK] Seq=444741 Ack=6288 Win=501 Len=0 TSval=1544133684 TSecr=2519093039 15673 886.784891 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 1520 Sep 8, 2022 10:09:15.536477000 EDT [TCP Retransmission] 5672 → 33214 [ACK] Seq=438208 Ack=6288 Win=501 Len=1428 TSval=1544179618 TSecr=2519093039 15674 886.825307 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 TCP 92 Sep 8, 2022 10:09:15.576893000 EDT 33214 → 5672 [ACK] Seq=6288 Ack=439636 Win=24569 Len=0 TSval=2519139014 TSecr=1544179618 15676 886.825573 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd AMQP/JSON/JSON 2948 Sep 8, 2022 10:09:15.577159000 EDT Basic.Deliver x= rk=reply_2b379f50657b45be92eb29881f397fc5 Content-Header type=application/json Content-Body , JavaScript Object Notation (application/json)Basic.Deliver x= rk=reply_2b379f50657b45be92eb29881f397fc5 Content-Header type=application/json Content-Body , JavaScript Object Notation (application/json)Basic.Deliver x= rk=reply_2b379f50657b45be92eb29881f397fc5 Content-Header type=application/json 15677 886.825630 fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 TCP 104 Sep 8, 2022 10:09:15.577216000 EDT [TCP Dup ACK 15674#1] 33214 → 5672 [ACK] Seq=6288 Ack=439636 Win=24569 Len=0 TSval=2519139014 TSecr=1544179618 SLE=444741 SRE=447597 15679 886.825839 fd00:fd00:fd00:2000::309 fd00:fd00:fd00:2000::1fd TCP 2948 Sep 8, 2022 10:09:15.577425000 EDT [TCP Out-Of-Order] 5672 → 33214 [PSH, ACK] Seq=439636 Ack=6288 Win=501 Len=2856 TSval=1544179659 TSecr=2519139014[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)] 15680 *REF* fd00:fd00:fd00:2000::1fd fd00:fd00:fd00:2000::309 TCP 104 Sep 8, 2022 10:09:15.577471000 EDT 33214 → 5672 [ACK] Seq=6288 Ack=442492 Win=24547 Len=0 TSval=2519139014 TSecr=1544179659 SLE=444741 SRE=447597 Packet 14440 is the last packet successfully received by compute before conntrack state gets lost (we know because 14441 is sent to ACK it). Two seconds later when 14482 is sent, there is no ACK. So we know that conntrack state is wiped between 10:07:28.589746000 and 10:07:30.294041000. As the packets are coming in from the controller, they no longer match the "ctstate ESTABLISHED" rule and are being dropped. There are many retransmits but they are all dropped, meanwhile the retransmit timer backs off and we see the packets with less frequency. The next key moment is in packets 15119 and 15122. The TCP keepalive timer is around 60s for these connections: # ss -antpo | grep 5672 | grep nova-compute ESTAB 8 0 [fd00:fd00:fd00:2000::1fd]:58204 [fd00:fd00:fd00:2000::309]:5672 users:(("nova-compute",pid=180855,fd=29)) timer:(keepalive,58sec,0) So once the keepalive timer expires and a keepalive packet is sent *outbound* from the compute node, *now* the connection is re-added to the conntrack table. The keepalive ACK from the controller node puts the connection back in ESTABLISHED state. Now the connection is back to "normal", although now we have to wait for the retransmit timer to expire and fire off the packet again. We finally see that happen in 15673, which flows properly and we see the ACK emitted immediately in 15674. At this point everything is fully recovered and normal operation resumes. Note however how long it takes to recover. The flow was broken at 10:07:30.294041000 and doesn't re-establish fully until 10:09:15.576893000. So almost two minutes waiting for various timers to expire. This is why setting rpc_response_timeout=120 is a valid workaround -- it just happens to be slightly longer than the delay created waiting for the combination of tcp keepalive timer + tcp retransmit timer. Now, this seems to be caused by something in (or related to) the iptables_hybrid firewall driver, because changing the config to: [securitygroup] firewall_driver=noop And restarting the neutron_ovs_agent container fixes the problem. The test runs successfully, there are no dropped packets logged by the kernel, and there are no conntrack rules DESTROYed during VM deletion. One minor correction, I pasted the wrong conntrack line above: [1662726775.884023] [DESTROY] tcp 6 CLOSE src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=40986 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=40986 [ASSURED] That one is actually valid, because the state is CLOSE. The ones during VM deletion look like: [DESTROY] tcp 6 431998 ESTABLISHED src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=38074 src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=38074 dport=5672 [ASSURED] Those are the valid, established ones that are being erroneously dropped. Another minor correction: Packet 14440 is the last packet successfully received by compute before conntrack state gets lost (we know because 14441 is sent to ACK it). Should instead be: Packet 14439 is the last packet successfully received by compute before conntrack state gets lost (we know because 14440 is sent to ACK it). Off-by-one errors :) I'm investigating this issue from the Neutron PoV but so far I have no idea how such conntrack entries can be deleted by neutron-ovs-agent and it's firewall driver. I will continue that investigation in next days. (In reply to Slawek Kaplonski from comment #13) > I'm investigating this issue from the Neutron PoV but so far I have no idea > how such conntrack entries can be deleted by neutron-ovs-agent and it's > firewall driver. I will continue that investigation in next days. I'm still looking at this too. (Side note, I think we might be using the same CI environment to debug because I saw tempest kick off earlier while I was poking at stuff. Hopefully I didn't step on your toes.) I'm growing suspicious that this might actually be a kernel/netfilter/conntrack bug. I need to do some more poking to gather better data and make sure I can reproduce the issue. The general idea is that the neutron firewall driver is invoking conntrack like: conntrack 577252 562070 0 /usr/sbin/conntrack -D -f ipv4 -d 192.168.9.170 -w 4161 conntrack 577253 562070 0 /usr/sbin/conntrack -D -f ipv4 -s 192.168.9.170 -w 4161 Which is coming from here: https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L136-L162 And this all looks fine to me. But for some reason, running those conntrack commands seems to (sometimes) invoke an internal gc which is erroneously timing out all of the connections on the host and deleting them. I should have some more details in a few hours. Well to give you an idea of how the day went, I just crashed the tmux server with *all* of the PIDONE sessions. So all of the stuff I was about to paste into here for an update is now mostly lost. I'll have to spend some time tomorrow recreating it. I spent some time tracing through the conntrack client code. The client itself doesn't appear to be doing anything too horrible. Although it is worth noting that a lot of the time it errors out part of the way through because it gets ENOENT when trying to delete an entry. I still think the problem is probably somewhere in the kernel. But that's a job for tomorrow. Thx John for help with it. I spent most of today also trying to understand how neutron can destroy those entries and I really don't think it's neutron. As You said, everything what we are doing is something like "/usr/sbin/conntrack -D -f ipv4 -s 192.168.9.170 -w 4161" so we always use filter per source or destination address, ethernet type and zone. If You would have any more info about it from the kernel side, I'm very interested to read about it :) Ok, I have a better grasp on this now. I'm still not sure exactly where the problem is in conntrack (the kernel or userspace parts), but it definitely looks like a problem in conntrack to me.
First, I'm able to reproduce 100% of the time by running this single tempest test:
(overcloud) [stack@undercloud-0 tempest-dir]$ tempest run -r neutron_plugin.tests.scenario.test_multicast.MulticastTestIPv4Common.test_flooding_when_special_groups
But before I'm running that test, I'm setting up a few things to run and collect data.
First, I've crafted a perf probe to log data about conntrack flows as they're being deleted in the kernel:
[root@compute-1 ~]# perf probe -m nf_conntrack 'nf_ct_delete l3num=ct->tuplehash[0].tuple.src.l3num protonum=ct->tuplehash[0].tuple.dst.protonum sport=ct->tuplehash[0].tuple.src.u.tcp.port dport=ct->tuplehash[0].tuple.dst.u.tcp.port'
Then, I'm recording against that probe and filtering out only those things that are (1) IPv4/IPv6, (2) TCP, and (3) TCP dst port of 5672:
[root@compute-1 ~]# perf record -e probe:nf_ct_delete --filter '(l3num == 2 || l3num == 10) && protonum == 6 && dport == 0x2816' -aRg sleep infinity
(0x2816 is just 5672 in network byte-order)
Next, I'm running execsnoop to capture the pid and cmdline for any invocation of the conntrack userspace binary:
[root@compute-1 ~]# /usr/share/bcc/tools/execsnoop -n conntrack
PCOMM PID PPID RET ARGS
conntrack 276741 921136 0 /usr/sbin/conntrack -D -f ipv4 -d 10.100.0.13 -w 4210
conntrack 276742 921136 0 /usr/sbin/conntrack -D -f ipv4 -s 10.100.0.13 -w 4210
And finally, I'm watching conntrack events, filtered down to just established AMQP connections:
[root@compute-1 ~]# conntrack -E -o timestamp,exSo, a few takeaways:tended -e DESTROY -p tcp --dport 5672 --state ESTABLISHED | tee conntrack.log
[1663171649.312111] [DESTROY] ipv6 10 tcp 6 431997 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=57846 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=57846 [ASSURED]
[1663171649.312902] [DESTROY] ipv6 10 tcp 6 431997 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=33840 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=33840 [ASSURED]
[1663171649.312912] [DESTROY] ipv6 10 tcp 6 431996 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=42928 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=42928 [ASSURED]
[1663171649.312925] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=47990 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=47990 [ASSURED]
[1663171649.312957] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=48002 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=48002 [ASSURED]
[1663171649.312973] [DESTROY] ipv6 10 tcp 6 431994 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=56890 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=56890 [ASSURED]
[1663171649.313005] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=56896 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=56896 [ASSURED]
[1663171649.313019] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=58472 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=58472 [ASSURED]
[1663171649.313108] [DESTROY] ipv6 10 tcp 6 296 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=53684 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=53684 [ASSURED]
[1663171649.313129] [DESTROY] ipv6 10 tcp 6 431996 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=42944 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=42944 [ASSURED]
[1663171649.313149] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=38678 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=38678 [ASSURED]
[1663171649.313179] [DESTROY] ipv6 10 tcp 6 431996 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=38682 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=38682 [ASSURED]
[1663171649.313204] [DESTROY] ipv6 10 tcp 6 431997 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::309 sport=51488 dport=5672 src=fd00:fd00:fd00:2000::309 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=51488 [ASSURED]
[1663171649.313230] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=56862 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=56862 [ASSURED]
[1663171649.313258] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=34048 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=34048 [ASSURED]
[1663171649.313280] [DESTROY] ipv6 10 tcp 6 431997 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=38686 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=38686 [ASSURED]
[1663171649.313297] [DESTROY] ipv6 10 tcp 6 431998 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=48244 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=48244 [ASSURED]
[1663171649.313315] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::2e4 sport=56878 dport=5672 src=fd00:fd00:fd00:2000::2e4 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=56878 [ASSURED]
[1663171649.313353] [DESTROY] ipv6 10 tcp 6 431999 ESTABLISHED src=fd00:fd00:fd00:2000::1fd dst=fd00:fd00:fd00:2000::115 sport=40188 dport=5672 src=fd00:fd00:fd00:2000::115 dst=fd00:fd00:fd00:2000::1fd sport=5672 dport=40188 [ASSURED]
Once we hit the point that those two conntrack commands run and all of the AMQP connections get killed, I stop the perf record, and here's the output:
conntrack 276742 [002] 547028.478906: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xf6e1 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479011: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x3084 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479023: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xb0a7 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479104: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x76bb dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479173: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x82bb dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479181: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x3ade dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479230: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x40de dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479238: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x68e4 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479451: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xb4d1 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479475: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xc0a7 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479493: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x1697 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479523: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x1a97 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479567: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x20c9 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479616: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x1ede dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479647: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x85 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479689: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x1e97 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479710: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x74bc dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479717: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0x2ede dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
conntrack 276742 [002] 547028.479836: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xfc9c dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7fce642e5c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
ed5c135c [unknown] ([unknown])
So it looks to me like conntrack is just getting very confused somewhere and deleting *everything*.
From the perf output, we can see that the entries are being deleted by the second invocation of conntrack which was pid 276742:
conntrack 276742 [002] 547028.478906: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xf6e1 dport=0x2816
which was executed as:
conntrack 276742 921136 0 /usr/sbin/conntrack -D -f ipv4 -s 10.100.0.13 -w 4210
The entries that are getting deleted do not match either the requested family (ipv4) or the requested src address (10.100.0.13).
My next step is to try and figure out how to make perf spit out the userspace part of the backtrace. So instead of:
ed5c135c [unknown] ([unknown])
we can get an idea of how we're getting to the delete on the client side.
I re-ran the perf record with --call-graph dwarf and that fixed the userspace issue. Here's the first entry, they all look similar so I won't paste everything again:
conntrack 285695 [004] 549219.462527: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xf6e1 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7ffa5ca42c8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
7ffa5cafab16 nfnl_send+0x26 (/usr/lib64/libnfnetlink.so.0.2.0)
7ffa5cafc9e6 nfnl_query+0x16 (/usr/lib64/libnfnetlink.so.0.2.0)
7ffa5cb19fa1 nfct_query+0x71 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
55c9e4950c33 delete_cb.part.0+0x33 (/usr/sbin/conntrack)
55c9e4950ceb delete_cb+0x1b (/usr/sbin/conntrack)
7ffa5cb1b4e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7ffa5cafbe33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7ffa5cafc5df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7ffa5cafc97d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
Also relevant, since I haven't included this yet: [root@compute-1 ~]# uname -r 5.14.0-70.22.1.el9_0.x86_64 [root@compute-1 ~]# rpm -q conntrack-tools libnetfilter_conntrack conntrack-tools-1.4.5-10.el9_0.x86_64 libnetfilter_conntrack-1.0.8-4.el9.x86_64 [root@compute-1 ~]# Updating the summary and moving this to RHEL/conntrack-tools for feedback. Please let me know anything I can do to help move this along! It's just a shot in the dark, but we had a memory corruption in Bug 2067272 and it is not fixed in RHEL9.0 yet. So in case someone is able to, I suggest testing with the current version of conntrack-tools from RHEL9.1: http://download.eng.bos.redhat.com/brewroot/vol/rhel-9/packages/conntrack-tools/1.4.5/15.el9/x86_64/conntrack-tools-1.4.5-15.el9.x86_64.rpm I just had a debugging session with Bharath, but it seems he installed the RPM in the wrong system - at least I don't see it ever being called with '-D' during test execution. I see the updated package was installed on the hypervisor, but neutron invokes the conntrack binary from inside of the neutron_ovs_agent container. I think that's where the confusion was. In any case, I updated to conntrack-tools-1.4.5-15.el9.x86_64.rpm inside of the container, and re-ran the test. I still see the same result and the same backtrace in perf, so the 9.1 build does not appear to fix this. If I get some time after lunch today I'll try to poke a bit more at the userspace side of things and see if I can figure out why it seems to think those entries should get deleted. EOD progress on tracing into the userspace bits:
I added some uprobes to get some insight into the filtering being done in delete_cb:
(Need to attach to the binaries *inside* of the container)
[root@compute-1 ~]# ls -l libnetfilter_conntrack.so.3 neutron-conntrack
lrwxrwxrwx. 1 root root 141 Sep 15 18:19 libnetfilter_conntrack.so.3 -> /var/lib/containers/storage/overlay/ee574134e021645256d21f2dc0d58ad7a233e3f44fcfc028a2af4016f3c85503/merged/lib64/libnetfilter_conntrack.so.3
lrwxrwxrwx. 1 root root 126 Sep 12 20:28 neutron-conntrack -> /var/lib/containers/storage/overlay/ee574134e021645256d21f2dc0d58ad7a233e3f44fcfc028a2af4016f3c85503/merged/usr/sbin/conntrack
perf probe -x neutron-conntrack 'nfct_filter ct1=obj ct2=ct'
perf probe -x neutron-conntrack 'nfct_filter%return $retval'
perf probe -x libnetfilter_conntrack.so.3 'nfct_cmp ct1=ct1 ct1_l3num=ct1->head.orig.l3protonum ct1_protonum=ct1->head.orig.protonum ct1_zone=ct1->head.orig.zone ct1_sport=ct1->head.orig.l4src.tcp.port ct1_dport=ct1->head.orig.l4dst.tcp.port ct2=ct2 ct2_l3num=ct2->head.orig.l3protonum ct2_protonum=ct2->head.orig.protonum ct2_zone=ct2->head.orig.zone ct2_sport=ct2->head.orig.l4src.tcp.port ct2_dport=ct2->head.orig.l4dst.tcp.port'
perf probe -x libnetfilter_conntrack.so.3 'nfct_cmp%return $retval'
And then recorded with:
perf record --call-graph dwarf -e probe_libnetfilter_conntrack:nfct_cmp -e probe_libnetfilter_conntrack:nfct_cmp__return -e probe_conntrack:nfct_filter -e probe_conntrack:nfct_filter__return -e probe:nf_ct_delete --filter '(l3num == 2 || l3num == 10) && protonum == 6 && dport == 0x2816' -aRg sleep infinity
Here's what I'm noticing. I'm going to focus on a particular ct with src port 0xf6e1. The two conntrack processes are:
[root@compute-1 ~]# /usr/share/bcc/tools/execsnoop -n conntrack
PCOMM PID PPID RET ARGS
conntrack 682218 921136 0 /usr/sbin/conntrack -D -f ipv4 -d 10.100.0.29 -w 4225
conntrack 682219 921136 0 /usr/sbin/conntrack -D -f ipv4 -s 10.100.0.29 -w 4225
The first one (682218) sees the connection and skips over it as expected (nfct_cmp returns 0, and then nfct_filter inverts it to return 1, which causes delete_cb to return early with NFCT_CB_CONTINUE):
conntrack 682218 [004] 648877.503401: probe_conntrack:nfct_filter: (55c20969ea90) ct1=0x55c20ade22a0 ct2=0x55c20ade3a10
55c20969ea90 nfct_filter+0x0 (/usr/sbin/conntrack)
55c2096a1ccf delete_cb+0xf (/usr/sbin/conntrack)
7fab204004e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7fab203e0e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e15df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e197d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
conntrack 682218 [004] 648877.503408: probe_libnetfilter_conntrack:nfct_cmp: (7fab203fe770) ct1=0x55c20ade22a0 ct1_l3num=0x2 ct1_protonum=0x0 ct1_zone=0x0 ct1_sport=0x0 ct1_dport=0x0 ct2=0x55c20ade3a10 ct2_l3num=0xa ct2_protonum=0x6 ct2_zone=0x0 ct2_sport=0xf6e1 ct2_dport=0x2816
7fab203fe770 nfct_cmp+0x0 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
55c20969eafe nfct_filter+0x6e (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682218 [004] 648877.503420: probe_libnetfilter_conntrack:nfct_cmp__return: (7fab203fe770 <- 55c20969eaff) arg1=0x0
55c20969eaff nfct_filter+0x6f (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682218 [004] 648877.503437: probe_conntrack:nfct_filter__return: (55c20969ea90 <- 55c2096a1cd0) arg1=0x1
55c2096a1cd0 delete_cb+0x10 (/usr/sbin/conntrack)
7fab204004e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7fab203e0e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e15df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e197d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
The second conntrack process (682219) does the same (I left some lines in from 682218, just to note that the processes are deleting in parallel. So possibility of a race somewhere, but I don't think so because of what follows at the end):
conntrack 682219 [001] 648877.508014: probe_conntrack:nfct_filter: (56141456ba90) ct1=0x5614163982a0 ct2=0x56141639aa20
56141456ba90 nfct_filter+0x0 (/usr/sbin/conntrack)
56141456eccf delete_cb+0xf (/usr/sbin/conntrack)
7f89c4ea74e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7f89c4e87e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e885df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e8897d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
conntrack 682218 [004] 648877.508019: probe_libnetfilter_conntrack:nfct_cmp__return: (7fab203fe770 <- 55c20969eaff) arg1=0x1
55c20969eaff nfct_filter+0x6f (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682219 [001] 648877.508020: probe_libnetfilter_conntrack:nfct_cmp: (7f89c4ea5770) ct1=0x5614163982a0 ct1_l3num=0x2 ct1_protonum=0x0 ct1_zone=0x0 ct1_sport=0x0 ct1_dport=0x0 ct2=0x56141639aa20 ct2_l3num=0xa ct2_protonum=0x6 ct2_zone=0x0 ct2_sport=0xf6e1 ct2_dport=0x2816
7f89c4ea5770 nfct_cmp+0x0 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
56141456bafe nfct_filter+0x6e (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682218 [004] 648877.508024: probe_conntrack:nfct_filter__return: (55c20969ea90 <- 55c2096a1cd0) arg1=0x0
55c2096a1cd0 delete_cb+0x10 (/usr/sbin/conntrack)
7fab204004e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7fab203e0e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e15df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7fab203e197d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
conntrack 682219 [001] 648877.508028: probe_libnetfilter_conntrack:nfct_cmp__return: (7f89c4ea5770 <- 56141456baff) arg1=0x0
56141456baff nfct_filter+0x6f (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682219 [001] 648877.508033: probe_conntrack:nfct_filter__return: (56141456ba90 <- 56141456ecd0) arg1=0x1
56141456ecd0 delete_cb+0x10 (/usr/sbin/conntrack)
7f89c4ea74e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7f89c4e87e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e885df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e8897d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
Thus far this looks perfectly reasonable. Here's where it goes off the rails:
conntrack 682219 [001] 648877.508140: probe_conntrack:nfct_filter: (56141456ba90) ct1=0x5614163982a0 ct2=0x56141639aa20
56141456ba90 nfct_filter+0x0 (/usr/sbin/conntrack)
56141456eccf delete_cb+0xf (/usr/sbin/conntrack)
7f89c4ea74e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7f89c4e87e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e885df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e8897d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
conntrack 682219 [001] 648877.508146: probe_libnetfilter_conntrack:nfct_cmp: (7f89c4ea5770) ct1=0x5614163982a0 ct1_l3num=0x2 ct1_protonum=0x0 ct1_zone=0x0 ct1_sport=0x0 ct1_dport=0x0 ct2=0x56141639aa20 ct2_l3num=0x2 ct2_protonum=0x2 ct2_zone=0x0 ct2_sport=0x0 ct2_dport=0x0
7f89c4ea5770 nfct_cmp+0x0 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
56141456bafe nfct_filter+0x6e (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
Note that ct2_protonum=0x2, which is IGMP. This would explain why only this particular test suite triggers this issue -- it's doing multicast testing. Continuing:
conntrack 682219 [001] 648877.508154: probe_libnetfilter_conntrack:nfct_cmp__return: (7f89c4ea5770 <- 56141456baff) arg1=0x1
56141456baff nfct_filter+0x6f (/usr/sbin/conntrack)
7fffffffdfff [unknown] ([unknown])
conntrack 682219 [001] 648877.508157: probe_conntrack:nfct_filter__return: (56141456ba90 <- 56141456ecd0) arg1=0x0
56141456ecd0 delete_cb+0x10 (/usr/sbin/conntrack)
7f89c4ea74e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7f89c4e87e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e885df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e8897d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
So in this case, this multicast ct *does* match (nfct_cmp returns 1, nfct_returns 0), which means delete_cb will process and delete this entry.
While deleting that ct, we see our connection from src port 0xf6e1 get deleted as unintentional collateral damage:
conntrack 682219 [001] 648877.508208: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xf6e1 dport=0x2816
ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz)
ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink])
ffffffffc06ed70e nfnetlink_rcv_msg+0x2ae ([nfnetlink])
ffffffffb5ad6240 netlink_rcv_skb+0x50 ([kernel.kallsyms])
ffffffffb5ad58c7 netlink_unicast+0x237 ([kernel.kallsyms])
ffffffffb5ad5c1b netlink_sendmsg+0x23b ([kernel.kallsyms])
ffffffffb5a2c9fe sock_sendmsg+0x5e ([kernel.kallsyms])
ffffffffb5a2e570 __sys_sendto+0xf0 ([kernel.kallsyms])
ffffffffb5a2e600 __x64_sys_sendto+0x20 ([kernel.kallsyms])
ffffffffb5c7bb2b do_syscall_64+0x3b ([kernel.kallsyms])
ffffffffb5e0007c entry_SYSCALL_64+0x7c ([kernel.kallsyms])
7f89c4dcec8a __libc_sendto+0x1a (/usr/lib64/libc.so.6)
7f89c4e86b16 nfnl_send+0x26 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e889e6 nfnl_query+0x16 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4ea5fa1 nfct_query+0x71 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
56141456ec23 delete_cb.part.0+0x33 (/usr/sbin/conntrack)
56141456ecdb delete_cb+0x1b (/usr/sbin/conntrack)
7f89c4ea74e3 __callback+0x133 (/usr/lib64/libnetfilter_conntrack.so.3.7.0)
7f89c4e87e33 nfnl_step+0x1a3 (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e885df nfnl_process+0x4f (/usr/lib64/libnfnetlink.so.0.2.0)
7f89c4e8897d nfnl_catch+0xbd (/usr/lib64/libnfnetlink.so.0.2.0)
And within that same invocation of delete_cb I can see it iterating through and deleting *all* of the AMQP connections (and presumably the non-AMQP connections as well) stemming from that one IGMP ct.
I'll attach the full perf output to the bug so we have the full context saved.
Created attachment 1912233 [details]
perf script output with IGMP flow
(In reply to John Eckersberg from comment #24) > EOD progress on tracing into the userspace bits: Thanks for the detailed analysis, John! [...] > While deleting that ct, we see our connection from src port 0xf6e1 get > deleted as unintentional collateral damage: > > conntrack 682219 [001] 648877.508208: probe:nf_ct_delete: (ffffffffc07471c0) l3num=0xa protonum=0x6 sport=0xf6e1 dport=0x2816 > ffffffffc07471c1 nf_ct_delete+0x1 (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz) > ffffffffc0747d1a nf_ct_iterate_cleanup+0x13a (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz) > ffffffffc0747dec nf_ct_iterate_cleanup_net+0x7c (/lib/modules/5.14.0-70.22.1.el9_0.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz) > ffffffffc0aab16b ctnetlink_del_conntrack+0x1db ([nf_conntrack_netlink]) The above shows that ctnetlink_del_conntrack() calls ctnetlink_flush_conntrack() (presumably inlined) which contains the call to nf_ct_iterate_cleanup_net(). So in the IPCTNL_MSG_CT_DELETE netlink request, neither CTA_TUPLE_ORIG nor CTA_TUPLE_REPLY are set. I guess conntrack has problems building a suitable message from the given IGMP entry. I think this behaviour is caused by a bug in libnetfiter_conntrack which aborted building the nested CTA_TUPLE_(ORIG|REPLY) building if an unsupported layer4 protocol was found. IIUC, this leads to submitting a blank delete request to the kernel. For verification, I backported libnetfilter_conntrack commit 45ec4b51e8290 ("conntrack: don't cancel nest on unknown layer 4 protocols") and created a scratch build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47792234 Could you please give it a try? Thanks, Phil I ran the test suite 3x with the scratch build installed, and can confirm that that patch fixes the problem. Thanks for your help tracking this down! (In reply to John Eckersberg from comment #27) > I ran the test suite 3x with the scratch build installed, and can confirm > that that patch fixes the problem. > > Thanks for your help tracking this down! Perfect, thanks for verifying! Hereby fixing the ticket's component and setting the target release to RHEL9.2 - the fix should land in RHEL9.1 automatically. Also promoting for RHEL9.0.z, the fix is trivial with high impact. *** Bug 2151586 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libnetfilter_conntrack bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:0948 |