Bug 2160403
| Summary: | OVN is not respecting the ct zone request from ovnk | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Surya Seetharaman <surya> | ||||
| Component: | ovn22.12 | Assignee: | Mark Michelson <mmichels> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | Jianlin Shi <jishi> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | FDP 22.L | CC: | ctrautma, dceara, jiji, mmichels | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | ovn22.12-22.12.0-16.el8fdp | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2025-02-10 04:01:38 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Surya Seetharaman
2023-01-12 09:54:01 UTC
Created attachment 1937555 [details]
NBDB
Created attachment 1937556 [details]
OVSDB from wrong node
Attaching gcore:

root@ovn-control-plane:/# gcore -o /tmp/ovn-controller.live.core 1866
[New LWP 1867]
[New LWP 1868]
[New LWP 1870]
warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib64/libc.so.6.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007fbd4e48390f in poll () from target:/lib64/libc.so.6
warning: target file /proc/1866/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile /tmp/ovn-controller.live.core.1866
[Inferior 1 (process 1866) detached]

sh-5.2# ovs-appctl -t ovn-controller ct-zone-list
8bb103d2-854c-420a-87cd-7525751740fd_dnat 1
8bb103d2-854c-420a-87cd-7525751740fd_snat 2
01e8a93a-ff57-4d6a-8390-2140c5fe9b50_dnat 3
3acc2a68-4767-4b74-adec-5648d4889d3f_dnat 8
f12103c8-3115-4c2e-8e09-d6677c7296bf_snat 4
01e8a93a-ff57-4d6a-8390-2140c5fe9b50_snat 6
3acc2a68-4767-4b74-adec-5648d4889d3f_snat 9
7c11dfeb-393a-4020-8861-00718efd1a9a_dnat 10
7c11dfeb-393a-4020-8861-00718efd1a9a_snat 11
kube-system_coredns-6d4b75cb6d-lptr5 14
f12103c8-3115-4c2e-8e09-d6677c7296bf_dnat 5
k8s-ovn-control-plane 7
local-path-storage_local-path-provisioner-9cd9bd544-k2d22 13
kube-system_coredns-6d4b75cb6d-tm4ng 12
sh-5.2# ovn-appctl -t ovn-controller recompute
sh-5.2# ovs-appctl -t ovn-controller ct-zone-list
8bb103d2-854c-420a-87cd-7525751740fd_dnat 1
8bb103d2-854c-420a-87cd-7525751740fd_snat 2
01e8a93a-ff57-4d6a-8390-2140c5fe9b50_dnat 3
3acc2a68-4767-4b74-adec-5648d4889d3f_dnat 8
f12103c8-3115-4c2e-8e09-d6677c7296bf_snat 4
01e8a93a-ff57-4d6a-8390-2140c5fe9b50_snat 6
3acc2a68-4767-4b74-adec-5648d4889d3f_snat 0
7c11dfeb-393a-4020-8861-00718efd1a9a_dnat 10
7c11dfeb-393a-4020-8861-00718efd1a9a_snat 11
kube-system_coredns-6d4b75cb6d-lptr5 14
f12103c8-3115-4c2e-8e09-d6677c7296bf_dnat 5
k8s-ovn-control-plane 7
local-path-storage_local-path-provisioner-9cd9bd544-k2d22 13
kube-system_coredns-6d4b75cb6d-tm4ng 12

This was spotted with ovn-22.12.0-0.fc36.x86_64 (in kind). Debug symbols from this version are needed to decode the live core.

I have figured out the issue and reproduced it locally. As strange as it may seem, the bug is in the recompute code of CT zones and not the incremental code. The issue only manifests if the logical router and all of its settings are created in a single transaction, and then no further changes are made to the logical router.

The issue is that the requested SNAT zone only takes effect if there is already an assigned CT zone for the logical router. When everything is created in one transaction, then the logical router has no existing CT zone assigned to it, so the requested zone is ignored. Since no further changes are made to the logical router, the incremental code cannot detect the mismatch between the assigned and requested zone and correct the issue.

I have a fix in the works that corrects the issue in a sandbox. What I need to do is to write a formal test that proves it before I submit a patch to the mailing list.

I have posted a patch to fix this: https://patchwork.ozlabs.org/project/ovn/patch/20230118133113.1253910-1-mmichels@redhat.com/

ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162817
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162818
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162819
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162820
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162821
ovn22.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162822
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162823

reproducer:
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
ovn-nbctl --wait=hv sync
ovn-sbctl list datapath_binding
ovn-nbctl lr-add lr0 -- set Logical_Router lr0 options:snat-ct-zone=666 -- lrp-add lr0 lrp-gw 01:00:00:00:00:01 172.16.0.1 -- lrp-set-gateway-chassis lrp-gw hv1
ovn-nbctl --wait=hv sync
lr_uuid=$(ovn-sbctl find datapath_bind external_ids:name=lr0 | awk '/_uuid/{print $3}')
ct_zones=$(ovn-appctl -t ovn-controller ct-zone-list)
zone_num=$(printf '%s\n' "$ct_zones" | grep "${lr_uuid}_snat" | cut -d ' ' -f 2)
test "$zone_num" -eq 666
echo $?
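The final check of the reproducer can also be wrapped in a small helper so that a missing entry is reported separately from a wrong zone number. This is only a sketch: the function names snat_zone_of and check_snat_zone are hypothetical and not part of OVN, and it assumes ct-zone-list prints one "<name> <zone>" pair per line, as in the output shown in this report.

```shell
# Hypothetical helper (not part of OVN): print the SNAT CT zone recorded
# for a datapath UUID. Reads `ct-zone-list` output on stdin and returns
# non-zero when no "<uuid>_snat" entry is present.
snat_zone_of() {
    awk -v key="${1}_snat" '$1 == key { print $2; found = 1 } END { exit !found }'
}

# check_snat_zone UUID EXPECTED
# Reads `ct-zone-list` output on stdin. Returns 0 if the SNAT zone equals
# EXPECTED, 1 if it differs, 2 if the datapath has no SNAT entry at all.
check_snat_zone() {
    zone=$(snat_zone_of "$1") || {
        echo "no SNAT zone listed for $1" >&2
        return 2
    }
    [ "$zone" -eq "$2" ]
}

# Intended use against a live ovn-controller (needs the setup above):
#   ovn-appctl -t ovn-controller ct-zone-list | check_snat_zone "$lr_uuid" 666
```

Matching the first field exactly, instead of using grep, avoids accidental substring matches and does not depend on the entries being listed in any particular order.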
reproduced on ovn22.12-22.12.0-4.el8:
[root@wsfd-advnetlab16 bz2160403]# rpm -qa | grep -E "ovn22.12|openvswitch2.17"
python3-openvswitch2.17-2.17.0-60.el8fdp.x86_64
ovn22.12-host-22.12.0-4.el8fdp.x86_64
ovn22.12-22.12.0-4.el8fdp.x86_64
openvswitch2.17-2.17.0-60.el8fdp.x86_64
ovn22.12-central-22.12.0-4.el8fdp.x86_64
+ ovn-nbctl --wait=hv sync
++ ovn-sbctl find datapath_bind external_ids:name=lr0
++ awk '/_uuid/{print $3}'
+ lr_uuid=66e6a0f9-826d-49dd-a404-61ed463f7114
++ ovn-appctl -t ovn-controller ct-zone-list
+ ct_zones='66e6a0f9-826d-49dd-a404-61ed463f7114_snat 1
66e6a0f9-826d-49dd-a404-61ed463f7114_dnat 2'
++ printf '66e6a0f9-826d-49dd-a404-61ed463f7114_snat 1
66e6a0f9-826d-49dd-a404-61ed463f7114_dnat 2'
++ cut -d ' ' -f 2
++ grep 66e6a0f9-826d-49dd-a404-61ed463f7114_snat
+ zone_num=1
+ test 1 -eq 666
<=== snat ct zone is not the snat-ct-zone
+ echo 1
1
Verified on ovn22.12-22.12.0-20.el8:
[root@wsfd-advnetlab16 bz2160403]# rpm -qa | grep -E "ovn22.12|openvswitch2.17"
python3-openvswitch2.17-2.17.0-60.el8fdp.x86_64
ovn22.12-central-22.12.0-20.el8fdp.x86_64
openvswitch2.17-2.17.0-60.el8fdp.x86_64
ovn22.12-host-22.12.0-20.el8fdp.x86_64
ovn22.12-22.12.0-20.el8fdp.x86_64
+ ovn-nbctl --wait=hv sync
++ ovn-sbctl find datapath_bind external_ids:name=lr0
++ awk '/_uuid/{print $3}'
+ lr_uuid=db0defa1-ff88-4252-be9a-afba5261fe70
++ ovn-appctl -t ovn-controller ct-zone-list
+ ct_zones='db0defa1-ff88-4252-be9a-afba5261fe70_dnat 1
db0defa1-ff88-4252-be9a-afba5261fe70_snat 666'
++ printf 'db0defa1-ff88-4252-be9a-afba5261fe70_dnat 1
db0defa1-ff88-4252-be9a-afba5261fe70_snat 666'
++ grep db0defa1-ff88-4252-be9a-afba5261fe70_snat
++ cut -d ' ' -f 2
+ zone_num=666
+ test 666 -eq 666
<=== snat ct zone is the snat-ct-zone
+ echo 0
0
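The root cause described in the comments above (the requested SNAT zone only takes effect when the logical router already has an assigned CT zone) can be illustrated with a toy model. This is not OVN source code and not the posted patch: the function names pick_zone_buggy and pick_zone_fixed are hypothetical, and the "fixed" variant simply assumes that the corrected behaviour honours the request on first assignment, which is consistent with the verified output above.

```shell
# Toy model only; not OVN source code.
# Arguments for both functions: REQUESTED CURRENT NEXT_FREE
#   REQUESTED  zone asked for via options:snat-ct-zone ("" if none)
#   CURRENT    zone already assigned to the router ("" if none)
#   NEXT_FREE  zone the allocator would hand out by default
# The function bodies use ( ... ) so they run in a subshell and their
# variables do not leak into the caller.

# Behaviour as described in the report: the request is consulted only
# when a zone is already assigned.
pick_zone_buggy() (
    requested=$1 current=$2 next_free=$3
    if [ -n "$current" ]; then
        if [ -n "$requested" ]; then
            echo "$requested"
        else
            echo "$current"
        fi
    else
        # First assignment: the requested zone is ignored.
        echo "$next_free"
    fi
)

# Assumed corrected behaviour: the request wins whenever it is set.
pick_zone_fixed() (
    requested=$1 current=$2 next_free=$3
    if [ -n "$requested" ]; then
        echo "$requested"
    elif [ -n "$current" ]; then
        echo "$current"
    else
        echo "$next_free"
    fi
)

# Router created in a single transaction: no zone assigned yet.
pick_zone_buggy 666 "" 1
pick_zone_fixed 666 "" 1
```

In this model the single-transaction case is the only one where the two variants disagree, which matches the observation that the problem does not show up once the router already has a zone and is modified later.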
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.