Bug 2150905
| Summary: | ovn-controller thinks SB is read-only when it's not | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dan Williams <dcbw> |
| Component: | ovn22.09 | Assignee: | xsimonar |
| Status: | VERIFIED --- | QA Contact: | Ehsan Elahi <eelahi> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | FDP 22.L | CC: | ctrautma, eelahi, jiji, jishi, mmichels, xsimonar |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn22.09-22.09.0-91.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Dan Williams
2022-12-05 15:38:27 UTC
Is ovn-controller maybe not claiming the port binding request from ovnkube-node at 05:54:08 because of the sb_readonly=1 at that time? I think that the issue is with releasing the port when sb is read only. In that case, ovn-controller might delete its internal binding for this port before sb becomes writable. When sb becomes writable, it's too late: the binding for this port is gone in ovn-controller, and ovn-controller never updates sb (it never removes chassis from port_binding in sb). Then, next time this port is claimed, ovn-controller sees the port as already claimed, does not log the claim, and does not create the interface in if-status (causing next WARN when releasing) I'll work on a fix. Seeing this in a few more CI tests in OCP 4.13: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1449/pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn/1603172544557355008/artifacts/e2e-gcp-ovn/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-572nb_ovn-controller.log for port e2e-statefulset-7013_ss2-0 2022-12-15T00:43:24.001Z|01585|binding|INFO|Claiming lport e2e-statefulset-7013_ss2-0 for this chassis. 2022-12-15T00:43:24.001Z|01586|binding|INFO|e2e-statefulset-7013_ss2-0: Claiming 0a:58:0a:83:01:68 10.131.1.104 2022-12-15T00:43:24.140Z|01587|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 ovn-installed in OVS 2022-12-15T00:43:24.140Z|01588|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 up in Southbound 2022-12-15T00:43:40.534Z|01659|binding|INFO|Mismatch iface-id-ver for lport e2e-statefulset-7013_ss2-0, expected 6e14a1af-8dd4-4656-ad2c-9b145386ed0d, found ddf315b4-06c9-459b-a84f-6fe75194c0c5 2022-12-15T00:43:40.534Z|01660|binding|INFO|Releasing lport e2e-statefulset-7013_ss2-0 from this chassis (sb_readonly=1) 2022-12-15T00:43:40.537Z|01661|binding|INFO|Removing lport e2e-statefulset-7013_ss2-0 ovn-installed in OVS 2022-12-15T00:45:41.819Z|01793|binding|INFO|Releasing lport e2e-statefulset-7013_ss2-0 from this chassis (sb_readonly=0) 2022-12-15T00:45:41.819Z|01794|if_status|WARN|Trying to release unknown interface e2e-statefulset-7013_ss2-0 2022-12-15T00:45:43.574Z|01795|binding|INFO|Claiming lport e2e-statefulset-7013_ss2-0 for this chassis. 2022-12-15T00:45:43.574Z|01796|binding|INFO|e2e-statefulset-7013_ss2-0: Claiming 0a:58:0a:83:01:77 10.131.1.119 2022-12-15T00:45:43.638Z|01797|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 ovn-installed in OVS 2022-12-15T00:45:43.638Z|01798|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 up in Southbound 2022-12-15T00:45:50.898Z|01800|binding|INFO|Mismatch iface-id-ver for lport e2e-statefulset-7013_ss2-0, expected 520daa0d-4ad2-4490-8d5e-fd167bc26a7d, found 6e14a1af-8dd4-4656-ad2c-9b145386ed0d 2022-12-15T00:45:51.633Z|01801|binding|INFO|Claiming lport e2e-statefulset-7013_ss2-0 for this chassis. 2022-12-15T00:45:51.633Z|01802|binding|INFO|e2e-statefulset-7013_ss2-0: Claiming 0a:58:0a:83:01:8f 10.131.1.143 2022-12-15T00:45:51.678Z|01803|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 ovn-installed in OVS 2022-12-15T00:45:51.678Z|01804|binding|INFO|Setting lport e2e-statefulset-7013_ss2-0 up in Southbound 2022-12-15T00:46:04.863Z|01806|binding|INFO|Releasing lport e2e-statefulset-7013_ss2-0 from this chassis (sb_readonly=0) Patch posted upstream for review: http://patchwork.ozlabs.org/project/ovn/patch/20230104083209.3880669-2-xsimonar@redhat.com/ ovn23.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2196412 ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2196413 ovn22.12 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2196414 ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2196415 Reproduced on:
ovn22.12-22.12.0-51.el8fdp.x86_64
ovn22.12-central-22.12.0-51.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-30.el8fdp.noarch
ovn22.12-host-22.12.0-51.el8fdp.x86_64
openvswitch2.17-2.17.0-95.el8fdp.x86_64
Here is the reproducer:
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1
ovs-vsctl set open . external_ids:ovn-remote=tcp:127.0.0.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
# Add switch
ovn-nbctl ls-add ls1
ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
ovn-nbctl --wait=hv sync
ovn-nbctl lsp-add ls1 lsp1
ovn-nbctl lsp-add ls1 lsp2
# add_ovs_interfaces
ovs-vsctl --no-wait -- add-port br-int vif1 -- set Interface vif1 external_ids:iface-id=lsp1 -- set Interface vif1 type=internal
ovs-vsctl --no-wait -- add-port br-int vif2 -- set Interface vif2 external_ids:iface-id=lsp2 -- set Interface vif2 type=internal
# wait_for_local_bindings
sleep 5
# check bindings, should be 2
ovn-appctl -t ovn-controller debug/dump-local-bindings | grep interface | wc -l |grep 2
# check how many ports are up, should be 2
ovn-nbctl --bare --columns up list Logical_switch_p | grep true |wc -l |grep 2
ovn-nbctl --wait=hv sync
# sleep_sb
kill -STOP $(ps -ef | grep ovnsb_db.pid |grep -v grep |awk '{ print $2 }')
sleep 5
echo "Interface removal"
# remove_ovs_interfaces
ovs-vsctl --no-wait -- del-port vif1
ovs-vsctl --no-wait -- del-port vif2
sleep 5
# ensure_controller_run
# We want to make sure controller could run at least one full loop.
# We can't use wait=hv as sb might be sleeping.
ovn-appctl -t ovn-controller debug/status |grep running
# sleep_controller
ovn-appctl debug/pause
sleep 5
ovn-appctl -t ovn-controller debug/status |grep paused
# wake_up_sb
kill -CONT $(ps -ef | grep ovnsb_db.pid |grep -v grep |awk '{ print $2 }')
# wake_up_controller
ovn-appctl debug/resume
sleep 5
# check_ovn_uninstalled
# check_port_down lsp1
ovn-sbctl get Port_Binding lsp1 up |grep false
# check_port_down lsp2
ovn-sbctl get Port_Binding lsp2 up |grep false
# <<<<<< in non fixed version, these ports are not down >>>>>>
# check_ports_unbound, should be 0
ovn-sbctl --bare --columns chassis list Port_Bind |grep '-' | wc -l |grep 0
# <<<<<< in non fixed version, these ports remained bound >>>>>>
# add_ovs_interfaces again
# add_ovs_interfaces
ovs-vsctl --no-wait -- add-port br-int vif1 -- set Interface vif1 external_ids:iface-id=lsp1 -- set Interface vif1 type=internal
ovs-vsctl --no-wait -- add-port br-int vif2 -- set Interface vif2 external_ids:iface-id=lsp2 -- set Interface vif2 type=internal
ovn-nbctl --wait=hv sync
echo "Interface removal done and added back"
# sleep_sb
kill -STOP $(ps -ef | grep ovnsb_db.pid |grep -v grep |awk '{ print $2 }')
sleep 2
# remove_ovs_interfaces
ovs-vsctl --no-wait -- del-port vif1
ovs-vsctl --no-wait -- del-port vif2
# ensure_controller_run
ovn-appctl -t ovn-controller debug/status |grep running
# sleep_controller
ovn-appctl debug/pause
sleep 5
# add_ovs_interfaces back again
# add_ovs_interfaces
ovs-vsctl --no-wait -- add-port br-int vif1 -- set Interface vif1 external_ids:iface-id=lsp1 -- set Interface vif1 type=internal
ovs-vsctl --no-wait -- add-port br-int vif2 -- set Interface vif2 external_ids:iface-id=lsp2 -- set Interface vif2 type=internal
# wake_up_sb
kill -CONT $(ps -ef | grep ovnsb_db.pid |grep -v grep |awk '{ print $2 }')
# wake_up_controller
ovn-appctl debug/resume
sleep 5
# check_ovn_installed vif1
# check_ovn_installed vif2
# check_ports_up
ovn-sbctl get Port_Binding lsp1 up |grep true
ovn-sbctl get Port_Binding lsp2 up |grep true
# check_ports_bound should be 2
ovn-sbctl --bare --columns chassis list Port_Bind |grep '-' | wc -l |grep 2
As per the reproducer given in Comment 6, verified on: ovn22.09-22.09.0-91.el8fdp.x86_64 openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch openvswitch2.17-2.17.0-106.el8fdp.x86_64 |