Bug 1868392
| Summary: | [FDP 20.F] OVN 2.13 breaks pod-pod networking across the nodes on OCP | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Anurag saxena <anusaxen> |
| Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | FDP 20.E | CC: | ctrautma, dcbw, huirwang, jishi, kfida, nusiddiq, ralongi, rbrattai, zzhao |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 09:49:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Anurag saxena
2020-08-12 14:07:05 UTC
Bugzilla doesn't have FDP 20.F listed in Version. Pasting rpm version from cluster

$ oc rsh -n openshift-ovn-kubernetes ovnkube-master-4ktht
Defaulting container name to northd.
Use 'oc describe pod/ovnkube-master-4ktht -n openshift-ovn-kubernetes' to see all of the containers in this pod.
sh-4.2# rpm -qa | grep -i openv
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
openvswitch2.13-devel-2.13.0-39.el7fdp.x86_64
openvswitch2.13-2.13.0-39.el7fdp.x86_64
sh-4.2# rpm -qa | grep -i ovn
ovn2.13-20.06.1-6.el7fdp.x86_64
ovn2.13-host-20.06.1-6.el7fdp.x86_64
ovn2.13-central-20.06.1-6.el7fdp.x86_64
ovn2.13-vtep-20.06.1-6.el7fdp.x86_64

So this issue seems to be related to why the DBs are not getting upgraded, and I hope the steps to deploy the image are correct in comment 8.

The ovn-ctl script here [1] takes care of upgrading the cluster DBs.

****
"$@" "$file"

# Initialize the database if it's NOT joining a cluster.
if test -z "$cluster_remote_addr"; then
    $(echo ovn-${db}ctl | tr _ -) --no-leader-only init
fi

if test $mode = cluster; then
    upgrade_cluster "$schema" "unix:$sock"
fi
****

But the problem is that the cluster network operator starts the OVN DBs using the commands run_sb_ovsdb/run_nb_ovsdb, and ovsdb-server, when started this way (via "$@" "$file"), doesn't daemonize and runs in the foreground. That's why the "upgrade_cluster" function is never invoked: the script never gets past the line that starts the server.

For the raft setup, ovsdb-server should be running to upgrade the DB from the new schema (which is not the case with a standalone DB).

I think the cluster network operator's ovnkube-master.yaml here [2] should take care of upgrading the cluster. The upgrade_cluster code is here [3].

[1] - https://github.com/ovn-org/ovn/blob/master/utilities/ovn-ctl#L299
[2] - https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/ovnkube-master.yaml#L148
[3] - https://github.com/openvswitch/ovs/blob/master/utilities/ovs-lib.in#L461

I have submitted a PR to handle this in CNO - https://github.com/openshift/cluster-network-operator/pull/755

@Anurag - Is it possible to test this PR out?

Thanks
Numan

Created attachment 1711386 [details]
log bundle PR 755
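The ordering problem described above (a server that stays in the foreground blocks every later line of the script) can be illustrated with a minimal sketch. This is not the ovn-ctl code and not the change made in PR 755; run_server and post_start_step are hypothetical stand-ins, with a short sleep playing the role of ovsdb-server and an echo playing the role of upgrade_cluster.

```shell
#!/bin/sh
# Hypothetical stand-ins, for illustration only (not ovn-ctl functions).

# Stand-in for a server that runs in the foreground until it exits.
run_server() {
    sleep 1
}

# Stand-in for a step that needs the server to be running.
post_start_step() {
    echo "post-start step ran"
}

# Pattern A: server in the foreground. The step is reached only after
# the server has already exited; with a long-running server it is
# effectively never reached.
foreground_then_step() {
    run_server
    echo "server exited"
    post_start_step
}

# Pattern B: server in the background. The step runs while the server
# is still alive, then the script waits for the server.
background_then_step() {
    run_server &
    server_pid=$!
    post_start_step
    wait "$server_pid"
    echo "server exited"
}
```

Calling foreground_then_step prints "server exited" before the step runs, while background_then_step runs the step first. A real script would presumably also need to wait until the server is ready to accept connections before running the step; that part is left out of this sketch.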
I think we can move this BZ to CNO as it is not an OVN issue.

Reproduced on ovn20.06.1-6 with the following steps:

1. install ovn2.13.0-39

[root@wsfd-advnetlab16 bz1868392]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-11.noarch
python3-openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-2.13.0-39.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-acl-1.0-19.noarch
openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-central-2.13.0-39.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-39.el7fdp.x86_64

2. rm db file

[root@wsfd-advnetlab16 bz1868392]# rm -f /etc/ovn/*

3. start run_sb_ovsdb

ctl_cmd="/usr/share/ovn/scripts/ovn-ctl"
ip_s=1.1.1.16
ip_c1=1.1.1.17
ip_c2=1.1.1.18
$ctl_cmd --db-sb-cluster-local-addr=$ip_s --db-sb-create-insecure-remote=yes --db-sb-cluster-local-port=6642 --db-sb-cluster-remote-proto=tcp --no-monitor run_sb_ovsdb

4. check chassis table in sb db file

[root@wsfd-advnetlab16 scripts]# ovsdb-client dump tcp:1.1.1.16:6642 Chassis
Chassis table
_uuid encaps external_ids hostname name nb_cfg transport_zones vtep_logical_switches
----- ------ ------------ -------- ---- ------ --------------- ---------------------

5. stop the script and upgrade ovn to 20.06.1-6

[root@wsfd-advnetlab16 bz1868392]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-11.noarch
python3-openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-central-20.06.1-6.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-acl-1.0-19.noarch
openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-20.06.1-6.el7fdp.x86_64
ovn2.13-host-20.06.1-6.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch

6. start the script again

7. check chassis table:

[root@wsfd-advnetlab16 scripts]# ovsdb-client dump tcp:1.1.1.16:6642 Chassis
Chassis table
_uuid encaps external_ids hostname name nb_cfg transport_zones vtep_logical_switches
----- ------ ------------ -------- ---- ------ --------------- ---------------------

<==== the db is not updated: other_config was added in 20.06.1-6, but it is not listed here

Verified on ovn20.09.0-2:

[root@wsfd-advnetlab16 bz1868392]# rpm -qa | grep -E "openvswitch|ovn"
kernel-kernel-networking-openvswitch-ovn-common-1.0-11.noarch
python3-openvswitch2.13-2.13.0-51.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-acl-1.0-19.noarch
openvswitch2.13-2.13.0-51.el7fdp.x86_64
ovn2.13-20.09.0-2.el7fdp.x86_64
ovn2.13-host-20.09.0-2.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-central-20.09.0-2.el7fdp.x86_64

[root@wsfd-advnetlab16 scripts]# ovsdb-client dump tcp:1.1.1.16:6642 Chassis
Chassis table
_uuid encaps external_ids hostname name nb_cfg other_config transport_zones vtep_logical_switches
----- ------ ------------ -------- ---- ------ ------------ --------------- ---------------------

<=== db is updated, other_config is listed here

Verified on rhel8 version:

with ovn2.13.0-39 installed:

[root@wsfd-advnetlab18 ~]# ovsdb-client dump tcp:1.1.23.25:6642 Chassis
Chassis table
_uuid encaps external_ids hostname name nb_cfg transport_zones vtep_logical_switches
----- ------ ------------ -------- ---- ------ --------------- ---------------------

upgrade to ovn20.09.0-2:

[root@wsfd-advnetlab18 ~]# ovsdb-client dump tcp:1.1.23.25:6642 Chassis
Chassis table
_uuid encaps external_ids hostname name nb_cfg other_config transport_zones vtep_logical_switches
----- ------ ------------ -------- ---- ------ ------------ --------------- ---------------------

<== db is upgraded

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4356
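The verification in this report comes down to checking whether the Chassis table header lists the other_config column. A minimal sketch of that check follows, assuming the dump output has the layout shown in the comments above (a header line that begins with _uuid). The helper name has_column is hypothetical and is not part of OVN or OVS.

```shell
#!/bin/sh
# has_column is a hypothetical helper, not an OVN/OVS tool.
# It reads "ovsdb-client dump" output on stdin and succeeds if the
# header line (assumed to start with _uuid) contains the given column.
has_column() {
    awk -v col="$1" '
        $1 == "_uuid" {
            for (i = 1; i <= NF; i++)
                if ($i == col) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

With the address used in the reproduction steps, it could be used as:

ovsdb-client dump tcp:1.1.1.16:6642 Chassis | has_column other_config && echo "schema upgraded"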