Bug 1828752

Summary: [upgrade] Fail to upgrade from 4.3 to 4.4 with OVN network
Product: OpenShift Container Platform Reporter: zhaozhanqi <zzhao>
Component: Networking Assignee: Dan Winship <danw>
Networking sub component: ovn-kubernetes QA Contact: zhaozhanqi <zzhao>
Status: CLOSED WONTFIX Docs Contact:
Severity: urgent    
Priority: high CC: aconstan, anusaxen, scuppett, xtian
Version: 4.4 Keywords: Regression, Upgrades
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: All   
OS: All   
Whiteboard: SDN-CI-IMPACT,SDN-BP
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1824522 Environment:
Last Closed: 2020-05-15 12:42:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1824522    
Bug Blocks:    
Attachments:
Description                        Flags
ovn master logs from 4.3 to 4.4    none
ovn master logs                    none

Comment 1 zhaozhanqi 2020-04-28 10:14:56 UTC
Created attachment 1682411
ovn master logs from 4.3 to 4.4

Comment 3 Dan Winship 2020-05-04 14:18:04 UTC
This needs a backport of https://github.com/ovn-org/ovn-kubernetes/pull/1309

Comment 6 zhaozhanqi 2020-05-15 09:17:38 UTC
The upgrade still fails when going from 4.3.0-0.nightly-2020-05-15-004013 to 4.4.0-0.nightly-2020-05-15-002555.

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h8m
cloud-credential                           4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h22m
cluster-autoscaler                         4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h15m
console                                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      53m
csi-snapshot-controller                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      55m
dns                                        4.3.0-0.nightly-2020-05-15-004013   True        False         False      6h19m
etcd                                       4.4.0-0.nightly-2020-05-15-002555   True        False         False      65m
image-registry                             4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h14m
ingress                                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      90m
insights                                   4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h15m
kube-apiserver                             4.4.0-0.nightly-2020-05-15-002555   True        False         False      64m
kube-controller-manager                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      62m
kube-scheduler                             4.4.0-0.nightly-2020-05-15-002555   True        False         False      63m
kube-storage-version-migrator              4.4.0-0.nightly-2020-05-15-002555   True        False         False      56m
machine-api                                4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h19m
machine-config                             4.3.0-0.nightly-2020-05-15-004013   True        False         False      6h19m
marketplace                                4.4.0-0.nightly-2020-05-15-002555   True        False         False      54m
monitoring                                 4.4.0-0.nightly-2020-05-15-002555   False       True          True       49m
network                                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h20m
node-tuning                                4.4.0-0.nightly-2020-05-15-002555   True        False         False      55m
openshift-apiserver                        4.4.0-0.nightly-2020-05-15-002555   True        False         False      60m
openshift-controller-manager               4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h20m
openshift-samples                          4.4.0-0.nightly-2020-05-15-002555   False       True          True       51m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h16m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h16m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-05-15-002555   False       True          False      50m
service-ca                                 4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h20m
service-catalog-apiserver                  4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h16m
service-catalog-controller-manager         4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h16m
storage                                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      55m
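
A table this long is easier to read when filtered. The following is a minimal sketch, not part of the original report, that parses the text output of `oc get co` and lists operators that are either not yet at the target version or not healthy. The function name `lagging_operators` and the column handling are assumptions made for illustration; the sample rows are copied from the output above.

```python
# Sketch: flag cluster operators that have not finished upgrading.
# Assumes the plain-text column layout shown above
# (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE).

SAMPLE = """\
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns                                        4.3.0-0.nightly-2020-05-15-004013   True        False         False      6h19m
monitoring                                 4.4.0-0.nightly-2020-05-15-002555   False       True          True       49m
network                                    4.4.0-0.nightly-2020-05-15-002555   True        False         False      6h20m
"""


def lagging_operators(table, target):
    """Return (name, reason) pairs for operators not healthy at `target`."""
    problems = []
    for line in table.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full data row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, version, available, progressing, degraded = fields[:5]
        if version != target:
            problems.append((name, "still at " + version))
        elif available != "True" or progressing == "True" or degraded == "True":
            problems.append(
                (name, "available=%s progressing=%s degraded=%s"
                 % (available, progressing, degraded))
            )
    return problems


for name, reason in lagging_operators(SAMPLE, "4.4.0-0.nightly-2020-05-15-002555"):
    print(name, "->", reason)
```

Applied to the full table above, the same check would flag dns and machine-config (still at the 4.3 nightly) plus monitoring, openshift-samples, and operator-lifecycle-manager-packageserver (not healthy).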

Only one of the three pods in openshift-apiserver is running; the other two are in CrashLoopBackOff:

$ oc get pod -n openshift-apiserver -o wide
NAME                         READY   STATUS             RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
apiserver-6678c68cd4-9d5rl   0/1     CrashLoopBackOff   15         64m   10.128.0.15   ip-10-0-175-168.us-east-2.compute.internal   <none>           <none>
apiserver-6678c68cd4-cmbgk   1/1     Running            0          64m   10.129.0.43   ip-10-0-131-24.us-east-2.compute.internal    <none>           <none>
apiserver-6678c68cd4-krqhd   0/1     CrashLoopBackOff   15         65m   10.130.0.14   ip-10-0-150-174.us-east-2.compute.internal   <none>           <none>

The crashing pods still cannot reach the kubernetes service:

$ oc logs apiserver-6678c68cd4-9d5rl -n openshift-apiserver
Copying system trust bundle
F0515 09:05:45.042819       1 cmd.go:72] unable to load configmap based request-header-client-ca-file: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: i/o timeout
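
The failure in the log is a plain TCP dial timeout to the service IP 172.30.0.1:443. As a rough way to reproduce that check outside the apiserver, the sketch below attempts a TCP connection with a timeout. It is not from the original report: the helper name `can_connect` is hypothetical, and it only tests reachability, not TLS or the API itself. It would need to run from a pod or node inside the affected cluster to be meaningful.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example, to be run from inside the cluster (address taken from the log above):
#   can_connect("172.30.0.1", 443)
```

On a node where service traffic is broken in the way this log suggests, the call would be expected to return False after the timeout expires.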

Comment 7 zhaozhanqi 2020-05-15 09:18:53 UTC
Created attachment 1688845 [details]
ovn master logs

Comment 9 Dan Winship 2020-05-15 12:42:44 UTC
The ovnkube-master logs show that the bug from the original report is fixed, i.e., this error no longer appears:

Failed to add logical port to router, stderr: "ovn-nbctl: rtos-zzhao43ovnup2-ws6vv-worker-westus-vqkpc: port already exists with mac 0A:58:0A:83:00:01\n", error: OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist lrp-add ovn_cluster_router rtos-zzhao43ovnup2-ws6vv-worker-westus-vqkpc 0a:58:0a:83:00:01 10.131.0.1/23' failed: exit status 1
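
To confirm the absence of that error in an attached log, one could scan it for the "port already exists with mac" message. The sketch below is an illustration only: the function name `find_lrp_conflicts` and the regular expression are assumptions based on the single message quoted above, and the real log line prefix may differ.

```python
import re

# Matches the ovn-nbctl stderr text quoted above and captures the
# router port name and the MAC address it collided with.
CONFLICT_RE = re.compile(
    r"ovn-nbctl: (\S+): port already exists with mac ([0-9A-Fa-f:]{17})"
)

# Message text copied from the comment above (log prefix omitted).
SAMPLE_LINE = (
    r'Failed to add logical port to router, stderr: "ovn-nbctl: '
    r'rtos-zzhao43ovnup2-ws6vv-worker-westus-vqkpc: port already exists '
    r'with mac 0A:58:0A:83:00:01\n", error: OVN command failed'
)


def find_lrp_conflicts(lines):
    """Return (port, mac) for every line reporting an existing router port."""
    hits = []
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


print(find_lrp_conflicts([SAMPLE_LINE, "unrelated log line"]))
```

An empty result for a log from the upgraded cluster would be consistent with the observation that the original bug is fixed.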

This should fix upgrades from older 4.4.z releases to newer 4.4.z releases.

We don't actually support 4.3 to 4.4 ovn-kubernetes upgrades; fixing this would require backporting more fixes to 4.3, which we are not doing at this time.