
Bug 1888222

Summary: 4.5.14 -> 4.6.rc3 upgrade in OVN cluster failed with degraded ingress - one ingress pod getting TCP errors.
Product: OpenShift Container Platform Reporter: Mike Fiedler <mifiedle>
Component: Networking Assignee: Ben Bennett <bbennett>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE Docs Contact:
Severity: high    
Priority: unspecified CC: trozet
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-10-14 13:12:04 UTC Type: Bug

Description Mike Fiedler 2020-10-14 12:48:39 UTC
Description of problem:

Upgrading 4.5.14 -> 4.6.rc3 on an OVN cluster failed with degraded ingress. One ingress pod never became ready, and its logs were full of TCP timeouts. Will add must-gather in a private comment. Starting this with OVN; please transfer to ingress if appropriate.

E1014 12:42:52.544897       1 reflector.go:127] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://172.30.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
E1014 12:42:56.843650       1 webhook.go:111] Failed to make webhook authenticator request: Post "https://172.30.0.1:443/apis/authentication.k8s.io/v1/tokenreviews": context deadline exceeded
I1014 12:43:17.233497       1 trace.go:205] Trace[460628971]: "Reflector ListAndWatch" name:github.com/openshift/router/pkg/router/template/service_lookup.go:33 (14-Oct-2020 12:42:47.230) (total time: 30002ms):
Trace[460628971]: [30.002608505s] [30.002608505s] END
E1014 12:43:17.233600       1 reflector.go:127] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://172.30.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
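
To gauge how widespread the timeouts are in a saved router log, the lines can be tallied by the address they failed to dial. This is only a rough sketch: the function name `count_dial_timeouts` is hypothetical, and the pattern assumes the `dial tcp <ip>:<port>: i/o timeout` wording seen in the excerpt above.

```python
import re
from collections import Counter

# Matches the "dial tcp <ip>:<port>: i/o timeout" tail seen in the log excerpt.
# Assumption: IPv4 targets only, as in the lines quoted in this report.
DIAL_TIMEOUT = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+:\d+): i/o timeout")


def count_dial_timeouts(log_text):
    """Return a Counter mapping 'ip:port' to the number of dial timeouts logged."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DIAL_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines that fail in other ways, such as the `context deadline exceeded` webhook error, are deliberately not counted by this pattern.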


Version-Release number of selected component (if applicable): 4.6.0.rc3


How reproducible: Unknown; some upgrades succeed.


Steps to Reproduce:
1. Install a 4.5.14 cluster on OSP using the OVN plugin
2. Upgrade to 4.6.0.rc3

Actual results:

Upgrade fails (stuck > 12 hours) with ingress degraded and ingress logs as described above.



Additional info:

Will add must-gather

Comment 3 Tim Rozet 2020-10-14 13:12:04 UTC
I see duplicate patch ports in this setup:

        Port patch-br-local_rp-45-ospovn-nc55s-worker-ww25n-to-br-int
            Interface patch-br-local_rp-45-ospovn-nc55s-worker-ww25n-to-br-int
                type: patch
                options: {peer=patch-br-int-to-br-local_rp-45-ospovn-nc55s-worker-ww25n}
        Port patch--to-br-int
            Interface patch--to-br-int
                type: patch
                options: {peer=patch-br-int-to-}


Which means it's the same as
https://bugzilla.redhat.com/show_bug.cgi?id=1880591#c55
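
A minimal sketch for checking other nodes for the same symptom, assuming the port listing above came from `ovs-vsctl show` and that the suspect port is the one whose name has nothing between `patch-` and `-to-br-int`. The function name `find_unnamed_patch_ports` is hypothetical and is not part of any OVS or OVN tooling.

```python
import re

# A "Port <name>" line as it appears in the listing above; names may be quoted.
PORT_LINE = re.compile(r'^\s*Port\s+"?(?P<name>[^"\s]+)"?\s*$')
# Patch ports toward br-int follow the shape patch-<network>-to-br-int.
PATCH_NAME = re.compile(r"^patch-(?P<net>.*)-to-br-int$")


def find_unnamed_patch_ports(show_output):
    """Return patch port names whose <network> part is empty."""
    unnamed = []
    for line in show_output.splitlines():
        port = PORT_LINE.match(line)
        if not port:
            continue  # Interface, type and options lines are skipped
        patch = PATCH_NAME.match(port.group("name"))
        if patch and patch.group("net") == "":
            unnamed.append(port.group("name"))
    return unnamed
```

Fed the listing from this comment, the function would flag `patch--to-br-int` and leave the fully named port alone.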

*** This bug has been marked as a duplicate of bug 1880591 ***