Bug 1888222 - 4.5.14 -> 4.6.rc3 upgrade in OVN cluster failed with degraded ingress - one ingress pod getting TCP errors.
Status: CLOSED DUPLICATE of bug 1880591
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ben Bennett
QA Contact: Anurag saxena
Reported: 2020-10-14 12:48 UTC by Mike Fiedler
Modified: 2020-10-14 13:12 UTC

Last Closed: 2020-10-14 13:12:04 UTC

Description Mike Fiedler 2020-10-14 12:48:39 UTC
Description of problem:

Upgrading 4.5.14 -> 4.6.rc3 on an OVN cluster failed with degraded ingress. One ingress pod never became ready, and its logs were full of TCP timeouts. Will add must-gather in a private comment. Filing this against OVN; please transfer to Ingress if appropriate.

E1014 12:42:52.544897       1 reflector.go:127] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://172.30.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
E1014 12:42:56.843650       1 webhook.go:111] Failed to make webhook authenticator request: Post "https://172.30.0.1:443/apis/authentication.k8s.io/v1/tokenreviews": context deadline exceeded
I1014 12:43:17.233497       1 trace.go:205] Trace[460628971]: "Reflector ListAndWatch" name:github.com/openshift/router/pkg/router/template/service_lookup.go:33 (14-Oct-2020 12:42:47.230) (total time: 30002ms):
Trace[460628971]: [30.002608505s] [30.002608505s] END
E1014 12:43:17.233600       1 reflector.go:127] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://172.30.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
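Every failure in the router log excerpt above has the same shape: the TCP dial to 172.30.0.1:443 (the API server address the router uses in these logs) never completes. A minimal reachability probe for that symptom might look like the sketch below. `can_dial` is a hypothetical helper, not part of the router or any OpenShift tool, and the sketch assumes it runs somewhere that shares the affected pod's network path.

```python
import socket


def can_dial(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s.

    Hypothetical diagnostic helper. socket.create_connection raises
    socket.timeout (an OSError subclass) when the dial times out, and other
    OSErrors (e.g. ConnectionRefusedError) for other failures; all of them
    are reported here simply as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, `can_dial("172.30.0.1", 443)` returning `False` after the timeout would match the `i/o timeout` errors above. Note that it does not distinguish a timeout from a refused connection.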


Version-Release number of selected component (if applicable): 4.6.0.rc3


How reproducible: Unknown; some upgrades succeed.


Steps to Reproduce:
1. Install a 4.5.14 cluster on OSP using the OVN network plugin.
2. Upgrade to 4.6.0.rc3.

Actual results:

The upgrade fails (stuck for more than 12 hours) with ingress degraded and ingress pod logs as shown above.



Additional info:

Will add must-gather.

Comment 3 Tim Rozet 2020-10-14 13:12:04 UTC
I see duplicate patch ports in this setup:

        Port patch-br-local_rp-45-ospovn-nc55s-worker-ww25n-to-br-int
            Interface patch-br-local_rp-45-ospovn-nc55s-worker-ww25n-to-br-int
                type: patch
                options: {peer=patch-br-int-to-br-local_rp-45-ospovn-nc55s-worker-ww25n}
        Port patch--to-br-int
            Interface patch--to-br-int
                type: patch
                options: {peer=patch-br-int-to-}


This means it is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1880591#c55

*** This bug has been marked as a duplicate of bug 1880591 ***
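The symptom quoted in comment 3 can be spotted mechanically. Below is a minimal sketch, assuming the input is text in the layout printed by `ovs-vsctl show` (which the excerpt appears to be) and assuming the `patch-<bridge>-to-<bridge>` naming visible there. It flags patch ports whose own name or `peer` has an empty bridge segment, such as `patch--to-br-int` with peer `patch-br-int-to-`. The helper names are hypothetical; this illustrates the symptom only and is neither an OVS/OVN tool nor the fix tracked in bug 1880591.

```python
import re
from typing import Dict, List, Optional

# Assumed layout: "Port <name>" lines followed by an "options: {peer=<name>}"
# line, as in the excerpt quoted in comment 3.
PORT_RE = re.compile(r"^\s*Port\s+(\S+)\s*$")
PEER_RE = re.compile(r"options:\s*\{.*?\bpeer=([^,}]*)")


def parse_patch_peers(ovs_show: str) -> Dict[str, Optional[str]]:
    """Map each Port name to its peer option (None if no peer line was seen)."""
    peers: Dict[str, Optional[str]] = {}
    current: Optional[str] = None
    for line in ovs_show.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = port.group(1)
            peers[current] = None
            continue
        peer = PEER_RE.search(line)
        if peer and current is not None:
            peers[current] = peer.group(1)
    return peers


def has_empty_bridge_segment(name: str) -> bool:
    """Heuristic (assumption): 'patch-<A>-to-<B>' where A or B is empty."""
    return name.startswith("patch--to-") or name.endswith("-to-")


def suspect_patch_ports(ovs_show: str) -> List[str]:
    """Return ports whose own name or peer name has an empty bridge segment."""
    return [
        port
        for port, peer in parse_patch_peers(ovs_show).items()
        if has_empty_bridge_segment(port)
        or (peer is not None and has_empty_bridge_segment(peer))
    ]
```

Fed the two ports quoted in comment 3, `suspect_patch_ports` returns `["patch--to-br-int"]`; the correctly named node-specific port is not flagged.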

