Bug 1968021 - network-check-source deployment does not complete during upgrade
Summary: network-check-source deployment does not complete during upgrade
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Andrew Stoycos
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-04 17:59 UTC by jamo luhrsen
Modified: 2022-02-15 17:17 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-15 17:17:44 UTC
Target Upstream Version:
Embargoed:



Description jamo luhrsen 2021-06-04 17:59:30 UTC
Description of problem:

The 4.7->4.8 OVN upgrade job fails in "Cluster should remain functional during upgrade", which
complains that the network-check-source deployment does not finish deploying. Full test log error:

  fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:153]: during upgrade to registry.build01.ci.openshift.org/ci-op-ry63bg7k/release@sha256:e469b0e52d9e030a83b6311447a37264ca49b330ba67488c58a0d66b085fac85
Unexpected error:
    <*errors.errorString | 0xc00095a8e0>: {
        s: "ClusterOperators did not settle: \nclusteroperator/image-registry is Progressing for 15m27.554551071s because \"Progressing: The deployment has not completed\"\n\tclusteroperator/network is Progressing for 4m21.554559551s because \"Deployment \\\"openshift-network-diagnostics/network-check-source\\\" is not available (awaiting 1 nodes)\"",
    }
    ClusterOperators did not settle: 
    clusteroperator/image-registry is Progressing for 15m27.554551071s because "Progressing: The deployment has not completed"
    	clusteroperator/network is Progressing for 4m21.554559551s because "Deployment \"openshift-network-diagnostics/network-check-source\" is not available (awaiting 1 nodes)"
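For triage across several runs, the "did not settle" report above can be split into one record per operator. The following is only a minimal sketch: the regular expression is inferred from the two lines in this report, and `parse_unsettled` is a hypothetical helper name, not part of the origin test suite.

```python
import re

# One line of the report looks like:
#   clusteroperator/<name> is <Condition> for <duration> because "<reason>"
# The pattern is an assumption derived from the two lines seen in this bug.
LINE_RE = re.compile(
    r'clusteroperator/(?P<name>\S+) is (?P<condition>\w+) '
    r'for (?P<duration>\S+) because "(?P<reason>.*)"\s*$'
)


def parse_unsettled(report: str) -> list:
    """Return one dict per clusteroperator line found in the report."""
    results = []
    for line in report.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            results.append(match.groupdict())
    return results


# The report text exactly as it appears in the test log (raw string keeps
# the backslash-escaped quotes untouched).
report = r'''ClusterOperators did not settle:
clusteroperator/image-registry is Progressing for 15m27.554551071s because "Progressing: The deployment has not completed"
	clusteroperator/network is Progressing for 4m21.554559551s because "Deployment \"openshift-network-diagnostics/network-check-source\" is not available (awaiting 1 nodes)"'''

for entry in parse_unsettled(report):
    print(entry["name"], entry["condition"], entry["duration"])
```

Here both image-registry and network are reported as Progressing; only the network operator's reason mentions network-check-source.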


This job has many other failures, all pointing to networking, such as routes not being up or
OVS port bindings timing out. They may all be related. Here is the port binding BZ:
  https://bugzilla.redhat.com/show_bug.cgi?id=1968009

The network check pod log shows a 'no route to host' error when trying to reach the API server
(172.30.0.1:443) to load the extension-apiserver-authentication configmap:

  F0604 02:43:42.814512       1 cmd.go:129] unable to load configmap based request-header-client-ca-file: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 172.30.0.1:443: connect: no route to host

  https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade/1400613589785513984/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/pods/openshift-network-diagnostics_network-check-source-dbdfd5479-vtmx4_check-endpoints.log
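To tell 'no route to host' apart from a refused connection or a timeout when re-testing from a pod on the affected node, a small TCP probe can help. This is a rough sketch, not what check-endpoints itself does: `probe` and `classify` are hypothetical helper names, and the address to test is assumed to be the 172.30.0.1:443 endpoint from the log line above.

```python
import errno
import socket


def classify(exc: OSError) -> str:
    """Map a connect error to a short label for triage."""
    if isinstance(exc, socket.timeout):
        return "timeout"
    if exc.errno == errno.EHOSTUNREACH:
        return "no route to host"
    if exc.errno == errno.ECONNREFUSED:
        return "connection refused"
    return f"other error: {exc}"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except OSError as exc:
        return classify(exc)
```

Calling `probe("172.30.0.1", 443)` from inside the pod network would be expected to return "no route to host" while the problem persists.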


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

