Bug 1901034
Summary: | NO_PROXY is not matched between bootstrap and global cluster setting which lead to desired master machineconfig is not found | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Johnny Liu <jialiu> |
Component: | Installer | Assignee: | Matthew Staebler <mstaeble> |
Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
Status: | CLOSED DEFERRED | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | adam.kaplan, akashem, aprabhu, bleanhar, ecordell, esimard, gpei, jluhrsen, kgarriso, lmcfadde, lsm5, mstaeble, mtarsel, rheinzma, sgreene, tsze, wduan, wking, yanyang |
Version: | 4.7 | Keywords: | Regression, Reopened, TestBlocker |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | [sig-sippy] install should work |
Last Closed: | 2021-01-21 07:11:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Johnny Liu
2020-11-24 10:47:55 UTC
$ oc get infrastructures.config.openshift.io cluster -o yaml
<--snip-->
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: Azure
status:
  apiServerInternalURI: https://api-int.miyadav24azur.qe.azure.devcluster.openshift.com:6443
  apiServerURL: https://api.miyadav24azur.qe.azure.devcluster.openshift.com:6443
  etcdDiscoveryDomain: ""
  infrastructureName: miyadav24azur-t8cqb
  platform: Azure
  platformStatus:
    azure:
      cloudName: AzurePublicCloud
      networkResourceGroupName: miyadav24azur-rg
      resourceGroupName: miyadav24azur-t8cqb-rg
    type: Azure

The etcdDiscoveryDomain is empty; I think that leads to the etcd FQDNs missing their domain.

This is causing some jobs to fail very frequently now and bringing down some of the release indicator percentages we track. Here is an example job:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy

I'm assuming it's the same problem because I see this in a failed job [0]:

cluster-scoped-resources/machineconfiguration.openshift.io/machineconfigs/00-master.yaml:
Environment=NO_PROXY=.cluster.local,.ec2.internal,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ci-op-q4qfig05-2659c.origin-ci-int-aws.dev.rhcloud.com,etcd-0.,etcd-1.,etcd-2.,localhost

The etcd entries there have no domain (etcd-0., etcd-1., etcd-2.), unlike what we see in the last job [1] that did not fail (it ran a week ago):

cluster-scoped-resources/machineconfiguration.openshift.io/machineconfigs/00-master.yaml:
Environment=NO_PROXY=.cluster.local,.svc,.us-west-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ci-op-llgw500b-2659c.origin-ci-int-aws.dev.rhcloud.com,etcd-0.ci-op-llgw500b-2659c.origin-ci-int-aws.dev.rhcloud.com,etcd-1.ci-op-llgw500b-2659c.origin-ci-int-aws.dev.rhcloud.com,etcd-2.ci-op-llgw500b-2659c.origin-ci-int-aws.dev.rhcloud.com,localhost

[0] https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy/1331391826971594752
[1] https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy/1328922989076418560

Can we up the priority on this? I've found 2 bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=1899979
https://bugzilla.redhat.com/show_bug.cgi?id=1904231

Also the entire aws-proxy job is red:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy?buildId=

I'm going to dupe the bzs into this so we can have this bug as the tracking bug.

*** Bug 1899979 has been marked as a duplicate of this bug. ***

*** Bug 1904231 has been marked as a duplicate of this bug. ***

*** Bug 1901577 has been marked as a duplicate of this bug. ***

(In reply to Kirsten Garrison from comment #4)
> Can we up the priority on this? I've found 2 bugs related to this:

The installer team will be looking at this early next sprint.

This regression was introduced in https://github.com/openshift/installer/pull/4067, which removed the code that sets status.etcdDiscoveryDomain in infrastructure.config.openshift.io. The cluster-network-operator relies on that field to fill out the status.noProxy field in proxy.config.openshift.io [1].

[1] https://github.com/openshift/cluster-network-operator/blob/c23495cf6e6ffeffc0290c85ee4608102f7b47d1/pkg/util/proxyconfig/no_proxy.go#L113

*** Bug 1906620 has been marked as a duplicate of this bug. ***

*** Bug 1906321 has been marked as a duplicate of this bug. ***

Is there any update or proposed fix for this one yet? Since it is blocking and I don't see any update since mid-December, I'm checking on the status here.

(In reply to lmcfadde from comment #12)
> Is there any update or proposed fix for this one yet? Since it is blocking
> and I don't see any update since mid dec, checking on status here.
This bug will be fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1909502.

*** This bug has been marked as a duplicate of bug 1909502 ***

This issue is fixed in nightly payload 4.7.0-0.nightly-2021-01-21-012810, so I am closing it now. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1909502#c21 for detailed verification steps.

*** Bug 1916904 has been marked as a duplicate of this bug. ***
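
For anyone triaging a similar failure, the symptom reported in this bug (etcd entries such as "etcd-0." with no domain in NO_PROXY) can be reproduced and detected with a short sketch. This is not the cluster-network-operator code. The helper names below are hypothetical, and the sketch assumes the etcd host names are formed by simply joining "etcd-N" and the discovery domain with a dot, which is consistent with the values quoted above but is not confirmed from the operator source.

```python
from typing import List


def etcd_no_proxy_entries(discovery_domain: str, replicas: int = 3) -> List[str]:
    """Hypothetical helper: build etcd host entries as 'etcd-N.<domain>'.

    Assumption: plain string concatenation, so an empty domain yields
    entries that end in a bare dot.
    """
    return [f"etcd-{i}.{discovery_domain}" for i in range(replicas)]


def malformed_entries(no_proxy: str) -> List[str]:
    """Return NO_PROXY entries that end with a dot (host name without a domain)."""
    return [entry for entry in no_proxy.split(",") if entry.endswith(".")]


# NO_PROXY value quoted from the failing job in this bug.
failing = (
    ".cluster.local,.ec2.internal,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,"
    "169.254.169.254,172.30.0.0/16,"
    "api-int.ci-op-q4qfig05-2659c.origin-ci-int-aws.dev.rhcloud.com,"
    "etcd-0.,etcd-1.,etcd-2.,localhost"
)

print(etcd_no_proxy_entries(""))   # empty etcdDiscoveryDomain
print(malformed_entries(failing))  # entries missing their domain
```

Running malformed_entries against the Environment=NO_PROXY value from a 00-master machineconfig is a quick way to tell whether a failed proxy job is hitting this bug or something else.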