Bug 1916904 - machine-config-operator degraded when using proxy due to invalid etcd node records "etcd-0.,etcd-1.,etcd-2."
Keywords:
Status: CLOSED DUPLICATE of bug 1901034
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: egarcia
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-15 19:27 UTC by Robert Heinzmann
Modified: 2021-01-27 14:26 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-27 14:26:29 UTC
Target Upstream Version:
Embargoed:



Description Robert Heinzmann 2021-01-15 19:27:08 UTC
When using 4.7.0-fc.2 with a proxy, the deployment fails.

Version:

Client Version: 4.7.0-fc.2
Server Version: 4.7.0-fc.2

[stack@osp16amd ocp-test1]$ ./openshift-install version
./openshift-install 4.7.0-fc.2
built from commit 69f0bbc18e8c6b1a6e278c54efa2def9b210033a
release image 192.168.100.98:443/ocp4/openshift4@sha256:2f00e3016ca5678e51e9d79d4d3ac5a2926e0c09a8e75df19ea983b9cd6c5d05

Platform:

OpenStack 16.1.3
IPI

What happened?

When using 4.7.0-fc.2 with a proxy, the deployment fails with:

~~~
ERROR Cluster operator authentication Degraded is True with ProxyConfigController_SyncError: ProxyConfigControllerDegraded: failed to reach endpoint("https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz") missing in NO_PROXY(".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.150.0/24,x.x.x.x,api-int.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-0.,etcd-1.,etcd-2.,localhost") with error: Get "https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz": EOF
~~~

After the installation, the machine-config operator is degraded:

~~~
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
...
machine-config                                          False       True          True       57m
...
MachineConfigPool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      57m
worker   rendered-worker-1fd716d4ff4d38bbe58163e9a5b111eb   True      False      False      2              2                   2                     0                      57m
~~~

The machine config daemon reports:

~~~
E0115 17:43:09.195796    7276 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-221a60144d12d5b62cdca9ead787bc3f" not found
I0115 17:44:09.217486    7276 daemon.go:781] In bootstrap mode
~~~

[stack@osp16amd ocp-test1]$ oc get node -o json | jq -r '.items[] | .metadata.name, .metadata.annotations'
ocp-qmgnh-master-0
{
  "machine.openshift.io/machine": "openshift-machine-api/ocp-qmgnh-master-0",
  "machineconfiguration.openshift.io/currentConfig": "rendered-master-221a60144d12d5b62cdca9ead787bc3f",
  "machineconfiguration.openshift.io/desiredConfig": "rendered-master-221a60144d12d5b62cdca9ead787bc3f",
  "machineconfiguration.openshift.io/reason": "machineconfig.machineconfiguration.openshift.io \"rendered-master-221a60144d12d5b62cdca9ead787bc3f\" not found",
  "machineconfiguration.openshift.io/state": "Degraded",
  "volumes.kubernetes.io/controller-managed-attach-detach": "true"
}

The reason may be that the installer generates no_proxy records that differ from the ones generated by the network cluster operator.

Installer:
~~~
[stack@osp16amd ocp-test1]$ cat bootstrap.ign | jq -r '.storage.files[] | select(.path=="/opt/openshift/manifests/cluster-proxy-01-config.yaml") | .contents.source | split(",")[1]' | base64 -d
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: null
  name: cluster
spec:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io
  trustedCA:
    name: user-ca-bundle
status:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.150.0/24,x.x.x.x,api-int.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-0.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-1.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-2.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,localhost
~~~

Network Operator:
~~~
[stack@osp16amd ocp-test1]$ oc get proxy cluster -o json | jq -r '.spec,.status'
{
  "httpProxy": "http://192.168.100.73:3128",
  "httpsProxy": "http://192.168.100.73:3128",
  "noProxy": "x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io",
  "trustedCA": {
    "name": "user-ca-bundle"
  }
}
{
  "httpProxy": "http://192.168.100.73:3128",
  "httpsProxy": "http://192.168.100.73:3128",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.150.0/24,x.x.x.x,api-int.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-0.,etcd-1.,etcd-2.,localhost"
}
~~~
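
To see the mismatch side by side, the two status.noProxy lists can be diffed. These commands are illustrative (not from the original report); rendered-noproxy.txt is assumed to hold the status.noProxy string from the bootstrap manifest above:

~~~
$ oc get proxy cluster -o jsonpath='{.status.noProxy}' | tr ',' '\n' | sort > live.txt
$ tr ',' '\n' < rendered-noproxy.txt | sort > rendered.txt
$ diff rendered.txt live.txt
~~~

The only differences are the three etcd entries: the bootstrap-rendered status carries the full etcd-N.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io names, while the live status carries the truncated etcd-N. entries.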

The reason seems to be this commit:

https://github.com/openshift/installer/commit/24e2573b119d10698a71fcf55b9ef439bedb109e#diff-306ba1253c4b1cd58fb44dbca55765f1aa8a2bea356a3a095070c327a9a2880a

That commit removes this line: https://github.com/openshift/installer/blob/release-4.6/pkg/asset/manifests/infrastructure.go#L79
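
For illustration only, here is a minimal Go sketch (hypothetical, not the operator's actual code) of the suspected mechanism: if the host names are built as "etcd-<n>." plus the etcd discovery domain taken from the infrastructure status, then an empty domain (no longer populated after the commit above) yields exactly the truncated entries seen in the live status:

~~~
package main

import (
	"fmt"
	"strings"
)

// buildEtcdNoProxy is a hypothetical helper: it joins per-member etcd host
// names derived from the cluster's etcd discovery domain. If that domain is
// empty, the entries degenerate to "etcd-0.", "etcd-1.", "etcd-2.".
func buildEtcdNoProxy(discoveryDomain string, members int) string {
	hosts := make([]string, 0, members)
	for i := 0; i < members; i++ {
		hosts = append(hosts, fmt.Sprintf("etcd-%d.%s", i, discoveryDomain))
	}
	return strings.Join(hosts, ",")
}

func main() {
	// Domain populated (matches the bootstrap-rendered status above):
	fmt.Println(buildEtcdNoProxy("ocp.ocp-test1.osp16amd.x.x.x.x.nip.io", 3))
	// Domain empty (matches the live status above): etcd-0.,etcd-1.,etcd-2.
	fmt.Println(buildEtcdNoProxy("", 3))
}
~~~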

What did you expect to happen?

The deployment succeeds.

How to reproduce it (as minimally and precisely as possible)?

Configure the proxy and noProxy settings in install-config.yaml (a sketch follows below).
Deploy with IPI.
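
For reference, this is roughly the install-config.yaml proxy stanza used to reproduce; the values are reconstructed from the outputs above and the OpenStack platform section is elided:

~~~
apiVersion: v1
baseDomain: ocp-test1.osp16amd.x.x.x.x.nip.io
metadata:
  name: ocp
platform:
  openstack:
    # ... OpenStack IPI settings elided ...
proxy:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
~~~

The proxy section maps directly to the Proxy spec shown earlier, and the trust bundle becomes the user-ca-bundle config map it references.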

Comment 2 egarcia 2021-01-25 16:18:42 UTC
It looks like there was a known issue with the proxy in 4.7: https://bugzilla.redhat.com/show_bug.cgi?id=1901034

I believe that it was just recently resolved.

Comment 3 Eric Rich 2021-01-27 13:43:50 UTC
@egarcia should this be CLOSED DUPLICATE?

Comment 4 egarcia 2021-01-27 14:26:29 UTC
Yes. Closing.

*** This bug has been marked as a duplicate of bug 1901034 ***

