Bug 2089971

Summary: Machine-config daemon does not recover from broken Proxy configuration
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Machine Config Operator Assignee: Yu Qi Zhang <jerzhang>
Machine Config Operator sub component: Machine Config Operator QA Contact: Sergio <sregidor>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: alitke, aos-bugs, dollierp, jerzhang, mbargenq, mkrejci, obulatov, palonsor, rludva, sregidor, wking
Version: 4.7   
Target Milestone: ---   
Target Release: 4.8.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-27 05:44:55 UTC Type: ---
Bug Depends On: 2071689    
Bug Blocks:    

Comment 4 Sergio 2022-10-20 17:10:31 UTC
Verified on an IPI AWS deployment with version:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2022-10-18-104608   True        False         96m     Cluster version is 4.8.0-0.nightly-2022-10-18-104608

1. Pause the MCPs, then add a fake proxy to the cluster

oc edit proxy cluster
...
  spec:
    httpProxy: http://user:pass@proxy-fake:1111
    httpsProxy: http://user:pass@proxy-fake:1111
    noProxy: test.no-proxy.com
    trustedCA:
      name: ""
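
For repeat runs, pausing the pools and adding the proxy can be scripted instead of done through an interactive editor. The following is only a sketch, not what was run for this verification: the helper names (run, pause_pools, set_fake_proxy) and the DRY_RUN switch are hypothetical, and it assumes the cluster has only the default master and worker pools. The proxy values are the ones from the spec above.

```shell
# Sketch: pause the pools and set the fake proxy non-interactively.
# DRY_RUN=1 (the default here) only prints the oc commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# pause_pools: pause the default pools (assumption: no custom pools exist).
pause_pools() {
  for pool in master worker; do
    run oc patch machineconfigpool "$pool" --type merge -p '{"spec":{"paused":true}}'
  done
}

# set_fake_proxy: apply the same proxy values as in the spec above.
set_fake_proxy() {
  proxy_url="http://user:pass@proxy-fake:1111"
  run oc patch proxy/cluster --type merge \
    -p "{\"spec\":{\"httpProxy\":\"$proxy_url\",\"httpsProxy\":\"$proxy_url\",\"noProxy\":\"test.no-proxy.com\"}}"
}
```

Calling pause_pools and then set_fake_proxy with DRY_RUN=0 would apply the changes; with the default they are only printed for review.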

2. Verify that the proxy has been added to the MCD pods' environment variables (in the openshift-machine-config-operator namespace)

$ oc get pods -o yaml machine-config-daemon-7zv7m | grep env -A 11
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: HTTP_PROXY
      value: http://user:pass@proxy-fake:1111
    - name: HTTPS_PROXY
      value: http://user:pass@proxy-fake:1111
    - name: NO_PROXY
      value: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.sregidor-onqa1.qe.devcluster.openshift.com,localhost,test.no-proxy.com
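
The grep with a fixed context window depends on how many lines the env block happens to have. A small filter that prints only the proxy variable names may be easier to compare between runs. This is a sketch; list_proxy_env is a hypothetical helper name, and it assumes the pod YAML lists env entries as "- name: VAR" lines, as in the output above.

```shell
# list_proxy_env: read pod YAML on stdin and print the names of the
# proxy-related environment variables it defines, one per line, sorted.
list_proxy_env() {
  sed -En 's/^[[:space:]]*- name: (HTTP_PROXY|HTTPS_PROXY|NO_PROXY)$/\1/p' | sort -u
}
```

For example, "oc get pods -o yaml machine-config-daemon-7zv7m | list_proxy_env" should print the three variable names while the proxy is configured and nothing once it has been removed.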


3. Remove the proxy from the cluster

oc edit proxy cluster
...
spec:
  trustedCA:
    name: ""

4. Verify that the proxy has been removed from the MCD pods' environment variables (in the openshift-machine-config-operator namespace)

$ oc get pods -o yaml machine-config-daemon-8q67k | grep env -A 11
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5149e09733e608652625c4e79838dc043f631abb9cd0fca42cb2b675cfd276be
    imagePullPolicy: IfNotPresent
    name: machine-config-daemon

It took a while (roughly 10 minutes) for the proxy env vars to be removed from the machine-config-daemon pods, but they were eventually removed.
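
Because the removal is not immediate, an automated check would need to poll rather than look once. The sketch below shows one way to do that; wait_until and proxy_env_removed are hypothetical helper names, and the k8s-app=machine-config-daemon label selector is an assumption about how the MCD pods are labeled, not something taken from this report.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL_SECONDS until it succeeds.
# Returns 1 if it still fails once TIMEOUT_SECONDS have elapsed.
wait_until() {
  _timeout="$1"; _interval="$2"; shift 2
  _elapsed=0
  while ! "$@"; do
    if [ "$_elapsed" -ge "$_timeout" ]; then
      return 1
    fi
    sleep "$_interval"
    _elapsed=$((_elapsed + _interval))
  done
  return 0
}

# proxy_env_removed: succeed only if oc works and no MCD pod defines a proxy variable.
# The label selector is an assumption; adjust it to match the cluster.
proxy_env_removed() {
  pods_yaml=$(oc get pods -n openshift-machine-config-operator \
    -l k8s-app=machine-config-daemon -o yaml) || return 1
  ! printf '%s\n' "$pods_yaml" | grep -Eq 'name: (HTTP_PROXY|HTTPS_PROXY|NO_PROXY)$'
}
```

Given the delay observed here, something like "wait_until 900 30 proxy_env_removed" (15 minutes, checking every 30 seconds) would leave some margin.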



We move the BZ to VERIFIED status.

Comment 6 errata-xmlrpc 2022-10-27 05:44:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.52 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7034