Bug 2005694
Summary: | Removing proxy object takes up to 10 minutes for the changes to propagate to the MCO | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Alex Kalenyuk <akalenyu>
Component: | Machine Config Operator | Assignee: | John Kyros <jkyros>
Machine Config Operator sub component: | Machine Config Operator | QA Contact: | Rio Liu <rioliu>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | aos-bugs, dollierp, eduen, jcaamano, jerzhang, jkyros, mkrejci, sregidor, wking
Version: | 4.9 | |
Target Milestone: | --- | |
Target Release: | 4.11.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-08-10 10:37:25 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description Alex Kalenyuk 2021-09-19 15:36:58 UTC
I suspect that this behaviour is not platform-specific. Maybe the machine-config-operator team can sort this out more easily?

There are a few possibilities here:

1. The proxy change had not yet rendered through the MCO when you reverted it. When you modify the proxy object, the MCController needs to read the updated proxy, update the corresponding base machineconfig, render from that updated config, and finally select nodes from a pool to apply it to. The pool could still have the stale rendered config from the previous apply and not yet the revert.
2. Some other change is being rendered in, and the update is not due to your proxy change (unlikely but possible).
3. The proxy object actually has a minor diff between the original and the now-reverted version (maybe the networking operator is parsing it differently), and that is causing the diff.
4. There is a bug somewhere in the pool logic that doesn't immediately update back to the old config.

Please attach a must-gather of the cluster right after you do this, or once it has settled. At a minimum we would need to see the rendered machineconfigs that it is updating to, to see what the diff in contents is.

Attaching must-gather after reproducing the scenario + MCPs settled:
https://drive.google.com/file/d/1PMus4KMKwnYq-_NKTXWqNmWnGzML049e/view?usp=sharing

Regarding 1 - how long can this propagation potentially take? (paused is still true at this point)

(In reply to Yu Qi Zhang from comment #2)
> There's a few possibilities here:
>
> 4. There is a bug somewhere in the pool logic that doesn't immediately
> update back to the old config

I suspect this bug is causing it: https://bugzilla.redhat.com/show_bug.cgi?id=1981549

We found a variation of this BZ where proxy1 cannot be reconfigured to proxy2. The reconfiguration is not delayed; it never happens at all (16 hours waiting). When verifying this BZ we need to make sure that we can do this too (with paused pools):

1. configure proxy1
2. check that the MCDs have the right values for proxy1
3. edit the proxy resource and reconfigure proxy1 -> proxy2
4. check that the MCDs have the right values for proxy2
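One way to check whether a proxy change has actually propagated (possibility 1 above): compare the pool's desired rendered config against its current one, then diff the two rendered machineconfigs. A minimal sketch using standard oc commands; rendered-worker-OLD and rendered-worker-NEW are placeholders for the names the first command prints, and the diff line assumes a bash shell:

    # Does the pool want a different rendered config than it currently has?
    # A mismatch means the change is still propagating (or the pool is paused).
    oc get mcp worker -o jsonpath='desired: {.spec.configuration.name}{"\n"}current: {.status.configuration.name}{"\n"}'

    # Diff the two rendered MachineConfigs to see what actually changed.
    diff <(oc get mc rendered-worker-OLD -o yaml) <(oc get mc rendered-worker-NEW -o yaml)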
Verified using IPI on AWS.

    $ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-06-15-095020   True        False         76m     Cluster version is 4.11.0-0.nightly-2022-06-15-095020

Verification steps:

1. Pause the worker and master MachineConfigPools.

2. Edit the proxy resource to add a proxy:

    oc edit proxy
    ....
    spec:
      httpProxy: http://user:pass@proxy-fake:1111
      httpsProxy: http://user:pass@proxy-fake:1111
      noProxy: test.no-proxy.com
      trustedCA:
        name: ""

3. Check that the proxy info is displayed in the daemonset:

    $ oc get ds machine-config-daemon -o yaml | grep -i proxy
    - name: HTTP_PROXY
      value: http://user:pass@proxy-fake:1111
    - name: HTTPS_PROXY
      value: http://user:pass@proxy-fake:1111
    - name: NO_PROXY
      value: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.sregidor-bz1.qe.devcluster.openshift.com,localhost,test.no-proxy.com
    name: oauth-proxy
    name: proxy-tls
    - name: proxy-tls
      secretName: proxy-tls
    name: oauth-proxy

4. Check that the operator has been marked as degraded:

    $ oc get co machine-config
    NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    machine-config   4.11.0-0.nightly-2022-06-15-095020   True        False         True       79m     Failed to resync 4.11.0-0.nightly-2022-06-15-095020 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused

5. Remove the proxy config from the proxy object:

    oc edit proxy
    ....
    spec:
      trustedCA:
        name: ""

6. Check that the proxy is no longer configured in the daemonset (in less than 10 minutes):

    $ oc get ds machine-config-daemon -o yaml | grep -i proxy
    name: oauth-proxy
    name: proxy-tls
    - name: proxy-tls
      secretName: proxy-tls

7. Check that the operator is no longer marked as degraded:

    $ oc get co machine-config
    NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    machine-config   4.11.0-0.nightly-2022-06-15-095020   True        False         False      87m

We move the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
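For anyone re-running the verification above, the same steps can be driven non-interactively with oc patch instead of oc edit. A minimal sketch, assuming the default master/worker pool names and the same placeholder proxy values as step 2:

    # Step 1: pause both MachineConfigPools.
    oc patch mcp master --type=merge -p '{"spec":{"paused":true}}'
    oc patch mcp worker --type=merge -p '{"spec":{"paused":true}}'

    # Step 2: configure the proxy on the cluster proxy object.
    oc patch proxy cluster --type=merge -p '{"spec":{"httpProxy":"http://user:pass@proxy-fake:1111","httpsProxy":"http://user:pass@proxy-fake:1111","noProxy":"test.no-proxy.com"}}'

    # Steps 3/6: check the MCD daemonset env for the proxy values.
    oc get ds -n openshift-machine-config-operator machine-config-daemon -o yaml | grep -i proxy

    # Step 5: remove the proxy config again (the remove ops fail if the
    # fields are not currently set).
    oc patch proxy cluster --type=json -p '[{"op":"remove","path":"/spec/httpProxy"},{"op":"remove","path":"/spec/httpsProxy"},{"op":"remove","path":"/spec/noProxy"}]'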