Description of problem:
-----------------------
After installing a UPI bare-metal OCP cluster v4.5.6 with a proxy, it is not possible to update to 4.6.36 once the proxy has been removed from the cluster. The Machine Config Operator (MCO) is unable to download images during the update (timeout), and the update gets stuck while one master and one worker node are in the SchedulingDisabled state. When the images are downloaded manually, the nodes are restarted, and the nodes are manually marked SchedulingEnabled, the process continues; however, the proxy server must be running even though it was removed from the cluster configuration. In other words, even after the proxy is removed from the cluster, it is still used.

This was also checked when updating from 4.6.36 -> 4.7.30. There the error message clearly states that images are being downloaded through a proxy, while in the previous version this is not mentioned and only a timeout is reported:

```
[4.6.36 -> 4.7.30]
$ oc get nodes -o yaml master1.kami.nutius.com
apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-master-f1e4b76e49568e1d794c285beb0ccf6e
    machineconfiguration.openshift.io/desiredConfig: rendered-master-343cd91c4f080e08dd1186555b1e54b1
    machineconfiguration.openshift.io/reason: |-
      failed to run command nice (6 tries): timed out waiting for the condition: running nice -- ionice -c 3 podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c failed: Error: error pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c": unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c:
      error pinging docker registry quay.io: Get "https://quay.io/v2/": proxyconnect tcp: dial tcp 192.168.0.64:3128: connect: no route to host
```

The proxy was switched off, so it should not be used at all:

```
[rludva@personal ~]$ oc get proxies.config.openshift.io cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2021-06-24T15:23:42Z"
  generation: 2
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:trustedCA:
          .: {}
          f:name: {}
      f:status: {}
    manager: cluster-bootstrap
    operation: Update
    time: "2021-06-24T15:23:42Z"
  name: cluster
  resourceVersion: "975554"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: 5985dec4-bd01-4012-8625-f8fdfd1b9edc
spec:
  trustedCA:
    name: ""
status: {}
```

Version of all relevant components:
-----------------------------------
* UPI, bare-metal 4.5.6 -> 4.6.36 -> 4.7.30

Does this issue impact your ability to continue to work with the product:
-------------------------------------------------------------------------
No, but on a future production cluster it would not be possible to process updates without contacting support or hitting this issue.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------
- Switch the old proxy server back on.
- Find which image failed to pull with `oc get node $NODE -o yaml`.
- Pull the image on the node: `ssh core@$NODE; sudo /run/bin/machine-config-daemon pivot $IMAGE`.
- Restart the node manually.
- Mark the node schedulable manually; the update then continues.
- https://access.redhat.com/solutions/5598401

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

Actual results:
---------------
The update does not continue without issues when the proxy is removed from the cluster and switched off.
It looks like QE is missing a test case for this scenario: a cluster installed with a proxy from which the proxy is later removed.

Expected results:
-----------------
The update service must work as announced: without any issue.
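The manual workaround above can be sketched as a small shell script. This is a hedged sketch, not an official procedure: the node name and image digest are placeholders you must fill in from your cluster, and the concrete reboot and uncordon commands (`systemctl reboot`, `oc adm uncordon`) are assumptions standing in for "restart the node" and "set the node as schedulable". It defaults to a dry run that only prints the commands; set DRY_RUN=0 (with the proxy server reachable again) to actually execute them.

```shell
#!/bin/sh
# Hypothetical placeholders -- replace with your actual node and the image
# digest taken from the node's machineconfiguration.openshift.io/reason annotation.
NODE="${NODE:-master1.example.com}"
IMAGE="${IMAGE:-quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<digest>}"

# DRY_RUN=1 (the default) only prints each command prefixed with "+ ".
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Inspect the node to find the failing image (see the reason annotation).
run oc get node "$NODE" -o yaml

# 2. Pull/pivot to the image directly on the node (command taken from the
#    workaround above).
run ssh "core@$NODE" sudo /run/bin/machine-config-daemon pivot "$IMAGE"

# 3. Restart the node (assumed command for the manual reboot step).
run ssh "core@$NODE" sudo systemctl reboot

# 4. Mark the node schedulable again so the update continues.
run oc adm uncordon "$NODE"
```

In dry-run mode the script simply previews the four steps, which makes it safe to review before touching a stuck cluster.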
Moving to the MCO component, because it is the machine-config daemon that is having trouble. This bug sounds a lot like bug 1981549, so it is possible one of them should be closed as a duplicate of the other.
Yes, this is most likely a duplicate. At the very least, removing a proxy is currently not honored. Closing in favour of bug 1981549 as the tracking bug.

*** This bug has been marked as a duplicate of bug 1981549 ***