Description of problem:
-----------------------
After installing a UPI bare-metal OCP cluster v4.5.6 with a proxy, it is not possible to update to 4.6.36 once the proxy has been removed from the cluster. The Machine Config Operator (MCO) is unable to download images during the update (timeout), and the update gets stuck while one master and one worker node are in the SchedulingDisabled state. When the images are downloaded manually, the nodes are restarted, and the nodes are manually marked SchedulingEnabled, the process continues; however, the proxy server must be running even though it was removed from the cluster configuration. In other words, even after the proxy is removed from the cluster, it is still used.

This was also checked when updating from 4.6.36 -> 4.7.30. There the error message clearly states that images are being downloaded through a proxy, while in the previous version this is not mentioned and only a timeout is reported:

```
[4.6.36 -> 4.7.30]
$ oc get nodes -o yaml master1.kami.nutius.com
apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-master-f1e4b76e49568e1d794c285beb0ccf6e
    machineconfiguration.openshift.io/desiredConfig: rendered-master-343cd91c4f080e08dd1186555b1e54b1
    machineconfiguration.openshift.io/reason: |-
      failed to run command nice (6 tries): timed out waiting for the condition: running nice -- ionice -c 3 podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c failed: Error: error pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c": unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e85998d51fcb9695e2eb32e2ec22cd7490131f5e38bf15bf29b05265a00d321c:
      error pinging docker registry quay.io: Get "https://quay.io/v2/": proxyconnect tcp: dial tcp 192.168.0.64:3128: connect: no route to host
```

The proxy was switched off, so it should not be used at all:

```
[rludva@personal ~]$ oc get proxies.config.openshift.io cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2021-06-24T15:23:42Z"
  generation: 2
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:trustedCA:
          .: {}
          f:name: {}
      f:status: {}
    manager: cluster-bootstrap
    operation: Update
    time: "2021-06-24T15:23:42Z"
  name: cluster
  resourceVersion: "975554"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: 5985dec4-bd01-4012-8625-f8fdfd1b9edc
spec:
  trustedCA:
    name: ""
status: {}
```

Version of all relevant components:
-----------------------------------
* UPI, bare-metal 4.5.6 -> 4.6.36 -> 4.7.30

Does this issue impact your ability to continue to work with the product:
-------------------------------------------------------------------------
No, but on a future production cluster it would not be possible to process updates without contacting support or hitting this issue.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------
- Switch the old proxy server back on.
- Find which image failed to pull with `oc get node $NODE -o yaml`.
- Pull the image on the node: `ssh core@$NODE; sudo /run/bin/machine-config-daemon pivot $IMAGE`.
- Restart the node manually.
- Mark the node schedulable manually; the update then continues.
- https://access.redhat.com/solutions/5598401

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

Actual results:
---------------
The update does not continue without issues when the proxy is removed from the cluster and switched off.
It looks like QE is missing a test case for this scenario: a cluster installed with a proxy from which the proxy is later removed.

Expected results:
-----------------
The update service must work as announced: without any issue.
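The manual workaround above can be sketched as a small shell script. This is a hedged sketch, not an official procedure: the node name and image digest are placeholders you must fill in from your cluster, and the concrete reboot and uncordon commands (`systemctl reboot`, `oc adm uncordon`) are assumptions standing in for "restart the node" and "set the node as schedulable". It defaults to a dry run that only prints the commands; set DRY_RUN=0 (with the proxy server reachable again) to actually execute them.

```shell
#!/bin/sh
# Hypothetical placeholders -- replace with your actual node and the image
# digest taken from the node's machineconfiguration.openshift.io/reason annotation.
NODE="${NODE:-master1.example.com}"
IMAGE="${IMAGE:-quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<digest>}"

# DRY_RUN=1 (the default) only prints each command prefixed with "+ ".
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Inspect the node to find the failing image (see the reason annotation).
run oc get node "$NODE" -o yaml

# 2. Pull/pivot to the image directly on the node (command taken from the
#    workaround above).
run ssh "core@$NODE" sudo /run/bin/machine-config-daemon pivot "$IMAGE"

# 3. Restart the node (assumed command for the manual reboot step).
run ssh "core@$NODE" sudo systemctl reboot

# 4. Mark the node schedulable again so the update continues.
run oc adm uncordon "$NODE"
```

In dry-run mode the script simply previews the four steps, which makes it safe to review before touching a stuck cluster.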
Moving to the MCO component, because it is the machine-config daemon that is having trouble. This bug sounds a lot like bug 1981549, so it is possible one of them should be closed as a duplicate of the other.
Yes, this is most likely a duplicate. At the very least, removing a proxy is currently not honored. Closing in favour of bug 1981549 as the tracking bug.

*** This bug has been marked as a duplicate of bug 1981549 ***