On single-node clusters, and on other edge nodes that lose their connection to the image registry or run in a bandwidth-limited environment, the Performance Addon Operator cannot restart properly.
This issue is resolved by ensuring that the image is not pulled again from registry.redhat.io if it is already available on the node.
With the fix, the Performance Addon Operator restarts correctly using the image from the local image cache.
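The effect of the pull policy on a restart can be sketched as follows. This is only an illustrative model of the standard Kubernetes `imagePullPolicy` values (`Always`, `IfNotPresent`, `Never`), not the operator's or the kubelet's code. The function names `needs_registry` and `can_start` are hypothetical, and the comparison with `Always` is an assumption, since this report does not state which policy was in effect before the fix.

```python
from enum import Enum


class PullPolicy(Enum):
    """The three standard Kubernetes imagePullPolicy values."""
    ALWAYS = "Always"
    IF_NOT_PRESENT = "IfNotPresent"
    NEVER = "Never"


def needs_registry(policy: PullPolicy, image_cached: bool) -> bool:
    """Return True if starting the container requires contacting the registry.

    Hypothetical helper for illustration only.
    """
    if policy is PullPolicy.ALWAYS:
        # The registry is contacted on every container start.
        return True
    if policy is PullPolicy.IF_NOT_PRESENT:
        # The registry is contacted only when the node has no local copy.
        return not image_cached
    # Never: the registry is not contacted at all.
    return False


def can_start(policy: PullPolicy, image_cached: bool, registry_reachable: bool) -> bool:
    """Return True if the container can start under the given conditions."""
    if needs_registry(policy, image_cached):
        return registry_reachable
    return image_cached


if __name__ == "__main__":
    # Disconnected edge node that already holds the image in its local cache.
    for policy in PullPolicy:
        ok = can_start(policy, image_cached=True, registry_reachable=False)
        print(f"{policy.value}: can start = {ok}")
```

In this model, a disconnected node with a cached image can restart the container under `IfNotPresent`, which is the scenario the fix targets.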
Comment 8 Shereen Haj Makhoul
2022-02-16 17:18:57 UTC
Verification:
pao version: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48 corresponding to v4.9.6-2
Steps:
-Install PAO using PPC and inspect the pod's ReplicaSet:
...
image: registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48
imagePullPolicy: IfNotPresent [1]
name: performance-operator
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m26s default-scheduler Successfully assigned openshift-performance-addon-operator/performance-operator-5878b64cf7-g5c6x to ocp47-master-0.demo.lab.shajmakh
Normal AddedInterface 7m23s multus Add eth0 [10.133.0.59/23]
Normal Pulling 7m23s kubelet Pulling image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48"
Normal Pulled 6m38s kubelet Successfully pulled image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48" in 45.028753771s
Normal Created 6m38s kubelet Created container performance-operator
Normal Started 6m38s kubelet Started container performance-operator
-Reboot the node on which the pod is currently running (ocp47-master-0.demo.lab.shajmakh) [2]
-The reboot triggered a pod restart on a different node. Once the initial node was up again, the operator pod was restarted on it by making the other nodes unavailable (NotReady,SchedulingDisabled):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14s default-scheduler Successfully assigned openshift-performance-addon-operator/performance-operator-5878b64cf7-7zvrq to ocp47-master-0.demo.lab.shajmakh
Normal AddedInterface 13s multus Add eth0 [10.133.0.60/23]
Normal Pulled 12s kubelet Container image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48" already present on machine [3]
Normal Created 12s kubelet Created container performance-operator
Normal Started 12s kubelet Started container performance-operator
[1] The pull policy is now IfNotPresent.
[2] The node is rebooting.
[3] After the machine is up again and the operator pod is scheduled on the same node, the pod uses the image that already exists on that node and does no additional pulling.
[4] The first time, the image was pulled to the node because it did not exist there yet.
Verified successfully.
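The check performed above, reading the kubelet events to see whether the image came from the registry or from the node's cache, can be sketched as a small script. It relies only on the two message forms that appear in the event output in this comment ("Successfully pulled image ..." and "... already present on machine"). The function name `image_source` is hypothetical, and other kubelet versions may word these messages differently.

```python
from typing import Optional


def image_source(event_message: str) -> Optional[str]:
    """Classify a kubelet event message by where the image came from.

    Returns "cache" if the image was already on the node, "registry" if it
    was pulled, and None for unrelated messages. Hypothetical helper, based
    only on the message wording shown in this comment.
    """
    if "already present on machine" in event_message:
        return "cache"
    if event_message.startswith("Successfully pulled image"):
        return "registry"
    return None


if __name__ == "__main__":
    image = (
        "registry.redhat.io/openshift4/performance-addon-rhel8-operator"
        "@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48"
    )
    messages = [
        f'Successfully pulled image "{image}" in 45.028753771s',
        f'Container image "{image}" already present on machine',
        "Created container performance-operator",
    ]
    for message in messages:
        print(image_source(message))
```

The first message corresponds to the initial install and the second to the restart after the reboot.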
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.9 low-latency extras update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:0572