Bug 2055019

Summary: imagePullPolicy is "Always" for performance-addon-rhel8-operator image
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Performance Addon Operator
Assignee: Martin Sivák <msivak>
Status: CLOSED ERRATA
QA Contact: Niranjan Mallapadi Raghavender <mniranja>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.9
CC: aos-bugs, dgonyier, grajaiya, kquinn, shajmakh, sobarzan, yquinn
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
For single node clusters or any other edge nodes that lose the connection to the image registry or in an environment where the connection is bandwidth limited, the Performance Addon Operator cannot restart properly. This issue is resolved by ensuring the image is not pulled again from registry.redhat.io if the image is available on the node already. The fix ensures the Performance Addon Operator restarts correctly using the image from the local image cache.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-21 10:20:03 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2021202
Bug Blocks: 2052622

Comment 8 Shereen Haj Makhoul 2022-02-16 17:18:57 UTC
Verification:
 
PAO version: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48, corresponding to v4.9.6-2
 
Steps:
 
-Install PAO using PPC and inspect the pod's ReplicaSet:
 
...
        image: registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48
        imagePullPolicy: IfNotPresent  [1]
        name: performance-operator
 ...

Events:
  Type    Reason          Age    From               Message
  ----    ------          ----   ----               -------
  Normal  Scheduled       7m26s  default-scheduler  Successfully assigned openshift-performance-addon-operator/performance-operator-5878b64cf7-g5c6x to ocp47-master-0.demo.lab.shajmakh
  Normal  AddedInterface  7m23s  multus             Add eth0 [10.133.0.59/23]
  Normal  Pulling         7m23s  kubelet            Pulling image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48"
  Normal  Pulled          6m38s  kubelet            Successfully pulled image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48" in 45.028753771s
  Normal  Created         6m38s  kubelet            Created container performance-operator
  Normal  Started         6m38s  kubelet            Started container performance-operator


-Reboot the node on which the pod is currently running (ocp47-master-0.demo.lab.shajmakh)

-The reboot caused the pod to be restarted on a different node. Once the initial node is up again, restart the operator pod on it by making the other nodes unavailable (NotReady,SchedulingDisabled):
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       14s   default-scheduler  Successfully assigned openshift-performance-addon-operator/performance-operator-5878b64cf7-7zvrq to ocp47-master-0.demo.lab.shajmakh
  Normal  AddedInterface  13s   multus             Add eth0 [10.133.0.60/23]
  Normal  Pulled          12s   kubelet            Container image "registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:998417de4cd7ab251cd3fc5f92223d3c30bc1688884470c9b211a1c6c3038c48" already present on machine [3]
  Normal  Created         12s   kubelet            Created container performance-operator
  Normal  Started         12s   kubelet            Started container performance-operator

[1] The pull policy is now IfNotPresent.
[2] The node is rebooting.
[3] After the node is up again and the operator pod is scheduled on it, the kubelet uses the image that already exists on that node and does not pull it again.
[4] The first time, the image was pulled to the node because it was not yet present there (see the Pulling and Pulled events in the first Events listing).
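
The difference between the two Events listings above can also be checked mechanically. The following is a minimal Python sketch, assuming the events text has been saved from the pod description. The function name image_source and the shortened sample image name are hypothetical, and only the matched message fragments are taken from the kubelet events shown above.

```python
def image_source(events_text: str) -> str:
    """Classify how the kubelet obtained the image, based on event messages.

    Returns "cached" if the image was already on the node, "pulled" if it
    was fetched from the registry, and "unknown" if neither message is found.
    """
    for line in events_text.splitlines():
        # Message emitted when no pull was needed (second listing above).
        if "already present on machine" in line:
            return "cached"
        # Messages emitted when the image was fetched (first listing above).
        if "Pulling image" in line or "Successfully pulled image" in line:
            return "pulled"
    return "unknown"


# Shortened, hypothetical sample lines in the same shape as the events above.
first_start = 'Normal  Pulling  7m23s  kubelet  Pulling image "example/operator@sha256:abc"'
after_reboot = 'Normal  Pulled  12s  kubelet  Container image "example/operator@sha256:abc" already present on machine'

print(image_source(first_start))
print(image_source(after_reboot))
```

For this verification, the expected pattern is "pulled" for the first pod start and "cached" for the pod started after the reboot.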

Verified successfully.
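
To illustrate why the policy change matters on a node that cannot reach the registry, here is a simplified Python model of the pull decision. It is a sketch of the documented imagePullPolicy semantics, not kubelet code, and the names PullPolicy, needs_registry and can_start are hypothetical.

```python
from enum import Enum


class PullPolicy(Enum):
    ALWAYS = "Always"
    IF_NOT_PRESENT = "IfNotPresent"
    NEVER = "Never"


def needs_registry(policy: PullPolicy, image_on_node: bool) -> bool:
    """Return True if starting the container requires contacting the registry."""
    if policy is PullPolicy.ALWAYS:
        # The registry is queried on every container start.
        return True
    if policy is PullPolicy.IF_NOT_PRESENT:
        # The registry is contacted only when the image is missing locally.
        return not image_on_node
    # Never: the registry is not contacted at all.
    return False


def can_start(policy: PullPolicy, image_on_node: bool, registry_reachable: bool) -> bool:
    """Simplified check of whether the container can start on this node."""
    if needs_registry(policy, image_on_node):
        return registry_reachable
    # Without a registry round trip, the image must already be on the node.
    return image_on_node


# Disconnected node with the image already cached (the scenario in this bug).
print(can_start(PullPolicy.ALWAYS, image_on_node=True, registry_reachable=False))
print(can_start(PullPolicy.IF_NOT_PRESENT, image_on_node=True, registry_reachable=False))
```

In this model, only the IfNotPresent case can start on the disconnected node, which matches the behavior described in the Doc Text.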

Comment 10 errata-xmlrpc 2022-02-21 10:20:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9 low-latency extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0572

Comment 11 Shai Oren 2022-08-18 12:19:33 UTC
Link to the PR that covers it - https://github.com/openshift-kni/performance-addon-operators/pull/936