Bug 2107302

Summary: Images for odf-operator and lvm-operator can't be pulled due to redhat-operator probe issue
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Shay Rozen <srozen>
Component: odf-operator
Assignee: Nitin Goyal <nigoyal>
Status: CLOSED NOTABUG
QA Contact: Martin Bukatovic <mbukatov>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.11
CC: jrivera, muagarwa, ocs-bugs, odf-bz-bot
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-07-14 18:03:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Shay Rozen 2022-07-14 17:20:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Liveness and readiness probes are failing on the redhat-operators pod, hence no image from the manifest can be pulled:

openshift-marketplace                  10m         Normal    Killing                             pod/redhat-operators-j8qw5                                            Stopping container registry-server
openshift-marketplace                  9m55s       Normal    Scheduled                           pod/redhat-operators-hnmps                                            Successfully assigned openshift-marketplace/redhat-operators-hnmps to compute-2 by control-plane-2
openshift-marketplace                  9m4s        Normal    Pulling                             pod/redhat-operators-hnmps                                            Pulling image "quay.io/rhceph-dev/ocs-registry:4.11.0-115"
openshift-marketplace                  9m54s       Normal    AddedInterface                      pod/redhat-operators-hnmps                                            Add eth0 [10.129.2.214/23] from openshift-sdn
openshift-marketplace                  9m2s        Normal    Created                             pod/redhat-operators-hnmps                                            Created container registry-server
openshift-marketplace                  9m2s        Normal    Started                             pod/redhat-operators-hnmps                                            Started container registry-server
openshift-marketplace                  9m39s       Normal    Pulled                              pod/redhat-operators-hnmps                                            Successfully pulled image "quay.io/rhceph-dev/ocs-registry:4.11.0-115" in 14.4209307s
openshift-marketplace                  8m44s       Warning   Unhealthy                           pod/redhat-operators-hnmps                                            Liveness probe failed: timeout: failed to connect service ":50051" within 1s
openshift-marketplace                  8m44s       Warning   Unhealthy                           pod/redhat-operators-hnmps                                            Readiness probe failed: timeout: failed to connect service ":50051" within 1s
openshift-marketplace                  9m4s        Warning   Unhealthy                           pod/redhat-operators-hnmps                                            Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of fba769d81c3091270ed847a5ad65ef538ded93240689bade0ca821ffb3315238 is running failed: open /proc/3634056/stat: no such file or directory: container process not found
openshift-marketplace                  9m4s        Normal    Killing                             pod/redhat-operators-hnmps                                            Container registry-server failed liveness probe, will be restarted

Maybe this is the reason:
openshift-marketplace                  11m         Warning   Unhealthy                           pod/redhat-operators-hnmps                                            Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of fba769d81c3091270ed847a5ad65ef538ded93240689bade0ca821ffb3315238 is running failed: open /proc/3634056/stat: no such file or directory: container process not found
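
The events above are pasted out of chronological order (for example, 9m4s appears before 9m54s), which makes the probe failure sequence hard to follow. A minimal Python sketch for re-ordering such lines, assuming the default `oc get events -A` column layout (NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the helper names `age_to_seconds` and `sort_events` are hypothetical and not part of any tool. Note that the age column is the last-seen time, so repeated events only show their most recent occurrence.

```python
import re

# Matches oc/kubectl style ages such as "11m", "9m55s", "1h2m" or "45s".
_AGE_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def age_to_seconds(age: str) -> int:
    """Convert an event age like '9m55s' to a number of seconds."""
    match = _AGE_RE.match(age)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized age: {age!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def sort_events(lines):
    """Return (age_seconds, reason, message) tuples, oldest event first."""
    events = []
    for line in lines:
        # Assumed columns: NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # skip blank or truncated lines
        _, age, _, reason, _, message = parts
        events.append((age_to_seconds(age), reason, message))
    # A larger age means the event happened earlier, so sort descending.
    return sorted(events, key=lambda event: event[0], reverse=True)


if __name__ == "__main__":
    sample = [
        "openshift-marketplace  9m2s   Normal   Started    pod/redhat-operators-hnmps  Started container registry-server",
        'openshift-marketplace  8m44s  Warning  Unhealthy  pod/redhat-operators-hnmps  Liveness probe failed: timeout: failed to connect service ":50051" within 1s',
        "openshift-marketplace  9m55s  Normal   Scheduled  pod/redhat-operators-hnmps  Successfully assigned openshift-marketplace/redhat-operators-hnmps to compute-2 by control-plane-2",
    ]
    for age, reason, message in sort_events(sample):
        print(f"{age:>5}s ago  {reason:<10} {message}")
```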





Version of all relevant components (if applicable):
odf-4.11.0-115


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Can't install ODF + LVM

Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Yes, on 4.11.0-113 both clusters were pulling images.

Steps to Reproduce:
1. Install catalogsource 4.11.0-115
2. Try to deploy LVM or ODF-OPERATOR
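
Step 1 can be illustrated with a sketch that generates a CatalogSource manifest pointing at the registry image seen in the events. This is an assumption-laden example, not the reporter's actual manifest: the name `redhat-operators` and namespace `openshift-marketplace` are inferred from the pod names in the events, the `displayName` value is made up, and the helper `catalog_source` is hypothetical. The output is JSON, which `oc apply -f` accepts.

```python
import json


def catalog_source(image, name="redhat-operators", namespace="openshift-marketplace"):
    """Build a grpc CatalogSource manifest for the given index image.

    The default name and namespace are assumptions inferred from the pod
    names in this report; displayName is purely illustrative.
    """
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourceType": "grpc",
            "image": image,
            "displayName": "ODF dev catalog (illustrative)",
        },
    }


if __name__ == "__main__":
    manifest = catalog_source("quay.io/rhceph-dev/ocs-registry:4.11.0-115")
    print(json.dumps(manifest, indent=2))
```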



Actual results:
Images are not pulled; the pulls fail with:
openshift-marketplace                  6m25s       Warning   Failed                              pod/5953a0ca15d876bcdec70414f9fa1504c4175b964da1afbc7baaae21f3n4qgn   Failed to pull image "quay.io/rhceph-dev/odf4-odf-operator-bundle@sha256:4b7a966329d13af8b1ed5813972c9167c36febc4c05caff1597d9679481f3618": rpc error: code = Unknown desc = reading manifest sha256:4b7a966329d13af8b1ed5813972c9167c36febc4c05caff1597d9679481f3618 in quay.io/rhceph-dev/odf4-odf-operator-bundle: manifest unknown: manifest unknown
openshift-marketplace                  6m41s       Warning   Failed                              pod/8c2d59d1001957885e96e61744a4544a24d7c7debf28491e586b83af74x9k2b   Failed to pull image "quay.io/rhceph-dev/odf4-mcg-operator-bundle@sha256:56c13849ffeb234f12ea650c3c20505a935808a03cbb8c0050e981ddf26d4554": rpc error: code = Unknown desc = reading manifest sha256:56c13849ffeb234f12ea650c3c20505a935808a03cbb8c0050e981ddf26d4554 in quay.io/rhceph-dev/odf4-mcg-operator-bundle: manifest unknown: manifest unknown
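
Both failures are digest-pinned pulls that end in "manifest unknown", meaning the registry has no manifest for that digest. A small Python sketch for collecting the affected repository and digest pairs from event lines like the two above, so they can be compared against what the build actually mirrored; the helper name `failed_digests` is hypothetical and the regular expression only covers the message format shown here.

```python
import re

# Captures repository and digest from a 'Failed to pull image "repo@sha256:..."' message.
_PULL_RE = re.compile(
    r'Failed to pull image "(?P<repo>[^"@]+)@(?P<digest>sha256:[0-9a-f]+)"'
)


def failed_digests(lines):
    """Return sorted unique (repository, digest) pairs whose pull failed with 'manifest unknown'."""
    found = set()
    for line in lines:
        match = _PULL_RE.search(line)
        if match and "manifest unknown" in line:
            found.add((match.group("repo"), match.group("digest")))
    return sorted(found)


if __name__ == "__main__":
    event = (
        'openshift-marketplace 6m25s Warning Failed pod/example '
        'Failed to pull image "quay.io/rhceph-dev/odf4-odf-operator-bundle@'
        'sha256:4b7a966329d13af8b1ed5813972c9167c36febc4c05caff1597d9679481f3618": '
        "rpc error: code = Unknown desc = reading manifest: manifest unknown: manifest unknown"
    )
    for repo, digest in failed_digests([event]):
        print(repo, digest)
```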


Expected results:
Images should be pulled successfully.

Additional info:

Comment 5 Shay Rozen 2022-07-14 18:03:09 UTC
The cause is a compose bundle that was shared between two builds, which re-ran the mirroring pipeline. The next build should be OK.