Description of problem:
In EO 4.5, if the ES pods are in a CrashLoopBackOff state, the operator does not treat this as an 'unschedulable' node. It therefore tries to communicate with the Elasticsearch cluster before making changes, which is not possible while the pods are crash-looping, so the cluster ends up in a 'wedged' state.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Force the ES pods into a CrashLoopBackOff state (using a 4.5+ EO, update the CSV to use an Elasticsearch 5 image).
2. Make a change for EO to perform (update the CSV to use an Elasticsearch 6 image).
3. Observe that EO is unable to apply these changes to the ES deployments.

Actual results:
EO does not update the deployments.

Expected results:
EO updates the deployments so that the pods can start correctly.

Additional info:
This should already be fixed in EO 4.6+. The logic to treat CrashLoopBackOff as an unschedulable condition was added in 4.6 as part of a feature.
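To illustrate the expected behaviour, here is a minimal Go sketch of the check described above. It is not the operator's actual code: the types `PodStatus` and `ContainerState` and the functions `isCrashLooping` and `needsForcedUpdate` are hypothetical stand-ins, trimmed down so the example is self-contained. The only detail taken from Kubernetes itself is the waiting reason string `CrashLoopBackOff`.

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified stand-in for the waiting
// reason that Kubernetes reports on a container status.
type ContainerState struct {
	WaitingReason string // "" when the container is not waiting
}

// PodStatus is a hypothetical summary of the pod fields this check needs.
type PodStatus struct {
	Name          string
	Unschedulable bool
	Containers    []ContainerState
}

// isCrashLooping reports whether any container in the pod is waiting
// with the reason "CrashLoopBackOff".
func isCrashLooping(p PodStatus) bool {
	for _, c := range p.Containers {
		if c.WaitingReason == "CrashLoopBackOff" {
			return true
		}
	}
	return false
}

// needsForcedUpdate reports whether the deployment should be updated
// without first talking to the Elasticsearch cluster. A crash-looping pod
// is treated the same way as an unschedulable one, because the cluster
// cannot answer in either case.
func needsForcedUpdate(p PodStatus) bool {
	return p.Unschedulable || isCrashLooping(p)
}

func main() {
	pods := []PodStatus{
		{Name: "es-healthy", Containers: []ContainerState{{WaitingReason: ""}}},
		{Name: "es-crashing", Containers: []ContainerState{{WaitingReason: "CrashLoopBackOff"}}},
		{Name: "es-unschedulable", Unschedulable: true},
	}
	for _, p := range pods {
		fmt.Printf("%s: forced update = %v\n", p.Name, needsForcedUpdate(p))
	}
}
```

With a check of this kind, the crash-looping pods from the reproduction steps would take the same path as unschedulable ones, so the deployment could be updated to the working image instead of waiting on a cluster that cannot respond.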
Verified with elasticsearch-operator.4.7.0-202011030448.p0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for OpenShift Logging 5.0.0) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0652