+++ This bug was initially created as a clone of Bug #1881709 +++

Description of problem:
elasticsearch-{delete|rollover}-* pods (from cronjobs) hang on

  curl -s https://elasticsearch:9200/audit/_settings/index.creation_date

Version-Release number of selected component (if applicable):
registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:3ec62b62cfe3a47f9798e05ecce2bae104e4d1a9d4ca57fe16471ada0e32227a

How reproducible:
Unknown. Happening intermittently on OSD clusters.

Steps to Reproduce:
?

Actual results:
Some pods stay Running for a very long time; `oc get pod -l component=indexManagement | grep Running` shows very long durations. Logs are empty.

Expected results:
These jobs should take well under an hour to complete.

Additional info:
Debugging an elasticsearch-delete-audit pod, the `delete` script is hanging on the following `curl`:

  curl -s https://elasticsearch:9200/audit/_settings/index.creation_date --cacert /etc/indexmanagement/keys/admin-ca '-HAuthorization: Bearer {redacted}' -HContent-Type:application/json

--- Additional comment from efried on 2020-09-22 22:45:41 UTC ---

Looked at an elasticsearch-rollover-app pod and it is hanging here:

  curl -s 'https://elasticsearch:9200/app-write/_rollover?pretty' -w '%{response_code}' --cacert /etc/indexmanagement/keys/admin-ca -HContent-Type:application/json -XPOST '-HAuthorization: Bearer {redacted}' -o /tmp/response.txt -d '{"conditions":{"max_age":"8h","max_docs":122880000,"max_size":"120gb"}}'

--- Additional comment from kramraja on 2020-09-23 02:06:58 UTC ---

Hi, I am one of the SREPs working with Eric during APAC hours. Here is what I found:

It is worth noting that the cluster with this issue does not have the recent changes from https://github.com/openshift/elasticsearch-operator/pull/477. I therefore grabbed the latest delete script, which has the try/catch blocks, and ran it to see if we would get anything useful, but it hangs silently and indefinitely.

Running the delete script a few times, I found that it also hangs here (https://github.com/openshift/elasticsearch-operator/blob/8d1d59fcbbf8031f3d5dbbaa8a9eb17a0c1184f8/pkg/indexmanagement/scripts.go#L25):

  curl -s 'https://elasticsearch:9200/audit-*/_alias/audit-write' --cacert {redacted} '-HAuthorization: Bearer {redacted}' -HContent-Type:application/json

It looks like all of the curls in the indexmanagement cronjobs hang indefinitely because they are run without a timeout.
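For illustration only, this is not necessarily the fix that shipped in the operator: since the root cause above is curl calls with no timeout, a wrapper like the following sketch shows how curl's --connect-timeout and --max-time options bound the wait so a wedged connection fails instead of hanging the cronjob pod forever. The function name, timeout values, and BEARER_TOKEN variable are assumptions for the example.

  # Hypothetical sketch (names and values are assumptions): wrap curl so every
  # Elasticsearch call made by the index management scripts has a bounded wait.
  es_curl() {
    # --connect-timeout: fail fast if the TCP/TLS handshake stalls.
    # --max-time: hard ceiling on the whole request instead of waiting forever.
    curl -s \
      --connect-timeout 10 \
      --max-time 300 \
      --cacert /etc/indexmanagement/keys/admin-ca \
      -H "Authorization: Bearer ${BEARER_TOKEN}" \
      -H "Content-Type: application/json" \
      "$@"
  }

  # Same request that hangs above, now bounded to at most ~5 minutes:
  es_curl "https://elasticsearch:9200/audit/_settings/index.creation_date"

With a bound like this the cronjob either gets a response or exits non-zero, so the job terminates and the failure is visible instead of the pod sitting in Running with empty logs.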
Setting UpcomingSprint as unable to resolve before EOD
Verified with elasticsearch-operator.4.6.0-202101280459.p0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.6.16 extras security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0310