Description of problem:

The rollover job for the infra logs fails with a null pointer exception and the executing pod goes into an error state. The issue occurs when upgrading from 4.5.11 to 4.5.13.

Version-Release number of selected component (if applicable):

4.5.13

How reproducible:

Upgrade from OpenShift Container Platform (OCP) 4.5.11 to 4.5.13.

Actual results:

Log output from the elasticsearch-rollover-infra-xxx pod:
~~~
"error" : {
  "root_cause" : [
    {
      "type" : "null_pointer_exception",
      "reason" : null
    }
  ],
  "type" : "null_pointer_exception",
  "reason" : null
},
"status" : 500
~~~

Expected results:

* No null pointer exception

Additional info:
Can you confirm if this is repeatable? Does it go away on a subsequent run of the deletion job?
(In reply to Jeff Cantrill from comment #4)
> Can you confirm if this is repeatable? Does it go away on a subsequent run
> of the deletion job?

I still see errors in the OCP summary; see the attached screenshot. I checked the logs: they either do not exist, are empty, or show exactly the same content (the NPE).
Created attachment 1722784 [details] Screenshot with failed rollover/delete jobs.
Created attachment 1722785 [details] rollover-audit log with the NPE
Hi Andreas, could you paste the CRL CR's YAML file? I'd like to try reproducing this in a 4.5.13 cluster. Thanks.
Closing, since the linked customer cases are closed and we mitigate this with retries in the cron jobs and with logic that verifies the rollover indices aren't in a bad state when this can happen. Fixing the actual NPE that this stems from would require bumping Elasticsearch to 6.8.6, which would also require bumping its plugins, Kibana, and Kibana's plugins to the same version; it is unclear when we will do this.
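For anyone who wants to inspect their own cluster: one known "bad state" for the 6.x rollover API is an alias that spans multiple indices with no `is_write_index` set, which can surface as an NPE on rollover. This is a hypothetical illustration (not necessarily the exact check the cron job performs) of detecting that condition from the JSON returned by `GET /_alias`:

```python
def find_bad_rollover_aliases(alias_metadata):
    """Given a dict shaped like the Elasticsearch GET /_alias response
    ({index: {"aliases": {alias: {..., "is_write_index": bool}}}}),
    return aliases that point to more than one index but have no
    designated write index -- a state in which a 6.x rollover call
    can fail instead of succeeding cleanly."""
    alias_to_members = {}
    for index, meta in alias_metadata.items():
        for alias, props in meta.get("aliases", {}).items():
            alias_to_members.setdefault(alias, []).append(
                (index, props.get("is_write_index", False))
            )
    # An alias is suspect if it has several member indices and none of
    # them is flagged as the write index.
    return [
        alias
        for alias, members in alias_to_members.items()
        if len(members) > 1 and not any(is_write for _, is_write in members)
    ]
```

Feeding this the parsed `_alias` response for the infra indices should flag any rollover alias that is in the ambiguous multi-index/no-write-index state, which can then be repaired by re-pointing the alias before retrying the job.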