Bug 1916910

Summary: Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get"
Product: OpenShift Container Platform
Reporter: Jeff Cantrill <jcantril>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Giriyamma <gkarager>
Severity: high
Priority: high
Version: 4.6
CC: akhaire, alchan, andcosta, anisal, anli, aos-bugs, apaladug, ChetRHosey, cruhm, dageoffr, dahernan, dkulkarn, dseals, jcantril, juherrer, kiyyappa, ksathe, luaparicio, lvlcek, mrdest, mrobson, naoto30, naygupta, nnosenzo, ocasalsa, periklis, prdeshpa, qitang, rkant, ronald.rademaker, sauchter, shishika, sreber, ssadhale, ssonigra, tmicheli, vhernand, vjaypurk, xingli, ykarajag
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: logging-exploration
Doc Type: Bug Fix
Doc Text:
Collapses the multiple policy cronjobs into a single job that runs multiple tasks:
- delete
- rollover
The reasoning is that there is a potential race condition between the previous jobs, which both rely upon a -write alias, that may lead to incorrect information about that alias. Additionally, Elasticsearch does not support transactions and is not ACID compliant. By converting these jobs into tasks that a single management job executes, we:
- potentially free disk space for Elasticsearch to do additional work
- give the rollover a better chance to succeed
Clone Of: 1890838
Clones: 1928772
Last Closed: 2021-02-08 13:41:45 UTC
Bug Depends On: 1890838    
Bug Blocks: 1919075, 1928772    
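The Doc Text describes replacing separate delete and rollover cronjobs with one job that runs both tasks in turn. The Python sketch below models that idea against a hypothetical in-memory cluster. `IndexStore`, `delete_task`, `rollover_task`, and `run_management_job` are illustrative names only, not the elasticsearch-operator's code, and real rollover conditions (age, size, document count) are omitted. It assumes delete runs before rollover, following the task order listed in the Doc Text.

```python
from dataclasses import dataclass, field


@dataclass
class IndexStore:
    """Hypothetical in-memory stand-in for an Elasticsearch cluster (not a real client)."""
    indices: dict[str, int] = field(default_factory=dict)      # index name -> age in days
    write_alias: dict[str, str] = field(default_factory=dict)  # alias -> current write index


def delete_task(store: IndexStore, prefix: str, max_age_days: int) -> list[str]:
    """Delete indices for `prefix` older than max_age_days, never the current write index."""
    write_index = store.write_alias[f"{prefix}-write"]
    doomed = [
        name
        for name, age in store.indices.items()
        if name.startswith(prefix + "-") and age > max_age_days and name != write_index
    ]
    for name in doomed:
        del store.indices[name]
    return doomed


def rollover_task(store: IndexStore, prefix: str) -> str:
    """Create the next numbered index and point the -write alias at it (unconditionally)."""
    alias = f"{prefix}-write"
    current = store.write_alias[alias]
    number = int(current.rsplit("-", 1)[1]) + 1
    new_index = f"{prefix}-{number:06d}"
    store.indices[new_index] = 0
    store.write_alias[alias] = new_index
    return new_index


def run_management_job(store: IndexStore, prefix: str, max_age_days: int) -> tuple[list[str], str]:
    """One job, two tasks run in sequence: rollover reads the alias only after delete finishes."""
    deleted = delete_task(store, prefix, max_age_days)
    new_write = rollover_task(store, prefix)
    return deleted, new_write


if __name__ == "__main__":
    store = IndexStore(
        indices={"audit-000158": 9, "audit-000159": 8, "audit-000160": 1},
        write_alias={"audit-write": "audit-000160"},
    )
    print(run_management_job(store, "audit", max_age_days=7))
```

In this sketch both tasks run in one process, one after the other, so the rollover step never reads the -write alias while a separately scheduled delete job is working against it. The ordering does not make the operations atomic; Elasticsearch still offers no transactions.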

Comment 2 Giriyamma 2021-01-28 07:51:40 UTC
Verified this issue using clusterlogging.4.6.0-202101271348.p0, elasticsearch-operator.4.6.0-202101271348.p0.

Comment 6 errata-xmlrpc 2021-02-08 13:41:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.16 extras security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0310

Comment 8 Ronald 2021-02-16 12:25:28 UTC
Guys, I've installed 4.6.16 and I'm still seeing the following error:

{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Current write index for audit-write: audit-000160
Checking results from _rollover call
Next write index for audit-write: audit-000160
Checking if audit-000160 exists
Checking if audit-000160 is the write index for audit-write
Done!

Thanks,
Ronald

Comment 9 David Hernández Fernández 2021-02-16 12:39:59 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1928772 for 4.6.16+

Comment 11 David Hernández Fernández 2021-03-27 09:07:14 UTC
Vedanti, here are the new bugs, as the issue is still not fixed. The fix is being verified in https://github.com/openshift/elasticsearch-operator/pull/678

Bug 1929688 (VERIFIED)   
Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get" - OCP 4.6.16
Bug 1928772 (VERIFIED)
Sometimes the elasticsearch-delete-xxx job failed: after OCP 4.6.16 patch.