Bug 1885723 - Old kibana index causing crashloop
Summary: Old kibana index causing crashloop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Periklis Tsirakidis
QA Contact: Anping Li
URL:
Whiteboard: osd-45-logging, logging-exploration
Duplicates: 1870371
Depends On:
Blocks: 1909614
 
Reported: 2020-10-06 19:44 UTC by tfahlman
Modified: 2024-03-25 16:39 UTC
CC List: 22 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 11:21:19 UTC
Target Upstream Version:
Embargoed:


Attachments
reindex failed (30.39 KB, text/plain), 2020-10-11 14:41 UTC, Anping Li


Links
Github openshift/elasticsearch-operator pull 603 (closed): Bug 1885723: Allow kibana server to access OD tenantinfo (last updated 2021-02-14 11:13:54 UTC)
Red Hat Knowledge Base (Solution) 5332221 (last updated 2020-11-08 20:43:22 UTC)
Red Hat Knowledge Base (Solution) 5652591 (last updated 2020-12-18 08:41:00 UTC)
Red Hat Product Errata RHBA-2021:0652 (last updated 2021-02-24 11:22:11 UTC)

Description tfahlman 2020-10-06 19:44:53 UTC
Description of problem:

Using NODE_OPTIONS: '--max_old_space_size=368' (the memory setting is in MB), the Kibana pod crashloops with:
{"type":"log","@timestamp":"2020-10-06T14:23:56Z","tags":["fatal","root"],"pid":121,"message":"Error: Index .kibana belongs to a version of Kibana that cannot be automatically migrated. Reset it or use the X-Pack upgrade assistant.\n    at assertIsSupportedIndex (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:246:15)\n    at Object.fetchInfo (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:52:12)"}
 FATAL  Error: Index .kibana belongs to a version of Kibana that cannot be automatically migrated. Reset it or use the X-Pack upgrade assistant.

Version-Release number of selected component (if applicable):

4.5.0

This happened after an upgrade from 4.4.x to 4.5.0. The pod restarted 25 times before the issue went away.

Seems similar to this: https://bugzilla.redhat.com/show_bug.cgi?id=1835903
 
As I understand the status of that bz, this could be a regression.
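
For reference, one way to confirm whether an old-format .kibana index is still present is to query the cat indices API from an Elasticsearch pod. This is only a sketch; the openshift-logging namespace, the label selector, and the es_util helper are assumptions based on a typical OpenShift 4.x cluster-logging deployment:

# pick any Elasticsearch pod in the logging namespace (assumed: openshift-logging)
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o name | head -1)

# list the Kibana-related indices; an old-format .kibana index would show up here
oc -n openshift-logging exec -c elasticsearch $ES_POD -- es_util --query='_cat/indices/.kibana*?v'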

Comment 1 Anping Li 2020-10-11 14:41:51 UTC
Created attachment 1720673 [details]
reindex failed

Reproduced it when upgrading from elasticsearch-operator.4.4.0-202009161309.p0 to elasticsearch-operator.4.5.0-202009182238.p0. It is not always reproducible.
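
For reference, the installed operator versions can be checked with something like the following; the namespaces are assumptions for a default deployment (elasticsearch-operator in openshift-operators-redhat, cluster-logging-operator in openshift-logging):

oc get csv -n openshift-operators-redhat
oc get csv -n openshift-logging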

Comment 3 Jeff Cantrill 2020-10-12 14:24:20 UTC
*** Bug 1870371 has been marked as a duplicate of this bug. ***

Comment 4 Jeff Cantrill 2020-10-23 15:20:10 UTC
Setting UpcomingSprint as unable to resolve before EOD

Comment 5 Periklis Tsirakidis 2020-11-09 10:24:09 UTC
(In reply to tfahlman from comment #0)
> Description of problem:
> 
> Using NODE_OPTIONS: '--max_old_space_size=368' Memory setting is in MB
> {"type":"log","@timestamp":"2020-10-06T14:23:56Z","tags":["fatal","root"],
> "pid":121,"message":"Error: Index .kibana belongs to a version of Kibana
> that cannot be automatically migrated. Reset it or use the X-Pack upgrade
> assistant.\n    at assertIsSupportedIndex
> (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:
> 246:15)\n    at Object.fetchInfo
> (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:
> 52:12)"}
>  FATAL  Error: Index .kibana belongs to a version of Kibana that cannot be
> automatically migrated. Reset it or use the X-Pack upgrade assistant.
> 
> Version-Release number of selected component (if applicable):
> 
> 4.5.0
> 
> This happened after an upgrade for 4.4.x to 4.5.0. The pod restarted 25
> times before this issue went away. 
> 
> Seems similar to this: https://bugzilla.redhat.com/show_bug.cgi?id=1835903
>  
> As I understand the status of that bz, this could be a regression.

Kibana index migration is done by the elasticsearch-operator in 4.5, because kibana6 requires some manual steps. Could you provide a cluster-logging must-gather for this cluster so we can verify that the migration does not fail in the operator?

The crashloop you see is nothing serious; it is only an indicator that the elasticsearch-operator has not yet finished migrating the index.
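
A sketch of how the migration progress can be watched; the namespace, deployment name, and label selector are assumptions for a default 4.5 install:

# follow the elasticsearch-operator logs for the Kibana index migration messages
oc -n openshift-operators-redhat logs deployment/elasticsearch-operator -f

# the Kibana pod should stop crashlooping once the migration has completed
oc -n openshift-logging get pods -l component=kibana -w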

Comment 6 Periklis Tsirakidis 2020-11-11 13:03:28 UTC
@sreber 

Please provide a must-gather for your customer case.

Comment 14 Periklis Tsirakidis 2020-11-26 17:01:37 UTC
@tmicheli

Looking through the various uploads, none of them is a proper cluster-logging must-gather taken with:

https://github.com/openshift/cluster-logging-operator/tree/master/must-gather

Can you please provide a recent snapshot collected using this must-gather?
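
For reference, a cluster-logging must-gather is typically collected along these lines; the namespace and deployment name assume a default install of the cluster-logging-operator:

oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \
  -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')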

Comment 17 Periklis Tsirakidis 2020-11-27 15:51:35 UTC
@tmicheli

Based on a live session with @sreber, I have a hypothesis which I would like you both to validate on the customer side or on a lab setup.

First of all, the key observations:
Case 1: Kibana crashloops once after upgrading from 4.4 to 4.5 on old user `.kibana*` indices, because they still point to the old data model. Once they are deleted, everything works fine.

Case 2: Kibana crashloops again on a cluster where the internal migration from the old data model to the new model is in progress after users create their index patterns.

---

The hypothesis is that users in case 2 create index patterns that refer both to the new data model (e.g. app*) and to the old data model (e.g. project*).
While the migration happens, the old indices (e.g. project*, operations*) get deleted, so those index patterns end up in a broken state.

Could you please take a look at the users' `.kibana*` indices to identify which index patterns they are creating?
Would it be possible to get a dump of these indices so we can inspect them ourselves?
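
A sketch of one way to list the index patterns saved in the users' `.kibana*` indices and to dump those indices for inspection; es_util, the namespace, and the label selector are assumptions for a default deployment:

ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o name | head -1)

# show which index patterns the users have created
oc -n openshift-logging exec -c elasticsearch $ES_POD -- \
  es_util --query='.kibana*/_search?q=type:index-pattern&size=100&pretty'

# dump the full contents of the .kibana* indices for offline inspection
oc -n openshift-logging exec -c elasticsearch $ES_POD -- \
  es_util --query='.kibana*/_search?size=1000&pretty' > kibana-indices-dump.json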

Comment 37 Anping Li 2020-12-21 07:23:24 UTC
No regression was found in 4.7, so moving to VERIFIED. Further testing will be done in 4.5.

Comment 55 errata-xmlrpc 2021-02-24 11:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652

