Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1913952

Summary: Kibana Yellow and Inaccessible - plugin:opendistro_security@6.8.1 Setting up index template.
Product: OpenShift Container Platform
Reporter: Matthew Robson <mrobson>
Component: Logging
Assignee: ewolinet
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: medium
Priority: high
Version: 4.5
CC: aivaraslaimikis, anli, aos-bugs, bjarolim, cruhm, dkulkarn, ewolinet, jnordell, joboyer, mifiedle, mmohan, naoto30, openshift-bugs-escalate, rsandu, sreber, tmicheli, ykarajag
Target Milestone: ---
Target Release: 4.5.z
Hardware: All
OS: Linux
Whiteboard: logging-exploration
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2021-02-09 13:25:23 UTC
Type: Bug
Regression: ---
Bug Depends On: 1873493    
Bug Blocks:    

Description Matthew Robson 2021-01-07 20:41:47 UTC
Description of problem:

OCP 4.5.20 cluster that has been running 4.5 for quite some time. It was upgraded through earlier 4.5.x releases before reaching 4.5.20; the 4.5.20 upgrade was done in early December.

This is the same issue as defined in: https://bugzilla.redhat.com/show_bug.cgi?id=1885723

Attempted to apply the fix as noted in: https://access.redhat.com/solutions/5332221

Kibana is still in a Yellow state.

Initially, all users saw 'plugin:opendistro_security.1 Tenant indices migration failed' when they logged in.

After applying the workaround, the message changed to 'plugin:opendistro_security.1 Setting up index template.'

This error is also a documented symptom of the original BZ.

ES is green and looks fine.

There are no errors or other interesting messages in the Kibana or ES pod logs.
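
For reference, the checks described above could look something like this (a sketch; the pod names are placeholders, the openshift-logging namespace is taken from the log excerpts later in this bug, and es_util is assumed to be the query helper shipped in the OpenShift Elasticsearch containers):

$ oc -n openshift-logging get pods
$ oc -n openshift-logging exec -c elasticsearch <es_pod> -- es_util --query="_cluster/health?pretty"
$ oc -n openshift-logging logs -c kibana <kibana_pod>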

Version-Release number of selected component (if applicable):
4.5.20

How reproducible:
Happened once; cannot recover.

Steps to Reproduce:
1. Unknown
2.
3.

Actual results:
Kibana is inaccessible for all users of the cluster.

Expected results:


Additional info:

Comment 14 Anping Li 2021-01-27 01:56:00 UTC
In some situations, the Kibana index is left in a migrating state, which results in 'Tenant indices migration failed'. For example: the ES pods are restarted while you are creating an index pattern, or the web browser is closed while you are creating an index pattern.

You may see the error logs as below.
{"type":"log","@timestamp":"2021-01-26T14:41:12Z","tags":["warning","migrations"],"pid":117,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana."}

The workaround is to delete the blocked kibana index.
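
For example, deleting the index named in the warning above could look like this (a sketch; the pod names are placeholders, the index name is the one from the log message, and es_util is assumed to be the query helper shipped in the OpenShift Elasticsearch containers):

$ oc -n openshift-logging exec -c elasticsearch <es_pod> -- es_util --query=".kibana_-377444158_kubeadmin_1" -XDELETE
$ oc -n openshift-logging delete pod <kibana_pod>

Deleting the Kibana pod restarts Kibana so that it re-runs the migration.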

Comment 18 Simon Reber 2021-01-28 12:01:58 UTC
(In reply to Anping Li from comment #14)
> In some situation, the kibana index is in migrations status, that result in
> 'Tenant indices migration failed'. For example: The ES pods are restarted
> while you are creating indics pattern. The web browser is closed while you
> are creating indics pattern.
> 
> You may see the error logs as below.
> {"type":"log","@timestamp":"2021-01-26T14:41:12Z","tags":["warning",
> "migrations"],"pid":117,"message":"Another Kibana instance appears to be
> migrating the index. Waiting for that migration to complete. If no other
> Kibana instance is attempting migrations, you can get past this message by
> deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana."}
> 
> The workaround is to delete the blocked kibana index.
What is the procedure to find the blocked indices? And would it be possible to unblock them rather than removing them?

Comment 19 ewolinet 2021-01-28 14:48:57 UTC
(In reply to Simon Reber from comment #18)
> (In reply to Anping Li from comment #14)
> > In some situation, the kibana index is in migrations status, that result in
> > 'Tenant indices migration failed'. For example: The ES pods are restarted
> > while you are creating indics pattern. The web browser is closed while you
> > are creating indics pattern.
> > 
> > You may see the error logs as below.
> > {"type":"log","@timestamp":"2021-01-26T14:41:12Z","tags":["warning",
> > "migrations"],"pid":117,"message":"Another Kibana instance appears to be
> > migrating the index. Waiting for that migration to complete. If no other
> > Kibana instance is attempting migrations, you can get past this message by
> > deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana."}
> > 
> > The workaround is to delete the blocked kibana index.
> What is the procedure to find the blocked indices?

In later (4.6+) versions of the Kibana image, the blocked index is named in the logs when you view them with `oc logs -c kibana <your kibana pod>`.
For 4.5, however, you will need to follow the steps outlined in https://bugzilla.redhat.com/show_bug.cgi?id=1913952#c11:

1) Get the name of one of the Kibana pods and `oc rsh` into it

oc rsh -c kibana <kibana_pod>

2) Edit the config file so that a second, temporary Kibana instance can be started in the container

vi config/kibana.yml

3) Search and update the following lines:
#server.port: 5601            => server.port: 5602
#server.host: "localhost"     => server.host: "localhost"
pid.file: ${HOME}/kibana.pid  => pid.file: ${HOME}/kibana2.pid 
logging.quiet: true           => #logging.quiet: true

4) Save the configuration file and then start up kibana

bin/kibana


This starts another instance of Kibana and displays its status, including any errors encountered while setting up the index template.
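
For reference, after the substitutions above the relevant lines in config/kibana.yml would read as follows (5602 is just an arbitrary free port so the temporary instance does not clash with the main one on 5601):

server.port: 5602
server.host: "localhost"
pid.file: ${HOME}/kibana2.pid
#logging.quiet: true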


> And would it be possible to unblock them rather than removing them?
Unfortunately, no. The error comes from within the document migration code for Kibana. It will fail to create the next index for migration (based on what the alias currently points to). When that fails, it exits out with the error we're all familiar with seeing now.

The index that is being deleted is not being used yet and any data that would have been migrated to it will be migrated again the next time Kibana is restarted.
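
If you want to see which index the alias currently points to and which migration indices exist before deleting anything, a quick check could look like this (a sketch; same assumed es_util helper and placeholder pod name as above):

$ oc -n openshift-logging exec -c elasticsearch <es_pod> -- es_util --query="_cat/indices/.kibana*?v"
$ oc -n openshift-logging exec -c elasticsearch <es_pod> -- es_util --query="_cat/aliases/.kibana*?v"

The index that the alias does not point to (typically the one with the highest numeric suffix) is the unused migration target described above.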

Comment 23 Anping Li 2021-02-03 14:06:18 UTC
Before the fix, Kibana became Yellow with 'plugin:opendistro_security.1 - Tenant indices migration failed'.

You can see messages like the following:
$ oc logs -c kibana kibana-84689c6479-8hbfn
#The following values dynamically added from environment variable overrides:
Using NODE_OPTIONS: '--max_old_space_size=1024' Memory setting is in MB
{"type":"log","@timestamp":"2021-02-03T13:07:44Z","tags":["status","plugin:elasticsearch.1","error"],"pid":121,"state":"red","message":"Status changed from yellow to red - Service Unavailable","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2021-02-03T13:08:22Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nGET https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:22Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nGET https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana_-377444158_kubeadmin?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:25Z","tags":["listening","info"],"pid":121,"message":"Server running at http://localhost:5601"}
{"type":"log","@timestamp":"2021-02-03T13:08:26Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nPUT https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana_-377444158_kubeadmin_2?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:26Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nPUT https://elasticsearch.openshift-logging.svc.cluster.local:9200/_template/kibana_index_template%3A.kibana_* => socket hang up"}
{"type":"error","@timestamp":"2021-02-03T13:08:26Z","tags":["error","migration"],"pid":121,"level":"error","error":{"message":"No Living connections","name":"Error","stack":"Error: No Living connections\n    at sendReqWithConnection (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:226:15)\n    at next (/opt/app-root/src/node_modules/elasticsearch/src/lib/connection_pool.js:214:7)\n    at process._tickCallback (internal/process/next_tick.js:61:11)"},"message":"No Living connections"}
{"type":"log","@timestamp":"2021-02-03T13:09:18Z","tags":["status","plugin:elasticsearch.1","error"],"pid":121,"state":"red","message":"Status changed from green to red - Service Unavailable","prevState":"green","prevMsg":"Ready"}

After deleting the index, Kibana is still Yellow with 'plugin:opendistro_security.1 - Tenant indices migration failed'.

Comment 31 Anping Li 2021-02-05 13:46:05 UTC
@Eric, could we provide Doc Text to guide the OCP admin through fixing this issue?

Comment 32 Anping Li 2021-02-05 14:40:06 UTC
@Eric, is this the message you want to show? What is the action to take?

"type":"error","@timestamp":"2021-02-05T14:23:06Z","tags":["error","migration"],"pid":122,"level":"error","error":{"message":"Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information.","name":"Error","stack":"Error: Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information.\n    at assertResponseIncludeAllShards (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:263:15)\n    at Object.migrationsUpToDate (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:156:9)"},"message":"Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information."}
{"type":"log","@timestamp":"2021-02-05T14:23:06Z","tags":["status","plugin:opendistro_security.1","info"],"pid":122,"state":"yellow","message":"Status changed from yellow to yellow - Tenant indices migration failed","prevState":"yellow","prevMsg":"Setting up index template."}

Comment 34 Mike Fiedler 2021-02-05 16:12:57 UTC
@ewolinet  Let me know if this is OK for now, and whether we should create a follow-up bz or address it in this bz.

Comment 35 ewolinet 2021-02-05 17:24:05 UTC
(In reply to Anping Li from comment #32)
> @Eric, Is this message you want to show? What is the action to take?
> 
> "type":"error","@timestamp":"2021-02-05T14:23:06Z","tags":["error",
> "migration"],"pid":122,"level":"error","error":{"message":"Re-index failed
> :: 1 of 3 shards failed. Check Elasticsearch cluster health for more
> information.","name":"Error","stack":"Error: Re-index failed :: 1 of 3
> shards failed. Check Elasticsearch cluster health for more information.\n   
> at assertResponseIncludeAllShards
> (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:
> 263:15)\n    at Object.migrationsUpToDate
> (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:
> 156:9)"},"message":"Re-index failed :: 1 of 3 shards failed. Check
> Elasticsearch cluster health for more information."}
> {"type":"log","@timestamp":"2021-02-05T14:23:06Z","tags":["status","plugin:
> opendistro_security.1","info"],"pid":122,"state":"yellow","message":
> "Status changed from yellow to yellow - Tenant indices migration
> failed","prevState":"yellow","prevMsg":"Setting up index template."}

@Anli,

That doesn't look like the error we are looking for.

I logged into your cluster to try to force the condition where the Kibana plugin fails to migrate user tenants, and I couldn't recreate it, which is probably why we don't see the messages.


> @ewolinet  Let me know if this is OK for now and if we should create a follow up bz or if we want to address in this bz.

@Mike

I'm not sure which thing you're referring to; I feel like there are a couple in this thread now... can you clarify, please?

Comment 36 Mike Fiedler 2021-02-05 19:37:45 UTC
Is the issue in comment 32 a problem, or is this bug good to mark VERIFIED?

Comment 37 ewolinet 2021-02-05 22:34:23 UTC
(In reply to Mike Fiedler from comment #36)
> Is the issue in comment 32 a problem?   or is this bug good to mark VERIFIED?

I believe so. I couldn't recreate the actual issue on Anping's cluster, so we didn't see the message we expect for this issue, which is something like the following:

[warning][migrations] Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana.


However, we are now seeing those messages displayed in the Kibana log, which is the basis of the change.

Comment 38 ewolinet 2021-02-05 22:36:05 UTC
(In reply to ewolinet from comment #37)
> (In reply to Mike Fiedler from comment #36)
> > Is the issue in comment 32 a problem?   or is this bug good to mark VERIFIED?
> 
> I believe so, I couldn't recreate the actual issue on Anping's cluster so we
> didn't see the message we expect for this issue which is something like the
> following:
> 
> [warning][migrations] Another Kibana instance appears to be migrating the
> index. Waiting for that migration to complete. If no other Kibana instance
> is attempting migrations, you can get past this message by deleting index
> .kibana_-377444158_kubeadmin_1 and restarting Kibana.
> 
> 
> However, we are now seeing those messages displayed in the Kibana log, which
> is what the basis of the change was.

Sorry, hit Save Changes too soon... 

We are now seeing messages printed at that level (e.g. info), whereas before this change we would only see messages at the Warning/Error level, which was hiding messages like the one above.

Comment 39 Anping Li 2021-02-07 07:16:22 UTC
Verified. The message in comment 37 will be helpful for working around this issue.

Comment 40 ewolinet 2021-02-08 17:32:17 UTC
(In reply to Anping Li from comment #39)
> Verified, The message @comment 37 will be helpfull to workaround this issue.

To clarify, that is not a workaround; it is currently the intended fix, as it is what Kibana (from elastic.co) documents as the fix: manual user intervention to mitigate data loss.

Changes to the opendistro security plugin to handle this in an automated way would be a feature and should be tracked as a JIRA card to be sized/prioritized.

Comment 42 errata-xmlrpc 2021-02-09 13:25:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.31 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0315

Comment 43 Periklis Tsirakidis 2021-10-21 12:00:11 UTC
*** Bug 2009756 has been marked as a duplicate of this bug. ***