Bug 1913952
| Summary: | Kibana Yellow and Inaccessible - plugin:opendistro_security@6.8.1 Setting up index template. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matthew Robson <mrobson> |
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | aivaraslaimikis, anli, aos-bugs, bjarolim, cruhm, dkulkarn, ewolinet, jnordell, joboyer, mifiedle, mmohan, naoto30, openshift-bugs-escalate, rsandu, sreber, tmicheli, ykarajag |
| Target Milestone: | --- | | |
| Target Release: | 4.5.z | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | logging-exploration | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-09 13:25:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1873493 | | |
| Bug Blocks: | | | |
Description
Matthew Robson
2021-01-07 20:41:47 UTC
In some situations, the Kibana index is stuck in migration status, which results in 'Tenant indices migration failed'. For example: the ES pods are restarted while you are creating an index pattern, or the web browser is closed while you are creating an index pattern.
You may see error logs like the following:
{"type":"log","@timestamp":"2021-01-26T14:41:12Z","tags":["warning","migrations"],"pid":117,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana."}
The workaround is to delete the blocked Kibana index.
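A minimal sketch of that workaround, not taken from this bug's comments: it assumes a default `openshift-logging` deployment, the `es_util` helper shipped in the Elasticsearch pods, and uses the index name from the log message above as a placeholder.

```shell
# Hedged sketch of the workaround: delete the blocked Kibana tenant index
# named in the warning, then restart Kibana so the migration is retried.
# Namespace, pod labels, and the index name are assumptions for a default
# deployment; substitute the values from your own cluster and log output.
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch \
  -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- \
  es_util --query=".kibana_-377444158_kubeadmin_1" -XDELETE
oc -n openshift-logging delete pods -l component=kibana
```

Deleting the Kibana pods forces a restart, at which point the migration runs again against a fresh index.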
(In reply to Anping Li from comment #14)
> [...]
> The workaround is to delete the blocked kibana index.

What is the procedure to find the blocked indices? And would it be possible to unblock them rather than removing them?

(In reply to Simon Reber from comment #18)
> What is the procedure to find the blocked indices?
In later (4.6+) versions of the Kibana image, it is shown in the logs when you view them with `oc logs -c kibana <your kibana pod>`. For 4.5, however, you will need to take the steps outlined in https://bugzilla.redhat.com/show_bug.cgi?id=1913952#c11:

1) Get the name of one of the Kibana pods and `oc rsh` into it:

```shell
oc rsh -c kibana <kibana_pod>
```

2) Edit the config file to be able to spin up a second, temporary Kibana instance in the container:

```shell
vi config/kibana.yml
```

3) Search for and update the following lines:

```
#server.port: 5061            => server.port: 5062
#server.host: "localhost"     => server.host: "localhost"
pid.file: ${HOME}/kibana.pid  => pid.file: ${HOME}/kibana2.pid
logging.quiet: true           => #logging.quiet: true
```

4) Save the configuration file and then start up Kibana:

```shell
bin/kibana
```

It should start up another instance of Kibana and will display the status, including any errors from setting up the index template.

> And would it be possible to unblock them rather removing them?

Unfortunately, no. The error comes from within the document migration code for Kibana. It will fail to create the next index for migration (based on what the alias currently points to). When that fails, it exits out with the error we're all familiar with seeing now. The index that is being deleted is not being used yet, and any data that would have been migrated to it will be migrated again the next time Kibana is restarted.

Before the fix, Kibana became Yellow: plugin:opendistro_security.1, "Tenant indices migration failed".
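The config edits in step 3 of the procedure above can also be applied non-interactively. A sketch, assuming the option names, values, and config path quoted in that comment; the function name is ours, and it would be run inside the Kibana container after `oc rsh`:

```shell
# Hedged sketch of step 3: apply the same config edits with sed instead of
# vi. Option names and values are taken verbatim from the procedure above.
tweak_kibana_config() {  # usage: tweak_kibana_config config/kibana.yml
  sed -i \
    -e 's|^#server.port: 5061|server.port: 5062|' \
    -e 's|^#server.host: "localhost"|server.host: "localhost"|' \
    -e 's|^pid.file: ${HOME}/kibana.pid|pid.file: ${HOME}/kibana2.pid|' \
    -e 's|^logging.quiet: true|#logging.quiet: true|' \
    "$1"
}
```

This only edits the file in place; you still start the temporary instance with `bin/kibana` as in step 4.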
You can see messages like the following:

```
$ oc logs -c kibana kibana-84689c6479-8hbfn
#The following values dynamically added from environment variable overrides:
Using NODE_OPTIONS: '--max_old_space_size=1024' Memory setting is in MB
{"type":"log","@timestamp":"2021-02-03T13:07:44Z","tags":["status","plugin:elasticsearch.1","error"],"pid":121,"state":"red","message":"Status changed from yellow to red - Service Unavailable","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2021-02-03T13:08:22Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nGET https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:22Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nGET https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana_-377444158_kubeadmin?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:25Z","tags":["listening","info"],"pid":121,"message":"Server running at http://localhost:5601"}
{"type":"log","@timestamp":"2021-02-03T13:08:26Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nPUT https://elasticsearch.openshift-logging.svc.cluster.local:9200/.kibana_-377444158_kubeadmin_2?include_type_name=true => socket hang up"}
{"type":"log","@timestamp":"2021-02-03T13:08:26Z","tags":["error","elasticsearch","admin"],"pid":121,"message":"Request error, retrying\nPUT https://elasticsearch.openshift-logging.svc.cluster.local:9200/_template/kibana_index_template%3A.kibana_* => socket hang up"}
{"type":"error","@timestamp":"2021-02-03T13:08:26Z","tags":["error","migration"],"pid":121,"level":"error","error":{"message":"No Living connections","name":"Error","stack":"Error: No Living connections\n at sendReqWithConnection (/opt/app-root/src/node_modules/elasticsearch/src/lib/transport.js:226:15)\n at next (/opt/app-root/src/node_modules/elasticsearch/src/lib/connection_pool.js:214:7)\n at process._tickCallback (internal/process/next_tick.js:61:11)"},"message":"No Living connections"}
{"type":"log","@timestamp":"2021-02-03T13:09:18Z","tags":["status","plugin:elasticsearch.1","error"],"pid":121,"state":"red","message":"Status changed from green to red - Service Unavailable","prevState":"green","prevMsg":"Ready"}
```
After deleting the index, Kibana is still Yellow: plugin:opendistro_security.1, "Tenant indices migration failed".

@Eric, could we provide Doc Text to guide the OCP admin on fixing this issue?

@Eric, is this the message you want to show? What is the action to take?
```
{"type":"error","@timestamp":"2021-02-05T14:23:06Z","tags":["error","migration"],"pid":122,"level":"error","error":{"message":"Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information.","name":"Error","stack":"Error: Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information.\n at assertResponseIncludeAllShards (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:263:15)\n at Object.migrationsUpToDate (/opt/app-root/src/src/server/saved_objects/migrations/core/elastic_index.js:156:9)"},"message":"Re-index failed :: 1 of 3 shards failed. Check Elasticsearch cluster health for more information."}
{"type":"log","@timestamp":"2021-02-05T14:23:06Z","tags":["status","plugin:opendistro_security.1","info"],"pid":122,"state":"yellow","message":"Status changed from yellow to yellow - Tenant indices migration failed","prevState":"yellow","prevMsg":"Setting up index template."}
```
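Since the error says to check Elasticsearch cluster health, a sketch of that check, not from this bug's comments: it assumes a default `openshift-logging` deployment with the `es_util` helper available in the Elasticsearch pods.

```shell
# Hedged sketch: query Elasticsearch cluster health, as the re-index error
# suggests. Namespace and pod label are assumptions for a default deployment.
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch \
  -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- \
  es_util --query=_cluster/health?pretty
```

A red status or unassigned shards here would explain the "1 of 3 shards failed" re-index error above.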
@ewolinet Let me know if this is OK for now and if we should create a follow-up bz, or if we want to address it in this bz.

(In reply to Anping Li from comment #32)
> @Eric, Is this message you want to show? What is the action to take?
> [...]

@Anli, that doesn't look like the error we are looking for. I logged into your cluster to try to force the condition where the Kibana plugin failed to migrate user tenants and I couldn't recreate it, which is probably why we don't see the messages.

> @ewolinet Let me know if this is OK for now and if we should create a follow up bz or if we want to address in this bz.

@Mike, I'm not sure which thing you're referring to; I feel like there are a couple in this thread now... can you clarify please?

Is the issue in comment 32 a problem, or is this bug good to mark VERIFIED?

(In reply to Mike Fiedler from comment #36)
> Is the issue in comment 32 a problem? or is this bug good to mark VERIFIED?
I believe so. I couldn't recreate the actual issue on Anping's cluster, so we didn't see the message we expect for this issue, which is something like the following:

```
[warning][migrations] Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_-377444158_kubeadmin_1 and restarting Kibana.
```

However, we are now seeing those messages displayed in the Kibana log, which is the basis of the change.

(In reply to ewolinet from comment #37)
> [...]

Sorry, hit Save Changes too soon... We are now seeing messages printed out at that level (e.g. info), whereas before this change we would only see the Warning/Error level, which was hiding messages like the one above.

Verified. The message in comment 37 will be helpful to work around this issue.

(In reply to Anping Li from comment #39)
> Verified. The message in comment 37 will be helpful to work around this issue.

To clarify, that is not a workaround; that is currently the intended fix, as this is what Kibana (of elastic.co) has documented as the fix: manual user intervention to mitigate data loss.
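Once the warning is visible in the Kibana log, the index name it asks you to delete can be pulled out mechanically. A sketch; the function name is ours, and it keys off the exact wording of the warning quoted above:

```shell
# Hedged sketch: read Kibana log lines on stdin (e.g. piped from
# `oc logs -c kibana <kibana_pod>`) and print each index the migration
# warning asks you to delete, deduplicated.
find_blocked_indices() {
  grep -o 'deleting index [^ ]* and restarting Kibana' \
    | awk '{print $3}' | sort -u
}
```

Usage: `oc logs -c kibana <kibana_pod> | find_blocked_indices`.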
Changes to the opendistro security plugin to handle this in an automated way would be a feature and should be tracked as a JIRA card to be sized/prioritized.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.31 extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0315

*** Bug 2009756 has been marked as a duplicate of this bug. ***