Created attachment 1272502 [details]
logs and pod description

Description of problem:
I have upgraded OpenShift logging from 3.4.0 to 3.4.1.12 and it has failed. The Elasticsearch logs mostly consist of:

[2017-04-19 01:26:57,923][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:26:59,562][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:26:59,713][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:04,680][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:08,814][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:12,288][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:21,013][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:22,815][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:24,514][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:28,116][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:32,623][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:33,412][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:37,730][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized

Version-Release number of selected component (if applicable):
3.4.0

How reproducible:
Customer end

Steps to Reproduce:
1. OpenShift logging - upgrade from 3.4.0 to 3.4.1.12
2.
3.

Actual results:

Expected results:

Additional info:
The fix for this is not available until 3.4.1.15. This issue is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1431551
They upgraded to 3.4.1-17 but still hit the same issue. Why is that? The logs are attached in the earlier comment.
In the recent log, I see image lines like:

image: registry.access.redhat.com/openshift3/logging-elasticsearch:3.4.0

Should that read 3.4.1.17?
I am not sure whether "3.4.0" means "use the latest 3.4.x image" or "use the latest 3.4.0.x image". Using "v3.4" or "3.4.1" would guarantee pulling the latest 3.4.x image.
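A quick way to confirm which image tag each logging dc actually references (a sketch; it assumes the logging components live in the "logging" project):

# Print each dc name alongside the image(s) its pod template references.
oc get dc -n logging \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'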
Sorry if there was some confusion caused.

OpenShift node rpms ->
atomic-openshift-3.4.1.12-1.git.0.57d7e1d.el7.x86_64                   Wed Apr 19 11:47:43 2017
atomic-openshift-clients-3.4.1.12-1.git.0.57d7e1d.el7.x86_64           Wed Apr 19 11:47:37 2017
atomic-openshift-docker-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch   Wed Apr 19 10:34:11 2017
atomic-openshift-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch          Wed Apr 19 10:34:11 2017
atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64              Wed Apr 19 11:47:43 2017
atomic-openshift-sdn-ovs-3.4.1.12-1.git.0.57d7e1d.el7.x86_64           Wed Apr 19 11:47:44 2017
atomic-openshift-utils-3.2.7-1.git.0.1bf8fbe.el7.noarch                Wed Oct 26 10:44:40 2016

In the dc ->
image: registry.access.redhat.com/openshift3/logging-elasticsearch:3.4.1

From master ->
[root@drlosm02 ~]# openshift version
openshift v3.4.1.12
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

[root@drlosi01 ~]# rpm -qa | grep atomic
atomic-openshift-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-utils-3.2.7-1.git.0.1bf8fbe.el7.noarch
atomic-openshift-docker-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch
atomic-openshift-clients-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch
tuned-profiles-atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-sdn-ovs-3.4.1.12-1.git.0.57d7e1d.el7.x86_64

They say they tried upgrading to the latest version again and still got the same issue. Now using logging-deployer-v3.4.1.18-3.
Please provide the following information so we can determine the exact version of the elasticsearch image being used:

docker images | grep elasticsearch
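Alternatively, the image ID actually running in the pod can be read from the pod status (a sketch; the pod name is a placeholder to be replaced with a real name from "oc get po"):

# Print the image ID of the first container in the ES pod.
oc get po logging-es-abc123 -o jsonpath='{.status.containerStatuses[0].imageID}'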
We are working to get a release of 3.4 out that includes a number of bug fixes, including the one identified here. I would refrain from upgrading until 3.4.1-20 is available.

Fluentd uses a position file to record the last place in a log file it was able to read and push to Elasticsearch. It will resume reading from that position once ES is available and online. With regard to data loss: you are at the mercy of the log source's rotation policy.
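For reference, the position file can be inspected directly in a fluentd pod (a sketch; the pos-file path is an assumption and may vary by release):

# Pick any fluentd pod in the logging project.
pod=$(oc get po -n logging | awk '/^logging-fluentd/ {print $1; exit}')
# Each line records a watched file, the byte offset read so far, and the inode (both in hex).
oc exec -n logging "$pod" -- cat /var/log/es-containers.log.pos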
Hi Daniel, based on the output from the gist script it looks like there is only one ES pod in the cluster, but most indices are configured with a replica. The cluster is therefore in a "yellow" state because one or more replica shards have no node to be assigned to, since only one ES pod is available. It is likely that your customer's problem is the result of a misconfiguration. Perhaps it is best to discuss this in another forum?
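To confirm, the cluster health and replica settings can be queried with the same admin certs mounted in the ES pod (a sketch; for a single-node cluster, dropping replicas to 0 lets all shards be assigned):

pod=$(oc get po | awk '/^logging-es/ {print $1; exit}')
# Cluster health: "yellow" means unassigned replica shards.
oc exec "$pod" -- curl -s --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  "https://localhost:9200/_cluster/health?pretty"
# Optionally set replicas to 0 on all indices so the cluster can go green.
oc exec "$pod" -- curl -s -XPUT --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  "https://localhost:9200/_settings" -d '{"index":{"number_of_replicas":0}}'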
Attached (index_storage_check.out) the output of the following script ->

#############################################
#!/bin/bash
file=unassigned_shards_check.txt
pods=$(oc get po | grep logging-es | awk '{print $1}')
anypod=$(oc get po | grep logging-es | awk '{print $1}' | tail -1)

echo getting current UNASSIGNED shards to $file
oc exec $anypod -- curl -s --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
  | grep UNASSIGNED > $file

for index in $(cat $file | grep "0 p" | awk '{print $1}'); do
  echo * Checking index $index in all pods
  for pod in $pods; do
    echo ** Checking index $index in pod: $pod
    path=$(oc exec $pod -- find /elasticsearch/persistent/logging-es/data/logging-es/nodes/ -name $index)
    if [ -z "$path" ]; then
      echo Path $path not found in pod $pod
    else
      oc exec $pod -- ls -R $path
    fi
  done
done
#########################################################

The script does the following ->
- Iterate over all the indices
- Iterate over all the ES pods
- Search the storage to see if there is a folder with the name of $index in $pod

Check the attached file index_storage_check.out

$ ./index_storage_check.sh > index_storage_check.out
Note for the future: the unquoted "echo *" and "echo **" in the script caused the shell to expand the globs and dump whatever files it found in the current directory into the output stream.
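Quoting the arguments prevents the glob expansion, e.g.:

echo "* Checking index $index in all pods"
echo "** Checking index $index in pod: $pod"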
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1460564 to cover a fix for allowing more than one ES pod to access an EBS volume.
Upgraded logging from 3.3.1 to 3.4.1; after the upgrade, did not encounter this error in the ES log (attached):

[ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized

So setting this bz to VERIFIED. One thing to note: the nodeSelector in the fluentd daemonset was changed to random UUIDs post-upgrade, which is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1446504.

Logging images tested with:
logging-deployer      v3.4.1.44.11-1
logging-elasticsearch 3.4.1-38
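For anyone hitting the nodeSelector issue, the daemonset's selector can be checked with (a sketch; assumes the default daemonset name logging-fluentd in the "logging" project):

# Print the nodeSelector the fluentd daemonset schedules against.
oc get daemonset logging-fluentd -n logging -o jsonpath='{.spec.template.spec.nodeSelector}'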
Created attachment 1311017 [details] es_log_after_upgraded_to_es_3.4.1-38
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049