Bug 1443350 - Openshift logging - upgrade from 3.4.0 to 3.4.1.12
Summary: Openshift logging - upgrade from 3.4.0 to 3.4.1.12
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.4.z
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-19 06:57 UTC by Miheer Salunke
Modified: 2021-09-09 12:15 UTC (History)
17 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-25 13:00:48 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
es_log_after_upgraded_to_es_3.4.1-38 (11.07 KB, text/plain)
2017-08-09 06:43 UTC, Xia Zhao
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Article) 3029411 0 None None None 2017-05-10 15:50:11 UTC
Red Hat Product Errata RHBA-2017:3049 0 normal SHIPPED_LIVE OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update 2017-10-25 15:57:15 UTC

Description Miheer Salunke 2017-04-19 06:57:00 UTC
Created attachment 1272502 [details]
logs and pod description

Description of problem:
I upgraded OpenShift logging from 3.4.0 to 3.4.1.12 and the upgrade failed.

The Elasticsearch logs consist mostly of:
[2017-04-19 01:26:57,923][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:26:59,562][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:26:59,713][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:04,680][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:08,814][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:12,288][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:21,013][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:22,815][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:24,514][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:28,116][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:32,623][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:33,412][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-04-19 01:27:37,730][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
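
One quick way to gauge how pervasive the error is (a minimal sketch; the pod-name grep follows this deployment's "logging-es" naming):

# Hedged sketch: count SearchGuard initialization errors in the ES pod log.
es=$(oc get po | grep logging-es | awk '{print $1}' | head -1)
oc logs $es | grep -c 'Not yet initialized'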


Version-Release number of selected component (if applicable):
3.4.0

How reproducible:
Customer end

Steps to Reproduce:
1. Upgrade OpenShift logging from 3.4.0 to 3.4.1.12.

Actual results:


Expected results:


Additional info:

Comment 3 Jeff Cantrill 2017-04-28 16:20:17 UTC
The fix for this is not available until 3.4.1.15.  This issue is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1431551

Comment 5 Miheer Salunke 2017-05-01 06:41:09 UTC
They upgraded to 3.4.1-17 but still hit the same issue. Why is that? The logs are attached in the earlier comment.

Comment 6 Peter Portante 2017-05-03 05:16:26 UTC
In the recent log, I see image lines like:

  image: registry.access.redhat.com/openshift3/logging-elasticsearch:3.4.0

Should that read 3.4.1.17?

Comment 7 Rich Megginson 2017-05-03 18:48:40 UTC
Not sure if "3.4.0" means "use the latest 3.4.x image" or "use the latest 3.4.0.x image".  Using "v3.4" or "3.4.1" would guarantee using the latest 3.4.x image.
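
One way to check which tag each logging deploymentconfig actually references (a minimal sketch; the "logging-" name filter is an assumption based on this deployment's naming):

# Hedged sketch: print the image each logging dc references.
for dc in $(oc get dc | grep logging- | awk '{print $1}'); do
  echo -n "$dc -> "
  oc get dc $dc -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
done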

Comment 8 Miheer Salunke 2017-05-08 08:21:47 UTC
Sorry if there was some confusion caused.

openshift Node rpms ->

atomic-openshift-3.4.1.12-1.git.0.57d7e1d.el7.x86_64        Wed Apr 19 11:47:43 2017
atomic-openshift-clients-3.4.1.12-1.git.0.57d7e1d.el7.x86_64 Wed Apr 19 11:47:37 2017
atomic-openshift-docker-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch Wed Apr 19 10:34:11 2017
atomic-openshift-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch Wed Apr 19 10:34:11 2017
atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64   Wed Apr 19 11:47:43 2017
atomic-openshift-sdn-ovs-3.4.1.12-1.git.0.57d7e1d.el7.x86_64 Wed Apr 19 11:47:44 2017
atomic-openshift-utils-3.2.7-1.git.0.1bf8fbe.el7.noarch     Wed Oct 26 10:44:40 2016



In the dc->

          image: registry.access.redhat.com/openshift3/logging-elasticsearch:3.4.1


From master->

[root@drlosm02 ~]# openshift version
openshift v3.4.1.12
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

[root@drlosi01 ~]# rpm -qa | grep atomic
atomic-openshift-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-utils-3.2.7-1.git.0.1bf8fbe.el7.noarch
atomic-openshift-docker-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch
atomic-openshift-clients-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-excluder-3.4.1.12-1.git.0.57d7e1d.el7.noarch
tuned-profiles-atomic-openshift-node-3.4.1.12-1.git.0.57d7e1d.el7.x86_64
atomic-openshift-sdn-ovs-3.4.1.12-1.git.0.57d7e1d.el7.x86_64


They say they tried upgrading to the latest version again and still got the same issue.  They are now using logging-deployer v3.4.1.18-3.

Comment 10 Rich Megginson 2017-05-08 23:19:02 UTC
Please provide the following information so we can determine the exact version of the elasticsearch image being used:

docker images | grep elasticsearch
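
If the pods are running, the digest a pod actually resolved can also be read from its status (a sketch; "<es-pod-name>" is a placeholder for the real pod name):

# Hedged sketch: imageID in pod status includes the digest that was pulled.
oc get pod <es-pod-name> -o jsonpath='{.status.containerStatuses[0].imageID}{"\n"}'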

Comment 14 Jeff Cantrill 2017-05-11 14:44:50 UTC
We are working to get a release of 3.4 which includes a number of bug fixes including the one identified here.  I would refrain from upgrading until 3.4.1-20 is available.

Fluentd uses a position file to track the last place in each log it was able to read and push to Elasticsearch.  It will resume reading from that position once ES is available and online.  With regard to data loss: you are at the mercy of the log source's rotation policy.
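
To see where Fluentd will resume, the position file can be inspected from inside a Fluentd pod (a minimal sketch; the pos file path is an assumption for this image):

# Hedged sketch: each pos file line is "<path> <hex offset> <inode>", i.e.
# where Fluentd will resume reading that log (path is an assumption).
fpod=$(oc get po | grep logging-fluentd | awk '{print $1}' | head -1)
oc exec $fpod -- cat /var/log/es-containers.log.pos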

Comment 25 Peter Portante 2017-05-23 15:40:47 UTC
Hi Daniel, based on the output from the gist script it looks like there is only one ES pod in the cluster, but most indices are set to have a replica.  So the cluster is in a "yellow" state because one or more replicas have no host to hold them, since only one ES pod is available.

It is likely that your customer's problem is the result of a bad configuration.  Perhaps it is best to discuss this in another forum?
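
To confirm, cluster health can be checked with the ES admin certs mounted in the pod, and the replica count dropped (a sketch; setting replicas to 0 is only appropriate while a single ES node is deliberate):

# Hedged sketch: check cluster health, then (single-node only) drop replicas
# so shards stop waiting for a second node.
es=$(oc get po | grep logging-es | awk '{print $1}' | head -1)
certs="--key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca"
oc exec $es -- curl -s $certs "https://localhost:9200/_cluster/health?pretty"
oc exec $es -- curl -s $certs -XPUT "https://localhost:9200/_settings" -d '{"index":{"number_of_replicas":0}}'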

Comment 38 Miheer Salunke 2017-06-12 01:10:03 UTC
Attached (index_storage_check.out) is the output of the following script:

#############################################

#!/bin/bash

file=unassigned_shards_check.txt
pods=$(oc get po | grep logging-es | awk '{print $1}')
anypod=$(oc get po | grep logging-es | awk '{print $1}' | tail -1)

echo getting current UNASSIGNED shards to $file
# Quote the URL so the shell does not treat '?' as a glob.
oc exec $anypod -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED > $file

# Only the unassigned primaries of shard 0 are of interest here.
for index in $(grep "0 p" $file | awk '{print $1}');
do
  echo * Checking index $index in all pods
  for pod in $pods;
  do
    echo ** Checking index $index in pod: $pod;
    # Look for a data directory named after the index on this pod's storage.
    path=$(oc exec $pod -- find /elasticsearch/persistent/logging-es/data/logging-es/nodes/ -name $index)
    if [ -z "$path" ]
    then
      echo Index $index not found in pod $pod
    else
      oc exec $pod -- ls -R "$path"
    fi
  done;
done



#########################################################

The script does the following:

 - Iterates over all the unassigned indices
 - Iterates over all the ES pods
 - Searches the storage to see whether a folder named after the $index exists in the $pod

Check attached file index_storage_check.out
 $ ./index_storage_check.sh > index_storage_check.out

Comment 40 Peter Portante 2017-06-12 01:40:01 UTC
For future reference: the unquoted "echo *" and "echo **" lines made the shell expand the glob, so whatever files it found in the current directory were dumped into the output stream.
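
A minimal illustration of the difference (file names here are hypothetical):

# Unquoted: the shell expands '*' against the current directory first.
echo * Checking index .kibana     # prints e.g. "file1 file2 Checking index .kibana"
# Quoted: the asterisk is printed literally.
echo "* Checking index .kibana"   # prints "* Checking index .kibana"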

Comment 43 Peter Portante 2017-06-12 02:39:27 UTC
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1460564 to cover a fix allowing more than one ES pod to access an EBS volume.

Comment 55 Xia Zhao 2017-08-09 06:42:52 UTC
Upgraded logging from 3.3.1 to 3.4.1; after the upgrade, did not encounter this error in the ES log (attached):

[ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized

So, setting this BZ to verified.

One remaining issue: the nodeSelector in the fluentd daemonset was changed to random UUIDs post-upgrade, as is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1446504.
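
The changed selector can be confirmed directly on the daemonset (a sketch; the daemonset name "logging-fluentd" is assumed from this deployment's naming):

# Hedged sketch: print the nodeSelector the fluentd daemonset currently applies.
oc get daemonset logging-fluentd -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'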

logging images tested with:

logging-deployer    v3.4.1.44.11-1
logging-elasticsearch     3.4.1-38

Comment 56 Xia Zhao 2017-08-09 06:43:45 UTC
Created attachment 1311017 [details]
es_log_after_upgraded_to_es_3.4.1-38

Comment 58 errata-xmlrpc 2017-10-25 13:00:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

