Bug 1461294

Summary: Logging upgrade to 3.4.1 failed by "Unable to find log message from cluster.service from pod logging-es-b14738tr-3-ia26w within 300 seconds"
Product: OpenShift Container Platform
Reporter: Xia Zhao <xiazhao>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Xia Zhao <xiazhao>
Severity: high
Docs Contact:
Priority: high
Version: 3.4.1
CC: aos-bugs, jcantril, juzhao, nnosenzo, pportant, pvarma, rkharwar, rmeggins
Keywords: Regression
Target Milestone: ---
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When Elasticsearch is not configured with console logging, the log message used to determine that the cluster is available is not written to the output returned by 'oc logs'.
Consequence: The run.sh script times out and exits while looking for the log message.
Fix: Evaluate the logging configuration to determine where to look for the cluster.service message.
Result: The run.sh script finds the desired message and continues to start the cluster.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-11 10:47:38 UTC
Type: Bug
Attachments:
  upgrade log (no flags)
  upgrade log, issue not fixed (no flags)
  use logging-deployer:v3.4.1.44-1, same error (no flags)

Description Xia Zhao 2017-06-14 07:20:12 UTC
Created attachment 1287549 [details]
upgrade log

Description of problem:
Similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1393775
Upgrading the logging stack from the 3.3.1 level to the 3.4.1 level fails with:
Unable to find log message from cluster.service from pod logging-es-b14738tr-3-ia26w within 300 seconds:
$ oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-0rauv        0/1       Completed   0          30m
logging-deployer-d8yqu        0/1       Error       0          17m
logging-es-b14738tr-3-ia26w   1/1       Running     0          10m
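
The timeout above is reported by the deployer pod; a minimal sketch of collecting the relevant logs (pod names are the ones from the output above and will differ per run):

$ oc logs logging-deployer-d8yqu -n logging          # deployer output, where the 300-second timeout is reported
$ oc logs logging-es-b14738tr-3-ia26w -n logging     # ES console output, where the cluster.service message is expected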

Version-Release number of selected component (if applicable):
openshift3/logging-deployer        3.4.1               df8b49eaca4f        5 days ago          886.3 MB
(I have no way to know the exact image tag in format "3.4.1-xx" due to https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c19)

# openshift version
openshift v3.4.1.33
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
100%

Steps to Reproduce:
1. Install OpenShift 3.4.1
2. Deploy logging at the 3.3.1 level with a dynamic PV bound to Elasticsearch, log in to Kibana, and make sure log entries are visible there.
3. Upgrade the logging stack to 3.4.1:

$oadm policy add-cluster-role-to-user cluster-admin xiazhao
$oc delete template logging-deployer-account-template logging-deployer-template
$oc create -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployer/deployer.yaml

$oc new-app logging-deployer-account-template
$oc get template logging-deployer-template -o yaml -n logging | sed  's/\(image:\s.*\)logging-deployment\(.*\)/\1logging-deployer\2/g' | oc apply -n logging -f -

$oc policy add-role-to-user edit --serviceaccount logging-deployer
$oc policy add-role-to-user daemonset-admin --serviceaccount logging-deployer

$oc adm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer
$oc adm policy add-cluster-role-to-user rolebinding-reader system:serviceaccount:logging:aggregated-logging-elasticsearch

$oc new-app logging-deployer-template -p IMAGE_PREFIX=${image_registry}/openshift3/ -p IMAGE_VERSION=3.4.1 -p MODE=upgrade

4. Check the upgrade result (see the sketch below)
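
A minimal sketch of checking the result (the deployer pod name is a placeholder):

$ oc get pods -n logging -w             # watch until the new deployer pod reaches Completed (success) or Error (failure)
$ oc logs <deployer-pod> -n logging     # on failure, the deployer log shows where the upgrade stopped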


Actual results:
4. Upgrade failed

Expected results:
Upgrade to 3.4.1 completes successfully

Additional info:
deployer pod logs attached

Comment 1 Xia Zhao 2017-06-14 07:27:00 UTC
This is a regression since this image:

openshift3/logging-deployer        3.4.1               dcee53833a87

according to https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c5

Comment 2 Xia Zhao 2017-06-14 07:28:51 UTC
@Jeff,

Could you please take a look at the problem in https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c19, and show me how to find the exact image tag in the "3.4.1-xx" format from an image ID in the output of "docker images"? Thanks in advance!

Thanks,
Xia

Comment 6 Junqi Zhao 2017-06-23 07:51:37 UTC
Verified this issue according to the steps in Comment 0, but it still failed with "cat: /elasticsearch/logging-es/logs/logging-es.log: No such file or directory"; see the attached upgrade log.

# oc get po -n logging
NAME                              READY     STATUS      RESTARTS   AGE
logging-deployer-03py4            0/1       Completed   0          44m
logging-deployer-v1g6h            0/1       Error       0          19m
logging-es-ops-op2ua0y1-3-pd1si   1/1       Running     0          12m
logging-es-r84qbjr4-3-qrzhn       1/1       Running     0          12m
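
A minimal sketch of checking whether the file the deployer tries to read exists inside the ES pod (pod name and path are the ones from the error above):

$ oc exec logging-es-r84qbjr4-3-qrzhn -n logging -- ls /elasticsearch/logging-es/logs/
$ oc exec logging-es-r84qbjr4-3-qrzhn -n logging -- cat /elasticsearch/logging-es/logs/logging-es.log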


Images from the Brew registry:
# docker images | grep logging
logging-deployer           3.4.1               80ca9c90d261        40 hours ago        857.5 MB
logging-kibana             3.4.1               0c2759ddfcd9        40 hours ago        338.8 MB
logging-elasticsearch      3.4.1               2240ae237369        40 hours ago        399.6 MB
logging-fluentd            3.4.1               059b92a39419        40 hours ago        232.7 MB
logging-curator            3.4.1               46fd26ad9a8b        40 hours ago        244.5 MB
logging-auth-proxy         3.4.1               990787824baf        40 hours ago        215.3 MB

Comment 7 Junqi Zhao 2017-06-23 07:52:25 UTC
Created attachment 1290917 [details]
upgrade log, issue not fixed

Comment 8 Jeff Cantrill 2017-06-26 20:16:01 UTC
I believe this is resolved in upstream:

https://github.com/openshift/origin-aggregated-logging/commit/84a5c99b46ba5819811c7dfed65a1cc8fb505b43

and downstream:

http://pkgs.devel.redhat.com/cgit/rpms/logging-deployment-docker/commit/?h=rhaos-3.4-rhel-7&id=365c72ce8de21af7be63263fc8882c0a8b4ff33a

This should be available in downstream images v3.4.1.41-2 or later.
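
One way to confirm which downstream build a local image ID corresponds to (a sketch, assuming the images carry the usual "version" and "release" labels; this also addresses the tag question in Comment 2):

$ docker inspect --format '{{index .Config.Labels "version"}}-{{index .Config.Labels "release"}}' <image-id>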

Comment 9 Junqi Zhao 2017-06-27 06:37:41 UTC
The latest 3.4.1 logging-deployer image is v3.4.1.44-1; tested again and hit the same error as in Comment 6. See the attached log.

Comment 10 Junqi Zhao 2017-06-27 06:38:27 UTC
Created attachment 1292188 [details]
use logging-deployer:v3.4.1.44-1, same error

Comment 11 Junqi Zhao 2017-06-27 06:39:47 UTC
logging-deployer image
logging-deployer        3.4.1               3cfbb48d63f0        3 days ago          855.8 MB
logging-deployer        v3.4.1.44-1         3cfbb48d63f0        3 days ago          855.8 MB

Comment 12 Praveen Varma 2017-06-28 04:05:20 UTC
@Jeff - We have a situation with regard to the errata https://errata.devel.redhat.com/advisory/29143: the release date is tomorrow (29th June) and the customer has been waiting for this for quite some time. The customer has also escalated this several times, and Mustafa, Sudhir, Satish, and several others from senior management are directly involved in getting the issues taken care of for the customer. I just received an update from Xiaoli Tan that if these bugs are fixed today, we can still have the timely release tomorrow.

Thanks,
Praveen
Escalation Manager

Comment 13 Jeff Cantrill 2017-06-28 19:57:24 UTC
I am unable to reproduce the problem as described. Using both the images from #8 and #9, I was able to migrate a 3.3.1 logging cluster to 3.4.1. Steps (sketched in the commands below):

1. Deploy 3.3.1 logging in install mode (using the referenced 3.4.1 deployer)
2. Deploy 3.4.1 logging in upgrade mode
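
A minimal sketch of the two deployer invocations implied by these steps (the IMAGE_VERSION used for step 1 is an assumption, since the 3.4.1 deployer image was used to install the 3.3.1 stack):

$ oc new-app logging-deployer-template -p IMAGE_PREFIX=${image_registry}/openshift3/ -p IMAGE_VERSION=3.3.1 -p MODE=install
$ oc new-app logging-deployer-template -p IMAGE_PREFIX=${image_registry}/openshift3/ -p IMAGE_VERSION=3.4.1 -p MODE=upgrade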

Comment 14 Xia Zhao 2017-06-29 07:23:16 UTC
Verified with the latest 3.4.1 deployer image: the original issue reported here is fixed and the upgrade pod finishes successfully, but we encountered this bz again: https://bugzilla.redhat.com/show_bug.cgi?id=1446504.

Images tested with:
logging-deployer        3.4.1               3cfbb48d63f0        5 days ago          855.8 MB

# openshift version
openshift v3.4.1.44
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Setting to VERIFIED and tracking the fluentd issue separately in bz #1446504.

Comment 18 Jeff Cantrill 2017-06-30 17:57:46 UTC
The workaround is to enable console logging in the Elasticsearch configmap:

 oc edit configmap logging-elasticsearch

and change the rootLogger line to:

rootLogger: ${es.logger.level}, file, console
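
A minimal sketch of applying the workaround (the ES pod name is from the example output in the description and will differ per install; that the rootLogger line lives in the logging.yml key and that the pod must be restarted to pick up the change are assumptions):

$ oc edit configmap logging-elasticsearch -n logging
  # in the logging.yml data, change
  #   rootLogger: ${es.logger.level}, file
  # to
  #   rootLogger: ${es.logger.level}, file, console
$ oc delete pod logging-es-b14738tr-3-ia26w -n logging   # the deploymentconfig recreates the pod with the updated config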

Comment 19 Peter Portante 2017-07-01 01:22:44 UTC
FWIW, I tried this workaround today at a customer site and was not able to get it to work. The MODE=upgrade path replaces the configmap with its own, which does not have console logging, and the upgrade failed each time.

Comment 21 errata-xmlrpc 2017-07-11 10:47:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1640