Bug 1393775 - Logging upgrade to 3.4.0 failed with "Unable to find log message from cluster.service from pod logging-es-3bjvollr-4-mhyt5 within 300 seconds"
Summary: Logging upgrade to 3.4.0 failed with "Unable to find log message from cluster.service from pod logging-es-3bjvollr-4-mhyt5 within 300 seconds"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: ewolinet
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-10 10:24 UTC by Xia Zhao
Modified: 2017-03-08 18:43 UTC
CC: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:54:35 UTC
Target Upstream Version:
Embargoed:


Attachments
deployer_pod_log (179.45 KB, text/plain)
2016-11-10 10:24 UTC, Xia Zhao
no flags


Links
System ID: Red Hat Product Errata RHBA-2017:0066
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.4 RPM Release Advisory
Last Updated: 2017-01-18 17:23:26 UTC

Description Xia Zhao 2016-11-10 10:24:08 UTC
Created attachment 1219302 [details]
deployer_pod_log

Description of problem:
Upgrading the logging stack from the 3.2.0 level to the 3.4.0 level fails with:
Unable to find log message from cluster.service from pod logging-es-3bjvollr-4-mhyt5 within 300 seconds
# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-rbae8       0/1       CrashLoopBackOff   4          18m
logging-deployer-cwpmt        0/1       Error              0          22m
logging-deployer-pdkwp        0/1       Completed          0          31m
logging-es-3bjvollr-4-mhyt5   0/1       CrashLoopBackOff   8          17m
logging-fluentd-f31ok         1/1       Running            0          18m


Version-Release number of selected component (if applicable):
brew registry:
openshift3/logging-deployer        3.4.0               c364ab9c2f75

# openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


How reproducible:
Always

Steps to Reproduce:
1. Install OpenShift 3.2.0
2. Deploy logging at the 3.2.0 level with:
IMAGE_PREFIX=brew...:xxxx/openshift3/,IMAGE_VERSION=3.2.0,MODE=install

3. Upgrade the logging stack:
$ oadm policy add-cluster-role-to-user cluster-admin xiazhao
$ oc delete template logging-deployer-account-template logging-deployer-template
$ oc create -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployer/deployer.yaml
$ oc new-app logging-deployer-account-template
$ oc get template logging-deployer-template -o yaml -n logging | sed 's/\(image:\s.*\)logging-deployment\(.*\)/\1logging-deployer\2/g' | oc apply -n logging -f -
$ oc policy add-role-to-user edit --serviceaccount logging-deployer
$ oc policy add-role-to-user daemonset-admin --serviceaccount logging-deployer
$ oadm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer
$ oadm policy add-cluster-role-to-user rolebinding-reader system:serviceaccount:logging:aggregated-logging-elasticsearch
$ oc new-app logging-deployer-template -p PUBLIC_MASTER_URL=https://{master-domain}:8443,ENABLE_OPS_CLUSTER=false,IMAGE_PREFIX=brew...:xxxx/openshift3/,IMAGE_VERSION=3.4.0,ES_INSTANCE_RAM=1G,ES_CLUSTER_SIZE=1,KIBANA_HOSTNAME={kibana-route},KIBANA_OPS_HOSTNAME={kibana-ops-route},MASTER_URL=https://{master-domain}:8443,MODE=upgrade

4. Check the upgrade result (example commands below)
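
For reference, one way to check the result in step 4 (these exact commands are an assumption, not taken from this report):

$ oc get pods -n logging                         # all components should reach Running; deployer pods end as Completed
$ oc logs -f {upgrade-deployer-pod} -n logging   # follow the deployer output and look for errors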


Actual results:
The upgrade failed

Expected results:
The upgrade to 3.4.0 completes successfully

Additional info:
Deployer pod logs are attached.

Comment 1 ewolinet 2016-11-10 17:31:48 UTC
This looks to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1393769

The deployer failed while waiting for the EFK components to scale up.

The difference in the error message is that there was a window of time in which the deployer could see that the ES pod had started, but it could not find a message in the pod's logs confirming that the Elasticsearch service was available.
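
For context, the check that produces this error is essentially "poll the pod logs until an expected message appears, or give up after a timeout". A minimal sketch of that pattern in bash (an illustration only, not the actual deployer script; the pod name, message, and 300-second timeout come from the error above):

pod=logging-es-3bjvollr-4-mhyt5
message=cluster.service
timeout=300
start=$(date +%s)
# Poll the pod logs until the expected message shows up or the timeout expires.
while ! oc logs "$pod" 2>/dev/null | grep -q "$message"; do
  if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
    echo "Unable to find log message from $message from pod $pod within $timeout seconds"
    exit 1
  fi
  sleep 5
done
echo "Found expected log message from $message in pod $pod"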

Comment 4 Xia Zhao 2016-11-14 09:50:44 UTC
It's fixed. Tested with the latest 3.4.0 deployer image: the upgrade succeeded, and the Kibana and Kibana ops UIs are accessible with log entries:

$ oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-n27sm           1/1       Running     0          4m
logging-curator-ops-1-izno3       1/1       Running     0          4m
logging-deployer-o8b77            0/1       Completed   0          10m
logging-deployer-r8kpd            0/1       Completed   0          6m
logging-es-flruj8ta-4-4gnaz       1/1       Running     0          4m
logging-es-ops-rpbmoj63-4-mgqhe   1/1       Running     0          4m
logging-fluentd-qxxjh             1/1       Running     0          4m
logging-kibana-2-j5ohm            2/2       Running     0          3m
logging-kibana-ops-3-9pr69        2/2       Running     0          3m

I'm not sure why the upgrade pod's log output was cut short with a "short write" error:

$ oc logs -f logging-deployer-r8kpd
++ oc get dc -l logging-infra=elasticsearch -o 'jsonpath={.items[*].metadata.name}'
+ for dc in '$(oc get dc -l $label -o jsonpath='\''{.items[*].metadata.name}'\'')'
+ patchDCImage logging-es-flruj8ta logging-elasticsearch false
+ local dc=logging-es-flruj8ta
+ local image=logging-elasticsearch
+ local kibana=false
++ oc get dc/logging-es-flruj8ta -o 'jsonpath={.status.latestVersion}'
+ local version=1
+ local authProxy_patch
+ '[' false = true ']'
+ patchIfValid dc/logging-es-flruj8ta
'{.spec.template.spec.containers[0].image}=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.0
'
error: short write
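
For context, the step being traced above updates the Elasticsearch DeploymentConfig to the 3.4.0 image. A rough manual equivalent of what the deployer's patchIfValid call is doing (a sketch only; the JSON-patch form and the container index are assumptions based on the trace, and the image value is copied from it):

# Assumed equivalent of the traced patch: point the first container of the ES
# DeploymentConfig at the 3.4.0 image.
oc patch dc/logging-es-flruj8ta --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/image",
   "value": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.0"}
]'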


# openshift version
openshift v3.4.0.25+1f36858
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


Images tested with:
brew....:xxxx/openshift3/logging-deployer        3.4.0               08eaf2753130        2 days ago          764.3 MB

Comment 5 ewolinet 2016-12-12 15:48:12 UTC
Prerelease issue, no docs needed.

Comment 7 errata-xmlrpc 2017-01-18 12:54:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

