Bug 1461294
Summary: Logging upgrade to 3.4.1 failed by "Unable to find log message from cluster.service from pod logging-es-b14738tr-3-ia26w within 300 seconds"

Product: OpenShift Container Platform
Component: Logging
Version: 3.4.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Xia Zhao <xiazhao>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Xia Zhao <xiazhao>
CC: aos-bugs, jcantril, juzhao, nnosenzo, pportant, pvarma, rkharwar, rmeggins
Target Milestone: ---
Target Release: 3.4.z
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: When Elasticsearch logging is not configured with console logging, the message used to determine that the cluster is available is not written to the logs returned by 'oc logs'.
Consequence: The run.sh script times out and exits while looking for the log message.
Fix: Evaluate the logging configuration to determine where to look for the cluster.service message.
Result: The run.sh script finds the desired message and continues to start the cluster.
Story Points: ---
Last Closed: 2017-07-11 10:47:38 UTC
Type: Bug
Description
Xia Zhao
2017-06-14 07:20:12 UTC
This is a regression since this image: openshift3/logging-deployer 3.4.1 dcee53833a87, according to https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c5

@Jeff, could you please help take a look at the problem in https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c19, and explain how to find the exact image tag in the "3.4.1-xx" format from an image ID in the output of "docker images"? Thanks in advance!

Thanks,
Xia

Verified this issue according to the steps in Comment 0, but it still failed at "cat: /elasticsearch/logging-es/logs/logging-es.log: No such file or directory"; see the upgrade log.

# oc get po -n logging
NAME                              READY     STATUS      RESTARTS   AGE
logging-deployer-03py4            0/1       Completed   0          44m
logging-deployer-v1g6h            0/1       Error       0          19m
logging-es-ops-op2ua0y1-3-pd1si   1/1       Running     0          12m
logging-es-r84qbjr4-3-qrzhn      1/1       Running     0          12m

Images from the brew registry:

# docker images | grep logging
logging-deployer        3.4.1   80ca9c90d261   40 hours ago   857.5 MB
logging-kibana          3.4.1   0c2759ddfcd9   40 hours ago   338.8 MB
logging-elasticsearch   3.4.1   2240ae237369   40 hours ago   399.6 MB
logging-fluentd         3.4.1   059b92a39419   40 hours ago   232.7 MB
logging-curator         3.4.1   46fd26ad9a8b   40 hours ago   244.5 MB
logging-auth-proxy      3.4.1   990787824baf   40 hours ago   215.3 MB

Created attachment 1290917 [details]
upgrade log, issue not fixed
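Regarding the question above about mapping an image ID back to a "3.4.1-xx" style tag: every tag that points at an ID appears as its own row in `docker images`, so filtering on the ID column lists them all. A hedged sketch follows; the sample output is hard-coded from rows quoted later in this report so the filtering itself can be run without a docker daemon, and on a live host you would instead pipe real output from `docker images` (or use `docker inspect --format '{{.RepoTags}}' <id>`).

```shell
#!/bin/sh
# Sample `docker images` rows (REPOSITORY TAG IMAGE-ID ...), hard-coded
# here from a later comment in this report so the filter is runnable
# without a docker daemon.
sample_images() {
  cat <<'EOF'
logging-deployer 3.4.1 3cfbb48d63f0 3 days ago 855.8 MB
logging-deployer v3.4.1.44-1 3cfbb48d63f0 3 days ago 855.8 MB
logging-kibana 3.4.1 0c2759ddfcd9 40 hours ago 338.8 MB
EOF
}

# Print every REPOSITORY:TAG whose IMAGE ID column matches $1.
tags_for_id() {
  sample_images | awk -v id="$1" '$3 == id { print $1 ":" $2 }'
}

tags_for_id 3cfbb48d63f0
# prints:
#   logging-deployer:3.4.1
#   logging-deployer:v3.4.1.44-1
# On a real host, the same filter reads live data, e.g.:
#   docker images --format '{{.Repository}} {{.Tag}} {{.ID}}' | awk ...
```

The point being that "3.4.1" and "v3.4.1.44-1" can be the very same image: same ID, two tags.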
I believe this is resolved upstream: https://github.com/openshift/origin-aggregated-logging/commit/84a5c99b46ba5819811c7dfed65a1cc8fb505b43 and downstream: http://pkgs.devel.redhat.com/cgit/rpms/logging-deployment-docker/commit/?h=rhaos-3.4-rhel-7&id=365c72ce8de21af7be63263fc8882c0a8b4ff33a
Should be available in downstream images of v3.4.1.41-2 or later.

The latest 3.4.1 logging-deployer image is v3.4.1.44-1; tested again, same error as Comment 6. See the attached log.

Created attachment 1292188 [details]
use logging-deployer:v3.4.1.44-1, same error
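The "within 300 seconds" failure in the summary is a timed poll: run.sh waits for the cluster.service recovery message before continuing, and the fix chooses where to look for it based on the logging configuration. A minimal sketch of both ideas, with all names and structure as illustrative stand-ins rather than the actual run.sh code:

```shell
#!/bin/sh
# Illustrative sketch only, not the real run.sh. Poll a log source until
# a marker message appears or a timeout expires.
wait_for_message() {
  source_cmd=$1   # command whose output is the current log contents
                  # (relies on simple word-splitting, e.g. "cat /path/to/log")
  marker=$2       # message to wait for, e.g. "cluster.service"
  timeout=$3      # seconds before giving up
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if $source_cmd | grep -q "$marker"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# The fix's idea: if the root logger routes to the console appender, the
# message shows up in `oc logs <es-pod>`; otherwise it only lands in the
# log file inside the pod.
log_source_for() {
  config=$1   # path to a logging.yml-style config
  if grep -q 'rootLogger:.*console' "$config"; then
    echo "console"   # would poll: oc logs <es-pod>
  else
    echo "file"      # would poll: cat /elasticsearch/logging-es/logs/logging-es.log
  fi
}
```

This also explains the "cat: /elasticsearch/logging-es/logs/logging-es.log: No such file or directory" symptom above: polling the wrong source either times out or fails outright.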
logging-deployer image:

logging-deployer   3.4.1         3cfbb48d63f0   3 days ago   855.8 MB
logging-deployer   v3.4.1.44-1   3cfbb48d63f0   3 days ago   855.8 MB

@Jeff - We have a situation here with regard to the errata https://errata.devel.redhat.com/advisory/29143: the release date is tomorrow (29th June) and the customer has been waiting for this fix for quite some time. The customer has also escalated this several times, and Mustafa, Sudhir, Satish, and many others from senior management are directly involved in getting these issues taken care of for the customer. I just received an update from Xiaoli Tan that if these bugs are fixed today, we could still have the timely release tomorrow.

Thanks,
Praveen
Escalation Manager

I am unable to reproduce the problem as described. Using the images from #8 and #9, I was able to migrate a 3.3.1 cluster to 3.4.1. Steps:
1. Deploy 3.3.1 logging in install mode (used the referenced 3.4.1 deployer).
2. Deploy 3.4.1 logging in upgrade mode.

Verified with the latest 3.4.1 deployer image: the original issue reported here is fixed and the upgrade pod finishes successfully, but we encounter this bz again: https://bugzilla.redhat.com/show_bug.cgi?id=1446504. Images tested with:

logging-deployer   3.4.1   3cfbb48d63f0   5 days ago   855.8 MB

# openshift version
openshift v3.4.1.44
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Set to VERIFIED; tracking the fluentd issue separately in bz #1446504.

Workaround is to enable console logging:

oc edit configmap logging-elasticsearch
rootLogger: ${es.logger.level}, file, console

FWIW, I tried this workaround today at a customer site and was not successful in getting it to work. The MODE=upgrade path replaces the configmap with its own, which does not have console logging, and the upgrade failed each time.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1640
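For reference, the console-logging workaround discussed above edits the Elasticsearch configmap; in this image the root-logger setting is assumed to live in the configmap's logging.yml entry. After `oc edit configmap logging-elasticsearch -n logging`, the edited line should end up looking roughly like:

```yaml
# In the logging.yml data entry of the logging-elasticsearch configmap:
# adding "console" makes the cluster.service message visible via `oc logs`.
rootLogger: ${es.logger.level}, file, console
```

Note the caveat reported above: a MODE=upgrade run replaces this configmap with its own copy, so the edit does not survive the upgrade path.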