Created attachment 1287549 [details]
upgrade log

Description of problem:
Similar issue: https://bugzilla.redhat.com/show_bug.cgi?id=1393775

Upgrading the logging stack from the 3.3.1 level to the 3.4.1 level failed with:

Unable to find log message from cluster.service from pod logging-es-b14738tr-3-ia26w within 300 seconds:

$ oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-0rauv        0/1       Completed   0          30m
logging-deployer-d8yqu        0/1       Error       0          17m
logging-es-b14738tr-3-ia26w   1/1       Running     0          10m

Version-Release number of selected component (if applicable):
openshift3/logging-deployer    3.4.1    df8b49eaca4f    5 days ago    886.3 MB
(I have no way to know the exact image tag in format "3.4.1-xx" due to https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c19)

# openshift version
openshift v3.4.1.33
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
100%

Steps to Reproduce:
1. Install openshift 3.4.1
2. Deploy logging at the 3.3.1 level with a dynamic PV bound to ES, log in to kibana and make sure log entries are visible there.
3. Upgrade the logging stack to 3.4.1:
$ oadm policy add-cluster-role-to-user cluster-admin xiazhao
$ oc delete template logging-deployer-account-template logging-deployer-template
$ oc create -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployer/deployer.yaml
$ oc new-app logging-deployer-account-template
$ oc get template logging-deployer-template -o yaml -n logging | sed 's/\(image:\s.*\)logging-deployment\(.*\)/\1logging-deployer\2/g' | oc apply -n logging -f -
$ oc policy add-role-to-user edit --serviceaccount logging-deployer
$ oc policy add-role-to-user daemonset-admin --serviceaccount logging-deployer
$ oc adm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer
$ oc adm policy add-cluster-role-to-user rolebinding-reader system:serviceaccount:logging:aggregated-logging-elasticsearch
$ oc new-app logging-deployer-template -p IMAGE_PREFIX=${image_registry}/openshift3/ -p IMAGE_VERSION=3.4.1 -p MODE=upgrade
4. Check the upgrade result

Actual results:
4. Upgrade failed

Expected results:
Upgraded to 3.4.1 successfully

Additional info:
deployer pod logs attached
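The failure reported above is a timed wait for a log message. As a rough illustration of that kind of check, and not the deployer's actual code, the sketch below polls an arbitrary command until its output contains a pattern or a timeout expires. The function name wait_for_message and its argument order are hypothetical.

```shell
# Hypothetical helper (not taken from the deployer): run a command once per
# second until its output contains PATTERN or TIMEOUT seconds have passed.
# Usage: wait_for_message PATTERN TIMEOUT COMMAND [ARGS...]
wait_for_message() {
  pattern=$1
  timeout=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Succeed as soon as the command output contains the pattern.
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  # Pattern never appeared within the timeout.
  return 1
}
```

For example, `wait_for_message 'cluster.service' 300 oc logs logging-es-b14738tr-3-ia26w -n logging` would check the pod's console output by hand. If Elasticsearch logs only to a file, that check would presumably time out in the same way the deployer does.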
This is a regression since this image: openshift3/logging-deployer 3.4.1 dcee53833a87 according to https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c5
@Jeff, could you please take a look at the problem in https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c19, and show me how to find the exact image tag in the "3.4.1-xx" format from the image ID in the CLI output of "docker images"? Thanks in advance!

Thanks,
Xia
Tested this issue following the steps in Comment 0, but the upgrade still failed at "cat: /elasticsearch/logging-es/logs/logging-es.log: No such file or directory"; see the upgrade log.

# oc get po -n logging
NAME                              READY     STATUS      RESTARTS   AGE
logging-deployer-03py4            0/1       Completed   0          44m
logging-deployer-v1g6h            0/1       Error       0          19m
logging-es-ops-op2ua0y1-3-pd1si   1/1       Running     0          12m
logging-es-r84qbjr4-3-qrzhn       1/1       Running     0          12m

Images from brew registry
# docker images | grep logging
logging-deployer        3.4.1   80ca9c90d261   40 hours ago   857.5 MB
logging-kibana          3.4.1   0c2759ddfcd9   40 hours ago   338.8 MB
logging-elasticsearch   3.4.1   2240ae237369   40 hours ago   399.6 MB
logging-fluentd         3.4.1   059b92a39419   40 hours ago   232.7 MB
logging-curator         3.4.1   46fd26ad9a8b   40 hours ago   244.5 MB
logging-auth-proxy      3.4.1   990787824baf   40 hours ago   215.3 MB
Created attachment 1290917 [details] upgrade log, issue not fixed
I believe this is resolved in upstream:
https://github.com/openshift/origin-aggregated-logging/commit/84a5c99b46ba5819811c7dfed65a1cc8fb505b43

and downstream:
http://pkgs.devel.redhat.com/cgit/rpms/logging-deployment-docker/commit/?h=rhaos-3.4-rhel-7&id=365c72ce8de21af7be63263fc8882c0a8b4ff33a

Should be available in downstream images of v3.4.1.41-2 or later
The latest 3.4.1 logging-deployer image is v3.4.1.44-1. Tested again and got the same error as in Comment 6; see the attached log.
Created attachment 1292188 [details] use logging-deployer:v3.4.1.44-1, same error
logging-deployer image
logging-deployer   3.4.1         3cfbb48d63f0   3 days ago   855.8 MB
logging-deployer   v3.4.1.44-1   3cfbb48d63f0   3 days ago   855.8 MB
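The listing above shows two tags sharing one image ID, which is one way to map a floating "3.4.1" tag back to its versioned tag. A minimal sketch of that lookup follows, assuming the default `docker images` column layout (REPOSITORY, TAG, IMAGE ID, ...) and that the versioned tag has also been pulled locally. The function name tags_for_image_id is hypothetical.

```shell
# Hypothetical helper: read `docker images` output on stdin and print every
# REPOSITORY:TAG pair whose IMAGE ID (third column) equals the given ID.
tags_for_image_id() {
  awk -v id="$1" '$3 == id { print $1 ":" $2 }'
}
```

Usage would be `docker images | tags_for_image_id 3cfbb48d63f0`. If only the floating tag exists on the host, only that tag is printed, so this does not help in the situation described in bug 1446504#c19.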
@Jeff - We have a situation here with regard to the errata https://errata.devel.redhat.com/advisory/29143: the release date is tomorrow (29th June) and the customer has been waiting for this for quite some time. The customer has also escalated this several times, and Mustafa, Sudhir, Satish and many others from senior management are directly involved in getting the issues taken care of for the customer.

I just received an update from Xiaoli Tan that if these bugs are fixed today, we could still release on time tomorrow.

Thanks,
Praveen
Escalation Manager
I am unable to reproduce the problem as described. Using the images from both #8 and #9, I was able to migrate a 3.3.1 cluster to 3.4.1.

Steps:
1. deploy 3.3.1 logging in install mode - used the referenced 3.4.1 deployer
2. deploy 3.4.1 logging in upgrade mode
Verified with the latest 3.4.1 deployer image: the original issue reported here is fixed and the upgrade pod finishes successfully, but I encountered this bz again: https://bugzilla.redhat.com/show_bug.cgi?id=1446504.

Images tested with:
logging-deployer   3.4.1   3cfbb48d63f0   5 days ago   855.8 MB

# openshift version
openshift v3.4.1.44
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Set to verified; the fluentd issue is tracked separately in bz #1446504.
Workaround is to enable console logging:

oc edit configmap logging-elasticsearch

rootLogger: ${es.logger.level}, file, console
FWIW, I tried this work-around today at a customer site and was not successful in getting it to work. The MODE=upgrade path would replace the configmap with its own, which does not have console logging, and the upgrade failed each time.
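Because the upgrade path can replace the configmap, it may help to check whether the console appender is still present before and after a run. The sketch below is one hedged way to do that: it reads configuration text on stdin and succeeds only if a rootLogger line lists "console". The function name has_console_appender is hypothetical, and the check assumes the configmap YAML prints the logging configuration as plain, possibly indented, lines.

```shell
# Hypothetical helper: succeed only if stdin contains a rootLogger line
# that lists the "console" appender (leading indentation is allowed).
has_console_appender() {
  grep -Eq '^[[:space:]]*rootLogger:.*[,[:space:]]console[[:space:]]*(,|$)'
}
```

A possible use is `oc get configmap logging-elasticsearch -n logging -o yaml | has_console_appender && echo "console logging enabled"`. If the output escapes the embedded file as a single quoted string, the line-based match would not apply.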
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1640