Description of problem:

We tried to upgrade OCP 3.4 (OS + infra + logging + metrics) to the latest version. By following these guides we managed to successfully upgrade the OS and the OCP infrastructure:

https://docs.openshift.com/container-platform/3.4/install_config/upgrading/os_upgrades.html
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/automated_upgrades.html

However, when trying to upgrade EFK as per
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/manual_upgrades.html#manual-upgrading-efk-logging-stack
we hit https://bugzilla.redhat.com/show_bug.cgi?id=1439356

The workaround suggested there was not helpful, and when retesting the upgrade something else may have gone wrong as well, because now the Elasticsearch pod won't start at all. We need help with how to recover logging and upgrade it to the latest version.

# rpm -qa | grep openshift | sort
atomic-openshift-3.4.1.18-1.git.0.0f9d380.el7.x86_64
atomic-openshift-clients-3.4.1.18-1.git.0.0f9d380.el7.x86_64
openshift-ansible-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-callback-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-docs-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-filter-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-lookup-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-playbooks-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-roles-3.4.79-1.git.0.6faa668.el7.noarch

# rpm -qa | grep openshift | xargs rpm -V

We're using a locally created "root" user who is a cluster-admin:

# oc whoami
root

# oc project logging
Already on project "logging" on server "https://masterlb.example.com:8443".

# oc apply -n openshift -f /usr/share/ansible/openshift-ansible/roles/openshift_hosted_templates/files/v1.4/enterprise/logging-deployer.yaml
template "logging-deployer-account-template" configured
template "logging-deployer-template" configured

# oc process logging-deployer-account-template | oc apply -f -
serviceaccount "logging-deployer" configured
serviceaccount "aggregated-logging-kibana" configured
serviceaccount "aggregated-logging-elasticsearch" configured
serviceaccount "aggregated-logging-fluentd" configured
serviceaccount "aggregated-logging-curator" configured
clusterrole "oauth-editor" configured
clusterrole "daemonset-admin" configured
clusterrole "rolebinding-reader" configured
rolebinding "logging-deployer-edit-role" configured
rolebinding "logging-deployer-dsadmin-role" configured
rolebinding "logging-elasticsearch-view-role" configured

# oadm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer
# oadm policy add-cluster-role-to-user rolebinding-reader system:serviceaccount:logging:aggregated-logging-elasticsearch

# oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=3.4.1
--> Deploying template "logging/logging-deployer-template" to project logging

     logging-deployer-template
     ---------
     Template for running the aggregated logging deployer in a pod. Requires empowered 'logging-deployer' service account.

     * With parameters:
        * MODE=upgrade
        * IMAGE_PREFIX=epor.netact.nsn-rdnet.net:5000/openshift3/
        * IMAGE_VERSION=3.4.1
        ...

--> Success
    Run 'oc status' to view your app.
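While the deployer runs, its progress can be followed with standard oc commands (a sketch; the pod name is the one that appears in the listings below):

# oc get pods -w
# oc logs -f logging-deployer-7co20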
# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-6l06y   0/1       Error     0          20m
logging-deployer-7co20   1/1       Running   0          <invalid>

# oc delete pod logging-deployer-6l06y
pod "logging-deployer-6l06y" deleted

# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-7co20   1/1       Running   0          2s

# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-7co20   0/1       Error     0          1m
logging-fluentd-u09pc    1/1       Running   0          15s
logging-fluentd-yhqbr    1/1       Running   0          15s

# oc logs logging-deployer-7co20 | tail -n 20
Recreating ES configmap
configmap "logging-elasticsearch" created
configmap "logging-elasticsearch" labeled
Adding downward API NAMESPACE var to ES and updating config mountPath
"logging-es-ghuflqyv" patched
Started deployment #12
Use 'oc logs -f dc/logging-es-ghuflqyv' to track its progress.
--> Deploying template "logging/logging-fluentd-template" to project logging

     logging-fluentd-template
     ---------
     Template for logging fluentd deployment.

     * With parameters:
        * IMAGE_PREFIX=registry.cloudapps.example.com:5000/openshift3/
        * IMAGE_VERSION=3.4.1

--> Creating resources ...
    daemonset "logging-fluentd" created
--> Success
    Run 'oc status' to view your app.
No Elasticsearch pods found running. Cannot update common data model.
Scale up ES prior to running with MODE=migrate

Retrying with MODE=migrate just causes the deployer pod to fail again.

So to recap: we don't know what is wrong, how to recover, or how to achieve the original goal of upgrading logging on OCP 3.4.

Version-Release number of selected component (if applicable):
OCP 3.4 (Upgrade)

How reproducible:
On customer side

Steps to Reproduce:
1. Mentioned in the description
2.
3.

Actual results:
Upgrade of logging fails.

Expected results:
Upgrade of logging should succeed.

Additional info:
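If further diagnostics are needed, they can be gathered with standard oc commands (a sketch; the DC name logging-es-ghuflqyv is taken from the deployer output above, and component=es is the label the logging deployer applies to the Elasticsearch pods):

# oc get dc -n logging
# oc describe pods -n logging -l component=es
# oc logs dc/logging-es-ghuflqyv -n logging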
This appears to be an upgrade from 3.3 to 3.4.1, correct? Can you attach all of the deployer logs to this BZ for inspection?
The workaround is:

# oc edit configmap logging-elasticsearch

and make the minimum_master_nodes setting the same as in:
https://github.com/openshift/origin-aggregated-logging/commit/56793ee729196f4b3c43769141f2b78df47d1a39#diff-bb1ad2cd0b762aef9387d3b94258a8a9R32
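For illustration, a sketch of how the discovery section of elasticsearch.yml in that configmap should end up (this assumes the value at the linked line is the quorum placeholder; discovery.zen.minimum_master_nodes is Elasticsearch's standard quorum setting and should equal N/2 + 1 for N master-eligible nodes):

discovery:
  zen:
    ping.multicast.enabled: false
    minimum_master_nodes: ${NODE_QUORUM}

After saving, trigger a new Elasticsearch deployment so the change is picked up, e.g.:

# oc deploy logging-es-ghuflqyv --latest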
I stand corrected. The fix for this issue is available in the 3.4.1-19 release, which still needs to be pushed out.
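Once 3.4.1-19 is pushed out, the updated RPMs and images can be checked for and pulled with standard tooling, for example (a sketch; <IMAGE_PREFIX> stands for the local registry prefix already shown in this report):

# yum list available atomic-openshift
# docker pull <IMAGE_PREFIX>logging-elasticsearch:3.4.1

and the upgrade re-run as before:

# oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=3.4.1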
*** This bug has been marked as a duplicate of bug 1439356 ***