Bug 1446499 - Logging upgrade failed, can't recover logging after upgrade to OCP 3.4
Summary: Logging upgrade failed, can't recover logging after upgrade to OCP 3.4
Keywords:
Status: CLOSED DUPLICATE of bug 1439356
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Peter Portante
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-28 08:47 UTC by Miheer Salunke
Modified: 2020-06-11 13:43 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-11 17:33:53 UTC
Target Upstream Version:
Embargoed:


Attachments

Comment 1 Miheer Salunke 2017-04-28 08:53:52 UTC
Description of problem:
We tried to upgrade OCP 3.4 (OS + infra + logging + metrics) to the latest version. By following these guides we managed to successfully upgrade the OS and the OCP infrastructure:

https://docs.openshift.com/container-platform/3.4/install_config/upgrading/os_upgrades.html
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/automated_upgrades.html

However, when trying to upgrade EFK as per

https://docs.openshift.com/container-platform/3.4/install_config/upgrading/manual_upgrades.html#manual-upgrading-efk-logging-stack

we hit

https://bugzilla.redhat.com/show_bug.cgi?id=1439356

The workaround suggested there was not helpful, and when retesting the upgrade something else may have gone wrong as well, because now the Elasticsearch pod won't start at all.

We need help recovering logging and upgrading it to the latest version.

# rpm -qa | grep openshift | sort
atomic-openshift-3.4.1.18-1.git.0.0f9d380.el7.x86_64
atomic-openshift-clients-3.4.1.18-1.git.0.0f9d380.el7.x86_64
openshift-ansible-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-callback-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-docs-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-filter-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-lookup-plugins-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-playbooks-3.4.79-1.git.0.6faa668.el7.noarch
openshift-ansible-roles-3.4.79-1.git.0.6faa668.el7.noarch

# rpm -qa | grep openshift | xargs rpm -V


We're using a locally created "root" user who is a cluster-admin:

# oc whoami
root

# oc project logging
Already on project "logging" on server "https://masterlb.example.com:8443".

# oc apply -n openshift -f /usr/share/ansible/openshift-ansible/roles/openshift_hosted_templates/files/v1.4/enterprise/logging-deployer.yaml
template "logging-deployer-account-template" configured
template "logging-deployer-template" configured

# oc process logging-deployer-account-template | oc apply -f -
serviceaccount "logging-deployer" configured
serviceaccount "aggregated-logging-kibana" configured
serviceaccount "aggregated-logging-elasticsearch" configured
serviceaccount "aggregated-logging-fluentd" configured
serviceaccount "aggregated-logging-curator" configured
clusterrole "oauth-editor" configured
clusterrole "daemonset-admin" configured
clusterrole "rolebinding-reader" configured
rolebinding "logging-deployer-edit-role" configured
rolebinding "logging-deployer-dsadmin-role" configured
rolebinding "logging-elasticsearch-view-role" configured

# oadm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer

# oadm policy add-cluster-role-to-user rolebinding-reader system:serviceaccount:logging:aggregated-logging-elasticsearch

# oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=3.4.1
--> Deploying template "logging/logging-deployer-template" to project logging

     logging-deployer-template
     ---------
     Template for running the aggregated logging deployer in a pod. Requires empowered 'logging-deployer' service account.

     * With parameters:
        * MODE=upgrade
        * IMAGE_PREFIX=epor.netact.nsn-rdnet.net:5000/openshift3/
        * IMAGE_VERSION=3.4.1
...
--> Success
    Run 'oc status' to view your app.

# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-6l06y   0/1       Error     0          20m
logging-deployer-7co20   1/1       Running   0          <invalid>
# oc delete pod logging-deployer-6l06y
pod "logging-deployer-6l06y" deleted

# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-7co20   1/1       Running   0          2s

# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-7co20   0/1       Error     0          1m
logging-fluentd-u09pc    1/1       Running   0          15s
logging-fluentd-yhqbr    1/1       Running   0          15s

# oc logs logging-deployer-7co20 | tail -n 20
Recreating ES configmap
configmap "logging-elasticsearch" created
configmap "logging-elasticsearch" labeled
Adding downward API NAMESPACE var to ES and updating config mountPath
"logging-es-ghuflqyv" patched
Started deployment #12
Use 'oc logs -f dc/logging-es-ghuflqyv' to track its progress.
--> Deploying template "logging/logging-fluentd-template" to project logging
     logging-fluentd-template
     ---------
     Template for logging fluentd deployment.
     * With parameters:
        * IMAGE_PREFIX=registry.cloudapps.example.com:5000/openshift3/
        * IMAGE_VERSION=3.4.1
--> Creating resources ...
    daemonset "logging-fluentd" created
--> Success
    Run 'oc status' to view your app.
No Elasticsearch pods found running.  Cannot update common data model.
Scale up ES prior to running with MODE=migrate
 

Trying with MODE=migrate will just cause the deployer pod to fail.

So to recap: we don't know what is wrong, how to recover, or how to achieve the original goal of upgrading logging on OCP 3.4.


Version-Release number of selected component (if applicable):
OCP 3.4 (Upgrade)

How reproducible:
Reproducible on the customer side.

Steps to Reproduce:
1. Follow the upgrade steps mentioned in the description.

Actual results:
Upgrade of logging fails.

Expected results:
Upgrade of logging should succeed.

Additional info:

Comment 3 Peter Portante 2017-05-01 18:18:27 UTC
This appears to be an upgrade from 3.3 to 3.4.1, correct?

Can you attach all of the deployer logs to this BZ for inspection?

Comment 6 Jeff Cantrill 2017-05-11 16:01:32 UTC
The workaround is:

oc edit configmap logging-elasticsearch

and make the minimum master nodes setting the same as in: https://github.com/openshift/origin-aggregated-logging/commit/56793ee729196f4b3c43769141f2b78df47d1a39#diff-bb1ad2cd0b762aef9387d3b94258a8a9R32
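For illustration, a sketch of what the relevant fragment of the configmap should contain after the edit. This is an assumption-laden sketch, not the exact upstream content: discovery.zen.minimum_master_nodes is the standard Elasticsearch 2.x setting the linked commit touches, and the value shown assumes a single-node ES cluster (in general it should be the node quorum, cluster size / 2 + 1, integer division).

```yaml
# Hypothetical fragment of the elasticsearch.yml data in the
# logging-elasticsearch configmap (reachable via:
#   oc edit configmap logging-elasticsearch).
# minimum_master_nodes should be the node quorum, i.e.
# (number of ES cluster nodes / 2) + 1; for a one-node cluster that is 1.
discovery:
  zen:
    minimum_master_nodes: 1
```

After saving the configmap, the Elasticsearch deployment config has to be redeployed (e.g. `oc deploy logging-es-ghuflqyv --latest`, using the dc name from the deployer log above) for the new setting to take effect.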

Comment 8 Jeff Cantrill 2017-05-11 17:26:50 UTC
I stand corrected.  The fix for this issue is available with the release of 3.4.1-19, which needs to be pushed out.

Comment 9 Jeff Cantrill 2017-05-11 17:33:53 UTC

*** This bug has been marked as a duplicate of bug 1439356 ***

