Bug 1435144 - [Intservice_public_324] Logging upgrade from 3.4.1 to 3.5.0 failed because "No Elasticsearch pods found running. Cannot update common data model."
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Duplicates: 1435176
Depends On:
Blocks:
Reported: 2017-03-23 09:24 UTC by Xia Zhao
Modified: 2017-11-28 21:53 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 21:53:01 UTC
Target Upstream Version:


Attachments
The upgrade log with error "Cannot update common data model" (2.78 MB, text/plain)
2017-03-23 09:24 UTC, Xia Zhao
inventory file used for logging upgrade (874 bytes, text/plain)
2017-03-23 09:26 UTC, Xia Zhao
upgrade logging from 3.4.1 to 3.5.0 log (1.96 MB, text/plain)
2017-03-27 03:10 UTC, Junqi Zhao
ansible inventory file (930 bytes, text/plain)
2017-03-27 03:10 UTC, Junqi Zhao
upgrade logging from 3.4.1 to 3.5.0 log (1.98 MB, text/plain)
2017-03-30 07:39 UTC, Junqi Zhao


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Xia Zhao 2017-03-23 09:24:27 UTC
Created attachment 1265656 [details]
The upgrade log with error "Cannot update common data model"

Description of problem:
I ran the logging upgrade from 3.4.1 to 3.5.0 twice today:
-- The 1st attempt failed while the ansible script was replacing the curator dc
-- The 2nd attempt failed because "No Elasticsearch pods found running.  Cannot update common data model."

Version-Release number of selected component (if applicable):
openshift-ansible-3.5.41-1.git.0.e33897c.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install the logging 3.4.1 stack on an OCP 3.5.0 master, and attach elasticsearch to a HostPath PV
2. Visit kibana route before upgrade
3. Upgrade logging stacks to 3.5.0 by using ansible scripts (inventory file attached)

Actual results:
2. Kibana route accessible with log entries
3. Ansible script failed when replacing curator dc

Expected results:
3. Should upgrade successfully

Additional info:
Ansible log and upgrade inventory file attached
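
For readers unfamiliar with the error in the title, here is a minimal sketch of the kind of check that could produce "No Elasticsearch pods found running". It assumes pod data shaped like `oc get pods -o json` output and assumes the Elasticsearch pods carry the label `component=es`. The function name, the sample pod names, and the label are illustrative assumptions and are not taken from the openshift_logging role.

```python
import json


def running_es_pods(pod_list_json, component="es"):
    """Return names of pods labelled component=<component> whose phase is Running."""
    pods = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod["metadata"].get("labels", {}).get("component") == component
        and pod.get("status", {}).get("phase") == "Running"
    ]


# Hypothetical sample: the ES pod exists but is still Pending, so the check
# finds nothing even though the deployment is present.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "logging-es-example", "labels": {"component": "es"}},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "logging-kibana-example", "labels": {"component": "kibana"}},
     "status": {"phase": "Running"}},
]})

if __name__ == "__main__":
    if not running_es_pods(SAMPLE):
        print("No Elasticsearch pods found running.")
```

A check like this fails whenever the Elasticsearch pods are scaled down or still starting at the moment it runs, which would be consistent with the error not appearing on every attempt.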

Comment 1 Xia Zhao 2017-03-23 09:26:13 UTC
Created attachment 1265658 [details]
inventory file used for logging upgrade

Comment 2 Xia Zhao 2017-03-23 09:33:44 UTC
Here is the tail of the log from the 1st failure, when the ansible script was replacing the curator dc. It did not reproduce in my 2nd attempt, so I am not able to attach the full log:

failed: [$master] (item=logging-curator) => {
    "failed": true, 
    "invocation": {
        "module_args": {
            "debug": false, 
            "kind": "dc", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "name": "logging-curator", 
            "namespace": "logging", 
            "replicas": 0, 
            "state": "present"
        }, 
        "module_name": "oc_scale"
    }, 
    "object": "logging-curator"
}

MSG:

{u'cmd': u'/usr/bin/oc replace -f /tmp/logging-curator-Gy41SY -n logging', u'returncode': 1, u'results': {}, u'stderr': u'Error from server (Conflict): error when replacing "/tmp/logging-curator-Gy41SY": Operation cannot be fulfilled on deploymentconfigs "logging-curator": the object has been modified; please apply your changes to the latest version and try again\n', u'stdout': u''}


RUNNING HANDLER [openshift_logging : restart master] ***************************
    to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/openshift_logging.retry

PLAY RECAP *********************************************************************
$master : ok=254  changed=52   unreachable=0    failed=1

Comment 3 Xia Zhao 2017-03-23 10:32:44 UTC
The 1st failure in comment #2 was reproduced with the latest code from

https://github.com/openshift/openshift-ansible -b release-1.5

It's now tracked separately in this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1435176

Comment 4 Xia Zhao 2017-03-23 10:39:04 UTC
I'm removing the keyword "regression": I confirmed with my colleague that upgrading from 3.4.1 to 3.5.0 with a hostPath PV attached had not been tested before, so this may not really be a regression.

Comment 5 Jeff Cantrill 2017-03-23 12:38:33 UTC
@Eric, I have seen issues similar to #2 where some object is modified in, I think, an 'oc_apply' step.  Do you have any indication of what the problem might be?  My workaround has been to uninstall the logging stack and re-install.  This does not necessarily seem to be a blocker: as a workaround, could we not advise users to uninstall/reinstall and maybe run the upgrade manually?
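
The Conflict error in comment #2 ("the object has been modified; please apply your changes to the latest version and try again") is an optimistic-concurrency failure: the replace was sent with a stale copy of the object. A common way to handle this is to re-read the object and retry. The sketch below illustrates that pattern against an in-memory stand-in; `FakeStore`, `ConflictError`, and `scale_with_retry` are hypothetical names, and this is not a description of how the oc_scale module behaves.

```python
import copy


class ConflictError(Exception):
    """Raised when the stored object changed after the caller read it."""


class FakeStore:
    """In-memory stand-in for an API server that checks resourceVersion."""

    def __init__(self, obj):
        self._obj = dict(obj, resourceVersion=1)

    def get(self):
        return copy.deepcopy(self._obj)

    def touch(self):
        # Simulates another client (for example a controller) updating the object.
        self._obj["resourceVersion"] += 1

    def replace(self, obj):
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise ConflictError("the object has been modified")
        self._obj = dict(obj, resourceVersion=obj["resourceVersion"] + 1)


def scale_with_retry(store, replicas, attempts=3):
    """Read, modify, and replace; on conflict, re-read and try again.

    Returns the number of attempts used.
    """
    for attempt in range(1, attempts + 1):
        obj = store.get()          # always start from the latest version
        obj["replicas"] = replicas
        try:
            store.replace(obj)
            return attempt
        except ConflictError:
            if attempt == attempts:
                raise
```

The key point is that each retry starts from a fresh `get()`; resubmitting the same stale copy would fail again with the same error.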

Comment 8 Jeff Cantrill 2017-03-23 17:37:23 UTC
*** Bug 1435176 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2017-03-27 03:10:20 UTC
Created attachment 1266527 [details]
upgrade logging from 3.4.1 to 3.5.0 log

Comment 10 Junqi Zhao 2017-03-27 03:10:49 UTC
Created attachment 1266528 [details]
ansible inventory file

Comment 11 Junqi Zhao 2017-03-27 03:15:08 UTC
Verified according to xiazhao's steps: when upgrading logging from 3.4.1 to 3.5.0, no error showed in the log file, and this defect was not reproduced in my environment.

Attached inventory file and upgrade log.

openshift-ansible version:
openshift-ansible-3.5.45-1.git.0.eb0859b.el7.noarch
openshift-ansible-playbooks-3.5.45-1.git.0.eb0859b.el7.noarch

Image ID:
openshift3/logging-deployer        3.4.1               1adc612d46b0        2 days ago          889.5 MB
openshift3/logging-elasticsearch   3.4.1               246537fe4546        4 days ago          399.2 MB
openshift3/logging-auth-proxy      3.4.1               d85303b2c262        2 weeks ago         219.8 MB
openshift3/logging-kibana          3.4.1               03900b0b9416        2 weeks ago         339.1 MB
openshift3/logging-fluentd         3.4.1               e4b97776c79b        2 weeks ago         233 MB
openshift3/logging-curator         3.4.1               091de35492d6        2 weeks ago         244.3 M



openshift3/logging-elasticsearch   3.5.0               5ff198b5c68d        4 days ago          399.4 MB
openshift3/logging-kibana          3.5.0               a6159c640977        2 weeks ago         342.4 MB
openshift3/logging-fluentd         3.5.0               32a4ac0a3e18        2 weeks ago         232.5 MB
openshift3/logging-curator         3.5.0               8cfcb23f26b6        3 weeks ago         211.1 MB
openshift3/logging-auth-proxy      3.5.0               139f7943475e        9 weeks ago         220 MB

Comment 12 Jeff Cantrill 2017-03-27 18:45:30 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1435144

Comment 13 Xia Zhao 2017-03-28 02:21:55 UTC
(In reply to Junqi Zhao from comment #11)
> Verified according to xiazhao's steps 
Vague description: by "xiazhao's steps", you're expected to state exactly which steps you performed here.

> when upgrade logging from 3.4.1 to
> 3.5.0, no error showed in the log file, this defect was not re-produced in
> my environment.

Make sure you know exactly what it means for a bug to "reproduce" or "not reproduce" before entering a comment into bugzilla next time.
 
> Attached inventory file and upgrade log.
> 
> openshift-ansible version:
> openshift-ansible-3.5.45-1.git.0.eb0859b.el7.noarch
> openshift-ansible-playbooks-3.5.45-1.git.0.eb0859b.el7.noarch

The bug was originally reported against openshift-ansible-3.5.41-1, which is different from yours.
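
To make the version mismatch concrete, here is a minimal sketch that splits the two package strings quoted in this bug into name, version, release, and arch, then compares the versions. It assumes a purely numeric dotted version and uses plain tuple comparison, which is a simplification and not RPM's own version-comparison algorithm; `parse_nvra` is a hypothetical helper name.

```python
def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts.

    The version is returned as a tuple of ints, so this only handles
    purely numeric dotted versions such as 3.5.41.
    """
    nvr, arch = package.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, tuple(int(part) for part in version.split(".")), release, arch


reported = parse_nvra("openshift-ansible-3.5.41-1.git.0.e33897c.el7.noarch")
retested = parse_nvra("openshift-ansible-3.5.45-1.git.0.eb0859b.el7.noarch")

if __name__ == "__main__":
    if reported[1] != retested[1]:
        print("Versions differ:", reported[1], "vs", retested[1])
```

Comparing the parsed versions before retesting makes it explicit whether a "not reproduced" result was obtained on the same build as the original report.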

Comment 15 Junqi Zhao 2017-03-30 07:39:10 UTC
Created attachment 1267435 [details]
upgrade logging from 3.4.1 to 3.5.0 log

Comment 16 Jeff Cantrill 2017-03-30 16:22:43 UTC
The upgrade task this is calling is unnecessary when moving from 3.4->3.5.  The workaround is to set 'openshift_logging_upgrade_logging=false'.  The dependency will be removed for 3.6.
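
As an illustration of applying this workaround, the sketch below sets a variable in an INI-style Ansible inventory. It assumes the variable belongs in the `[OSEv3:vars]` group, which is the usual openshift-ansible convention but is not confirmed by the attached inventory files; `set_inventory_var` and the sample host name are hypothetical.

```python
def set_inventory_var(text, key, value, section="OSEv3:vars"):
    """Return inventory text with key=value set inside [section].

    Replaces an existing assignment, appends one if the section exists
    without it, and creates the section if it is missing.
    """
    header = f"[{section}]"
    out, in_section, done = [], False, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having found the key.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped == header
        elif in_section and stripped.split("=", 1)[0].strip() == key:
            line = f"{key}={value}"
            done = True
        out.append(line)
    if in_section and not done:
        out.append(f"{key}={value}")
        done = True
    if not done:
        out += [header, f"{key}={value}"]
    return "\n".join(out) + "\n"


# Hypothetical inventory; the real attached inventory is not reproduced here.
INVENTORY = """[OSEv3:children]
masters

[OSEv3:vars]
openshift_logging_upgrade_logging=true

[masters]
master.example.com
"""

if __name__ == "__main__":
    print(set_inventory_var(INVENTORY, "openshift_logging_upgrade_logging", "false"))
```

Editing the inventory by hand works just as well; the point is only that the variable ends up set to false in the group vars that the logging playbook reads.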

Comment 17 Junqi Zhao 2017-03-31 00:33:56 UTC
@Jeff,
Do you mean we need to set 'openshift_logging_upgrade_logging=false' when upgrading OCP from 3.4 to 3.5, or when upgrading logging from 3.4 to 3.5?

Comment 18 Junqi Zhao 2017-03-31 09:06:30 UTC
@Jeff,
Do you mean we should set 'openshift_logging_upgrade_logging=false' if upgrading logging from 3.4.1 to 3.5.0 fails on OCP 3.5.0? Does it also mean we install logging 3.5.0 directly?

It's only a workaround, and the dependency will be removed in 3.6.0. The public upgrade environment was shut down yesterday, and I could not find the upgrade error "No Elasticsearch pods found running.  Cannot update common data model." in my own environment, even with "openshift-ansible-3.5.41-1.git.0.e33897c.el7.noarch" (the version in use when this issue was reported). Maybe this error does not happen every time.

I think we should keep this defect open and leave verification to 3.6.0.

Comment 19 Jeff Cantrill 2017-03-31 12:52:36 UTC
Removed upgrade logic for logging in master/3.6: https://github.com/openshift/openshift-ansible/pull/3806  

Also backported to 1.5: https://github.com/openshift/openshift-ansible/pull/3814

The upgrade logic was added when we thought we would use this role for both 3.4 and 3.5.  It would be required in a 3.3->3.4 migration.  Since the EFK stack used for 3.4 and 3.5 is the same, the 'upgrade' functionality is not necessary and is being removed.

Comment 20 Xia Zhao 2017-04-01 08:21:00 UTC
@Jeff, does this mean ansible deployment will no longer be provided for 3.5.0? I understand the underlying EFK stacks of 3.4 and 3.5 are the same, but in all prior releases since 3.3 we have provided the upgrade function. Will customers be surprised if this function is removed? I just want to raise the question and discuss it here...

Comment 21 Xia Zhao 2017-04-01 08:22:27 UTC
(In reply to Xia Zhao from comment #20)
> @Jeff, Does this mean ansible deployment will no longer be provided for
> 3.5.0? 
Clarification: "ansible deployment" here means upgrading the EFK stacks to v3.5.0.

Comment 23 Xia Zhao 2017-04-05 02:47:31 UTC
Thanks for the clarification, Jeff. Set to verified according to comment #22.

Comment 25 openshift-github-bot 2017-10-27 20:30:17 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/b9a5df087a588ca6e64ec0981eeb7dcb304e482c
bug 1435144. Remove uneeded upgrade in openshift_logging role

Comment 29 errata-xmlrpc 2017-11-28 21:53:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

