Bug 1435144
Summary: | [Intservice_public_324] Logging upgrade from 3.4.1 to 3.5.0 failed because "No Elasticsearch pods found running. Cannot update common data model." | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Xia Zhao <xiazhao> |
Component: | Installer | Assignee: | Jeff Cantrill <jcantril> |
Status: | CLOSED ERRATA | QA Contact: | Xia Zhao <xiazhao> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.5.0 | CC: | aos-bugs, ewolinet, jcantril, jokerman, juzhao, mmccomas, xiazhao |
Target Milestone: | --- | ||
Target Release: | 3.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: |
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-11-28 21:53:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Created attachment 1265658
inventory file used for logging upgrade
Here is the end of the log from the 1st failure, when the ansible script was replacing the curator dc. I couldn't reproduce it in my 2nd attempt, so I'm not able to attach the full log:

failed: [$master] (item=logging-curator) => {
    "failed": true,
    "invocation": {
        "module_args": {
            "debug": false,
            "kind": "dc",
            "kubeconfig": "/etc/origin/master/admin.kubeconfig",
            "name": "logging-curator",
            "namespace": "logging",
            "replicas": 0,
            "state": "present"
        },
        "module_name": "oc_scale"
    },
    "object": "logging-curator"
}

MSG:

{u'cmd': u'/usr/bin/oc replace -f /tmp/logging-curator-Gy41SY -n logging', u'returncode': 1, u'results': {}, u'stderr': u'Error from server (Conflict): error when replacing "/tmp/logging-curator-Gy41SY": Operation cannot be fulfilled on deploymentconfigs "logging-curator": the object has been modified; please apply your changes to the latest version and try again\n', u'stdout': u''}

RUNNING HANDLER [openshift_logging : restart master] ***************************
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/openshift_logging.retry

PLAY RECAP *********************************************************************
$master : ok=254 changed=52 unreachable=0 failed=1

The 1st failure in comment #2 was reproduced with the latest code from https://github.com/openshift/openshift-ansible -b release-1.5. It's now tracked separately by this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1435176

I'm removing the keyword "regression": my colleague confirmed that upgrading from 3.4.1 to 3.5.0 with a hostPath PV attached had not been tested before, so this may not really be a regression.

@Eric, I have seen issues similar to #2, where an object is modified during (I think) an 'oc_apply' step. Do you have any indication of what the problem might be? My workaround has been to uninstall the logging stack and reinstall it.
This does not seem to be necessarily a blocker: as a workaround, could we not advise uninstalling and reinstalling, and possibly running the upgrade manually?

*** Bug 1435176 has been marked as a duplicate of this bug. ***

Created attachment 1266527
upgrade logging from 3.4.1 to 3.5.0 log
Created attachment 1266528
ansible inventory file
Verified according to xiazhao's steps when upgrading logging from 3.4.1 to 3.5.0: no error showed in the log file, and this defect was not reproduced in my environment. Attached are the inventory file and upgrade log.

openshift-ansible version:
openshift-ansible-3.5.45-1.git.0.eb0859b.el7.noarch
openshift-ansible-playbooks-3.5.45-1.git.0.eb0859b.el7.noarch

Image ID:
openshift3/logging-deployer 3.4.1 1adc612d46b0 2 days ago 889.5 MB
openshift3/logging-elasticsearch 3.4.1 246537fe4546 4 days ago 399.2 MB
openshift3/logging-auth-proxy 3.4.1 d85303b2c262 2 weeks ago 219.8 MB
openshift3/logging-kibana 3.4.1 03900b0b9416 2 weeks ago 339.1 MB
openshift3/logging-fluentd 3.4.1 e4b97776c79b 2 weeks ago 233 MB
openshift3/logging-curator 3.4.1 091de35492d6 2 weeks ago 244.3 M
openshift3/logging-elasticsearch 3.5.0 5ff198b5c68d 4 days ago 399.4 MB
openshift3/logging-kibana 3.5.0 a6159c640977 2 weeks ago 342.4 MB
openshift3/logging-fluentd 3.5.0 32a4ac0a3e18 2 weeks ago 232.5 MB
openshift3/logging-curator 3.5.0 8cfcb23f26b6 3 weeks ago 211.1 MB
openshift3/logging-auth-proxy 3.5.0 139f7943475e 9 weeks ago 220 MB

(In reply to Junqi Zhao from comment #11)
> Verified according to xiazhao's steps

Vague description: rather than "xiazhao's steps", you're expected to state exactly which steps you performed.

> when upgrading logging from 3.4.1 to
> 3.5.0: no error showed in the log file, and this defect was not reproduced in
> my environment.

Next time, before entering a comment into Bugzilla, make sure you know exactly what it means for a bug to "reproduce" or "not reproduce".

> Attached are the inventory file and upgrade log.
>
> openshift-ansible version:
> openshift-ansible-3.5.45-1.git.0.eb0859b.el7.noarch
> openshift-ansible-playbooks-3.5.45-1.git.0.eb0859b.el7.noarch

The bug was originally reported against openshift-ansible-3.5.41-1, which is different from yours.

Created attachment 1267435
upgrade logging from 3.4.1 to 3.5.0 log
The upgrade task being called here is unnecessary when moving from 3.4 to 3.5. The workaround is to set 'openshift_logging_upgrade_logging=false'. The dependency will be removed in 3.6.

@Jeff, do you mean we need to set 'openshift_logging_upgrade_logging=false' when upgrading OCP from 3.4 to 3.5, or when upgrading logging from 3.4 to 3.5?

@Jeff, do you mean we should set 'openshift_logging_upgrade_logging=false' if upgrading logging from 3.4.1 to 3.5.0 fails on OCP 3.5.0? Does that also mean we install logging 3.5.0 directly?

This is only a workaround, the dependency will be removed in 3.6.0, and the public upgrade environment was shut down yesterday. I could not reproduce the upgrade error "No Elasticsearch pods found running. Cannot update common data model." in my own environment, even with "openshift-ansible-3.5.41-1.git.0.e33897c.el7.noarch" (the version in use when this issue was reported), so maybe this error does not happen every time. I think we should keep this defect open and leave it to be verified on 3.6.0.

Removed the upgrade logic for logging in master/3.6: https://github.com/openshift/openshift-ansible/pull/3806
Also backported to 1.5: https://github.com/openshift/openshift-ansible/pull/3814

The upgrade logic was added when we thought we would use this role for both 3.4 and 3.5. It would be required for a 3.3->3.4 migration. Since the EFK stack used for 3.4 and 3.5 is the same, the 'upgrade' functionality is unnecessary and is being removed.

@Jeff, does this mean Ansible deployment will no longer be provided for 3.5.0? I understand that the EFK stacks of 3.4 and 3.5 are the same, but in every release since 3.3 we have provided the upgrade function. Will customers be surprised if this function is removed? Just want to raise the question for discussion here...

(In reply to Xia Zhao from comment #20)
> @Jeff, does this mean Ansible deployment will no longer be provided for
> 3.5.0?
Clarification: "ansible deployment" here means upgrading the EFK stacks to v3.5.0.

Thanks for the clarification, Jeff. Set to VERIFIED according to comment #22.

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/b9a5df087a588ca6e64ec0981eeb7dcb304e482c
bug 1435144. Remove uneeded upgrade in openshift_logging role

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188 |
Created attachment 1265656
The upgrade log with error "Cannot update common data model"

Description of problem:
I upgraded logging from 3.4.1 to 3.5.0 twice today:
--The 1st attempt failed while the ansible script was replacing the curator dc.
--The 2nd attempt failed because "No Elasticsearch pods found running. Cannot update common data model."

Version-Release number of selected component (if applicable):
openshift-ansible-3.5.41-1.git.0.e33897c.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install the logging 3.4.1 stacks on an OCP 3.5.0 master, with elasticsearch attached to a HostPath PV
2. Visit the kibana route before the upgrade
3. Upgrade the logging stacks to 3.5.0 using the ansible scripts (inventory file attached)

Actual results:
2. The kibana route is accessible, with log entries
3. The ansible script failed when replacing the curator dc

Expected results:
3. The upgrade should succeed

Additional info:
Ansible log and upgrade inventory file attached
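For anyone still running 3.5 playbooks from before the upgrade logic was removed, the workaround Jeff Cantrill gave in this bug is to skip the logging upgrade task by setting openshift_logging_upgrade_logging=false. A minimal inventory sketch follows. It assumes the usual openshift-ansible [OSEv3:vars] group; apart from that one variable, everything in it is a placeholder and is not taken from the inventory attached to this bug:

```ini
# Hypothetical excerpt of an openshift-ansible inventory.
# Only openshift_logging_upgrade_logging comes from this bug; the
# [OSEv3:vars] group name is assumed from the standard inventory layout.
[OSEv3:vars]
# ...keep the existing logging variables from your inventory here...
openshift_logging_upgrade_logging=false
```

Once PR 3806 (master/3.6) or its 1.5 backport, PR 3814, is in the playbooks you run, the upgrade logic is gone and this variable should no longer be needed.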