Bug 1586197
| Summary: | Installer fails - node service does not start in time on one of the masters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Cluster Version Operator | Assignee: | Russell Teague <rteague> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Vikas Laad <vlaad> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mifiedle, mmccomas, wmeng, wsun |
| Target Milestone: | --- | Keywords: | TestBlocker, Unconfirmed |
| Target Release: | 3.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | The Ansible async timeout was very short, which intermittently caused the later async status check to fail because the original task job would never report completion. The async job timeout was increased to ensure there is enough time for the job to either complete successfully or fail with an appropriate error message. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-20 21:36:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 1447950 [details]
ansible log with -vvv

Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/f1ee19b52f49941ac4cf56c41770d5aa3e86f761
Bug 1586197 - Increase async timeout

https://github.com/openshift/openshift-ansible/commit/bfa27c9beaa14483134dd8af5e1492716a591cbe
Merge pull request #8691 from mtnbikenc/fix-1586197
Bug 1586197 - Increase async timeout

openshift-ansible-3.10.0-0.66.0

Created attachment 1450530 [details]
ansible log with -vvv

Tried again with the latest code from openshift-master; the head is 79d6516f4164b82c7dbfdc120f8f4f229116abc1.
Saw the same failure; please see the latest ansible log attached.

*** Bug 1589531 has been marked as a duplicate of this bug. ***

I've attempted several times to reproduce this but have been unsuccessful. If this is reproduced again, please provide the contents of the ansible_sync job id file located in /root/.ansible_sync/. The job id corresponds to the id in the task output.

Adding the TestBlocker keyword since the duplicate bug is blocking upgrade testing against HA clusters, per https://bugzilla.redhat.com/show_bug.cgi?id=1589531#c1

I completed 2 upgrades on an HA cluster; both completed fine. I verified with the latest git hash a1634c352a0ebc4476c9d961a74f2c3817ad35e8 from openshift-ansible.

This should be fixed in openshift-ansible-3.10.0-0.66.0 or newer.

Description of problem:
I am trying to upgrade an HA cluster with the following nodes. I have tried it a couple of times. While upgrading one of the masters, the installer fails, complaining that the node service can't be started, but after some time the node service starts. If I re-run the installer after that, it moves on to the next task.

1 lb
3 masters
3 etcd
2 infra
2 compute

Version-Release number of the following components:
rpm -q openshift-ansible - latest 340e2f3e86d1119541c300d95b4e7c877b0a6b99

rpm -q ansible
ansible-2.4.3.0-1.el7ae.noarch

ansible --version
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Apr 19 2018, 05:40:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:
With an HA cluster; a single-master cluster works fine.

Steps to Reproduce:
1. Create an HA 3.9 cluster
2. Upgrade the cluster to 3.10

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Hosts: ec2-54-200-4-41.us-west-2.compute.amazonaws.com
Play: Update master nodes
Task: Check status of node service
Message: Failed without returning a message.

Expected results:
Installer should complete.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
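
For anyone trying to picture the failure mode described in the Doc Text, here is a rough Python sketch of the race between a background job's time limit and a later status check. It is not the openshift-ansible code and does not use Ansible at all. The function name run_with_time_limit, the SLOW_START command, and all the timing values are hypothetical and chosen only for illustration. A short sleep stands in for the slow node service start.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a node service that takes about 1 second to start.
SLOW_START = [sys.executable, "-c", "import time; time.sleep(1)"]


def run_with_time_limit(cmd, time_limit, poll_interval=0.1, retries=100):
    """Start cmd in the background, then poll it the way a later
    status-check task would.

    time_limit plays the role of the async job timeout: once it elapses,
    the job is killed and can never report its own result, so the status
    check has nothing meaningful to return.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + time_limit
    for _ in range(retries):
        rc = proc.poll()
        if rc is not None:
            # The job finished on its own and reported a result,
            # either success (rc == 0) or a real failure (rc != 0).
            return {"finished": True, "rc": rc, "msg": "job reported completion"}
        if time.monotonic() >= deadline:
            # Time limit hit before the job reported anything.
            proc.kill()
            proc.wait()
            return {"finished": False, "rc": None, "msg": ""}
        time.sleep(poll_interval)
    # The status check itself gave up before the time limit.
    proc.kill()
    proc.wait()
    return {"finished": False, "rc": None, "msg": "status check ran out of retries"}


if __name__ == "__main__":
    # Time limit shorter than the start-up time: no completion, no message.
    print(run_with_time_limit(SLOW_START, time_limit=0.2))
    # Generous time limit: the job gets to report its own outcome.
    print(run_with_time_limit(SLOW_START, time_limit=10))
```

With the short limit the job is cut off before it reports anything, so the caller sees an unfinished job and an empty message, which is roughly the shape of "Failed without returning a message." With the larger limit the job either completes or fails with its own return code, which is the behavior the increased async timeout is meant to give.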