Bug 1636914 - timeout for "Wait for sync DS to set annotations on all nodes" task
Summary: timeout for "Wait for sync DS to set annotations on all nodes" task
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Vadim Rutkovsky
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-08 09:14 UTC by Johnny Liu
Modified: 2019-01-10 09:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The sync daemonset did not wait a sufficient amount of time for nodes to restart.
Consequence: The sync DS verification task failed because nodes did not come up in time.
Fix: The number of retries was increased.
Result: Install or upgrade succeeds.
Clone Of:
Environment:
Last Closed: 2019-01-10 09:04:01 UTC
Target Upstream Version:
Embargoed:


Attachments
installation log with inventory file embedded (1.77 MB, text/plain)
2018-10-08 09:14 UTC, Johnny Liu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0024 0 None None None 2019-01-10 09:04:07 UTC

Description Johnny Liu 2018-10-08 09:14:33 UTC
Created attachment 1491564
installation log with inventory file embedded

Description of problem:
This is related to https://github.com/openshift/openshift-ansible/pull/10039. The newly introduced "Wait for sync DS to set annotations on all nodes" task appears to time out, which causes the whole install to exit.

Version-Release number of the following components:
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch

How reproducible:
2 of 3 tries

Steps to Reproduce:
1. Trigger an installation against stage registry.

Actual results:
The "Wait for sync DS to set annotations on all nodes" task times out. Several minutes after the failure, I logged in to all nodes and confirmed that the md5sum annotations had been set successfully. Does the task need more retries?

Expected results:
Install pass.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Vadim Rutkovsky 2018-10-08 13:06:43 UTC
`preserve-jialiustg-node-1` doesn't have this annotation set. Since it's been set later on, we might add more attempts for this.

This seems to happen fairly often, right? I think increasing the delay between the checks would do the trick.
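
To make the trade-off between "more attempts" and "a longer delay" concrete, here is a minimal Python sketch of a generic poll-with-retries loop. It is not the openshift-ansible task itself; the names `wait_until` and `worst_case_wait` are hypothetical, and the loop only approximates how a retry/delay pair bounds the total wait.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], retries: int, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll check() up to `retries` times, pausing `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded.
    Raises TimeoutError if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)  # no pause after the final failed attempt
    raise TimeoutError(f"condition not met after {retries} attempts")


def worst_case_wait(retries: int, delay: float) -> float:
    """Upper bound on time spent sleeping (ignores how long check() takes)."""
    return max(retries - 1, 0) * delay
```

Either knob raises the overall budget: in this sketch the time spent waiting is bounded by roughly retries × delay, so more retries with the same delay and the same retries with a longer delay both give slow nodes more time. More retries keep the check responsive once the condition is met; a longer delay reduces the number of API calls.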

Comment 2 Johnny Liu 2018-10-08 15:06:08 UTC
(In reply to Vadim Rutkovsky from comment #1)
> `preserve-jialiustg-node-1` doesn't have this annotation set. Since its been
> set later on we might add more attempts for this.
> 
Yeah.
> This seems to happen fairly often, right? I think increasing the delay
> between the checks would do the trick
When installing against the stage registry, this happened often, at least today. I am not sure whether it is caused by poor stage registry performance.

Comment 3 Vadim Rutkovsky 2018-10-09 15:33:30 UTC
release-3.11 PR - https://bugzilla.redhat.com/show_bug.cgi?id=1636914

Comment 4 Scott Dodson 2018-10-11 20:28:23 UTC
https://github.com/openshift/openshift-ansible/pull/10363 is the backport to release-3.11

Comment 5 Vadim Rutkovsky 2018-10-15 09:08:45 UTC
Fix is available in openshift-ansible-3.11.23-1

Comment 6 Johnny Liu 2018-10-16 09:14:23 UTC
When testing this bug with openshift-ansible-3.11.23-1.git.0.19cbe21.el7.noarch, the install failed at the "openshift_control_plane : Wait for control plane pods to appear" task when the crio runtime is enabled, so it did not even reach the "Wait for sync DS to set annotations on all nodes" task.

I will re-run the testing once the fix PR for BZ#1639201 is merged.

Comment 7 Johnny Liu 2018-10-22 02:50:31 UTC
Because of the issue in comment 6, I disabled the crio runtime and used the docker runtime for testing.

Verified this bug with openshift-ansible-3.11.23-1.git.0.19cbe21.el7.noarch: PASS.

TASK [openshift_manage_node : Wait for sync DS to set annotations on all nodes] ***
Thursday 18 October 2018  19:07:11 +0800 (0:00:00.694)       0:31:18.013 ****** 
FAILED - RETRYING: Wait for sync DS to set annotations on all nodes (180 retries left).
<--snip-->
FAILED - RETRYING: Wait for sync DS to set annotations on all nodes (23 retries left).
ok: [host-8-249-20.host.centralci.eng.rdu2.redhat.com -> host-8-249-20.host.centralci.eng.rdu2.redhat.com] => {"attempts": 159, "changed": false, "results": {"cmd": "/usr/bin/oc get node --selector= -o json -n default"
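
For manual checking, the node list returned by a command like the one in the log above (`oc get node -o json`) can be scanned for nodes that still lack the annotation. The following is a rough Python sketch, not the task's actual condition. The annotation key `node.openshift.io/md5sum` is an assumption (this report only says "md5sum annotations"), and the function name and sample data are hypothetical.

```python
import json
from typing import List

# Assumed annotation key; the bug report only mentions "md5sum annotations".
MD5SUM_ANNOTATION = "node.openshift.io/md5sum"


def nodes_missing_annotation(node_list: dict,
                             key: str = MD5SUM_ANNOTATION) -> List[str]:
    """Return names of nodes whose metadata.annotations lacks `key`."""
    missing = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}  # may be absent or null
        if key not in annotations:
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical sample shaped like a Kubernetes node List object.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "node-1",
                  "annotations": {"node.openshift.io/md5sum": "abc123"}}},
    {"metadata": {"name": "node-2", "annotations": {}}}
  ]
}
""")

print(nodes_missing_annotation(sample))
```

An empty result means every node carries the annotation, which is the state the "Wait for sync DS to set annotations on all nodes" task is waiting for.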

Comment 12 errata-xmlrpc 2019-01-10 09:04:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024

