1669194 – Sanity Check in upgrade and prerequisite playbook is slow and removed vars check does not work

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.11.0
Hardware:	All
OS:	All
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Michael Gugino
QA Contact:	Weihua Meng
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-01-24 15:10 UTC by Matthew Robson
Modified:	2022-03-13 16:51 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-02-20 14:11:02 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0326	0	None	None	None	2019-02-20 14:11:07 UTC

Description Matthew Robson 2019-01-24 15:10:18 UTC

Description of problem:

The variable sanity check takes well over 60 minutes to run (about 76 minutes in the log below):

10:52:07,767 p=127636 u=root |  TASK [Run variable sanity checks] *******************************************************************************************

2019-01-21 10:52:07,767 p=127636 u=root |  task path: /usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:14

2019-01-21 12:08:16,698 p=127636 u=root |  ok: [nodename] => {    "changed": false,    "msg": "Sanity Checks passed"}

Additional debugging shows that most of this time is spent inside check_for_removed_vars while it processes the OCS nodes.
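The report does not include the code of check_for_removed_vars or the diff from PR 11061, so the following is only a hypothetical Python sketch of two ways a removed-variables check can be both slow and ineffective, as the bug title describes. It assumes (this is not confirmed by the report) that each per-host variable lookup is expensive because all of the host's variables are recomputed, which would hurt most on hosts with many volumes. All names (REMOVED_VARIABLES, ExpensiveHostVars, slow_and_broken, fast_and_correct) are invented for illustration.

```python
"""Hypothetical sketch: a removed-variables check that is slow and never fires,
next to a corrected version. Not the openshift-ansible implementation."""

# Hypothetical table of (removed_name, replacement) pairs.
REMOVED_VARIABLES = (
    ("old_var_a", "new_var_a"),
    ("old_var_b", "new_var_b"),
)


class ExpensiveHostVars:
    """Stands in for a host-vars lookup that rebuilds every variable of the
    host on each access (assumed to be costly on hosts with many volumes)."""

    def __init__(self, per_host):
        self._per_host = per_host
        self.lookups = 0  # number of simulated full recomputations

    def __getitem__(self, host):
        self.lookups += 1
        return dict(self._per_host[host])  # fresh copy = simulated recompute


def slow_and_broken(hostvars, host):
    found = []
    for item in REMOVED_VARIABLES:
        # Problem 1: hostvars[host] is recomputed on every iteration.
        # Problem 2: `item` is a (name, replacement) tuple, which never equals
        # a string key, so no removed variable is ever reported.
        if item in hostvars[host]:
            found.append(item)
    return found


def fast_and_correct(hostvars, host):
    host_vars = hostvars[host]  # look the host up once, outside the loop
    # Compare the variable *name* (first tuple element) against the keys.
    return [(name, new) for name, new in REMOVED_VARIABLES if name in host_vars]


# Usage
hv = ExpensiveHostVars({"node1": {"old_var_b": 1, "other": 2}})
print(slow_and_broken(hv, "node1"), hv.lookups)   # [] 2
hv.lookups = 0
print(fast_and_correct(hv, "node1"), hv.lookups)  # [('old_var_b', 'new_var_b')] 1
```

Hoisting the lookup out of the loop makes the cost one lookup per host instead of one per removed variable, and matching on the variable name rather than on the (name, replacement) pair lets the check actually report removed variables.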

Version-Release number of the following components:

3.11.59

How reproducible:

Always

Steps to Reproduce:
1. Run an upgrade or the sanity check, especially on a cluster with large OCS nodes.

Actual results:
Much slower than in 3.9.

Expected results:
Quick execution

Comment 2 Matthew Robson 2019-01-24 15:13:36 UTC

PR with a fix: https://github.com/openshift/openshift-ansible/pull/11061

Comment 3 Matthew Robson 2019-01-24 15:14:33 UTC

A quick test without and with the fix shows a speed improvement of more than 2x.


Without fix - 7m 37s

2019-01-23 12:46:39,467 p=39217 u=root |  TASK [Run variable sanity checks] **********************************************

2019-01-23 12:54:16,036 p=39217 u=root |  ok: [nodename] => {
    "changed": false,
    "msg": "New Sanity Checks passed"
}

With Fix - 3m 17s

2019-01-23 13:14:57,100 p=71065 u=root |  TASK [Run variable sanity checks] **********************************************

2019-01-23 13:18:14,905 p=71065 u=root |  ok: [nodename] => {
    "changed": false,
    "msg": "New Sanity Checks passed"
}
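The durations quoted here, and the roughly 76-minute run in the original description, can be checked directly from the log timestamps. A small Python sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm` timestamp format shown in the log lines:

```python
from datetime import datetime

# Timestamp format of the openshift-ansible log lines quoted in this report.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps such as '2019-01-23 12:46:39,467'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


original = elapsed_seconds("2019-01-21 10:52:07,767", "2019-01-21 12:08:16,698")
without_fix = elapsed_seconds("2019-01-23 12:46:39,467", "2019-01-23 12:54:16,036")
with_fix = elapsed_seconds("2019-01-23 13:14:57,100", "2019-01-23 13:18:14,905")

print(f"original report: {original / 60:.1f} min")        # 76.1 min
print(f"without fix:     {without_fix:.1f} s")            # 456.6 s
print(f"with fix:        {with_fix:.1f} s")               # 197.8 s
print(f"speedup:         {without_fix / with_fix:.2f}x")  # 2.31x
```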

Comment 4 Scott Dodson 2019-01-24 15:50:54 UTC

https://github.com/openshift/openshift-ansible/pull/11061 merged

Comment 8 Weihua Meng 2019-02-12 07:34:59 UTC

Hi, Mike

I tested with a cluster of 6 GlusterFS nodes (3 for the Docker registry). There is no difference in upgrade time between
openshift-ansible-3.11.59-1.git.0.ba8e948.el7.noarch
openshift-ansible-3.11.82-1.git.0.f29227a.el7.noarch

Could you help? 
Thanks.

Comment 9 Matthew Robson 2019-02-12 15:35:30 UTC

How many devices / volumes / PVCs do you have? Where we see this issue, there are around 700 volumes in use.

Comment 12 Weihua Meng 2019-02-13 00:21:12 UTC

Moving to VERIFIED according to comment 10.

Thanks for help, Matthew and Mike.

Comment 14 errata-xmlrpc 2019-02-20 14:11:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326
