Bug 1669194 - Sanity Check in upgrade and prerequisite playbook is slow and removed vars check does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michael Gugino
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-24 15:10 UTC by Matthew Robson
Modified: 2019-02-20 14:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-20 14:11:02 UTC



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0326 None None None 2019-02-20 14:11:07 UTC

Description Matthew Robson 2019-01-24 15:10:18 UTC
Description of problem:

The sanity check takes well over 60 minutes to run (about 76 minutes in the log below).

2019-01-21 10:52:07,767 p=127636 u=root |  TASK [Run variable sanity checks] *******************************************************************************************

2019-01-21 10:52:07,767 p=127636 u=root |  task path: /usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:14

2019-01-21 12:08:16,698 p=127636 u=root |  ok: [nodename] => {    "changed": false,    "msg": "Sanity Checks passed"}
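
The elapsed time of the task can be checked from the two log timestamps above. This is a minimal Python sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm` timestamp layout shown in the log; the helper name `elapsed` is hypothetical and only used for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the log lines above (comma before the milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two log timestamps (hypothetical helper)."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return finished - started


# Task start line and the "ok" result line, copied from the log above.
task_time = elapsed("2019-01-21 10:52:07,767", "2019-01-21 12:08:16,698")
print(task_time)  # 1:16:08.931000
```

That is roughly 76 minutes for the single "Run variable sanity checks" task.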

Additional debugging shows that the OCS nodes account for the majority of the time, which is spent inside check_for_removed_vars.

Version-Release number of the following components:

3.11.59

How reproducible:

Always

Steps to Reproduce:
1. Run upgrade or check, especially with large OCS nodes.

Actual results:
Very slow versus 3.9

Expected results:
Quick execution

Comment 2 Matthew Robson 2019-01-24 15:13:36 UTC
PR with a fix: https://github.com/openshift/openshift-ansible/pull/11061

Comment 3 Matthew Robson 2019-01-24 15:14:33 UTC
A quick test without and with the fix shows a speed improvement of over 2x.


Without fix - 7m 37s

2019-01-23 12:46:39,467 p=39217 u=root |  TASK [Run variable sanity checks] **********************************************

2019-01-23 12:54:16,036 p=39217 u=root |  ok: [nodename] => {
    "changed": false,
    "msg": "New Sanity Checks passed"
}

With fix - 3m 17s

2019-01-23 13:14:57,100 p=71065 u=root |  TASK [Run variable sanity checks] **********************************************

2019-01-23 13:18:14,905 p=71065 u=root |  ok: [nodename] => {
    "changed": false,
    "msg": "New Sanity Checks passed"
}
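
The "over 2x" figure follows directly from the two durations quoted above. This is a small Python sketch of the arithmetic; the helper `to_seconds` is hypothetical and assumes durations written as `<minutes>m <seconds>s`.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '7m 37s' to whole seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


without_fix = to_seconds("7m 37s")  # 457 seconds
with_fix = to_seconds("3m 17s")     # 197 seconds
speedup = without_fix / with_fix
print(f"{speedup:.2f}x")  # 2.32x
```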

Comment 4 Scott Dodson 2019-01-24 15:50:54 UTC
https://github.com/openshift/openshift-ansible/pull/11061 merged

Comment 8 Weihua Meng 2019-02-12 07:34:59 UTC
Hi, Mike

I tested with a cluster of 6 glusterfs nodes (3 for the docker registry). For upgrade time, there is no difference between
openshift-ansible-3.11.59-1.git.0.ba8e948.el7.noarch
openshift-ansible-3.11.82-1.git.0.f29227a.el7.noarch

Could you help? 
Thanks.

Comment 9 Matthew Robson 2019-02-12 15:35:30 UTC
How many devices / volumes / PVCs do you have? In the environment where we see this issue, around 700 volumes are in use.

Comment 12 Weihua Meng 2019-02-13 00:21:12 UTC
Moving to VERIFIED according to comment 10.

Thanks for the help, Matthew and Mike.

Comment 14 errata-xmlrpc 2019-02-20 14:11:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326

