Bug 1374500

Summary: docker checks (and other steps) run 6 times during a scaleup
Product: OpenShift Container Platform Reporter: Mike Fiedler <mifiedle>
Component: Installer Assignee: Andrew Butcher <abutcher>
Status: CLOSED DEFERRED QA Contact: Johnny Liu <jialiu>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.3.0 CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-24 18:45:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
Full ansible log for this scaleup none

Description Mike Fiedler 2016-09-08 20:53:57 UTC
Description of problem:

The Ansible 2.2 based install is much faster than the one based on Ansible 2.1. However, there are still so many repeated tasks that installing large clusters is painful. I was scaling up a cluster to 500 nodes, and even though each step runs relatively quickly on a single node, the repetition multiplied across 500 nodes really slows things down. In particular, I noticed a set of tasks that runs 6 times during a "simple" byo/openshift-node/scaleup.yml run that just adds nodes to the cluster. These look like tasks that need to run only once rather than being repeated so many times. There appear to be others as well, but these caught my attention as I watched this install.

I'm sure this is a post-3.3 thing

From a 2-node scaleup, one set of the docker tasks referred to above:

 593 2016-09-08 16:29:46,402 p=27058 u=root |  TASK [docker : stat] ***********************************************************
 594 2016-09-08 16:29:47,345 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 595 2016-09-08 16:29:47,644 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 596 2016-09-08 16:29:47,655 p=27058 u=root |  TASK [docker : Get current installed Docker version] ***************************
 597 2016-09-08 16:29:48,784 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 598 2016-09-08 16:29:49,074 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 599 2016-09-08 16:29:49,084 p=27058 u=root |  TASK [docker : Error out if Docker pre-installed but too old] ******************
 600 2016-09-08 16:29:49,668 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 601 2016-09-08 16:29:49,849 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 602 2016-09-08 16:29:49,859 p=27058 u=root |  TASK [docker : Error out if requested Docker is too old] ***********************
 603 2016-09-08 16:29:50,336 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 604 2016-09-08 16:29:50,554 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 605 2016-09-08 16:29:50,565 p=27058 u=root |  TASK [docker : Get latest available version of Docker] *************************
 606 2016-09-08 16:29:51,228 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 607 2016-09-08 16:29:51,587 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 608 2016-09-08 16:29:51,597 p=27058 u=root |  TASK [docker : Fail if Docker version requested but downgrade is required] *****
 609 2016-09-08 16:29:52,132 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 610 2016-09-08 16:29:52,340 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 611 2016-09-08 16:29:52,350 p=27058 u=root |  TASK [docker : Error out if attempting to upgrade Docker across the 1.10 boundary] ***
 612 2016-09-08 16:29:52,833 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 613 2016-09-08 16:29:53,066 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 614 2016-09-08 16:29:53,077 p=27058 u=root |  TASK [docker : Install Docker] *************************************************
 615 2016-09-08 16:29:54,082 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 616 2016-09-08 16:29:54,275 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 617 2016-09-08 16:29:54,285 p=27058 u=root |  TASK [docker : Start the Docker service] ***************************************
 618 2016-09-08 16:29:55,076 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 619 2016-09-08 16:29:55,372 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 620 2016-09-08 16:29:55,383 p=27058 u=root |  TASK [docker : set_fact] *******************************************************
 621 2016-09-08 16:29:56,037 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 622 2016-09-08 16:29:56,347 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 623 2016-09-08 16:29:56,358 p=27058 u=root |  TASK [docker : include] ********************************************************
 624 2016-09-08 16:29:56,843 p=27058 u=root |  skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 625 2016-09-08 16:29:57,038 p=27058 u=root |  skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 626 2016-09-08 16:29:57,049 p=27058 u=root |  TASK [docker : stat] ***********************************************************
 627 2016-09-08 16:29:57,895 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
 628 2016-09-08 16:29:58,190 p=27058 u=root |  ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
 629 2016-09-08 16:29:58,200 p=27058 u=root |  TASK [docker : Set registry params] ********************************************




Version-Release number of selected component (if applicable):

3.3.0.30


How reproducible: Always


Steps to Reproduce:
1.  Enable ansible logging
2.  Install a 3.3 cluster
3.  Scale the cluster up by adding nodes. The problem is also present in the initial install, but there it is obscured by the various roles. It is easiest to see in a scaleup.
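
One way to confirm the repetition from the log captured in step 1 is to count how often each TASK header appears. The following is a minimal sketch, assuming the log line format shown in the excerpt above; the function name `count_task_runs` and the sample lines are illustrative and not part of the installer.

```python
import re
from collections import Counter

# Matches Ansible task headers such as "... |  TASK [docker : stat] ****".
# Assumption: the log uses the format shown in the excerpt above.
TASK_RE = re.compile(r"TASK \[(?P<name>[^\]]+)\]")


def count_task_runs(lines):
    """Return a Counter mapping each task name to the number of header lines seen."""
    counts = Counter()
    for line in lines:
        match = TASK_RE.search(line)
        if match:
            counts[match.group("name").strip()] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample; in practice, pass an open log file instead.
    sample = [
        "2016-09-08 16:29:46,402 p=27058 u=root |  TASK [docker : stat] ****",
        "2016-09-08 16:29:47,345 p=27058 u=root |  ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]",
        "2016-09-08 16:29:47,655 p=27058 u=root |  TASK [docker : Get current installed Docker version] ****",
        "2016-09-08 16:29:57,049 p=27058 u=root |  TASK [docker : stat] ****",
    ]
    for name, runs in count_task_runs(sample).most_common():
        if runs > 1:
            print(f"{runs}x {name}")
```

Unnamed tasks such as `docker : stat` can legitimately appear more than once within a single pass of a role, so counts for those are an upper bound; named tasks such as `docker : Get current installed Docker version` give a cleaner signal.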

Actual results:

Certain tasks that would seem to need to run only once (e.g. checking the docker version) are run multiple times.

Expected results:

Run install tasks only as often as necessary.

Comment 1 Mike Fiedler 2016-09-08 20:55:23 UTC
Created attachment 1199198 [details]
Full ansible log for this scaleup

Comment 2 Jason DeTiberus 2016-09-22 20:42:09 UTC
I'm not sure how much progress we'll make on this directly in the 3.4 cycle, but we are looking at how to improve the efficiency of dependencies as we go forward.

Comment 4 Scott Dodson 2017-08-24 18:45:40 UTC
We don't intend to take any specific action to address this; however, we're refactoring the entire code base so that there's less inter-dependence between roles and more reliance on completing certain tasks before others.