Description of problem:

The Ansible 2.2 based install is much faster than the one based on Ansible 2.1. However, there are still so many repeated tasks that installing large clusters is painful. I was scaling up a cluster to 500 nodes, and even though the steps run relatively quickly on each node, multiplied by 500 nodes the repetition really slows things down.

In particular, I noticed a set of tasks that runs 6 times during a "simple" byo/openshift-node/scaleup.yml run that just adds nodes to the cluster. These seem to be an example of tasks that need only be run once rather than repeated so many times. There appear to be others as well, but these caught my attention as I watched this install. I'm sure this is a post-3.3 thing.

From a 2 node scaleup, a set of the docker tasks referred to above:

2016-09-08 16:29:46,402 p=27058 u=root | TASK [docker : stat] ***********************************************************
2016-09-08 16:29:47,345 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:47,644 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:47,655 p=27058 u=root | TASK [docker : Get current installed Docker version] ***************************
2016-09-08 16:29:48,784 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:49,074 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:49,084 p=27058 u=root | TASK [docker : Error out if Docker pre-installed but too old] ******************
2016-09-08 16:29:49,668 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:49,849 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:49,859 p=27058 u=root | TASK [docker : Error out if requested Docker is too old] ***********************
2016-09-08 16:29:50,336 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:50,554 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:50,565 p=27058 u=root | TASK [docker : Get latest available version of Docker] *************************
2016-09-08 16:29:51,228 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:51,587 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:51,597 p=27058 u=root | TASK [docker : Fail if Docker version requested but downgrade is required] *****
2016-09-08 16:29:52,132 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:52,340 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:52,350 p=27058 u=root | TASK [docker : Error out if attempting to upgrade Docker across the 1.10 boundary] ***
2016-09-08 16:29:52,833 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:53,066 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:53,077 p=27058 u=root | TASK [docker : Install Docker] *************************************************
2016-09-08 16:29:54,082 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:54,275 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:54,285 p=27058 u=root | TASK [docker : Start the Docker service] ***************************************
2016-09-08 16:29:55,076 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:55,372 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:55,383 p=27058 u=root | TASK [docker : set_fact] *******************************************************
2016-09-08 16:29:56,037 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:56,347 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:56,358 p=27058 u=root | TASK [docker : include] ********************************************************
2016-09-08 16:29:56,843 p=27058 u=root | skipping: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:57,038 p=27058 u=root | skipping: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:57,049 p=27058 u=root | TASK [docker : stat] ***********************************************************
2016-09-08 16:29:57,895 p=27058 u=root | ok: [ec2-54-245-5-215.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:58,190 p=27058 u=root | ok: [ec2-54-149-169-61.us-west-2.compute.amazonaws.com]
2016-09-08 16:29:58,200 p=27058 u=root | TASK [docker : Set registry params] ********************************************

Version-Release number of selected component (if applicable):
3.3.0.30

How reproducible:
Always

Steps to Reproduce:
1. Enable ansible logging
2. Install a 3.3 cluster
3. Scale the cluster up by adding nodes.

The problem is present in the initial install as well, but there it is complicated by the various roles. It is easiest to see in a scaleup.

Actual results:
Certain tasks that would seem to need to run only once (e.g. checking the docker version) are run multiple times.

Expected results:
Run install tasks only as often as necessary.
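To quantify the repetition from a full log, one option is to count how often each task header appears. The following is only a minimal sketch, assuming the log format shown in the excerpt above (`TASK [role : name] ***`); the function names and the `ansible.log` path are hypothetical and not part of openshift-ansible. Note that unnamed tasks such as `stat` or `set_fact` can share a header while being different tasks, so the counts are an upper bound on true repetition.

```python
import re
from collections import Counter

# Matches task headers such as "TASK [docker : stat] *****" in an Ansible log.
TASK_RE = re.compile(r"TASK \[(?P<name>.+?)\] \*+")


def count_task_runs(lines):
    """Count how many times each task header appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = TASK_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


def repeated_tasks(lines, threshold=2):
    """Return (task name, count) pairs seen at least `threshold` times, most frequent first."""
    counts = count_task_runs(lines)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# Hypothetical usage against a saved log file:
#   with open("ansible.log") as log:
#       for name, n in repeated_tasks(log, threshold=3):
#           print(n, name)
```

Sorting by count makes the most frequently repeated role tasks stand out, which is a reasonable starting point for deciding which checks to run only once.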
Created attachment 1199198 [details] Full ansible log for this scaleup
I'm not sure how much progress we'll make on this directly in the 3.4 cycle, but we are looking at how to improve the efficiency of dependencies as we go forward.
We don't intend to take any specific action to address this. However, we're refactoring the entire code base so that there's less inter-dependence between roles and more reliance on completing certain tasks before others.