Created attachment 1500752 [details]
host used to deploy the lab env

Description of problem:
Installing OpenShift on a lab server with network latency caused the prerequisites.yml playbook to fail. I ran the failed command[1] manually and it took 10.5 seconds to respond, while the defined timeout is 10 seconds.

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch

Steps to Reproduce:
1. Install openshift-ansible
2. Run the prerequisites.yml playbook
3. Expect the "Create credentials for oreg_url" task to fail

Actual results:

Expected results:

Additional info:
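To illustrate the failure mode reported here (a command that needs slightly longer than a fixed timeout allows), here is a minimal Python sketch. It is not the openshift-ansible code: `run_with_timeout` and `slow_command` are hypothetical names, and a sleeping Python subprocess stands in for the slow registry call. The durations are scaled down so the example runs quickly.

```python
import subprocess
import sys
import time


def slow_command(seconds):
    """Hypothetical stand-in for a slow network call: a child process that sleeps."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (ok, elapsed_seconds).

    ok is False when the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            check=True,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        ok = True
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception propagates.
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # A command that needs 1.0s fails under a 0.3s limit but passes under a 5s limit.
    for limit in (0.3, 5.0):
        ok, elapsed = run_with_timeout(slow_command(1.0), limit)
        print(f"limit={limit}s ok={ok} elapsed={elapsed:.2f}s")
```

The same command succeeds or fails depending only on the limit, which is why raising the timeout is enough to work around the latency.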
Created attachment 1500765 [details] Ansible debug
We are currently giving our users these instructions to work around this issue:

Once the install finishes, ssh to master-0 and run the following commands:

    sudo sed 's/default=20/default=60/' -i openshift-ansible/roles/lib_utils/library/docker_creds.py
    sudo sed 's/timeout 30/timeout 60/' -i openshift-ansible/roles/openshift_health_checker/openshift_checks/docker_image_availability.py
    sudo sed 's/timeout 30/timeout 60/' -i openshift-ansible/roles/openshift_health_checker/test/docker_image_availability_test.py

This also shows the other two tests with hardcoded timeouts that will fail under the same conditions.

Chris
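One caveat with the sed workaround: sed exits successfully even when the pattern does not match, so a file carrying a different timeout value would be left unchanged without any warning. A minimal Python sketch of the same substitution that also reports how many lines changed could look like the following. `sed_substitute` is a hypothetical helper, and the sample text is made up for illustration rather than copied from the openshift-ansible sources.

```python
def sed_substitute(text, old, new):
    """Mimic sed 's/old/new/' for literal patterns.

    Replaces the first occurrence of old on each line (no 'g' flag) and
    returns (new_text, number_of_lines_changed).
    """
    out = []
    changed = 0
    for line in text.splitlines(keepends=True):
        replaced = line.replace(old, new, 1)
        if replaced != line:
            changed += 1
        out.append(replaced)
    return "".join(out), changed


if __name__ == "__main__":
    # Hypothetical sample content, not the real module source.
    sample = "cmd = 'timeout 30 some-command'\nother = 1\n"
    new_text, changed = sed_substitute(sample, "timeout 30", "timeout 60")
    if changed == 0:
        print("pattern not found; file would be left unchanged")
    else:
        print(f"{changed} line(s) updated")
        print(new_text, end="")
```

Checking the count before writing the file back makes a silent no-op visible, for instance when the installed version still uses a value other than the one the pattern expects.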
Gerald, based on the 10 second timeout and commit version, it looks like your code does not include the timeout bump from October:
https://github.com/openshift/openshift-ansible/commit/32636fde0e07af35df53f90a03a89a30c1ef7e52

Any chance you can update your version?

I opened a PR to bump the timeouts based on Chris's suggestions:
https://github.com/openshift/openshift-ansible/pull/11922
Thanks Patrick, 60 seconds worked perfectly! As Chris wrote, I used the same workaround at the time this bz was reported, so I can confirm that 60 seconds is more than enough to accommodate slow networks during package downloads.

Gerald
Verified this bug with openshift-ansible-3.11.152-1.git.0.3e13655.el7.noarch.rpm.

The `skopeo` command took only 2.5s in QE's environment, so the timeout is more than sufficient for running the checks.

    # time skopeo inspect '--creds=xxx' docker://registry.redhat.io/openshift3/ose
    {
        "Name": "registry.redhat.io/openshift3/ose",
        "Digest": "sha256:f4064c56127c75efb83a79e91c3de44f48df930f5a9b9b829bbcfc81ceeffd19",
        "RepoTags": [
            "v3.5.5.5",
            ...
            "sha256:2f87e3b75838689a5d28de304b2b012888cf2afd00256d89094d706d5dda0cf6",
            "sha256:7bf92aa152acaa30396ee2f099dbf92d906ea5574148629f93d05581e5d3cf3f"
        ]
    }

    real    0m2.593s
    user    0m0.080s
    sys     0m0.033s

After the PR was applied, no regressions were found.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3139