Bug 1729994

Summary: [disconnected] prerequisites.yml fail pulling openshift/ose
Product: OpenShift Container Platform Reporter: Camino Noguera <mnoguera>
Component: Installer Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, eparis, jokerman, mmccomas, wzheng
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The 'ose' image was used for testing registry authentication. Consequence: Since the 'ose' image was no longer used as part of 3.11, the test would fail consistently. Fix: Updated the registry auth test to use 'ose-pod'. Result: Registry auth tests succeed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-18 14:52:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Camino Noguera 2019-07-15 14:09:16 UTC
Description of problem:
In a disconnected environment, running the prerequisites.yml playbook fails while trying to pull the openshift/ose image.


Version-Release number of selected component (if applicable):
3.11.X


How reproducible:
Execute prerequisites.yml in a disconnected environment.
3.11.117

Steps to Reproduce:
1.
2.
3.

Actual results:
Prerequisites failed 

Expected results:
The playbook completes successfully.

Additional info:
Thread discussed in:
https://github.com/openshift/openshift-ansible/issues/10756
https://github.com/openshift/openshift-docs/pull/12914
https://github.com/openshift/openshift-ansible/pull/10757


Seems that the solution is:
The root cause is at the following lines in roles/openshift_facts/defaults/main.yml.
l_docker_creds_image_dict:
  openshift-enterprise: 'openshift3/ose'
  origin: 'openshift/origin'
l_docker_creds_test_image: "{{ l_docker_creds_image_dict[openshift_deployment_type] }}"

The default value 'openshift3/ose' is not appropriate; it should be some other value, such as "openshift3/ose-pod:{{openshift_image_tag}}" or "openshift3/ose-control-plane:{{openshift_image_tag}}".
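
A minimal Python sketch of the lookup quoted above, assuming it behaves like a plain dictionary lookup keyed on openshift_deployment_type. The function name and the example tag are hypothetical; the image names are the ones quoted in this report. It models the proposed default only and is not the openshift-ansible code.

```python
# Hypothetical model of the l_docker_creds_test_image lookup.
# Values mirror the defaults quoted from roles/openshift_facts/defaults/main.yml.
CURRENT_DEFAULTS = {
    "openshift-enterprise": "openshift3/ose",
    "origin": "openshift/origin",
}


def proposed_test_image(deployment_type, image_tag):
    """Return the credential-test image with the proposed default applied."""
    proposed = dict(CURRENT_DEFAULTS)
    # Proposed change from this report: test against ose-pod, pinned to
    # the tag used for the install (openshift_image_tag).
    proposed["openshift-enterprise"] = "openshift3/ose-pod:" + image_tag
    return proposed[deployment_type]


if __name__ == "__main__":
    # "v3.11.117" is only an example tag value.
    print(CURRENT_DEFAULTS["openshift-enterprise"])
    print(proposed_test_image("openshift-enterprise", "v3.11.117"))
```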

You hit this issue when running the install against an internal registry that requires authentication. Another workaround is to set oreg_test_login=false to skip the credential check.

Comment 1 Johnny Liu 2019-08-07 05:48:20 UTC
Reproduced this bug with openshift-ansible-3.11.135-1.git.0.b7ad55a.el7.noarch

1. Enable auth for your local registry.
2. Mirror all needed 3.11 images to your local registry.
3. Trigger an installation using the mirror registry.
openshift_deployment_type=openshift-enterprise
oreg_url=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/testing/ocp3/ose-${component}:${version}
osm_etcd_image=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/rhel7/etcd:3.2.22
openshift_examples_modify_imagestreams=true
openshift_docker_insecure_registries=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000
oreg_auth_user=dummy
oreg_auth_password=dummy

Install log:
TASK [container_runtime : Create credentials for oreg_url] *********************
Wednesday 07 August 2019  13:11:17 +0800 (0:00:03.138)       0:02:11.411 ****** 
FAILED - RETRYING: Create credentials for oreg_url (3 retries left).

FAILED - RETRYING: Create credentials for oreg_url (2 retries left).

FAILED - RETRYING: Create credentials for oreg_url (1 retries left).

fatal: [ci-vm-10-0-149-8.hosted.upshift.rdu2.redhat.com]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-08-07T01:11:32-04:00\" level=fatal msg=\"Error reading manifest latest in vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/openshift3/ose: manifest unknown: manifest unknown\" \n", "state": "unknown"}

Comment 4 Johnny Liu 2019-08-26 11:39:43 UTC
Retested this bug with openshift-ansible-3.11.141-1.git.0.a7e91cd.el7.noarch; it still fails even though the fix PR is merged and has taken effect.


TASK [container_runtime : Create credentials for oreg_url] *********************
Monday 26 August 2019  19:15:39 +0800 (0:00:03.395)       0:02:04.561 ********* 
FAILED - RETRYING: Create credentials for oreg_url (3 retries left).

FAILED - RETRYING: Create credentials for oreg_url (2 retries left).

FAILED - RETRYING: Create credentials for oreg_url (1 retries left).

fatal: [ci-vm-10-0-151-144.hosted.upshift.rdu2.redhat.com]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-08-26T07:15:54-04:00\" level=fatal msg=\"Error reading manifest latest in vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/openshift3/ose-pod: manifest unknown: manifest unknown\" \n", "state": "unknown"}

It seems the test image does not specify a tag, so 'latest' is requested. From a customer's perspective, a 'latest' tag is not always created in a disconnected registry.
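
To illustrate why an untagged test image ends up requesting 'latest', here is a small Python sketch. The helper names are hypothetical, and the parsing is deliberately simplified (digest references are not treated specially); it is not the installer's logic.

```python
def has_explicit_tag(image_ref):
    """Return True if the reference carries a tag on its last path component."""
    # Only inspect the text after the final '/', so a registry port such
    # as ':5000' is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    return ":" in last_component


def effective_ref(image_ref):
    """Return the reference a pull would resolve when no tag is given."""
    if has_explicit_tag(image_ref):
        return image_ref
    return image_ref + ":latest"


if __name__ == "__main__":
    registry = "vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000"
    print(effective_ref(registry + "/openshift3/ose-pod"))
    print(effective_ref(registry + "/openshift3/ose-pod:v3.11.141"))
```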

Comment 7 Johnny Liu 2019-09-16 08:48:12 UTC
Retested this bug with openshift-ansible-3.11.146-1.git.0.fcedb45.el7.noarch; the bug is only partially fixed.

When the disconnected registry uses a non-default repository instead of the default 'openshift3', an installation with the following settings fails.
oreg_url=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/testing/ocp3/ose-${component}:${version}
oreg_auth_user=dummy
oreg_auth_password=dummy

TASK [container_runtime : Create credentials for oreg_url] *********************
Monday 16 September 2019  13:58:13 +0800 (0:00:03.353)       0:02:07.346 ****** 
FAILED - RETRYING: Create credentials for oreg_url (3 retries left).
FAILED - RETRYING: Create credentials for oreg_url (2 retries left).
FAILED - RETRYING: Create credentials for oreg_url (1 retries left).
fatal: [ci-vm-10-0-149-197.hosted.upshift.rdu2.redhat.com]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-09-16T01:58:28-04:00\" level=fatal msg=\"Error reading manifest v3.11.141 in vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5000/openshift3/ose-pod: manifest unknown: manifest unknown\" \n", "state": "unknown"}

The installer tries to detect the 'openshift3/ose-pod' image rather than the 'testing/ocp3/ose-pod' image.

When the disconnected registry uses the default repository 'openshift3', the installation completes successfully.

Comment 10 Johnny Liu 2019-11-04 11:47:20 UTC
Verified this bug with openshift-ansible-3.11.154-1.git.0.7a11cbe.el7.noarch; the test passes.

Ensured that no vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5001/openshift3/ose-pod:{version} image is available.

openshift_deployment_type=openshift-enterprise
oreg_url=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5001/testing/ocp3/ose-${component}:${version}
oreg_auth_user=dummy
oreg_auth_password=dummy
osm_etcd_image=vm-10-0-77-82.hosted.upshift.rdu2.redhat.com:5001/rhel7/etcd:3.2.22


TASK [container_runtime : Create credentials for oreg_url] *********************
Monday 04 November 2019  19:39:47 +0800 (0:00:03.618)       0:01:45.389 ******* 
changed: [ci-vm-10-0-151-156.hosted.upshift.rdu2.redhat.com] => {"attempts": 1, "changed": true, "rc": 0}

Comment 12 errata-xmlrpc 2019-11-18 14:52:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3817