Bug 1848752

Summary: OCP 4.4 disconnected installation with static ip fails to install and tried to pull images from quay.io instead of internal registry
Product: OpenShift Container Platform
Reporter: Novonil Choudhuri <nchoudhu>
Component: Node
Assignee: Urvashi Mohnani <umohnani>
Status: CLOSED NOTABUG
QA Contact: Sunil Choudhary <schoudha>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: aos-bugs, bleanhar, dmoessne, dwalsh, jokerman, ltourrea, mfuruta, pfruth, syangsao, tsweeney, umohnani
Target Milestone: ---
Target Release: 4.6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-07 16:46:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Novonil Choudhuri 2020-06-18 21:25:13 UTC
Description of problem: 

The customer is attempting a disconnected install of OCP 4.4 on VMware with static IPs and is getting the following install errors. Why is the installer referring to quay.io when it is supposed to pull images from the internal mirror registry?

Error: error getting image "quay.io/openshift-release-dev/ocp-release@sha256:039a4ef7c128a049ccf916a1d68ce93e8f5494b44d5a75df60c85e9e7191dacc": unable to find 'quay.io/openshift-release-dev/ocp-release@sha256:039a4ef7c128a049ccf916a1d68ce93e8f5494b44d5a75df60c85e9e7191dacc' in local storage: no such image
Warning: Could not resolve release image to pull by digest

The customer confirmed that the mirror registry is reachable from the bootstrap node:
$ curl -u ocpuser:<redacted> -k https://ptl01ocpjmphst101:5000/v2/_catalog
{"repositories":["ocp4/openshift4"]}

Version-Release number of the following components:
OCP 4.4 GA

How reproducible: Following https://docs.openshift.com/container-platform/4.4/installing/installing_vsphere/installing-restricted-networks-vsphere.html#installation-vsphere-machines_installing-restricted-networks-vsphere

install-config.yaml file:

apiVersion: v1
baseDomain: dc.containers.xxxx.com
compute:
- hyperthreading: Enabled   
  name: worker
  replicas: 0 
controlPlane:
  hyperthreading: Enabled   
  name: master
  replicas: 3 
metadata:
  name: test2
platform:
  vsphere:
    vcenter: pda01vcntr002.na.xxxx.com
    username: na\bby-s-dotcominfraops
    password: <redacted>
    datacenter: 'Bloomington DC - NonProduction'
    defaultDatastore: BN01C-NTNX-LCP-LUN01
fips: false 
pullSecret: '{"auths":{"ptl01ocpjmphst101.na.xxxx.com:5000": {"auth": "<redacted>","email": "dotcomcontainerops"}}}' 
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDyWgtLIAzTnn1vy21XukG1vZLTv6M+iNCF5+JYsks1WjXlfenD3sV011vMl2JxAvJlMxCoC80hzkFNWR+OCRpEoNaZfOtWRImKIHRqNRQQbhOaxYkXlIyX6Cbfbd4lC+F8mP6hVvL4akN8OAh/3ri8/GjjheEFuG68l/p9clTr2Jy13h7B3BY/o3iyQ0BHm9yzCrT90XtcdZaQulFRKLjrJuIGBKbmJILpikApveRfoNu69kWCFt82U2z0kJtaaMQC9kvRJi/vwwbFAj4mLrqmxHz0PxA8+pDVbXdvaxBtt4KVMgzu4i2nwj1ZH2Q6VtHMiACcaDSvOmVM6HghW6iksp2zvgkhnLjctV224MEsFDh/ywr7byF1Afu6YS7OD2gllH/DylpGJiDP25sN7bEjKVMB1RQ9j/2Ue/AKBj5JJB8yhAtSSnPbNmTyFDRTdG7wqyl+saC8ir+PG6p88mi/HbtNWd1erzgoq7zlIczwVfH1rChH4xMH2N8tfVf0W0dqzcyQAoFt05T9o4ZwxOJ5hrXmsGCYMH1u4IrWNlw28cEKSnoLmw5APSVG6i41/M0Un8YFJa6plaWApOQoqwyew7wKL/nmcnq5/1a9OQeeYj8wHmChAunXLnQ8bSS37LfhgHsj4CZh2IRQpz0vV5iHAYtOTsDJPTcclE8dO8b4Pw== u1202294.xxxxx.com'

additionalTrustBundle: | 


imageContentSources:
- mirrors:
  - ptl01ocpjmphst101.na.xxxx.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - ptl01ocpjmphst101.na.xxxx.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev


Actual results: The OCP install tries to pull images from quay.io instead of pulling from the internal registry.


Expected results: OCP install should refer to internal registry instead of going to the internet.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Abhinav Dahiya 2020-06-19 14:15:22 UTC
The pull from the external registry is done by CRI-O, so moving to the container team to triage or to help make the error clearer.

Comment 5 pfruth 2020-07-06 23:09:51 UTC
I found this BZ because I was having a very similar problem during an airgap install of OCP 4.4.10 on vSphere UPI.
I am using an install-config.yaml structured very much like the one depicted in the opening post of this BZ.
I thought I'd provide some feedback here for a couple of reasons:
1) As a testimonial that a disconnected install (a.k.a. airgap) does in fact work with the OCP 4.4.10 level of code
2) As a suggestion to check your pullSecret if you see these error messages (as seen above, and as shown in the example below)

The TL;DR is, in my case, the problem was ultimately due to an error in the way I created the credentials used in the pullSecret.

The longer story

During startup of the bootstrap server, I was seeing these messages being repeatedly issued in the output of 'journalctl -b -f -u bootkube.service'

...
....
Jul 05 23:55:57 bootstrap bootkube.sh[3467]: Starting etcd certificate signer...
Jul 05 23:55:58 bootstrap bootkube.sh[3467]: Error: unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454: (Mirrors also failed: [installer.internal.net:5000/x86_64/4.4.10@sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454: Error reading manifest sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454 in installer.internal.net:5000/x86_64/4.4.10: unauthorized: authentication required]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454: Error reading manifest sha256:e9c82181e91ed172383760a970053304ba850608cfcaa6e4749661811132e454 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Jul 05 23:55:58 bootstrap bootkube.sh[3467]: Error: Failed to evict container: "": Failed to find container "etcd-signer" in state: no container with name or ID etcd-signer found: no such container
Jul 05 23:55:58 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Jul 05 23:55:58 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Jul 05 23:56:03 bootstrap systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Jul 05 23:56:03 bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 2.
Jul 05 23:56:03 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Jul 05 23:56:03 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
....
...


The fundamental cause of my problem was the way I generated the base64-encoded userid:password portion of the pullSecret.
I incorrectly used:
echo "userid:password" | base64 -w0

when I should have used:
echo -n "userid:password" | base64 -w0

Note that the "-n" option is needed so that echo does not append a newline character, which would otherwise be included in the base64 encoding.
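
To make the difference concrete, here is a minimal sketch that encodes the same string both ways and dumps the decoded bytes. The value "userid:password" is only a placeholder, and printf '%s' is my own stand-in for "echo -n" (an assumption made for portability across shells); neither is taken from a real environment.

```shell
# Placeholder credentials, for illustration only.
creds='userid:password'

# echo appends a trailing newline, so the newline gets encoded too.
with_newline=$(echo "$creds" | base64 -w0)

# printf '%s' emits the string exactly, with no trailing newline.
without_newline=$(printf '%s' "$creds" | base64 -w0)

echo "with newline:    $with_newline"
echo "without newline: $without_newline"

# Dump the decoded bytes: the first value ends in \n, the second does not.
printf '%s' "$with_newline" | base64 -d | od -c
printf '%s' "$without_newline" | base64 -d | od -c
```

For this placeholder the two encoded strings differ only in their last few characters, which makes the mistake easy to miss when reading a pullSecret by eye.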

After I created a pullSecret with the correctly generated credential, I was able to successfully perform an airgap install, using a private image registry, on OCP 4.4.10.
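
For anyone who wants to check an existing pullSecret for this problem, the following is a rough sketch of one way to do it. It takes the "auth" value from the pullSecret, decodes it, and reports whether the last decoded byte is a newline. The function name check_auth and the sample values are hypothetical and used only for the demonstration.

```shell
# Report whether a base64 "auth" value decodes to text ending in a newline.
check_auth() {
  # Take the last decoded byte and render it as two hex digits.
  last_byte=$(printf '%s' "$1" | base64 -d | tail -c 1 | od -An -tx1 | tr -d ' \n')
  if [ "$last_byte" = "0a" ]; then
    echo "trailing newline"
  else
    echo "ok"
  fi
}

# Placeholder values, generated here only for the demonstration.
bad_auth=$(echo "userid:password" | base64 -w0)
good_auth=$(printf '%s' "userid:password" | base64 -w0)

check_auth "$bad_auth"
check_auth "$good_auth"
```

The byte is inspected with od rather than compared as a shell string because command substitution strips trailing newlines, which would hide exactly the character being looked for.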