Bug 1708663

Summary: OCP4.1 UPI installation fails to create bootstrap-machine-config-operator
Product: OpenShift Container Platform Reporter: Lukas Bednar <lbednar>
Component: Containers Assignee: Urvashi Mohnani <umohnani>
Status: CLOSED DUPLICATE QA Contact: weiwei jiang <wjiang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: aos-bugs, dwalsh, jokerman, mmccomas, nagrawal
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-10 15:41:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
log-bundle.tar.gz generated by openshift-install gather bootstrap none

Description Lukas Bednar 2019-05-10 14:00:27 UTC
Created attachment 1566672
log-bundle.tar.gz generated by openshift-install gather bootstrap

Description of problem:

I am trying to install OCP-4.1 in UPI mode.

I use the latest installer, openshift-install-linux-4.1.0-rc.1, and the rhcos-410.8.20190502.0 image.
I have set up DNS, an HTTP server for the bootstrap Ignition config, and HAProxy as described here: https://github.com/openshift/installer/blob/master/docs/user/metal/install_upi.md

Then I bring the bootstrap, master and worker nodes up.
The bootstrap node picks up its Ignition config from the HTTP server and waits for the etcd cluster.
The master and worker nodes fail to fetch their machine configs:

A start job is running for Ignition (disks) (59min 6s / no limit)
A start job is running for Ignition (disks) (59min 7s / no limit)
A start job is running for Ignition (disks) (13h 59min 7s / no limit)
A start job is running for Ignition (disks) (13h 59min 8s / no limit)
A start job is running for Ignition (disks) (13h 59min 8s / no limit)
A start job is running for Ignition (disks) (13h 59min 9s / no limit)
A start job is running for Ignition (disks) (13h 59min 9s / no limit)
A start job is running for Ignition (disks) (13h 59min 10s / no limit)
A start job is running for Ignition (disks) (13h 59min 10s / no limit)
A start job is running for Ignition (disks) (13h 59min 11s / no limit)
[50353.819223] ignition[607]: GET https://api-int.working.oc4:22623/config/master: attempt #10053
[50353.824470] ignition[607]: GET error: Get https://api-int.working.oc4:22623/config/master: EOF

api-int.working.oc4 points to HAProxy, which load-balances between the bootstrap and master nodes.
When I try to access https://bootstrap.working.oc4:22623/config/master, I get "connection refused".
On the bootstrap machine, I don't see anything bound to port 22623.
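
The port check described above can be scripted. This is only a minimal sketch, assuming Python 3 on a machine that can resolve the cluster hostnames; `port_is_open` is a hypothetical helper name, not part of the installer. It tests only whether a TCP connection can be established, and does not validate TLS or the config that would be served.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

In the situation reported here, `port_is_open("bootstrap.working.oc4", 22623)` would be expected to return False, since nothing is bound to that port on the bootstrap machine.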

In the log I see that it could not create the bootstrap-machine-config-operator pod:

May 09 16:52:39 host-172-16-0-23 hyperkube[1181]: E0509 16:52:39.141412    1181 pod_workers.go:190] Error syncing pod 50348b3c4c0a3abff8cb6c0c802ea28e ("bootstrap-machine-config-operator-host-172-16-0-23_default(50348b3c4c0a3abff8cb6c0c802ea28e)"), skipping: failed to "CreatePodSandbox" for "bootstrap-machine-config-operator-host-172-16-0-23_default(50348b3c4c0a3abff8cb6c0c802ea28e)" with CreatePodSandboxError: "CreatePodSandbox for pod \"bootstrap-machine-config-operator-host-172-16-0-23_default(50348b3c4c0a3abff8cb6c0c802ea28e)\" failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_bootstrap-machine-config-operator-host-172-16-0-23_default_50348b3c4c0a3abff8cb6c0c802ea28e_0\": layer not known"
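
To find entries like the one above in a larger log, something along these lines may help. It is a rough sketch that assumes the kubelet message format shown in this report; the regular expression and the `find_sandbox_failures` name are illustrative, and other kubelet versions may word the message differently.

```python
import re

# Matches the 'Error syncing pod <uid> ("<pod>(<uid>)")' prefix of a kubelet
# message that also reports a failed "CreatePodSandbox" step.
SANDBOX_ERROR = re.compile(
    r'Error syncing pod (?P<uid>[0-9a-f]+) \("(?P<pod>[^"(]+)\((?P=uid)\)"\)'
    r'.*failed to "CreatePodSandbox"'
)


def find_sandbox_failures(lines):
    """Yield (pod, uid, reason) for each CreatePodSandbox failure in lines."""
    for line in lines:
        match = SANDBOX_ERROR.search(line)
        if match:
            # The innermost error text follows the last ': ' on the line.
            reason = line.rstrip().rstrip('"').rsplit(": ", 1)[-1]
            yield match.group("pod"), match.group("uid"), reason
```

The function accepts any iterable of strings, so an open log file can be passed directly. For the line quoted above, the reason extracted is "layer not known".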

Version-Release number of the following components:
* openshift-install-linux-4.1.0-rc.1 installer
* rhcos-410.8.20190502.0 image.

How reproducible: 100%

Steps to Reproduce:
1. Follow the steps in https://github.com/openshift/installer/blob/master/docs/user/metal/install_upi.md

Actual results:
Installation times out because the bootstrap-machine-config-operator pod could not be created.

Expected results:
Installation succeeds.

Additional info:
Attached are the logs collected with the openshift-install gather bootstrap command.

Comment 1 Scott Dodson 2019-05-10 14:49:12 UTC
Looks identical to https://bugzilla.redhat.com/show_bug.cgi?id=1695516; however, the referenced image should already have that fix.

Comment 3 Urvashi Mohnani 2019-05-10 15:34:04 UTC
The issue is fixed in cri-o 1.13.9
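
A quick way to compare an installed cri-o version string against 1.13.9 might look like the sketch below. It assumes versions begin with dotted numeric components and ignores any packaging suffix; the `parse_version` and `has_fix` names are hypothetical, and treating every later version as containing the fix is an assumption of the comparison, not something stated in this report.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn the leading dotted numbers of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(crio_version: str, fixed_in: str = "1.13.9") -> bool:
    """Return True if crio_version is at least fixed_in (assumed: 1.13.9)."""
    return parse_version(crio_version) >= parse_version(fixed_in)
```

Comparing integer tuples rather than raw strings avoids ordering "1.13.10" before "1.13.9".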