1720872 – Openshift 4.1.1/4.1.2 Bare Metal Install Fails due to MCO not starting

Summary: Openshift 4.1.1/4.1.2 Bare Metal Install Fails due to MCO not starting

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.1.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Steve Milner
QA Contact:	Micah Abbott
Docs Contact:
URL:
Whiteboard:	4.1.3
Depends On:
Blocks:

Reported:	2019-06-16 02:35 UTC by Glenn West
Modified:	2019-06-26 08:51 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: A race condition existed between the MCO and the growpart service. Consequence: Bootstrapping could fail. Fix: The growpart service was updated to become a oneshot and to start before kubelet.service. Result: The race condition is averted.
Clone Of:
Environment:
Last Closed:	2019-06-26 08:50:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:1589	0	None	None	None	2019-06-26 08:51:05 UTC

Description Glenn West 2019-06-16 02:35:35 UTC

Description of problem:
The OpenShift install fails:
1. The machine config operator (MCO) is not running.
2. The masters (control plane) keep retrying to get their config from the machine config operator and never complete boot.
3. The bootstrap node waits on etcd and fails.
4. The overall install fails.


Jun 15 06:19:49 bootstrap-0 hyperkube[1375]: E0615 06:19:49.270441    1375 file.go:108] Unable to process watch event: can't process config file "/etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml": /etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml: couldn't parse as pod(Object 'Kind' is missing in 'null'), please check config file.
Jun 15 06:19:49 bootstrap-0 hyperkube[1375]: I0615 06:19:49.364262    1375 kubelet.go:1915] SyncLoop (ADD, "file"): "bootstrap-machine-config-operator-bootstrap-0_default(eb26c1cc7cd8e6521dfc95d1a59cd87f)"
Jun 15 06:19:49 bootstrap-0 hyperkube[1375]: W0615 06:19:49.364359    1375 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(eb26c1cc7cd8e6521dfc95d1a59cd87f) - node has conditions: [DiskPressure]

CRI-O issues (same install):
Jun 15 06:19:11 bootstrap-0 openshift.sh[1584]: kubectl create --filename ./99_binding-discovery.yaml failed. Retrying in 5 seconds...
Jun 15 06:19:11 bootstrap-0 systemd[1]: Stopping Open Container Initiative Daemon...
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507439    1375 controlbuf.go:382] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: W0615 06:19:11.507585    1375 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {/var/run/crio/crio.sock 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory". Reconnecting...
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507607    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0001cdc80, TRANSIENT_FAILURE
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507616    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0001cdc80, CONNECTING
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507624    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc0001cdc80, TRANSIENT_FAILURE
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507650    1375 controlbuf.go:382] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: W0615 06:19:11.507709    1375 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {/var/run/crio/crio.sock 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory". Reconnecting...
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507723    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc00014a6e0, TRANSIENT_FAILURE
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507733    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc00014a6e0, CONNECTING
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: I0615 06:19:11.507740    1375 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc00014a6e0, TRANSIENT_FAILURE
Jun 15 06:19:11 bootstrap-0 systemd[1]: Stopped Open Container Initiative Daemon.
Jun 15 06:19:11 bootstrap-0 systemd[1]: Starting Open Container Initiative Daemon...
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: E0615 06:19:11.606854    1375 remote_runtime.go:173] ListPodSandbox with filter &PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: E0615 06:19:11.607251    1375 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"
Jun 15 06:19:11 bootstrap-0 hyperkube[1375]: E0615 06:19:11.607315    1375 kubelet_pods.go:1022] Error listing containers: &status.statusError{Code:14, Message:"all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory\"", Details:[]*any.Any(nil)}
Jun 15 06


Version-Release number of selected component (if applicable):

Installer:
openshift-install v4.1.1-201906040019-dirty
built from commit fb776038a1d90b2b83839ab5deb8579287972e11
release image quay.io/openshift-release-dev/ocp-release@sha256:e9415dbf80988553adc6c34740243805a21d92e3cdedeb2fd8d743ca56522a61

Environment:
1. Bare metal install on ESXi, PXE boot
2. dnsmasq providing IP addresses
3. Private ESXi network, with a RHEL ctl node providing NAT internet access
4. All nodes PXE boot and pull Ignition files from a web server

Full log from bootstrap:
https://gist.github.com/glennswest/7828c2572feafd80b4d1541a2245a4ef

How reproducible:
Scripted, every time the same. Hard Failure. (1 week of countless attempts)


Steps to Reproduce:
1. Create install-config.yaml per the bare metal documentation
2. Use the latest RHCOS image and install components from PXE
3. Validate that the time is set correctly
4. Generate the Ignition configs in an empty directory each time


Actual results:
Failed install

Expected results:
Working Cluster

Additional info:

Comment 1 Glenn West 2019-06-16 02:55:02 UTC

Problem duplicated on 4.1.2

Jun 15 18:50:15 bootstrap-0 bootkube.sh[1575]: Writing asset: /assets/kube-controller-manager-bootstrap/manifests/00_openshift-kube-controller-manager-operator-ns.yaml
Jun 15 18:50:32 bootstrap-0 hyperkube[1348]: E0615 18:50:32.388397    1348 file.go:108] Unable to process watch event: can't process config file "/etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml": /etc/kubernetes/manifests/machineconfigoperator-bootstrap-pod.yaml: couldn't parse as pod(Object 'Kind' is missing in 'null'), please check config file.
Jun 15 18:50:32 bootstrap-0 hyperkube[1348]: I0615 18:50:32.477829    1348 kubelet.go:1915] SyncLoop (ADD, "file"): "bootstrap-machine-config-operator-bootstrap-0_default(eb26c1cc7cd8e6521dfc95d1a59cd87f)"
Jun 15 18:50:32 bootstrap-0 hyperkube[1348]: W0615 18:50:32.477923    1348 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(eb26c1cc7cd8e6521dfc95d1a59cd87f) - node has conditions: [DiskPressure]

Comment 2 Glenn West 2019-06-16 03:01:52 UTC

The openshift-install binary for 4.1.2 has not been updated to the 4.1.2 hash.

Comment 3 Antonio Murdaca 2019-06-16 12:29:52 UTC

Jun 15 18:50:32 bootstrap-0 hyperkube[1348]: W0615 18:50:32.477923    1348 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(eb26c1cc7cd8e6521dfc95d1a59cd87f) - node has conditions: [DiskPressure]

The line above says that the bootstrap node is under disk pressure, so it seems expected to me that the pod does not start.

Comment 4 Glenn West 2019-06-16 23:48:20 UTC

The node has lots of space

[core@bootstrap-0 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           7.9G  1.2M  7.9G   1% /run
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/sda3       319G  4.8G  315G   2% /sysroot
/dev/sda2       976M   72M  838M   8% /boot
tmpfs           1.6G     0  1.6G   0% /run/user/1000
[core@bootstrap-0 ~]$ 


I don't see any disk pressure.

I started with 200G disks (the docs say 160 is enough).
I then upgraded all nodes to 320G disks.
It didn't change anything.

They all have 16G of RAM as well.

Comment 5 Glenn West 2019-06-17 00:38:49 UTC

There may be a race condition.
The disk pressure rejection does happen:
Jun 16 16:14:40 bootstrap-0 hyperkube[1388]: I0616 16:14:40.999242    1388 kubelet.go:1915] SyncLoop (ADD, "file"): "bootstrap-machine-config-operator-bootstrap-0_default(7b20d6bcdff316762adc6abcea4bda05)"
Jun 16 16:14:40 bootstrap-0 hyperkube[1388]: W0616 16:14:40.999310    1388 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(7b20d6bcdff316762adc6abcea4bda05) - node has conditions: [DiskPressure]

The partition is expanded at:
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: meta-data=/dev/sda3              isize=512    agcount=4, agsize=136768 blks
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]:          =                       sectsz=512   attr=2, projid32bit=1
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]:          =                       reflink=1
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: data     =                       bsize=4096   blocks=547072, imaxpct=25
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]:          =                       sunit=0      swidth=0 blks
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: log      =internal log           bsize=4096   blocks=2560, version=2
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]:          =                       sectsz=512   sunit=0 blks, lazy-count=1
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: realtime =none                   extsz=4096   blocks=0, rtextents=0
Jun 16 16:13:00 bootstrap-0 coreos-growpart[1302]: data blocks changed from 547072 to 41680379
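
To put the last line in perspective, the block counts can be converted to sizes. This is only an illustrative sketch: it assumes the `bsize=4096` reported on the `data` line above applies to both block counts, and the helper name `blocks_to_gib` is hypothetical.

```python
# Illustrative sketch: convert the XFS data block counts reported by
# coreos-growpart into GiB. BLOCK_SIZE is taken from "bsize=4096" in the
# log above; blocks_to_gib is a hypothetical helper name.
BLOCK_SIZE = 4096  # bytes per filesystem block (assumption from the log)


def blocks_to_gib(blocks: int, block_size: int = BLOCK_SIZE) -> float:
    """Return the size in GiB of `blocks` filesystem blocks."""
    return blocks * block_size / 2**30


before = blocks_to_gib(547072)    # data blocks before the grow
after = blocks_to_gib(41680379)   # data blocks after the grow

print(f"before: {before:.2f} GiB, after: {after:.2f} GiB")
```

Under that assumption the root filesystem is only about 2 GiB until the grow completes, which would be consistent with the kubelet reporting disk pressure if it samples the filesystem before then.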


The node reports disk pressure only from 16:12 through 16:18.

Looking at it in more detail:
Jun 16 16:12:50 bootstrap-0 hyperkube[1388]: I0616 16:12:50.705833    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
Jun 16 16:12:50 bootstrap-0 hyperkube[1388]: I0616 16:12:50.799577    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
Jun 16 16:12:50 bootstrap-0 hyperkube[1388]: I0616 16:12:50.809266    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:13:00 bootstrap-0 hyperkube[1388]: I0616 16:13:00.814215    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:13:10 bootstrap-0 hyperkube[1388]: I0616 16:13:10.821837    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:13:20 bootstrap-0 hyperkube[1388]: I0616 16:13:20.873645    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:13:40 bootstrap-0 hyperkube[1388]: I0616 16:13:40.729792    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:13:50 bootstrap-0 hyperkube[1388]: I0616 16:13:50.741057    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:00 bootstrap-0 hyperkube[1388]: I0616 16:14:00.751720    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:10 bootstrap-0 hyperkube[1388]: I0616 16:14:10.761797    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:23 bootstrap-0 hyperkube[1388]: I0616 16:14:23.026863    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:34 bootstrap-0 hyperkube[1388]: I0616 16:14:34.234750    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:40 bootstrap-0 hyperkube[1388]: W0616 16:14:40.999310    1388 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(7b20d6bcdff316762adc6abcea4bda05) - node has conditions: [DiskPressure]
Jun 16 16:14:44 bootstrap-0 hyperkube[1388]: I0616 16:14:44.297265    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:14:55 bootstrap-0 hyperkube[1388]: I0616 16:14:55.687715    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:05 bootstrap-0 hyperkube[1388]: I0616 16:15:05.698484    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:15 bootstrap-0 hyperkube[1388]: I0616 16:15:15.708075    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:25 bootstrap-0 hyperkube[1388]: I0616 16:15:25.717638    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:35 bootstrap-0 hyperkube[1388]: I0616 16:15:35.727809    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:45 bootstrap-0 hyperkube[1388]: I0616 16:15:45.737063    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:15:55 bootstrap-0 hyperkube[1388]: I0616 16:15:55.746290    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:05 bootstrap-0 hyperkube[1388]: I0616 16:16:05.755426    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:15 bootstrap-0 hyperkube[1388]: I0616 16:16:15.765174    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:25 bootstrap-0 hyperkube[1388]: I0616 16:16:25.775507    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:35 bootstrap-0 hyperkube[1388]: I0616 16:16:35.784761    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:45 bootstrap-0 hyperkube[1388]: I0616 16:16:45.794190    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:16:55 bootstrap-0 hyperkube[1388]: I0616 16:16:55.803434    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:05 bootstrap-0 hyperkube[1388]: I0616 16:17:05.812588    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:15 bootstrap-0 hyperkube[1388]: I0616 16:17:15.827270    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:25 bootstrap-0 hyperkube[1388]: I0616 16:17:25.835891    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:35 bootstrap-0 hyperkube[1388]: I0616 16:17:35.845064    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:45 bootstrap-0 hyperkube[1388]: I0616 16:17:45.854104    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:55 bootstrap-0 hyperkube[1388]: I0616 16:17:55.863340    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:18:05 bootstrap-0 hyperkube[1388]: I0616 16:18:05.879401    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
Jun 16 16:18:15 bootstrap-0 hyperkube[1388]: I0616 16:18:15.889038    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
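
The same window can be pulled out of the journal mechanically. The following is a minimal sketch, not a tool used in this report: it assumes journal lines in the format shown above, and the names `pressure_window` and `JOURNAL_SAMPLE` are illustrative.

```python
import re
from typing import Optional, Tuple

# Matches the kubelet node-status lines shown above and captures the
# wall-clock time plus the recorded event name.
EVENT_RE = re.compile(
    r"^\w{3} +\d+ (\d{2}:\d{2}:\d{2}) .*"
    r"Recording (NodeHasDiskPressure|NodeHasNoDiskPressure) event"
)


def pressure_window(lines) -> Tuple[Optional[str], Optional[str]]:
    """Return (start, end): the time of the first NodeHasDiskPressure event
    and of the first NodeHasNoDiskPressure event after it (None if absent)."""
    start = end = None
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        stamp, event = m.groups()
        if event == "NodeHasDiskPressure" and start is None:
            start = stamp
        elif event == "NodeHasNoDiskPressure" and start is not None:
            end = stamp
            break
    return start, end


# Shortened excerpt of the log above, used as sample input.
JOURNAL_SAMPLE = """\
Jun 16 16:12:50 bootstrap-0 hyperkube[1388]: I0616 16:12:50.799577    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
Jun 16 16:12:50 bootstrap-0 hyperkube[1388]: I0616 16:12:50.809266    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:17:55 bootstrap-0 hyperkube[1388]: I0616 16:17:55.863340    1388 kubelet_node_status.go:446] Recording NodeHasDiskPressure event message for node bootstrap-0
Jun 16 16:18:05 bootstrap-0 hyperkube[1388]: I0616 16:18:05.879401    1388 kubelet_node_status.go:446] Recording NodeHasNoDiskPressure event message for node bootstrap-0
""".splitlines()

print(pressure_window(JOURNAL_SAMPLE))  # -> ('16:12:50', '16:18:05')
```

On a real node the input could come from the saved bootstrap journal; the sketch only looks at the time of day, so it would need adjusting for logs that span midnight.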

Comment 6 Glenn West 2019-06-17 05:51:42 UTC

I believe there is good evidence that this is a race condition. I tried reducing the vCPU count to 1, and I find that all the masters come up and the bootstrap machine config operator does seem to start in this case. I still don't have a working config, but it fails later in the process. Note that 1 vCPU is not a recommended config.

The bootstrap log
https://gist.github.com/9d794b59d2b3f46696614ece0239ff4c

I tried 4.1.0 and 4.1.1 and had similar results.

Comment 7 Glenn West 2019-06-17 06:13:14 UTC

I tried increasing the vCPU count to 8 for all VMs and hit the same problem:
Jun 17 06:07:51 bootstrap-0 bootkube.sh[1689]: Writing asset: /assets/kube-apiserver-bootstrap/manifests/00_openshift-kube-apiserver-operator-ns.yaml
Jun 17 06:07:58 bootstrap-0 bootkube.sh[1689]: Writing asset: /assets/kube-controller-manager-bootstrap/manifests/00_openshift-kube-controller-manager-operator-ns.yaml
Jun 17 06:08:15 bootstrap-0 hyperkube[1465]: I0617 06:08:15.391171    1465 kubelet.go:1915] SyncLoop (ADD, "file"): "bootstrap-machine-config-operator-bootstrap-0_default(7b20d6bcdff316762adc6abcea4bda05)"
Jun 17 06:08:15 bootstrap-0 hyperkube[1465]: W0617 06:08:15.391239    1465 eviction_manager.go:160] Failed to admit pod bootstrap-machine-config-operator-bootstrap-0_default(7b20d6bcdff316762adc6abcea4bda05) - node has conditions: [DiskPressure]


So only with 1 vCPU do I get any progress on the install, and 1 vCPU seems to cause problems later.

Comment 8 Antonio Murdaca 2019-06-17 10:22:33 UTC

This doesn't seem to be an MCO issue as described, though. This is the kubelet/Kubernetes not admitting the MCO pod.
I'll move this to Node; reassign it back if it turns out to be related to the MCO specifically.

Comment 9 Ryan Phillips 2019-06-17 19:42:44 UTC

There is a race: the kubelet can start while coreos-growpart has only just started or is still running. The coreos-growpart service needs Type=oneshot so that the service is marked as started only after the main process has exited (that is, after the resize has completed).

Comment 10 Ryan Phillips 2019-06-17 19:55:07 UTC

I added the growpart service to the kubelet unit file in [1]. Once [1] merges and the tweak is made to coreos-growpart, the kubelet will wait for the resize to complete before starting.

1. https://github.com/openshift/machine-config-operator/pull/861

Comment 12 Ryan Phillips 2019-06-17 20:02:07 UTC

I closed my PR; it is not needed with Colin's fix.

Comment 15 Colin Walters 2019-06-18 20:20:43 UTC

> when will a image be available for test?

After https://bugzilla.redhat.com/show_bug.cgi?id=1720872#c13 merges.

Hmm, this is about the bootstrap machine, so it will require an update to the installer too and new published bootimages on the mirrors.

Comment 17 Micah Abbott 2019-06-19 19:36:54 UTC

The changes are in 4.1.0-0.nightly-2019-06-19-033215

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-19-033215   True        False         3h6m    Cluster version is 4.1.0-0.nightly-2019-06-19-033215

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-131-47.us-west-2.compute.internal    Ready    worker   3h12m   v1.13.4+9252851b0
ip-10-0-141-41.us-west-2.compute.internal    Ready    master   3h17m   v1.13.4+9252851b0
ip-10-0-146-21.us-west-2.compute.internal    Ready    master   3h17m   v1.13.4+9252851b0
ip-10-0-154-37.us-west-2.compute.internal    Ready    worker   3h12m   v1.13.4+9252851b0
ip-10-0-168-93.us-west-2.compute.internal    Ready    worker   3h11m   v1.13.4+9252851b0
ip-10-0-174-123.us-west-2.compute.internal   Ready    master   3h17m   v1.13.4+9252851b0

$ oc debug node/ip-10-0-131-47.us-west-2.compute.internal
Starting pod/ip-10-0-131-47us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat coreos-growpart.service 
# /usr/lib/systemd/system/coreos-growpart.service
[Unit]
ConditionPathExists=!/var/lib/coreos-growpart.stamp
Before=sshd.service kubelet.service
[Service]
Type=oneshot
ExecStart=/usr/libexec/coreos-growpart /
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
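
For anyone wanting to check other nodes, the two properties that matter in the unit above (a `oneshot` type and ordering before `kubelet.service`) can be tested with a short script. This is a hedged sketch rather than part of the verification that was performed: it assumes the `systemctl cat` output has been saved as text, it treats the unit file as INI-like via Python's `configparser` (which does not cover every systemd syntax feature, such as drop-ins or repeated keys), and the function name `growpart_unit_is_fixed` is hypothetical.

```python
import configparser


def growpart_unit_is_fixed(unit_text: str) -> bool:
    """Return True if the unit is Type=oneshot and ordered Before=kubelet.service."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    # systemd defaults to Type=simple when ExecStart= is set and Type= is absent.
    unit_type = parser.get("Service", "Type", fallback="simple")
    before = parser.get("Unit", "Before", fallback="").split()
    return unit_type == "oneshot" and "kubelet.service" in before


# Unit text as shown in the `systemctl cat` output above.
FIXED_UNIT = """\
# /usr/lib/systemd/system/coreos-growpart.service
[Unit]
ConditionPathExists=!/var/lib/coreos-growpart.stamp
Before=sshd.service kubelet.service
[Service]
Type=oneshot
ExecStart=/usr/libexec/coreos-growpart /
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
"""

print(growpart_unit_is_fixed(FIXED_UNIT))  # -> True
```

A unit without `Type=oneshot`, or without `kubelet.service` in `Before=`, makes the function return `False`, which corresponds to the ordering gap described in comment 9.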

Glenn, could you confirm that the changes fix the issue you encountered?  (I don't have access to the necessary bare metal resources to confirm)

After you confirm, I'll move this to VERIFIED.

Comment 24 Glenn West 2019-06-21 02:08:31 UTC

Short answer: on the first try, it does appear to resolve the problem.
I also accidentally included workers, and that worked as well. Is that supposed to work now? (Big win for me.)
Authentication does not seem to be OK, but I have not done any of the steps after the wait for complete, and I believe there are some additional steps.

I will rerun this several times to make sure it is rock solid.

Also, I didn't change the installer, so it is:
openshift-install unreleased-master-995-gc6517384e71e5f09931c4da5e772fdec225d02ec-dirty
built from commit c6517384e71e5f09931c4da5e772fdec225d02ec
release image quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b



"Create Project: tiny"DEBUG OpenShift Installer unreleased-master-995-gc6517384e71e5f09931c4da5e772fdec225d02ec-dirty 
DEBUG Built from commit c6517384e71e5f09931c4da5e772fdec225d02ec 
INFO Waiting up to 30m0s for the Kubernetes API at https://api.tiny.k.lo:6443... 
DEBUG Still waiting for the Kubernetes API: Get https://api.tiny.k.lo:6443/version?timeout=32s: dial tcp 10.100.1.30:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.tiny.k.lo:6443/version?timeout=32s: dial tcp 10.100.1.30:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: Get https://api.tiny.k.lo:6443/version?timeout=32s: dial tcp 10.100.1.31:6443: connect: connection refused 
INFO API v1.13.4+d4417a7 up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
DEBUG Bootstrap status: complete                   
INFO It is now safe to remove the bootstrap resources 

[root@ctl ocp4wlab]# ./clusteroperator.sh
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       False      33s
cloud-credential                     4.1.2     True        False         False      4m42s
cluster-autoscaler                   4.1.2     True        False         False      4m44s
dns                                  4.1.2     True        False         False      3m48s
kube-apiserver                       4.1.2     True        True          True       115s
kube-controller-manager              4.1.2     True        True          False      111s
kube-scheduler                       4.1.2     True        True          False      2m5s
machine-api                          4.1.2     True        False         False      4m37s
machine-config                       4.1.2     True        False         False      3m4s
network                              4.1.2     True        False         False      4m26s
node-tuning                          4.1.2     True        False         False      24s
openshift-apiserver                  4.1.2     True        False         False      38s
openshift-controller-manager         4.1.2     True        False         False      3m48s
operator-lifecycle-manager           4.1.2     True        True          False      3m8s
operator-lifecycle-manager-catalog   4.1.2     True        True          False      3m9s
service-ca                           4.1.2     True        False         False      4m29s
service-catalog-apiserver            4.1.2     True        False         False      32s
service-catalog-controller-manager   4.1.2     True        False         False      33s
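
A quick way to pick out the operators that still need attention from a table like the one above is sketched below. It is illustrative only: it assumes the column layout shown (the last four columns being AVAILABLE, PROGRESSING, DEGRADED and SINCE, with VERSION possibly blank), and `unsettled_operators` is a hypothetical helper, not part of `oc`.

```python
def unsettled_operators(table: str):
    """Return names of operators that are not Available=True or are Degraded=True.

    Fields are read from the right because VERSION can be blank, which
    would shift a naive left-to-right split.
    """
    names = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        name = fields[0]
        available, _progressing, degraded, _since = fields[-4:]
        if available != "True" or degraded == "True":
            names.append(name)
    return names


# Three rows copied from the table above.
OPERATOR_SAMPLE = """\
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       False      33s
dns                                  4.1.2     True        False         False      3m48s
kube-apiserver                       4.1.2     True        True          True       115s
"""

print(unsettled_operators(OPERATOR_SAMPLE))  # -> ['authentication', 'kube-apiserver']
```

Operators that are merely Progressing are deliberately not flagged here, since several of them are still settling this soon after bootstrap.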

[root@ctl ocp4wlab]# ./clusteroperatorstatus.sh
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:59:15Z"
    generation: 1
    name: authentication
    resourceVersion: "8786"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/authentication
    uid: 2c430e6f-93c8-11e9-b6fb-0050561f3131
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      message: 'Degraded: failed handling the route: route has no host: &v1.Route{TypeMeta:v1.TypeMeta{Kind:"",
        APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"oauth-openshift", GenerateName:"",
        Namespace:"openshift-authentication", SelfLink:"/apis/route.openshift.io/v1/namespaces/openshift-authentication/routes/oauth-openshift",
        UID:"2c787efd-93c8-11e9-a434-0a580a80000e", ResourceVersion:"6572", Generation:0,
        CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63696679156, loc:(*time.Location)(0x2b32340)}},
        DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil),
        Labels:map[string]string{"app":"oauth-openshift"}, Annotations:map[string]string(nil),
        OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil),
        Finalizers:[]string(nil), ClusterName:""}, Spec:v1.RouteSpec{Host:"", Subdomain:"",
        Path:"", To:v1.RouteTargetReference{Kind:"Service", Name:"oauth-openshift",
        Weight:(*int32)(0xc42077555c)}, AlternateBackends:[]v1.RouteTargetReference(nil),
        Port:(*v1.RoutePort)(0xc420833920), TLS:(*v1.TLSConfig)(0xc421278c60), WildcardPolicy:"None"},
        Status:v1.RouteStatus{Ingress:[]v1.RouteIngress(nil)}}'
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: NoData
      status: Unknown
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: NoData
      status: Unknown
      type: Available
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: NoData
      status: Unknown
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: authentications
    - group: config.openshift.io
      name: cluster
      resource: authentications
    - group: config.openshift.io
      name: cluster
      resource: infrastructures
    - group: config.openshift.io
      name: cluster
      resource: oauths
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-authentication
      resource: namespaces
    - group: ""
      name: authentication-operator
      resource: namespaces
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:06Z"
    generation: 1
    name: cloud-credential
    resourceVersion: "2283"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/cloud-credential
    uid: 97dce012-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:55:06Z"
      message: No credentials requests reporting errors.
      reason: NoCredentialsFailing
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:55:11Z"
      message: 4 of 4 credentials requests provisioned and reconciled.
      reason: ReconcilingComplete
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:06Z"
      status: "True"
      type: Available
    extension: null
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:04Z"
    generation: 1
    name: cluster-autoscaler
    resourceVersion: "2938"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/cluster-autoscaler
    uid: 9649c8ad-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:55:04Z"
      message: at version 4.1.2
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:20Z"
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:20Z"
      status: "False"
      type: Degraded
    extension: null
    relatedObjects:
    - group: ""
      name: openshift-machine-api
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:59:49Z"
    generation: 1
    name: console
    resourceVersion: "8216"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/console
    uid: 40650cb4-93c8-11e9-b6fb-0050561f3131
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:51Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:59:51Z"
      message: "Progressing: route: waiting on route host\nProgressing: "
      reason: ProgressingSyncLoopProgressing
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:50Z"
      reason: NoData
      status: Unknown
      type: Available
    - lastTransitionTime: "2019-06-21T01:59:50Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: consoles
    - group: config.openshift.io
      name: cluster
      resource: consoles
    - group: config.openshift.io
      name: cluster
      resource: infrastructures
    - group: oauth.openshift.io
      name: console
      resource: oauthclients
    - group: ""
      name: openshift-console-operator
      resource: namespaces
    - group: ""
      name: openshift-console
      resource: namespaces
    - group: ""
      name: console-public
      namespace: openshift-config-managed
      resource: configmaps
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:38Z"
    generation: 1
    name: dns
    resourceVersion: "4107"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/dns
    uid: aa8eaf79-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:56:00Z"
      message: All desired DNS DaemonSets available and operand Namespace exists
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:56:02Z"
      message: Desired and available number of DNS DaemonSets are equal
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:56:00Z"
      message: At least 1 DNS DaemonSet available
      reason: AsExpected
      status: "True"
      type: Available
    extension: null
    relatedObjects:
    - group: ""
      name: openshift-dns-operator
      resource: namespaces
    - group: ""
      name: openshift-dns
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
    - name: coredns
      version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e547e33db54dbcc33f72a556e739c4f9f0961098099dec6180398b4f0de03f5
    - name: openshift-cli
      version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:403bc3725949c3d507065ababc37cf35c44b680441930dc8dfc48263aa3a9a61
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:12Z"
    generation: 1
    name: kube-apiserver
    resourceVersion: "8534"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/kube-apiserver
    uid: 9b4a6baf-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:56:35Z"
      message: 'StaticPodsDegraded: pods "kube-apiserver-control-plane-0" not found'
      reason: StaticPodsDegradedError
      status: "True"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:55:17Z"
      message: 'Progressing: 1 nodes are at revision 0; 1 nodes are at revision 2;
        1 nodes are at revision 3'
      reason: Progressing
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:57:53Z"
      message: 'Available: 2 nodes are active; 1 nodes are at revision 0; 1 nodes
        are at revision 2; 1 nodes are at revision 3'
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:12Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: kubeapiservers
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-kube-apiserver-operator
      resource: namespaces
    - group: ""
      name: openshift-kube-apiserver
      resource: namespaces
    versions:
    - name: raw-internal
      version: 4.1.2
    - name: kube-apiserver
      version: 1.13.4
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:12Z"
    generation: 1
    name: kube-controller-manager
    resourceVersion: "8485"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/kube-controller-manager
    uid: 9b4c9fc6-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:58:48Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T02:00:00Z"
      message: 'Progressing: 3 nodes are at revision 3'
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:57:57Z"
      message: 'Available: 3 nodes are active; 3 nodes are at revision 3'
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:13Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: kubecontrollermanagers
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-kube-controller-manager
      resource: namespaces
    - group: ""
      name: openshift-kube-controller-manager-operator
      resource: namespaces
    versions:
    - name: raw-internal
      version: 4.1.2
    - name: kube-controller-manager
      version: 1.13.4
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:12Z"
    generation: 1
    name: kube-scheduler
    resourceVersion: "7764"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/kube-scheduler
    uid: 9b571971-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:34Z"
      message: 'StaticPodsDegraded: nodes/control-plane-0 pods/openshift-kube-scheduler-control-plane-0
        container="scheduler" is not ready'
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:55:15Z"
      message: 'Progressing: 1 nodes are at revision 0; 2 nodes are at revision 4'
      reason: Progressing
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:57:43Z"
      message: 'Available: 2 nodes are active; 1 nodes are at revision 0; 2 nodes
        are at revision 4'
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:13Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: kubeschedulers
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-kube-scheduler
      resource: namespaces
    - group: ""
      name: openshift-kube-scheduler-operator
      resource: namespaces
    versions:
    - name: raw-internal
      version: 4.1.2
    - name: kube-scheduler
      version: 1.13.4
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:10Z"
    generation: 1
    name: machine-api
    resourceVersion: "6057"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-api
    uid: 99dab2fe-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:55:12Z"
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:11Z"
      message: 'Cluster Machine API Operator is available at operator: 4.1.2'
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:11Z"
      status: "False"
      type: Degraded
    extension: null
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:17Z"
    generation: 1
    name: machine-config
    resourceVersion: "4545"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
    uid: 9e7af8d1-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:56:44Z"
      message: Cluster has deployed 4.1.2
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:56:44Z"
      message: Cluster version is 4.1.2
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:17Z"
      status: "False"
      type: Degraded
    extension:
      master: all 3 nodes are at latest configuration rendered-master-359b2349a98d06ac7ed7d336fa6870fb
      worker: all 2 nodes are at latest configuration rendered-worker-fde4eb7c38bfdb2e6e1144b58b25d4b1
    relatedObjects:
    - group: ""
      name: openshift-machine-config-operator
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:53:25Z"
    generation: 1
    name: network
    resourceVersion: "2998"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/network
    uid: 5b4b9977-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:53:26Z"
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:55:22Z"
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:22Z"
      status: "True"
      type: Available
    extension: null
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:59:14Z"
    generation: 1
    name: node-tuning
    resourceVersion: "7146"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/node-tuning
    uid: 2b9aeb0f-93c8-11e9-b6fb-0050561f3131
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:24Z"
      message: Cluster has deployed "4.1.2"
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:59:16Z"
      message: Cluster version is "4.1.2"
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:14Z"
      status: "False"
      type: Degraded
    extension: null
    relatedObjects:
    - group: ""
      name: openshift-cluster-node-tuning-operator
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:11Z"
    generation: 1
    name: openshift-apiserver
    resourceVersion: "6476"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-apiserver
    uid: 9af44ad4-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:57:55Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:58:40Z"
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:10Z"
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:12Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: openshiftapiservers
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-apiserver-operator
      resource: namespaces
    - group: ""
      name: openshift-apiserver
      resource: namespaces
    - group: apiregistration.k8s.io
      name: v1.apps.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.authorization.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.build.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.image.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.oauth.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.project.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.quota.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.route.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.security.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.template.openshift.io
      resource: apiservices
    - group: apiregistration.k8s.io
      name: v1.user.openshift.io
      resource: apiservices
    versions:
    - name: operator
      version: 4.1.2
    - name: openshift-apiserver
      version: ""
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:11Z"
    generation: 1
    name: openshift-controller-manager
    resourceVersion: "5010"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-controller-manager
    uid: 9ae87d72-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:55:13Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:57:26Z"
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:56:00Z"
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:12Z"
      reason: NoData
      status: Unknown
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: openshiftcontrollermanagers
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-controller-manager-operator
      resource: namespaces
    - group: ""
      name: openshift-controller-manager
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T02:00:05Z"
    generation: 1
    name: openshift-samples
    resourceVersion: "8679"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-samples
    uid: 49f7b127-93c8-11e9-9df7-0050561f3030
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T02:00:05Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-21T02:00:05Z"
      status: "False"
      type: Progressing
    extension: null
    relatedObjects:
    - group: samples.operator.openshift.io
      name: cluster
      resource: configs
    - group: ""
      name: openshift-cluster-samples-operator
      resource: namespaces
    - group: ""
      name: openshift
      resource: namespaces
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:56:39Z"
    generation: 1
    name: operator-lifecycle-manager
    resourceVersion: "4467"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/operator-lifecycle-manager
    uid: cf6bb803-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:56:40Z"
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:56:40Z"
      message: Deployed 0.9.0
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:56:40Z"
      status: "True"
      type: Available
    extension: null
    versions:
    - name: operator
      version: 4.1.2
    - name: operator-lifecycle-manager
      version: 0.9.0
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:56:39Z"
    generation: 1
    name: operator-lifecycle-manager-catalog
    resourceVersion: "4461"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/operator-lifecycle-manager-catalog
    uid: cef654c0-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:56:39Z"
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:56:39Z"
      message: Deployed 0.9.0
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:56:39Z"
      status: "True"
      type: Available
    extension: null
    versions:
    - name: operator
      version: 4.1.2
    - name: operator-lifecycle-manager
      version: 0.9.0
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:55:12Z"
    generation: 1
    name: service-ca
    resourceVersion: "3510"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/service-ca
    uid: 9b28293f-93c7-11e9-8f65-0050561f2525
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:55:24Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:55:35Z"
      message: 'Progressing: All service-ca-operator deployments updated'
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:55:19Z"
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:55:12Z"
      reason: NoData
      status: Unknown
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: servicecas
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-service-ca-operator
      resource: namespaces
    - group: ""
      name: openshift-service-ca
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:59:15Z"
    generation: 1
    name: service-catalog-apiserver
    resourceVersion: "6569"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/service-catalog-apiserver
    uid: 2bea2c7a-93c8-11e9-b6fb-0050561f3131
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:16Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:59:16Z"
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:16Z"
      message: 'Available: the apiserver is in the desired state (Removed).'
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: NoData
      status: Unknown
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: ""
      name: openshift-config
      resource: namespaces
    - group: ""
      name: openshift-config-managed
      resource: namespaces
    - group: ""
      name: openshift-service-catalog-apiserver-operator
      resource: namespaces
    - group: ""
      name: openshift-service-catalog-apiserver
      resource: namespaces
    - group: apiregistration.k8s.io
      name: v1beta1.servicecatalog.k8s.io
      resource: apiservices
    versions:
    - name: operator
      version: 4.1.2
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    creationTimestamp: "2019-06-21T01:59:15Z"
    generation: 1
    name: service-catalog-controller-manager
    resourceVersion: "6557"
    selfLink: /apis/config.openshift.io/v1/clusteroperators/service-catalog-controller-manager
    uid: 2c4cf67c-93c8-11e9-b6fb-0050561f3131
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: AsExpected
      status: "True"
      type: Available
    - lastTransitionTime: "2019-06-21T01:59:15Z"
      reason: NoData
      status: Unknown
      type: Upgradeable
    extension: null
    relatedObjects:
    - group: operator.openshift.io
      name: cluster
      resource: servicecatalogcontrollermanagers
    - group: ""
      name: openshift-service-catalog-controller-manager-operator
      resource: namespaces
    - group: ""
      name: openshift-service-catalog-controller-manager
      resource: namespaces
    versions:
    - name: operator
      version: 4.1.2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Oops. :) Accidentally added in workers.
oc --config=${INSTALL_DIR}/auth/kubeconfig get clusteroperator
[root@ctl ocp4wlab]# oc --config=${INSTALL_DIR}/auth/kubeconfig get nodes
NAME              STATUS   ROLES    AGE     VERSION
compute-0         Ready    worker   10m     v1.13.4+9252851b0
compute-1         Ready    worker   10m     v1.13.4+9252851b0
control-plane-0   Ready    master   10m     v1.13.4+9252851b0
control-plane-1   Ready    master   9m56s   v1.13.4+9252851b0
control-plane-2   Ready    master   10m     v1.13.4+9252851b0

Is this problem resolved as well? The official 4.1 documentation does not recommend this.

Comment 25 Glenn West 2019-06-21 02:34:19 UTC

Auth just takes a while; the CSRs for all machines were auto-approved.

NAME        AGE   REQUESTOR                                                                   CONDITION
csr-4cs4h   39m   system:node:control-plane-0                                                 Approved,Issued
csr-89d8r   39m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-9vrb4   39m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-fmdth   39m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-g45hp   38m   system:node:compute-1                                                       Approved,Issued
csr-g99wl   38m   system:node:control-plane-2                                                 Approved,Issued
csr-p2kbk   39m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-qh466   39m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-xckr2   38m   system:node:compute-0                                                       Approved,Issued
csr-xp76q   38m   system:node:control-plane-1                                                 Approved,Issued
[root@ctl ocp4wlab]# oc get clusteroperators
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.2     True        False         False      25m
cloud-credential                     4.1.2     True        False         False      37m
cluster-autoscaler                   4.1.2     True        False         False      37m
console                              4.1.2     True        False         False      28m
dns                                  4.1.2     True        False         False      37m
image-registry                                 False       False         True       32m
ingress                              4.1.2     True        False         False      31m
kube-apiserver                       4.1.2     True        False         False      35m
kube-controller-manager              4.1.2     True        False         False      35m
kube-scheduler                       4.1.2     True        False         False      35m
machine-api                          4.1.2     True        False         False      37m
machine-config                       4.1.2     True        False         False      36m
marketplace                          4.1.2     True        False         False      31m
monitoring                           4.1.2     True        False         False      30m
network                              4.1.2     True        False         False      37m
node-tuning                          4.1.2     True        False         False      33m
openshift-apiserver                  4.1.2     True        False         False      33m
openshift-controller-manager         4.1.2     True        False         False      37m
openshift-samples                    4.1.2     True        False         False      27m
operator-lifecycle-manager           4.1.2     True        False         False      36m
operator-lifecycle-manager-catalog   4.1.2     True        False         False      36m
service-ca                           4.1.2     True        False         False      37m
service-catalog-apiserver            4.1.2     True        False         False      33m
service-catalog-controller-manager   4.1.2     True        False         False      33m
storage                              4.1.2     True        False         False      32m
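
A table like the one above can also be checked programmatically. The following is a minimal Python sketch, not part of the original report, that reads the text output of `oc get clusteroperators` and lists operators that are not Available or are Degraded. The function name `unhealthy_operators` is hypothetical, and the parsing assumes the six-column layout shown above, in which the VERSION column may be empty.

```python
def unhealthy_operators(table_text):
    """Return names of operators that are not Available or are Degraded.

    Assumes the column layout NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
    as printed by `oc get clusteroperators`. VERSION may be blank, so the
    status columns are read from the right-hand side of each row.
    """
    unhealthy = []
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a data row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name = fields[0]
        available, _progressing, degraded, _since = fields[-4:]
        if available != "True" or degraded != "False":
            unhealthy.append(name)
    return unhealthy


# Sample rows copied from the output above (shortened).
sample = """\
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.2     True        False         False      25m
image-registry                                 False       False         True       32m
ingress                              4.1.2     True        False         False      31m
"""
print(unhealthy_operators(sample))
```

Reading the status columns from the right avoids misaligning a row whose VERSION is blank, as with image-registry here.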

Comment 26 Glenn West 2019-06-21 02:44:28 UTC

Is this an upgrade happening automatically?

[root@ctl ocp4wlab]# ./clusteroperatorstatus.sh kube-apiserver
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-06-21T01:55:12Z"
  generation: 1
  name: kube-apiserver
  resourceVersion: "22623"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/kube-apiserver
  uid: 9b4a6baf-93c7-11e9-8f65-0050561f2525
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-21T02:04:20Z"
    message: |-
      StaticPodsDegraded: nodes/control-plane-0 pods/kube-apiserver-control-plane-0 container="kube-apiserver-6" is not ready
      StaticPodsDegraded: nodes/control-plane-0 pods/kube-apiserver-control-plane-0 container="kube-apiserver-cert-syncer-6" is not ready
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-06-21T02:39:01Z"
    message: 'Progressing: 1 nodes are at revision 5; 2 nodes are at revision 6'
    reason: Progressing
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-06-21T01:57:53Z"
    message: 'Available: 3 nodes are active; 1 nodes are at revision 5; 2 nodes are
      at revision 6'
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-21T01:55:12Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: kubeapiservers
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver-operator
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver
    resource: namespaces
  versions:
  - name: raw-internal
    version: 4.1.2
  - name: kube-apiserver
    version: 1.13.4
  - name: operator
    version: 4.1.2
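
The Progressing message above reports how many nodes are at each revision. Below is a minimal Python sketch, not from the original report, that parses such a message and reports whether all nodes have converged on a single revision. The function names are hypothetical, and the message format is assumed to match the text shown above.

```python
import re


def revisions_by_node_count(message):
    """Map revision number -> node count from a message such as
    'Progressing: 1 nodes are at revision 5; 2 nodes are at revision 6'."""
    pairs = re.findall(r"(\d+)\s+nodes\s+are\s+at\s+revision\s+(\d+)", message)
    return {int(rev): int(count) for count, rev in pairs}


def rollout_converged(message):
    """True when every node reported in the message is at the same revision.

    Returns False when the message contains no revision information.
    """
    return len(revisions_by_node_count(message)) == 1


# Message copied from the Progressing condition above.
msg = "Progressing: 1 nodes are at revision 5; 2 nodes are at revision 6"
print(revisions_by_node_count(msg))
print(rollout_converged(msg))
```

A message that still lists more than one revision indicates that nodes are at different revisions of the same 4.1.2 operator, which the versions list above also reports.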

Comment 27 Glenn West 2019-06-21 02:46:30 UTC

Looks like it was an automatic upgrade.

[root@ctl ocp4wlab]# oc get clusteroperators
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.2     True        False         False      38m
cloud-credential                     4.1.2     True        False         False      50m
cluster-autoscaler                   4.1.2     True        False         False      50m
console                              4.1.2     True        False         False      41m
dns                                  4.1.2     True        False         False      49m
image-registry                       4.1.2     True        False         False      6m22s
ingress                              4.1.2     True        False         False      44m
kube-apiserver                       4.1.2     True        False         False      47m
kube-controller-manager              4.1.2     True        False         False      47m
kube-scheduler                       4.1.2     True        False         False      48m
machine-api                          4.1.2     True        False         False      50m
machine-config                       4.1.2     True        False         False      49m
marketplace                          4.1.2     True        False         False      44m
monitoring                           4.1.2     True        False         False      42m
network                              4.1.2     True        False         False      50m
node-tuning                          4.1.2     True        False         False      46m
openshift-apiserver                  4.1.2     True        False         False      46m
openshift-controller-manager         4.1.2     True        False         False      49m
openshift-samples                    4.1.2     True        False         False      39m
operator-lifecycle-manager           4.1.2     True        False         False      49m
operator-lifecycle-manager-catalog   4.1.2     True        False         False      49m
service-ca                           4.1.2     True        False         False      50m
service-catalog-apiserver            4.1.2     True        False         False      46m
service-catalog-controller-manager   4.1.2     True        False         False      46m
storage                              4.1.2     True        False         False      45m

Comment 28 Micah Abbott 2019-06-21 13:50:46 UTC

Glenn, the changes that were made to address this BZ are in RHCOS itself.  If you are running the worker nodes as RHCOS, they should have the same changes applied (as well as the master nodes in the control plane because they are also RHCOS).

From what I can tell, it looks like this BZ has been successfully fixed, so I'm going to move it to VERIFIED.

If you find additional problems, please file new BZs for them.

Comment 30 errata-xmlrpc 2019-06-26 08:50:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1589
