Description of problem:
An OpenShift 4.7.0-fc.1 cluster was provisioned on AWS (3 m4.xlarge masters, 3 m4.2xlarge workers), left idle for approximately 30 minutes, and then all cluster nodes were stopped from the AWS console. After 10 minutes, the instances were started and we observed the following behavior once all 3 masters signaled ready:

```
gbuchana-mac:bootstrap gurnben$ oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-22r86   93m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-26rd4   85m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-158-164.ec2.internal                                    Approved,Issued
csr-8r7qw   92m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-155-63.ec2.internal                                     Approved,Issued
csr-dh9t5   92m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-136-156.ec2.internal                                    Approved,Issued
csr-drs44   82m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-172-113.ec2.internal                                    Approved,Issued
csr-frvjr   82m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-h4b8b   92m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-161-97.ec2.internal                                     Approved,Issued
csr-hbrxb   85m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mdths   92m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-mxtbl   85m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-ts2kz   85m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-138-77.ec2.internal                                     Approved,Issued
csr-wd7rp   92m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued

gbuchana-mac:bootstrap gurnben$ oc get nodes -l node-role.kubernetes.io/master
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-136-156.ec2.internal   Ready    master   93m   v1.20.0+87544c5
ip-10-0-155-63.ec2.internal    Ready    master   93m   v1.20.0+87544c5
ip-10-0-161-97.ec2.internal    Ready    master   92m   v1.20.0+87544c5

gbuchana-mac:bootstrap gurnben$ oc get po --all-namespaces | grep "0/1 Running "
openshift-console   console-855fc4fc67-8pjr9         0/1   Running   4   64m
openshift-console   console-855fc4fc67-wxgw9         0/1   Running   4   64m
openshift-ingress   router-default-7b65d7b64-jtzt7   0/1   Running   0   70m
openshift-ingress   router-default-7b65d7b64-xfdpr   0/1   Running   0   70m
```

The cluster was unreachable via the web UI and `oc login`.

Note: this issue was originally encountered using a Hive ClusterPool with hibernation, but was reproduced outside of Hive to isolate OCP from Hive componentry/interactions before this BZ was opened. All info in this issue is from a bare openshift-install provisioned cluster.

Version-Release number of selected component (if applicable):
4.7.0-fc.1

How reproducible:

Steps to Reproduce:
1. Provision an OCP 4.7.0-fc.1 cluster (presumably all platforms; reproduced on AWS)
2. Shut down all instances via the cloud platform console (as documented in https://docs.openshift.com/container-platform/4.6/backup_and_restore/graceful-cluster-shutdown.html); a CLI sketch of this step follows below
3. Start all instances and follow steps as necessary from https://docs.openshift.com/container-platform/4.6/backup_and_restore/graceful-cluster-restart.html
4. Attempt oc login/web console access

Actual results:
Unreachable cluster web console and auth endpoint

Expected results:
Cluster successfully resumed from shutdown

Additional info:
must-gather output will be attached once available (waiting for oc adm must-gather to complete)
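For completeness, a minimal sketch of driving steps 2-3 from the AWS CLI instead of the console. The tag filter and the <infra-id> placeholder are assumptions about how the installer tagged the instances, not values taken from this cluster:

```
# Hypothetical sketch: stop/start all cluster instances via the AWS CLI.
# Assumes the installer's standard "kubernetes.io/cluster/<infra-id>" tag;
# substitute your cluster's actual infra ID.
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/<infra-id>" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" --output text)

aws ec2 stop-instances --instance-ids $INSTANCE_IDS
aws ec2 wait instance-stopped --instance-ids $INSTANCE_IDS

# ...after the ~10-minute pause described above...
aws ec2 start-instances --instance-ids $INSTANCE_IDS
aws ec2 wait instance-running --instance-ids $INSTANCE_IDS
```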
Created attachment 1745696 [details] must-gather from 4.7.0-fc.1 cluster after shutdown and resume
Going to guess Networking (ingress not coming up, preventing the console from coming up?)
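For triage along those lines, a quick check is the relevant operator conditions and the router pods (standard oc commands, nothing cluster-specific assumed):

```
# Are ingress/console/auth operators Available, or Degraded?
oc get clusteroperators ingress console authentication
# Are the router pods scheduled and passing readiness?
oc get pods -n openshift-ingress -o wide
```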
I successfully reproduced this issue on 4.7.0-fc.2 as well!
Looks like an openshift-ingress issue: the CNO reports that the network is fine, but the router logs have errors, this one being the first:

```
2021-01-08T18:32:47.847199722Z E0108 18:32:47.847161       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
```
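A minimal way to confirm the missing socket from inside a router pod (assuming the default router-default deployment in openshift-ingress; adjust the name if the deployment differs):

```
# Check whether the HAProxy admin socket exists in the router pod.
oc -n openshift-ingress rsh deploy/router-default ls -l /var/lib/haproxy/run/
# Pull the first HAProxy-related errors from the router logs.
oc -n openshift-ingress logs deploy/router-default | grep -i haproxy | head
```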
*** This bug has been marked as a duplicate of bug 1899941 ***