Bug 1825823

Summary: [OSP] haproxy pod from openshift-openstack-infra is crashlooping
Product: OpenShift Container Platform
Component: Machine Config Operator
Version: 4.5
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Status: CLOSED ERRATA
Type: Bug
Reporter: Mike Fedosin <mfedosin>
Assignee: Mike Fedosin <mfedosin>
QA Contact: weiwei jiang <wjiang>
CC: bperkins, smilner
Doc Type: If docs needed, set a value
Last Closed: 2020-07-13 17:28:40 UTC

Description Mike Fedosin 2020-04-20 10:38:50 UTC
During the installation of OpenShift on OpenStack I noticed that all haproxy pods were crashlooping:

haproxy-mfedosin-rwkfm-master-0              1/2     CrashLoopBackOff   13         30m
haproxy-mfedosin-rwkfm-master-1              1/2     CrashLoopBackOff   13         30m
haproxy-mfedosin-rwkfm-master-2              1/2     CrashLoopBackOff   13         30m

But the cluster was deployed successfully anyway.
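A listing like the one above can be reproduced with the same selector used during verification in comment 4 (applying that selector here is an assumption; the namespace is confirmed by the log commands below):

$ oc get pods -n openshift-openstack-infra -l app=openstack-infra-api-lb -o wide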

From the pod logs:

$ oc logs -n openshift-openstack-infra haproxy-mfedosin-rwkfm-master-0 -c haproxy-monitor
time="2020-04-20T09:55:30Z" level=info msg="API is not reachable through HAProxy"
time="2020-04-20T09:55:30Z" level=info msg="Failed to get master Nodes list" err="nodes is forbidden: User \"system:serviceaccount:openshift-machine-config-operator:node-bootstrapper\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"
time="2020-04-20T09:55:30Z" level=error msg="Failed to retrieve API members information" kubeconfigPath=/etc/kubernetes/kubeconfig
time="2020-04-20T09:55:30Z" level=info msg="GetLBConfig failed, sleep half of interval and retry" kubeconfigPath=/etc/kubernetes/kubeconfig
time="2020-04-20T09:55:33Z" level=info msg="Failed to get master Nodes list" err="nodes is forbidden: User \"system:serviceaccount:openshift-machine-config-operator:node-bootstrapper\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"

$ oc logs -n openshift-openstack-infra haproxy-mfedosin-rwkfm-master-0 -c haproxy
+ declare -r haproxy_sock=/var/run/haproxy/haproxy-master.sock
+ declare -r haproxy_log_sock=/var/run/haproxy/haproxy-log.sock
+ export -f msg_handler
+ export -f reload_haproxy
+ export -f verify_old_haproxy_ps_being_deleted
+ rm -f /var/run/haproxy/haproxy-master.sock /var/run/haproxy/haproxy-log.sock
+ '[' -s /etc/haproxy/haproxy.cfg ']'
+ socat UNIX-LISTEN:/var/run/haproxy/haproxy-master.sock,fork 'system:bash -c msg_handler'
+ socat UNIX-RECV:/var/run/haproxy/haproxy-log.sock STDOUT
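
For context on the trace above: the wrapper script creates two UNIX sockets with socat. The master socket forks a bash msg_handler for each incoming connection (used to trigger HAProxy reloads), while the log socket simply prints whatever HAProxy writes to it. A minimal sketch of poking the master socket by hand; the exact message format expected by msg_handler is an assumption here:

$ echo reload | socat STDIO UNIX-CONNECT:/var/run/haproxy/haproxy-master.sock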

Comment 4 weiwei jiang 2020-05-18 02:06:30 UTC
Checked with 4.5.0-0.nightly-2020-05-17-220731, moved to verified.

$ oc get pods -n openshift-openstack-infra -l app=openstack-infra-api-lb -o wide 
NAME                                 READY   STATUS    RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
haproxy-wj45ios518a-qwztx-master-0   2/2     Running   2          31m   192.168.0.27   wj45ios518a-qwztx-master-0   <none>           <none>
haproxy-wj45ios518a-qwztx-master-1   2/2     Running   0          31m   192.168.0.13   wj45ios518a-qwztx-master-1   <none>           <none>
haproxy-wj45ios518a-qwztx-master-2   2/2     Running   0          31m   192.168.0.18   wj45ios518a-qwztx-master-2   <none>           <none>
$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-17-220731   True        False         12m     Cluster version is 4.5.0-0.nightly-2020-05-17-220731

Comment 5 errata-xmlrpc 2020-07-13 17:28:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409