Description of problem:

The haproxy container seems to be restarting repeatedly, with the following error found in kubelet.log:

Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Apply config change" curConfig="{6443 9443 50000 [{5yz86jiw-12e28-qn9vz-master-1 10.0.0.14 6443} {5yz86jiw-12e28-qn9vz-master-2 10.0.0.16 6443} {5yz86jiw-12e28-qn9vz-master-0 10.0.0.25 6443}] }"
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Runtimecfg rendering template" path=/etc/haproxy/haproxy.cfg
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=error msg="Failed to write reload to HAProxy master socket" socket=/var/run/haproxy/haproxy-master.sock
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Error: write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Usage:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   monitor path_to_kubeconfig path_to_haproxy_cfg_template path_to_config [flags]
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Flags:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   --api-port uint16 Port where the OpenShift API listens at (default 6443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   --api-vip ip Virtual IP Address to reach the OpenShift API
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   --check-interval duration Time between monitor checks (default 6s)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   -h, --help help for monitor
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   --lb-port uint16 Port where the API HAProxy LB will listen at (default 9443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   --stat-port uint16 Port where the HAProxy stats API will listen at (default 50000)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=fatal msg="Failed due to write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe"

Noticed in the bootstrap log of https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
The error seems to be in the openshift-openstack-infra namespace, in the previous log of the haproxy-monitor container:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624/artifacts/e2e-openstack/pods/openshift-openstack-infra_haproxy-5yz86jiw-12e28-qn9vz-master-0_haproxy-monitor_previous.log

I'm reassigning this to the OpenStack team for now, since routing is likely not the correct component to handle this BZ. This may affect other on-prem platforms, because they are all based on the same architecture.

I'm not sure this is actually causing the deployment to fail, since the pod was eventually able to recover:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624/artifacts/e2e-openstack/pods/openshift-openstack-infra_haproxy-5yz86jiw-12e28-qn9vz-master-0_haproxy-monitor.log

For more context, the error message comes from baremetal-runtimecfg:
https://github.com/openshift/baremetal-runtimecfg/blob/d8dfe19/pkg/monitor/monitor.go#L89-L95
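
To illustrate the failure mode: the log above shows the monitor treating a failed write of the reload command to the HAProxy master socket as fatal, which exits the process and restarts the container. Below is a minimal sketch of a more tolerant approach, which redials the socket and retries a few times before giving up. It is not the code in baremetal-runtimecfg and not necessarily the fix shipped in the advisory; the function name sendReload, the retry count, and the delay are all hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// sendReload is a hypothetical helper (not taken from baremetal-runtimecfg).
// It dials the HAProxy master socket and writes the "reload" command,
// redialing on each attempt so that a stale or half-closed connection
// (for example one that produces "broken pipe") is not reused.
func sendReload(socketPath string, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			// Give HAProxy time to (re)create its master socket.
			time.Sleep(delay)
		}
		conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
		if err != nil {
			lastErr = fmt.Errorf("dial %s: %w", socketPath, err)
			continue
		}
		_, err = conn.Write([]byte("reload\n"))
		conn.Close()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("write reload to %s: %w", socketPath, err)
	}
	return lastErr
}

func main() {
	// Socket path as it appears in the log above; override via the first argument.
	socket := "/var/run/haproxy/haproxy-master.sock"
	if len(os.Args) > 1 {
		socket = os.Args[1]
	}
	// Retry count and delay are assumed values, not taken from the source.
	if err := sendReload(socket, 3, time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "reload not delivered:", err)
		os.Exit(1)
	}
	fmt.Println("reload sent")
}
```

In a long-running monitor, the caller could log the returned error and try again on the next check interval rather than exiting, so that a transient socket failure would not restart the container.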
Lowering the severity as the reported issue did not depend on this observation. Keeping in the queue as it might still be worth investigating this error.
Re-assigning to the newly created runtime-cfg subcomponent.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196