Bug 1849583 - Error: write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe
Assignee: Yossi Boaron
QA Contact: Victor Voronkov
Reported: 2020-06-22 10:27 UTC by Michal Fojtik
Modified: 2020-10-27 16:09 UTC

Doc Type: Bug Fix
Cause: The liveness probe of the haproxy container monitors the health of the HAProxy load balancer. The load balancer starts running only after the haproxy-monitor container has rendered its configuration, whereas the liveness probe starts running as soon as the container is active.
Consequence: The kubelet wrongly restarts the haproxy container.
Fix: Set the initial delay of the liveness probe to account for the time the haproxy-monitor container takes to render the configuration.
Result: The kubelet no longer wrongly restarts the haproxy container because of the liveness probe.
Last Closed: 2020-10-27 16:08:40 UTC
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2043 | 0 | None | closed | Bug 1849583: [openstack,bm,ovirt,vsphere] Adjust HAProxy liveness probe to initial timing | 2021-01-25 01:18:26 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:09:01 UTC

Description Michal Fojtik 2020-06-22 10:27:16 UTC
Description of problem:

The haproxy container seems to be restarting repeatedly, with the following error found in kubelet.log:

Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Apply config change" curConfig="{6443 9443 50000 [{5yz86jiw-12e28-qn9vz-master-1 6443} {5yz86jiw-12e28-qn9vz-master-2 6443} {5yz86jiw-12e28-qn9vz-master-0 6443}] }"
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Runtimecfg rendering template" path=/etc/haproxy/haproxy.cfg
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=error msg="Failed to write reload to HAProxy master socket" socket=/var/run/haproxy/haproxy-master.sock
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Error: write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Usage:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   monitor path_to_kubeconfig path_to_haproxy_cfg_template path_to_config [flags]
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Flags:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --api-port uint16           Port where the OpenShift API listens at (default 6443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --api-vip ip                Virtual IP Address to reach the OpenShift API
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --check-interval duration   Time between monitor checks (default 6s)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   -h, --help                      help for monitor
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --lb-port uint16            Port where the API HAProxy LB will listen at (default 9443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --stat-port uint16          Port where the HAProxy stats API will listen at (default 50000)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=fatal msg="Failed due to write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe"

Noticed in the bootstrap log of https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624

Comment 1 Martin André 2020-06-22 11:47:13 UTC
The error seems to be in the openstack-infra namespace:


I'm reassigning this to the OpenStack team for now, since routing is likely not the correct component to handle this BZ. This may affect other on-prem platforms because they're all based on the same architecture.

Not sure this is actually causing the deployment to fail since the pod was eventually able to recover:


For more context, the error message comes from baremetal-runtimecfg: https://github.com/openshift/baremetal-runtimecfg/blob/d8dfe19/pkg/monitor/monitor.go#L89-L95

Comment 2 Pierre Prinetti 2020-07-09 14:41:02 UTC
Lowering the severity as the reported issue did not depend on this observation.

Keeping in the queue as it might still be worth investigating this error.

Comment 6 Martin André 2020-08-20 18:33:55 UTC
Re-assigning to the newly created runtime-cfg subcomponent.

Comment 11 errata-xmlrpc 2020-10-27 16:08:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


