Bug 1849583

Summary: Error: write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe
Product: OpenShift Container Platform
Reporter: Michal Fojtik <mfojtik>
Component: Networking
Assignee: Yossi Boaron <yboaron>
Networking sub component: runtime-cfg
QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: aos-bugs, beth.white, bperkins, m.andre, pprinett, yboaron
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The liveness probe of the haproxy container monitors the health of the HAProxy load balancer. HAProxy starts running only after the haproxy-monitor container has rendered its configuration, but the liveness probe runs as soon as the container is active.
Consequence: The kubelet wrongly restarts the haproxy container.
Fix: The initial delay of the liveness probe is set according to the time the haproxy-monitor container takes to render the configuration.
Result: The kubelet no longer wrongly restarts the haproxy container because of the liveness probe.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:08:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michal Fojtik 2020-06-22 10:27:16 UTC
Description of problem:

The haproxy container seems to be restarting repeatedly, with the following error in kubelet.log:

Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Apply config change" curConfig="{6443 9443 50000 [{5yz86jiw-12e28-qn9vz-master-1 10.0.0.14 6443} {5yz86jiw-12e28-qn9vz-master-2 10.0.0.16 6443} {5yz86jiw-12e28-qn9vz-master-0 10.0.0.25 6443}] }"
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=info msg="Runtimecfg rendering template" path=/etc/haproxy/haproxy.cfg
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=error msg="Failed to write reload to HAProxy master socket" socket=/var/run/haproxy/haproxy-master.sock
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Error: write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Usage:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   monitor path_to_kubeconfig path_to_haproxy_cfg_template path_to_config [flags]
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: Flags:
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --api-port uint16           Port where the OpenShift API listens at (default 6443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --api-vip ip                Virtual IP Address to reach the OpenShift API
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --check-interval duration   Time between monitor checks (default 6s)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:   -h, --help                      help for monitor
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --lb-port uint16            Port where the API HAProxy LB will listen at (default 9443)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]:       --stat-port uint16          Port where the HAProxy stats API will listen at (default 50000)
Jun 22 03:22:50 5yz86jiw-12e28-qn9vz-master-1 hyperkube[1823]: time="2020-06-22T02:57:35Z" level=fatal msg="Failed due to write unix @->/var/run/haproxy/haproxy-master.sock: write: broken pipe"

Noticed in the bootstrap log of https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624


Comment 1 Martin André 2020-06-22 11:47:13 UTC
The error seems to be in the openshift-openstack-infra namespace:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624/artifacts/e2e-openstack/pods/openshift-openstack-infra_haproxy-5yz86jiw-12e28-qn9vz-master-0_haproxy-monitor_previous.log

I'm reassigning this to the OpenStack team for now, since routing is likely not the correct component to handle this BZ. This may also affect other on-prem platforms, because they are all based on the same architecture.

I'm not sure this actually causes the deployment to fail, since the pod was eventually able to recover:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.6/1274895066913050624/artifacts/e2e-openstack/pods/openshift-openstack-infra_haproxy-5yz86jiw-12e28-qn9vz-master-0_haproxy-monitor.log

For more context, the error message comes from baremetal-runtimecfg: https://github.com/openshift/baremetal-runtimecfg/blob/d8dfe19/pkg/monitor/monitor.go#L89-L95

Comment 2 Pierre Prinetti 2020-07-09 14:41:02 UTC
Lowering the severity as the reported issue did not depend on this observation.

Keeping it in the queue, as this error might still be worth investigating.

Comment 6 Martin André 2020-08-20 18:33:55 UTC
Re-assigning to the newly created runtime-cfg subcomponent.

Comment 11 errata-xmlrpc 2020-10-27 16:08:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196