Bug 1889019 - HAProxyReloadFail flapping on OSD 4.5.11
Keywords:
Status: CLOSED DUPLICATE of bug 1885688
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Stephen Greene
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-16 20:55 UTC by Naveen Malik
Modified: 2022-08-04 22:30 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-19 14:48:35 UTC
Target Upstream Version:
Embargoed:


Description Naveen Malik 2020-10-16 20:55:54 UTC
Description of problem:
On OSD clusters running 4.5.11, we have seen the HAProxyReloadFail alert fire and then auto-resolve after either 5 or 10 minutes.

Version-Release number of selected component (if applicable):
OSD running OCP 4.5.11

How reproducible:
Intermittent; recurs on clusters.

Steps to Reproduce:

Actual results:
HAProxyReloadFail fires and then resolves on its own.

Expected results:
HAProxyReloadFail fires only when some action is required to remediate.

Additional info:
Will provide a must-gather from a cluster where this alert has fired and resolved 8 times in the last 3 days.

Comment 2 Naveen Malik 2020-10-16 21:54:33 UTC
Looks like the metric in this alert was added with https://bugzilla.redhat.com/show_bug.cgi?id=1861455

Comment 4 mfisher 2020-10-19 14:23:51 UTC
Setting target release to 4.7 while this is being investigated.

Comment 5 Miciah Dashiel Butler Masters 2020-10-19 14:48:35 UTC
I checked the router logs from the must-gather data and saw "no child processes" errors, which indicate that the problem is related to a known issue with the zombie process reaper.  We have fixed this issue in 4.6 as bug 1859134 and are in the process of backporting the fix to 4.5 as bug 1885688, so I am marking this bug as a duplicate of the latter.  

To be clear, the alert is harmless, but we are prioritizing fixing it in 4.5 to avoid causing unwarranted alarm.
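
For anyone unfamiliar with the error text: "no child processes" is the message for ECHILD, which wait() returns when the child's exit status has already been collected by something else. The sketch below is a generic POSIX illustration of that race, not the router's actual code. The function name wait_after_reaper and its flag are hypothetical, and the first waitpid call only stands in for a zombie reaper winning the race.

```python
import errno
import os


def wait_after_reaper(reaper_runs_first: bool):
    """Fork a child that exits at once, then wait for it as its launcher would.

    Returns None if the launcher's wait succeeds, or the errno if it fails.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, skipping parent cleanup handlers

    if reaper_runs_first:
        # Stand-in for a zombie reaper collecting the child's exit status
        # before the code that started the child gets to wait on it.
        os.waitpid(pid, 0)

    try:
        os.waitpid(pid, 0)  # the wait done by whoever launched the child
    except ChildProcessError as exc:
        return exc.errno  # ECHILD: the status was already collected
    return None


if __name__ == "__main__":
    print(wait_after_reaper(reaper_runs_first=False))
    err = wait_after_reaper(reaper_runs_first=True)
    print(err == errno.ECHILD, os.strerror(err))
```

In the second call the child exited cleanly, yet the launcher still sees an error. That is consistent with the alert being harmless: under this assumption the failure is reported by the wait, not by the work the child did.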

*** This bug has been marked as a duplicate of bug 1885688 ***

