Description of problem: On OSD clusters running 4.5.11 we have seen HAProxyReloadFail trigger and auto-resolve either in 5 or 10 minutes. Version-Release number of selected component (if applicable): OSD running OCP 4.5.11 How reproducible: Intermittent, reoccurs on clusters. Steps to Reproduce: 1. 2. 3. Actual results: HAProxyReloadFail alerts and self resolves. Expected results: HAProxyReloadFail only alerts if there is an action required to remediate. Additional info: Will provide must-gather from a cluster that has had this fire and resolve 8x in the last 3 days.
Looks like the metric in this alert was added with https://bugzilla.redhat.com/show_bug.cgi?id=1861455
Setting target release to 4.7 while this is being investigated.
I checked the router logs from the must-gather data and saw "no child processes" errors, which indicate that the problem is related to a known issue with the zombie process reaper. We have fixed this issue in 4.6 as bug 1859134 and are in the process of backporting the fix to 4.5 as bug 1885688, so I am marking this bug as a duplicate of the latter. To be clear, the alert is harmless, but we are prioritizing fixing it in 4.5 to avoid causing unwarranted alarm. *** This bug has been marked as a duplicate of bug 1885688 ***