Bug 1927423

Summary: Happy "Not Found" and no visible error messages on error-list page when /silences 504s
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.6
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
Last Closed: 2021-07-27 22:43:44 UTC
Reporter: W. Trevor King <wking>
Assignee: Andrew Pickering <anpicker>
QA Contact: hongyan li <hongyli>
CC: alegrand, anpicker, aos-bugs, erooth, hongyli, jokerman, kakkoyun, lcosic, pkrupa
Attachments:
silences 504 leads to "Not Found"
alert tab
silence tab
alert rule tab

Description W. Trevor King 2021-02-10 17:11:02 UTC
Created attachment 1756280
silences 504 leads to "Not Found"

Seen in 4.6 (possibly 4.6.16; I'll check). Because of compute-node issues, /api/alertmanager/api/v2/silences returned 504 Gateway Timeout. Apparently as a result, the alert-listing logic was unable to list alerts, and the page showed "Not Found" (screenshot attached). Ideally it would show an error message about the silences 504, so folks know the real state is "I don't know which unsilenced alerts match your filter", which is much more serious than "no unsilenced alerts match your filter, all is well".

It might also be worth listing all filter-matching alerts with a note like "maybe these are silenced, but we can't tell because the silences API is returning 504". Or would that risk being too distracting?
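A minimal sketch of the behavior requested above, assuming the alerts and silences responses are loaded independently and the page filters down to unsilenced alerts. Everything here is hypothetical: `LoadResult`, `Alert`, `Silence`, `AlertsView`, `isSilenced`, and `buildAlertsView` are illustrative names, not the console's actual code, and silence matching is simplified to exact label equality (real Alertmanager matchers can also be regular expressions).

```typescript
// Hypothetical outcome of an API request: data on success, or the HTTP status on failure.
type LoadResult<T> =
  | { ok: true; data: T }
  | { ok: false; status: number; message: string };

// Simplified alert and silence shapes (assumptions, not the console's real types).
interface Alert {
  name: string;
  labels: Record<string, string>;
}

interface Silence {
  id: string;
  // Exact-equality label matchers only; real Alertmanager matchers can also be regexes.
  matchers: Record<string, string>;
}

type AlertsView =
  | { kind: "error"; message: string }
  | { kind: "empty" } // Only this state is safe to render as "Not Found".
  | { kind: "alerts"; alerts: Alert[]; silencedStateUnknown: boolean; warning?: string };

// An alert is silenced if every matcher of at least one silence equals the alert's label value.
function isSilenced(alert: Alert, silences: Silence[]): boolean {
  return silences.some((s) =>
    Object.keys(s.matchers).every((key) => alert.labels[key] === s.matchers[key]),
  );
}

// Combine both API results so a failed silences request never looks like "all is well".
function buildAlertsView(
  alertsResult: LoadResult<Alert[]>,
  silencesResult: LoadResult<Silence[]>,
  filter: (alert: Alert) => boolean,
): AlertsView {
  if (!alertsResult.ok) {
    return { kind: "error", message: `Error loading alerts: ${alertsResult.status} ${alertsResult.message}` };
  }
  const matching = alertsResult.data.filter(filter);
  // With no filter-matching alerts at all, the answer is "none" regardless of silences.
  if (matching.length === 0) {
    return { kind: "empty" };
  }
  if (!silencesResult.ok) {
    // Silenced state is unknown: list every matching alert and surface the failure.
    return {
      kind: "alerts",
      alerts: matching,
      silencedStateUnknown: true,
      warning: `Silences API returned ${silencesResult.status} ${silencesResult.message}; some of these alerts may be silenced.`,
    };
  }
  const silences = silencesResult.data;
  const unsilenced = matching.filter((alert) => !isSilenced(alert, silences));
  return unsilenced.length === 0
    ? { kind: "empty" }
    : { kind: "alerts", alerts: unsilenced, silencedStateUnknown: false };
}
```

This follows the second suggestion: when silences fail, the view degrades to listing every filter-matching alert with a warning instead of collapsing into the empty "Not Found" state. A stricter variant could return the `error` view in that branch instead.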

Comment 2 hongyan li 2021-03-28 12:37:07 UTC
Verified with payload 4.8.0-0.nightly-2021-03-25-191436:

1. Scale Alertmanager down to zero replicas:
   oc -n openshift-monitoring scale sts alertmanager-main --replicas 0
2. Access /api/alertmanager/api/v2/silences; it returns 504 Gateway Timeout.
3. Check the monitoring alerts page: the alert tab and alert rule tab still list alerts, and the silence tab shows a 504 error. See the attached screenshots for details.

Comment 3 hongyan li 2021-03-28 12:37:47 UTC
Created attachment 1767096
alert tab

Comment 4 hongyan li 2021-03-28 12:38:22 UTC
Created attachment 1767097
silence tab

Comment 5 hongyan li 2021-03-28 12:39:10 UTC
Created attachment 1767098
alert rule tab

Comment 8 errata-xmlrpc 2021-07-27 22:43:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438