Description of problem: When diagnosing https://bugzilla.redhat.com/show_bug.cgi?id=1717244 I found that no prometheus alerts were firing when the operator pod was stuck. Version-Release number of selected component (if applicable): 4.1.0-rc.4 How reproducible: 100% Steps to Reproduce: 1. Prevent the cloud-credential-operator pod from running 2. 3. Actual results: No prometheus alerts are raised. Expected results: A prometheus alert should be raised. Additional info: If a service monitor is created for this pod, a absent(up{job=..) rule would have helped to detect the issue.
Going to propose adding metrics for CCO in 4.3.
CCO metrics coming in 4.3, Joel has led the work, lets get this absent rule in as well as I don't think we had that on the list.