Bug 1834370 - Info alerts firing even after reconciliation of namespace succeeds
Summary: Info alerts firing even after reconciliation of namespace succeeds
Keywords:
Status: CLOSED DUPLICATE of bug 1819308
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-11 15:25 UTC by Jose Silva
Modified: 2020-05-20 19:54 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-18 21:51:06 UTC
Target Upstream Version:


Attachments
alert firing (106.72 KB, image/png)
2020-05-11 15:25 UTC, Jose Silva
alert firing (57.57 KB, image/png)
2020-05-11 15:25 UTC, Jose Silva

Description Jose Silva 2020-05-11 15:25:00 UTC
Created attachment 1687363 [details]
alert firing

Description of problem:

We have an automated test case in the RHMI operator (https://github.com/integr8ly/integreatly-operator) that tests namespace restoration. Lately we have noticed that info alerts do not stop firing even after the namespace has been restored successfully.

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:


1. Install the [application-monitoring-operator|https://github.com/integr8ly/application-monitoring-operator] through OLM, with a separate operand namespace, using a Subscription backed by a ConfigMap CatalogSource
2. Delete the operand and operator namespaces
3. Wait for alerts to fire
4. Recreate the namespaces
5. Observe that the alerts do not stop firing

Alternatively, install the [RHMI operator|https://github.com/integr8ly/integreatly-operator] and run its destructive tests:

1. Log in to the cluster as kubeadmin with oc login
2. Clone integreatly-operator
3. Install the operator
4. Run the destructive test suite: DESTRUCTIVE=true go test -v ./test/functional -run=TestIntegreatly/Destructive
After the suite finishes, FailingOperator alerts are still firing.


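Actual results:

The alerts keep firing after the namespace is restored and the CSV is back in the Succeeded phase.

Expected results:

The alerts stop firing once the CSV reaches the Succeeded phase.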

Additional info:

Metric that is firing

csv_abnormal{endpoint="https-metrics",exported_namespace="redhat-rhmi-middleware-monitoring-operator",instance="10.128.0.85:8081",job="olm-operator-metrics",name="application-monitoring-operator.v1.1.5",namespace="openshift-operator-lifecycle-manager",phase="Failed",pod="olm-operator-7dd88cf55c-8kblk",reason="NoOperatorGroup",service="olm-operator-metrics",version="1.1.5"}
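
To check outside the console whether a series like the one above is still present, one option is an instant query against the Prometheus HTTP API (/api/v1/query). The Go sketch below only builds the query URL and decodes a response body. The base address, the function names and the sample body (including its timestamp and value) are assumptions made for illustration; authentication and the actual HTTP call are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// queryResponse mirrors only the parts of a Prometheus instant-query
// response that this sketch reads; other fields are ignored.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
		} `json:"result"`
	} `json:"data"`
}

// queryURL builds an instant-query URL selecting csv_abnormal for one CSV.
// base is a hypothetical Prometheus address supplied by the caller.
func queryURL(base, csvName string) string {
	q := fmt.Sprintf(`csv_abnormal{name=%q}`, csvName)
	return base + "/api/v1/query?query=" + url.QueryEscape(q)
}

// abnormalPhases returns the phase label of every series in the response.
// An empty result means the metric is absent for the selected CSV.
func abnormalPhases(body []byte) ([]string, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("query status %q", r.Status)
	}
	phases := make([]string, 0, len(r.Data.Result))
	for _, s := range r.Data.Result {
		phases = append(phases, s.Metric["phase"])
	}
	return phases, nil
}

func main() {
	fmt.Println(queryURL("https://prometheus.example", "application-monitoring-operator.v1.1.5"))

	// Made-up sample body; only phase and reason are taken from the report.
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[` +
		`{"metric":{"__name__":"csv_abnormal","phase":"Failed","reason":"NoOperatorGroup"},"value":[0,"1"]}]}}`)
	phases, err := abnormalPhases(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("phases still reported:", phases)
}
```

If the CSV has reached Succeeded and the metric behaves as described, the result list for this query should be empty.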

According to the metric description, the alert should stop firing once the CSV reaches the Succeeded state.
 
csv_abnormal

When reconciling a CSV, present whenever a CSV version is in any state other than Succeeded. Includes the name, namespace, phase, reason, and version labels. A Prometheus alert is created when this metric is present.
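
The rule in that description can be modelled in a few lines of Go. This is a minimal sketch of the expected behaviour only: abnormalRegistry and its methods are hypothetical names, the map stands in for the real gauge, and nothing here is taken from OLM's metrics code. The reason value for phases other than Failed is left empty because the report does not give it.

```go
package main

import "fmt"

// csvKey identifies a CSV, so every label set emitted for it can be
// dropped together.
type csvKey struct{ namespace, name string }

// labels holds the label values that change with the CSV status.
type labels struct{ phase, reason string }

// abnormalRegistry is a hypothetical in-memory stand-in for the
// csv_abnormal metric.
type abnormalRegistry struct {
	series map[csvKey]map[labels]bool
}

func newAbnormalRegistry() *abnormalRegistry {
	return &abnormalRegistry{series: make(map[csvKey]map[labels]bool)}
}

// observe applies the documented rule: the metric is present for any
// phase other than Succeeded and absent once the CSV has succeeded.
func (r *abnormalRegistry) observe(k csvKey, phase, reason string) {
	// Drop every earlier label set for this CSV first, so an old
	// phase/reason pair cannot outlive the transition.
	delete(r.series, k)
	if phase == "Succeeded" {
		return
	}
	r.series[k] = map[labels]bool{labels{phase: phase, reason: reason}: true}
}

// firing reports how many csv_abnormal series exist for the CSV.
func (r *abnormalRegistry) firing(k csvKey) int { return len(r.series[k]) }

func main() {
	r := newAbnormalRegistry()
	k := csvKey{
		namespace: "redhat-rhmi-middleware-monitoring-operator",
		name:      "application-monitoring-operator.v1.1.5",
	}

	r.observe(k, "Failed", "NoOperatorGroup")
	fmt.Println("series after Failed:", r.firing(k))

	r.observe(k, "Installing", "")
	r.observe(k, "Succeeded", "")
	fmt.Println("series after Succeeded:", r.firing(k))
}
```

In this model the series count drops to zero after the Succeeded transition, which is the behaviour the report expects but does not observe.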

OLM operator logs

{noformat}
time="2020-05-11T13:43:56Z" level=info msg="updated annotations to match current operatorgroup" csv=application-monitoring-operator.v1.1.5 id=oXiXK namespace=redhat-rhmi-middleware-monitoring-operator phase=Succeeded
time="2020-05-11T13:43:56Z" level=info msg="checking application-monitoring-operator.v1.1.5"
time="2020-05-11T13:43:56Z" level=warning msg="unhealthy component: ComponentMissing: missing deployment with name=application-monitoring-operator" csv=application-monitoring-operator.v1.1.5 id=oXiXK namespace=redhat-rhmi-middleware-monitoring-operator phase=Succeeded strategy=deployment
E0511 13:43:56.730339       1 event.go:237] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"application-monitoring-operator.v1.1.5.160dfd3277a037e5", GenerateName:"", Namespace:"redhat-rhmi-middleware-monitoring-operator", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"redhat-rhmi-middleware-monitoring-operator", Name:"application-monitoring-operator.v1.1.5", UID:"71e25a11-baf7-4087-8a71-7d77ee6eb0be", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"290313", FieldPath:""}, Reason:"ComponentUnhealthy", Message:"installing: ComponentMissing: missing deployment with name=application-monitoring-operator", Source:v1.EventSource{Component:"operator-lifecycle-manager", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbfa673872b711fe5, ext:1326227713251, loc:(*time.Location)(0x2a5fc00)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbfa673872b711fe5, ext:1326227713251, loc:(*time.Location)(0x2a5fc00)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "application-monitoring-operator.v1.1.5.160dfd3277a037e5" is forbidden: unable to create new content in namespace redhat-rhmi-middleware-monitoring-operator because it is being terminated' (will not retry!)
time="2020-05-11T13:43:56Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=8ivKE namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Failed
time="2020-05-11T13:43:57Z" level=info msg="operatorgroup not found" csv=application-monitoring-operator.v1.1.5 id=V251d namespace=redhat-rhmi-middleware-monitoring-operator operatorgroup=rhmi-registry-og phase=Failed
time="2020-05-11T13:43:57Z" level=warning msg="csv in namespace with no operatorgroups" csv=application-monitoring-operator.v1.1.5 id=RC8rL namespace=redhat-rhmi-middleware-monitoring-operator phase=Pending
time="2020-05-11T13:43:57Z" level=info msg="operatorgroup incorrect" csv=application-monitoring-operator.v1.1.5 error="csv in namespace with no operatorgroups" id=RC8rL namespace=redhat-rhmi-middleware-monitoring-operator phase=Pending
time="2020-05-11T13:43:57Z" level=info msg="operatorgroup not found" csv=application-monitoring-operator.v1.1.5 id=Ok6Ta namespace=redhat-rhmi-middleware-monitoring-operator operatorgroup=rhmi-registry-og phase=Pending
E0511 13:43:57.375234       1 queueinformer_operator.go:290] sync {"update" "redhat-rhmi-middleware-monitoring-operator/application-monitoring-operator.v1.1.5"} failed: csv in namespace with no operatorgroups
time="2020-05-11T13:43:57Z" level=warning msg="csv in namespace with no operatorgroups" csv=application-monitoring-operator.v1.1.5 id=8JuCh namespace=redhat-rhmi-middleware-monitoring-operator phase=Pending
time="2020-05-11T13:43:57Z" level=info msg="operatorgroup incorrect" csv=application-monitoring-operator.v1.1.5 error="csv in namespace with no operatorgroups" id=8JuCh namespace=redhat-rhmi-middleware-monitoring-operator phase=Pending
time="2020-05-11T13:43:57Z" level=info msg="operatorgroup not found" csv=application-monitoring-operator.v1.1.5 id=jV6pR namespace=redhat-rhmi-middleware-monitoring-operator operatorgroup=rhmi-registry-og phase=Pending
E0511 13:43:57.393131       1 queueinformer_operator.go:290] sync {"update" "redhat-rhmi-middleware-monitoring-operator/application-monitoring-operator.v1.1.5"} failed: error transitioning ClusterServiceVersion: csv in namespace with no operatorgroups and error updating CSV status: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "application-monitoring-operator.v1.1.5": the object has been modified; please apply your changes to the latest version and try again
time="2020-05-11T13:43:57Z" level=warning msg="csv in namespace with no operatorgroups" csv=application-monitoring-operator.v1.1.5 id=kpfMd namespace=redhat-rhmi-middleware-monitoring-operator phase=Failed
time="2020-05-11T13:44:52Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"application-monitoring-operator.v1.1.5\": the object has been modified; please apply your changes to the latest version and try again" csv=application-monitoring-operator.v1.1.5 id=qz86F namespace=redhat-rhmi-middleware-monitoring-operator phase=Installing
E0511 13:44:52.332500       1 queueinformer_operator.go:290] sync {"update" "redhat-rhmi-middleware-monitoring-operator/application-monitoring-operator.v1.1.5"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "application-monitoring-operator.v1.1.5": the object has been modified; please apply your changes to the latest version and try again
time="2020-05-11T13:44:52Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=kcDeS namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Installing
time="2020-05-11T13:45:00Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=XHvuC namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Installing
time="2020-05-11T13:45:00Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=8sgtb namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Installing
time="2020-05-11T13:45:00Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=g5gSw namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Installing

time="2020-05-11T13:45:35Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=RAXj2 namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Succeeded
time="2020-05-11T13:45:35Z" level=info msg="updated annotations to match current operatorgroup" csv=application-monitoring-operator.v1.1.5 id=RAXj2 namespace=redhat-rhmi-middleware-monitoring-operator phase=Succeeded
time="2020-05-11T13:45:35Z" level=info msg="checking application-monitoring-operator.v1.1.5"
time="2020-05-11T13:45:35Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=Nsxkt namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Succeeded
time="2020-05-11T13:45:35Z" level=info msg="updated annotations to match current operatorgroup" csv=application-monitoring-operator.v1.1.5 id=Nsxkt namespace=redhat-rhmi-middleware-monitoring-operator phase=Succeeded
time="2020-05-11T13:45:35Z" level=info msg="checking application-monitoring-operator.v1.1.5"
time="2020-05-11T13:45:55Z" level=info msg="csv in operatorgroup" csv=application-monitoring-operator.v1.1.5 id=Zb8xI namespace=redhat-rhmi-middleware-monitoring-operator opgroup=rhmi-registry-og phase=Succeeded
time="2020-05-11T13:45:55Z" level=info msg="updated annotations to match current operatorgroup" csv=application-monitoring-operator.v1.1.5 id=Zb8xI namespace=redhat-rhmi-middleware-monitoring-operator phase=Succeeded
{noformat}
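
The phase transitions in the log above are easier to follow when reduced to a timeline. The Go sketch below pulls the phase= field out of each line and collapses consecutive repeats. phaseTimeline is a hypothetical helper, and the sample lines in main are abbreviated from the log, keeping only the message and phase fields.

```go
package main

import (
	"fmt"
	"regexp"
)

// phaseRe matches the phase=<Value> field carried by the logfmt lines.
var phaseRe = regexp.MustCompile(`\bphase=(\w+)`)

// phaseTimeline returns the phases seen in the log lines, in order,
// collapsing consecutive repeats so that only transitions remain.
func phaseTimeline(lines []string) []string {
	var out []string
	for _, l := range lines {
		m := phaseRe.FindStringSubmatch(l)
		if m == nil {
			continue // lines without a phase field are skipped
		}
		if len(out) == 0 || out[len(out)-1] != m[1] {
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	// Abbreviated from the OLM operator log in this report.
	lines := []string{
		`level=info msg="updated annotations to match current operatorgroup" phase=Succeeded`,
		`level=info msg="csv in operatorgroup" phase=Failed`,
		`level=warning msg="csv in namespace with no operatorgroups" phase=Pending`,
		`E0511 13:43:57.375234       1 queueinformer_operator.go:290] sync failed`,
		`level=warning msg="csv in namespace with no operatorgroups" phase=Failed`,
		`level=info msg="csv in operatorgroup" phase=Installing`,
		`level=info msg="csv in operatorgroup" phase=Succeeded`,
	}
	fmt.Println(phaseTimeline(lines))
}
```

Run against the full log, a timeline like this shows the CSV ending in Succeeded while the csv_abnormal series quoted earlier still carries phase="Failed".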

Comment 1 Jose Silva 2020-05-11 15:25:49 UTC
Created attachment 1687364 [details]
alert firing

Comment 3 Evan Cordell 2020-05-18 21:51:06 UTC
This appears to be a duplicate of an issue that already has a PR. Please see linked BZ.

*** This bug has been marked as a duplicate of bug 1819308 ***

