Bug 2067106

Summary: insights is degraded for failed to pull SCA certs
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Insights Operator
Assignee: Tomas Remes <tremes>
Status: CLOSED CANTFIX
QA Contact: Joao Fula <jfula>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.11
CC: bbabbar, inecas, kgordeev, mirollin, mklika, tremes, vlaad, yasun
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-08-29 06:30:30 UTC
Type: Bug

Description Junqi Zhao 2022-03-23 10:26:56 UTC
Description of problem:
On a fresh IPI_AWS cluster running 4.11.0-0.nightly-2022-03-20-160505, insights is normal at first; when checked again later, it is degraded:
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-20-160505   True        False         9h      Error while reconciling 4.11.0-0.nightly-2022-03-20-160505: the cluster operator insights has not yet successfully rolled out

# oc get co insights
NAME       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
insights   4.11.0-0.nightly-2022-03-20-160505   False       False         True       5h30m   Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500: {"id":"9","kind":"Error","href":"/api/accounts_mgmt/v1/errors/9","code":"ACCT-MGMT-9","reason":"Unable to get Certificates Serials from RHSM for consumer (UUID=778099db-a840-4bc3-b588-8e91248bf062): 502 Bad Gateway","operation_id":"00b4b7b1-b684-41c8-9668-259a5f537158"}

# oc -n openshift-insights get pod
NAME                                READY   STATUS    RESTARTS     AGE
insights-operator-dd5bf5b57-vq5ws   1/1     Running   1 (9h ago)   9h

# oc -n openshift-insights logs insights-operator-dd5bf5b57-vq5ws | head -n 1
I0323 00:44:33.098156       1 cmd.go:209] Using service-serving-cert provided certificates

About 5 minutes after the operator started, the first HTTP 500 error appears:
# oc -n openshift-insights logs insights-operator-dd5bf5b57-vq5ws | grep "OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500" | head -n 2
E0323 00:49:39.743849       1 sca.go:228] OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500: {"id":"9","kind":"Error","href":"/api/accounts_mgmt/v1/errors/9","code":"ACCT-MGMT-9","reason":"Unable to get Certificates Serials from RHSM for consumer (UUID=778099db-a840-4bc3-b588-8e91248bf062): 502 Bad Gateway","operation_id":"f1b83b1a-e004-4182-b8c0-e25d8416b286"}. Trying again in 15m0s
E0323 01:04:39.934512       1 sca.go:228] OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500: {"id":"9","kind":"Error","href":"/api/accounts_mgmt/v1/errors/9","code":"ACCT-MGMT-9","reason":"Unable to get Certificates Serials from RHSM for consumer (UUID=778099db-a840-4bc3-b588-8e91248bf062): 502 Bad Gateway","operation_id":"f3bb4225-d96b-44f8-8872-c00505fdcf06"}. Trying again in 30m0s
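
The two log lines above carry an HTTP status, a JSON error body, and a retry interval (15m0s, then 30m0s). A minimal Python sketch for pulling those fields out of such a line is shown here. The regular expressions are inferred only from the lines in this report, not from the operator's source, and the helper names `parse_go_duration` and `parse_sca_error` are hypothetical.

```python
import json
import re
from datetime import timedelta

# Pattern inferred from the log lines in this report (assumption, not the
# operator's own format definition).
LINE_RE = re.compile(
    r"returned HTTP (?P<status>\d{3}): (?P<body>\{.*\})\. "
    r"Trying again in (?P<retry>\S+)$"
)
# Handles only whole-number h/m/s parts such as "15m0s" or "1h0m0s".
DURATION_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_go_duration(text):
    """Convert a duration string like '15m0s' into a timedelta."""
    match = DURATION_RE.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_sca_error(line):
    """Return the status, error fields, and retry delay, or None if no match."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    body = json.loads(match.group("body"))
    return {
        "status": int(match.group("status")),
        "code": body.get("code"),
        "reason": body.get("reason"),
        "operation_id": body.get("operation_id"),
        "retry_in": parse_go_duration(match.group("retry")),
    }


# First error line from the report, copied verbatim.
SAMPLE_LINE = (
    'E0323 00:49:39.743849       1 sca.go:228] OCM API '
    'https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 500: '
    '{"id":"9","kind":"Error","href":"/api/accounts_mgmt/v1/errors/9",'
    '"code":"ACCT-MGMT-9","reason":"Unable to get Certificates Serials from RHSM '
    'for consumer (UUID=778099db-a840-4bc3-b588-8e91248bf062): 502 Bad Gateway",'
    '"operation_id":"f1b83b1a-e004-4182-b8c0-e25d8416b286"}. Trying again in 15m0s'
)

parsed = parse_sca_error(SAMPLE_LINE)
print(parsed["status"], parsed["code"], parsed["retry_in"])
print(parsed["reason"])
```

The `reason` field is the useful part for triage: it shows that the 500 returned by OCM wraps a 502 Bad Gateway from RHSM, so the failure is upstream of the operator.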

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-20-160505

How reproducible:
always

Steps to Reproduce:
1. oc get co insights

Actual results:
insights is degraded

Expected results:
should be normal

Additional info:

Comment 2 yasun 2022-03-23 11:13:40 UTC
This is an OCM bug; https://issues.redhat.com/browse/SDB-2699 has been opened to track it.

Comment 3 yasun 2022-03-24 07:13:02 UTC
The SCA certs can now be pulled successfully from production OCM.

Comment 4 yasun 2022-03-24 07:15:32 UTC
Please verify on your side and close the bug.

Comment 5 Junqi Zhao 2022-03-24 09:18:44 UTC
https://issues.redhat.com/browse/SDB-2699 is fixed; insights is back to normal:
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-20-160505   True        False         9h      Cluster version is 4.11.0-0.nightly-2022-03-20-160505
# oc get co insights
NAME       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
insights   4.11.0-0.nightly-2022-03-20-160505   True        False         False      9h
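
The same verification can be scripted against the JSON form of the ClusterOperator (for example the output of `oc get co insights -o json`). The following is a minimal Python sketch, assuming the usual `status.conditions` list with `type` and `status` fields. The helper names and the two sample objects are hypothetical: the samples are hand-built to mirror the AVAILABLE, PROGRESSING, and DEGRADED columns shown in this report and are not captured cluster output. Treating "healthy" as Available=True, Progressing=False, Degraded=False is this sketch's own definition.

```python
def condition_status(cluster_operator, cond_type):
    """Return the status string ('True'/'False'/'Unknown') for one condition type."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None  # condition not reported


def is_healthy(cluster_operator):
    """Healthy here means Available=True, Progressing=False, Degraded=False."""
    return (
        condition_status(cluster_operator, "Available") == "True"
        and condition_status(cluster_operator, "Progressing") == "False"
        and condition_status(cluster_operator, "Degraded") == "False"
    )


def make_sample(available, progressing, degraded):
    """Build a hypothetical, trimmed ClusterOperator object for illustration."""
    return {
        "kind": "ClusterOperator",
        "metadata": {"name": "insights"},
        "status": {
            "conditions": [
                {"type": "Available", "status": available},
                {"type": "Progressing", "status": progressing},
                {"type": "Degraded", "status": degraded},
            ]
        },
    }


# Mirrors the table in the Description (degraded) and in this comment (recovered).
degraded_co = make_sample("False", "False", "True")
recovered_co = make_sample("True", "False", "False")

print("degraded sample healthy?", is_healthy(degraded_co))
print("recovered sample healthy?", is_healthy(recovered_co))
```

On a live cluster the input would come from `json.loads` applied to the command output rather than from `make_sample`.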