Bug 1921531 - Got panic error in ACM multicluster-operators-hub pods
Summary: Got panic error in ACM multicluster-operators-hub pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: App Lifecycle
Version: rhacm-2.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.0.8
Assignee: Xiangjing Li
QA Contact: Eveline Cai
Docs Contact: bswope@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-28 06:21 UTC by pengbo
Modified: 2024-03-25 18:02 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-15 15:14:19 UTC
Target Upstream Version:
Embargoed:
rislam: qe_test_coverage+
amcnamar: rhacm-2.0.z+
ming: needinfo+
xiangli: needinfo-
xiangli: needinfo-


Attachments
multicluster-operators-hub-subscription-84cd7cd4bb-nznx6-multicluster-operators-hub-subscription.log (38.16 KB, text/plain)
2021-01-29 03:35 UTC, pengbo


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | open-cluster-management backlog issues 8928 | 0 | None | None | None | 2021-02-22 14:25:08 UTC
Red Hat Product Errata | RHBA-2021:0514 | 0 | None | None | None | 2021-02-15 15:14:22 UTC

Description pengbo 2021-01-28 06:21:35 UTC
Description of problem:

ACM multicluster-operators-hub pods are failing with a "panic" error message while running a subscription update.

Version-Release number of selected component (if applicable):

ACM Version: 2.0.4

How reproducible:

The following error message appears in the multicluster-operators-hub pod log (multicluster-operators-hub-subscription-84cd7cd4bb-nznx6-multicluster-operators-hub-subscription.log):

"E0125 15:15:32.899139 1 runtime.go:78] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x2ae4ba0), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x2a58a40), missingMethod:""} (interface conversion: interface {} is nil, not int64) "

It seems the code is encountering a nil value where it expects an int64. I am not sure what would cause that. Could you help take a look? Thanks in advance.
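
For readers unfamiliar with this kind of panic, here is a minimal Go sketch of how it can arise. It is not the operator's code: the `spec` map, the `"replicas"` key, and the helper names `unsafeReplicas` and `describePanic` are hypothetical, chosen only for illustration. A single-value type assertion on a missing map entry operates on a nil `interface{}`, and the runtime reports the same "interface conversion" text seen in the log above.

```go
package main

import "fmt"

// describePanic runs f and returns the text of any panic it raises,
// or the empty string if f returns normally.
func describePanic(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

// unsafeReplicas uses a single-value type assertion. When the key is
// missing, the map lookup yields a nil interface{} and the assertion panics.
func unsafeReplicas(spec map[string]interface{}) int64 {
	return spec["replicas"].(int64)
}

func main() {
	// Hypothetical object whose spec has no "replicas" field.
	spec := map[string]interface{}{}
	fmt.Println(describePanic(func() { unsafeReplicas(spec) }))
}
```

The printed text should read "interface conversion: interface {} is nil, not int64", which matches the tail of the logged error.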

Comment 1 Nathan Weatherly 2021-01-28 15:11:38 UTC
It looks like the `multicluster-operators-hub-subscription` pod(s) are failing. Assigning App Lifecycle to triage.

Comment 2 Mike Ng 2021-01-28 15:52:25 UTC
(G2Bsync: GitHub comment 769175729 by mikeshng, Thu, 28 Jan 2021 15:45:25 UTC)
Could you please provide more log entries around the panic error?

Comment 3 Xiangjing Li 2021-01-28 16:20:28 UTC
@pengbo Could you provide the full log (multicluster-operators-hub-subscription-84cd7cd4bb-nznx6-multicluster-operators-hub-subscription.log)? It should include the panic stack trace, which would identify the exact line of code.

Also, I noticed the panic happened in 2.0.z. Could you upgrade to 2.1 to see whether the issue goes away?

Comment 4 pengbo 2021-01-29 03:35:32 UTC
Created attachment 1751930 [details]
multicluster-operators-hub-subscription-84cd7cd4bb-nznx6-multicluster-operators-hub-subscription.log

The attachment is the multicluster-operators-hub-subscription-84cd7cd4bb-nznx6-multicluster-operators-hub-subscription.log file.

I will also ask the customer whether they can upgrade to ACM 2.1.x.

Thanks
pb

Comment 5 Mike Ng 2021-01-29 14:52:52 UTC
(G2Bsync: GitHub comment 769357729 by mikeshng, Thu, 28 Jan 2021 20:19:16 UTC)
There are not enough log entries to pinpoint the exact problematic spot, but there seems to be only one `int64` reference in the entire repo. Based on that, fixes have been made to the 2.0, 2.1, 2.2 and master branches.

Comment 6 Mike Ng 2021-01-29 14:55:50 UTC
Thanks, Peng. The code fix has been merged to 2.0, 2.1, and 2.2.

Comment 7 Mike Ng 2021-01-29 14:58:01 UTC
Just to clarify, the fix has been merged in all three 2.x branches, but until an actual release is out, this problem might still happen.
To avoid this issue, make sure the spec replicas value is populated with an integer value.
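
The thread does not show the merged fix, so the following is only a sketch of the general defensive pattern, assuming the value is read from a generic `map[string]interface{}`. The function name `replicasFromSpec` and the `"replicas"` key are hypothetical. The two-value ("comma ok") form of the type assertion turns a missing or wrongly typed value into an error instead of a panic.

```go
package main

import (
	"errors"
	"fmt"
)

// replicasFromSpec reads spec["replicas"] with the two-value form of the
// type assertion, so a missing or non-int64 value is reported as an error
// rather than causing a panic.
func replicasFromSpec(spec map[string]interface{}) (int64, error) {
	v, ok := spec["replicas"].(int64)
	if !ok {
		return 0, errors.New("spec.replicas is missing or not an int64")
	}
	return v, nil
}

func main() {
	// Hypothetical inputs: a populated spec, an empty spec, and a wrong type.
	for _, spec := range []map[string]interface{}{
		{"replicas": int64(2)},
		{},
		{"replicas": "2"},
	} {
		n, err := replicasFromSpec(spec)
		fmt.Println(n, err)
	}
}
```

Only the first input yields a value without an error; the other two return an error that the caller can log or handle.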

Comment 8 Mike Ng 2021-01-29 20:43:48 UTC
(G2Bsync: GitHub comment 769971367 by xiangjingli, Fri, 29 Jan 2021 18:25:19 UTC)
@pengbo Thanks, Peng, for the log. It turns out the fix does address the panic. The fix has been merged to 2.0 and 2.1 for the z release. We currently plan to have 2.0.8 GA on Mar 11.

Comment 9 pengbo 2021-02-02 08:29:47 UTC
OK, I will pass your message on to the customer: "We plan to have 2.0.8 GA on Mar 11 right now".
One more question: is that fix already in ACM 2.1.x? If the customer has already upgraded to 2.1.x as we suggested last time, the problem should be gone, right?

Thanks

Comment 10 Mike Ng 2021-02-02 14:19:14 UTC
Hi Peng, just to clarify: the fix is not in the current 2.1 release. The customer will need to wait for a new 2.1 release, just as they would for the new 2.0.8 release.

Comment 11 Mike Ng 2021-02-02 16:06:55 UTC
Peng, FYI: the 2.1.3 release is planned for Feb 17, so moving up to 2.1.3 is perhaps the best course of action. Thanks.

Comment 13 Xiangjing Li 2021-02-04 18:38:38 UTC
Yes, this is another panic. I noticed you have created a new Bugzilla, #1925281.

Comment 14 Chad Scribner 2021-02-04 18:57:04 UTC
(In reply to Xiangjing Li from comment #13)
> yes, this is another panic. I noticed you have created a new bugzilla
> #1925281

I did, thanks for the follow up! After chatting in the forum it seemed like the similarities between the two were only very loose so a separate bug made sense.

Comment 16 Roke Jung 2021-02-11 20:01:16 UTC
Ezequiel,

The fix will be available in ACM 2.0.8. If you want to patch your ACM cluster before ACM 2.0.8 is released, here are the instructions.

In the open-cluster-management namespace on the ACM hub cluster, edit the advanced-cluster-management CSV for your installed version (v2.0.4 in the command below):

oc edit csv advanced-cluster-management.v2.0.4 -n open-cluster-management

Look for the containers multicluster-operators-standalone-subscription and multicluster-operators-hub-subscription, and update their images to quay.io/open-cluster-management/multicluster-operators-subscription:TAG. It is recommended that you note the current SHA tag first in case you want to revert the change. Replace TAG with 2.0.8-SNAPSHOT-2021-02-03-19-04-48, so the whole image URL is quay.io/open-cluster-management/multicluster-operators-subscription:2.0.8-SNAPSHOT-2021-02-03-19-04-48.

This will recreate the multicluster-operators-standalone-subscription-xxxxxxx and multicluster-operators-hub-subscription-xxxxxxx pods in the open-cluster-management namespace. Check that the new pods are running with the new container image.

After this, please let us know if this fixes your problem. Thanks.

Comment 23 errata-xmlrpc 2021-02-15 15:14:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHACM 2.0.Z multicluster-operators-subscription hotfix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0514

Comment 24 Xiangjing Li 2021-02-24 15:31:31 UTC
@ebrizuel @gekis 

The hotfix should have been delivered. I am adding this comment to stop the daily reminder emails from BZ :-)

