Bug 2000274 - ACM 2.4 install on OCP 4.9 ipv6 disconnected hub fails due to multicluster pod in clb
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: App Lifecycle
Version: rhacm-2.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.4
Assignee: Xiangjing Li
QA Contact: Eveline Cai
Docs Contact: bswope@redhat.com
URL:
Whiteboard:
Duplicates: 2040500
Depends On:
Blocks:
Reported: 2021-09-01 18:08 UTC by Chad Crum
Modified: 2022-01-28 17:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-11 18:33:32 UTC
Target Upstream Version:
Embargoed:
ming: rhacm-2.4+



Links
System                                         ID              Private  Priority  Status  Summary  Last Updated
Github open-cluster-management backlog issues  15777           0        None      None    None     2021-09-01 19:15:51 UTC
Red Hat Product Errata                         RHSA-2021:4618  0        None      None    None     2021-11-11 18:33:58 UTC

Description Chad Crum 2021-09-01 18:08:30 UTC
Description of the problem:

An ACM 2.4 install on an OCP 4.9 IPv6 disconnected hub (bare-metal IPI) fails because the multicluster-operators-standalone-subscription pod is in CrashLoopBackOff.


Operator snapshot version:
2.4.0-DOWNSTREAM-2021-08-31-23-32-56

OCP version:
4.9.0-0.nightly-2021-08-31-123131

Steps to reproduce:
1. Deploy an OCP 4.9 bare-metal IPI hub in an IPv6 disconnected environment
2. Mirror the ACM downstream snapshot images and create a CatalogSource from the mirrored index image
3. Create the subscription and the multicluster object for ACM

Actual results:
multicluster-operators-hub-subscription-76d697f8ff-p4r9w         1/1     Running            0                85m
multicluster-operators-standalone-subscription-6fffb5758-q2cvf   0/1     CrashLoopBackOff   19 (3m29s ago)   85m
multiclusterhub-operator-6dbdcd8f8d-8gbnh                        1/1     Running            0                85m

Expected results:
All pods are running and ACM installs successfully.

Additional info:

Comment 2 Chad Crum 2021-09-01 18:14:47 UTC
Attached more detailed logs, but here are excerpts:

## oc get pods
multicluster-operators-channel-7966cc67dc-bbkkw                  1/1     Running            1 (87m ago)     89m
multicluster-operators-hub-subscription-76d697f8ff-p4r9w         1/1     Running            0               89m
multicluster-operators-standalone-subscription-6fffb5758-q2cvf   0/1     CrashLoopBackOff   20 (100s ago)   89m
multiclusterhub-operator-6dbdcd8f8d-8gbnh                        1/1     Running            0               89m




## oc logs from multicluster-operators-standalone-subscription-6fffb5758-q2cvf
  I0901 18:08:04.709040       1 subscription.go:1074] setting auto-reconcile rate to low
  E0901 18:08:04.714746       1 gitrepo.go:263] Get "https://github.com/open-cluster-management/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": dial tcp 140.82.112.4:443: connect: network is unreachable Failed to git clone with the primary channel: Get "https://github.com/open-cluster-management/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": dial tcp 140.82.112.4:443: connect: network is unreachable
  I0901 18:08:04.714838       1 panic.go:965] exit doSubscription: rhacm/hive-clusterimagesets-subscription-fast-0
  E0901 18:08:04.715501       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
  goroutine 5810 [running]:
  k8s.io/apimachinery/pkg/util/runtime.logPanic(0x228c2e0, 0x390a570)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:74 +0x95
  k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:48 +0x86
  panic(0x228c2e0, 0x390a570)
    /usr/lib/golang/src/runtime/panic.go:965 +0x1b9
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/utils.getConnectionOptions(0xc00241b888, 0x0, 0x1, 0xc00241b6e8, 0x3)
    /remote-source/app/pkg/utils/gitrepo.go:169 +0x73
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/utils.CloneGitRepo(0xc00241b888, 0x0, 0xc0015d8500, 0x0, 0x0)
    /remote-source/app/pkg/utils/gitrepo.go:266 +0x13c5
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).cloneGitRepo(0xc0016d2480, 0x29, 0xc00286e1f0, 0x28933f0, 0x1)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:764 +0x2b4
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).doSubscription(0xc0016d2480, 0x0, 0x0)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:204 +0x30c
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).doSubscriptionWithRetries(0xc0016d2480, 0x29e8d60800, 0x3)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:157 +0x45
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).Start.func1()
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:146 +0x18f
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001b33140)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:155 +0x5f
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001b33140, 0x28c8c80, 0xc001b2a8a0, 0x1, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:156 +0x9b
  k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001b33140, 0x34630b8a000, 0x0, 0xc001d55e01, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:133 +0x98
  k8s.io/apimachinery/pkg/util/wait.Until(0xc001b33140, 0x34630b8a000, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:90 +0x4d
  created by github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).Start
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:128 +0x254
  panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1c60693]

  goroutine 5810 [running]:
  k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:55 +0x109
  panic(0x228c2e0, 0x390a570)
    /usr/lib/golang/src/runtime/panic.go:965 +0x1b9
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/utils.getConnectionOptions(0xc00241b888, 0x0, 0x1, 0xc00241b6e8, 0x3)
    /remote-source/app/pkg/utils/gitrepo.go:169 +0x73
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/utils.CloneGitRepo(0xc00241b888, 0x0, 0xc0015d8500, 0x0, 0x0)
    /remote-source/app/pkg/utils/gitrepo.go:266 +0x13c5
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).cloneGitRepo(0xc0016d2480, 0x29, 0xc00286e1f0, 0x28933f0, 0x1)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:764 +0x2b4
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).doSubscription(0xc0016d2480, 0x0, 0x0)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:204 +0x30c
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).doSubscriptionWithRetries(0xc0016d2480, 0x29e8d60800, 0x3)
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:157 +0x45
  github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).Start.func1()
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:146 +0x18f
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001b33140)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:155 +0x5f
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001b33140, 0x28c8c80, 0xc001b2a8a0, 0x1, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:156 +0x9b
  k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001b33140, 0x34630b8a000, 0x0, 0xc001d55e01, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:133 +0x98
  k8s.io/apimachinery/pkg/util/wait.Until(0xc001b33140, 0x34630b8a000, 0xc001a10360)
    /remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.3/pkg/util/wait/wait.go:90 +0x4d
  created by github.com/open-cluster-management/multicloud-operators-subscription/pkg/subscriber/git.(*SubscriberItem).Start
    /remote-source/app/pkg/subscriber/git/git_subscriber_item.go:128 +0x254
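
The trace is printed twice in the excerpt above, and the second copy opens with `panic: ... [recovered]`. That shape is what a handler produces when it recovers a panic, logs it, and then panics again, which is how `HandleCrash` in k8s.io/apimachinery behaves by default. The following is a minimal standard-library sketch of that pattern only; `handleCrash`, `worker`, `item`, and `runAndObserve` are hypothetical names for illustration and are not taken from the operator.

```go
package main

import "fmt"

// handleCrash is a hypothetical stand-in for a "log, then crash anyway"
// handler. It must be deferred directly so that recover() sees the panic.
func handleCrash(logf func(string)) {
	if r := recover(); r != nil {
		logf(fmt.Sprintf("Observed a panic: %v", r))
		panic(r) // re-panic so the failure is not silently swallowed
	}
}

// item is a hypothetical type used only to trigger a nil dereference.
type item struct{ name string }

// worker reads a field through a nil pointer, which causes a runtime panic.
func worker(logf func(string)) int {
	defer handleCrash(logf)
	var it *item
	return len(it.name) // nil pointer dereference
}

// runAndObserve runs worker and reports what was logged and whether the
// re-raised panic propagated out of it.
func runAndObserve() (logged []string, propagated bool) {
	defer func() {
		if r := recover(); r != nil {
			propagated = true
		}
	}()
	worker(func(msg string) { logged = append(logged, msg) })
	return logged, false
}

func main() {
	logged, propagated := runAndObserve()
	fmt.Println("logged messages:", len(logged))
	fmt.Println("panic propagated:", propagated)
}
```

In a real process nothing catches the second panic, so the container exits and Kubernetes restarts it, which would match the CrashLoopBackOff status shown earlier.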

Comment 3 Chad Crum 2021-09-01 18:18:24 UTC
It seems the issue is that the pod can't reach GitHub, which is expected because this is an IPv6 disconnected environment. We do not run into this issue with ACM 2.3.

E0901 18:08:04.714746       1 gitrepo.go:263] Get "https://github.com/open-cluster-management/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": dial tcp 140.82.112.4:443: connect: network is unreachable Failed to git clone with the primary channel: Get "https://github.com/open-cluster-management/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": dial tcp 140.82.112.4:443: connect: network is unreachable

Comment 4 Mike Ng 2021-09-01 21:29:07 UTC
As discussed with Chad, there seems to be a local Git subscription that is triggering this nil pointer dereference.
The workaround is to remove that Git subscription for now while we fix the nil pointer dereference.

Comment 5 Mike Ng 2021-09-02 18:40:22 UTC
The "connect: network is unreachable" is not the reason why the standalone subscription pod has been crashing.

The root cause is a nil pointer dereference that was introduced when we added the secondary channel. It has already been fixed in https://github.com/open-cluster-management/multicloud-operators-subscription/pull/564/commits/cc19997dbf5af010d73af08e7c42f64bfb77cf6f.

I think you can try again with a more recent 2.4 development build; the standalone subscription pod should no longer crash.
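
To illustrate the class of bug described in this comment, here is a minimal sketch of a clone routine that falls back to a secondary channel only when one is configured, and that checks each pointer before using it. Every type and function name here (`ChannelOptions`, `CloneOptions`, `cloneWithFallback`, `alwaysUnreachable`, `alwaysOK`) is hypothetical, and the clone itself is stubbed out; this is not the code from the linked fix.

```go
package main

import (
	"errors"
	"fmt"
)

// ChannelOptions is a hypothetical stand-in for one channel's Git settings.
type ChannelOptions struct {
	RepoURL string
}

// CloneOptions is a hypothetical container. Secondary stays nil when no
// secondary channel is configured.
type CloneOptions struct {
	Primary   *ChannelOptions
	Secondary *ChannelOptions
}

// cloneFunc abstracts the real clone so the sketch runs without a network.
type cloneFunc func(url string) error

// alwaysUnreachable simulates a disconnected environment.
func alwaysUnreachable(url string) error {
	return errors.New("network is unreachable")
}

// alwaysOK simulates a reachable repository.
func alwaysOK(url string) error { return nil }

// cloneWithFallback tries the primary channel first and the secondary one
// only if it exists. A missing channel yields an error, never a panic.
func cloneWithFallback(opts *CloneOptions, clone cloneFunc) error {
	if opts == nil || opts.Primary == nil {
		return errors.New("no primary channel configured")
	}
	primaryErr := clone(opts.Primary.RepoURL)
	if primaryErr == nil {
		return nil
	}
	if opts.Secondary == nil {
		return fmt.Errorf("primary clone failed and no secondary channel is configured: %w", primaryErr)
	}
	if err := clone(opts.Secondary.RepoURL); err != nil {
		return fmt.Errorf("secondary clone failed: %w (primary: %v)", err, primaryErr)
	}
	return nil
}

func main() {
	// example.invalid is a placeholder URL, not a real repository.
	opts := &CloneOptions{Primary: &ChannelOptions{RepoURL: "https://example.invalid/repo.git"}}
	fmt.Println(cloneWithFallback(opts, alwaysUnreachable))
}
```

With a guard like this, an unreachable repository in a disconnected environment would surface as a logged error and a retry rather than a pod crash.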

Comment 6 Chad Crum 2021-09-03 14:17:46 UTC
Validating with latest 2.4 today...

Comment 7 Chad Crum 2021-09-03 14:41:51 UTC
Verified: the nil pointer crash no longer occurs on 2.4.0-DOWNSTREAM-2021-09-03-01-00-25


multicluster-operators-standalone-subscription-78f8d9bb48-tqgvc   1/1     Running            0          124m

Comment 10 errata-xmlrpc 2021-11-11 18:33:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.4 images and security updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4618

Comment 11 Mike Ng 2022-01-28 17:33:14 UTC
*** Bug 2040500 has been marked as a duplicate of this bug. ***

