Bug 2055359 - [DR] ramen-dr-cluster-operator pod go into CrashLoopBackOff state during failover or relocate operation
Summary: [DR] ramen-dr-cluster-operator pod go into CrashLoopBackOff state during fail...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.10
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ODF Blockers Only (Development Freeze)
: ODF 4.10.0
Assignee: Shyamsundar
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-02-16 18:38 UTC by Sidhant Agrawal
Modified: 2023-08-09 17:00 UTC (History)
9 users (show)

Fixed In Version: 4.10.0-167
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-21 09:12:46 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github RamenDR ramen pull 396 0 None Merged Fix incorrect format for VRG log message 2022-02-23 02:56:06 UTC
Github red-hat-storage ramen pull 13 0 None open Bug 2055359: Fix incorrect format for VRG log message 2022-02-23 12:49:29 UTC

Description Sidhant Agrawal 2022-02-16 18:38:09 UTC
Description of problem (please be detailed as possible and provide log
snippests):
After initiating failover or relocate operation, ramen-dr-cluster-operator pod goes into CrashLoopBackOff state and restarts ~6 times before coming to Running state.
Also observed the panic "panic: odd number of arguments passed as key-value pairs for logging" in the pod logs.

Output from one of the managed cluster: 
```
2022-02-16T18:22:18.943Z	DPANIC	controllers.VolumeReplicationGroup.vrginstance	controllers/volumereplicationgroup_controller.go:1671	odd number of arguments passed as key-value pairs for logging	{"VolumeReplicationGroup": "busybox-sagrawal-c1/busybox-drpc", "State": "secondary", "pvc": "busybox-sagrawal-c1/busybox-pvc-2", "ignored key": "secondary"}
github.com/ramendr/ramen/controllers.(*VRGInstance).updateVR
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1671
github.com/ramendr/ramen/controllers.(*VRGInstance).createOrUpdateVR
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1640
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRAsSecondary
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1586
github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRAsSecondary
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1162
github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRsAsSecondary
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1128
github.com/ramendr/ramen/controllers.(*VRGInstance).handleVRGMode
	/remote-source/app/controllers/volumereplicationgroup_controller.go:983
github.com/ramendr/ramen/controllers.(*VRGInstance).processAsSecondary
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1082
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRGActions
	/remote-source/app/controllers/volumereplicationgroup_controller.go:412
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRG
	/remote-source/app/controllers/volumereplicationgroup_controller.go:398
github.com/ramendr/ramen/controllers.(*VolumeReplicationGroupReconciler).Reconcile
	/remote-source/app/controllers/volumereplicationgroup_controller.go:312
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:214
2022-02-16T18:22:19.032Z	INFO	controllers.VolumeReplicationGroup.vrginstance	zapcore/entry.go:234	Exiting processing VolumeReplicationGroup	{"VolumeReplicationGroup": "busybox-sagrawal-c1/busybox-drpc", "State": "secondary"}
2022-02-16T18:22:19.032Z	INFO	controllers.VolumeReplicationGroup	zapcore/entry.go:234	Exiting reconcile loop	{"VolumeReplicationGroup": "busybox-sagrawal-c1/busybox-drpc"}
panic: odd number of arguments passed as key-value pairs for logging

goroutine 565 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000af8480, 0xc00099c540, 0x1, 0x1)
	/remote-source/deps/gomod/pkg/mod/go.uber.org/zap.1/zapcore/entry.go:234 +0x532
go.uber.org/zap.(*Logger).DPanic(0xc000c9aae0, 0x1dc24e8, 0x3d, 0xc00099c540, 0x1, 0x1)
	/remote-source/deps/gomod/pkg/mod/go.uber.org/zap.1/logger.go:220 +0x85
github.com/go-logr/zapr.handleFields(0xc000c9aae0, 0xc000d17830, 0x3, 0x3, 0x0, 0x0, 0x0, 0x10, 0x1a5d760, 0x1)
	/remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr.0/zapr.go:100 +0x5e5
github.com/go-logr/zapr.(*zapLogger).Info(0xc00014aab0, 0x1d940f6, 0x29, 0xc000d17830, 0x3, 0x3)
	/remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr.0/zapr.go:127 +0xb0
github.com/ramendr/ramen/controllers.(*VRGInstance).updateVR(0xc000af8cc0, 0xc000db3180, 0x1d5da48, 0x9, 0x207dfc8, 0xc00014aab0, 0xd, 0x20989e0, 0xc000db3180)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1671 +0x6e3
github.com/ramendr/ramen/controllers.(*VRGInstance).createOrUpdateVR(0xc000af8cc0, 0xc0005d53c8, 0x13, 0xc0002f1550, 0xd, 0x1d5da48, 0x9, 0x207dfc8, 0xc00014aab0, 0xc0009f9518, ...)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1640 +0x957
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRAsSecondary(...)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1586
github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRAsSecondary(0xc000af8cc0, 0xc000517d98, 0x207dfc8, 0xc00014aab0, 0x2070000)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1162 +0xd2
github.com/ramendr/ramen/controllers.(*VRGInstance).reconcileVRsAsSecondary(0xc000af8cc0, 0x0)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1128 +0x26b
github.com/ramendr/ramen/controllers.(*VRGInstance).handleVRGMode(0xc000af8cc0, 0x1d5da48, 0x9, 0x0)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:983 +0x115
github.com/ramendr/ramen/controllers.(*VRGInstance).processAsSecondary(0xc000af8cc0, 0xc0005b4d00, 0x0, 0x0, 0x0)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:1082 +0x3fe
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRGActions(0xc000af8cc0, 0x0, 0x0, 0x431ae7, 0x585585585585)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:412 +0x1c5
github.com/ramendr/ramen/controllers.(*VRGInstance).processVRG(0xc000af8cc0, 0xc000af8cf0, 0xc000c82d80, 0xc00061abe8, 0x13)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:398 +0x74e
github.com/ramendr/ramen/controllers.(*VolumeReplicationGroupReconciler).Reconcile(0xc000bb8300, 0x2073c78, 0xc000c82d80, 0xc00061abe8, 0x13, 0xc000b872d0, 0xc, 0xc000c82d00, 0x0, 0x0, ...)
	/remote-source/app/controllers/volumereplicationgroup_controller.go:312 +0x710
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000b43ea0, 0x2073bd0, 0xc000d30000, 0x1b72b60, 0xc000d2a440)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000b43ea0, 0x2073bd0, 0xc000d30000, 0xc000594d00)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc0002f04b0, 0xc000b43ea0, 0x2073bd0, 0xc000d30000)
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0-beta.4/pkg/internal/controller/controller.go:210 +0x425
```
	
Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-02-11-123954
ODF: 4.10.0-156

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Can this issue reproducible?
Yes

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Configure RDR setup with 1 ACM hub and 2 managed clusters
2. Deploy workload
3. Initiate failover or relocate action and observe the ramen-dr-cluster-operator pod where the workload was running


Actual results:
ramen-dr-cluster-operator pod goes into CrashLoopBackOff state due to panic during failover or relocate operations

Expected results:
ramen-dr-cluster-operator pod should not go into CrashLoopBackOff state during failover or relocate operations

Additional info:

Comment 8 Mudit Agarwal 2022-02-23 02:56:07 UTC
Please backport to 4.10


Note You need to log in before you can comment on or make changes to this bug.