Bug 2134936 - [RDR] Pods remain stuck in ContainerCreating state for more than 45mins when Failover is performed
Summary: [RDR] Pods remain stuck in ContainerCreating state for more than 45mins when ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.12
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: ODF 4.12.0
Assignee: Olive Lakra
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-10-14 17:58 UTC by Aman Agrawal
Modified: 2023-08-09 16:43 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-08 14:12:56 UTC
Embargoed:


Attachments (Terms of Use)

Comment 4 Benamar Mekhissi 2022-10-14 18:44:29 UTC
The Pods were stuck in the `ContainerCreating` due to the "force promotion" failures.
```
force promoting volume due to known grpc error  {"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication", "VolumeReplication": {"name":"busybox-pvc-12","namespace":"busybox-workloads-1"}, "namespace": "busybox-workloads-1", "name": "busybox-pvc-12", "reconcileID": "32296fda-0d04-494e-bd98-ba245b79a6f7", "Request.Name": "busybox-pvc-12", "Request.Namespace": "busybox-workloads-1", "error": "rpc error: code = FailedPrecondition desc = failed to promote image \"ocs-storagecluster-cephblockpool/csi-vol-cf2bb1d9-0d45-4c6b-bdbb-66eefcd68037\" with error: rbd: ret=-16, Device or resource busy"}
```
...
```
2022-10-14T16:08:13.347Z        ERROR   failed to force promote volume  {"controller": "volumereplication", "controllerGroup": "replication.storage.openshift.io", "controllerKind": "VolumeReplication", "VolumeReplication": {"name":"busybox-pvc-12","namespace":"busybox-workloads-1"}, "namespace": "busybox-workloads-1", "name": "busybox-pvc-12", "reconcileID": "32296fda-0d04-494e-bd98-ba245b79a6f7", "Request.Name": "busybox-pvc-12", "Request.Namespace": "busybox-workloads-1", "error": "rpc error: code = Internal desc = failed to promote image \"ocs-storagecluster-cephblockpool/csi-vol-cf2bb1d9-0d45-4c6b-bdbb-66eefcd68037\" with error: an error (timeout: context deadline exceeded) and stderror () occurred while running rbd args: [mirror image promote ocs-storagecluster-cephblockpool/csi-vol-cf2bb1d9-0d45-4c6b-bdbb-66eefcd68037 --force --id csi-rbd-provisioner -m 172.31.225.184:6789,172.31.218.255:6789,172.31.154.20:6789 --keyfile=***stripped***]"}
github.com/csi-addons/kubernetes-csi-addons/controllers/replication%2estorage.(*VolumeReplicationReconciler).markVolumeAsPrimary

```

Those errors continued for about 47 minutes.

Can you please @mrajanna take a look

Comment 6 Madhu Rajanna 2022-10-17 06:50:27 UTC
http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/2134909/oct15/15-10-2022_00-08-33/15-10-2022_00-08-33-api-amagrawa-c1-qe-rh-ocs-com/ocs-mg/quay-io-rhceph-dev-ocs-must-gather-sha256-562b69969b9212a0f3cf1a588c101c5e6cf6f345fa05c462ca22d2448786aac4/namespaces/all/vr_all_namespaces

NAMESPACE             NAME             AGE    VOLUMEREPLICATIONCLASS                 PVCNAME          DESIREDSTATE   CURRENTSTATE
busybox-workloads-1   busybox-pvc-1    154m   rbd-volumereplicationclass-539797778   busybox-pvc-1    primary        Primary
busybox-workloads-1   busybox-pvc-10   154m   rbd-volumereplicationclass-539797778   busybox-pvc-10   primary        Primary
busybox-workloads-1   busybox-pvc-11   154m   rbd-volumereplicationclass-539797778   busybox-pvc-11   primary        Primary
busybox-workloads-1   busybox-pvc-12   154m   rbd-volumereplicationclass-539797778   busybox-pvc-12   primary        Primary
busybox-workloads-1   busybox-pvc-13   154m   rbd-volumereplicationclass-539797778   busybox-pvc-13   primary        Primary
busybox-workloads-1   busybox-pvc-14   154m   rbd-volumereplicationclass-539797778   busybox-pvc-14   primary        Primary
busybox-workloads-1   busybox-pvc-15   154m   rbd-volumereplicationclass-539797778   busybox-pvc-15   primary        Primary
busybox-workloads-1   busybox-pvc-16   154m   rbd-volumereplicationclass-539797778   busybox-pvc-16   primary        Primary
busybox-workloads-1   busybox-pvc-17   154m   rbd-volumereplicationclass-539797778   busybox-pvc-17   primary        Primary
busybox-workloads-1   busybox-pvc-18   154m   rbd-volumereplicationclass-539797778   busybox-pvc-18   primary        Primary
busybox-workloads-1   busybox-pvc-19   154m   rbd-volumereplicationclass-539797778   busybox-pvc-19   primary        Primary
busybox-workloads-1   busybox-pvc-2    154m   rbd-volumereplicationclass-539797778   busybox-pvc-2    primary        Primary
busybox-workloads-1   busybox-pvc-20   154m   rbd-volumereplicationclass-539797778   busybox-pvc-20   primary        Primary
busybox-workloads-1   busybox-pvc-3    154m   rbd-volumereplicationclass-539797778   busybox-pvc-3    primary        Primary
busybox-workloads-1   busybox-pvc-4    154m   rbd-volumereplicationclass-539797778   busybox-pvc-4    primary        Primary
busybox-workloads-1   busybox-pvc-5    154m   rbd-volumereplicationclass-539797778   busybox-pvc-5    primary        Primary
busybox-workloads-1   busybox-pvc-6    154m   rbd-volumereplicationclass-539797778   busybox-pvc-6    primary        Primary
busybox-workloads-1   busybox-pvc-7    154m   rbd-volumereplicationclass-539797778   busybox-pvc-7    primary        Primary
busybox-workloads-1   busybox-pvc-8    154m   rbd-volumereplicationclass-539797778   busybox-pvc-8    primary        Primary
busybox-workloads-1   busybox-pvc-9    154m   rbd-volumereplicationclass-539797778   busybox-pvc-9    primary        Primary


But looking at the cephcsi level the promote and force promote operation is failing for all the rbd images
http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/2134909/oct15/15-10-2022_00-08-33/15-10-2022_00-08-33-api-amagrawa-c1-qe-rh-ocs-com/ocs-mg/quay-io-rhceph-dev-ocs-must-gather-sha256-562b69969b9212a0f3cf1a588c101c5e6cf6f345fa05c462ca22d2448786aac4/namespaces/openshift-storage/pods/csi-rbdplugin-provisioner-658458dc69-bdd84/csi-rbdplugin/csi-rbdplugin/logs/current.log


[đŸŽŠī¸Ž]mrajanna@fedora ceph-csi $]cat current.log |grep "0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC req"
2022-10-14T16:06:15.672357045Z I1014 16:06:15.672047       1 utils.go:204] ID: 345 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:06:15.717986581Z I1014 16:06:15.717947       1 utils.go:204] ID: 346 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:06:15.780484534Z I1014 16:06:15.780469       1 utils.go:204] ID: 347 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:10:16.264525102Z I1014 16:10:16.264493       1 utils.go:204] ID: 378 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:10:16.327807000Z I1014 16:10:16.327794       1 utils.go:204] ID: 379 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:10:16.386249172Z I1014 16:10:16.386244       1 utils.go:204] ID: 380 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:12:16.470316701Z I1014 16:12:16.470290       1 utils.go:204] ID: 402 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:12:16.510987042Z I1014 16:12:16.510978       1 utils.go:204] ID: 403 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:12:16.569792614Z I1014 16:12:16.569778       1 utils.go:204] ID: 404 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:14:16.662625007Z I1014 16:14:16.662614       1 utils.go:204] ID: 426 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:14:16.706510799Z I1014 16:14:16.706506       1 utils.go:204] ID: 427 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:14:16.765946583Z I1014 16:14:16.765926       1 utils.go:204] ID: 428 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:18:17.299743439Z I1014 16:18:17.299670       1 utils.go:204] ID: 465 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:18:17.341124854Z I1014 16:18:17.341105       1 utils.go:204] ID: 466 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:18:17.394403291Z I1014 16:18:17.394391       1 utils.go:204] ID: 467 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:20:17.788334355Z I1014 16:20:17.788321       1 utils.go:204] ID: 492 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:20:17.836286106Z I1014 16:20:17.836274       1 utils.go:204] ID: 493 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:20:17.921056420Z I1014 16:20:17.921034       1 utils.go:204] ID: 494 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:28:28.734499184Z I1014 16:28:28.734461       1 utils.go:204] ID: 573 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:28:28.787574467Z I1014 16:28:28.787557       1 utils.go:204] ID: 574 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:28:28.859912654Z I1014 16:28:28.859899       1 utils.go:204] ID: 575 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:39:03.246355871Z I1014 16:39:03.246339       1 utils.go:204] ID: 669 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:39:03.294772733Z I1014 16:39:03.294759       1 utils.go:204] ID: 670 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:39:03.378881536Z I1014 16:39:03.378819       1 utils.go:204] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:08:52.035896880Z I1014 17:08:52.035833       1 utils.go:204] ID: 731 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:08:52.104160857Z I1014 17:08:52.104142       1 utils.go:204] ID: 732 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:39:02.221923891Z I1014 17:39:02.221917       1 utils.go:204] ID: 784 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:39:02.279096509Z I1014 17:39:02.279063       1 utils.go:204] ID: 785 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:54:09.377836159Z I1014 17:54:09.377824       1 utils.go:204] ID: 811 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:54:09.435040936Z I1014 17:54:09.435021       1 utils.go:204] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:54:25.700205749Z I1014 17:54:25.700195       1 utils.go:204] ID: 816 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:54:25.752030819Z I1014 17:54:25.752016       1 utils.go:204] ID: 817 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T18:24:33.356632386Z I1014 18:24:33.356625       1 utils.go:204] ID: 878 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T18:24:33.430468149Z I1014 18:24:33.430451       1 utils.go:204] ID: 879 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T18:39:36.150034943Z I1014 18:39:36.150010       1 utils.go:204] ID: 896 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T18:39:36.211297215Z I1014 18:39:36.211283       1 utils.go:204] ID: 897 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}


2022-10-14T17:54:09.435040936Z I1014 17:54:09.434958       1 utils.go:195] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC call: /replication.Controller/PromoteVolume
2022-10-14T17:54:09.435040936Z I1014 17:54:09.435021       1 utils.go:204] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T17:54:09.437142322Z I1014 17:54:09.436892       1 omap.go:88] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b got omap values: (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.3fff7e2e-166c-4e48-8cd5-194beed9a66b"): map[csi.imageid:95995b7da016 csi.imagename:csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b csi.volname:pvc-7e810d94-b24c-4824-959d-4533c5be0ac7 csi.volume.owner:busybox-workloads-1]
2022-10-14T17:54:09.505401078Z I1014 17:54:09.505295       1 replicationcontrollerserver.go:603] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b Added scheduling at interval 15m, start time  for volume ocs-storagecluster-cephblockpool/csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b
2022-10-14T17:54:09.505401078Z I1014 17:54:09.505322       1 replicationcontrollerserver.go:611] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b attempting to tickle dummy image for restarting RBD schedules
2022-10-14T17:54:15.102500030Z E1014 17:54:15.102440       1 utils.go:210] ID: 812 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC error: rpc error: code = Internal desc = failed to enable mirroring on dummy image rados: ret=-22, Invalid argument: "Invalid level spec ocs-storagecluster-cephblockpool/csi-vol-dummy-180cd333-b712-4781-93f8-5244f64c5fa0: image csi-vol-dummy-180cd333-b712-4781-93f8-5244f64c5fa0 is not in snapshot mirror mode"

{
            "name": "csi-vol-dummy-180cd333-b712-4781-93f8-5244f64c5fa0",
            "global_id": "626dbf4f-9024-4775-8baf-b3a37c02a438",
            "state": "up+stopped",
            "description": "local image is primary",
            "daemon_service": {
                "service_id": "277290",
                "instance_id": "277389",
                "daemon_id": "a",
                "hostname": "compute-2"
            },
            "last_update": "2022-10-14 18:41:33",
            "peer_sites": [
                {
                    "site_name": "3f4cdd20-8f07-420e-b956-ef5b082f44b6",
                    "mirror_uuids": "dcd5727d-5abc-47ce-8f7a-ac607d784fd8",
                    "state": "up+replaying",
                    "description": "replaying, {\"bytes_per_second\":0.0,\"bytes_per_snapshot\":0.0,\"local_snapshot_timestamp\":1665772860,\"remote_snapshot_timestamp\":1665772860,\"replay_state\":\"idle\"}",
                    "last_update": "2022-10-14 18:41:31"
                }
            ]
        },
        
[đŸŽŠī¸Ž]mrajanna@fedora ceph-csi $]cat current.log |grep "ID: 671"
2022-10-14T16:39:03.378782053Z I1014 16:39:03.378721       1 utils.go:195] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC call: /replication.Controller/PromoteVolume
2022-10-14T16:39:03.378881536Z I1014 16:39:03.378819       1 utils.go:204] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC request: {"force":true,"parameters":{"mirroringMode":"snapshot","schedulingInterval":"15m"},"secrets":"***stripped***","volume_id":"0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b"}
2022-10-14T16:39:03.382953796Z I1014 16:39:03.382846       1 omap.go:88] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b got omap values: (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.3fff7e2e-166c-4e48-8cd5-194beed9a66b"): map[csi.imageid:95995b7da016 csi.imagename:csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b csi.volname:pvc-7e810d94-b24c-4824-959d-4533c5be0ac7 csi.volume.owner:busybox-workloads-1]
2022-10-14T16:41:03.432251052Z E1014 16:41:03.432195       1 replicationcontrollerserver.go:576] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b failed to promote image "ocs-storagecluster-cephblockpool/csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b" with error: an error (timeout: context deadline exceeded) and stderror () occurred while running rbd args: [mirror image promote ocs-storagecluster-cephblockpool/csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b --force --id csi-rbd-provisioner -m 172.31.225.184:6789,172.31.218.255:6789,172.31.154.20:6789 --keyfile=***stripped***]
2022-10-14T16:41:03.432293408Z E1014 16:41:03.432275       1 utils.go:210] ID: 671 Req-ID: 0001-0011-openshift-storage-0000000000000001-3fff7e2e-166c-4e48-8cd5-194beed9a66b GRPC error: rpc error: code = Internal desc = failed to promote image "ocs-storagecluster-cephblockpool/csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b" with error: an error (timeout: context deadline exceeded) and stderror () occurred while running rbd args: [mirror image promote ocs-storagecluster-cephblockpool/csi-vol-3fff7e2e-166c-4e48-8cd5-194beed9a66b --force --id csi-rbd-provisioner -m 172.31.225.184:6789,172.31.218.255:6789,172.31.154.20:6789 --keyfile=***stripped***]

 
Promotion started around 16:06:15.765951 and completed at 16:41:03.432275.  The force promotion on this volume it took 34 minutes.	


The pod also got  NodeStageRequest [1] at 16:06 As the image was not primary till 16:41 it was failing because rbdplugin will not map the image until its primary, We got stage request at 16:53:14.963439 at 16:53 the image was successfully mapped as expected.

[1] http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/2134909/oct15/15-10-2022_00-08-33/15-10-2022_00-08-33-api-amagrawa-c1-qe-rh-ocs-com/ocs-mg/quay-io-rhceph-dev-ocs-must-gather-sha256-562b69969b9212a0f3cf1a588c101c5e6cf6f345fa05c462ca22d2448786aac4/namespaces/openshift-storage/pods/csi-rbdplugin-hz2w7/csi-rbdplugin/csi-rbdplugin/logs/current.log



We have two problem here

1. The promote operation both go-ceph calls and CLI calls toook 30+ minutes, @Ilya is this expected?
2. Even though the dummy image is always snapshot mirrored why we are getting error saying its not snapshot mode mirroring. (Am not debugging it as this is the workaround and removed in latest 4.12 builds) @Aman can you please test this with latest 4.12 builds?

Comment 10 Madhu Rajanna 2022-10-20 12:39:01 UTC
*** Bug 2136416 has been marked as a duplicate of this bug. ***

Comment 11 Mudit Agarwal 2022-10-26 05:12:48 UTC
Can we retest this now based on https://bugzilla.redhat.com/show_bug.cgi?id=2134936#c8?


Note You need to log in before you can comment on or make changes to this bug.