2064722 – [Tracker] [DR][ACM 2.5] Applications are not getting deployed on managed cluster

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Advanced Cluster Management for Kubernetes
Classification:	Red Hat
Component:	App Lifecycle
Sub Component:
Version:	rhacm-2.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	rhacm-2.5
Assignee:	Mike Ng
QA Contact:	Napoco Agbetra
Docs Contact:	bswope@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:	2058220

Reported:	2022-03-16 12:23 UTC by Sidhant Agrawal
Modified:	2023-02-03 10:48 UTC
CC List:	16 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-06-09 02:09:46 UTC
Target Upstream Version:
Embargoed:
Flags:	bot-tracker-sync: rhacm-2.5+

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	stolostron backlog issues 20789	0	None	None	None	2022-03-23 21:10:14 UTC
Red Hat Product Errata	RHSA-2022:4956	0	None	None	None	2022-06-09 02:09:54 UTC

Description Sidhant Agrawal 2022-03-16 12:23:53 UTC

Description of problem (please be detailed as possible and provide log
snippets):
With an RDR setup configured using ACM 2.5, applications are not deployed on the managed cluster.
Pods and PVCs are not created.

Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-03-14-215709
ODF: 4.10.0-194
ACM: 2.5.0-DOWNSTREAM-2022-03-14-13-03-46

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, unable to deploy application

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes, tried app deployment on the same setup multiple times

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:
Issue observed first time with ACM 2.5

Steps to Reproduce:
1. Configure an RDR setup with ACM 2.5
2. Create the DRPC and PlacementRule resources
3. Create an application using the ACM console
4. Observe that the pods and PVCs are not created

Actual results:
Application deployment is unsuccessful

Expected results:
The application should be deployed with the expected number of pods and PVCs.

Additional info:
>> from hub
$ oc get drpc busybox-drpc -n busybox-sagrawal-c1 -o yaml | grep phase
  phase: Deployed
  
$ oc get placementrule busybox-placement -n busybox-sagrawal-c1 -o yaml
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/drpc-name: busybox-drpc
    drplacementcontrol.ramendr.openshift.io/drpc-namespace: busybox-sagrawal-c1
    open-cluster-management.io/user-group: c3lzdGVtOm1hc3RlcnMsc3lzdGVtOmF1dGhlbnRpY2F0ZWQ=
    open-cluster-management.io/user-identity: c3lzdGVtOmFkbWlu
  creationTimestamp: "2022-03-16T10:50:48Z"
  finalizers:
  - drpc.ramendr.openshift.io/finalizer
  generation: 1
  labels:
    app: busybox-sample
  name: busybox-placement
  namespace: busybox-sagrawal-c1
  resourceVersion: "3461368"
  uid: 867f30c9-a432-42ce-b777-019c1819cb25
spec:
  clusterConditions:
  - status: "True"
    type: ManagedClusterConditionAvailable
  clusterReplicas: 1
  schedulerName: ramen
status:
  decisions:
  - clusterName: sagrawal-c1
    clusterNamespace: sagrawal-c1
    
>> from managed cluster "sagrawal-c1"    
$ oc get vrc,vrg,vr,pods,pvc -n busybox-sagrawal-c1
NAME                                                                                            PROVISIONER
volumereplicationclass.replication.storage.openshift.io/rbd-volumereplicationclass-1625360775   openshift-storage.rbd.csi.ceph.com

NAME                                                       AGE
volumereplicationgroup.ramendr.openshift.io/busybox-drpc   89m
$

Comment 4 Benamar Mekhissi 2022-03-16 18:34:55 UTC

There is a backward compatibility issue with ACM 2.5. For DR, the PlacementRule for a subscription is created with Ramen as the scheduler, like this:
```
spec:
  clusterConditions:
  - status: "True"
    type: ManagedClusterConditionAvailable
  clusterReplicas: 1
  schedulerName: ramen
```
When Ramen is the scheduler, the PlacementRule will not decide on the location of the subscription, but rather, it depends on Ramen to tell it in which location the subscription is supposed to be placed.
```
status:
  decisions:
  - clusterName: sagrawal-c1
    clusterNamespace: sagrawal-c1
```
In this case, the *.status.decisions* is updated by Ramen.
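
As a rough illustration of that contract, here is a minimal Python sketch of how a consumer could read the placement from a PlacementRule once Ramen has filled in the status. The function name `decided_clusters` and the plain-dict representation are assumptions made for illustration; this is not the ACM subscription controller's code.

```python
from typing import Any


def decided_clusters(placement_rule: dict[str, Any]) -> list[str]:
    """Return the cluster names listed under status.decisions.

    Hypothetical helper: it only looks at the status, so it does not
    care who wrote the decisions (the built-in scheduler or Ramen).
    """
    decisions = placement_rule.get("status", {}).get("decisions") or []
    return [d["clusterName"] for d in decisions if "clusterName" in d]


# PlacementRule from this report, reduced to the relevant fields.
rule = {
    "spec": {
        "clusterConditions": [
            {"status": "True", "type": "ManagedClusterConditionAvailable"}
        ],
        "clusterReplicas": 1,
        "schedulerName": "ramen",
    },
    "status": {
        "decisions": [
            {"clusterName": "sagrawal-c1", "clusterNamespace": "sagrawal-c1"}
        ]
    },
}

print(decided_clusters(rule))
```

The point of the sketch is that the decision is fully described by the status section, so the value of `schedulerName` should not affect which cluster receives the subscription.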

Since we upgraded to ACM 2.5, this has stopped working as expected. In other words, if .spec.schedulerName is set to *ramen*, the subscription for the target managed cluster is NOT created, even though Ramen updates the status decisions. The workaround is to set the clusterSelector and remove the schedulerName from the spec section, similar to this:
```
spec:
  clusterSelector:
    matchLabels:
      name: sagrawal-c1
status:
  decisions:
  - clusterName: sagrawal-c1
    clusterNamespace: sagrawal-c1
```
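
The same transformation can be sketched in Python, assuming the PlacementRule has been loaded into a plain dict. The helper name `apply_workaround` is hypothetical, and the sketch follows the wording above literally: it removes only `schedulerName` and sets `clusterSelector`, leaving any other spec fields untouched.

```python
import copy
from typing import Any


def apply_workaround(placement_rule: dict[str, Any], cluster_name: str) -> dict[str, Any]:
    """Return a copy of the PlacementRule with the workaround applied.

    Hypothetical helper: drops spec.schedulerName and pins the rule to
    one cluster through a clusterSelector on the `name` label.
    """
    patched = copy.deepcopy(placement_rule)  # leave the input unmodified
    spec = patched.setdefault("spec", {})
    spec.pop("schedulerName", None)
    spec["clusterSelector"] = {"matchLabels": {"name": cluster_name}}
    return patched


original = {
    "spec": {"clusterReplicas": 1, "schedulerName": "ramen"},
    "status": {
        "decisions": [
            {"clusterName": "sagrawal-c1", "clusterNamespace": "sagrawal-c1"}
        ]
    },
}

patched = apply_workaround(original, "sagrawal-c1")
print(patched["spec"])
```

Working on a deep copy keeps the original object available for comparison or rollback.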

This is definitely a blocker. The workaround will require Ramen to change. This is not an option for the DR team at this point. We will request a fix from the appLC team.

Comment 5 Benamar Mekhissi 2022-03-16 19:53:51 UTC

An issue against ACM has been opened here: https://github.com/stolostron/backlog/issues/20789

Comment 6 Yaniv Kaul 2022-03-17 07:56:49 UTC

(In reply to Benamar Mekhissi from comment #5)
> An issue against ACM has been opened here:
> https://github.com/stolostron/backlog/issues/20789

Invalid link?

Comment 7 Benamar Mekhissi 2022-03-17 11:22:37 UTC

The link is valid. You just need access rights to the stolostron backlog.

Comment 8 Yaniv Kaul 2022-03-17 11:37:19 UTC

(In reply to Benamar Mekhissi from comment #7)
> The link is valid.  You just have to have rights access to the stolostron
> backlog.

Thanks - who do I have to ask to get access? (surprised somewhat to see a private repo on github).

Comment 10 Benamar Mekhissi 2022-03-17 16:07:57 UTC

(In reply to Yaniv Kaul from comment #8)
> (In reply to Benamar Mekhissi from comment #7)
> > The link is valid.  You just have to have rights access to the stolostron
> > backlog.
> 
> Thanks - who do I have to ask to get access? (surprised somewhat to see a
> private repo on github).

You need to send a request to forum-amc-devops, similar to this: https://coreos.slack.com/archives/CSZLMKPS5/p1647533098820809

Comment 18 bot-tracker-sync 2022-03-29 21:48:31 UTC

G2Bsync 1077903053 comment 
 mikeshng Thu, 24 Mar 2022 18:10:12 UTC 
 G2Bsync `2.5.0-SNAPSHOT-2022-03-24-17-14-58` build should contain the fix.

Comment 19 Mike Ng 2022-03-29 21:56:26 UTC

Please ignore the last bot-tracker-sync comment

Comment 20 Napoco Agbetra 2022-05-05 14:32:12 UTC

Verified the bug fix on the ACM side (ACM 2.5).
The hub subscription controller reconciles the subscription to read the cluster decision and propagate it to the cluster.

Comment 23 errata-xmlrpc 2022-06-09 02:09:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956
