Bug 2084331

Summary: Resource with multiple capabilities included unless all capabilities are disabled
Product: OpenShift Container Platform Reporter: Jack Ottofaro <jack.ottofaro>
Component: Cluster Version Operator Assignee: Jack Ottofaro <jack.ottofaro>
Status: CLOSED ERRATA QA Contact: Yang Yang <yanyang>
Severity: medium Docs Contact:
Priority: high    
Version: 4.10 CC: aos-team-ota, wking, yanyang
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:11:18 UTC Type: Bug

Description Jack Ottofaro 2022-05-11 21:32:40 UTC
Description of problem:

A resource that's a member of more than one capability, e.g. "capability.openshift.io/name: baremetal+openshift-samples", is not filtered out by library-go's manifest.Include function if any of its capabilities is enabled.

These bools [1] need to be reset to false on each iteration of the loop.

Expected:

A resource should be filtered out if any of its capabilities is disabled.

[1] https://github.com/openshift/library-go/blob/f2c20ff2ad2771dd39c1482658f18c2587da6469/pkg/manifest/manifest.go#L174-L175
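
The effect of not resetting the flags can be illustrated with a minimal, self-contained sketch. This is not the library-go code: the function names (disabledCapsBuggy, disabledCapsFixed, include) and the string-slice inputs are hypothetical, and only the "+"-separated annotation format is taken from this report. In this simplified form the stale flag hides any disabled capability that is checked after an enabled one, which is the annotation order used in the verification comments.

```go
package main

import (
	"fmt"
	"strings"
)

// disabledCapsBuggy (hypothetical) declares the flag once, outside the
// loop, so after the first enabled capability is seen the flag stays true
// for every capability checked afterwards.
func disabledCapsBuggy(annotation string, enabled []string) []string {
	var disabled []string
	var isEnabled bool // never reset between iterations
	for _, c := range strings.Split(annotation, "+") {
		for _, e := range enabled {
			if c == e {
				isEnabled = true
			}
		}
		if !isEnabled {
			disabled = append(disabled, c)
		}
	}
	return disabled
}

// disabledCapsFixed (hypothetical) resets the flag for each capability.
func disabledCapsFixed(annotation string, enabled []string) []string {
	var disabled []string
	for _, c := range strings.Split(annotation, "+") {
		isEnabled := false // reset for every capability
		for _, e := range enabled {
			if c == e {
				isEnabled = true
				break
			}
		}
		if !isEnabled {
			disabled = append(disabled, c)
		}
	}
	return disabled
}

// include reports whether a resource would be kept: only when none of
// its capabilities is disabled.
func include(disabled []string) bool {
	return len(disabled) == 0
}

func main() {
	enabled := []string{"openshift-samples", "marketplace"}
	annotation := "openshift-samples+baremetal"

	fmt.Println(include(disabledCapsBuggy(annotation, enabled))) // prints true
	fmt.Println(include(disabledCapsFixed(annotation, enabled))) // prints false
}
```

With the flag reset per capability, baremetal is reported as disabled and the resource is excluded, which is the expected behavior described above.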

Comment 2 Yang Yang 2022-05-13 10:24:46 UTC
We don't have any manifest that includes more than one capability, so this can't be verified with current payloads.

I'm making a fake payload that adds the baremetal annotation to the openshift-samples resources [1].

Then building the image using cluster-bot:
build openshift/cluster-samples-operator#415

Then installing a cluster with baremetal disabled:

# oc get clusterversion/version -ojson | jq -r '.spec'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "openshift-samples",
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "18c5a1e6-7ef8-4951-8fa2-66bcc54257cb"
}

# oc get co openshift-samples
NAME                VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
openshift-samples   4.11.0-0.ci.test-2022-05-13-033046-ci-ln-jt2ndik-latest   True        False         False      70m     

# oc get deployment.apps/cluster-samples-operator -oyaml -n openshift-cluster-samples-operator
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    capability.openshift.io/name: openshift-samples+baremetal
    deployment.kubernetes.io/revision: "1"
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2022-05-13T09:01:26Z"
  generation: 1
  name: cluster-samples-operator
  namespace: openshift-cluster-samples-operator
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: ef6febd4-8c3a-4ff5-818b-c9e9692ac61b
  resourceVersion: "11595"
  uid: bd42b16a-d57e-4c5a-a2cd-2a1390bddfd8
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: cluster-samples-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
      creationTimestamp: null
      labels:
        name: cluster-samples-operator
    spec:
      containers:
      - command:
        - cluster-samples-operator
        env:
        - name: WATCH_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: OPERATOR_NAME
          value: cluster-samples-operator
        - name: RELEASE_VERSION
          value: 4.11.0-0.ci.test-2022-05-13-033046-ci-ln-jt2ndik-latest
        image: registry.build01.ci.openshift.org/ci-ln-jt2ndik/stable@sha256:7b27e068fcae6638833a15ff835cb7cd7772c1557b74fee5a2afb08f009e6a7c
        imagePullPolicy: IfNotPresent
        name: cluster-samples-operator
        ports:
        - containerPort: 60000
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/secrets
          name: samples-operator-tls
      - args:
        - --namespace=openshift-cluster-samples-operator
        - --process-name=cluster-samples-operator
        - --termination-grace-period=30s
        - --files=/etc/secrets/tls.crt,/etc/secrets/tls.key
        command:
        - cluster-samples-operator-watch
        - file-watcher-watchdog
        image: registry.build01.ci.openshift.org/ci-ln-jt2ndik/stable@sha256:7b27e068fcae6638833a15ff835cb7cd7772c1557b74fee5a2afb08f009e6a7c
        imagePullPolicy: IfNotPresent
        name: cluster-samples-operator-watch
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: cluster-samples-operator
      serviceAccountName: cluster-samples-operator
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      volumes:
      - name: samples-operator-tls
        secret:
          defaultMode: 420
          secretName: samples-operator-tls
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-05-13T09:01:32Z"
    lastUpdateTime: "2022-05-13T09:01:32Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-05-13T09:01:26Z"
    lastUpdateTime: "2022-05-13T09:01:32Z"
    message: ReplicaSet "cluster-samples-operator-7c679d6c85" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

The openshift-samples deployment is annotated with both openshift-samples and baremetal. Baremetal is disabled, but openshift-samples is still installed, which looks incorrect.

[1] https://github.com/openshift/cluster-samples-operator/pull/415/files

Comment 3 Jack Ottofaro 2022-05-13 14:26:26 UTC
(In reply to Yang Yang from comment #2)

If you happen to still have the CVO log file for that run, please attach it. I will give your test a try as well.

Comment 4 Jack Ottofaro 2022-05-13 14:42:47 UTC
Forget the comment above. I should have moved this bug back to POST. It won't be ready to test until I vendor in the change to CVO. I'll be sending up another PR to do that.

Comment 6 Yang Yang 2022-05-16 03:22:21 UTC
Verifying by rebuilding a release payload using cluster-bot:
build openshift/cluster-samples-operator#415

Now the openshift-samples resources are annotated with both openshift-samples and baremetal.

Then installing a cluster with that payload, enabling marketplace and openshift-samples:

# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "openshift-samples",
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "ce64c142-9644-419e-aa06-a4b13b52c023"
}
{
  "enabledCapabilities": [
    "marketplace",
    "openshift-samples"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}

# oc get co openshift-samples
Error from server (NotFound): clusteroperators.config.openshift.io "openshift-samples" not found

openshift-samples is excluded because baremetal is disabled. Looks good to me.

Comment 8 errata-xmlrpc 2022-08-10 11:11:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069