Bug 2142891 - VM latency checkup: Failed to create the checkup's Job
Summary: VM latency checkup: Failed to create the checkup's Job
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Orel Misan
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-11-15 11:58 UTC by Orel Misan
Modified: 2023-01-24 13:42 UTC (History)
CC: 4 users

Fixed In Version: vm-network-latency-checkup-container-v4.12.0-88
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-24 13:42:07 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: kiagnose/kiagnose pull 217 (Merged): "vmlatency: Define user at Dockerfile" (last updated 2022-12-08 10:35:30 UTC)
- Red Hat Issue Tracker: CNV-22506 (last updated 2022-11-15 13:04:57 UTC)
- Red Hat Product Errata: RHSA-2023:0408 (last updated 2023-01-24 13:42:20 UTC)

Description Orel Misan 2022-11-15 11:58:28 UTC
Description of problem:
When creating the checkup's Job, the following error occurs (visible in the Job's events, since the Pod itself is never created):

```
 Warning  FailedCreate  20s (x5 over 2m30s)  job-controller  Error creating: pods "kubevirt-vm-latency-checkup-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "containerized-data-importer": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000930000, 1000939999], provider "net-admin": Forbidden: not usable by user or serviceaccount, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "noobaa": Forbidden: not usable by user or serviceaccount, provider "noobaa-endpoint": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "kubevirt-controller": Forbidden: not usable by user or serviceaccount, provider "bridge-marker": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "ocs-metrics-exporter": Forbidden: not usable by user or serviceaccount, provider "linux-bridge": Forbidden: not usable by user or serviceaccount, provider "kubevirt-handler": Forbidden: not usable by user or serviceaccount, provider "rook-ceph": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "trident": Forbidden: not usable by user or serviceaccount, provider "rook-ceph-csi": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
```
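For context: the `restricted` SCC validates `runAsUser` against the per-namespace UID range that OpenShift allocates, which is where the `[1000930000, 1000939999]` range in the message above comes from; a hard-coded `runAsUser: 1000` can never satisfy it. A sketch for inspecting the range (`<target-namespace>` is the same placeholder used in the Job manifest below):

```
oc get namespace <target-namespace> \
  -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
# Prints e.g. 1000930000/10000 (start UID / range size)
```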

Version-Release number of selected component (if applicable):
4.12.0

How reproducible:


Steps to Reproduce:
1. Create a NetworkAttachmentDefinition
2. Configure the user-supplied ConfigMap
3. Create the checkup's Job:
```
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-vm-network-latency-checkup:v4.12.0
          securityContext:
            runAsUser: 1000
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: <target-namespace>
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config
```

4. Describe the created Job; the `FailedCreate` event is recorded there, because the Pod itself never gets created (see the sketch below).
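A sketch of step 4, using the Job name from the manifest above:

```
oc describe job kubevirt-vm-latency-checkup
# The Events section should show the FailedCreate warning from job-controller.
```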

Actual results:
The checkup Job's underlying Pod doesn't start.

Expected results:
The checkup Job's underlying Pod should start.

Additional info:
All actions were performed as a project-admin user.

Comment 1 Petr Horáček 2022-11-15 12:49:33 UTC
This is needed to address https://issues.redhat.com/browse/CNV-18990.

@awax, will you get a chance to verify that this is reproducible before we release a fix?

Comment 3 Yossi Segev 2022-12-15 13:53:43 UTC
@omisan Can you please provide a full reproduction scenario?
I am missing the NAD manifest from step #1, and the ConfigMap from step #2.
Thank you.

Comment 4 Orel Misan 2022-12-15 14:05:15 UTC
Hi @ysegev, the problem was in the Dockerfile used to build the checkup's image.
You can use any valid ConfigMap and NetworkAttachmentDefinition.

Comment 5 Orel Misan 2022-12-15 14:06:41 UTC
Please note that we no longer use the `runAsUser` field, because that information is baked into the container image.
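For reference, when `runAsUser` is omitted the SCC admission plugin injects a UID from the namespace range into the Pod spec at admission time; it can be inspected on the running Pod (a sketch; `<checkup-pod-name>` is hypothetical):

```
oc get pod <checkup-pod-name> \
  -o jsonpath='{.spec.containers[0].securityContext.runAsUser}'
# Expected: a UID inside the namespace range, e.g. 1000930000
```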

Comment 6 Yossi Segev 2022-12-15 14:28:13 UTC
"You can input any valid ConfigMap and NetworkAttachmentDefinition" is a sure recipe for mis-configuration.
When reproducing/verifying a bug, the exact scenario is needed. This includes the exact resources used, otherwise the verifier might find themselves attempting to reproduce using non-relevant or false resources.
Please provide the NAD and ConfigMAp so I can verify this bug.
Thank you

Comment 7 Orel Misan 2022-12-15 15:25:01 UTC
In this specific scenario, the problem was that the checkup Job couldn't start because of the `runAsUser` definition.
Because the Job couldn't start, the ConfigMap and NetworkAttachmentDefinition were never read, so their content matters little for reproducing this bug.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network
spec:
  config: |
    {
      "cniVersion":"0.3.1",
      "name": "br10",
      "plugins": [
          {
              "type": "cnv-bridge",
              "bridge": "br10"
          }
      ]
    }
```
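A quick sanity check that the NAD was admitted (a sketch; `net-attach-def` is the registered short name of the CRD):

```
oc get net-attach-def bridge-network -o yaml
```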


```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: <target_namespace>
  spec.param.network_attachment_definition_name: <nad_name>
  spec.param.max_desired_latency_milliseconds: "10"
  spec.param.sample_duration_seconds: "5"
```

The checkup Job now looks like this (without `runAsUser`):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: registry.redhat.io/container-native-virtualization/vm-network-latency-checkup:v4.12.0
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: <target_namespace>
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config
```
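With `runAsUser` dropped, SCC validation should pass; which SCC admitted the Pod is recorded in an annotation and can be checked like this (a sketch; `<checkup-pod-name>` is hypothetical):

```
oc get pod <checkup-pod-name> \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
# Expected: a non-privileged SCC such as restricted
```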

Comment 8 Yossi Segev 2022-12-21 11:22:48 UTC
Verified by using these versions:
CNV 4.12.0
vm-network-latency-checkup:v4.12.0-8


Verified by running the following scenario:
1. Create a new namespace and change the context to it:
$ oc create ns yoss-ns
namespace/yoss-ns created
$
$ oc project yoss-ns 
Now using project "yoss-ns" on server "https://api.net-ys-412-2.cnv-qe.rhcloud.com:6443".

2. Apply the following NetworkAttachmentDefinition:
$ cat << EOF | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network
spec:
  config: |
    {
      "cniVersion":"0.3.1",
      "name": "br10",
      "plugins": [
          {
              "type": "cnv-bridge",
              "bridge": "br10"
          }
      ]
    }
EOF

3. Apply the following ConfigMap:
$ cat << EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "yoss-ns"
  spec.param.network_attachment_definition_name: "bridge-network"
  spec.param.max_desired_latency_milliseconds: "10"
  spec.param.sample_duration_seconds: "5"
EOF

4. Apply the following ServiceAccount, Roles, and RoleBindings:
$ cat << EOF | oc apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachineinstances"]
  verbs: ["get", "create", "delete"]
- apiGroups: ["subresources.kubevirt.io"]
  resources: ["virtualmachineinstances/console"]
  verbs: ["get"]
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
- apiGroups: [ "" ]
  resources: [ "configmaps" ]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF
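Before running the Job, the bindings can be sanity-checked with impersonation (a sketch using the names above):

```
oc auth can-i update configmaps -n yoss-ns \
  --as=system:serviceaccount:yoss-ns:vm-latency-checkup-sa
oc auth can-i create virtualmachineinstances.kubevirt.io -n yoss-ns \
  --as=system:serviceaccount:yoss-ns:vm-latency-checkup-sa
# Both should print: yes
```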

5. Apply the following latency Job:
$ cat << EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
#          image: registry.redhat.io/container-native-virtualization/vm-network-latency-checkup:v4.12.0
          image: brew.registry.redhat.io/rh-osbs/container-native-virtualization-vm-network-latency-checkup:v4.12.0
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: yoss-ns
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config

EOF


The job and its pod run successfully:
$ oc get job
NAME                          COMPLETIONS   DURATION   AGE
kubevirt-vm-latency-checkup   0/1           56s        56s
$
$ oc get pod
NAME                                       READY   STATUS    RESTARTS   AGE
kubevirt-vm-latency-checkup-lxdwc          1/1     Running   0          61s
virt-launcher-latency-check-source-4d4sk   2/2     Running   0          58s
virt-launcher-latency-check-target-8tch5   2/2     Running   0          58s
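Once the Job completes, the checkup writes its results back into the user-supplied ConfigMap (this is why the `kiagnose-configmap-access` Role grants `update`); a usage sketch:

```
oc wait --for=condition=complete job/kubevirt-vm-latency-checkup --timeout=10m
oc get configmap kubevirt-vm-latency-checkup-config -o yaml
# The status.* keys under data hold the outcome, e.g. status.succeeded
```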

Comment 11 errata-xmlrpc 2023-01-24 13:42:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408

