Bug 2048275 - HPP mounter deployment crashes on parsing lsblk output
Summary: HPP mounter deployment crashes on parsing lsblk output
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Alex Kalenyuk
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-30 18:45 UTC by Alex Kalenyuk
Modified: 2022-03-16 16:07 UTC (History)
CC: 3 users

Fixed In Version: hostpath-provisioner-rhel8-operator v4.10.0-61, CNV v4.10.0-643
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 16:06:49 UTC
Target Upstream Version:
Embargoed:




Links
Github kubevirt/hostpath-provisioner-operator pull 218 (Merged): Drop some fields from lsblk output so we don't fail on older lsblk (last updated 2022-02-01 10:36:28 UTC)
Github kubevirt/hostpath-provisioner-operator pull 219 (Merged): [release-v0.12] Drop some fields from lsblk output so we don't fail on older lsblk (last updated 2022-02-01 10:36:28 UTC)
Red Hat Product Errata RHSA-2022:0947 (last updated 2022-03-16 16:07:04 UTC)

Description Alex Kalenyuk 2022-01-30 18:45:57 UTC
Description of problem:
HPP mounter deployment crashes on parsing lsblk output (depending on lsblk version)


Version-Release number of selected component (if applicable):
CNV 4.10.0

How reproducible:
100%


Steps to Reproduce:
1. Install HPP, use a backing PVCTemplate pool of volumeMode: Block in the HPP CR

Actual results:
The mounter deployment pods crash while parsing the lsblk output

Expected results:
The mounter deployment starts successfully

Additional info:

In newer lsblk versions, the boolean fields (rm, ro) are emitted as JSON true/false in --json (-J) output:
# lsblk /dev/data -J
{
   "blockdevices": [
      {"name":"rbd0", "maj:min":"251:0", "rm":false, "size":"40G", "ro":false, "type":"disk", "mountpoint":"/host/var/hpvolumes/csi"}
   ]
}
bash-5.1# lsblk --version
lsblk from util-linux 2.36.2


In older lsblk versions this is not the case:
# lsblk /dev/data -J
{
   "blockdevices": [
      {"name": "vdb", "maj:min": "252:16", "rm": "0", "size": "120G", "ro": "0", "type": "disk", "mountpoint": null}
   ]
}
[root@hpp-pool-sno-test-infra-cluster-621da615-master-0-7d9ccbfctx8mn /]# lsblk --version
lsblk from util-linux 2.32.1

[cnv-qe-jenkins@psi-hitchhiker-w8k9d-executor ~]$ oc logs -n openshift-cnv -f hpp-pool-ceph-backed-pool-psi-hitchhiker-w8k9d-worker-0-8fqsc57
{"level":"info","ts":1643546741.2130804,"logger":"mounter","msg":"Go Version: go1.16.6"}
{"level":"info","ts":1643546741.2131374,"logger":"mounter","msg":"Go OS/Arch: linux/amd64"}
panic: json: cannot unmarshal string into Go struct field DeviceInfo.blockdevices.rm of type bool

goroutine 1 [running]:
main.mountBlockVolume(0x7ffcb710a1f3, 0x9, 0x7ffcb710a209, 0x1a, 0x7ffcb710a22f, 0x5)
	/remote-source/app/cmd/mounter/main.go:226 +0x6ad
main.main()
	/remote-source/app/cmd/mounter/main.go:163 +0x434

This started occurring when we switched to a UBI-based downstream image for the mounter deployment, which ships a different lsblk version.
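
For context, here is a minimal standalone Go sketch that reproduces the unmarshal failure on the older lsblk output. The struct and field names below are hypothetical stand-ins, not the actual HPP mounter code:

// lsblk_repro.go: decode new-style vs old-style `lsblk --json` output.
// Newer util-linux emits rm/ro as JSON booleans; older util-linux emits
// them as the strings "0"/"1", which cannot be unmarshaled into a Go bool.
package main

import (
	"encoding/json"
	"fmt"
)

// deviceInfo is a hypothetical stand-in for the mounter's lsblk structs.
type deviceInfo struct {
	BlockDevices []struct {
		Name       string `json:"name"`
		Rm         bool   `json:"rm"`
		Ro         bool   `json:"ro"`
		MountPoint string `json:"mountpoint"`
	} `json:"blockdevices"`
}

func main() {
	// util-linux 2.36 style: rm/ro are real JSON booleans.
	newStyle := `{"blockdevices":[{"name":"rbd0","rm":false,"ro":false,"mountpoint":"/host/var/hpvolumes/csi"}]}`
	// util-linux 2.32 style: rm/ro are quoted strings.
	oldStyle := `{"blockdevices":[{"name":"vdb","rm":"0","ro":"0","mountpoint":null}]}`

	var d deviceInfo
	fmt.Println(json.Unmarshal([]byte(newStyle), &d)) // <nil>
	fmt.Println(json.Unmarshal([]byte(oldStyle), &d)) // json: cannot unmarshal string into Go struct field ... of type bool
}

Judging by the titles of the linked PRs (218/219), the fix sidesteps the incompatibility by dropping the version-sensitive fields from the lsblk output the mounter requests (for example, via an explicit -o column list that omits rm/ro), rather than parsing booleans that are encoded differently across util-linux versions.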

[cnv-qe-jenkins@psi-hitchhiker-w8k9d-executor debug-tier1-cdi]$ oc get hostpathprovisioner hostpath-provisioner -o yaml
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  creationTimestamp: "2022-01-30T12:31:27Z"
  finalizers:
  - finalizer.delete.hostpath-provisioner
  generation: 78
  name: hostpath-provisioner
  resourceVersion: "417784"
  uid: 8b324ef7-f00a-4d44-ae45-83628aa7de0e
spec:
  imagePullPolicy: IfNotPresent
  storagePools:
  - name: ceph-backed-pool
    path: /var/hppcephbackedpool
    pvcTemplate:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      volumeMode: Block
  workload: {}

Comment 1 Adam Litke 2022-01-31 13:30:32 UTC
Peter, this affects our ability to use HPP downstream due to different available versions of the lsblk utility.  Please approve.

Comment 2 Jenia Peimer 2022-02-01 18:04:38 UTC
Verified on CNV v4.10.0-643, hostpath-provisioner-operator v4.10.0-61

Installed an HPP CR that uses a backing pvcTemplate pool with volumeMode: Block

$ oc get pods -n openshift-cnv | grep hpp
hpp-pool-1ce22a1b-6698bdfd89-6fmw9     1/1     Running   0              38m
hpp-pool-1de0d017-5bd8588866-kvnsq     1/1     Running   0              38m
hpp-pool-c36842f7-7f4b595997-8xmj8     1/1     Running   0              38m

$ oc get deployments -n openshift-cnv | grep hpp
hpp-pool-1ce22a1b      1/1     1            1           39m
hpp-pool-1de0d017      1/1     1            1           39m
hpp-pool-c36842f7      1/1     1            1           39m

$ oc get pvc -n openshift-cnv | grep hpp-pool
hpp-pool-1ce22a1b   Bound    pvc-453c0278-b9b5-4239-9cb6-6c9d1130dc09   40Gi       RWO            ocs-storagecluster-ceph-rbd   39m
hpp-pool-1de0d017   Bound    pvc-2e772566-a7f2-439a-9c02-90498931f8ec   40Gi       RWO            ocs-storagecluster-ceph-rbd   39m
hpp-pool-c36842f7   Bound    pvc-21df2206-ac0a-4ec1-9d5a-b24f16d839e5   40Gi       RWO            ocs-storagecluster-ceph-rbd   39m

$ oc get pods -n openshift-cnv | grep hostpath
hostpath-provisioner-csi-bvsrz                                  4/4     Running   0              57m
hostpath-provisioner-csi-m6psb                                  4/4     Running   0              57m
hostpath-provisioner-csi-sr77g                                  4/4     Running   0              57m
hostpath-provisioner-operator-5869d68856-mxf2c                  1/1     Running   1 (54m ago)    130m

Created a VM:

$ oc get vmi -A
NAMESPACE   NAME        AGE   PHASE     IP            NODENAME                             READY
default     vm-cirros   67s   Running   ***********   c01-jp410-fr5-7zthb-worker-0-p9nr8   True

Checked that disk.img can be found in the path we gave in the CR's yaml:

$ oc debug node/c01-jp410-fr5-7zthb-worker-0-p9nr8
sh-4.4# chroot /host
sh-4.4# ls var/hpp-csi-pvc-template-ocs-block/csi/
pvc-8cd5768f-179d-49c3-ae1b-3cbabdedf12c
sh-4.4# 
sh-4.4# ls pvc-8cd5768f-179d-49c3-ae1b-3cbabdedf12c/
disk.img
sh-4.4# 

Yamls used:

$ cat hpp-ocs-block-cr.yaml 
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  storagePools:
    - name: hpp-csi-pvc-template-ocs-block
      pvcTemplate:
        volumeMode: Block  # If omitted, Filesystem is the default
        storageClassName: ocs-storagecluster-ceph-rbd
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 40Gi
      path: "/var/hpp-csi-pvc-template-ocs-block"
  workload:
    nodeSelector:
      kubernetes.io/os: linux

$ cat sc-hpp-ocs-block.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-csi-pvc-template-ocs-block
provisioner: kubevirt.io.hostpath-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters:
 storagePool: hpp-csi-pvc-template-ocs-block

$ cat vm.yaml 
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  creationTimestamp: null
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: cirros-dv
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: hostpath-csi-pvc-template-ocs-block
      source:
        http:
          url: http://.../cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolume
        machine:
          type: ""
        resources:
          requests:
            memory: 100M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv
        name: datavolume

Comment 7 errata-xmlrpc 2022-03-16 16:06:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

