Bug 2097436
| Summary: | Online disk expansion ignores filesystem overhead change | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Kevin Alon Goldblatt <kgoldbla> |
| Component: | Storage | Assignee: | Álvaro Romero <alromero> |
| Status: | CLOSED ERRATA | QA Contact: | Kevin Alon Goldblatt <kgoldbla> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.11.0 | CC: | mrashish, yadu |
| Target Milestone: | --- | | |
| Target Release: | 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | CNV v4.12.0-260 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-24 13:36:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Tested on the following version:
[cloud-user@ocp-psi-executor ~]$ oc version
Client Version: 4.12.0-ec.1
Kustomize Version: v4.5.4
Server Version: 4.12.0-ec.1
Kubernetes Version: v1.24.0+a9d6306
[cloud-user@ocp-psi-executor ~]$ oc get csv -n openshift-cnv
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v4.12.0 OpenShift Virtualization 4.12.0 kubevirt-hyperconverged-operator.v4.10.5 Succeeded
Steps to Reproduce:
1. Edit the HCO cr to change the filesystem overhead to 20%:
[cloud-user@ocp-psi-executor ~]$ oc describe hco -n openshift-cnv
...
Spec:
  Filesystem Overhead:
    Storage Class:
      Nfs: 0.2<<<<<<<<<<
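For reference, the same change can be applied non-interactively instead of via 'oc edit'; a minimal sketch, assuming the HyperConverged CR uses the default name kubevirt-hyperconverged:
$ oc patch hco kubevirt-hyperconverged -n openshift-cnv --type=merge \
    -p '{"spec":{"filesystemOverhead":{"storageClass":{"nfs":"0.2"}}}}'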
2. Create a VM requesting a volume size of 2G:
[cloud-user@ocp-psi-executor ~]$ oc get vm
NAME AGE STATUS READY
vm-cirros-datavolume1 37m Running True
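The VM itself was created from the manifest shown under Additional info; a minimal sketch of the commands, assuming the manifest is saved as vm-cirros-datavolume.yaml (hypothetical filename) and noting that it sets running: false, so the VM is started explicitly:
$ oc create -f vm-cirros-datavolume.yaml
$ virtctl start vm-cirros-datavolume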
3. Check the storage request:
[cloud-user@ocp-psi-executor ~]$ oc get pvc cirros-dv -oyaml
...
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100M
  storageClassName: nfs
  volumeMode: Filesystem
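The requested size can also be read directly with a jsonpath query instead of scanning the full yaml; a minimal sketch against the same PVC:
$ oc get pvc cirros-dv -o jsonpath='{.spec.resources.requests.storage}'
100M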
4. Check the online expansion requested by the vmi:
[cloud-user@ocp-psi-executor ~]$ oc get vmi vm-cirros-datavolume1 -oyaml
...
volumeStatus:
- name: datavolumevolume
  persistentVolumeClaimInfo:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 5Gi
    filesystemOverhead: "0.2"<<<<<<<<<<<<<<<<
    requests:
      storage: 100M
    volumeMode: Filesystem
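Similarly, the overhead value used for the expansion can be pulled straight from the VMI status; a minimal sketch, assuming the DataVolume is the first (and only) entry in volumeStatus:
$ oc get vmi vm-cirros-datavolume1 -o jsonpath='{.status.volumeStatus[0].persistentVolumeClaimInfo.filesystemOverhead}'
0.2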
So we get the expected result that the updated filesystem overhead of 20% has been used in the online expansion.
The version is 4.12.0-425. Verified with the following code:
-----------------------------------------
oc version
Client Version: 4.8.0-fc.2
Server Version: 4.12.0-ec.1
Kubernetes Version: v1.24.0+a9d6306
oc get csv -n openshift-cnv
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v4.12.0 OpenShift Virtualization 4.12.0 kubevirt-hyperconverged-operator.v4.10.5 Succeeded
volsync-product.v0.5.0 VolSync 0.5.0 Succeeded
Deployed: OCP-4.12.0-ec.1
Deployed: CNV-v4.12.0-450
Verified with the following scenario:
-----------------------------------------
1. Edit the hco cr 'oc edit hco -n openshift-cnv' and change the filesystem overhead to 20%:
filesystemOverhead:
  storageClass:
    nfs: "0.2"
2. See that the nfs filesystem overhead was updated to 0.2:
oc get cdiconfig -o jsonpath='{.items..status.filesystemOverhead}'
{"global":"0.055","storageClass":{"csi-manila-ceph":"0.055","hostpath-csi-basic":"0.055","hostpath-csi-pvc-block":"0.055","local-block-hpp":"0.055","local-block-ocs":"0.055","nfs":"0.2","ocs-storagecluster-ceph-rbd":"0.055","ocs-storagecluster-ceph-rgw":"0.055","standard-csi":"0.055"}}
3. Create a vm with the yaml below requesting a volume size of 2G
4. Check the storage request with 'oc get pvc cirros-dv4 -oyaml':
resources:
  requests:
    storage: "2684354560"
storageClassName: nfs
The PVC size was correctly created to include the filesystem overhead; the arithmetic is sketched below.
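That number matches the expected overhead arithmetic: the usable size is divided by (1 - overhead), so 2Gi (2147483648 bytes) becomes 2147483648 / (1 - 0.2) = 2684354560 bytes. A quick check of the arithmetic (a sketch only; the exact rounding CDI applies is not shown here):
$ python3 -c 'print(int((2 * 1024**3) / (1 - 0.2)))'
2684354560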
5. Check the online expansion requested by the vmi 'oc get vmi vm-cirros-datavolume4 -oyaml':
volumeStatus:
- name: datavolumedisk1
  persistentVolumeClaimInfo:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 5Gi
    filesystemOverhead: "0.2" >>>>>> THE CORRECT UPDATED EXPANSION WAS USED
    requests:
      storage: "2684354560"
    volumeMode: Filesystem
  target: vda
Actual results:
The updated filesystem overhead of 0.2 (20%) was requested.
Expected results:
The correct updated nfs filesystem overhead was used.
6. Accessed the vm and verified the correct requested size is displayed using lsblk:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 2G 0 disk
|-vda1 252:1 0 2G 0 part /
`-vda15 252:15 0 8M 0 part
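For reference, the guest was reached over the serial console; a minimal sketch (the Cirros default credentials, not shown here, are used to log in before running lsblk):
$ virtctl console vm-cirros-datavolume4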
Moving this to VERIFIED!
Additional info:
vm-yaml:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros-datavolume
  name: vm-cirros-datavolume
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: cirros-dv
    spec:
      storage:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: nfs
      source:
        http:
          url: http://xxx.xxx.xxx.com/files/cnv-tests/cirros-images/cirros-0.5.1-x86_64-disk.img
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumedisk1
        resources:
          requests:
            memory: 128Mi
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv
        name: datavolumedisk1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2023:0408
Description of problem:
After tuning the filesystem overhead in the HCO cr to 20%, the PVC created as part of a VM with a template clone is created correctly, including the overhead. However, the online disk expansion ignores the change and uses the default filesystem overhead of 5.5%.

Version-Release number of selected component (if applicable):
The error occurred using the following code:
--------------------------------------------------------
oc version
Client Version: 4.11.0-202206090038.p0.g194e99e.assembly.stream-194e99e
Kustomize Version: v4.5.4
Server Version: 4.11.0-fc.0
Kubernetes Version: v1.24.0+beaaed6
[cnv-qe-jenkins@stg10-kevin-6v8qf-executor ~]$ oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.11.0   OpenShift Virtualization   4.11.0    kubevirt-hyperconverged-operator.v4.10.1   Succeeded

How reproducible:
100%

Steps to Reproduce:
1. Edit the hco cr 'oc edit hco -n openshift-cnv' and change the filesystem overhead to 20%:
filesystemOverhead:
  storageClass:
    nfs: "0.2"
2. Create a vm with the yaml below requesting a volume size of 2G
3. Check the storage request with 'oc get pvc cirros-dv4 -oyaml':
resources:
  requests:
    storage: "2684354560"
storageClassName: nfs
The pvc size was correctly created to include the filesystem overhead
4. Check the online expansion requested by the vmi 'oc get vmi vm-cirros-datavolume4 -oyaml':
filesystemOverhead: "0.055" >>>>>>> THE DEFAULT FILESYSTEM OVERHEAD WAS REQUESTED
requests:
  storage: "2684354560"
volumeMode: Filesystem

Actual results:
The default filesystem overhead of 0.055 (5.5%) was requested, ignoring the updated value of 20%.

Expected results:
The updated filesystem overhead of 20% should have been used in the online expansion.

Additional info:
HCO----------------------
oc edit hco -n openshift-cnv
filesystemOverhead:
  storageClass:
    nfs: "0.2"
PVC---------------------
oc get pvc cirros-dv4 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: Import Complete
    cdi.kubevirt.io/storage.condition.running.reason: Completed
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.import.endpoint: http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.5.1-x86_64-disk.img
    cdi.kubevirt.io/storage.import.importPodName: importer-cirros-dv4
    cdi.kubevirt.io/storage.import.source: http
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-06-15T15:45:00Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.11.0
  name: cirros-dv4
  namespace: default
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: cirros-dv4
    uid: c2f5ea59-ff9d-426b-a271-b49a007168d9
  resourceVersion: "4188763"
  uid: 644a9c0b-616d-4450-ad5b-7c399c9bd37c
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "2684354560"
  storageClassName: nfs
  volumeMode: Filesystem
  volumeName: nfs-pv-08
status:
  accessModes:
  - ReadWriteMany
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  phase: Bound
VMI--------------------------------
oc get vmi vm-cirros-datavolume4 -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: "2022-06-15T15:56:49Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  - foregroundDeleteVirtualMachine
  generation: 8
  labels:
    kubevirt.io/nodeName: stg10-kevin-6v8qf-worker-0-dkg2q
    kubevirt.io/vm: vm-cirros-datavolume4
  name: vm-cirros-datavolume4
  namespace: default
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: vm-cirros-datavolume4
    uid: c7f5f992-590e-42bd-b234-0aaa95e414f8
  resourceVersion: "4203188"
  uid: 2ea9cb38-fff6-4ec4-ac95-4b17e2301319
spec:
  domain:
    cpu:
      cores: 1
      model: host-model
      sockets: 1
      threads: 1
    devices:
      disks:
      - disk:
          bus: virtio
        name: datavolumedisk4
      interfaces:
      - masquerade: {}
        name: default
    features:
      acpi:
        enabled: true
    firmware:
      uuid: 5e29c93e-7ab4-5c76-b86a-7a8b58f279af
    machine:
      type: pc-q35-rhel8.4.0
    resources:
      requests:
        memory: 128Mi
  networks:
  - name: default
    pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
  - dataVolume:
      name: cirros-dv4
    name: datavolumedisk4
status:
  activePods:
    a82f1f4c-f0d3-4722-a7af-8f42a0b0b534: stg10-kevin-6v8qf-worker-0-dkg2q
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-15T15:56:55Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC cirros-dv4 is not shared, live migration requires that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  guestOSInfo: {}
  interfaces:
  - infoSource: domain
    ipAddress: 10.128.2.62
    ipAddresses:
    - 10.128.2.62
    mac: 52:54:00:38:d1:3f
    name: default
  launcherContainerImageVersion: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:a2e887eb37fc7573a4aaba855f1d6ba64aa6c14f8a2c01b1e8bfd51526c51e99
  migrationMethod: BlockMigration
  migrationTransport: Unix
  nodeName: stg10-kevin-6v8qf-worker-0-dkg2q
  phase: Running
  phaseTransitionTimestamps:
  - phase: Pending
    phaseTransitionTimestamp: "2022-06-15T15:56:49Z"
  - phase: Scheduling
    phaseTransitionTimestamp: "2022-06-15T15:56:50Z"
  - phase: Scheduled
    phaseTransitionTimestamp: "2022-06-15T15:56:56Z"
  - phase: Running
    phaseTransitionTimestamp: "2022-06-15T15:57:03Z"
  qosClass: Burstable
  runtimeUser: 107
  virtualMachineRevisionName: revision-start-vm-c7f5f992-590e-42bd-b234-0aaa95e414f8-2
  volumeStatus:
  - name: datavolumedisk4
    persistentVolumeClaimInfo:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 5Gi
      filesystemOverhead: "0.055"
      requests:
        storage: "2684354560"
      volumeMode: Filesystem
    target: vda