Bug 1991460

Summary: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment

Product: Container Native Virtualization (CNV)
Reporter: tmicheli
Component: Storage
Assignee: Adam Litke <alitke>
Status: CLOSED ERRATA
QA Contact: dalia <dafrank>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.8.0
CC: ailan, alitke, cnv-qe-bugs, eterrell, kgershon, ribarry, sgott, yadu
Target Milestone: ---
Target Release: 4.8.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.8.2-12
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1994737 (view as bug list)
Environment:
Last Closed: 2021-09-21 11:06:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1994737

Description tmicheli 2021-08-09 08:14:40 UTC
Description of problem:
Creating a virtual machine fails during setup with an error message like:

~~~
"component":"virt-handler","kind":"","level":"error","msg":"Synchronizing the VirtualMachineInstance failed.","name":"rhel8-unwilling-hamster","namespace":"openshift-cnv","pos":"vm.go:1538","reason":"server error. command SyncVMI failed: \"LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2021-07-15T21:00:48.315305Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment')
~~~

If cloud-init is disabled for the virtual machine, it works. The customer is using block storage with a block size of 4k.

This bug is related to: https://bugzilla.redhat.com/show_bug.cgi?id=1976730.

As discussed in the support case, I am opening this bug report.

If cloud-init is disabled, provisioning of the VM works.
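
For context, the qemu error means the cloud-init disk image has a size that is not a multiple of the storage's 4k request alignment. As a rough illustration only (this is not the actual CNV fix, and the file name is hypothetical), padding such an image to the next 4 KiB boundary would look like this:

~~~
# Hypothetical illustration of the alignment problem, not the CNV fix.
# Inspect the image; on 4k storage qemu needs the size to be 4096-aligned.
qemu-img info noCloud.iso

# Pad the file up to the next 4096-byte boundary.
size=$(stat -c %s noCloud.iso)
aligned=$(( (size + 4095) / 4096 * 4096 ))
truncate -s "$aligned" noCloud.iso
~~~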


How reproducible:
In every cluster using block storage with a 4k block size.

Steps to Reproduce:
1. Install an OCP cluster
2. Set up a storage provider with a 4k block size
3. Deploy a virtual machine with cloud-init enabled

Actual results:
* Provisioning of the virtual machine fails

Expected results:
* Provisioning succeeds

Additional info:

Comment 1 Kobig 2021-08-16 09:10:30 UTC
Hi, 

Do we know which version the fix for this bug is targeted for? And if it's not 4.7, can we please backport it?

Thank you

Comment 5 Yan Du 2021-09-16 06:07:24 UTC
Tested on CNV-v4.8.2-17; the issue has been fixed.

Steps:
1. Connect to a host on the cluster:
oc debug node/yadu48-7bthd-worker-0-t5svz

sh-4.4# chroot /host/

2. Prepare 4K aligned fake block storage
fallocate -l 10GiB /var/iscsi
losetup -f -b 4096 /var/iscsi 
losetup -l 
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE  DIO LOG-SEC
/dev/loop1         0      0         0  0 /var/iscsi   0    4096
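
To double-check that the loop device really exposes a 4k logical sector size, something like the following can be run (device name taken from the losetup output above):

~~~
# Query the logical and physical sector sizes of the loop device.
blockdev --getss /dev/loop1    # logical sector size, expect 4096
blockdev --getpbsz /dev/loop1  # physical block size, expect 4096
~~~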

3. Create the filesystem and mount the device:
mkfs.ext4 -b 4096 /var/iscsi
mkdir  /var/hpvolumes/4k  
mount /dev/loop0 /var/hpvolumes/4k/  
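
Optionally, the filesystem block size and the mount can be verified (using the loop device from the mount command above):

~~~
# Confirm the ext4 filesystem uses a 4096-byte block size.
tune2fs -l /dev/loop0 | grep 'Block size'
# Confirm the directory is backed by the loop device.
findmnt /var/hpvolumes/4k
~~~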

4. Label the mount directory for container access:
sh-4.4# ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:unlabeled_t:s0 16384 Sep 16 02:17 lost+found
sh-4.4# sudo chcon -t container_file_t -R /var/hpvolumes/4k
sh-4.4# ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:container_file_t:s0 16384 Sep 16 02:17 lost+found
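
Note that chcon does not survive a filesystem relabel; if a persistent label is wanted, a file-context rule can be added instead (a sketch, assuming semanage is available on the node):

~~~
# Persistently label the hostpath directory for container access.
semanage fcontext -a -t container_file_t "/var/hpvolumes/4k(/.*)?"
restorecon -Rv /var/hpvolumes/4k
~~~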

5. Create the sc
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test-sc
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

6. Create the pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv-volume
  labels:
    type: local
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - yadu48-7bthd-worker-0-t5svz
  storageClassName: test-sc
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/hpvolumes/4k"
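
Both manifests can then be applied and the PV checked (file names are just examples):

~~~
oc apply -f sc.yaml -f pv.yaml
oc get sc test-sc
oc get pv test-pv-volume   # stays Available until the DataVolume's PVC is bound
~~~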

7. Create the vm with cloudinit defined
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-fedora
  name: vm-fedora
spec:
  dataVolumeTemplates:
  - metadata:
      name: vm-fedora
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: test-sc
      source:
        http:
          url: "http://xxx/files/cnv-tests/fedora-images/Fedora-Cloud-Base-34-1.2.x86_64.qcow2"
  running: True
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumevolume
          - disk:
              bus: virtio
            name: cloudinitdisk
        machine:
          type: ""
        resources:
          requests:
            memory: 1024Mi
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: vm-fedora
        name: datavolumevolume
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
        name: cloudinitdisk
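
Applying the VM and watching the import and startup can be done along these lines (resource names match the manifest above; the namespace is the one shown in the step 8 output):

~~~
oc apply -f vm.yaml -n openshift-virtualization-os-images
# Watch the CDI DataVolume import the Fedora image onto the 4k-backed PV.
oc get dv vm-fedora -n openshift-virtualization-os-images -w
# Once the import succeeds, the VMI should reach the Running phase.
oc get vmi vm-fedora -n openshift-virtualization-os-images -w
~~~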

8. Check the VM and log in to the VM via the console:
$ oc get vmi
NAMESPACE                            NAME        AGE   PHASE     IP             NODENAME
openshift-virtualization-os-images   vm-fedora   9m   Running   10.129.3.228   yadu48-7bthd-worker-0-t5svz
$ oc get pod -n openshift-virtualization-os-images
NAME                                READY   STATUS    RESTARTS   AGE
virt-launcher-vm-fedora-bk7hg       1/1     Running   0          10m
$ virtctl console vm-fedora
Successfully connected to vm-fedora console. The escape sequence is ^]

vm-fedora login: fedora
Password: 
Last login: Thu Sep 16 05:54:43 on ttyS0
[fedora@vm-fedora ~]$ 

9. Check the log of the virt-handler pod; there is no "Synchronizing the VirtualMachineInstance failed" error in the log.
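
One way to grep the virt-handler logs for the earlier failure (assuming the default kubevirt.io=virt-handler pod label in the openshift-cnv namespace):

~~~
oc logs -n openshift-cnv -l kubevirt.io=virt-handler --tail=-1 \
  | grep 'Synchronizing the VirtualMachineInstance failed' \
  || echo "no sync failures found"
~~~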

Comment 10 errata-xmlrpc 2021-09-21 11:06:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.2 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3598

Comment 11 Red Hat Bugzilla 2023-09-15 01:13:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days