Bug 2250659

Summary: Validate Disk Failover Test fails during Windows Shared Cluster validation
Product: Container Native Virtualization (CNV)
Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: Storage
Assignee: Adam Litke <alitke>
Status: CLOSED MIGRATED
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.14.1
CC: mhenriks
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-12-14 16:04:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Validate Disk Failover test - 1 (flags: none)

Description Kevin Alon Goldblatt 2023-11-20 12:20:55 UTC
Created attachment 2000491
Validate Disk Failover test - 1

Description of problem:
The Validate Disk Failover test fails during Windows Shared Cluster validation. While attempting to write file data to the partition table entry, the test reports that the disk structure is corrupted or unreadable.

Version-Release number of selected component (if applicable):
oc version
Client Version: 4.14.0-rc.2
Kustomize Version: v5.0.1
Server Version: 4.14.0-rc.2
Kubernetes Version: v1.27.6+1648878
[cloud-user@ocp-psi-executor ~]$ oc get csv -n openshift-cnv
NAME                                       DISPLAY                       VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization      4.14.1    kubevirt-hyperconverged-operator.v4.14.0   Succeeded
openshift-pipelines-operator-rh.v1.11.0    Red Hat OpenShift Pipelines   1.11.0                                               Succeeded


How reproducible:
100%

Steps to Reproduce:
1. Map an iSCSI LUN from the NetApp storage provider to 3 worker nodes.
2. Create a PV that references the iSCSI LUN and lists the 3 worker nodes in its node affinity.
3. Create a shared PVC backed by the iSCSI LUN with volumeMode Block and accessMode ReadWriteMany.
4. Create 3 Windows Server 2012 R2 VMs, each with its own OS disk and each referencing the shared iSCSI LUN-based PVC as a second disk.
5. Install virtio-win-guest-tools to update the drivers on each of the VMs.
6. In Disk Management, access the iSCSI LUN, create a partition spanning all available space, and format it with NTFS.
7. Install the Windows Shared Cluster software on each of the 3 Windows VMs.
8. Install and configure an Active Directory/DNS domain controller on one of the VMs.
9. Run the Failover Cluster - Validate Configuration tool and select only the Storage - Validate Disk Failover test.
10. The Validate Disk Failover test fails while attempting to write file data to the partition table entry, reporting that the disk structure is corrupted or unreadable.
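
Steps 2 to 4 can be sanity-checked from the CLI before the guests are configured. The following is a minimal sketch, assuming the shared claim is named scsi-pvc and lives in the default namespace alongside the VMs (as in the manifests in this report). The helpers claim_phase and require_bound are hypothetical names introduced here for illustration; they are not part of oc.

```shell
# Hypothetical helper: print the phase of a PVC
# (expected to be "Bound" once the claim has matched the iSCSI-backed PV).
claim_phase() {
  oc get pvc "$1" -n "${2:-default}" -o jsonpath='{.status.phase}'
}

# Hypothetical helper: succeed only if the claim is Bound.
require_bound() {
  local phase
  phase="$(claim_phase "$1" "${2:-default}")"
  if [ "$phase" != "Bound" ]; then
    echo "PVC $1 is in phase '${phase:-unknown}', expected Bound" >&2
    return 1
  fi
  echo "PVC $1 is Bound"
}
```

For example, once the PV and PVC manifests have been applied, `require_bound scsi-pvc default` returns success only when the claim has bound; any other phase suggests the claim never matched the iSCSI PV, and the VMs would not start with the shared disk.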


Actual results:
The Validate Disk Failover test fails while attempting to write file data to the partition table entry, reporting that the disk structure is corrupted or unreadable.

Expected results:
The Validate Disk Failover test should succeed when writing file data to the partition table entry.

Additional info:

Storage class yaml:
------------------------------------------
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-scsi
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate


PV Yaml:
------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-pv-root
spec:
  capacity:
    storage: 70Gi
  accessModes:
    - ReadWriteMany
  storageClassName: local-scsi
  iscsi:
     targetPortal: 10.9.96.31:3260
     iqn: iqn.1992-08.com.netapp:sn.438c2b596a3811e894b800a098da27d5:vs.4
     lun: 0
  volumeMode: Block
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - stg03-kevin-zrzbv-worker-0-2c2q7
          - stg03-kevin-zrzbv-worker-0-9wssf
          - stg03-kevin-zrzbv-worker-0-zwwg9



Shared PVC Yaml:
---------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scsi-pvc
spec:
  volumeMode: Block
  storageClassName: local-scsi 
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 70G
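
Note that the PVC above requests 70G while the PV advertises 70Gi. These are different Kubernetes quantity suffixes (decimal versus binary), so the two sizes are not equal. The short calculation below is only a worked example of the arithmetic; the variable names are illustrative.

```shell
# Kubernetes quantity suffixes: G is decimal (10^9), Gi is binary (2^30).
pvc_request_bytes=$((70 * 1000 ** 3))   # 70G requested by scsi-pvc
pv_capacity_bytes=$((70 * 1024 ** 3))   # 70Gi offered by iscsi-pv-root

echo "PVC request: ${pvc_request_bytes} bytes"
echo "PV capacity: ${pv_capacity_bytes} bytes"

# A claim can bind to a PV whose capacity is at least the requested size.
if [ "$pvc_request_bytes" -le "$pv_capacity_bytes" ]; then
  echo "request fits within the PV capacity"
fi
```

Because the request (70000000000 bytes) is smaller than the PV capacity (75161927680 bytes), the mismatch does not by itself prevent binding.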



VM1 Yaml:
----------------------------------------------------------
---
oc get vm vm-win12-datavolume -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:28.558198691Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T11:50:59Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 56
  labels:
    kubevirt.io/vm: vm-win12-datavolume
  name: vm-win12-datavolume
  namespace: default
  resourceVersion: "91170595"
  uid: 0da144e8-42bc-4e94-bc20-fa453cba028e
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:05
            model: e1000e
            name: nic-conservation-lungfish
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 10Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-conservation-lungfish
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv
        name: datavolumedisk1
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T12:07:24Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T12:08:33Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 56
  observedGeneration: 56
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]
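
Each VM in this report attaches the shared claim as a lun disk with reservation: true, which, as far as I understand the KubeVirt API, requests that SCSI persistent reservation commands from the guest be passed through to the LUN. When separating a disk-level failure from a reservation-level one, it may help to inspect the reservation state on the LUN from a worker node. This is a sketch only, assuming sg_persist (from the sg3_utils package) is available on the node; /dev/sdX is a placeholder for the iSCSI device, and show_reservations is a hypothetical helper name.

```shell
# Hypothetical helper: list the registered reservation keys and the current
# reservation holder for a block device. Requires sg_persist (sg3_utils).
show_reservations() {
  local dev="$1"
  sg_persist --in --read-keys "$dev"
  sg_persist --in --read-reservation "$dev"
}
```

It would be called as `show_reservations /dev/sdX`, with the placeholder replaced by the device that backs the shared LUN on that node.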


VM2 Yaml:
------------------------------------------------------------
---
oc get vm vm-win12-datavolume-b -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:49.226565165Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T16:14:32Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 70
  labels:
    kubevirt.io/vm: vm-win12-datavolume-b
  name: vm-win12-datavolume-b
  namespace: default
  resourceVersion: "91141914"
  uid: 81f583e1-5285-47e3-bcf1-1acf50ad033d
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv-b
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume-b
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1-b
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:03
            model: e1000e
            name: nic-contemporary-chinchilla
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 8Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-contemporary-chinchilla
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv-b
        name: datavolumedisk1-b
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T11:43:03Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv-b is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T11:44:00Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 70
  observedGeneration: 70
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1-b
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]



VM3 Yaml:
---------------------------------------------------------
---
oc get vm vm-win12-datavolume-c -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:58.073946416Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T18:14:40Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 50
  labels:
    kubevirt.io/vm: vm-win12-datavolume-c
  name: vm-win12-datavolume-c
  namespace: default
  resourceVersion: "91139114"
  uid: b1e0d451-9e13-47e9-9bf2-0d7c904229f8
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv-c
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume-c
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1-c
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:06
            model: e1000e
            name: nic-willing-jackal
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 7Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-willing-jackal
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv-c
        name: datavolumedisk1-c
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T11:40:52Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv-c is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T11:41:38Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 50
  observedGeneration: 50
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1-c
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]

Comment 4 Adam Litke 2023-11-28 20:47:46 UTC
Kevin, is this a duplicate of Bug 2249554?

Comment 5 Kevin Alon Goldblatt 2023-12-03 22:33:10 UTC
(In reply to Adam Litke from comment #4)
> Kevin, is this a duplicate of Bug 2249554?

No, it is not a duplicate. Here, in step 10, the Validate Disk Failover test fails due to a disk error. This was resolved after running a disk check/fix on the SCSI disk.
After cleaning the disk I was not able to reproduce this error again.
In Bug 2249554 the validation fails in the SCSI-3 reservation test and the VM crashes due to the error policy.