This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2250659 - Validate Disk Failover Test fails during Windows Shared Cluster validation
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.14.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Adam Litke
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-11-20 12:20 UTC by Kevin Alon Goldblatt
Modified: 2023-12-14 16:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-14 16:04:42 UTC
Target Upstream Version:
Embargoed:


Attachments
Validate Disk Failover test - 1 (71.42 KB, image/png)
2023-11-20 12:20 UTC, Kevin Alon Goldblatt


Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  CNV-35429  0        None      None    None     2023-12-14 16:04:41 UTC

Description Kevin Alon Goldblatt 2023-11-20 12:20:55 UTC
Created attachment 2000491
Validate Disk Failover test - 1

Description of problem:
The Validate Disk Failover Test fails during Windows Shared Cluster validation when it attempts to write file data to the partition table entry, reporting that the disk structure is corrupted or unreadable.

Version-Release number of selected component (if applicable):
oc version
Client Version: 4.14.0-rc.2
Kustomize Version: v5.0.1
Server Version: 4.14.0-rc.2
Kubernetes Version: v1.27.6+1648878
[cloud-user@ocp-psi-executor ~]$ oc get csv -n openshift-cnv
NAME                                       DISPLAY                       VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization      4.14.1    kubevirt-hyperconverged-operator.v4.14.0   Succeeded
openshift-pipelines-operator-rh.v1.11.0    Red Hat OpenShift Pipelines   1.11.0                                               Succeeded


How reproducible:
100%

Steps to Reproduce:
1. Map an iSCSI LUN from the NetApp storage provider to 3 worker nodes.
2. Create a PV that references the iSCSI LUN and lists the 3 worker nodes in its node affinity.
3. Create a shared PVC for the iSCSI LUN with volumeMode Block and accessMode ReadWriteMany.
4. Create 3 Windows Server 2012 R2 VMs, each with its own OS disk and each referencing the shared iSCSI LUN-based PVC as a second disk.
5. Install virtio-win-guest-tools on each of the VMs to update the drivers.
6. In Disk Management, access the iSCSI LUN, create a partition spanning all available space, and format it with NTFS.
7. Install the Windows Shared Cluster software on each of the 3 Windows VMs.
8. Install and configure an Active Directory/DNS domain controller on one of the VMs.
9. Run the Failover Cluster - Validate Configuration tool and select only the Storage - Validate Disk Failover Test.
10. The Validate Disk Failover Test fails when it attempts to write file data to the partition table entry, reporting that the disk structure is corrupted or unreadable.


Actual results:
The Validate Disk Failover Test fails when it attempts to write file data to the partition table entry, reporting that the disk structure is corrupted or unreadable.

Expected results:
The Validate Disk Failover Test should succeed when attempting to write file data to the partition table entry.

Additional info:

Storage class yaml:
------------------------------------------
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-scsi
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate


PV Yaml:
------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-pv-root
spec:
  capacity:
    storage: 70Gi
  accessModes:
    - ReadWriteMany
  storageClassName: local-scsi
  iscsi:
     targetPortal: 10.9.96.31:3260
     iqn: iqn.1992-08.com.netapp:sn.438c2b596a3811e894b800a098da27d5:vs.4
     lun: 0
  volumeMode: Block
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - stg03-kevin-zrzbv-worker-0-2c2q7
          - stg03-kevin-zrzbv-worker-0-9wssf
          - stg03-kevin-zrzbv-worker-0-zwwg9



Shared PVC Yaml:
---------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scsi-pvc
spec:
  volumeMode: Block
  storageClassName: local-scsi 
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 70G
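
Note on sizes: the PV above declares a capacity of 70Gi, while this PVC requests 70G. The two suffixes are not the same unit (Gi is binary, G is decimal). The following is a minimal sketch of the arithmetic, assuming only the standard Kubernetes quantity suffixes; the helper name `to_bytes` is hypothetical and is not part of any Kubernetes client library.

```python
# Hypothetical helper (not a Kubernetes API): convert a quantity string
# such as "70Gi" or "70G" to a number of bytes. Only a few suffixes are handled.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal suffixes
}

def to_bytes(quantity: str) -> int:
    # Check two-letter binary suffixes before one-letter decimal ones,
    # so that "70Gi" is not mistaken for a value ending in an unknown "i".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)  # plain number of bytes

pv_capacity = to_bytes("70Gi")  # value from the PV manifest
pvc_request = to_bytes("70G")   # value from the PVC manifest

print(pv_capacity, pvc_request, pvc_request <= pv_capacity)
```

Under these assumptions the request is smaller than the declared capacity, so the size mismatch alone would not be expected to prevent the claim from binding to this PV.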



VM1 Yaml:
----------------------------------------------------------
---
oc get vm vm-win12-datavolume -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:28.558198691Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T11:50:59Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 56
  labels:
    kubevirt.io/vm: vm-win12-datavolume
  name: vm-win12-datavolume
  namespace: default
  resourceVersion: "91170595"
  uid: 0da144e8-42bc-4e94-bc20-fa453cba028e
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:05
            model: e1000e
            name: nic-conservation-lungfish
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 10Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-conservation-lungfish
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv
        name: datavolumedisk1
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T12:07:24Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T12:08:33Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 56
  observedGeneration: 56
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]


VM2 Yaml:
------------------------------------------------------------
---
oc get vm vm-win12-datavolume-b -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:49.226565165Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T16:14:32Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 70
  labels:
    kubevirt.io/vm: vm-win12-datavolume-b
  name: vm-win12-datavolume-b
  namespace: default
  resourceVersion: "91141914"
  uid: 81f583e1-5285-47e3-bcf1-1acf50ad033d
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv-b
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume-b
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1-b
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:03
            model: e1000e
            name: nic-contemporary-chinchilla
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 8Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-contemporary-chinchilla
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv-b
        name: datavolumedisk1-b
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T11:43:03Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv-b is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T11:44:00Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 70
  observedGeneration: 70
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1-b
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]



VM3 Yaml:
---------------------------------------------------------
---
oc get vm vm-win12-datavolume-c -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: "2023-11-19T11:43:58.073946416Z"
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
  creationTimestamp: "2023-10-30T18:14:40Z"
  finalizers:
  - kubevirt.io/virtualMachineControllerFinalize
  generation: 50
  labels:
    kubevirt.io/vm: vm-win12-datavolume-c
  name: vm-win12-datavolume-c
  namespace: default
  resourceVersion: "91139114"
  uid: b1e0d451-9e13-47e9-9bf2-0d7c904229f8
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: win12-dv-c
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: http://10.19.3.125/pub/users/joherr/os/bootsource_images/win2012r2.qcow2.gz
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/vm: vm-win12-datavolume-c
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
          - disk:
              bus: sata
            name: datavolumedisk1-c
          - errorPolicy: report
            lun:
              bus: scsi
              reservation: true
            name: scsi-disk
          - cdrom:
              bus: sata
            name: windows-drivers-disk
          interfaces:
          - bridge: {}
            macAddress: 02:7e:7e:00:00:06
            model: e1000e
            name: nic-willing-jackal
        machine:
          type: pc-q35-rhel9.2.0
        resources:
          requests:
            memory: 7Gi
      networks:
      - multus:
          networkName: l2-cluster-net
        name: nic-willing-jackal
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: win12-dv-c
        name: datavolumedisk1-c
      - name: scsi-disk
        persistentVolumeClaim:
          claimName: scsi-pvc
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:3a83562ffa9e9c3438eecd7b0833e303778691d8e17ddb41f09fce598ed9a01b
        name: windows-drivers-disk
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-20T11:40:52Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'cannot migrate VMI: PVC win12-dv-c is not shared, live migration requires
      that all PVCs must be shared (using ReadWriteMany access mode)'
    reason: DisksNotLiveMigratable
    status: "False"
    type: LiveMigratable
  - lastProbeTime: "2023-11-20T11:41:38Z"
    lastTransitionTime: null
    status: "True"
    type: AgentConnected
  created: true
  desiredGeneration: 50
  observedGeneration: 50
  printableStatus: Running
  ready: true
  volumeSnapshotStatuses:
  - enabled: true
    name: datavolumedisk1-c
  - enabled: false
    name: scsi-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [local-scsi] [scsi-disk]'
  - enabled: false
    name: windows-drivers-disk
    reason: Snapshot is not supported for this volumeSource type [windows-drivers-disk]
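
The three VM manifests above differ mainly in names, MAC addresses, and memory. The part relevant to this report is that each VM attaches the same PVC as a SCSI LUN with reservation enabled. The following is a minimal sketch of a check for that, using hand-trimmed copies of the relevant fields rather than the full manifests. The function names `vm_fragment` and `reserved_lun_claims` are hypothetical, introduced only for illustration.

```python
# Trimmed copies of the disk and volume fields from the VM manifests above.
# vm_fragment is a hypothetical helper that fills in the per-VM names.
def vm_fragment(os_disk: str, data_volume: str) -> dict:
    return {
        "disks": [
            {"name": os_disk, "disk": {"bus": "sata"}},
            {"name": "scsi-disk", "errorPolicy": "report",
             "lun": {"bus": "scsi", "reservation": True}},
            {"name": "windows-drivers-disk", "cdrom": {"bus": "sata"}},
        ],
        "volumes": [
            {"name": os_disk, "dataVolume": {"name": data_volume}},
            {"name": "scsi-disk",
             "persistentVolumeClaim": {"claimName": "scsi-pvc"}},
        ],
    }

def reserved_lun_claims(vm: dict) -> set:
    """Return the PVC names behind disks attached as LUNs with reservation on."""
    lun_disks = {
        d["name"] for d in vm["disks"]
        if d.get("lun", {}).get("reservation")
    }
    return {
        v["persistentVolumeClaim"]["claimName"]
        for v in vm["volumes"]
        if v["name"] in lun_disks and "persistentVolumeClaim" in v
    }

vms = {
    "vm-win12-datavolume": vm_fragment("datavolumedisk1", "win12-dv"),
    "vm-win12-datavolume-b": vm_fragment("datavolumedisk1-b", "win12-dv-b"),
    "vm-win12-datavolume-c": vm_fragment("datavolumedisk1-c", "win12-dv-c"),
}

claims = {name: reserved_lun_claims(vm) for name, vm in vms.items()}
shared = set.intersection(*claims.values())
print(shared)
```

This only confirms that the manifests are consistent with step 4 of the reproduction steps; it says nothing about the cause of the disk error.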

Comment 4 Adam Litke 2023-11-28 20:47:46 UTC
Kevin.  Is this a duplicate of Bug 2249554 ?

Comment 5 Kevin Alon Goldblatt 2023-12-03 22:33:10 UTC
(In reply to Adam Litke from comment #4)
> Kevin.  Is this a duplicate of Bug 2249554 ?

No, it is not a duplicate. Here, in step 10, the Validate Disk Failover Test fails due to a disk error. This was resolved by running a disk check/fix on the SCSI disk.
After cleaning the disk I was not able to reproduce this error again.
In Bug 2249554 the validation fails in the SCSI-3 reservation test and the VM crashes due to the error policy.

