This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2247157 - CNV DataSources taking >90 minutes to import and VM PVCs are taking >30 minutes to clone
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.13.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Adam Litke
QA Contact: dalia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-10-31 01:58 UTC by Chad Hobbs
Modified: 2023-12-14 16:04 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-14 16:04:35 UTC
Target Upstream Version:
Embargoed:


Attachments
cdi-deployment pod log (1.05 MB, application/zip)
2023-10-31 01:58 UTC, Chad Hobbs
crashing importer log 1 (977 bytes, application/zip)
2023-10-31 02:01 UTC, Chad Hobbs
crashing importer log 0 (1.40 KB, application/zip)
2023-10-31 02:01 UTC, Chad Hobbs
topolvm-node lvmd log (1.09 KB, application/zip)
2023-10-31 02:02 UTC, Chad Hobbs
topolvm-node node log (2.05 KB, application/zip)
2023-10-31 02:02 UTC, Chad Hobbs
*-source-pod log from a nearly completed VM creation (57.21 KB, text/plain)
2023-10-31 17:28 UTC, Chad Hobbs


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   CNV-34712 0 None None None 2023-12-14 16:04:34 UTC

Description Chad Hobbs 2023-10-31 01:58:17 UTC
Created attachment 1996303 [details]
cdi-deployment pod log

Description of problem:
After a vanilla installation of OpenShift 4.13.17 SNO, with the OpenShift Virtualization and LVM Storage operators installed as part of Assisted Installer cluster creation, the data sources take up to 90 minutes to finish importing, with upwards of 15 restarts over that period. The topolvm-node-* pod enters CrashLoopBackOff; manually deleting the pod and letting it be recreated clears that state. Finally, cloning a PVC for a newly created VM takes upwards of 30 minutes.

Version-Release number of selected component (if applicable):
OpenShift Virtualization - 4.13.4
LVM Storage - 4.14.0
OpenShift - 4.13.17

How reproducible:
Every time on test cluster

Steps to Reproduce:
1. Deploy a cluster on a bare-metal server (48 vCPUs, 256 GB RAM, 2 TB of SSD storage across two 1 TB SSDs) using the interactive installer, selecting SNO, static IP configuration, and the OpenShift Virtualization and LVM Storage operators.
2. The cluster deploys in about 30 minutes.

Actual results:
[chad@bastion ~]$ oc get -n openshift-virtualization-os-images dv
NAME                          PHASE              PROGRESS   RESTARTS   AGE
centos-stream9-aea06c312f87   ImportInProgress   99.91%     19         105m
rhel8-2cde3f47f8c7            ImportInProgress   0.00%      19         105m
rhel9-a1947a1edca5            ImportInProgress   0.00%      19         105m

[chad@bastion ~]$ oc get -n openshift-virtualization-os-images pods
NAME                                   READY   STATUS             RESTARTS         AGE
importer-centos-stream9-aea06c312f87   1/2     CrashLoopBackOff   21 (17s ago)     110m
importer-rhel8-2cde3f47f8c7            2/2     Running            20 (6m23s ago)   111m
importer-rhel9-a1947a1edca5            1/2     CrashLoopBackOff   20 (65s ago)     111m
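
For anyone triaging a similar cluster, the DataVolume listing above can be checked mechanically rather than by eye. The following is only a sketch: it assumes the default five-column `oc get dv` layout shown in this report, and the `DataVolumeRow`, `parse_dv_table`, and `stalled` names as well as the restart threshold are illustrative, not part of any Red Hat tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataVolumeRow:
    name: str
    phase: str
    progress: Optional[float]  # percent, or None when the column is not a number
    restarts: int
    age: str


def parse_dv_table(text: str) -> List[DataVolumeRow]:
    """Parse the default column output of `oc get dv` (layout assumed from this report)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header, and any row where a column was printed empty.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, phase, progress, restarts, age = fields
        try:
            pct = float(progress.rstrip("%"))
        except ValueError:
            pct = None  # e.g. "N/A"
        rows.append(DataVolumeRow(name, phase, pct, int(restarts), age))
    return rows


def stalled(rows: List[DataVolumeRow], max_restarts: int = 5) -> List[str]:
    """Names of DataVolumes that have not succeeded and exceed the restart threshold."""
    return [r.name for r in rows if r.phase != "Succeeded" and r.restarts > max_restarts]


# Sample copied from the listing in this report.
SAMPLE = """\
NAME                          PHASE              PROGRESS   RESTARTS   AGE
centos-stream9-aea06c312f87   ImportInProgress   99.91%     19         105m
rhel8-2cde3f47f8c7            ImportInProgress   0.00%      19         105m
rhel9-a1947a1edca5            ImportInProgress   0.00%      19         105m
"""

if __name__ == "__main__":
    for name in stalled(parse_dv_table(SAMPLE)):
        print(name)
```

The threshold of 5 restarts is arbitrary; on this cluster every import had already restarted 19 times, so all three DataVolumes would be flagged.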

Eventually, all of the imports finish. I then create a VM, in this case CentOS Stream 9:
[chad@bastion ~]$ oc get pods -n vm-test
NAME                                    READY   STATUS    RESTARTS   AGE
cdi-upload-centos-stream9-empty-skunk   0/1     Running   0          10s

[chad@bastion ~]$ oc get dv -n vm-test
NAME                         PHASE             PROGRESS   RESTARTS   AGE
centos-stream9-empty-skunk   CloneInProgress   4.71%      2          2m58s

This took approximately 30 minutes; I do not know how to query for that particular metric.
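
One possible way to get the clone duration, sketched here under assumptions: a DataVolume object carries `metadata.creationTimestamp`, and its `status.conditions` list is assumed to include a `Ready` condition with a `lastTransitionTime`. Subtracting the two gives a rough time-to-ready. The `time_to_ready` helper is hypothetical, and the timestamps in the example are made up for illustration, not taken from this cluster.

```python
from datetime import datetime, timedelta
from typing import Optional


def _parse_ts(value: str) -> datetime:
    # Kubernetes serializes these timestamps as UTC, e.g. "2023-10-31T01:58:17Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def time_to_ready(dv: dict) -> Optional[timedelta]:
    """Time from creation until the Ready condition became True, or None if not Ready.

    `dv` is the parsed JSON of a single DataVolume. The field layout is an
    assumption about the CDI DataVolume status; verify it against your cluster.
    """
    created = _parse_ts(dv["metadata"]["creationTimestamp"])
    for cond in dv.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            return _parse_ts(cond["lastTransitionTime"]) - created
    return None


# Illustrative object with invented timestamps.
EXAMPLE_DV = {
    "metadata": {"creationTimestamp": "2023-10-31T16:55:00Z"},
    "status": {
        "conditions": [
            {"type": "Bound", "status": "True", "lastTransitionTime": "2023-10-31T16:55:10Z"},
            {"type": "Ready", "status": "True", "lastTransitionTime": "2023-10-31T17:25:30Z"},
        ]
    },
}

if __name__ == "__main__":
    print(time_to_ready(EXAMPLE_DV))
```

To use it against a live object, the JSON from `oc get dv centos-stream9-empty-skunk -n vm-test -o json` can be loaded with `json.load(sys.stdin)` and passed to `time_to_ready`. Note that this measures wall-clock time from object creation, so it includes any time spent waiting for the PVC to bind.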


Expected results:
A VM deploys on OCP SNO without error, and without import and cloning failures that take over 2 hours to overcome.

Additional info:
I have additional logs and can provide any resources necessary regarding the test cluster.

Comment 1 Chad Hobbs 2023-10-31 02:01:06 UTC
Created attachment 1996304 [details]
crashing importer log 1

Comment 2 Chad Hobbs 2023-10-31 02:01:33 UTC
Created attachment 1996305 [details]
crashing importer log 0

Comment 3 Chad Hobbs 2023-10-31 02:02:03 UTC
Created attachment 1996306 [details]
topolvm-node lvmd log

Comment 4 Chad Hobbs 2023-10-31 02:02:25 UTC
Created attachment 1996307 [details]
topolvm-node node log

Comment 5 Chad Hobbs 2023-10-31 15:25:14 UTC
Here is the LVMS StorageClass YAML, as provisioned by the Assisted Installer:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: lvms-vg1
  uid: 65ad7acd-bbc1-4d70-a166-5f167b098825
  resourceVersion: '18273'
  creationTimestamp: '2023-10-30T22:39:19Z'
  annotations:
    description: Provides RWO and RWOP Filesystem & Block volumes
    storageclass.kubernetes.io/is-default-class: 'true'
  managedFields:
    - manager: manager
      operation: Update
      apiVersion: storage.k8s.io/v1
      time: '2023-10-30T22:39:19Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:allowVolumeExpansion': {}
        'f:metadata':
          'f:annotations':
            .: {}
            'f:description': {}
            'f:storageclass.kubernetes.io/is-default-class': {}
        'f:parameters':
          .: {}
          'f:csi.storage.k8s.io/fstype': {}
          'f:topolvm.io/device-class': {}
        'f:provisioner': {}
        'f:reclaimPolicy': {}
        'f:volumeBindingMode': {}
provisioner: topolvm.io
parameters:
  csi.storage.k8s.io/fstype: xfs
  topolvm.io/device-class: vg1
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Comment 6 Chad Hobbs 2023-10-31 17:28:20 UTC
Created attachment 1996400 [details]
*-source-pod log from a nearly completed VM creation

Since the cluster was spun up, all pods have settled and VM creation now progresses normally. Cloning restarts seem to occur only within the first few hours after cluster creation, perhaps only while source images are being downloaded.

