Bug 1885964
Summary: | Image Cloning Slow Compared to URL | |||
---|---|---|---|---|
Product: | Container Native Virtualization (CNV) | Reporter: | Benjamin Schmaus <bschmaus> | |
Component: | Storage | Assignee: | Michael Henriksen <mhenriks> | |
Status: | CLOSED ERRATA | QA Contact: | Alex Kalenyuk <akalenyu> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 2.5.0 | CC: | akalenyu, alitke, cnv-qe-bugs, mhenriks, pelauter, sjhala | |
Target Milestone: | --- | Flags: | pelauter: needinfo- | |
Target Release: | 2.5.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | virt-cdi-operator-container-v2.5.0-17 hco-bundle-registry-container-v2.5.0-329 | Doc Type: | Release Note | |
Doc Text: | Host-assisted cloning performance has been improved by using a more efficient compression algorithm. |
Story Points: | --- | |
Clone Of: | ||||
: | 1887844 (view as bug list) | Environment: | ||
Last Closed: | 2020-11-17 13:24:56 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1887844 |
Description
Benjamin Schmaus
2020-10-07 11:04:23 UTC
I wonder if this is due to CPU resource limits on the clone source or target pods, since we are filtering through gzip to reduce network traffic. Michael, could you please take a look? Cloning is even more important for 2.5 and I'd like to make sure we're performing acceptably.

Before investigating CPU limits, it would be helpful to know a little more about the import use case. What is the format of the file (qcow2 or raw)? What is the physical size? Looks like the virtual size is 63G.

Here is a snippet of the datavolume they were testing when instantiating a VM:

    #dataVolumeTemplates:
    #  - apiVersion: cdi.kubevirt.io/v1alpha1
    #    kind: DataVolume
    #    metadata:
    #      name: fedora-32-disk-0
    #    spec:
    #      pvc:
    #        accessModes:
    #          - ReadWriteMany
    #        resources:
    #          requests:
    #            storage: 10Gi
    #        storageClassName: px-repl2-file
    #        #volumeMode: Filesystem
    #      source:
    #        http:
    #          url: >-
    #            https://files.caas.<customer redacted>.com:9443/ova/Fedora-Cloud-Base-32-1.6.x86_64.qcow2
    dataVolumeTemplates:
      - apiVersion: cdi.kubevirt.io/v1alpha1
        kind: DataVolume
        metadata:
          name: fedora-32-disk-0
        spec:
          pvc:
            accessModes:
              - ReadWriteMany
            resources:
              requests:
                storage: 10Gi
            storageClassName: px-repl2-file
            #volumeMode: Filesystem
          source:
            pvc:
              name: fedora-cloud-base-32-v1
              namespace: vm-images

They were switching between importing the Fedora image over HTTP and cloning from a PVC, as in the example above. My thought was that maybe the cloning process is cloning the entire 10gb PVC, whereas the HTTP import only transfers the initial qcow2, which I believe is >1gb.

After doing some profiling, I discovered that the clone process was using a lot of CPU in gzip. Migrating to snappy made clone performance comparable to http import. I suspect this fix will go a long way in addressing the performance variation described in this bug, but of course I cannot be sure, as there are a lot of different variables at play. In my test, though, there was an approx 6X improvement in cloning the base fedora cloud image.
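A minimal sketch of how the CPU cost of stream compression can be measured, in the spirit of the profiling described above. It is not CDI's code and makes several assumptions: the buffer (a block of random bytes followed by zeros) only loosely imitates a sparse disk image, the helper names are hypothetical, and because snappy is not in the Python standard library, DEFLATE at level 1 stands in for "a cheaper compressor" next to the default level 6.

```python
import os
import time
import zlib
from typing import Tuple


def make_image_like_buffer(size_mb: int = 8, random_fraction: float = 0.25) -> bytes:
    """Build a synthetic buffer: random bytes followed by zeros (assumed image shape)."""
    total = size_mb * 1024 * 1024
    random_part = os.urandom(int(total * random_fraction))
    return random_part + bytes(total - len(random_part))


def time_compress(data: bytes, level: int) -> Tuple[float, int]:
    """Return (CPU seconds, compressed size) for DEFLATE compression at `level`."""
    start = time.process_time()
    compressed = zlib.compress(data, level)
    elapsed = time.process_time() - start
    # Round-trip check so we know the measured stream is valid.
    assert zlib.decompress(compressed) == data
    return elapsed, len(compressed)


if __name__ == "__main__":
    data = make_image_like_buffer()
    for level in (1, 6):  # 1 = fastest DEFLATE setting, 6 = zlib default
        cpu_seconds, size = time_compress(data, level)
        print(f"level {level}: {cpu_seconds:.3f}s CPU, {size} bytes")
```

The absolute numbers depend on the machine and the data; the point is only that the compressor sits on the critical path of the copy, so a cheaper algorithm can shorten the whole transfer even if it compresses less.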
Compared timings for hostpath-provisioner SC with the Fedora 32 image both pre- and post-change (HCO v2.5.0-270 vs. 329), and there definitely seems to be an improvement:

Pre-change:
    Import: 1:35 minutes
    Clone: 5:33 minutes
Post-change:
    Import: 1:03 minutes
    Clone: 38 seconds

Keep in mind this is not BM (PSI).

Verified on CNV 2.5.0, KubeVirt: v0.34.0-13-g2cc7d61, CDI: Containerized Data Importer v1.23.6
HCO image: registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:631d9868743d52fba5fb5a089acc418328e0c33c55ae2319e5e06c24a4323124
CSV creation time: 2020-10-11 19:19:07

@Peter I see Adam already has a clone for this purpose. I will try to get my hands on a 2.4.X environment to sample pre-change behavior.

Release notes PR is now merged: https://github.com/openshift/openshift-docs/pull/27019

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5127
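For reference, the pre- and post-change timings reported in the verification comment can be converted into speed-up factors. The short sketch below does only that arithmetic; the helper names are hypothetical and the inputs are the four timings quoted in this bug.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'M:SS' timing into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(before: int, after: int) -> float:
    """Ratio of the old duration to the new one (higher means faster)."""
    return before / after


# (pre-change, post-change) durations in seconds, taken from the comment above.
timings = {
    "import": (to_seconds("1:35"), to_seconds("1:03")),
    "clone": (to_seconds("5:33"), 38),
}

for name, (before, after) in timings.items():
    print(f"{name}: {before}s -> {after}s, {speedup(before, after):.1f}x faster")
```

With these inputs the clone goes from 333 s to 38 s, roughly 8.8x faster, while the import goes from 95 s to 63 s, roughly 1.5x faster. As the comment notes, these runs were not on bare metal, so the ratios are indicative only.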