Description of problem:

Performing a data copy operation on a preallocated volume, such as creating a template, results in a slowdown of all I/O to/from the NFS server in question. The NFS file systems are v3, with the default mount options provided by RHV. On the SPM host: MiscFileReadException read timeouts in VDSM, high repoStats 'delay' times, storage domains become 'INVALID', and VMs become unresponsive. All storage domains on the same NFS server are affected.

This can be replicated by manually running:

# qemu-img convert -p -t none -T none -f raw <src path> -O raw -o preallocation=falloc <dest path>

Removing "-o preallocation=falloc" circumvents the problem, but the result is a sparse file instead of a preallocated file.

Version-Release number of selected component (if applicable):
RHV 4.3.9
RHVH 4.3.9 host
vdsm-4.30.44-1.el7ev.x86_64
libvirt-4.5.0-33.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.x86_64
ioprocess-1.3.1-1.el7ev.x86_64

How reproducible:
100% in customer's environment.

Steps to Reproduce:
1. Perform a 'copyImage' function of a raw/preallocated disk/volume, e.g. template creation from/to an NFS storage domain, which results in 'qemu-img convert .... -o preallocation=falloc ....'

Actual results:
On the SPM: MiscFileReadException read timeouts in VDSM, high repoStats 'delay' times, storage domains become 'INVALID', VMs become unresponsive.

Expected results:
None of the above.

Additional info:
- The above 'qemu-img convert .... -o preallocation=falloc' does the following (under 'strace'):

6024 13:17:12.697935 fallocate(12, 0, 0, 3221225472) = -1 EOPNOTSUPP (Operation not supported) <0.000007>
6024 13:17:12.698320 pwrite64(12, "\0", 1, 4095) = 1 <0.000014>
6024 13:17:12.698351 pwrite64(12, "\0", 1, 8191) = 1 <0.000009>
6024 13:17:12.698375 pwrite64(12, "\0", 1, 12287) = 1 <0.000008>
6024 13:17:12.698399 pwrite64(12, "\0", 1, 16383) = 1 <0.000008>
............ etc., etc.. ............

i.e. it tries to use fallocate(), but that isn't supported by NFS v3, so it then resorts to initialising the volume by writing a zero byte into every block.

- Both of the following cause the problem too:

# fallocate --posix --length 50G <dest path>
# qemu-img create -f raw -o preallocation=falloc <dest path> 50G

- This works ok:

# dd if=/dev/zero bs=8M count=$((50 * 128)) of=<dest path> oflag=direct conv=fsync

and can be followed up with:

# qemu-img convert -p -t none -T none -f raw <src path> -O raw -n <dest path>
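For reference, a quick way to check up front whether a given mount supports native fallocate() (a rough sketch; the mount path and file name below are placeholders, not from the customer environment):

# fallocate --length 1G /rhev/data-center/mnt/<nfs server>:<export>/falloc-test.img
# rm -f /rhev/data-center/mnt/<nfs server>:<export>/falloc-test.img

On a mount without native fallocate support (e.g. NFS v3), the first command should fail with "Operation not supported" rather than falling back to writing data, unlike 'qemu-img ... -o preallocation=falloc', which silently degrades to the one-byte-per-block writes shown above.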
This is a correction to comment #1. The 'fallocate --posix' command does not cause the problem in question. But 'qemu-img create -o preallocation=falloc' does.
I updated the title since this issue is related to legacy NFS versions. Using NFS 4.2 avoids this issue and gives a huge performance improvement for any storage operation with sparse or preallocated volumes. For example, creating a 500g preallocated volume takes about 1 second vs. thousands of seconds on a very fast NFS server.

We support legacy NFS versions and will fix this issue, but customers that care about performance should move to NFS 4.2.
Hi Nir,

Can you provide a clear scenario on this bug? Is this a scale/performance bug or a functional one? If it's not a functional one, I'll redirect this to Mordechai.
This is a performance bug, and I could not reproduce the original issue: MiscFileReadException read timeouts in VDSM, high repoStats 'delay' times, storage domains becoming 'INVALID', VMs becoming unresponsive. But I could easily reproduce that preallocating with qemu-img is 2 times slower compared with writing zeroes.

The fix changes the way we preallocate volumes, using the vdsm fallocate helper instead of qemu-img, and it changes the order of operations. In the past, when copying disks we did:

1. create a sparse volume (not yet preallocated; step 2 takes care of that)
2. run qemu-img convert -o preallocation=falloc

With the fix we will do:

1. create a preallocated volume
2. run qemu-img convert -n

(See the command-level sketch at the end of this comment.)

So we need to test both functionality and performance.

## functional

- No regressions in flows creating and copying disks. Running all automated storage tests should be enough.

- Preallocated disks are actually preallocated on storage, and sparse disks are actually sparse on storage. This can be tested automatically, but I guess you don't have these tests yet. Testing sparseness is easy using qemu-img:

$ qemu-img create -f raw -o preallocation=falloc preallocated.raw 1g
Formatting 'preallocated.raw', fmt=raw size=1073741824 preallocation=falloc

$ qemu-img create -f raw sparse.raw 1g
Formatting 'sparse.raw', fmt=raw size=1073741824

$ qemu-img info preallocated.raw
image: preallocated.raw
file format: raw
virtual size: 1 GiB (1073741824 bytes)
disk size: 1 GiB

$ qemu-img info sparse.raw
image: sparse.raw
file format: raw
virtual size: 1 GiB (1073741824 bytes)
disk size: 4 KiB

Creating and copying empty disks would be the quickest way to verify this. Note that checking for sparseness can be done only on NFS 4.2 and GlusterFS. NFS < 4.2 does not support sparseness, and when copying disks you are likely to get a volume full of zeroes, depending on the qemu-img convert implementation.

## performance

- Compare performance of creating and copying preallocated disks on NFS < 4.2 (writing zeros).
- No performance regression when creating and copying disks on NFS 4.2 and GlusterFS (using native fallocate).
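A command-level sketch of the two flows, using the standalone commands from comment #1 (paths and sizes are placeholders; in vdsm the preallocation step is done by the /usr/libexec/vdsm/fallocate helper, not by the commands shown here):

Old flow (sparse destination, preallocation left to qemu-img):
# qemu-img convert -p -t none -T none -f raw <src path> -O raw -o preallocation=falloc <dest path>

New flow (preallocate the destination first, then convert into the existing file):
# fallocate --length <virtual size> <dest path>
(native fallocate, works on NFS 4.2 / GlusterFS; where it is unsupported, write zeroes instead, e.g.)
# dd if=/dev/zero bs=8M count=<size / 8M> of=<dest path> oflag=direct conv=fsync
# qemu-img convert -p -t none -T none -f raw <src path> -O raw -n <dest path>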
As this is a performance bug (see comment above), moving the QA contact to Mordechai.
I tested the patches with NFS 4.2 and NFS 3. Here are the test flows.

## Create Template from VM with preallocated disk

NFS 3:

2020-07-14 21:46:05,956+0300 INFO (tasks/0) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_2/a90d27d1-755d-4808-8c48-6887f8ccd4df/images/57d4ec65-e9a4-4a1d-9443-faa1ef0e9535/8a06096c-d91a-4b95-a4ee-3e8331e542b4: 0.03 seconds (utils:390)
2020-07-14 21:46:34,763+0300 INFO (tasks/0) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_2/a90d27d1-755d-4808-8c48-6887f8ccd4df/images/57d4ec65-e9a4-4a1d-9443-faa1ef0e9535/8a06096c-d91a-4b95-a4ee-3e8331e542b4: 28.81 seconds (utils:390)

[root@host4 ~]# vdsm-client StorageDomain dump sd_id=a90d27d1-755d-4808-8c48-6887f8ccd4df | jq '.volumes | .[] | select(.image=="57d4ec65-e9a4-4a1d-9443-faa1ef0e9535")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594752394,
  "description": "{\"DiskAlias\":\"fedora-32.raw\",\"DiskDescription\":\"NFS 3\"}",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 0,
  "image": "57d4ec65-e9a4-4a1d-9443-faa1ef0e9535",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442450944,
  "type": "PREALLOCATED",
  "voltype": "SHARED"
}

NFS 4.2:

2020-07-14 19:12:54,414+0300 INFO (tasks/3) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/4da6317f-8635-4dc5-b023-7911a3a4a8d5/f67a3d79-b747-4528-918c-63aacff4a424: 0.02 seconds (utils:390)
2020-07-14 19:12:54,491+0300 INFO (tasks/3) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/4da6317f-8635-4dc5-b023-7911a3a4a8d5/f67a3d79-b747-4528-918c-63aacff4a424: 0.08 seconds (utils:390)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="4da6317f-8635-4dc5-b023-7911a3a4a8d5")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594743174,
  "description": "{\"DiskAlias\":\"fedora-32.raw\",\"DiskDescription\":\"NFS 4.2\"}",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 0,
  "image": "4da6317f-8635-4dc5-b023-7911a3a4a8d5",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "SHARED"
}

## Copy template disk

To NFS 4.2:

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="57d4ec65-e9a4-4a1d-9443-faa1ef0e9535")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594754870,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "57d4ec65-e9a4-4a1d-9443-faa1ef0e9535",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442520576,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

BUG: template is not SHARED!

To NFS 3:

# vdsm-client StorageDomain dump sd_id=a90d27d1-755d-4808-8c48-6887f8ccd4df | jq '.volumes | .[] | select(.image=="4da6317f-8635-4dc5-b023-7911a3a4a8d5")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594753791,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "4da6317f-8635-4dc5-b023-7911a3a4a8d5",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442450944,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

BUG: template is not SHARED!
## Create vm from template

2020-07-14 23:56:54,135+0300 INFO (jsonrpc/7) [vdsm.api] START createVolume(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', imgUUID='b7f5f089-cb4b-412c-b934-b8abc7c40411', size='6442450944', volFormat=5, preallocate=1, diskType='DATA', volUUID='37cbccaf-a0f5-4a69-b55e-1163372c31d4', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='1') from=::ffff:192.168.122.12,56344, flow_id=af3f175b-0f20-4ecc-b617-d67b5f0f4b83, task_id=557cfa0e-5570-4e0a-8c5e-90f24076b3cf (api:48)
2020-07-14 23:56:54,314+0300 INFO (tasks/5) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/b7f5f089-cb4b-412c-b934-b8abc7c40411/37cbccaf-a0f5-4a69-b55e-1163372c31d4: 0.02 seconds (utils:390)
2020-07-14 23:56:54,389+0300 INFO (tasks/5) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/b7f5f089-cb4b-412c-b934-b8abc7c40411/37cbccaf-a0f5-4a69-b55e-1163372c31d4: 0.07 seconds (utils:390)
2020-07-14 23:57:08,662+0300 INFO (jsonrpc/0) [vdsm.api] START sdm_copy_data(job_id='e101e48d-96e3-49b4-b04f-96a5b15b086b', source={'endpoint_type': 'div', 'sd_id': '56ecc03c-4bb5-4792-8971-3c51ea924d2e', 'img_id': '4da6317f-8635-4dc5-b023-7911a3a4a8d5', 'vol_id': 'f67a3d79-b747-4528-918c-63aacff4a424'}, destination={'generation': 0, 'endpoint_type': 'div', 'sd_id': '56ecc03c-4bb5-4792-8971-3c51ea924d2e', 'img_id': 'b7f5f089-cb4b-412c-b934-b8abc7c40411', 'vol_id': '37cbccaf-a0f5-4a69-b55e-1163372c31d4'}) from=::ffff:192.168.122.12,56338, flow_id=af3f175b-0f20-4ecc-b617-d67b5f0f4b83, task_id=d9d185ac-37e1-4b82-9686-070ab112f309 (api:48)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="b7f5f089-cb4b-412c-b934-b8abc7c40411")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594760214,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "b7f5f089-cb4b-412c-b934-b8abc7c40411",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

## Clone VM

2020-07-15 00:01:42,816+0300 INFO (jsonrpc/5) [vdsm.api] START createVolume(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', imgUUID='1c0ebf0e-834c-4566-bd2c-c68885aff2e9', size='6442450944', volFormat=5, preallocate=1, diskType='DATA', volUUID='eaa03a46-34c2-4756-9e18-cd25e517c2d7', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='1') from=::ffff:192.168.122.12,56344, flow_id=430124d2-246b-4149-8f32-aaf15edd1e9e, task_id=94dedc80-01ed-4233-a580-75a231c19314 (api:48)
2020-07-15 00:01:43,022+0300 INFO (tasks/7) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 0.02 seconds (utils:390)
2020-07-15 00:01:43,093+0300 INFO (tasks/7) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 0.07 seconds (utils:390)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="1c0ebf0e-834c-4566-bd2c-c68885aff2e9")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594760503,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "1c0ebf0e-834c-4566-bd2c-c68885aff2e9",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

## Create template from snapshot

2020-07-15 00:07:36,412+0300 INFO (jsonrpc/5) [vdsm.api] START copyImage(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', vmUUID='', srcImgUUID='1c0ebf0e-834c-4566-bd2c-c68885aff2e9', srcVolUUID='eaa03a46-34c2-4756-9e18-cd25e517c2d7', dstImgUUID='34aa8f0b-f89e-4194-8f64-dd7b9381ff57', dstVolUUID='c04da05b-1a6d-414b-9a9a-72e35b7cbf78', description='{"DiskAlias":"fedora-32.raw_from-nfs-4.2-clone","DiskDescription":"NFS 4.2"}', dstSdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', volType=6, volFormat=5, preallocate=1, postZero='false', force='false', discard=False) from=::ffff:192.168.122.12,56344, flow_id=970837ef-ffab-437b-a202-ec6d9a2fcae9, task_id=e57a1745-1f04-4580-b3a4-add3b539cf37 (api:48)
2020-07-15 00:07:36,810+0300 INFO (tasks/3) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/34aa8f0b-f89e-4194-8f64-dd7b9381ff57/c04da05b-1a6d-414b-9a9a-72e35b7cbf78: 0.03 seconds (utils:390)
2020-07-15 00:07:36,899+0300 INFO (tasks/3) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/34aa8f0b-f89e-4194-8f64-dd7b9381ff57/c04da05b-1a6d-414b-9a9a-72e35b7cbf78: 0.09 seconds (utils:390)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="34aa8f0b-f89e-4194-8f64-dd7b9381ff57")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594760856,
  "description": "{\"DiskAlias\":\"fedora-32.raw_from-nfs-4.2-clone\",\"DiskDescription\":\"NFS 4.2\"}",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 0,
  "image": "34aa8f0b-f89e-4194-8f64-dd7b9381ff57",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "SHARED"
}

## Clone VM from snapshot

2020-07-15 00:14:58,245+0300 INFO (jsonrpc/6) [vdsm.api] START copyImage(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', vmUUID='', srcImgUUID='1c0ebf0e-834c-4566-bd2c-c68885aff2e9', srcVolUUID='eaa03a46-34c2-4756-9e18-cd25e517c2d7', dstImgUUID='b75c2af3-a473-43ec-8992-67dc8daac084', dstVolUUID='31755064-d0a5-41c8-939c-46f9fdcda47a', description='', dstSdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', volType=8, volFormat=5, preallocate=1, postZero='false', force='false', discard=False) from=::ffff:192.168.122.12,56344, flow_id=db03f2e4-cbaa-4fdd-bb27-ccd48112157c, task_id=124c34d3-7fde-48d4-afba-a49c8ca806ba (api:48)
2020-07-15 00:14:58,601+0300 INFO (tasks/8) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/b75c2af3-a473-43ec-8992-67dc8daac084/31755064-d0a5-41c8-939c-46f9fdcda47a: 0.03 seconds (utils:390)
2020-07-15 00:14:58,673+0300 INFO (tasks/8) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/b75c2af3-a473-43ec-8992-67dc8daac084/31755064-d0a5-41c8-939c-46f9fdcda47a: 0.07 seconds (utils:390)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="b75c2af3-a473-43ec-8992-67dc8daac084")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594761298,
  "description": "",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 0,
  "image": "b75c2af3-a473-43ec-8992-67dc8daac084",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

## Copy floating Disk

2020-07-15 00:19:22,888+0300 INFO (jsonrpc/4) [vdsm.api] START createVolume(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', imgUUID='38d44a55-de5c-4a49-82b5-52b33db71853', size='6442450944', volFormat=5, preallocate=1, diskType='DATA', volUUID='0e83ef36-7142-4413-abd0-7bc54f40c6ab', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='1') from=::ffff:192.168.122.12,56344, flow_id=6eff1f11-80c6-4d50-a7d9-b81f14a28303, task_id=fe25ef89-1532-4247-aa28-31231785bad1 (api:48)
2020-07-15 00:19:22,995+0300 INFO (tasks/6) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/38d44a55-de5c-4a49-82b5-52b33db71853/0e83ef36-7142-4413-abd0-7bc54f40c6ab: 0.02 seconds (utils:390)
2020-07-15 00:19:23,041+0300 INFO (tasks/6) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/38d44a55-de5c-4a49-82b5-52b33db71853/0e83ef36-7142-4413-abd0-7bc54f40c6ab: 0.05 seconds (utils:390)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="38d44a55-de5c-4a49-82b5-52b33db71853")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594761563,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "38d44a55-de5c-4a49-82b5-52b33db71853",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

## Import VM with raw-preallocated disk

This is not a data operation, so the disk was imported as is.
# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="1c0ebf0e-834c-4566-bd2c-c68885aff2e9")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594760503,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 2,
  "image": "1c0ebf0e-834c-4566-bd2c-c68885aff2e9",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442467328,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

## Live/cold migrate raw-preallocated disk

Cold move to nfs3 domain:

2020-07-15 00:37:29,090+0300 INFO (jsonrpc/4) [vdsm.api] START createVolume(sdUUID='a90d27d1-755d-4808-8c48-6887f8ccd4df', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', imgUUID='1c0ebf0e-834c-4566-bd2c-c68885aff2e9', size='6442450944', volFormat=5, preallocate=1, diskType='DATA', volUUID='eaa03a46-34c2-4756-9e18-cd25e517c2d7', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='1') from=::ffff:192.168.122.12,56344, flow_id=7527386e-bccb-4c7f-a068-7f7484c09a56, task_id=db5444ec-eea5-4773-b223-754768923d08 (api:48)
2020-07-15 00:37:29,224+0300 INFO (tasks/1) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_2/a90d27d1-755d-4808-8c48-6887f8ccd4df/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 0.02 seconds (utils:390)
2020-07-15 00:37:51,751+0300 INFO (tasks/1) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_2/a90d27d1-755d-4808-8c48-6887f8ccd4df/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 22.53 seconds (utils:390)
2020-07-15 00:38:09,839+0300 INFO (jsonrpc/0) [vdsm.api] START sdm_copy_data(job_id='872519b0-97b5-40d0-a4a9-fa42ec84bc5a', source={'endpoint_type': 'div', 'sd_id': '56ecc03c-4bb5-4792-8971-3c51ea924d2e', 'img_id': '1c0ebf0e-834c-4566-bd2c-c68885aff2e9', 'vol_id': 'eaa03a46-34c2-4756-9e18-cd25e517c2d7'}, destination={'generation': 0, 'endpoint_type': 'div', 'sd_id': 'a90d27d1-755d-4808-8c48-6887f8ccd4df', 'img_id': '1c0ebf0e-834c-4566-bd2c-c68885aff2e9', 'vol_id': 'eaa03a46-34c2-4756-9e18-cd25e517c2d7'}) from=::ffff:192.168.122.12,56338, flow_id=7527386e-bccb-4c7f-a068-7f7484c09a56, task_id=8bbd3d0a-9cf4-42b7-9484-9cf518ace07f (api:48)

# vdsm-client StorageDomain dump sd_id=a90d27d1-755d-4808-8c48-6887f8ccd4df | jq '.volumes | .[] | select(.image=="1c0ebf0e-834c-4566-bd2c-c68885aff2e9")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594762671,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 0,
  "image": "1c0ebf0e-834c-4566-bd2c-c68885aff2e9",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6442450944,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}

Live move to nfs 4.2:

2020-07-15 00:42:38,815+0300 INFO (jsonrpc/3) [vdsm.api] START createVolume(sdUUID='56ecc03c-4bb5-4792-8971-3c51ea924d2e', spUUID='86b5a5ca-5376-4cef-a8f7-d1dc1ee144b4', imgUUID='1c0ebf0e-834c-4566-bd2c-c68885aff2e9', size='6442450944', volFormat=5, preallocate=1, diskType='DATA', volUUID='eaa03a46-34c2-4756-9e18-cd25e517c2d7', desc=None, srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='1') from=::ffff:192.168.122.12,56344, flow_id=4d519d82-2e37-4cae-bcb2-34fddb9fd472, task_id=96f142e2-8a03-4adf-8ac8-85f7f46e55af (api:48)
2020-07-15 00:42:38,945+0300 INFO (tasks/4) [storage.Volume] Creating image /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 0.02 seconds (utils:390)
2020-07-15 00:42:38,996+0300 INFO (tasks/4) [storage.Volume] Preallocating volume /rhev/data-center/mnt/nfs1:_export_1/56ecc03c-4bb5-4792-8971-3c51ea924d2e/images/1c0ebf0e-834c-4566-bd2c-c68885aff2e9/eaa03a46-34c2-4756-9e18-cd25e517c2d7: 0.05 seconds (utils:390)
2020-07-15 00:42:56,386+0300 INFO (jsonrpc/7) [vdsm.api] START sdm_copy_data(job_id='51c4f98b-85a5-4348-8171-25619ff399de', source={'endpoint_type': 'div', 'sd_id': 'a90d27d1-755d-4808-8c48-6887f8ccd4df', 'img_id': '1c0ebf0e-834c-4566-bd2c-c68885aff2e9', 'vol_id': 'eaa03a46-34c2-4756-9e18-cd25e517c2d7'}, destination={'generation': 0, 'endpoint_type': 'div', 'sd_id': '56ecc03c-4bb5-4792-8971-3c51ea924d2e', 'img_id': '1c0ebf0e-834c-4566-bd2c-c68885aff2e9', 'vol_id': 'eaa03a46-34c2-4756-9e18-cd25e517c2d7'}) from=::ffff:192.168.122.12,56338, flow_id=4d519d82-2e37-4cae-bcb2-34fddb9fd472, task_id=d74d8bce-f4c5-403f-9e9b-163fc7ad6581 (api:48)

# vdsm-client StorageDomain dump sd_id=56ecc03c-4bb5-4792-8971-3c51ea924d2e | jq '.volumes | .[] | select(.image=="1c0ebf0e-834c-4566-bd2c-c68885aff2e9")'
{
  "apparentsize": 6442450944,
  "capacity": 6442450944,
  "ctime": 1594762959,
  "description": "None",
  "disktype": "DATA",
  "format": "RAW",
  "generation": 1,
  "image": "1c0ebf0e-834c-4566-bd2c-c68885aff2e9",
  "legality": "LEGAL",
  "parent": "00000000-0000-0000-0000-000000000000",
  "status": "OK",
  "truesize": 6441996288,
  "type": "PREALLOCATED",
  "voltype": "LEAF"
}
How to test the performance improvement:

First, we could not reproduce the customer issue, and the customer already confirmed that the issue was resolved by using dd or fallocate. There is no need to waste time on this.

What we can test is the improvement from switching from "qemu-img -o preallocate" to /usr/libexec/vdsm/fallocate for single and multiple disk copies with NFS 3:

1. Add an NFS 3 storage domain.
2. Test creation time of a 100 GiB preallocated volume with RHV 4.4.1.
3. Test creation time of 10 x 100 GiB preallocated volumes with RHV 4.4.1.
4. Upgrade to RHV 4.4.2.
5. Test creation time of a 100 GiB preallocated volume with RHV 4.4.2.
6. Test creation time of 10 x 100 GiB preallocated volumes with RHV 4.4.2.

I don't see a need to test with many VMs running or using many hosts; using a single host is good enough. But if it helps to use an existing environment, we can use it. If needed, test 10 concurrent disk creation tasks with both versions.

We expect to see much lower times with RHV 4.4.2. For reference, see the tests I have done here:
https://github.com/oVirt/vdsm/commit/9533b644336636f42807a78054eae3c03da5fa4a

Daniel, I think you should reconsider testing this for 4.4.2. Testing this is relatively easy. The non-performance aspects of this can be tested by QE automated tests.
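For a quick manual sanity check outside of RHV, the before/after preallocation strategies can also be compared directly on an NFS 3 mount (a rough approximation using the commands from comment #1, not the exact commands vdsm runs; paths are placeholders):

Old behavior (qemu-img preallocates with one-byte-per-block writes):
# time qemu-img create -f raw -o preallocation=falloc /rhev/data-center/mnt/<nfs3 mount>/old.raw 100G

New behavior (preallocation by writing zeroes, similar to the fallback used by the vdsm fallocate helper):
# time dd if=/dev/zero bs=8M count=$((100 * 128)) of=/rhev/data-center/mnt/<nfs3 mount>/new.raw oflag=direct conv=fsync

On NFS 3 the second command should complete in a small fraction of the time of the first.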
We found some issues in the fix that require more work.
Fix should be available in vdsm-4.40.23.
Baseline results on 4.4.1-12

Tested on Red-03 with version:
rhv-release-4.4.1-12-001.noarch
vdsm-4.40.22-1.el8ev.x86_64
host: f01-h07-000-r620 (SPM), VM network 10g

NFS server: NFS version 3, 16 GB RAM, 16 cores, 5 TB local space

Test creation time of 1 x 100 GiB preallocated disk:
1. total duration from vdsm log = 1308.54 sec
2. total duration from the GUI = 1320 sec

Test creation time of 10 x 100 GiB preallocated disks:
1. total duration from vdsm log:
   min = 12352.88 sec
   median = 62602.77 sec
   max = 12555.09 sec
   average = 12520.554 sec
2. total duration from the GUI = 12611 sec

Next step will be to test the same cycles on 4.4.2.

The full log files can be found here:
https://drive.google.com/drive/folders/18KnXMxJ2syz3IfBG11xaf6_5GZlTMmXU?usp=sharing
Tested on Red-01

Version 4.4.1:
rhv-release-4.4.1-12-001.noarch
vdsm-4.40.22-1.el8ev.x86_64
host: f02-h09-000-r620 (SPM), VM network 10g

Version 4.4.2:
rhv-release-4.4.2-3-001.noarch
vdsm-4.40.25-1.el8ev.x86_64
host: f02-h09-000-r620 (SPM), VM network 10g

NFS server: NFS version 3, 16 GB RAM, 16 cores, 5 TB local space

Version 4.4.1 results:

Test creation time of 1 x 100 GiB preallocated disk:
total duration from vdsm log = 1,312.99 sec

Test creation time of 10 x 100 GiB preallocated disks:
total duration from vdsm log:
min = 12,271.31 sec
median = 12,613.91 sec
max = 12,609.99 sec
average = 12,552.00 sec

Version 4.4.2 results:

Test creation time of 1 x 100 GiB preallocated disk:
total duration from vdsm log = 366.03 sec

Test creation time of 10 x 100 GiB preallocated disks:
total duration from vdsm log:
min = 2,619.44 sec
median = 2,636.56 sec
max = 2,645.55 sec
average = 2,636.44 sec

Comparing 4.4.2 to 4.4.1, the improvement in total duration is 358% for the single-disk test and 468% for the ten-disk cycle.

The full results and logs can be found here:
https://drive.google.com/drive/folders/18KnXMxJ2syz3IfBG11xaf6_5GZlTMmXU?usp=sharing

WARN messages in the vdsm log during the tests appear only on version 4.4.1:

------ version 4.4.1 on red-03, ten-disk cycle:

ten_disks/vdsm.log:2020-08-12 12:11:24,818+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/4 running <Task <JsonRpcTask {'jsonrpc': '2.0', 'method': 'Volume.getInfo', 'params': {'storagepoolID': '13e6a4d3-0241-4fc2-a094-e3f2e9abae54', 'storagedomainID': 'c48f7228-46ce-4b7e-b262-2e585855d7c0', 'imageID': 'b10919b0-b959-47fd-a2a1-8903a9dd29be', 'volumeID': '8210be54-e500-43f3-acbb-679ccf67ffcf'}, 'id': '28e81675-0590-4fb4-8f11-d9a9679c3771'} at 0x7f57c8419518> timeout=60, duration=60.36 at 0x7f57c839feb8> task#=4874 at 0x7f57ca7b8438>, traceback:
ten_disks/vdsm.log:2020-08-12 12:12:24,463+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/4 waiting task#=4875 at 0x7f57ca7b8438>, traceback:
ten_disks/vdsm.log:2020-08-12 12:13:25,762+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'jsonrpc': '2.0', 'method': 'Volume.getInfo', 'params': {'storagepoolID': '13e6a4d3-0241-4fc2-a094-e3f2e9abae54', 'storagedomainID': 'c48f7228-46ce-4b7e-b262-2e585855d7c0', 'imageID': '74e17045-8c1a-43ab-8d74-aebf1786eb29', 'volumeID': '956e0eab-4a03-4f42-9fe5-e0d1d9b89cd6'}, 'id': '0ddfde72-5eb6-4556-8c54-a9b64b93bcd8'} at 0x7f576ad315f8> timeout=60, duration=60.00 at 0x7f57a80f1eb8> task#=4889 at 0x7f57ca7afeb8>, traceback:

------ version 4.4.1 on red-01, ten-disk cycle:

vdsm.log:2020-08-13 12:20:04,861-0400 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/6 running <Task <JsonRpcTask {'jsonrpc': '2.0', 'method': 'Volume.getInfo', 'params': {'storagepoolID': '9d21300c-9899-4d2d-a9b6-8541c172387e', 'storagedomainID': 'dbf3471d-6f3b-483a-bc63-e2ce95893fee', 'imageID': '63d82c0e-6335-4cd7-8582-a95e2d44e3ee', 'volumeID': '863a2d05-7708-4c6c-8940-ee4d5777c6fe'}, 'id': 'a98aff57-347b-4f12-aed2-fcdd7e2b2022'} at 0x7fa6e0514e80> timeout=60, duration=60.09 at 0x7fa6e0514160> task#=1061 at 0x7fa7001622b0>, traceback:
vdsm.log:2020-08-13 12:21:04,778-0400 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/6 waiting task#=1062 at 0x7fa7001622b0>, traceback:
vdsm.log:2020-08-13 12:22:13,932-0400 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/5 running <Task <JsonRpcTask {'jsonrpc': '2.0', 'method': 'Volume.getInfo', 'params': {'storagepoolID': '9d21300c-9899-4d2d-a9b6-8541c172387e', 'storagedomainID': 'dbf3471d-6f3b-483a-bc63-e2ce95893fee', 'imageID': '0faa615d-e62d-4ea3-9fec-ed06179ebd72', 'volumeID': '0e21188a-a99f-49b0-a781-ea10ad27d853'}, 'id': '2fb1ced1-5dc9-4f74-a813-0e2f7411881a'} at 0x7fa6e05166a0> timeout=60, duration=60.00 at 0x7fa6e0516cf8> task#=1065 at 0x7fa7001621d0>, traceback:
vdsm.log:2020-08-13 12:23:13,933-0400 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/5 waiting task#=1066 at 0x7fa7001621d0>, traceback:
vdsm.log.2.xz:2020-08-13 11:11:11,596+0000 WARN (jsonrpc/0) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684a64$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name', '9844f287-6571-484f-a373-f89487b95ec1'] rc=5 err=[' Volume group "9844f287-6571-484f-a373-f89487b95ec1" not found.', ' Cannot process volume group 9844f287-6571-484f-a373-f89487b95ec1'] (lvm:530)
vdsm.log.2.xz:2020-08-13 11:11:11,597+0000 WARN (jsonrpc/0) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)
vdsm.log.2.xz:2020-08-13 11:14:35,158+0000 WARN (jsonrpc/3) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684a64$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name', 'dbf3471d-6f3b-483a-bc63-e2ce95893fee'] rc=5 err=[' Volume group "dbf3471d-6f3b-483a-bc63-e2ce95893fee" not found.', ' Cannot process volume group dbf3471d-6f3b-483a-bc63-e2ce95893fee'] (lvm:530)
vdsm.log.2.xz:2020-08-13 11:14:35,158+0000 WARN (jsonrpc/3) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)
vdsm.log.2.xz:2020-08-13 11:15:03,563+0000 WARN (jsonrpc/1) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684a64$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name', 'dbf3471d-6f3b-483a-bc63-e2ce95893fee'] rc=5 err=[' Volume group "dbf3471d-6f3b-483a-bc63-e2ce95893fee" not found.', ' Cannot process volume group dbf3471d-6f3b-483a-bc63-e2ce95893fee'] (lvm:530)
vdsm.log.2.xz:2020-08-13 11:15:03,563+0000 WARN (jsonrpc/1) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)
vdsm.log.2.xz:2020-08-13 07:19:54,853-0400 WARN (periodic/1) [throttled] MOM not available. Error: [Errno 111] Connection refused (throttledlog:104)
vdsm.log.2.xz:2020-08-13 07:19:54,854-0400 WARN (periodic/1) [throttled] MOM not available, KSM stats will be missing. Error: (throttledlog:104)
vdsm.log.2.xz:2020-08-13 07:19:59,231-0400 WARN (Reactor thread) [vds.dispatcher] unhandled write event (betterAsyncore:184)
vdsm.log.2.xz:2020-08-13 07:20:14,124-0400 WARN (jsonrpc/3) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684a64$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name', 'dbf3471d-6f3b-483a-bc63-e2ce95893fee'] rc=5 err=[' Volume group "dbf3471d-6f3b-483a-bc63-e2ce95893fee" not found.', ' Cannot process volume group dbf3471d-6f3b-483a-bc63-e2ce95893fee'] (lvm:530)
vdsm.log.2.xz:2020-08-13 07:20:14,124-0400 WARN (jsonrpc/3) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)
vdsm.log.2.xz:2020-08-13 07:20:16,741-0400 WARN (jsonrpc/2) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684a64$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name', 'dbf3471d-6f3b-483a-bc63-e2ce95893fee'] rc=5 err=[' Volume group "dbf3471d-6f3b-483a-bc63-e2ce95893fee" not found.', ' Cannot process volume group dbf3471d-6f3b-483a-bc63-e2ce95893fee'] (lvm:530)
vdsm.log.2.xz:2020-08-13 07:20:16,741-0400 WARN (jsonrpc/2) [storage.LVM] Marked vgs=[] as Unreadable due to reload failure (lvm:638)
Thanks Tzahi, 4 times faster and no warnings, looks good!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4.z [ovirt-4.4.2]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3822
Due to QE capacity, we are not going to cover this issue in our automation.