Created attachment 1433995 [details]
logs

Description of problem:
There is a significant time difference for disk creation on NFS between thin provision and preallocated (same domain and same disk size).
Preallocated takes ~x50 times more to complete than thin.

Version-Release number of selected component (if applicable):
rhvm-4.2.3.3-0.1.el7.noarch
vdsm-4.20.26-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 30G preallocated disk on an NFS domain
2. Create a 30G thin disk on the same NFS domain
3. Compare the time it takes each operation to complete

Actual results:

Thin disk creation takes less than 10 seconds:

2018-05-09 21:11:53,313+0300 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='df02ef23-2e77-4fd3-b6ce-0f3662ab5aac', spUUID='7dc6ec1e-4287-47aa-aa7b-bf2904dc9447', imgUUID='2df0c4a4-fd8e-48ff-b05a-1b36e74572e1', size='32212254720', volFormat=5, preallocate=2, diskType='DATA', volUUID='2ebb5e2b-a884-4801-9935-29b2ee93a980', desc='{"DiskAlias":"vm2_Disk1","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.83.141,54610, flow_id=c1775700-dd64-4bb8-a18d-99e7d20b263c, task_id=5179af2a-a113-4a0a-aff6-e721233bd5de (api:46)

2018-05-09 21:12:01,295+0300 INFO (jsonrpc/1) [vdsm.api] START clearTask(taskID='5179af2a-a113-4a0a-aff6-e721233bd5de', spUUID=None, options=None) from=::ffff:10.35.83.141,43286, flow_id=c1775700-dd64-4bb8-a18d-99e7d20b263c, task_id=ceb8fd8b-812b-48ee-bac9-80738b524888 (api:46)

Preallocated takes almost 8 minutes:

2018-05-09 21:17:01,859+0300 INFO (jsonrpc/0) [vdsm.api] START createVolume(sdUUID='df02ef23-2e77-4fd3-b6ce-0f3662ab5aac', spUUID='7dc6ec1e-4287-47aa-aa7b-bf2904dc9447', imgUUID='1b88078a-d5ed-4b36-b763-b31d530ec1ee', size='32212254720', volFormat=5, preallocate=1, diskType='DATA', volUUID='7de9592a-b587-487e-9a76-06fd475a4154', desc='{"DiskAlias":"vm3_Disk1","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.83.141,54610, flow_id=5096f703-51dd-4d7d-b05b-63cfa363ee93, task_id=b7aac5d6-6d74-46f3-9c45-af8251b8b296 (api:46)

2018-05-09 21:24:49,578+0300 INFO (jsonrpc/2) [vdsm.api] START clearTask(taskID='b7aac5d6-6d74-46f3-9c45-af8251b8b296', spUUID=None, options=None) from=::ffff:10.35.83.141,43286, flow_id=5096f703-51dd-4d7d-b05b-63cfa363ee93, task_id=0dadf570-262d-4463-9ab5-0bc00a825e18 (api:46)

Expected results:
There should not be a time difference for disk creation on NFS between preallocated and thin.

Additional info:
logs
Which NFS version? Is this a regression? What's the storage backend? What's the network bandwidth?
I don't remember if we moved to fallocate already or not. Are you seeing a long dd command?
Why do you think there should not be a difference?
(I agree 30GB/8m is very slow btw, ~64MB/s - like a regular, slow, single HDD?)
4.2.3 should already include fallocate, which means there should be a difference, but not one this big.
(In reply to Yaniv Kaul from comment #1)
> Which NFS version?
v3

> Is that a regression?
This difference is noticeable also with 4.1.11 (vdsm-4.19.51-1.el7ev.x86_64)

> What's the storage backend?
RHEL NFS server whose exposed file system resides on a LUN from XtremIO

> What's the network bandwidth?
10Gbps all the way

> I don't remember if we moved to fallocate already or not. Are you seeing a
> long dd command?
Haven't seen this, maybe I have to check with DEBUG log level.

> Why do you think that should not be a difference?
Because it's NFS, and for preallocated the volume being created should be a file of a few KBs.

> (I agree 30gb/8m is very slow btw ~64MBps - like a regular, slow, single
> HDD?)
It's 10Gbps.

Something I've observed is that the request for RAW volume creation comes only a few minutes after task creation:

2018-05-09 21:17:01,859+0300 INFO (jsonrpc/0) [vdsm.api] START createVolume(sdUUID='df02ef23-2e77-4fd3-b6ce-0f3662ab5aac', spUUID='7dc6ec1e-4287-47aa-aa7b-bf2904dc9447', imgUUID='1b88078a-d5ed-4b36-b763-b31d530ec1ee', size='32212254720', volFormat=5, preallocate=1, diskType='DATA', volUUID='7de9592a-b587-487e-9a76-06fd475a4154', desc='{"DiskAlias":"vm3_Disk1","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.83.141,54610, flow_id=5096f703-51dd-4d7d-b05b-63cfa363ee93, task_id=b7aac5d6-6d74-46f3-9c45-af8251b8b296 (api:46)

2018-05-09 21:24:39,407+0300 INFO (tasks/2) [storage.Volume] Request to create RAW volume /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_dr_secondary__0/df02ef23-2e77-4fd3-b6ce-0f3662ab5aac/images/1b88078a-d5ed-4b36-b763-b31d530ec1ee/7de9592a-b587-487e-9a76-06fd475a4154 with size = 62914560 sectors (fileVolume:462)
Elad - can you try, against the same storage, only with NFSv4.2? I was just reminded of https://bugzilla.redhat.com/show_bug.cgi?id=1391859 which indeed moved to falloc - which is unfortunately inefficient on NFSv3.
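For reference, the NFS version a mount actually negotiated is visible in the `vers=` mount option reported by `nfsstat -m` or /proc/mounts. A minimal sketch for pulling it out of an option string; the helper name and the sample option string are illustrative, not taken from the logs:

```shell
# nfs_vers: print the vers= value from a comma-separated NFS mount
# option string, as found in /proc/mounts or `nfsstat -m` output.
nfs_vers() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^vers=//p'
}

# Example with an illustrative option string (note: mountvers= is the
# mount-protocol version, not the NFS protocol version):
opts="rw,relatime,vers=3,rsize=1048576,wsize=1048576,soft,proto=tcp,mountvers=3"
nfs_vers "$opts"    # prints: 3
```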
Indeed, on NFSv4.2 it takes a few seconds (30G preallocated):

2018-05-15 17:22:16,530+0300 INFO (jsonrpc/5) [vdsm.api] START createVolume(sdUUID=u'83b7ff43-1a74-47cf-9a92-1397a2186739', spUUID=u'29379f0d-7cf0-4b35-a5e1-7bbc71b2f192', imgUUID=u'69a1b106-70e9-4a15-a402-4fcf658d10b4', size=u'32212254720', volFormat=5, preallocate=1, diskType=u'DATA', volUUID=u'b0ed34ad-9c11-4135-a428-4346baa34dcb', desc=u'{"DiskAlias":"test-42_Disk1","DiskDescription":""}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.161.53,36620, flow_id=7f23d9ef-63fa-4b05-8767-57751f417f43, task_id=37a56dd6-5bbe-49fd-87e7-3b3deb6c694b (api:46)

2018-05-15 17:22:18,856+0300 INFO (jsonrpc/6) [vdsm.api] START clearTask(taskID=u'37a56dd6-5bbe-49fd-87e7-3b3deb6c694b', spUUID=None, options=None) from=::ffff:10.35.161.53,36610, flow_id=7f23d9ef-63fa-4b05-8767-57751f417f43, task_id=16018853-d7e1-4ba2-9ee7-5ac743f3b452 (api:46)

On NFSv4 it takes a few minutes, the same as on NFSv3.
Created attachment 1436804 [details] NFSv4.2
Elad - thanks. So now the real question is - are we slower than we used to be on NFSv3? Is there a regression, perhaps in the platform, that slows us down? Do you happen to have a 3.6 with 7.3.z that we can test this with?
Created attachment 1440279 [details]
3.6

On 3.6.13 with a RHEL 7.3 host, 30G preallocated disk creation takes a bit more than 1 minute.

jsonrpc.Executor/3::DEBUG::2018-05-22 21:25:49,110::task::595::Storage.TaskManager.Task::(_updateState) Task=`4724146c-6b21-4c7a-8810-651d6084ec1c`::moving from state init -> state preparing

jsonrpc.Executor/3::INFO::2018-05-22 21:25:49,110::logUtils::48::dispatcher::(wrapper) Run and protect: createVolume(sdUUID=u'2a5972bb-633d-4803-9b9d-767d3a248393', spUUID=u'386aa74c-00f4-4868-8a24-c74e7dfd9803', imgUUID=u'8339ff9f-583f-4ef9-bf10-5fcb33461fd8', size=u'32212254720', volFormat=5, preallocate=1, diskType=2, volUUID=u'26abe0d7-266c-45d9-95ab-3e3e8593707e', desc=u'{"DiskAlias":"test-elad_Disk1","DiskDescription":""}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None)

jsonrpc.Executor/3::DEBUG::2018-05-22 21:27:06,450::taskManager::171::Storage.TaskManager::(clearTask) Entry. taskID: 4724146c-6b21-4c7a-8810-651d6084ec1c

NetApp as storage backend

rhevm-3.6.13.2-0.1.el6.noarch
vdsm-4.17.45-1.el7ev.noarch
libvirt-daemon-2.0.0-10.el7_3.12.x86_64
sanlock-3.4.0-1.el7.x86_64
nfs-utils-1.3.0-0.33.el7_3.2.x86_64
kernel - 3.10.0-514.48.5.el7.x86_64
Can we just run the 'dd' command on both, to confirm whether it's a platform issue? This is really going to help in closing in, I hope.
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Eyal, please have a look and see if it's related to the latest changes introduced in preallocation on file domains.
(In reply to Tal Nisan from comment #12)
> Eyal please have a look and see it it's related to the latest changes
> introduced in preallocation on file domains

If you are referring to the changes I did, they only take place when copying/moving file-based preallocated disks. No changes were made to disk creation.
Is it still relevant?
(In reply to Eyal Shenitzky from comment #14)
> Is it still relevant?

Yep, the issue is still here. Tested on latest 4.3.5.1-0.1/vdsm-4.30.20-1.el7ev.x86_64

30G preallocated disk on NFS SD takes 8 minutes and 25 seconds(!)
30G thin disk on NFS SD took 8 seconds

Details (logs attached):

30G NFS SD preallocated disk details:

Disk creation start event:
2019-06-26 09:58:59,515+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [203e9cb1-6703-4ada-ab62-ef83f4cfd508] EVENT_ID: USER_ADD_DISK(2,020), Add-Disk operation of 'prealloc_30G_disk_nfs' was initiated by admin@internal-authz.

VDSM log start task:
2019-06-26 09:58:59,241+0300 INFO (jsonrpc/1) [vdsm.api] START createVolume(sdUUID=u'46dca920-979d-4936-bf8e-9764930f0fcb', spUUID=u'c4f011da-845a-46c4-b8e5-5dc40a0d25a9', imgUUID=u'f6f47f96-e8c8-4841-b9eb-d8dbb349c507', size=u'32212254720', volFormat=5, preallocate=1, diskType=u'DATA', volUUID=u'33b9c3db-a4c4-40c4-aadd-a7e1d21b6c8c', desc=u'{"DiskAlias":"prealloc_30G_disk_nfs","DiskDescription":""}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.161.118,45048, flow_id=203e9cb1-6703-4ada-ab62-ef83f4cfd508, task_id=3101b599-1136-4546-9343-0c616682b5a7 (api:48)

Disk creation end event:
2019-06-26 10:07:24,981+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] EVENT_ID: USER_ADD_DISK_FINISHED_SUCCESS(2,021), The disk 'prealloc_30G_disk_nfs' was successfully added.

VDSM log task end:
2019-06-26 10:07:21,686+0300 INFO (jsonrpc/5) [vdsm.api] START clearTask(taskID=u'3101b599-1136-4546-9343-0c616682b5a7', spUUID=None, options=None) from=::ffff:10.35.161.118,45028, flow_id=203e9cb1-6703-4ada-ab62-ef83f4cfd508, task_id=5bf5c67a-21da-4ef2-8e3c-fe427c10b759 (api:48)

30G NFS SD thin disk details:

Disk creation start event:
2019-06-26 10:12:47,798+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [91322de0-6915-4b07-9a11-6ddb6cb6197b] EVENT_ID: USER_ADD_DISK(2,020), Add-Disk operation of '30G_thin_nfs_disk' was initiated by admin@internal-authz.

VDSM log start task:
2019-06-26 10:12:47,544+0300 INFO (jsonrpc/1) [vdsm.api] START createVolume(sdUUID=u'46dca920-979d-4936-bf8e-9764930f0fcb', spUUID=u'c4f011da-845a-46c4-b8e5-5dc40a0d25a9', imgUUID=u'577517ed-d070-4d80-ae9f-119469399bfa', size=u'32212254720', volFormat=5, preallocate=2, diskType=u'DATA', volUUID=u'aac36fa4-780c-441f-9a6a-9fd56be6778f', desc=u'{"DiskAlias":"30G_thin_nfs_disk","DiskDescription":""}', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) from=::ffff:10.35.161.118,45048, flow_id=91322de0-6915-4b07-9a11-6ddb6cb6197b, task_id=f1a57d4a-23de-4d34-a4ec-bf23543855f4 (api:48)

Disk creation end event:
2019-06-26 10:12:55,277+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-85) [] EVENT_ID: USER_ADD_DISK_FINISHED_SUCCESS(2,021), The disk '30G_thin_nfs_disk' was successfully added.

VDSM log task end:
2019-06-26 10:12:51,749+0300 INFO (jsonrpc/2) [vdsm.api] START clearTask(taskID=u'f1a57d4a-23de-4d34-a4ec-bf23543855f4', spUUID=None, options=None) from=::ffff:10.35.161.118,45028, flow_id=91322de0-6915-4b07-9a11-6ddb6cb6197b, task_id=ab65bb78-aca1-4c92-8875-95eb67535967 (api:48)
Created attachment 1584636 [details] 4.3.5_logs
Can you please explain where the bug is?
When creating a preallocated disk we have an allocation operation (falloc); this operation doesn't take place when creating a thin-provisioned disk.
No other change was made except changing the allocation method from 'dd' to 'falloc' - which is faster (done in 5/2017).
The bug is as the original description says: "There is a significant time difference for disk creation on NFS between thin provision and preallocated (same domain and same disk size). Preallocated takes ~x50 times more to complete than thin."
For me/a user/a customer it is pretty clear the performance difference is huge.
If you think this is not a bug, please close it with a proper explanation of why this is OK. I'm sure we will get this issue from customers, so if you are closing it, give a good explanation of why this is OK or why we cannot fix it.
My finding from several microbenchmarks on Fedora 30 is that there is a steep, roughly exponential increase in the time it takes to write a file as its size grows from 1G to 10G on v3.

If we have a real slowdown here for v3 and/or v4.x, let's consider alternatives, like using fallocate and not posix_fallocate, or dropping back to dd on v3 only. Of course this all has to be weighed against how common it is to see deployments using v3, and how many deployments would be affected by this.

What's needed is for QE to conduct a comparison of posix_fallocate vs dd for file sizes of 1, 2, 5, 10 GB and see the difference. The target OS to test this on should match what VDSM is deployed on, and I prefer to use 2 machines, an NFS server and a client, rather than a local test like I did.

My test setup is F30. I created 2 folders, nfstest/src and nfstest/dst. I exported src, mounted it to dst with --options ver=3, and from there I wrote files to dst.

= 1GB - 6sec =
[rgolan ~/nfstest] 09:35 $ time fallocate -v -x -l $(( 1024 * 1000 * 1000 )) dst/foo.file

real 0m6.360s
user 0m0.073s
sys 0m0.737s

= 2GB - 8sec =
[rgolan ~/nfstest] 09:37 $ rm dst/foo.file
[rgolan ~/nfstest] 09:37 $ time fallocate -v -x -l $(( 1024 * 1000 * 1000 * 2 )) dst/foo.file

real 0m8.746s
user 0m0.212s
sys 0m1.626s

= 10GB - 2m52 =
[rgolan ~/nfstest] 09:38 $ time fallocate -v -x -l $(( 1024 * 1000 * 1000 * 10 )) dst/foo.file

real 2m52.446s
user 0m1.128s
sys 0m8.203s
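The size sweep above can be scripted for repeatable runs; a minimal sketch, where the function name and the SIZES/UNIT knobs are assumptions (not part of any tool here), and `fallocate -x` requests posix_fallocate behavior as in the runs above:

```shell
# bench_falloc: time posix-style preallocation (fallocate -x) for a list
# of sizes in a target directory (intended to be a directory on the NFS
# mount under test).
bench_falloc() {
    dst="$1"
    unit="${UNIT:-$((1024 * 1000 * 1000))}"   # bytes per size unit (~1 GB)
    for n in ${SIZES:-1 2 5 10}; do
        f="$dst/bench_${n}.file"
        rm -f "$f"
        echo "== ${n} x ${unit} bytes =="
        time fallocate -x -l $(( unit * n )) "$f"
    done
}

# Usage against a directory on the NFS mount under test, e.g.:
#   bench_falloc /rhev/data-center/mnt/<server>:<export>/dst
```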
As you suspect this is due to fallocate vs dd, I checked on a VDSM host with a mount to an NFS storage domain, and indeed fallocate is much faster (about 6x):

TL;DR:
1G: dd took 65.47 sec, fallocate took 12.36 sec (dd about 6 times slower)
10G: dd took 629.033 sec, fallocate took 127 sec (dd about 5-6 times slower)

Details on what I did:

On an existing RHV system with a VDSM host connected to an NFS server as an NFS storage domain, I connected to the VDSM host itself via SSH. Looking at the host mounts, as expected for a connected storage domain there is an existing NFS mount (the storage domain mount) with the following details:

yellow-vdsb.qa.lab.tlv.redhat.com:/Storage_NFS/storage_local_ge8_nfs_2 on /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2 type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=10.35.80.5,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=10.35.80.5)

On that mount I created new directories: "dst" for fallocate and "dst_dd" for dd. Then I tested creating 1G and 10G files with fallocate and with dd.
*** fallocate results ***

* 1G:
[root@storage-ge8-vdsm1 ~]# time fallocate -v -x -l $(( 1024 * 1000 * 1000 )) /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/dst/foo.file

real 0m12.369s
user 0m0.215s
sys 0m8.438s

* 10G:
[root@storage-ge8-vdsm1 ~]# time fallocate -v -x -l $(( 10 * 1024 * 1000 * 1000 )) /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/dst/10G_foo.file

real 2m5.539s
user 0m2.065s
sys 1m20.993s

*** dd results ***

I used fallocate to create the 1G and 10G local (on the VDSM host) source files with the following commands:

1G local file: fallocate -v -x -l $(( 1024 * 1000 * 1000 )) foo.file
10G local file: fallocate -v -x -l $(( 10 * 1024 * 1000 * 1000 )) 10G_foo.file

* 1G:
[root@storage-ge8-vdsm1 ~]# time dd if=foo.file of=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/dst_dd/foo.file1G
2000000+0 records in
2000000+0 records out
1024000000 bytes (1.0 GB) copied, 65.4706 s, 15.6 MB/s

real 1m5.486s
user 0m4.872s
sys 0m42.841s

* 10G:
[root@storage-ge8-vdsm1 ~]# time dd if=10G_foo.file of=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/dst_dd/10G_foo.file
20000000+0 records in
20000000+0 records out
10240000000 bytes (10 GB) copied, 629.033 s, 16.3 MB/s

real 10m29.058s
user 0m46.921s
sys 7m9.550s
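One caveat about the dd numbers above: with no bs= argument, dd uses its default 512-byte block size (visible in the record counts: 2000000 records for 1 GB), which adds substantial per-block overhead over NFS. A small sketch of the same copy with an explicit larger block size; it uses scratch files, since the storage-domain paths above are specific to that setup:

```shell
# Copy with an explicit 1 MiB block size instead of dd's 512-byte default.
# For the real comparison, if= would be the local source file and of= a
# path under the NFS storage domain mount.
src=$(mktemp) && out=$(mktemp)
dd if=/dev/zero of="$src" bs=4096 count=16 2>/dev/null   # 64 KiB scratch source
time dd if="$src" of="$out" bs=1M                        # explicit block size
```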
Roy, it looks like fallocate is faster than dd in all cases. There is a drawback that causes both fallocate and dd to be slower for disks bigger than ~10G; still, fallocate is faster than dd for 10G disks. How do you suggest we proceed?
(In reply to Elad from comment #0)
> Description of problem:
> There is a significant time difference for disk creation on NFS between thin
> provision and preallocated (same domain and same disk size)
> Preallocated takes ~x50 times more to complete than thin.

This is expected on NFS < 4.2.

> Steps to Reproduce:
> 1. Create a 30G preallocated disk on NFS domain
> 2. Create a 30G thin disk on the same NFS domain
> 3. Compare the time it takes to each operation to complete
>
> Actual results:
> Thin disk creation takes less than 10 seconds:
> Preallocated takes almost 8 minutes:

Yes, this is expected on NFS < 4.2.

> Expected results:
> There should not be a time difference for disk creation on NFS between
> preallocated and thin

No, this is not the expected behavior. This expectation is completely unrealistic. There is no bug here.
Re-targeting to 4.4.3 as this is not being treated as a bug in the current release.