Description of problem:

This is a request to replace the 'dd'-based copy in VDSM with the sg3_utils version.

Version-Release number of selected component (if applicable):

vdsm-4.16.8.1-5.el6ev.x86_64
sg3_utils-1.28-6.el6.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Created a 200GB pre-allocated disk in the same storage domain.

2. Did a mock-up export from the SAN (source qcow2 thin disk) to the SAN (dest pre-alloc/raw disk) using qemu-img:

[root@localhost]# export TIMEFORMAT="%3R"
[root@localhost]# time /usr/bin/qemu-img convert -t none -f qcow2 d4ebc4c0-0092-4f60-8979-2d6c241cc7ef -O qcow2 f22756f4-129a-434c-84a2-8a5f0aa1e10c
966.510 <-- time elapsed from execution

3. Repeated the same test using dd (1MB block size, direct I/O read with fullblock, direct I/O write):

[root@localhost]# time dd if=d4ebc4c0-0092-4f60-8979-2d6c241cc7ef iflag=direct,fullblock of=f22756f4-129a-434c-84a2-8a5f0aa1e10c oflag=direct bs=1M
165888+0 records in
165888+0 records out
173946175488 bytes (174 GB) copied, 824.455 s, 211 MB/s

4. Repeated the same test using dd with the block size aligned to the PE size of the volume (32MB):

[root@localhost]# dd if=d4ebc4c0-0092-4f60-8979-2d6c241cc7ef iflag=direct,fullblock of=f22756f4-129a-434c-84a2-8a5f0aa1e10c oflag=direct bs=32M
5184+0 records in
5184+0 records out
173946175488 bytes (174 GB) copied, 587.914 s, 296 MB/s

5. Repeated the same test using sgp_dd with 12 threads and direct I/O (32MB per transfer):

[root@localhost 809c8fb5-400e-4648-bc86-1e4b6dd76bda]# sgp_dd time=1 thr=12 if=d4ebc4c0-0092-4f60-8979-2d6c241cc7ef iflag=direct of=f22756f4-129a-434c-84a2-8a5f0aa1e10c oflag=direct bpt=65535
Assume default 'bs' (block size) of 512 bytes
time to transfer data was 568.471145 secs, 305.99 MB/sec
339738624+0 records in
339738624+0 records out

In summary:

qemu-img -t none                         : 16.10 minutes
dd 1MB block (O_DIRECT)                  : 13.73 minutes
dd 32MB block (O_DIRECT)                 :  9.78 minutes
sgp_dd 32MB block (O_DIRECT, 12 threads) :  9.46 minutes
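For reference, the dd block-size comparison in steps 3 and 4 can be sketched as a self-contained script. This is a scaled-down illustration only: a 64 MiB temp file stands in for the 174 GB SAN volume, and the direct-I/O flags (iflag=direct,fullblock / oflag=direct) from the real tests are omitted since they fail on many filesystems; on the actual LVs they must be kept, or the page cache will mask the difference.

```shell
#!/bin/bash
# Scaled-down sketch of the dd block-size comparison (steps 3 and 4):
# the same copy at bs=1M and bs=32M. Paths and sizes are illustrative,
# not the SAN volumes from the report.
SRC=$(mktemp) DST=$(mktemp)
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null

for bs in 1M 32M; do
    echo "bs=$bs:"
    # dd reports its own throughput line on stderr; keep just that line.
    dd if="$SRC" of="$DST" bs="$bs" 2>&1 | grep copied
done

cmp -s "$SRC" "$DST" && echo "copies match"
```

On a small cached file the two block sizes will time about the same; the 1M-vs-32M gap in the report only appears with O_DIRECT against the real storage, where per-request overhead dominates.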
(In reply to Simon Sekidde from comment #0)

> 4. repeated the same test using dd and block sized aligned to the PE size of
> the volume (32MB)
>
> [root@localhost]# dd if=d4ebc4c0-0092-4f60-8979-2d6c241cc7ef
> iflag=direct,fullblock of=f22756f4-129a-434c-84a2-8a5f0aa1e10c oflag=direct
> bs=32M
> 5184+0 records in
> 5184+0 records out
> 173946175488 bytes (174 GB) copied, 587.914 s, 296 MB/s
>
> 5. repeated the same test using sgp_dd, 12 threads and direct IO (32mb
> blocks)
>
> [root@localhost 809c8fb5-400e-4648-bc86-1e4b6dd76bda]# sgp_dd time=1 thr=12
> if=d4ebc4c0-0092-4f60-8979-2d6c241cc7ef iflag=direct
> of=f22756f4-129a-434c-84a2-8a5f0aa1e10c oflag=direct bpt=65535
> Assume default 'bs' (block size) of 512 bytes
> time to transfer data was 568.471145 secs, 305.99 MB/sec
> 339738624+0 records in
> 339738624+0 records out

According to this, there is no significant difference between sgp_dd and dd. Nobody will ever notice the difference between 587 and 568 seconds. What is the difference between sgp_dd and dd when copying multiple disks at the same time?
- While I assume sgp_dd could be faster than 'dd', we need to account for CPU usage as well - I don't see that measured above.
- For the usual 'clone VM' case, I would prefer we invest in XCOPY support (via ddpt?).
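One way the missing CPU accounting could be added to a rerun of the tests is the shell's POSIX `times` builtin, which prints accumulated user/system CPU for the shell and its children. A minimal sketch, again with a small temp file standing in for the SAN volumes and without the direct-I/O flags the real tests would keep:

```shell
#!/bin/bash
# Sketch of CPU accounting for a copy: wrap the command in a subshell
# and let `times` report user/sys CPU consumed by its children. In a
# real rerun, the dd line would be replaced by each candidate
# (dd with O_DIRECT, sgp_dd thr=12, qemu-img) in turn.
SRC=$(mktemp) DST=$(mktemp)
dd if=/dev/zero of="$SRC" bs=1M count=16 2>/dev/null

( dd if="$SRC" of="$DST" bs=1M 2>/dev/null
  echo "CPU (user sys) for shell, then children:"
  times )

cmp -s "$SRC" "$DST" && echo "copy ok"
```

Comparing the children's user+sys totals against wall time would show whether sgp_dd's 12 threads buy their extra throughput with proportionally more CPU.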