Bug 1410923 - rbd export/export-diff commands not honoring '--rbd-concurrent-management-ops' option, option appears to have no effect on performance
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 1.3.3
Hardware: All
OS: Linux
Target Milestone: rc
Target Release: 3.0
Assignee: Jason Dillaman
QA Contact: Tejas
Docs Contact: Bara Ancincova
Depends On:
Blocks: 1494421
Reported: 2017-01-06 20:57 UTC by Kyle Squizzato
Modified: 2020-02-14 18:25 UTC (History)
CC: 8 users

Fixed In Version: RHEL: ceph-12.1.2-1.el7cp Ubuntu: ceph_12.1.2-2redhat1xenial
Doc Type: Bug Fix
Doc Text:
.The `--rbd-concurrent-management-ops` option works with the `rbd export` command

The `--rbd-concurrent-management-ops` option ensures that image export and import operations work in parallel. Previously, when `--rbd-concurrent-management-ops` was used with the `rbd export` command, it had no effect on the command's performance. The underlying source code has been modified, and `--rbd-concurrent-management-ops` now works as expected when exporting images with `rbd export`.
Clone Of:
Last Closed: 2017-12-05 23:32:37 UTC
Target Upstream Version:

Attachments (Terms of Use)
brief sequential rbd write & read fio job file (764 bytes, text/plain)
2017-01-06 20:58 UTC, Kyle Squizzato
fio job output against rbds in the pool we are exporting from (12.37 KB, text/plain)
2017-01-06 20:59 UTC, Kyle Squizzato

System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 19034 0 None None None 2017-03-01 13:55:49 UTC
Red Hat Product Errata RHBA-2017:3387 0 normal SHIPPED_LIVE Red Hat Ceph Storage 3.0 bug fix and enhancement update 2017-12-06 03:03:45 UTC

Description Kyle Squizzato 2017-01-06 20:57:43 UTC
Description of problem:
When using either the 'rbd export' or 'rbd export-diff' command with the '--rbd-concurrent-management-ops' flag, no impact on performance is observed.

Sample output from current attempt:

# snapshot created for initial export
$ rbd snap ls rbd/foobar
SNAPID NAME             SIZE 
  1106 161224-0935 102400 GB

$ rbd info rbd/foobar@161224-0935
rbd image 'foobar':
        size 102400 GB in 26214400 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5f059d238e1f29
        format: 2
        features: layering
        protected: False

$ rbd export-diff --rbd-concurrent-management-ops 50 rbd/foobar@161224-0935 - | pv | rbd import-diff --rbd-concurrent-management-ops 50 - expandtest/foobar
Importing image diff: 1% complete...    1TiB 7:39:35 [37.2MiB/s]
Importing image diff: 2% complete...   2TiB 19:53:12 [36.7MiB/s]
Importing image diff: 3% complete...   3TiB 27:30:54 [43.2MiB/s]
Importing image diff: 4% complete...   4TiB 34:33:08 [7.97MiB/s]
Importing image diff: 5% complete...   5TiB 41:44:14 [42.4MiB/s]

Version-Release number of selected component (if applicable):
0.94.x (will request specific ceph -v) 

How reproducible:

Steps to Reproduce:
1. Create snapshot from an rbd 
2. Export/import and specify the '--rbd-concurrent-management-ops' flag, for example:

rbd export-diff --rbd-concurrent-management-ops 50 rbd/foobar@161224-0935 - | pv | rbd import-diff --rbd-concurrent-management-ops 50 - expandtest/foobar

Actual results:
Slow performance when performing an rbd export/export-diff

Expected results:
Performance should be higher when '--rbd-concurrent-management-ops' is set to a high value; a Ceph cluster should have plenty of parallelism available to it.

Additional info:
* Providing fio profile and job output which exhibits the concurrency expected in the cluster vs the image diff import above.
* Debug data is en route
* It's also worth noting that 'export-diff' does not appear to truly honor 'rbd-concurrent-management-ops', even though the option appears to be present in the code:

  ExportDiffContext edc(&image, fd, info.size,
                        g_conf->rbd_concurrent_management_ops, no_progress);
  r = image.diff_iterate2(fromsnapname, 0, info.size, true, whole_object,
                          &C_ExportDiff::export_diff_cb, (void *)&edc);
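The snippet above passes `g_conf->rbd_concurrent_management_ops` into an `ExportDiffContext`, which in upstream Ceph wraps a throttle that caps how many object operations may be in flight at once. As an illustration only (the class and function names below are hypothetical, not Ceph code), a minimal Python sketch of semaphore-based op throttling:

```python
import threading
import time


class SimpleThrottle:
    """Illustrative counting-semaphore throttle: at most max_ops
    operations may be in flight at once. Not Ceph's implementation."""

    def __init__(self, max_ops):
        self._sem = threading.Semaphore(max_ops)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.max_seen = 0

    def start_op(self):
        self._sem.acquire()  # blocks once max_ops ops are in flight
        with self._lock:
            self.in_flight += 1
            self.max_seen = max(self.max_seen, self.in_flight)

    def end_op(self):
        with self._lock:
            self.in_flight -= 1
        self._sem.release()


def fake_export_chunk(throttle):
    """Hypothetical stand-in for exporting one object/chunk."""
    throttle.start_op()
    try:
        time.sleep(0.01)  # stand-in for an object read/write
    finally:
        throttle.end_op()


if __name__ == "__main__":
    throttle = SimpleThrottle(max_ops=4)
    threads = [threading.Thread(target=fake_export_chunk, args=(throttle,))
               for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(throttle.max_seen)  # never exceeds 4
```

If the option were honored, raising `max_ops` here would raise the observed concurrency; the reported symptom is consistent with the export path effectively behaving as if `max_ops` were ignored.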

The customer has also seen the same behavior on just 'export'.

Comment 2 Kyle Squizzato 2017-01-06 20:58:30 UTC
Created attachment 1238092 [details]
brief sequential rbd write & read fio job file

Comment 3 Kyle Squizzato 2017-01-06 20:59:00 UTC
Created attachment 1238093 [details]
fio job output against rbds in the pool we are exporting from

Comment 11 Jason Dillaman 2017-09-24 01:54:21 UTC
I would suggest adding "--debug-rbd 20" to the rbd CLI and utilizing the "librbd::io::AioCompletion" logs to see whether more than one instance of an AIO message is in flight concurrently.
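One way to evaluate such logs is to pair up start/finish events per operation and track the peak overlap. The event format below is hypothetical (real "librbd::io::AioCompletion" debug lines differ, so the parsing step is left out); the sketch only shows the overlap-counting idea:

```python
def max_in_flight(events):
    """events: list of ('start'|'finish', op_id) tuples in log order.
    Returns the peak number of simultaneously in-flight operations."""
    in_flight = set()
    peak = 0
    for kind, op_id in events:
        if kind == "start":
            in_flight.add(op_id)
            peak = max(peak, len(in_flight))
        else:
            in_flight.discard(op_id)
    return peak


if __name__ == "__main__":
    # Serialized ops (concurrency 1) vs. overlapping ops (concurrency 3)
    serial = [("start", 1), ("finish", 1), ("start", 2), ("finish", 2)]
    overlap = [("start", 1), ("start", 2), ("start", 3),
               ("finish", 1), ("finish", 2), ("finish", 3)]
    print(max_in_flight(serial), max_in_flight(overlap))  # 1 3
```

A peak of 1 across a long export would indicate the ops are being issued serially, i.e. the concurrency option is not taking effect.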

Comment 12 Tejas 2017-09-25 04:33:42 UTC
Thank you Jason!
Verified in ceph version 12.2.0-1.el7cp

Comment 18 errata-xmlrpc 2017-12-05 23:32:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

