Bug 1722434
| Summary: | tcmu-runner: Link against tcmalloc for improved small IO performance | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Xiubo Li <xiubli> |
| Component: | tcmu-runner | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | ocs-3.11 | CC: | asriram, knarra, kramdoss, pkarampu, pprakash, prasanna.kalever, puebele, rhs-bugs, rtalur, sabose, vbellur, xiubli, ykaul |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | OCS 3.11.z Batch Update 4 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | tcmu-runner-1.2.0-30.el7rhgs | Doc Type: | If docs needed, set a value |
| Doc Text: | With this update, tcmu-runner and libtcmu are linked against the tcmalloc library instead of the default glibc allocator, which improves I/O performance. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-10-30 12:33:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Xiubo Li
2019-06-20 11:24:09 UTC
https://github.com/gluster/glusterfs/issues/237 was opened two years ago in glusterfs too. We haven't proceeded further with it. I guess this is a good time to revisit it. I am happy if the change is straightforward. I haven't analyzed the effort in glusterfs yet, but it is good to get started somewhere, and I am happy if gluster-block takes the first step.

Performed the steps below to verify the bug:
==============================================

On the 3.11.3 setup:
1) Set up OCS 3.11.3.
2) Created 4 pods with 4 block PVCs attached.
3) Created a file called random-data1.log inside /var/lib/origin/openshift.local.volumes/pods/d90cd57a-b41e-11e9-8c60-005056b2f12a/volumes/kubernetes.io~iscsi/pvc-<name>
4) Created an fio job file, attached here for reference.
5) Ran the job file 4 times, changing the file path to a different PVC each time, with block sizes of 512, 1k, 4k and 8k.

On the 3.11.4 setup:
1) Set up OCS 3.11.4.
2) Created 4 pods with 4 block PVCs attached.
3) Created a file called random-data1.log inside /var/lib/origin/openshift.local.volumes/pods/d90cd57a-b41e-11e9-8c60-005056b2f12a/volumes/kubernetes.io~iscsi/pvc-<name>
4) Created an fio job file, attached here for reference.
5) Ran the job file 4 times, changing the file path to a different PVC each time, with block sizes of 512, 1k, 4k and 8k.

Results are as below for the 3.11.3 & 3.11.4 setups (for each block size, the 3.11.3 run is listed first):
================================================

512
+++++++++++++++++++
3.11.3:
READ: bw=181KiB/s (185kB/s), 181KiB/s-181KiB/s (185kB/s-185kB/s), io=50.0MiB (52.5MB), run=283357-283357msec
WRITE: bw=181KiB/s (185kB/s), 181KiB/s-181KiB/s (185kB/s-185kB/s), io=49.0MiB (52.4MB), run=283357-283357msec
3.11.4:
Run status group 0 (all jobs):
READ: bw=287KiB/s (294kB/s), 287KiB/s-287KiB/s (294kB/s-294kB/s), io=50.0MiB (52.5MB), run=178424-178424msec
WRITE: bw=287KiB/s (294kB/s), 287KiB/s-287KiB/s (294kB/s-294kB/s), io=49.0MiB (52.4MB), run=178424-178424msec
Improvement in performance => 58.5 %

1K
+++++++++++++++++
3.11.3:
READ: bw=275KiB/s (282kB/s), 275KiB/s-275KiB/s (282kB/s-282kB/s), io=50.1MiB (52.5MB), run=186220-186220msec
WRITE: bw=274KiB/s (281kB/s), 274KiB/s-274KiB/s (281kB/s-281kB/s), io=49.9MiB (52.3MB), run=186220-186220msec
3.11.4:
READ: bw=653KiB/s (669kB/s), 653KiB/s-653KiB/s (669kB/s-669kB/s), io=50.1MiB (52.5MB), run=78553-78553msec
WRITE: bw=651KiB/s (666kB/s), 651KiB/s-651KiB/s (666kB/s-666kB/s), io=49.9MiB (52.3MB), run=78553-78553msec
Improvement in performance => 137 %

4K
++++++++++++++++++++++++
3.11.3:
READ: bw=2642KiB/s (2705kB/s), 2642KiB/s-2642KiB/s (2705kB/s-2705kB/s), io=50.0MiB (52.4MB), run=19381-19381msec
WRITE: bw=2642KiB/s (2705kB/s), 2642KiB/s-2642KiB/s (2705kB/s-2705kB/s), io=50.0MiB (52.4MB), run=19381-19381msec
3.11.4:
READ: bw=7950KiB/s (8141kB/s), 7950KiB/s-7950KiB/s (8141kB/s-8141kB/s), io=50.0MiB (52.4MB), run=6440-6440msec
WRITE: bw=7950KiB/s (8141kB/s), 7950KiB/s-7950KiB/s (8141kB/s-8141kB/s), io=50.0MiB (52.4MB), run=6440-6440msec
Improvement in performance => 200 %

8K
++++++++++++++++++++++++++
3.11.3:
READ: bw=2979KiB/s (3050kB/s), 2979KiB/s-2979KiB/s (3050kB/s-3050kB/s), io=49.9MiB (52.3MB), run=17136-17136msec
WRITE: bw=2997KiB/s (3069kB/s), 2997KiB/s-2997KiB/s (3069kB/s-3069kB/s), io=50.1MiB (52.6MB), run=17136-17136msec
3.11.4:
READ: bw=13.1MiB/s (13.7MB/s), 13.1MiB/s-13.1MiB/s (13.7MB/s-13.7MB/s), io=49.9MiB (52.3MB), run=3812-3812msec
WRITE: bw=13.2MiB/s (13.8MB/s), 13.2MiB/s-13.2MiB/s (13.8MB/s-13.8MB/s), io=50.1MiB (52.6MB), run=3812-3812msec
Improvement in performance => 351 %

3.11.3:
===============
[root@dhcp47-16 ~]# oc exec -it glusterfs-storage-cd9f5 bash
[root@dhcp47-58 /]# ldd /usr/bin/tcmu-runner | grep libtcmalloc
[root@dhcp47-58 /]# ldd /usr/lib64/libtcmu.so.1 | grep libtcmalloc
[root@dhcp47-58 /]# ldd /usr/lib64/tcmu-runner/handler_glfs.so | grep libtcmalloc

3.11.4:
=============
[root@dhcp46-78 ~]# oc exec -it glusterfs-storage-9nd5k bash
[root@dhcp47-66 /]# ldd /usr/bin/tcmu-runner | grep libtcmalloc
libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fef9d23e000)
[root@dhcp47-66 /]# ldd /usr/lib64/libtcmu.so.1 | grep libtcmalloc
libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007facfb81f000)
[root@dhcp47-66 /]# ldd /usr/lib64/tcmu-runner/handler_glfs.so | grep libtcmalloc
libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fbcc55e6000)

@prasanna, below are the numbers I see when I run the fio workload here. Should we wait for Elvir's tests as well before moving the bug to the verified state?

Numbers look really interesting and exciting !!! That is a lot of improvement.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3256

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
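The "Improvement in performance" figures quoted in the verification results can be cross-checked from the fio summary lines. The following is a minimal sketch, not the tester's actual method: it assumes the percentage is the relative change in the aggregate READ bandwidth between the 3.11.3 and 3.11.4 runs. The helper names `bandwidth_kib` and `improvement_percent` are hypothetical and introduced only for illustration.

```python
import re

# Matches the aggregate bandwidth field of a fio summary line, e.g. "bw=13.1MiB/s".
# Only the KiB/s and MiB/s units that appear in this report are handled.
_BW = re.compile(r"bw=([\d.]+)(KiB|MiB)/s")


def bandwidth_kib(line: str) -> float:
    """Return the aggregate bandwidth of a fio READ/WRITE summary line in KiB/s."""
    match = _BW.search(line)
    if match is None:
        raise ValueError(f"no bandwidth field in: {line!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value * 1024 if unit == "MiB" else value


def improvement_percent(before: str, after: str) -> float:
    """Relative bandwidth change from `before` to `after`, in percent (assumed metric)."""
    base = bandwidth_kib(before)
    return (bandwidth_kib(after) - base) / base * 100.0


# READ summary lines (truncated) from the report: (3.11.3 run, 3.11.4 run).
results = {
    "512": ("READ: bw=181KiB/s (185kB/s)", "READ: bw=287KiB/s (294kB/s)"),
    "1K": ("READ: bw=275KiB/s (282kB/s)", "READ: bw=653KiB/s (669kB/s)"),
    "4K": ("READ: bw=2642KiB/s (2705kB/s)", "READ: bw=7950KiB/s (8141kB/s)"),
    "8K": ("READ: bw=2979KiB/s (3050kB/s)", "READ: bw=13.1MiB/s (13.7MB/s)"),
}

for size, (before, after) in results.items():
    print(f"{size}: {improvement_percent(before, after):.1f} %")
```

Under this assumption the computed values land close to the quoted 58.5 %, 137 %, 200 % and 351 %. Small differences are expected because fio rounds the bandwidth it prints, and the 8K figure shifts slightly depending on whether the READ or WRITE line is used.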