As we know, we have always been trying to improve small-file performance in glusterfs. After long-duration testing we found that a client process has heavy lock contention on iobuf_pool->mutex, and when a high reader-thread-count is configured the contention increases significantly. Currently we allocate 128 bytes even when the requested size is much smaller, and every request first takes a lock; the lock is implemented at the process level, so contention is severe when (event|reader) threads are configured.

Solution: skip the iobuf pool when the requested size is less than 128k and use the allocator's own pool (glibc or tcmalloc). These malloc pools are per-thread, so calling the allocator API directly gives a significant performance improvement. The complete data is available at https://github.com/gluster/glusterfs/issues/2771

I executed a smallfile perf test case for 4.8M files of 64K each (more than 20 times the usual daily workload) on the latest devel branch. The setup is a 12x3 volume on NVMe bricks with 4 event threads configured on 12 physical machines; no other option was enabled.

Hardware details: 12 physical machines (6 clients, 6 servers); every machine has 64 CPUs, 32 GB RAM and a 10 GbE NIC.

The smallfile tool runs the operations; the total number of files is 4.8M:

date
HOSTS=client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
for i in {1..5}
do
  for op in create ls-l chmod stat read append mkdir rmdir cleanup
  do
    /root/cleanup.sh
    ./smallfile_cli.py --operation $op --threads 16 --file-size 64 \
        --files 50000 --top /mnt/test --host-set $HOSTS
  done
done
date
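For context, here is a minimal sketch of the contended path described above. This is illustrative only, not the actual libglusterfs implementation; the struct and function names are simplified. The point is that a single process-wide mutex guards the pool, so every allocating thread (event threads, reader threads) serializes on it, and every request is rounded up to the smallest pool page of 128 bytes.

/* Illustrative model of the contended allocation path (not the real code). */
#include <pthread.h>
#include <stdlib.h>

struct iobuf_pool {
    pthread_mutex_t mutex;    /* one lock for the whole process */
    /* ... arenas / free lists ... */
};

void *
pool_alloc(struct iobuf_pool *pool, size_t size)
{
    void *ptr = NULL;

    /* Every request, however small, takes the shared lock and is
     * rounded up to the smallest pool page (128 bytes), so all
     * event/reader threads contend here. */
    pthread_mutex_lock(&pool->mutex);
    ptr = malloc(size < 128 ? 128 : size);   /* stand-in for the arena lookup */
    pthread_mutex_unlock(&pool->mutex);

    return ptr;
}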
The upstream patch to resolve this: https://github.com/gluster/glusterfs/pull/2792
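For the record, a minimal sketch of the idea behind the patch, based on the description above: requests below the 128k threshold bypass the iobuf pool and go straight to the process allocator (glibc malloc or tcmalloc), which serves small allocations from per-thread caches and therefore takes no process-wide lock. The names and the exact threshold constant here are illustrative, not necessarily the ones used in the pull request.

/* Illustrative sketch of the bypass (names are hypothetical). */
#include <stdlib.h>

#define IOBUF_POOL_THRESHOLD (128 * 1024)   /* 128k, per the description above */

struct iobuf_pool;                               /* shared pool from the sketch above */
void *pool_alloc(struct iobuf_pool *pool, size_t size);

void *
iobuf_alloc(struct iobuf_pool *pool, size_t size)
{
    if (size < IOBUF_POOL_THRESHOLD) {
        /* Small requests go directly to malloc(); glibc and tcmalloc
         * both serve these from per-thread caches, so no shared
         * mutex is taken. */
        return malloc(size);
    }

    /* Large requests still come from the shared, mutex-protected pool. */
    return pool_alloc(pool, size);
}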
This patch[0] for using tcmalloc instead of mempool allows glusterfs to use 'tcmalloc' while moving away from the traditional 'mempool' that glusterfs has used for a long time. This brings in a new underlying feature that is transparent to customers, but it is a significant change in how memory allocation is managed. So this change falls under RFE, as the core memory management/allocation methodology is swapped for a better one optimized for performance, even though nothing changes at the user level.

As per our 'RHGS and Layered Product batch update model - 2.2'[1], we agreed not to include any RFE as part of a maintenance release.

Also, this patch is useful for small-file workloads on glusterfs. Small-file workloads do not suit glusterfs well, and this fact was well advertised to our customers. Providing a solution to this problem is really good, but it will come as an added advantage for existing customers who were using small-file workloads against the RHGS general recommendation. This has a direct customer impact. Irrespective of the workload, it will definitely improve performance for any workload classification, but there is still a risk involved with the unknown.

On the other hand, the patch was merged on Sep 17, 2021 and has not made it into any glusterfs upstream release so far, as far as I checked. The logic here is that we need a good amount of soak time in the community before we can decide about including the patch in RHGS 3.5.z.

With this thought, I would like to retarget this bug for RHGS 3.5.8 and revisit it with that relevant information.

This patch on iobuf is required along with the tcmalloc patch.

@Mohit, @Sunil - What do you suggest?

[0] - https://github.com/gluster/glusterfs/pull/2787
[1] - https://docs.google.com/document/d/1KvdyoI8-BNJJuADBkN0OVTW4sMZ4dX6LqxUXPJtK8ME/edit
- I've sent Sat a few of the upstream performance results to show the significance of the changes: double-digit percentage performance improvements, all around, for small-file workloads, dispersed (CVLT) and sharded workloads (RHHI-V).
- It does meet the following inclusion criteria:
  - Has a direct customer impact
  - Has a reasonable business justification that is easily understood by reviewers

The code will be in Gluster 10, which is coming out soon, and will have enough soak time there before inclusion in 3.5.7. I see no point in re-targeting. By getting it in now, we can build it sooner and have more time in QE as well.

One additional item to consider: I'm concerned that more upstream changes will come later, after Gluster 10, making the backport more challenging.
(In reply to SATHEESARAN from comment #2)

The patch changes a common code path that is hit by every operation. I don't think we need to wait for upstream results; in most cases performance will either improve or stay the same, and users will not face any regression, so I believe we should consider this. In the daily perf results we are getting impressive numbers. To analyze the improvement QE needs to run perf test cases; otherwise there is no specific test case required to validate the patch.

Thanks,
Mohit Agrawal
(In reply to Mohit Agrawal from comment #4)

Hi Mohit,

Thanks for your input. QE stopped performing performance-oriented tests or any other performance regression testing a long time ago. The dev team is taking care of the perf regression tests, as I understand.

@Sunil, can you please confirm? With devel_ack on this bug, I believe the perf regression testing will be carried out.

@Mohit, at the same time, from the QE point of view we would be more than happy to see regression and longevity tests to give us more confidence in these changes. If QE could see solid proof of no regressions echoed from the community, it would really help boost our confidence. I understand that you have already vouched for no regressions, but making these changes available to the community will definitely be a positive factor.
Can we get a QE ACK on this?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:4840
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days