Bug 2005779 - Perf: iobuf: Call standard allocation api in iobuf if page_size is less than equal to 128KB
Summary: Perf: iobuf: Call standard allocation api in iobuf if page_size is less than equal to 128KB
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.5
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Mohit Agrawal
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-20 06:55 UTC by Mohit Agrawal
Modified: 2023-09-15 01:36 UTC
CC List: 11 users

Fixed In Version: glusterfs-6.0-62
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-31 12:37:31 UTC
Embargoed:




Links:
- GitHub gluster/glusterfs pull 2792 (Merged): iobuf: Call standard allocation api in iobuf if page_size is less than equal to 128KB (last updated 2021-09-20 07:10:40 UTC)
- Red Hat Product Errata RHBA-2022:4840 (last updated 2022-05-31 12:37:46 UTC)

Description Mohit Agrawal 2021-09-20 06:55:37 UTC
As we know, we have always tried to improve smallfile performance in glusterfs. After long-duration testing we found that a client process has heavy lock contention on iobuf_pool->mutex, and when a high reader-thread-count is configured the contention grows significantly. Currently we allocate 128 bytes even when the requested size is much smaller, and every request first takes the lock; because the lock is implemented at the process level, it suffers heavy contention when many (event|reader) threads are configured.

Solution: We have decided to skip iobuf when the requested size is less than or equal to 128 KB and instead use the available malloc pool (glibc|tcmalloc). These malloc pools are per thread, so by calling the standard allocation API directly we get a significant performance improvement.
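
For illustration, a minimal sketch of that idea in C follows (this is not the actual glusterfs patch; struct buf_pool, get_buffer and STD_ALLOC_THRESHOLD are hypothetical names): requests at or below 128 KB bypass the shared, mutex-protected pool and go straight to malloc(3), which glibc and tcmalloc serve from per-thread arenas/caches, so no process-wide lock is taken on that path.

#include <pthread.h>
#include <stdlib.h>

#define STD_ALLOC_THRESHOLD (128 * 1024)  /* 128 KB cut-off from the summary */

/* Hypothetical stand-in for the shared buffer pool, not the real iobuf types. */
struct buf_pool {
    pthread_mutex_t mutex;  /* single process-wide lock: the contention point */
    /* ... pre-allocated arenas / free lists ... */
};

void *get_buffer(struct buf_pool *pool, size_t page_size)
{
    if (page_size <= STD_ALLOC_THRESHOLD) {
        /* Small request: skip the pool. malloc() is served from per-thread
         * arenas (glibc) or thread caches (tcmalloc), so no shared lock is
         * taken here. */
        return malloc(page_size);
    }

    /* Large request: fall back to the shared pool under the global lock. */
    pthread_mutex_lock(&pool->mutex);
    void *buf = NULL;  /* ... carve a buffer out of a pooled arena ... */
    pthread_mutex_unlock(&pool->mutex);
    return buf;
}

The corresponding release path then has to know how a buffer was obtained, so that small buffers are returned with free() instead of being handed back to the pool.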

The complete data is available at https://github.com/gluster/glusterfs/issues/2771

I executed a smallfile perf test case for 4.8M files of 64 KB (more than 20 times, as in usual daily operation) on the latest devel branch. I set up a 12x3 (NVMe) volume on 12 physical machines and configured 4 event threads; no other configuration option was enabled.

Hardware details:
12 physical machines (6 clients, 6 servers)
Each machine has 64 CPUs, 32 GB RAM, and a 10 GbE NIC.

Run the smallfile tool for each operation; the total number of files is 4.8M:
date ;
for i in {1..5}
do
/root/cleanup.sh;./smallfile_cli.py --operation create --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation ls-l --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation chmod --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation stat --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation read --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation append --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation mkdir --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation rmdir --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
/root/cleanup.sh;./smallfile_cli.py --operation cleanup --threads 16 --file-size 64 --files 50000 --top /mnt/test  --host-set client01.perf.cloud,client02.perf.cloud,client03.perf.cloud,client04.perf.cloud,client05.perf.cloud,client06.perf.cloud
done

date;

Comment 1 Mohit Agrawal 2021-09-20 06:57:50 UTC
The upstream patch to resolve this is:
https://github.com/gluster/glusterfs/pull/2792

Comment 2 SATHEESARAN 2021-10-20 09:18:50 UTC
This patch[0] for using tcmalloc instead of mempool allows glusterfs to use 'tcmalloc' while moving away from the traditional 'mempool' that glusterfs has used for a long time. This brings in a new underlying feature that is transparent to customers, but it is a significant change in how memory allocation is managed. So this change falls under RFE, as the core memory management/allocation methodology is swapped for a better one that is optimized for performance, even though no change is exposed at the user level.

As per our 'RHGS and Layered Product batch update model - 2.2'[1], we agreed not to include any RFE as part of a maintenance release. Also, this patch mainly benefits small-file workloads on glusterfs. Small-file workloads do not suit glusterfs well, and this fact was well advertised to our customers. Providing a solution to this problem is really good, but it will come as an added advantage for existing customers who were running small-file workloads against the general RHGS recommendation. This has a direct customer impact. Irrespective of the workload, it will definitely improve performance for any workload classification, but there is still a risk involved with the unknown.

On the other hand, the patch was merged on Sep 17, 2021 and, as far as I have checked, has not made it into any glusterfs upstream release so far. The logic here is that we would need a good amount of soak time in the community before we can make a decision about including the patch in RHGS 3.5.z.

With this in mind, I would like to retarget this bug for RHGS 3.5.8 and revisit it then with the relevant information.

This patch on iobuf is required along with the tcmalloc patch.

@Mohit, @Sunil - What do you suggest?

[0] - https://github.com/gluster/glusterfs/pull/2787
[1] - https://docs.google.com/document/d/1KvdyoI8-BNJJuADBkN0OVTW4sMZ4dX6LqxUXPJtK8ME/edit

Comment 3 Yaniv Kaul 2021-10-20 10:21:53 UTC
- I've sent Sat a few of the upstream performance results to show the significance of the changes: double-digit percentage performance improvements, all around, for small-file workloads, dispersed (CVLT) and sharded (RHHI-V) workloads.
- It does meet the following inclusion criteria:
   - Has a direct customer impact
   - Has a reasonable business justification that is easily understood by reviewers

The code will be in Gluster 10, which is coming out soon, and it will have enough soak time there before inclusion in 3.5.7. I see no point in re-targeting. By getting it in now, we can build it sooner and have more time in QE as well.

One additional item to consider: I'm concerned that more upstream changes will come later, after Gluster 10, making the backport more challenging.

Comment 4 Mohit Agrawal 2021-10-20 11:41:06 UTC
(In reply to SATHEESARAN from comment #2)

The patch changes a common code path that is hit by every operation. I don't think we need to wait for upstream results: in most cases performance will either improve or stay the same, and users will not face any regression, so I believe we should take this in. In the daily perf runs we are getting impressive results. To analyze the improvement QE needs to run perf test cases; otherwise there is no specific test case required to validate the patch.

Thanks,
Mohit Agrawal

Comment 5 SATHEESARAN 2021-10-26 08:11:52 UTC
(In reply to Mohit Agrawal from comment #4)

Hi Mohit,

Thanks for your input. QE stopped performing performance-oriented tests or any other performance regression testing a long time ago. The dev team is taking care of the perf regression tests, as I understand it.

@Sunil, can you please confirm?
With devel_ack on this bug, I believe that the perf regression testing will be carried out.

@Mohit,
At the same time, from a QE point of view we would be more than happy to see regression and longevity tests to give us more confidence in these changes. If QE could see solid proof of no regressions echoed from the community, it would really help boost our confidence. I understand that you have already vouched for no regressions, but making these changes available to the community will definitely be a positive factor.

Comment 7 Yaniv Kaul 2021-11-09 12:05:02 UTC
Can we get a QE ACK on this?

Comment 20 errata-xmlrpc 2022-05-31 12:37:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4840

Comment 21 Red Hat Bugzilla 2023-09-15 01:36:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

