Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 761748 (GLUSTER-16)

Summary:

GlusterFS file opening latency increases linearly with number of open files

Product:

[Community] GlusterFS

Reporter:

Shehjar Tikoo <shehjart>

Component:

core

Assignee:

Shehjar Tikoo <shehjart>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Severity:

low

Docs Contact:

Priority:

low

Version:

mainline

CC:

gluster-bugs

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

Type:

---

Regression:

RTP

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Plot showing the posix open performance	none
client-after-o1-fdtable	none
client-before-o1-fdtable	none
server-after-o1-fdtable	none
.server-before-o1-fdtable	none
First performance test	none
optimized perf	none

Description Shehjar Tikoo 2009-06-15 15:12:56 UTC

Created attachment 4 [details]
resolv.conf

Comment 1 Shehjar Tikoo 2009-06-15 15:13:27 UTC

Created attachment 5 [details]
patch for NCR 3416 and rh 6.1

Comment 2 Shehjar Tikoo 2009-06-15 15:14:29 UTC

Created attachment 6 [details]
SIGALRM patch

Comment 3 Shehjar Tikoo 2009-06-15 15:14:57 UTC

Created attachment 7 [details]
Lock patch

Comment 4 Shehjar Tikoo 2009-06-15 15:15:42 UTC

Created attachment 8 [details]
A few cuts from ppp-watch.c illustrating the error

Comment 5 Shehjar Tikoo 2009-06-15 15:17:09 UTC

Created attachment 9 [details]
File with a copy of the error message

Comment 6 Shehjar Tikoo 2009-06-15 15:39:17 UTC

The  following two patches resolve the two performance issues discussed here:

1. http://patches.gluster.com/patch/568/: Reduce server side CPU usage by turning the table into an O(1) allocator.

2. http://patches.gluster.com/patch/569/: Replaces the use of dict_t in protocol/client with a list so that dictionary operations are no more a bottleneck in file open path.

The effect to of these two changes is in the plot in image titled "optimized perf". As this latest plot shows, the O(1) fd-table does not lead to a reduction in file open latency as much as the second patch above, but as shown by the callgrind output earlier, the use of O(1) fd-table does lead to drastic reduction in CPU utilization.

Comment 7 Shehjar Tikoo 2009-06-15 15:39:52 UTC

Created attachment 10 [details]
Fix

Comment 8 Shehjar Tikoo 2009-06-15 18:12:09 UTC

A few weeks back I ran a test with a tool I called glfiletable which opens multiple files concurrently. The aim was to determine whether the O(n) file descriptor table in libglusterfs was a bottleneck or not. The tool measures the latency of opening a file while increasing the number of files in each iteration.

The first result is shown in open-performance.png(..attached..). In this plot, glfiletable was run over different types of mount points, for eg. over FUSE with both protocol/client and protocol/server in the stack, with libglusterfsclient using the same vol file and the Linux local file system.

In this plot, note the constant latency for Linux open performance, then look at the increasing latency for glusterfs-fuse and libglusterfsclient. Initially, I assumed that the fd-table was the culprit, so I changed the implementation into an O(1) algo but the increasing latencies did not go away, as is shown by the data points for "libglusterfsclient-o1" points.

To zero in further, I ran the same test over a vol file that contained just storage-posix translator and that did not show such latencies at all. For this see, the plot in "opentest-posix.png".

This actually pointed to some inefficiency in the client-server interaction. So I ran both under valgrind. The results of these runs are in the following attached files. Open them using Kcachegrind.

1. callgrind.out.server-before-o1-fdtable: Shows the CPU usage call graph before the use of fd-table in libglusterfs. It clearly shows the huge amount of CPU being used up by gf_fd_unused_get function.

2. callgrind.out.server-after-o1-fdtable: Shows the CPU usage call graph after turning the fd-table O(1). As expected, the CPU usage by gf_fd_used_get is no longer the highest consumer of CPU. However, as the chart above showed, this didnt actually fix the increasing latency problem.

3. callgrind.out.client-before-o1-fdtable: Shows client CPU call graph before and callgrind.out.client-after-o1-fdtable shows the same after the fd-table change. Both show unusual CPU usage in client_open_cbk due to dictionary operations.

Avati confirmed that use of dictionary for saving the list of per-client fd_t was the cause of this problem.

Comment 9 Shehjar Tikoo 2009-06-18 02:20:42 UTC

Patches accepted into repo:

http://git.savannah.gnu.org/cgit/gluster.git/commit/?id=efcce990960fb91d422630fc7b310b216a500fed

http://git.savannah.gnu.org/cgit/gluster.git/commit/?id=bb4e14b213a39e9d403be9790ef0a75388496dee