Bug 1021808 - gfapi - hang when heavily multithreaded app attempts parallel reads
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Hardware: Unspecified Unspecified
Priority: urgent
Severity: urgent
Assigned To: krishnan parthasarathi
QA Contact: M S Vishwanath Bhat
Keywords: ZStream
Reported: 2013-10-22 03:11 EDT by Anand Avati
Modified: 2016-05-31 21:57 EDT (History)
Fixed In Version: glusterfs-
Doc Type: Bug Fix
Doc Text:
Previously, when an application performed parallel file operations from multiple threads on a volume through the libgfapi interface, a change in volume configuration (for example, a volume set) could cause the file operations to hang. With this update, applications no longer hang in this situation.
Last Closed: 2013-11-27 10:43:30 EST
Type: Bug

Attachments
test program to do simultaneous reads from different threads. (2.92 KB, text/plain)
2013-10-28 07:27 EDT, M S Vishwanath Bhat
test program to reproduce hang (2.27 KB, text/x-csrc)
2013-10-29 19:29 EDT, Anand Avati

Description Anand Avati 2013-10-22 03:11:56 EDT
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
Comment 1 Anand Avati 2013-10-22 03:13:15 EDT
Description of problem:

When a heavily multithreaded app attempts reads from many threads concurrently and a migration event happens in the meantime, many of the threads hang (they show up as stuck when the process is inspected in gdb).

Version-Release number of selected component (if applicable):


How reproducible:

Most of the time
Comment 5 Gowrishankar Rajaiyan 2013-10-22 13:14:20 EDT
Please provide steps to verify.
Comment 6 Anand Avati 2013-10-22 17:41:34 EDT
Write a simple gfapi test program that spawns many threads (20-30) and performs the same operation (like glfs_stat (fs, "/filename")) from all of them simultaneously. While the threads are busy doing so, make a volume change that results in a graph switch (like turning some perf xlator on or off). Without the patch, some of the threads will hang forever. With the patch, all threads should continue their loop.
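
A minimal sketch of such a test program is given here for illustration. It is not either of the programs attached to this bug. The names run_workers, worker_main, do_lstat, struct worker, MAX_THREADS and the USE_GFAPI build switch are all hypothetical, and the header path <glusterfs/api/glfs.h> is an assumption about where the gfapi header is installed. When USE_GFAPI is not defined, a local lstat() stands in for glfs_lstat() so the sketch builds without gfapi; in that mode it cannot reproduce the hang.

```c
/* Sketch only: a multithreaded lstat loop, not the program attached to this bug. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

#ifdef USE_GFAPI                     /* assumed build: cc -DUSE_GFAPI ... -lgfapi -lpthread */
#include <glusterfs/api/glfs.h>      /* assumed install path of the gfapi header */
#endif

#define MAX_THREADS 64               /* illustrative upper bound */

struct worker {                      /* hypothetical per-thread state */
    pthread_t   tid;
    void       *fs;                  /* glfs_t * under USE_GFAPI, unused otherwise */
    const char *path;
    int         iterations;
    long        ok;                  /* calls that returned 0 */
};

/* One operation: glfs_lstat on the volume, or a local lstat as a stand-in. */
static int do_lstat(void *fs, const char *path)
{
    struct stat st;
#ifdef USE_GFAPI
    return glfs_lstat((glfs_t *)fs, path, &st);
#else
    (void)fs;
    return lstat(path, &st);
#endif
}

/* Thread body: repeat the operation and count the successes. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < w->iterations; i++)
        if (do_lstat(w->fs, w->path) == 0)
            w->ok++;
    return NULL;
}

/* Runs nthreads workers on the same path. Returns the total number of
 * successful calls, or -1 if nthreads is out of range or a thread could
 * not be created. */
long run_workers(void *fs, const char *path, int nthreads, int iterations)
{
    struct worker w[MAX_THREADS];
    long total = 0;
    int started = 0, failed = 0;

    assert(path != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (; started < nthreads; started++) {
        w[started] = (struct worker){ .fs = fs, .path = path,
                                      .iterations = iterations, .ok = 0 };
        if (pthread_create(&w[started].tid, NULL, worker_main, &w[started]) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        printf("Joining thread %d ... ", i);
        fflush(stdout);
        pthread_join(w[i].tid, NULL);    /* blocks forever if the thread is hung */
        printf("done\n");
        total += w[i].ok;
    }
    return failed ? -1 : total;
}
```

In a gfapi build, the caller would presumably create the handle with glfs_new(), glfs_set_volfile_server() and glfs_init(), pass it as fs, and change a volume option from another shell while run_workers() is looping. A join that never prints "done" would then correspond to the hang described above.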
Comment 7 M S Vishwanath Bhat 2013-10-28 07:27:46 EDT
Created attachment 816776
test program to do simultaneous reads from different threads.
Comment 8 M S Vishwanath Bhat 2013-10-28 07:29:45 EDT
I wrote a test gfapi program that spawns 40 threads, each of which calls glfs_lstat 8000 times. All the threads lstat the same file. However, I couldn't reproduce the issue on the rhs-2.1 bits. The behaviour of the test program seems to be the same in the 32rhs build and the 36rhs build.

I have attached the program I wrote. Please point out any mistakes I have made.
Comment 9 Anand Avati 2013-10-29 19:29:52 EDT
Created attachment 817199
test program to reproduce hang

Check the attached program. It reproduces the hang (on the 2.1 GA release) pretty much 100% of the time. With the fix, the hang never happens.
Comment 11 M S Vishwanath Bhat 2013-10-30 16:55:34 EDT
I tried the attached test gfapi program with the 3.4.032rhs build and with the build. With the 32rhs build, the thread hangs whenever I make a graph change (enabling volume profile and changelog). With the 36rhs build it doesn't hang and runs to completion.

Moving the bug to verified.

With 32rhs build...

[root@skywalker examples]# ./glfsxmp slave falcon
Joining thread 0 ... done
Joining thread 1 ... ^C

It hangs and has to be interrupted manually with Ctrl-C.

With 36rhs build...

Joining thread 38 ... done
Joining thread 39 ... done
The program completed successfully
Total number of lstats successfully executed: 360000
Comment 13 krishnan parthasarathi 2013-11-14 22:31:50 EST
A migration event is the (internal) transition that happens when a volume's configuration is changed, for example when a volume option is set. It can be described as the point at which file operations on the volume begin to see the configuration change that has happened in the meantime.

Hope that helps,
Comment 14 errata-xmlrpc 2013-11-27 10:43:30 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

