Bug 1021808 - gfapi - hang when heavily multithreaded app attempts parallel reads
Summary: gfapi - hang when heavily multithreaded app attempts parallel reads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
: ---
Assignee: krishnan parthasarathi
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-10-22 07:11 UTC by Anand Avati
Modified: 2018-12-09 17:14 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.4.0.36rhs-1
Doc Type: Bug Fix
Doc Text:
Previously, when an application performed parallel file operations on a volume from multiple threads through the libgfapi interface, a change in the volume configuration (for example, a volume set operation) could cause those file operations to hang. With this update, applications no longer hang in this situation.
Clone Of:
Environment:
Last Closed: 2013-11-27 15:43:30 UTC
Embargoed:


Attachments
test program to do simultaneous reads from different threads. (2.92 KB, text/plain)
2013-10-28 11:27 UTC, M S Vishwanath Bhat
test program to reproduce hang (2.27 KB, text/x-csrc)
2013-10-29 23:29 UTC, Anand Avati


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:1769 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #1 2013-11-27 20:17:39 UTC

Description Anand Avati 2013-10-22 07:11:56 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2013-10-22 07:13:15 UTC
Description of problem:

When a heavily multithreaded app attempts reads from many threads concurrently and a migration event happens, many threads are found to be hung when inspected in gdb.

Version-Release number of selected component (if applicable):

2.1

How reproducible:

Most of the time

Comment 5 Gowrishankar Rajaiyan 2013-10-22 17:14:20 UTC
Please provide steps to verify.

Comment 6 Anand Avati 2013-10-22 21:41:34 UTC
Write a simple gfapi test program that spawns many threads (20-30) and performs simultaneous operations (such as glfs_stat (fs, "/filename")) from multiple threads. While the threads are busy doing so, make a volume change that results in a graph switch (such as turning a perf xlator on or off). Without the patch, some of the threads will hang forever. With the patch, all threads should continue their loop.
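A minimal sketch of the harness described above, in C with POSIX threads. Assumptions: POSIX stat() on a local path stands in for glfs_stat(fs, "/filename") so the sketch builds without libgfapi (the gfapi call takes the same path and struct stat arguments plus a glfs_t handle). run_stress, stat_worker, struct worker_arg, MAX_THREADS, and the iteration count are hypothetical names chosen for illustration, not taken from the attached test programs. The graph switch itself is triggered externally (for example, with a gluster volume set command) while the workers loop.

```c
/* Sketch of the stress loop from Comment 6. ASSUMPTION: POSIX stat() on a
 * local path stands in for glfs_stat(fs, "/filename"), so this compiles
 * without libgfapi. Build with: cc -pthread */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAX_THREADS 64          /* hypothetical upper bound */

struct worker_arg {
    const char *path;           /* file every thread stats */
    int iterations;             /* hypothetical per-thread loop count */
    long ok;                    /* successful calls made by this thread */
};

static void *stat_worker(void *p)
{
    struct worker_arg *arg = p;
    struct stat st;

    for (int i = 0; i < arg->iterations; i++) {
        /* With gfapi this would be: glfs_stat(fs, arg->path, &st) */
        if (stat(arg->path, &st) == 0)
            arg->ok++;
    }
    return NULL;
}

/* Spawns nthreads workers, joins them in order, and returns the total number
 * of successful stat calls, or -1 if not every thread could be created. */
long run_stress(int nthreads, int iterations, const char *path)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    long total = 0;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (; started < nthreads; started++) {
        args[started] = (struct worker_arg){ path, iterations, 0 };
        if (pthread_create(&tids[started], NULL, stat_worker, &args[started]) != 0)
            break;
    }

    /* In a real test, the graph switch is made now (from outside this
     * process) while the workers are still looping. */
    for (int t = 0; t < started; t++) {
        printf("Joining thread %d ... ", t);
        fflush(stdout);
        pthread_join(tids[t], NULL);    /* a hung worker blocks here forever */
        printf("done\n");
        total += args[t].ok;
    }
    return started == nthreads ? total : -1;
}
```

For example, run_stress(30, 1000, "/") returns 30000 on a system where stat("/") succeeds. Joining in order and printing before each join makes a hang easy to spot: the output stops at a "Joining thread N ..." line that never completes, as in the 32rhs run in Comment 11.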

Comment 7 M S Vishwanath Bhat 2013-10-28 11:27:46 UTC
Created attachment 816776 [details]
test program to do simultaneous reads from different threads.

Comment 8 M S Vishwanath Bhat 2013-10-28 11:29:45 UTC
I wrote a test gfapi program which spawns 40 threads, and each thread does glfs_lstat 8000 times. All the threads do lstat on the same file. But I couldn't reproduce the issue on the RHS 2.1 bits (3.4.0.32rhs). The behaviour of the test program seems to be the same in the 32rhs and 36rhs builds.

I have attached the program I wrote. Please point out any mistakes I have made.

Comment 9 Anand Avati 2013-10-29 23:29:52 UTC
Created attachment 817199 [details]
test program to reproduce hang

Check the attached program. It reproduces the hang (on the 2.1 GA release) nearly 100% of the time. With the fix, the hang never happens.

Comment 11 M S Vishwanath Bhat 2013-10-30 20:55:34 UTC
I tried the attached test gfapi program with the 3.4.0.32rhs build and with the 3.4.0.36rhs build. With the 32rhs build, the threads hang whenever I make a graph change (enabling volume profile and changelog). With the 36rhs build, it doesn't hang and runs to completion.

Moving the bug to verified.


With 32rhs build...

[root@skywalker examples]# ./glfsxmp slave falcon
Joining thread 0 ... done
Joining thread 1 ... ^C

It hangs and has to be interrupted manually with Ctrl-C.


With 36rhs build...

Joining thread 38 ... done
Joining thread 39 ... done
The program completed successfully
Total number of lstats successfully executed: 360000
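In the 32rhs run above, the hang was only visible as a join that never returned and had to be interrupted with Ctrl-C. A harness can instead report a presumed hang once a deadline passes. The following is a sketch under that assumption: struct done_tracker, tracker_mark_done, and tracker_wait are hypothetical helpers, not part of gfapi or of the attached programs. Each worker calls tracker_mark_done as its last step, and the main thread waits on a condition variable with pthread_cond_timedwait.

```c
/* HYPOTHETICAL hang watchdog for a stress harness: workers report
 * completion, and the main thread waits with a deadline instead of
 * blocking forever in pthread_join. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

struct done_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             finished;   /* workers that have completed their loop */
};

#define DONE_TRACKER_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Each worker calls this as its last action. */
void tracker_mark_done(struct done_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->finished++;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Waits up to timeout_sec seconds for `expected` workers to finish and
 * returns how many did. A result below `expected` means the remaining
 * threads are presumed hung. */
int tracker_wait(struct done_tracker *t, int expected, int timeout_sec)
{
    struct timespec deadline;

    assert(timeout_sec >= 0);
    /* A default condition variable measures its timeout against CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&t->lock);
    while (t->finished < expected) {
        /* 0 means woken (possibly spuriously), so re-check the count;
         * ETIMEDOUT or any other error ends the wait. */
        if (pthread_cond_timedwait(&t->cond, &t->lock, &deadline) != 0)
            break;
    }
    int n = t->finished;
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

For example, after spawning 40 workers, a result below 40 from tracker_wait(&tracker, 40, 60) would mean some threads did not finish within 60 seconds (an arbitrary budget). Stuck threads are not joined; the caller would report the shortfall and exit rather than wait indefinitely.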

Comment 13 krishnan parthasarathi 2013-11-15 03:31:50 UTC
Kevin,
A migration event is the (internal) transition that happens when a volume's configuration is changed, for example when a volume option is set. It is the transition during which file operations on the volume begin to see the change in volume configuration that has happened in the meantime.

Hope that helps,
Krish
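The transition described above can be pictured with a simplified model: each file operation picks up the currently active volume graph when it starts, and a configuration change installs a new graph that later operations see. This is a hypothetical sketch for illustration only; struct graph, struct volume_ctx, begin_fop, and switch_graph are invented names, it is not the gfapi implementation, and it makes no claim about the actual root cause fixed in this bug. It does show one general requirement of such a switch: every thread waiting for a migration to finish has to be woken, which is why switch_graph uses pthread_cond_broadcast rather than pthread_cond_signal.

```c
/* HYPOTHETICAL model of a graph switch ("migration event"); names and
 * structure are illustrative, not taken from gfapi. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct graph {
    int generation;             /* stands in for a whole translator graph */
};

struct volume_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  cond;       /* signalled when a migration finishes */
    struct graph   *active;     /* graph that new operations will use */
    int             migrating;  /* nonzero while a switch is in progress */
};

#define VOLUME_CTX_INIT(g) \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (g), 0 }

/* Called at the start of each file operation: waits out an in-progress
 * migration, then returns the graph the operation should run on. */
struct graph *begin_fop(struct volume_ctx *v)
{
    pthread_mutex_lock(&v->lock);
    while (v->migrating)
        pthread_cond_wait(&v->cond, &v->lock);
    struct graph *g = v->active;
    pthread_mutex_unlock(&v->lock);
    return g;
}

/* Called when the volume configuration changes. Assumes the caller
 * serializes switches, so only one runs at a time. */
void switch_graph(struct volume_ctx *v, struct graph *next)
{
    assert(next != NULL);

    pthread_mutex_lock(&v->lock);
    v->migrating = 1;
    pthread_mutex_unlock(&v->lock);

    /* A real implementation would build and initialize `next` here,
     * outside the lock. */

    pthread_mutex_lock(&v->lock);
    v->active = next;
    v->migrating = 0;
    /* Wake every waiter: pthread_cond_signal would release only one
     * thread, and the others could stay blocked in begin_fop. */
    pthread_cond_broadcast(&v->cond);
    pthread_mutex_unlock(&v->lock);
}
```

Operations that called begin_fop before the switch keep using the old graph; only later calls see the new one, which mirrors the "begin to see the change" wording above. A fuller model would also reference-count the old graph so that it is released only after its in-flight operations finish.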

Comment 14 errata-xmlrpc 2013-11-27 15:43:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html

