Bug 1021808

Summary: gfapi - hang when heavily multithreaded app attempts parallel reads
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Anand Avati <aavati>
Component: glusterfs
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2.1
CC: aavati, chrisw, grajaiya, kcleveng, kparthas, mzywusko, nsathyan, shaines, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.4.0.36rhs-1
Doc Type: Bug Fix
Doc Text:
Previously, when an application performed parallel file operations from multiple threads on a volume through the libgfapi interface, a change in the volume configuration (for example, a volume set) could cause the file operations to hang. With this update, applications no longer hang in this situation.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 15:43:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                                     Flags
test program to do simultaneous reads from different threads.   none
test program to reproduce hang                                  none

Description Anand Avati 2013-10-22 07:11:56 UTC

Comment 1 Anand Avati 2013-10-22 07:13:15 UTC
Description of problem:

When a heavily multithreaded app issues reads from many threads concurrently and a migration event happens, many of the threads are found to be hung when inspected in gdb.

Version-Release number of selected component (if applicable):

2.1

How reproducible:

Most of the time

Comment 5 Gowrishankar Rajaiyan 2013-10-22 17:14:20 UTC
Please provide steps to verify.

Comment 6 Anand Avati 2013-10-22 21:41:34 UTC
Write a simple gfapi test program which spawns many threads (20-30) and performs the same operation (like glfs_stat (fs, "/filename")) simultaneously from all of them. While the threads are busy doing so, make some volume change which results in a graph switch (like turning some perf xlator on or off). Without the patch, some of the threads will hang forever. With the patch, all threads should continue their loop.
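
A minimal sketch of such a test program follows. It is not either of the reproducers attached to this bug. The thread harness takes a generic callback so it also builds without libgfapi. The macro USE_GFAPI, the names run_workers, stub_op and stat_op, the thread and iteration counts, and port 24007 are assumptions made for this illustration, and the header path may differ between glusterfs versions.

```c
/* Build (stub mode):  cc -pthread stat_loop.c   (needs a main of your own)
 * Build (gfapi mode): cc -pthread -DUSE_GFAPI stat_loop.c -lgfapi
 * USE_GFAPI is a macro made up for this sketch. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One operation per loop iteration; returns 0 on success. */
typedef int (*op_fn)(void *ctx);

struct worker {
    pthread_t tid;
    op_fn op;
    void *ctx;
    int iterations;
    long ok; /* successful calls made by this thread */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < w->iterations; i++)
        if (w->op(w->ctx) == 0)
            w->ok++;
    return NULL;
}

/* Spawn nthreads workers that each call op(ctx) `iterations` times, join
 * them all, and return the total number of successful calls (-1 on setup
 * failure). A hung worker shows up as a join that never prints "done". */
long run_workers(int nthreads, int iterations, op_fn op, void *ctx)
{
    assert(nthreads > 0 && iterations >= 0 && op != NULL);
    struct worker *w = calloc((size_t)nthreads, sizeof(*w));
    if (w == NULL)
        return -1;

    int started = 0;
    for (; started < nthreads; started++) {
        w[started].op = op;
        w[started].ctx = ctx;
        w[started].iterations = iterations;
        if (pthread_create(&w[started].tid, NULL, worker_main, &w[started]) != 0)
            break;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        printf("Joining thread %d ... ", i);
        fflush(stdout);
        pthread_join(w[i].tid, NULL);
        printf("done\n");
        total += w[i].ok;
    }
    free(w);
    return started == nthreads ? total : -1;
}

/* Stand-in operation so the harness can be exercised without a volume. */
int stub_op(void *ctx)
{
    (void)ctx;
    return 0;
}

#ifdef USE_GFAPI
#include <sys/stat.h>
#include <glusterfs/api/glfs.h> /* path may differ on older installs */

static int stat_op(void *ctx)
{
    struct stat st;
    return glfs_stat((glfs_t *)ctx, "/filename", &st);
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <volname> <server>\n", argv[0]);
        return 1;
    }
    glfs_t *fs = glfs_new(argv[1]);
    if (fs == NULL)
        return 1;
    if (glfs_set_volfile_server(fs, "tcp", argv[2], 24007) != 0 ||
        glfs_init(fs) != 0) {
        fprintf(stderr, "gfapi setup failed\n");
        return 1;
    }
    long total = run_workers(30, 8000, stat_op, fs);
    printf("Total successful stats: %ld\n", total);
    glfs_fini(fs);
    return total < 0;
}
#endif
```

While the gfapi build is running, toggling a volume option from another terminal should trigger the graph switch. On an affected build the output would be expected to stop at a "Joining thread N ..." line.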

Comment 7 M S Vishwanath Bhat 2013-10-28 11:27:46 UTC
Created attachment 816776 [details]
test program to do simultaneous reads from different threads.

Comment 8 M S Vishwanath Bhat 2013-10-28 11:29:45 UTC
I wrote a test gfapi program which spawns 40 threads, and each thread calls glfs_lstat 8000 times. All the threads lstat the same file. But I couldn't reproduce the issue with the RHS 2.1 bits (3.4.0.32rhs). The behaviour of the test program seems to be the same in the 32rhs build and the 36rhs build.

I have attached the program I wrote. Please point out any mistakes I have made.

Comment 9 Anand Avati 2013-10-29 23:29:52 UTC
Created attachment 817199 [details]
test program to reproduce hang

Check the attached program. It reproduces the hang (on the 2.1 GA release) pretty much 100% of the time. With the fix, the hang never happens.

Comment 11 M S Vishwanath Bhat 2013-10-30 20:55:34 UTC
I tried the attached test gfapi program with the 3.4.0.32rhs build and with the 3.4.0.36rhs build. With the 32rhs build, the threads hang whenever I make a graph change (I enabled volume profile and changelog). With the 36rhs build, it doesn't hang and runs to completion.

Moving the bug to verified.


With 32rhs build...

[root@skywalker examples]# ./glfsxmp slave falcon
Joining thread 0 ... done
Joining thread 1 ... ^C

It hangs and has to be interrupted manually with Ctrl-C.


With 36rhs build...

Joining thread 38 ... done
Joining thread 39 ... done
The program completed successfully
Total number of lstats successfully executed: 360000

Comment 13 krishnan parthasarathi 2013-11-15 03:31:50 UTC
Kevin,
A migration event is the (internal) transition that happens when a volume's configuration is changed, e.g. when a volume option is set. It can be described as the point at which file operations on the volume begin to see the configuration change that has happened in the meantime.

Hope that helps,
Krish
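
The transition described in the comment above can be pictured with a toy model. This is only an illustration of the visibility rule, not how glusterfs or libgfapi implements a graph switch. The types struct graph and struct volume and the functions volume_init, op_begin and graph_switch are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one volume configuration ("graph"). */
struct graph {
    int id;
    int perf_xlator_on;
};

/* Hypothetical volume handle: a lock plus the currently active graph. */
struct volume {
    pthread_mutex_t lock;
    struct graph *active;
};

void volume_init(struct volume *v, struct graph *initial)
{
    assert(v != NULL && initial != NULL);
    pthread_mutex_init(&v->lock, NULL);
    v->active = initial;
}

/* Every file operation picks up whichever graph is active when it starts. */
struct graph *op_begin(struct volume *v)
{
    pthread_mutex_lock(&v->lock);
    struct graph *g = v->active;
    pthread_mutex_unlock(&v->lock);
    return g;
}

/* A configuration change publishes a new graph and returns the old one. */
struct graph *graph_switch(struct volume *v, struct graph *next)
{
    pthread_mutex_lock(&v->lock);
    struct graph *old = v->active;
    v->active = next;
    pthread_mutex_unlock(&v->lock);
    return old;
}
```

In this model an operation that called op_begin before graph_switch keeps working against the old graph, while any operation started afterwards sees the new one. The real library has more to handle than this.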

Comment 14 errata-xmlrpc 2013-11-27 15:43:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html