Bug 1021808
| Summary: | gfapi - hang when heavily multithreaded app attempts parallel reads | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Anand Avati <aavati> |
| Component: | glusterfs | Assignee: | krishnan parthasarathi <kparthas> |
| Status: | CLOSED ERRATA | QA Contact: | M S Vishwanath Bhat <vbhat> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 2.1 | CC: | aavati, chrisw, grajaiya, kcleveng, kparthas, mzywusko, nsathyan, shaines, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Fixed In Version: | glusterfs-3.4.0.36rhs-1 | Doc Type: | Bug Fix |
| Doc Text: | Previously, when an application performed parallel file operations from multiple threads on a volume through the libgfapi interface, a change in the volume configuration (for example, a volume set operation) could cause the file operations to hang. With this update, applications no longer experience this hang. | | |
| Story Points: | --- | | |
| Last Closed: | 2013-11-27 15:43:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
Description
Anand Avati
2013-10-22 07:11:56 UTC
Description of problem:
When a heavily multithreaded application attempts reads from many threads concurrently, and a migration event happens in the meantime, many threads are found hung in gdb.

Version-Release number of selected component (if applicable): 2.1

How reproducible: Most of the time.

Steps to verify:
Write a simple gfapi test program which spawns many threads (20-30) and performs simultaneous operations (such as glfs_stat (fs, "/filename")) from multiple threads. While the threads are busy doing so, make a volume change that results in a graph switch (such as turning a perf xlator on or off). Without the patch, some of the threads will hang forever. With the patch, all threads should continue their loops.

Created attachment 816776 [details]
test program to do simultaneous reads from different threads.
I wrote a test gfapi program which spawns 40 threads, and each thread performs glfs_lstat 8000 times. All the threads do lstat on the same file. However, I could not reproduce the issue with the rhs-2.1 bits (3.4.0.32rhs). The behaviour of the test program appears to be the same in the 32rhs and 36rhs builds. I have attached the program I wrote; please point out any mistakes I may have made.

Created attachment 817199 [details]
test program to reproduce hang
Check the attached program. It reproduces the hang (on the 2.1 GA release) nearly 100% of the time. With the fix, the hang never happens.
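A graph switch of the kind used to reproduce this hang can be triggered from the gluster CLI while the test program is running. A minimal sketch, assuming a volume named `testvol` (the volume name is a placeholder; the option names are standard gluster volume options):

```shell
# Either of these changes the client-side graph while I/O is in flight:
gluster volume set testvol performance.io-cache off   # toggle a perf xlator
gluster volume profile testvol start                  # enable volume profiling
```

On an unfixed build, running either command while the reproducer's threads are looping should leave some threads hung; on a fixed build all threads run to completion.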
I tried the attached gfapi test program with the 3.4.0.32rhs build and with the 3.4.0.36rhs build. With the 32rhs build, the threads hang whenever I make a graph change (enabling volume profile and changelog). With the 36rhs build the program does not hang and runs to completion. Moving the bug to verified.

With the 32rhs build:

    [root@skywalker examples]# ./glfsxmp slave falcon
    Joining thread 0 ... done
    Joining thread 1 ... ^C

It hangs and has to be Ctrl-C'ed manually.

With the 36rhs build:

    Joining thread 38 ... done
    Joining thread 39 ... done
    The program completed successfully
    Total number of lstats successfully executed: 360000

Kevin,

A migration event is the (internal) transition that happens when a volume's configuration is changed, for example when a volume option is set. It can be described as the point at which file operations on the volume begin to perceive the configuration change that has happened in the meantime.

Hope that helps,
Krish

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html