Description of problem:
When a heavily multithreaded application issues reads from many threads concurrently and a migration event happens, many of the threads are found to be hung when inspected in gdb.

Version-Release number of selected component (if applicable):
2.1

How reproducible:
Most of the time
Please provide steps to verify.
Write a simple gfapi test program which spawns many threads (20-30) and performs the same operation (like glfs_stat (fs, "/filename")) simultaneously from all of them. While the threads are busy doing so, make a volume change which results in a graph switch (like turning some perf xlator on or off). Without the patch, some of the threads will hang forever. With the patch, all threads should continue their loop.
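The steps above can be sketched roughly as follows. This is a minimal harness under stated assumptions, not the attached test program: the names `run_workers`, `stub_op`, `stat_op`, `MAX_THREADS` and the `USE_GFAPI` build switch are made up for illustration, and the header path `<glusterfs/api/glfs.h>` assumes an installed gfapi development package. The stub operation lets the harness run without a volume; the gfapi variant is only compiled when `USE_GFAPI` is defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>

#ifdef USE_GFAPI
#include <glusterfs/api/glfs.h> /* assumption: installed gfapi header path */
#endif

#define MAX_THREADS 64 /* hypothetical upper bound for this sketch */

/* Operation that every worker repeats; returns 0 on success. */
typedef int (*op_fn)(void *ctx);

struct worker {
    pthread_t tid;
    op_fn op;
    void *ctx;
    long iterations;
    long ok; /* successful calls, written only by the owning thread */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (long i = 0; i < w->iterations; i++)
        if (w->op(w->ctx) == 0)
            w->ok++;
    return NULL;
}

/* Spawn nthreads workers, join them all, and return the total number of
 * successful operations, or -1 if a thread could not be created. */
long run_workers(int nthreads, long iterations, op_fn op, void *ctx)
{
    struct worker w[MAX_THREADS];
    long total = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        w[i] = (struct worker){ .op = op, .ctx = ctx,
                                .iterations = iterations, .ok = 0 };
        if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0) {
            for (int j = 0; j < i; j++) /* clean up threads already started */
                pthread_join(w[j].tid, NULL);
            return -1;
        }
    }
    for (int i = 0; i < nthreads; i++) {
        printf("Joining thread %d ... ", i);
        fflush(stdout);
        pthread_join(w[i].tid, NULL); /* a hung worker blocks here forever */
        printf("done\n");
        total += w[i].ok;
    }
    return total;
}

/* Stand-in operation so the harness can run without a volume. */
int stub_op(void *ctx)
{
    (void)ctx;
    return 0;
}

#ifdef USE_GFAPI
/* ctx is a glfs_t * that was already set up with glfs_new(),
 * glfs_set_volfile_server() and glfs_init(). */
int stat_op(void *ctx)
{
    struct stat st;
    return glfs_stat((glfs_t *)ctx, "/filename", &st);
}
#endif
```

To exercise a real volume, one would build with `-DUSE_GFAPI`, link against gfapi, initialise a `glfs_t *fs`, and call something like `run_workers(30, 8000, stat_op, fs)` while changing a volume option from another shell. If a worker hangs, the "Joining thread N ..." line never prints "done".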
Created attachment 816776
test program to do simultaneous reads from different threads.
I wrote a test gfapi program which spawns 40 threads, and each thread calls glfs_lstat 8000 times. All the threads do lstat on the same file. But I couldn't reproduce the issue in the rhs-2.1 bits (3.4.0.32rhs). The behaviour of the test program seems to be the same in the 32rhs build and the 36rhs build. I have attached the program I wrote. Please point out to me if I have made any mistakes.
Created attachment 817199
test program to reproduce hang

Check the attached program. It reproduces the hang (on the 2.1 GA release) pretty much 100% of the time. With the fix, the hang never happens.
I ran the attached test gfapi program with the 3.4.0.32rhs build and with the 3.4.0.36rhs build. With the 32rhs build, a thread hangs whenever I make a graph change (enabling volume profile and changelog). With the 36rhs build it doesn't hang and runs to completion. Moving the bug to verified.

With 32rhs build...
[root@skywalker examples]# ./glfsxmp slave falcon
Joining thread 0 ... done
Joining thread 1 ... ^C

It hangs and has to be interrupted manually with Ctrl-C.

With 36rhs build...
Joining thread 38 ... done
Joining thread 39 ... done
The program completed successfully
Total number of lstats successfully executed: 360000
Kevin,

A migration event is the (internal) transition that happens when a volume's configuration is changed, e.g. when a volume option is set. It is the point at which file operations on the volume begin to see the configuration change that was made in the meantime.

Hope that helps,
Krish
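As a rough illustration of the idea, here is a toy model of such a transition. It is not GlusterFS code and says nothing about how the real graph switch is implemented: `struct graph`, `struct volume`, `volume_switch_graph`, `volume_do_op` and the `io_cache_enabled` field are all hypothetical names. Each operation picks up whichever configuration snapshot is active when it starts, so operations issued after a change run on the new one.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: one immutable snapshot of a volume's configuration. */
struct graph {
    int generation;       /* increases with every configuration change */
    int io_cache_enabled; /* made-up example option */
};

struct volume {
    pthread_mutex_t lock;
    const struct graph *active; /* snapshot that new operations will use */
};

/* Configuration change: publish a new snapshot for later operations. */
void volume_switch_graph(struct volume *v, const struct graph *next)
{
    pthread_mutex_lock(&v->lock);
    v->active = next;
    pthread_mutex_unlock(&v->lock);
}

/* A file operation resolves the active snapshot when it starts and
 * returns the generation it ran on. */
int volume_do_op(struct volume *v)
{
    pthread_mutex_lock(&v->lock);
    const struct graph *g = v->active;
    pthread_mutex_unlock(&v->lock);
    return g->generation;
}
```

In this model, an operation started before `volume_switch_graph` reports the old generation and one started afterwards reports the new one, which is the sense in which operations "migrate" to the changed configuration.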
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html