Description of problem:
getpwuid_r sometimes hangs when called. The problem likely occurs when two processes call the function concurrently, but I have been unable to get clear evidence of this.

Version-Release number of selected component (if applicable):
glibc-2.7-2.i686

How reproducible:
Occasionally, when called frequently from multiple pthread threads.

Steps to Reproduce:
This is a bit hard, as it happens during testing of autofs when using the Connectathon suite. I'll need to try to develop something simpler to replicate the problem.
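A simpler stand-alone test might start from something like the sketch below. It is only an illustration of "frequently called by multiple pthread threads", not the autofs code, and it is not known to trigger the hang. The names run_getpwuid_stress, stress_worker, MAX_THREADS and PW_BUF_SIZE are made up for this example, and the buffer size is an assumption.

```c
#include <pthread.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 64      /* hypothetical upper bound for this sketch */
#define PW_BUF_SIZE 16384   /* assumed large enough for one passwd entry */

struct stress_arg {
    int iterations;   /* lookups this thread performs */
    int failures;     /* lookups that returned a non-zero error code */
};

/* Each thread repeatedly looks up the calling user's passwd entry. */
static void *stress_worker(void *p)
{
    struct stress_arg *arg = p;
    char buf[PW_BUF_SIZE];
    struct passwd pw, *result;
    uid_t uid = getuid();

    for (int i = 0; i < arg->iterations; i++) {
        if (getpwuid_r(uid, &pw, buf, sizeof(buf), &result) != 0)
            arg->failures++;
    }
    return NULL;
}

/*
 * Start nthreads threads that all call getpwuid_r concurrently.
 * Returns the total number of failed lookups, or -1 if the arguments
 * are out of range or a thread could not be created.
 */
int run_getpwuid_stress(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct stress_arg args[MAX_THREADS];
    int started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].failures = 0;
        if (pthread_create(&tids[i], NULL, stress_worker, &args[i]) != 0)
            break;
        started++;
    }

    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].failures;
    }
    return started == nthreads ? total : -1;
}
```

Calling run_getpwuid_stress(8, 100000) from main() and building with -pthread would give a process to attach gdb to if one of the threads stops making progress.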
Created attachment 310108 [details] gdb stack trace of hung process
Also, I found that adding a mutex to bracket the get*_r calls I make causes the problem to go away.
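For reference, bracketing a get*_r call with a mutex looks roughly like the sketch below. This is not the actual autofs change; locked_getpwuid_r and pw_lookup_mutex are names invented for the example, and it assumes every lookup in the program goes through the wrapper.

```c
#include <assert.h>
#include <pthread.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* One process-wide lock serialising the passwd lookups (hypothetical name). */
static pthread_mutex_t pw_lookup_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Same contract as getpwuid_r, but only one thread at a time is allowed
 * inside the libc call.
 */
int locked_getpwuid_r(uid_t uid, struct passwd *pwd, char *buf,
                      size_t buflen, struct passwd **result)
{
    int rc;

    /* Locking a default, statically initialised mutex is not expected to fail. */
    rc = pthread_mutex_lock(&pw_lookup_mutex);
    assert(rc == 0);

    rc = getpwuid_r(uid, pwd, buf, buflen, result);

    pthread_mutex_unlock(&pw_lookup_mutex);
    return rc;
}
```

The lock is released before the wrapper returns, so the caller must still supply its own pwd and buf per thread, exactly as with the plain getpwuid_r call.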
(In reply to comment #2)
> Also I found that adding a mutex to bracket the get*_r calls
> I make causes the problem to go away.

Oh .. now I've seen this with the mutex in the autofs code as well. Is this something I'm doing wrong, or is there something amiss with the pthread locking code?

Ian
There are no known problems with the locking code, and I rate the likelihood that there is one as extremely low. The code is used by hundreds of thousands of programs.

The libc-internal locking can potentially be thrown off if the program doesn't know that more than one thread is running. This can really only happen if you're using clone() directly (as opposed to pthread_create) or if the memory location storing that information is corrupted. The effect would be that non-atomic operations are used for various locks instead of atomic ones.

This does not affect the pthread_* functions themselves, though. If you see problems with them as well, it is something else.
(In reply to comment #4)
> There are no known problems with the locking code, and I rate the likelihood
> that there is one as extremely low. The code is used by hundreds of thousands
> of programs.

Yes, I agree.

>
> The libc-internal locking can potentially be thrown off if the program doesn't
> know that more than one thread is running. This can really only happen if
> you're using clone() directly (as opposed to pthread_create) or if the
> memory location storing that information is corrupted. The effect would be
> that non-atomic operations are used for various locks instead of atomic ones.
>
> This does not affect the pthread_* functions themselves, though. If you see
> problems with them as well, it is something else.

I'm only using pthread_* functions, and I do see problems occasionally, but I'm unable to duplicate them in any reasonably simple way.

Oddly enough, after a yum update this particular problem seems to have gone away. But that could also be due to corrections that I've made.

Ian