Description of problem:
While running OpenMP applications compiled with the Intel compiler, I noticed that the OpenMP library sporadically detects the wrong number of CPUs. When I raised this with Intel, their support told me that their library uses the following call to determine the number of CPUs: "sysconf(_SC_NPROCESSORS_ONLN)".

While investigating, I found that this function calls a function named 'get_nprocs()' to retrieve the number of CPUs. A small test program that calls 'get_nprocs()' directly shows the same problem seen with the Intel OpenMP libraries.

Searching the net, I discovered that this problem was reported in March 2010 and fixed in April 2010. The problem, its cause and its solution are detailed here:
http://sourceware.org/bugzilla/show_bug.cgi?id=11432
The fix itself is here:
http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=3ed8e241229e370cca96650ed727f09838c51d67
And this message shows when the patch was applied to glibc-2.11-311:
http://cygwin.com/ml/glibc-cvs/2010-q2/msg00003.html

HOWEVER, RHEL 5 uses glibc-2.5. In fact, the problem only appears in later RHEL 5 update releases: it does not exist in RHEL 5.3, but does show up in RHEL 5.5, 5.6 and the 5.7 beta. It appears that a patch to getsysstat.c has been backported to glibc-2.5; the patch file in the source RPM is named 'glibc-expmalloc4.patch'. Unfortunately, this new version of getsysstat.c exhibits the problem described above, and the patch that fixed the problem was not backported along with it. As a result, the version of 'get_nprocs()' used by RHEL 5.5 and later (including the current 5.7 beta) still has this bug.

TO SUMMARIZE: This is a known bug that was fixed more than a year ago, but the fix was never ported to the glibc-2.5 used in the latest RHEL 5 versions.
Version-Release number of selected component (if applicable):
glibc-2.5-49, glibc-2.5-58, glibc-2.5-65

How reproducible:
This can happen on a system with 160 or more CPUs. On a system with 768 CPUs, I ran into the problem at least 30% of the time with any OpenMP application compiled with the Intel compiler.

Steps to Reproduce:
1. Use a system with a large number of CPUs (160 may be enough, but use at least 192 to be certain).
2. Compile an OpenMP application (e.g. STREAM) with Intel compiler 11.
3. Set KMP_AFFINITY=compact,verbose.
4. Run the application repeatedly and check the number of CPUs detected, as reported by Intel's OpenMP. When it differs from the correct value, the cause is this bug in 'get_nprocs()'.

Actual results:
Intel's OpenMP may report the number of CPUs as:
OMP: Info #156: KMP_AFFINITY: 157 available OS procs
and at other times as:
OMP: Info #156: KMP_AFFINITY: 487 available OS procs

Expected results:
Intel's OpenMP should report the number of CPUs detected as:
OMP: Info #156: KMP_AFFINITY: 768 available OS procs

Additional info:
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Quality Engineering Management has reviewed and declined this request. You may appeal this decision by reopening this request.