Created attachment 555173
Description of problem:
With the new glibc 2.12-1.47 installed, every inverse Fourier transform (using FFTW2) of an image performed by this package is corrupted. FFTW2 on its own produces correct results under the new glibc. The older glibc-2.12-1.25 is not affected, and reverting to it solves the problem. Fedora 16's glibc 2.14.90-24 has similar issues.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Perform a Fourier transform followed by an inverse transform with EMAN 1.8.
2. Look at the resulting image: the bottom-left quarter is corrupted.
3. See the attached screenshot of the corrupted image.
Created attachment 555174
Related package to compile
Requires boost_python, gsl and fftw2.
Created attachment 555175
Data file to test the package
Compile the package, then run:
from EMAN import *
The displayed image will be corrupted; it is expected to be identical to a:
Unable to build the package:
CMake already installed (no version check performed)
sh: bjam: command not found
--2012-01-16 12:25:59-- http://superb-west.dl.sourceforge.net/sourceforge/boost/boost-jam-3.1.14-1-linuxx86.tgz
Resolving superb-west.dl.sourceforge.net... 22.214.171.124
Connecting to superb-west.dl.sourceforge.net|126.96.36.199|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-01-16 12:26:00 ERROR 404: Not Found.
tar (child): boost-jam-3.1.14-1-linuxx86.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
mv: cannot stat `boost-jam-3.1.14-1-linuxx86/bjam': No such file or directory
--2012-01-16 12:26:00-- http://easynews.dl.sourceforge.net/sourceforge/boost/boost_1_34_1.tar.bz2
Resolving easynews.dl.sourceforge.net... 188.8.131.52
Connecting to easynews.dl.sourceforge.net|184.108.40.206|:80... failed: Connection timed out.
(adjust parameters according to your system, then press <c> to config, <g> to generate Makefile)
make -j <n>
Any clue about this?
I haven't had time to do anything on it over the last few weeks because of security errata and other time-critical issues.
That's OK; for now I'm staying downgraded to 2.12-1.25 and working with that. I may have some time to debug it myself too.
Rest assured it's not forgotten. I expect Alex or I will be able to dig into it shortly.
I'm afraid I couldn't duplicate the problem. Here's what I did:
On a pristine 6.1/x86_64 system, I installed qt3-devel, python-devel, boost-python, boost-python-devel and mpi-devel (mvapich-devel actually), all from that release. I also installed fftw2-devel-2.1.5-21.el6, from EPEL6. I took gsl 1.15-3.el6 sources from Fedora 16, rebuilt (atlas-devel required) and installed. I had to make links from python2.5 to python2.6 for ccmake to find python headers and libraries.
After the build succeeded, I installed it, and set:
export LD_LIBRARY_PATH=$HOME/EMAN/lib:/usr/lib64/mvapich/lib \
Then I could run the Python snippet in comment 2, and (as expected, given glibc 1.25) displaying a and c produced the same image.
Then I upgraded glibc, glibc-devel, glibc-headers and glibc-common to 1.47 and ran the script again. I got the same images.
Can you please confirm that this is the procedure you used to trigger the problem? It might be processor-dependent; e.g., I ran this on an Opteron processor, but I didn't enable any of the Opteron-specific options in ccmake. Nothing jumped out at me in the glibc changelogs either.
Good point. Mine is all Intel (Xeon 55xx).
I can try recompiling. It was compiled with gcc 4.4.4 using -Wno-deprecated -w -fpermissive -march=core2 -m64 -mfpmath=sse -ffast-math -mssse3 -O3 -funroll-loops -pipe.
What confuses me is why the new glibc apparently breaks processor-dependent code that was previously compiled with gcc. Maybe register assignment or the function calling convention? The pattern in the screenshot (attachment 555173) looks like a memory issue or an SSE (SIMD) issue.
To answer your more recent question, Peng: glibc provides shared libraries that executables link against dynamically. Thus, when a glibc update is installed, existing applications load the new shared libraries at run time instead of the old ones, without being recompiled.
Have you tried running your application under valgrind (valgrind --tool=memcheck)?
In particular, I'm wondering whether it calls memcpy with overlapping arguments. glibc-2.12-1.47 includes a faster memcpy implementation for certain processors; however, that memcpy is more sensitive to programmer errors such as overlapping arguments.
I will do the check. Meanwhile, 1.47.el6_2.5 still has the problem.
So is overlap of src and dest a known caveat in POSIX C? In other words, if this is the intended behavior, I will just fix my program and close this bug. It does overlap:
line 6882, libEM/EMDataA.C
Thanks for your suggestion.
Please close the bug, since I cannot. In short, the new version of glibc is more sensitive to overlapping src and dest in memcpy. Changing memcpy to memmove fixes the EMAN bug.
An overlap of src & dest is a violation of the ANSI/ISO/POSIX specs and results in undefined behaviour.
I'm going to keep this open for now as we are considering reverting to the prior behaviour of memcpy for the duration of the RHEL 6 lifecycle. We'll use this bug to track the issue [summary updated to reflect the root cause].
However, the right thing to do when src & dest overlap is certainly to use memmove. So if you make that change, EMAN will be fixed regardless of whether or not we restore the prior memcpy behaviour.
Thanks for your help in identifying this issue. Your comments in c#11 came at just the right time for me to make the connection between your problem and the change in memcpy behaviour.
Thanks for your work. Now that I have changed the memcpy to memmove, we can live with whatever decision you make about glibc.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
If the solution does not work for you, open a new bug report.