Bug 781646

Summary: Change in memcpy behavior for overlapping arguments breaks existing applications
Product: Red Hat Enterprise Linux 6 Reporter: Peng Ge <gepeng1983>
Component: glibc Assignee: Jeff Law <law>
Status: CLOSED ERRATA QA Contact: qe-baseos-tools-bugs
Severity: high Docs Contact:
Priority: urgent    
Version: 6.1 CC: aoliva, fweimer, green, jpallich, mfranc, mishu, pmuller
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 12:09:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 799259    
Attachments:
Description Flags
Screen shot none
Related package with compile none
data file to test package none

Description Peng Ge 2012-01-14 01:06:47 UTC
Created attachment 555173 [details]
Screen shot

Description of problem:
When the new glibc 2.12-1.47 is installed, all reverse Fourier transforms (with FFTW2) of images done by this package are corrupted. FFTW2 itself reports correct results under the new glibc. The old glibc-2.12-1.25 is not affected, and reverting to it solves the problem. FC16 glibc 2.14.90-24 has similar issues.

Version-Release number of selected component (if applicable):
glibc 2.12-1.47

How reproducible:
Always

Steps to Reproduce:
1. Fourier transform followed by reverse transform with EMAN 1.8.
2. Look at the resulting image; the bottom-left quarter is corrupted.
3. See the attached screen shot for the corrupted image.
  
Actual results:


Expected results:


Additional info:

Comment 1 Peng Ge 2012-01-14 01:12:14 UTC
Created attachment 555174 [details]
Related package with compile

needs boost_python, gsl, fftw2

Comment 2 Peng Ge 2012-01-14 01:15:28 UTC
Created attachment 555175 [details]
data file to test package

To reproduce:

Compile the package,
run python

from EMAN import * 
a=EMData()
a.readImage("1111.mrc",0)
b=a.doFFT()
c=b.doIFT()
c.display()

The display will be corrupted; it is expected to be identical to a:
a.display()

Comment 4 Jeff Law 2012-01-16 19:27:36 UTC
Unable to build the package:

./autobuildsetup.py 
CMake already installed (no version check performed)
sh: bjam: command not found
--2012-01-16 12:25:59--  http://superb-west.dl.sourceforge.net/sourceforge/boost/boost-jam-3.1.14-1-linuxx86.tgz
Resolving superb-west.dl.sourceforge.net... 216.34.181.96
Connecting to superb-west.dl.sourceforge.net|216.34.181.96|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-01-16 12:26:00 ERROR 404: Not Found.

tar (child): boost-jam-3.1.14-1-linuxx86.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
mv: cannot stat `boost-jam-3.1.14-1-linuxx86/bjam': No such file or directory
--2012-01-16 12:26:00--  http://easynews.dl.sourceforge.net/sourceforge/boost/boost_1_34_1.tar.bz2
Resolving easynews.dl.sourceforge.net... 69.16.168.245
Connecting to easynews.dl.sourceforge.net|69.16.168.245|:80... failed: Connection timed out.
Retrying.

Comment 5 Peng Ge 2012-01-17 23:32:16 UTC
To Build:

mkdir build
cd build
ccmake ..
(adjust parameters according to your system, then press <c> to config, <g> to generate Makefile)
make -j <n>

Peng

Comment 6 Peng Ge 2012-02-02 03:36:40 UTC
Any clue about this?

Comment 7 Jeff Law 2012-02-02 03:54:50 UTC
Haven't had the time to do anything on it over the last few weeks due to security errata and other time critical issues.

Comment 8 Peng Ge 2012-02-03 18:14:29 UTC
That's OK, I will stay downgraded to 2.12-1.25 and keep working for now. I may have some time to debug it myself too.

Comment 9 Jeff Law 2012-02-03 18:18:08 UTC
Rest assured it's not forgotten.  I expect Alex or myself will be able to dig into it shortly.

Comment 10 Alexandre Oliva 2012-02-12 03:14:49 UTC
I'm afraid I couldn't duplicate the problem.  Here's what I did:

On a pristine 6.1/x86_64 system, I installed qt3-devel, python-devel, boost-python, boost-python-devel and mpi-devel (mvapich-devel actually), all from that release.  I also installed fftw2-devel-2.1.5-21.el6, from EPEL6.  I took gsl 1.15-3.el6 sources from Fedora 16, rebuilt (atlas-devel required) and installed.  I had to make links from python2.5 to python2.6 for ccmake to find python headers and libraries.

After the build succeeded, I installed it, and set:

export LD_LIBRARY_PATH=$HOME/EMAN/lib:/usr/lib64/mvapich/lib \
PYTHONPATH=$HOME/EMAN/lib PATH=$HOME/EMAN/bin:$PATH

then I could run the python snippet in comment 2, and (as expected, given glibc 1.25) displaying a and c produced the same image.

Then I upgraded glibc, glibc-devel, glibc-headers and glibc-common to 1.47 and ran the script again.  I got the same images.

Can you please confirm that this is the procedure you used to trigger the problem?  It might be processor-dependent.  E.g., I ran this on an Opteron processor, but I didn't enable any of the Opteron-specific options in ccmake.  Nothing jumped out at me looking at the glibc changelogs either.

Comment 11 Peng Ge 2012-02-12 09:02:37 UTC
Good point. Mine is all Intel (Xeon 55xx).

I can try recompiling. It was compiled with gcc4.44 using: -Wno-deprecated -w -fpermissive -march=core2 -m64 -mfpmath=sse -ffast-math -mssse3 -O3 -funroll-loops -pipe

What confuses me is why the new glibc apparently breaks previously gcc-compiled, processor-dependent code. Maybe register assignment or the function calling convention? The pattern in the screen shot (attachment 555173 [details]) looks like a memory issue or an SSE (SIMD) issue.

Comment 12 Jeff Law 2012-02-14 19:07:28 UTC
To answer your more recent question Peng, glibc provides shared libraries which are referenced by executables.  Thus when a glibc update is installed existing applications will get the new shared libraries rather than the old shared libraries.
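
As a rough illustration of the point above, a program can compare the glibc version it was built against with the one it is actually running on. This is only a sketch: runtime_libc_version and describe_libc are hypothetical helper names, gnu_get_libc_version() is a glibc extension, and the string it returns is normally just the upstream version (for example "2.12"), not the package release such as 1.25 or 1.47.

```c
#include <stdio.h>
#include <string.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* glibc extension: gnu_get_libc_version() */
#endif

/* Hypothetical helper: version of the C library loaded at run time. */
const char *runtime_libc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}

/* Hypothetical helper: format build-time vs run-time versions into out.
   Returns the value of snprintf. */
int describe_libc(char *out, size_t cap)
{
#ifdef __GLIBC__
    /* __GLIBC__ and __GLIBC_MINOR__ are fixed when the program is compiled;
       the runtime string comes from whichever shared library is installed. */
    return snprintf(out, cap, "built against glibc %d.%d, running on glibc %s",
                    __GLIBC__, __GLIBC_MINOR__, runtime_libc_version());
#else
    return snprintf(out, cap, "not built against glibc");
#endif
}
```

Calling describe_libc(buf, sizeof buf) and printing buf before and after a glibc update would show the same build-time numbers, while the code behind calls such as memcpy comes from the newly installed library.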

Have you tried running your application using valgrind? (valgrind --tool=memcheck)

In particular, I'm wondering if it is calling memcpy  with overlapping arguments.  -47 includes a better performing memcpy implementation for certain processors; however, that memcpy is more sensitive to programmer errors such as overlapping arguments.
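
To make the sensitivity concrete, here is a small sketch of why the result of an overlapping copy depends on how the copy walks memory. The two loops are hypothetical stand-ins for different copy strategies, not glibc's actual memcpy code, and memcpy itself is deliberately not called on overlapping memory because that is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes walking from low to high addresses. */
void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: copy n bytes walking from high to low addresses. */
void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = n; i > 0; i--)
        dst[i - 1] = src[i - 1];
}

/* Shift the first 6 bytes of "abcdefgh" two places to the right, in place,
   using the requested direction. The final buffer is written to out. */
void shift_right_demo(int backward, char out[9])
{
    unsigned char buf[9] = "abcdefgh";

    assert(out != NULL);
    if (backward)
        copy_backward(buf + 2, buf, 6);  /* source bytes are read before they are overwritten */
    else
        copy_forward(buf + 2, buf, 6);   /* already-overwritten bytes are re-read as source */
    memcpy(out, buf, sizeof buf);        /* distinct arrays, so no overlap here */
}
```

With these inputs the forward walk leaves "abababab" in the buffer, while the backward walk leaves the intended "ababcdef". A program that happened to work with one strategy can therefore break when the library switches to another.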

Comment 13 Peng Ge 2012-02-22 23:53:05 UTC
I will do the check. Meanwhile, 1.47.el6_2.5 still has the problem.

Comment 14 Peng Ge 2012-02-23 00:02:15 UTC
So is the overlap of src and dest a caveat in POSIX C? In other words, if it is desired behavior, I would just fix my program and close this bug. It does overlap:

line 6882, libEM/EMDataA.C

Thanks for your suggestion.

Peng

Comment 15 Peng Ge 2012-02-23 00:31:57 UTC
Please close the bug since I cannot. Basically, the new version of glibc is more sensitive to overlapping src and dest in memcpy. Changing memcpy to memmove fixes the EMAN bug.

Comment 16 Jeff Law 2012-02-23 03:06:18 UTC
An overlap of src & dest is a violation of the ANSI/ISO/POSIX specs and results in undefined behaviour.

I'm going to keep this open for now as we are considering reverting to the prior behaviour of memcpy for the duration of the RHEL 6 lifecycle.  We'll use this bug to track the issue [summary updated to reflect the root cause].

However, certainly the right thing to do when src & dest overlap is use memmove.  So if you make that change EMAN will be fixed regardless of whether or not we restore the prior memcpy behaviour.
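
A minimal sketch of that kind of change, assuming a buffer that is shifted within itself. This is not the EMAN code: shift_up and regions_overlap are hypothetical helpers written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when [a, a+n) and [b, b+n) share any byte.
   Addresses are compared as integers so that unrelated pointers are never
   compared with relational operators. */
int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return n != 0 && (pa < pb ? pb - pa < n : pa - pb < n);
}

/* Hypothetical helper: move the first count floats of data up by shift slots.
   The caller must ensure data has room for count + shift elements. */
void shift_up(float *data, size_t count, size_t shift)
{
    assert(data != NULL);
    /* Source and destination lie in the same array and overlap whenever
       shift < count, so memmove is used rather than memcpy. */
    memmove(data + shift, data, count * sizeof *data);
}
```

A check like regions_overlap can also be placed in a debug assertion in front of existing memcpy calls to find other call sites that need the same treatment.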

Thanks for your help in identifying this issue.  Your comments in c#11 came at just the right time for me to make the connection between your problem and the change in memcpy behaviour.

Comment 17 Peng Ge 2012-02-24 05:53:24 UTC
Thanks for your work. Now that I have patched the memcpy to memmove, we can live with whatever decision you make for glibc.

Comment 24 errata-xmlrpc 2012-06-20 12:09:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0763.html