Bug 691336

Summary: memcpy acts randomly (and differently) with overlapping areas
Product: [Fedora] Fedora Reporter: Felipe Contreras <felipe.contreras>
Component: glibcAssignee: Andreas Schwab <schwab>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 14CC: alexander.hunt2005, fweimer, jakub, mail2benny, martin, mathieu-acct, rich, schwab, tmraz
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-28 12:03:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Felipe Contreras 2011-03-28 09:54:32 UTC
Description of problem:
glibc changed the behavior of memcpy which used to work like memmove, but now it breaks certain use-cases. Considering it's impossible to check all the millions of lines of code possibly affected, and that produces _zero_ gain, it's hard to tout in favor of this change.

Additional info:
Here's the upstream bug:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=12518

In the words of Linus Torvalds:

I realize that it's technically "undefined", but the behavior has changed, and
in the process broken existing binaries. It's a common bug to use memcpy()
instead of memmove(), and the traditional behavior of "copy upwards" means that
that bug can go unnoticed for a *long* time if the memory move moves things
downwards.

The change was introduced in commit 6fb8cbcb58a29fff73eb2101b34caa19a7f88eba
("Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7"), which was part
of the 2.13 release.

As a result, upgrading from Fedora-13 to Fedora-14 caused applications like
flash to fail. But it's a bug that has been reported for other applications too
(and it's a bug that I've personally introduced in 'git' too - happily, people
had run things like valgrind, so I don't know of any _current_ cases of it).

And there is absolutely _zero_ reason to not handle overlapping areas
correctly. The "undefined behavior" is due to historical implementation issues,
not because it's better than just doing the right thing. 

And now applications will randomly do different things depending on the phase
of the moon (technically, depending on which CPU they have and what particular
version of memcpy() glibc happens to choose).

So from a pure Quality-of-Implementation standpoint, just make glibc implement
memcpy() as memmove(), if you don't want to re-instate the traditional behavior
that binaries have at least been tested with. Don't break random existing
binaries just because the glibc version changed, and they had never been tested
with the new random behavior!

Comment 1 Jakub Jelinek 2011-03-28 09:58:41 UTC
"glibc changed the behavior of memcpy which used to work like memmove"
is complete nonsense, glibc memcpy never worked like memmove.

Comment 2 Felipe Contreras 2011-03-28 10:08:48 UTC
(In reply to comment #1)
> "glibc changed the behavior of memcpy which used to work like memmove"
> is complete nonsense, glibc memcpy never worked like memmove.

At least in certain situations that memcpy used to work, now only memmove works.

Comment 3 Andreas Schwab 2011-03-28 11:08:30 UTC
Of course, this is not a bug.

Comment 4 Felipe Contreras 2011-03-28 11:17:39 UTC
(In reply to comment #3)
> Of course, this is not a bug.

Care to explain why?

Ulrich Drepper and Linus Torvalds agree it is. Do you claim to know more than they do regarding glibc?

Have you actually tested all the millions of lines of code possibly affected?

Comment 5 Andreas Schwab 2011-03-28 11:29:50 UTC
This is certainly not a bug in any way.

Comment 6 Jakub Jelinek 2011-03-28 11:31:41 UTC
Why do you think Ulrich Drepper agrees it is a bug?
http://sourceware.org/bugzilla/show_bug.cgi?id=12518#c4
says quite contrary.  The fact that memcpy has undefined behavior with overlapping memory areas and memmove has defined behavior is the only difference in between those two functions, the undefined behavior allows both the implementation and the compiler to optimize based on assumption that it is never called with overlapping areas and both the implementation in glibc is done in that way and gcc optimizes using that assumption.

What Ulrich has agreed to is for broken apps that misuse memcpy glibc might have a workaround, having memcpy.5 map to memmove and only memcpy@@GLIBC_2.14 on x86_64 be a real memcpy.  That is an upstream issue, so I have no idea why you are creating a Fedora bug about it, that is definitely nothing that can be backported.

Comment 7 Felipe Contreras 2011-03-28 11:44:48 UTC
(In reply to comment #5)
> This is certainly not a bug in any way.

I repeat. Care to explain why?

Ulrich Drepper and Linus Torvalds agree it is. Do you claim to know more than
they do regarding glibc?

Have you actually tested all the millions of lines of code possibly affected?

(In reply to comment #6)
> Why do you think Ulrich Drepper agrees it is a bug?

Because it's still open in upstream.

> The fact that memcpy has undefined behavior with
> overlapping memory areas and memmove has defined behavior is the only
> difference in between those two functions, the undefined behavior allows both
> the implementation and the compiler to optimize based on assumption that it is
> never called with overlapping areas and both the implementation in glibc is
> done in that way and gcc optimizes using that assumption.

The check for overlap is a couple of cycles. That is not reason enough to break applications. But anyway, that's for upstream to decide.

> What Ulrich has agreed to is for broken apps that misuse memcpy glibc might
> have a workaround, having memcpy.5 map to memmove and only
> memcpy@@GLIBC_2.14 on x86_64 be a real memcpy.  That is an upstream issue, so I
> have no idea why you are creating a Fedora bug about it, that is definitely
> nothing that can be backported.

Of course you can backport the patch. Or mark this as fixed when the upstream bug is resolved.

In the meantime, this remains an issue in Fedora.

Comment 8 Jakub Jelinek 2011-03-28 11:48:42 UTC
New symbol versions can't be introduced into older glibcs, so when F14 ships with glibc 2.13, it can't have GLIBC_2.14 symbols.  The only way would be if, after glibc 2.14 is released, the whole glibc 2.14 would be shipped as update, but that is not desirable.

Comment 9 Felipe Contreras 2011-03-28 11:59:15 UTC
(In reply to comment #8)
> New symbol versions can't be introduced into older glibcs, so when F14 ships
> with glibc 2.13, it can't have GLIBC_2.14 symbols.  The only way would be if,
> after glibc 2.14 is released, the whole glibc 2.14 would be shipped as update,
> but that is not desirable.

The current patch is for GLIBC_2.13, you think it should be GLIBC_2.14, and that's probably the case, but nobody in upstream has said anything regarding that. Besides, even if upstream goes for GLIBC_2.14, Fedora can go hack it up for GLIBC_2.13, and this bug would be fixed (somewhat).

If you don't want to do that, that doesn't mean the bug is not present (NOTABUG), that means you don't want to fix the bug (WONTFIX).

And upstream still hasn't decided what fix to apply, so for some strange miracle, Ulrich might pick Linus's patch, or something like that.

Comment 10 Andreas Schwab 2011-03-28 12:03:27 UTC
There is no bug to fix.

Comment 11 Felipe Contreras 2011-03-28 16:35:56 UTC
(In reply to comment #10)
> There is no bug to fix.

Yes there is:
http://sourceware.org/bugzilla/show_bug.cgi?id=12518

And since you didn't answer, I repeat. Care to explain why you think it's not a bug while Ulrich Drepper and Linus Torvalds agree it is?

Have you actually tested all the millions of lines of code possibly affected?

Mark this as WONTFIX if you don't plan to fix this issue accepted by upstream as a bug.

Actually, I have an idea, I would recompile glibc to print a warning when memcpy is used wrongly, and run my F14 system as usual for a few days, that should give an idea of the magnitude of this bug.

That's something Fedora people should be doing, but I guess you don't care about the reliability of people's machines.

Comment 12 Jakub Jelinek 2011-03-28 16:53:53 UTC
It is not a glibc bug, because all the relevant standards/man pages etc. since forever say that memcpy has undefined behavior if the memory regions overlap.
So it is not the responsibility of glibc maintainers to check all code in the wild.
You don't need to modify glibc to get diagnostics about memcpy misuses, valgrind reports this for many years and there is also
http://fedorapeople.org/gitweb?p=wcohen/public_git/memstomp;a=summary
which you can just LD_PRELOAD.
Nobody upstream accepted BZ#12518 as a bug, it is considered to be an enhancement request to workaround crappy old software.