Created attachment 1055966
patch to fix the problem.
Description of problem:
At Facebook we had an app that started hanging and crashing weirdly when going from glibc-2.12-1.149.el6.x86_64 to glibc-2.12-1.163.el6.x86_64. It turns out this patch introduced the problem.
You added the following bit to _int_malloc():
+  /* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
+     mmap.  */
+  if (__glibc_unlikely (av == NULL))
+    {
+      void *p = sYSMALLOc (nb, av);
+      if (p != NULL)
+        alloc_perturb (p, bytes);
+      return p;
+    }
But this isn't OK: alloc_perturb unconditionally memsets the returned memory (to 0xff when perturb_byte is 0), unlike upstream, where it checks whether perturb_byte is set. This needs to be changed to
      if (p != NULL && __builtin_expect (perturb_byte, 0))
        alloc_perturb (p, bytes);
The patch I've attached fixes the problem for me.
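To make the difference concrete, here is a minimal standalone sketch. It assumes alloc_perturb is the old unguarded macro form described above (memset with (perturb_byte ^ 0xff) & 0xff); perturb_byte here is a local stand-in for malloc's tunable, calloc stands in for zero-filled mmap memory, and fallback_unguarded/fallback_guarded are hypothetical names, not glibc functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for malloc's perturb_byte (set via M_PERTURB or
   MALLOC_PERTURB_).  0 means perturbing is disabled, the default. */
static int perturb_byte = 0;

/* Assumed old-style macro: it does NOT test perturb_byte itself, so every
   caller must guard it.  With perturb_byte == 0 it fills with 0xff. */
#define alloc_perturb(p, n) memset (p, (perturb_byte ^ 0xff) & 0xff, n)

/* Hypothetical model of the backported hunk: perturbs whenever p != NULL. */
static void *
fallback_unguarded (size_t bytes)
{
  unsigned char *p = calloc (1, bytes);   /* stands in for zeroed mmap memory */
  assert (p != NULL);
  if (p != NULL)
    alloc_perturb (p, bytes);
  return p;
}

/* Hypothetical model of the fix: only perturb when perturb_byte is set. */
static void *
fallback_guarded (size_t bytes)
{
  unsigned char *p = calloc (1, bytes);
  assert (p != NULL);
  if (p != NULL && __builtin_expect (perturb_byte, 0))
    alloc_perturb (p, bytes);
  return p;
}
```

With perturbing off, the unguarded version hands back a buffer full of 0xff, while the guarded one leaves the memory untouched.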
This problem is exacerbated by the fact that any sort of lock contention on the arenas results in us falling back on mmap()'ing a new chunk. When looking for an uncontended arena, we check whether it is corrupt; if it is, we keep looping, and if we loop back to the beginning we conclude we didn't find anything. But even when our initial arena isn't actually corrupt, we still return NULL, so we fall back on this mmap() path far more often than we should, which makes things really unstable.
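The loop described above can be modeled roughly as follows. This is a hypothetical, heavily simplified sketch based only on the description in this report: struct arena, pick_arena_buggy and pick_arena_fixed are illustrative names, and locking and any "arena to avoid" logic are omitted, so it is not glibc's actual arena-selection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal arena: only what the selection loop needs. */
struct arena
{
  bool corrupt;
  struct arena *next;   /* arenas form a circular list */
};

/* Buggy shape: "back where we started" is treated as failure, even when
   the starting arena was usable and the loop never advanced. */
static struct arena *
pick_arena_buggy (struct arena *begin)
{
  struct arena *result = begin;
  while (result->corrupt)
    {
      result = result->next;
      if (result == begin)
        break;
    }
  if (result == begin)
    return NULL;          /* wrong when begin itself is fine */
  return result;
}

/* Fixed shape: fail only if the arena we stopped on is itself corrupt,
   i.e. we really wrapped around without finding a usable one. */
static struct arena *
pick_arena_fixed (struct arena *begin)
{
  struct arena *result = begin;
  while (result->corrupt)
    {
      result = result->next;
      if (result == begin)
        break;
    }
  if (result->corrupt)
    return NULL;
  return result;
}
```

In the buggy shape a perfectly healthy first arena yields NULL, which is what pushes callers onto the mmap() fallback.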
Please get this fixed as soon as possible; I'd even go so far as to call it a possible security issue.
(In reply to Josef Bacik from comment #0)
> Created attachment 1055966
> patch to fix the problem.
> Description of problem:
> At Facebook we had an app that started hanging and crashing weirdly when
> going from glibc-2.12-1.149.el6.x86_64 to glibc-2.12-1.163.el6.x86_64.
Please note that there is already a RHEL 6.7.z errata that fixes this, and it was released two days ago:
Please update to glibc-2.12-1.166.el6_7.1.
One question: when you write "glibc-2.12-1.163.el6.x86_64", do you actually mean "glibc-2.12-1.166.el6.x86_64" (note .166, not .163)?
Lastly, the robust malloc support has been backed out for the release, but we plan to put it back in as soon as we are certain we've corrected the remaining issues. Would you be interested in testing an unsupported non-production build with the new feature?
We're on CentOS, not RHEL; we just happened to end up with the .163 release (I'm not sure how) before 6.7 was released. Give me whatever package you want me to test. We don't care about unsupported; obviously we are capable of supporting ourselves ;). I do need to have an src.rpm though, so I can build and test it on our systems and verify that the issue I was seeing is actually fixed.
(In reply to Josef Bacik from comment #5)
> We're on Centos, not RHEL, we just happened to end up with the .163 release
> (I'm not sure how) before 6.7 was released. Give me whatever package you
> want me to test, we don't care about unsupported, obviously we are capable
> of supporting ourselves ;). I do need to have an src.rpm tho so I can build
> and test it on our systems and verify the issue I was seeing is actually
> fixed.
Sounds good. We'll get you something when we're ready. Thanks for agreeing to test :-)
Removing the "already fixed in 6.7.z" from the title because it confused me the couple of times I read it.
This issue has been addressed in the following products:
Red Hat Enterprise Linux 6
Via RHBA-2015:1465 https://rhn.redhat.com/errata/RHBA-2015-1465.html
The SRPM is available here: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/6Server/en/os/SRPMS/glibc-2.12-1.166.el6_7.1.src.rpm