Bug 1305406

Summary: invalid fastbin entry (free), missing glibc patch
Product: Red Hat Enterprise Linux 7 Reporter: Sumeet Keswani <sumeet.keswani>
Component: glibc    Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA QA Contact: Arjun Shankar <ashankar>
Severity: high Docs Contact:
Priority: urgent    
Version: 7.0    CC: ashankar, a.stimec, bkunal, bobby.prins, bturner, codonell, cww, dkochuka, efi, eyal, fweimer, jkachuck, jreznik, jshort, ken.verma, mcermak, mnewsome, ncroxon, olim, pfrankli, rcyriac, sumeet.keswani, tfrazier, trinh.dao, t.williams, zpytela
Target Milestone: rc    Keywords: Patch, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: glibc-2.17-121.el7 Doc Type: Bug Fix
Doc Text:
Certain code paths used by the C library's memory allocator "fastbins" feature, which is enabled by default (the mallopt option M_MXFAST is set to a non-zero value), were not thread-safe. When the unsafe code paths executed, they could corrupt the allocator's state and cause the program to crash. The thread-unsafe code paths have been made thread-safe and should no longer cause application crashes.
Story Points: ---
Clone Of:
Clones: 1313308 (view as bug list)    Environment:
Last Closed: 2016-11-03 08:30:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1203710, 1213541, 1297579, 1313308, 1364088    

Description Sumeet Keswani 2016-02-08 04:17:11 UTC
Description of problem:

On Red Hat Enterprise Linux 7.0+ the Vertica database server process may fail. 

1. Symptoms

1.1 This error appears in the <CATALOG_DIRECTORY>/dbLog file after the failure:

*** Error in `/opt/vertica/bin/vertica': invalid fastbin entry (free): 0x00007ef70f209800 ***
======= Backtrace: =========
0x7f0614f0efe1(/lib64/libc.so.6):  + 0x7cfe1
0x2a1e014(/opt/vertica/bin/vertica) CAT::TabColPair_pairToBytes2(void const*, void*, unsigned long)

1.2 The vertica.log file appears as if it was truncated at an arbitrary point, sometimes in the middle of a line.

1.3 In the core file for the failure, the following pattern appears at the top of the stack:

    raise
    abort
    __libc_message
    _int_free          <========== 
    CAT::TabColPair_pairToBytes2(void const*, void*, unsigned long)



2.0 Root cause
It appears that RHEL 7 has not taken the following glibc bug fix:
https://www.sourceware.org/bugzilla/show_bug.cgi?id=15073
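
For context, the following is a minimal, self-contained C sketch of the shape of the unpatched lockless fastbin push in _int_free(). This is not the actual glibc source: the chunk type, the chunksize/fastbin_index helpers and malloc_printerr are stubbed here, and glibc's catomic_compare_and_exchange_val_rel is modeled with C11 <stdatomic.h>. It illustrates why the size check after the CAS can fire spuriously when free() runs without the arena lock:

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for glibc's internal types and macros; the names mirror malloc.c. */
struct chunk { size_t size; struct chunk *fd; };

#define chunksize(p)      ((p)->size & ~(size_t) 0x7)        /* strip flag bits */
#define fastbin_index(sz) ((unsigned int) ((sz) >> 4) - 2)   /* 64-bit layout */

static void malloc_printerr (const char *msg)
{
  fprintf (stderr, "*** Error in `...': %s ***\n", msg);
  abort ();
}

/* Model of the UNPATCHED fast path: free() of a fastbin-sized chunk P whose
   size class is IDX, executed without holding the arena lock.  */
static void fastbin_push_unpatched (_Atomic (struct chunk *) *fb,
                                    struct chunk *p, unsigned int idx)
{
  struct chunk *old = atomic_load (fb);
  struct chunk *fd;
  unsigned int old_idx = ~0u;

  for (;;)
    {
      if (old == p)
        malloc_printerr ("double free or corruption (fasttop)");

      /* PROBLEM: OLD is dereferenced without the lock.  Between this read
         and the CAS below, another thread may pop OLD off the fastbin, hand
         it out again for a *different* size class, and free it once more.  */
      if (old != NULL)
        old_idx = fastbin_index (chunksize (old));

      p->fd = fd = old;
      if (atomic_compare_exchange_weak (fb, &fd, p))
        break;            /* P is now the fastbin top, linked in front of FD. */
      old = fd;           /* The top changed under us; retry with the new value. */
    }

  /* PROBLEM: this compares a size read from a chunk this thread never owned,
     so a benign concurrent reuse looks like heap corruption and the process
     aborts with exactly the message seen in dbLog.  */
  if (fd != NULL && old_idx != idx)
    malloc_printerr ("invalid fastbin entry (free)");
}

int main (void)
{
  _Atomic (struct chunk *) bin = NULL;
  struct chunk c = { .size = 0x40 };    /* one 64-byte fastbin-sized chunk */
  fastbin_push_unpatched (&bin, &c, fastbin_index (0x40));
  puts ("pushed one chunk; single-threaded, so no abort here");
  return 0;
}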


3.0 How to check whether you have the affected glibc
3.1 Find your libc.so file.
ldd /opt/vertica/bin/vertica | grep libc.so
	libc.so.6 => /lib64/libc.so.6 (0x00007ff6dd99e000)

3.2 Run this command to determine whether the fix has been applied:

## example of a buggy libc
objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
   7ca16:	48 85 c9             	test   %rcx,%rcx
Your libc is likely buggy.

## example of a good libc
objdump -r -d /lib/x86_64-linux-gnu/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
Your libc looks OK.

3.3 Complete examples
You can also examine your libc directly to identify whether the fix has been applied. The following example contains the string 'test   %dil,%dil', which means that the fix has been applied:

objdump -r -d /lib64/libc-2.12.so | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21
 32cd8786cb:   40 20 f7                and    %sil,%dil
 32cd8786ce:   74 0c                   je     32cd8786dc <_int_free+0xec>
 32cd8786d0:   4c 8b 42 08             mov    0x8(%rdx),%r8
 32cd8786d4:   41 c1 e8 04             shr    $0x4,%r8d
 32cd8786d8:   41 83 e8 02             sub    $0x2,%r8d
 32cd8786dc:   48 89 53 10             mov    %rdx,0x10(%rbx)
 32cd8786e0:   48 89 d0                mov    %rdx,%rax
 32cd8786e3:   64 83 3c 25 18 00 00    cmpl   $0x0,%fs:0x18
 32cd8786ea:   00 00
 32cd8786ec:   74 01                   je     32cd8786ef <_int_free+0xff>
 32cd8786ee:   f0 48 0f b1 19          lock cmpxchg %rbx,(%rcx)
 32cd8786f3:   48 39 c2                cmp    %rax,%rdx
 32cd8786f6:   75 c0                   jne    32cd8786b8 <_int_free+0xc8>
 32cd8786f8:   40 84 ff                test   %dil,%dil             <==** likely good**==
 32cd8786fb:   74 09                   je     32cd878706 <_int_free+0x116>
 32cd8786fd:   41 39 e8                cmp    %ebp,%r8d
 32cd878700:   0f 85 05 07 00 00       jne    32cd878e0b <_int_free+0x81b>
 32cd878706:   48 83 c4 28             add    $0x28,%rsp
 32cd87870a:   5b                      pop    %rbx
 32cd87870b:   5d                      pop    %rbp
 32cd87870c:   41 5c                   pop    %r12


The following example does not contain the string 'test   %dil,%dil', which means the fix has not been applied:
objdump -r -d /lib64/libc-2.17.so | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21
 7c9ec:       48 85 c9                test   %rcx,%rcx
 7c9ef:       74 09                   je     7c9fa <_int_free+0xda>
 7c9f1:       8b 41 08                mov    0x8(%rcx),%eax
 7c9f4:       c1 e8 04                shr    $0x4,%eax
 7c9f7:       8d 70 fe                lea    -0x2(%rax),%esi
 7c9fa:       48 89 4b 10             mov    %rcx,0x10(%rbx)
 7c9fe:       48 89 c8                mov    %rcx,%rax
 7ca01:       64 83 3c 25 18 00 00    cmpl   $0x0,%fs:0x18
 7ca08:       00 00
 7ca0a:       74 01                   je     7ca0d <_int_free+0xed>
 7ca0c:       f0 48 0f b1 1a          lock cmpxchg %rbx,(%rdx)
 7ca11:       48 39 c1                cmp    %rax,%rcx
 7ca14:       75 ca                   jne    7c9e0 <_int_free+0xc0>
 7ca16:       48 85 c9                test   %rcx,%rcx               <==**likely buggy**===
 7ca19:       74 09                   je     7ca24 <_int_free+0x104>
 7ca1b:       44 39 e6                cmp    %r12d,%esi
 7ca1e:       0f 85 84 08 00 00       jne    7d2a8 <_int_free+0x988>
 7ca24:       48 83 c4 48             add    $0x48,%rsp
 7ca28:       5b                      pop    %rbx
 7ca29:       5d                      pop    %rbp
 7ca2a:       41 5c                   pop    %r12





Version-Release number of selected component (if applicable):


How reproducible:
Randomly occurs; not reliably reproducible.

Steps to Reproduce:
none
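
For illustration only, here is a hypothetical stress workload (not a confirmed reproducer) of the kind that exercises the affected lockless fastbin free path: many threads concurrently allocating and freeing small blocks that are served from fastbins. Running it with MALLOC_ARENA_MAX=1 keeps all threads in one arena, which increases contention on the same fastbins; even then the abort is a rare race and may never occur in any given run.

/* Hypothetical stress sketch, NOT a deterministic reproducer: it merely
   hammers fastbin-sized allocations from many threads so that the lockless
   fastbin path in _int_free() is exercised heavily.  Compile with -pthread;
   run with MALLOC_ARENA_MAX=1 to keep all threads in one arena.  */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS   8
#define ITERATIONS 10000000L

static void *worker (void *arg)
{
  unsigned int seed = (unsigned int) (size_t) arg;

  for (long i = 0; i < ITERATIONS; i++)
    {
      /* Small requests (well under the default M_MXFAST limit) are served
         from fastbins; varying the size moves chunks between size classes.  */
      size_t sz = 8 + (size_t) (rand_r (&seed) % 56);
      char *p = malloc (sz);
      if (p == NULL)
        abort ();
      p[0] = 1;                 /* touch the block */
      free (p);
    }
  return NULL;
}

int main (void)
{
  pthread_t t[NTHREADS];

  for (size_t i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, worker, (void *) i);
  for (size_t i = 0; i < NTHREADS; i++)
    pthread_join (t[i], NULL);

  puts ("done (no abort in this run)");
  return 0;
}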

Actual results:


Expected results:


Additional info:

Comment 1 Sumeet Keswani 2016-02-08 04:25:15 UTC
It appears that RHEL has *NOT* taken/back-ported the following glibc bug fix into the glibc 2.17 release stream.

https://www.sourceware.org/bugzilla/show_bug.cgi?id=15073

Comment 3 Sumeet Keswani 2016-02-08 04:33:36 UTC
I am hoping someone can confirm or ascertain whether RHEL took this glibc fix or not.
Our observations of the sources and the disassembly suggest that RHEL is missing this crucial fix, which may result in application crashes.

Comment 5 Sumeet Keswani 2016-02-08 14:44:17 UTC
Yes, we know Red Hat patched it in the 2.12 stream of glibc on RHEL 6.x.

But that does not mean the 2.17 stream on RHEL 7.x was patched.
Check the glibc sources from the glibc source RPM for 2.17; the patch is not there.

Comment 6 Sumeet Keswani 2016-02-08 17:49:24 UTC
We later discovered that, depending on how the bug is fixed, the "test   %rcx,%rcx" can be part of a valid fix, PROVIDED there are 4 conditional jumps after the "cmpxchg" rather than the 3 shown in the description.

Furthermore:
The *only* reason the "fastbin" error shows up in the output is the code's erroneous belief that ABA is an error; therefore, *all* such "fastbin" errors are exactly because of this error (missing patch).
 

Also, here are specific lines from the sources (from the src.rpm) that show the patch isn't there:

# glibc-2.17-c758a686/malloc/malloc.c
3834       }
3835     while ((old = catomic_compare_and_exchange_val_rel (fb, p, fd)) != fd);
3836 
3837     if (fd != NULL && __builtin_expect (old_idx != idx, 0))
3838       {
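
For comparison, here is a hedged sketch (not the verbatim upstream commit or the RHEL patch) of the shape of the fix for sourceware bug 15073: the dereference of the old fastbin top and the post-CAS size comparison are both gated on have_lock, so the common lockless free path never reads a chunk it does not own. A check that tests a byte-sized flag such as have_lock, rather than the fd pointer, is also consistent with the 'test   %dil,%dil' seen in the patched disassembly above. The helpers below are stubbed and the glibc CAS is modeled with C11 atomics, as in the sketch in the Description:

#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct chunk { size_t size; struct chunk *fd; };
#define chunksize(p)      ((p)->size & ~(size_t) 0x7)
#define fastbin_index(sz) ((unsigned int) ((sz) >> 4) - 2)
static void malloc_printerr (const char *msg) { (void) msg; abort (); }

/* Patched shape: the size check only runs when the arena lock is held,
   so the lockless path cannot abort spuriously.  */
static void fastbin_push_patched (_Atomic (struct chunk *) *fb, struct chunk *p,
                                  unsigned int idx, int have_lock)
{
  struct chunk *old = atomic_load (fb);
  struct chunk *fd;
  unsigned int old_idx = ~0u;

  for (;;)
    {
      if (old == p)
        malloc_printerr ("double free or corruption (fasttop)");
      if (have_lock && old != NULL)             /* gate added by the fix */
        old_idx = fastbin_index (chunksize (old));
      p->fd = fd = old;
      if (atomic_compare_exchange_weak (fb, &fd, p))
        break;                                  /* OLD == FD at this point */
      old = fd;
    }

  if (have_lock && old != NULL                  /* gate added by the fix */
      && old_idx != idx)
    malloc_printerr ("invalid fastbin entry (free)");
}

int main (void)
{
  _Atomic (struct chunk *) bin = NULL;
  struct chunk c = { .size = 0x40 };
  fastbin_push_patched (&bin, &c, fastbin_index (0x40), /* have_lock = */ 0);
  return 0;
}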

Comment 7 Carlos O'Donell 2016-02-09 04:11:51 UTC
This is a regression from RHEL6 and we will be looking into this to make sure RHEL7 doesn't have the same flaw.

Comment 8 Sumeet Keswani 2016-02-10 18:14:22 UTC
Current RHEL 7.x does have this issue, at least up to glibc-2.17-106
(based on observations of the sources and disassembly above).
Operationally, processes do get killed with SIGABRT (sometimes several times a day).

Do you have a time estimate as to when this can be patched and released to the repositories?

Comment 9 Florian Weimer 2016-02-10 18:23:31 UTC
(In reply to Sumeet Keswani from comment #8)
> Do you have a time estimate as to when this can be patched and released to
> the repositories?

We confirm that this issue exists.  A future update may address it.

Please open a support case if you can.  It helps us to prioritize this issue.

Comment 10 Sumeet Keswani 2016-02-10 22:43:58 UTC
thanks - will do

Comment 12 Joseph Kachuck 2016-02-11 17:37:35 UTC
Hello Sumeet,
Please confirm your HPE email address.

Thank You
Joe Kachuck

Comment 14 Sumeet Keswani 2016-02-11 18:49:40 UTC
(In reply to Joseph Kachuck from comment #12)
> Hello Sumeet,
> Please confirm your HPE email address.
> 
> Thank You
> Joe Kachuck

Do you want me to put my email in the BZ?
I just changed the email on my account to the HPE email; is that sufficient?

Comment 15 Sumeet Keswani 2016-02-11 18:55:59 UTC
Although we ran into this on both RHEL 7.0 and RHEL 7.1, a fix for this is desired on the following streams:
RHEL 7.0
RHEL 7.1
RHEL 7.2
RHEL 7.3

I believe a glibc update should be pushed to the repositories on all of these streams.

This would be a minor change which should not require an end user to upgrade to a higher version of the kernel/OS.

Comment 16 Joseph Kachuck 2016-02-11 21:54:36 UTC
Hello,
In order to request this for Z-stream, please provide a client impact statement.
Please confirm what a client would be doing to see the issue.

Please also note that in order for a client to get a 7.1.z update, they would need to have an EUS entitlement. Please confirm if you know of a client that would be willing to purchase EUS to get this update. If a client already has EUS, would you be able to have them file a support case requesting this update?

Please also note this cannot be requested for Z-stream until it is accepted and verified for the current release.

Thank You
Joe Kachuck

Comment 17 Sumeet Keswani 2016-02-11 22:51:55 UTC
My apologies, I am not very familiar with internal RHEL processes and business practices.


1. This is a regression. I presume that should mean something.

2. I will ask three customers who have confirmed RHEL entitlements to open support cases referring to this bug and escalate via their internal IT departments.

Comment 22 Trinh Dao 2016-02-12 20:02:35 UTC
Hi JoeK, can the customer with an EUS subscription request that Red Hat create the 7.0 or 7.1 EUS Z-stream they need?

Comment 23 Sumeet Keswani 2016-02-12 20:10:42 UTC
We have put out a request to our customer(s) who are hitting this and have RHEL support to escalate and ask for a patch. There are a few of them.
Hopefully you will get a support request on this.

Comment 24 Florian Weimer 2016-02-12 20:31:45 UTC
(In reply to Trinh Dao from comment #22)
> Hi JoeK, can the customer with an EUS subscription request that Red Hat
> create the 7.0 or 7.1 EUS Z-stream they need?

To clarify, there is no EUS stream for 7.0, and no future updates will be released.  Please refer to this resource for detailed product life-cycle information:

  https://access.redhat.com/support/policy/updates/errata/#Extended_Update_Support

Comment 25 Trinh Dao 2016-02-12 20:42:28 UTC
The link states:
In Red Hat Enterprise Linux 7, EUS is available for the following releases:
•7.1 (ends March 31, 2017)
•7.2 (ends November 30, 2017)

The 7.0 end date is not listed. What is the end date for 7.0?

Comment 26 Joseph Kachuck 2016-02-12 21:09:43 UTC
Hello,
A client with EUS can open a support case and request this support case to be connected to this BZ. They would also need to state they need this fix for RHEL 7.1.z.

> This is a regression. I presume that should mean something.
Please confirm the latest RHEL 7.x release in which this worked correctly. I apologize for the detail, but in order for me to flag this BZ as a regression, it must have worked previously in the same major release.

RHEL 7.0 did not have EUS support. No errata updates would have been released once RHEL 7.1 was released.

Thank You
Joe Kachuck

Comment 27 Sumeet Keswani 2016-02-12 21:18:12 UTC
> This is a regression. I presume that should mean something.

We saw this exact same thing in RHEL 6.5, which was ultimately fixed
(https://bugzilla.redhat.com/show_bug.cgi?id=1027101).

Now we have run into it in RHEL 7.x again.

I.e., something was broken in 6.5, was fixed, and is now broken again in 7.x.
Hence I referred to it as a regression.

Comment 33 Florian Weimer 2016-02-17 15:06:26 UTC
Alen, you set the needinfo? flag on this bug.  What kind of information do you need?

Comment 34 Joseph Kachuck 2016-02-17 18:52:42 UTC
Hello HPE,
Please download and test the packages from:
http://people.redhat.com/jkachuck/.test1305406/

Please note these are test packages only. 
Please note they may be removed in 3 days.

Thank You
Joe Kachuck

Comment 35 Sumeet Keswani 2016-02-17 22:49:06 UTC
Thanks, I got them.
We will try to test. This is a random failure, so it will take a bit to know for sure. But if our RHEL 6.5 experience is any indication, this is just the fix we need(ed).

Comment 38 Joseph Kachuck 2016-02-24 21:19:04 UTC
Hello,
Any update on whether the test package corrected this issue?

Thank You
Joe Kachuck

Comment 39 Sumeet Keswani 2016-02-24 23:17:03 UTC
It's a race condition, so we don't have positive proof with your RPM specifically just yet.

We have built our own glibc RPM with the patch from sourceware, and that has been working without any issues for a while, if that is any indication.

Comment 40 Carlos O'Donell 2016-02-25 01:15:32 UTC
(In reply to Sumeet Keswani from comment #39)
> It's a race condition, so we don't have positive proof with your RPM
> specifically just yet.
> 
> We have built our own glibc RPM with the patch from sourceware, and that has
> been working without any issues for a while, if that is any indication.

Thanks. That does help indicate that the patch is a step in the right direction. I agree that with concurrency defects it is difficult to validate them.

Comment 56 Joseph Kachuck 2016-03-01 16:21:44 UTC
Hello Sumeet,
Please confirm whether you were able to verify that the test package corrected this issue.

Thank You
Joe Kachuck

Comment 57 Sumeet Keswani 2016-03-01 20:39:35 UTC
We need to run for a bit longer before we will know with a reasonable degree of certainty. We have also asked a customer who hits this often to try the RPMs.

I will update you with the outcome in a week or so.

Comment 61 Sumeet Keswani 2016-03-18 14:31:57 UTC
The glibc patch/RPM has looked stable and good for weeks without issues.
I feel this is good.
(I would like to test for a few more weeks, but I think I am ready to declare success here, since we could go on forever.)

Comment 62 Oonkwee Lim 2016-03-21 18:45:30 UTC
Hello Carlos,

My customer is requesting the i686 version of this glibc patch.

Do you know how to obtain the packages?

Thanks & Regards

Oonkwee
Emerging Technologies
RedHat Global Support

Comment 66 Trinh Dao 2016-03-23 15:08:13 UTC
There is already a 7.2.z bug requested, bug 1313308.

Comment 67 Oonkwee Lim 2016-03-25 00:57:30 UTC
I have sent the i686 version of this glibc patch to the customer.

Thanks for re-generating it.

Comment 68 Mike McCune 2016-03-28 22:37:22 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Comment 69 Sumeet Keswani 2016-03-29 18:30:58 UTC
No crashes since we started using the patched glibc.
AFAIK, this fixes it.

Now the only thing left is to get it out there proactively for all relevant RHEL streams.

Comment 73 Trinh Dao 2016-06-23 14:39:01 UTC
Hi Joe, will this fix be in RHEL 7.3 Alpha?

Comment 74 Florian Weimer 2016-06-23 14:43:18 UTC
(In reply to Trinh Dao from comment #73)
> Hi Joe, will this fix be in RHEL 7.3 Alpha?

Yes, we have already addressed the issue in 7.2.z (via RHBA-2016:1030-1), and 7.3 will inherit the fix.

Comment 75 Eyal Yurman 2016-07-07 21:39:16 UTC
Hi, I'm an HPE customer and an AWS customer using Red Hat.
How can I get this package (glibc-2.17-121.el7)?
From HPE, from AWS, or directly from Red Hat?

Comment 76 Florian Weimer 2016-07-08 06:46:21 UTC
This issue was addressed for Red Hat Enterprise Linux 7.2 in this erratum:

  https://rhn.redhat.com/errata/RHBA-2016-1030.html

If you need assistance in obtaining it and have an active Red Hat Enterprise Linux subscription, please file a support request at:

  https://access.redhat.com/support/cases/

Comment 85 Ben Turner 2016-09-01 14:10:06 UTC
In the fixed in field for this bug I see:

glibc-2.17-121.el7

In the errata I see:

glibc-2.17-106.el7_2.6.x86_64.rpm

This is what led to my confusion.  Should we update the fixed in field of this BZ with the errata's build number, or is -121 the correct build number for this fix?

Comment 86 Florian Weimer 2016-09-01 14:15:49 UTC
(In reply to Ben Turner from comment #85)
> In the fixed in field for this bug I see:
> 
> glibc-2.17-121.el7
> 
> In the errata I see:
> 
> glibc-2.17-106.el7_2.6.x86_64.rpm
> 
> This is what led to my confusion.  Should we update the fixed in field of
> this BZ with the errata's build number, or is -121 the correct build number
> for this fix?

The fix for Red Hat Enterprise Linux 7.2.z is tracked in bug 1313308, but we also need to fix it in Red Hat Enterprise Linux 7.3, because the fix went into 7.2.z after 7.3 had already branched from 7.2.

Comment 87 Ben Turner 2016-09-19 14:08:59 UTC
*** Bug 1371228 has been marked as a duplicate of this bug. ***

Comment 89 errata-xmlrpc 2016-11-03 08:30:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2573.html