Created attachment 1395413 [details]
strace of logging-fluentd startup until it crashes

Description of problem:

In the OpenShift scalability CI lab, logging-fluentd is crashing immediately after startup. The exact same build of logging-fluentd (3.9.0-0.41.0) runs fine on AWS. Both environments are cloud-based and have a huge number of differences (hardware, kernel version, virtualization method, etc.) which I will try to narrow down as much as possible. The crash is not load related; logging-fluentd simply never reaches Running when USE_JEMALLOC has its default value of true. Externally, the pod just restarts over and over with nothing in the pod log.

Running oc debug and then running "run.sh" manually to start fluentd shows that fluentd is seg faulting and core dumping immediately after "exec fluentd --no-supervisor":

+ rm -f /etc/fluent/configs.d/openshift/filter-pre-force-utf8.conf
+ touch /etc/fluent/configs.d/openshift/filter-pre-force-utf8.conf
+ type -p jemalloc-config
+ '[' true = true ']'
++ jemalloc-config --libdir
++ jemalloc-config --revision
+ export LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2
+ LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2
+ '[' true = true ']'
+ rm -f /etc/fluent/configs.d/openshift/filter-exclude-journal-debug.conf
+ touch /etc/fluent/configs.d/openshift/filter-exclude-journal-debug.conf
+ [[ -n '' ]]
+ exec fluentd --no-supervisor
Segmentation fault (core dumped)

The core dump has very little of value (to me anyway - I can provide it):

Core was generated by `/usr/bin/ruby /usr/bin/fluentd --no-supervisor'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7a96deafaf in _dl_runtime_profile_avx512 () from /lib64/ld-linux-x86-64.so.2
Missing separate debuginfos, use: debuginfo-install glibc-2.17-221.el7.x86_64 nss-softokn-freebl-3.34.0-2.el7.x86_64
(gdb) bt
#0  0x00007f7a96deafaf in _dl_runtime_profile_avx512 () from /lib64/ld-linux-x86-64.so.2
#1  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id   Frame
  2    LWP 292     0x00007f7a96682f06 in ?? ()
* 1    LWP 345     0x00007f7a96deafaf in _dl_runtime_profile_avx512 () from /lib64/ld-linux-x86-64.so.2

Gathered an strace of fluentd startup; the crash is a SEGV_ACCERR while trying to fork a child process (full strace attached):

641 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f5837516000
641 mprotect(0x7f5837516000, 4096, PROT_NONE) = 0
641 clone(child_stack=0x7f58375177f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f58375199d0, tls=0x7f5837519700, child_tidptr=0x7f58375199d0) = 642
642 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f5837516d40} ---
642 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---

Where does libjemalloc come in? If I set the environment variable USE_JEMALLOC=false, the crash does not occur and logging-fluentd fully initializes and goes into Running state.

Version-Release number of selected component (if applicable):
Logging 3.9.0-0.41.0

How reproducible:
Always, in this specific OpenStack cluster

Steps to Reproduce:
This problem occurs immediately upon starting logging-fluentd in this cluster.

Actual results:
logging-fluentd will not start - there are no entries in the pod log. Running the pod with oc debug ds/logging-fluentd and then running bash -x ./run.sh to start fluentd will demonstrate the issue.

Additional info:
Will add a comparison of working and non-working environments in a follow-up comment.
Searching jemalloc issues did not yield a match (so far). Several issues mention problems with jemalloc and Transparent Huge Pages (which might be an inherent problem for usage in OpenShift). I turned THP off on one of the nodes where the crash was occurring, but the issue still reproduced. This crash does not seem related to THP.
> Where does libjemalloc come in?

Here:

+ export LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2
+ LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2

> If I set the environment variable USE_JEMALLOC=false then the crash does not
> occur and logging-fluentd fully initializes and goes into Running state.

So I guess the workaround for now is USE_JEMALLOC=false.

selinux? any AVCs?

I don't think it is specifically related to jemalloc, since fluentd works with jemalloc in many other environments.

@nhosoi - could this be related to the execmem issue you have been working on?
> Where does libjemalloc come in?

Sorry, rhetorical question :-)

> So I guess the workaround for now is USE_JEMALLOC=false

Agreed, but that also re-introduces known leaks/memory growth. But, I understand.

> selinux? any AVCs?

Nothing in /var/log/audit. Also, tested with setenforce 0 and the issue was still reproducible.

> I don't think it is specifically related to jemalloc, since fluentd works
> with jemalloc in many other environments.

Yes, it definitely works for me in other environments. Comparison of environments coming in a subsequent comment.
Full list of installed packages in working and non-working environments attached.

Non-working (scale CI cluster):

RHEL 7.4.4
Kernel 3.10.0-829.el7.x86_64
Spectre/Meltdown patch status:
  /sys/kernel/debug/x86/ibpb_enabled:0
  /sys/kernel/debug/x86/ibrs_enabled:0
  /sys/kernel/debug/x86/pti_enabled
OpenShift 3.9.0-0.24.0 on Docker 1.13
OpenStack 11 and KVM
Skylake processors on the hypervisors

Working (AWS):

RHEL 7.4.4
Kernel 3.10.0-843.el7.x86_64
Spectre/Meltdown patch status:
  /sys/kernel/debug/x86/ibpb_enabled:0
  /sys/kernel/debug/x86/ibrs_enabled:0
  /sys/kernel/debug/x86/pti_enabled:1
OpenShift 3.9.0-0.38.0 on Docker 1.13
AWS EC2 and Xen
Broadwell processors on the hypervisors (AWS m4 instances)
Created attachment 1395452 [details]
installed packages from environment where fluentd crashes
Created attachment 1395453 [details]
installed packages from environment where fluentd does not crash
Hi Mike,

Could you run "ldd /usr/bin/fluentd" on both envs? Are there any differences? You may need to install glibc-common to get ldd...
non-working/crash env:

sh-4.2# ldd /usr/bin/fluentd
        not a dynamic executable
sh-4.2# ldd /usr/bin/ruby
        linux-vdso.so.1 =>  (0x00007fff56f8d000)
        libruby.so.2.0 => /lib64/libruby.so.2.0 (0x00007f2589d9d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2589b81000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f2589979000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f2589775000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f258953e000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f258923c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2588e79000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f258a1fa000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f2588c76000)
sh-4.2# yum list installed | grep glibc
glibc.x86_64         2.17-196.el7_4.2   @anaconda/7.4
glibc-common.x86_64  2.17-196.el7_4.2   @anaconda/7.4

working env:

sh-4.2# ldd /usr/bin/fluentd
        not a dynamic executable
sh-4.2# ldd /usr/bin/ruby
        linux-vdso.so.1 =>  (0x00007ffe55df8000)
        libruby.so.2.0 => /lib64/libruby.so.2.0 (0x00007f89371fc000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8936fe0000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f8936dd8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f8936bd4000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f893699d000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f893669b000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f89362d8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f8937659000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f89360d5000)
Submitted too soon - glibc is the identical version in both envs.

working:

sh-4.2# yum list installed | grep glibc
glibc.x86_64         2.17-196.el7_4.2   @anaconda/7.4
glibc-common.x86_64  2.17-196.el7_4.2   @anaconda/7.4
Hmmm, could deny_execmem be set on the crashed host by any chance? Can we have the output from "semanage boolean --list"?

> Gathered an strace of fluentd startup and the crash is a SEGV_ACCERR while
> trying to fork a child process (full strace attached):
>
> 641 mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f5837516000
> 641 mprotect(0x7f5837516000, 4096, PROT_NONE) = 0
> 641 clone(child_stack=0x7f58375177f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f58375199d0, tls=0x7f5837519700, child_tidptr=0x7f58375199d0) = 642
> 642 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f5837516d40} ---
> 642 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
Created attachment 1395643 [details]
semanage boolean --list on the compute where fluentd pod is running

deny_execmem (off , off) Allow deny to execmem

Full listing attached.
Thanks, Mike. The list is identical to mine on the test env...
(In reply to Mike Fiedler from comment #4)
> Full list of installed packages in working and non-working environments
> attached.
>
> Non-working (scale CI cluster)
>
> RHEL 7.4.4
> Kernel 3.10.0-829.el7.x86_64
> OpenShift 3.9.0-0.24.0 on Docker 1.13
> OpenStack 11 and KVM
> Skylake processors on the hypervisors

Is this a CPU variant with AVX-512? (There are many different Skylake chips out there.)

Lazy binding needs a fair amount of stack space since glibc-2.17-196.el7_4.1. We changed the thread stack accounting in glibc-2.17-221.el7, essentially increasing the stack allocation by 4 KiB. This may help here. If lazy binding is indeed the culprit, LD_BIND_NOW=1 may be a workaround, too.

The other question is what is creating such small stacks. PTHREAD_STACK_MIN is *not* intended as a guideline for small stacks; it is the absolute minimum stack that can be used to launch a new thread. This does not mean the thread is able to do anything useful with that stack.
We need a full backtrace to debug this any further and to distinguish the exact root cause of the issue. Please install the debuginfo as directed by the debugger "Missing separate debuginfos, use: debuginfo-install glibc-2.17-221.el7.x86_64 nss-softokn-freebl-3.34.0-2.el7.x86_64", re-run the scenario, and get a full backtrace of the exact crash location.
re: comment 14: LD_BIND_NOW=1 worked as a workaround. No crash.

re: comment 15: Installed the debuginfo packages and here is what I get from bt full and info threads. Let me know if there are other things you need to see.

Core was generated by `/usr/bin/ruby /usr/bin/fluentd --no-supervisor'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fba05101faf in _dl_runtime_profile_avx512 () at ../sysdeps/x86_64/dl-trampoline.h:262
262     VMOVA %VEC(5), (LR_VECTOR_OFFSET + VECTOR_SIZE*5)(%rsp)
(gdb) bt
#0  0x00007fba05101faf in _dl_runtime_profile_avx512 () at ../sysdeps/x86_64/dl-trampoline.h:262
#1  0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x00007fba05101faf in _dl_runtime_profile_avx512 () at ../sysdeps/x86_64/dl-trampoline.h:262
No locals.
#1  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) info thread
  Id   Target Id   Frame
  2    LWP 7     0x00007fba0499a0ad in ?? ()
* 1    LWP 67    0x00007fba05101faf in _dl_runtime_profile_avx512 () at ../sysdeps/x86_64/dl-trampoline.h:262
(In reply to Mike Fiedler from comment #17)
> re: comment 14: LD_BIND_NOW=1 worked as a workaround. No crash.

Great news. So we do have a workaround that allows us to use jemalloc.
(In reply to Rich Megginson from comment #19)
> (In reply to Mike Fiedler from comment #17)
> > re: comment 14: LD_BIND_NOW=1 worked as a workaround. No crash.
>
> Great news. So we do have a workaround that allows us to use jemalloc.

This is an indicator that you are very close to the limit of the stack. Someone should analyze why you are close enough to the stack limit that this makes a difference. Additional stack usage could push you over again, and no additional tweaking will help. As Florian notes in comment #14, PTHREAD_STACK_MIN is the minimum amount of stack needed to start a thread; anything else requires more stack.
(In reply to Carlos O'Donell from comment #20)
> This is an indicator that you are very close to the limit of the stack.
> Someone should analyze why you are close enough to the stack limit that this
> makes a difference. Additional stack usage could push you over again, and no
> additional tweaking will help.

So, how do we proceed with trying to use glibc malloc, if/when we hit the wall with jemalloc and cannot use it anymore?
(In reply to Rich Megginson from comment #21)
> So, how do we proceed with trying to use glibc malloc, if/when we hit the
> wall with jemalloc and cannot use it anymore?

I'm spinning up a project to analyze RSS usage in glibc malloc, and I'd like to use fluentd and 389ds as examples/guinea pigs. We have a high-speed malloc tracer version of libc.so.6 that can trace all malloc API family calls across all threads. We work with you to capture traces, and then run them through a simulator so we can recreate the fragmentation and pause the simulator to look at heap evolution.

I want to be clear here that this is just the root-cause analysis of why there is increased RSS; after that we have to have a deeper think about how that might be fixed. It is a longer road, but if we start today we'll be able to offer you more choices in the future, hopefully one that allows you to choose glibc malloc and reduce your need to bundle other allocators.
I would like to find out what is actually going on. I reviewed the available evidence, and it does not conclusively show that lack of stack space is the immediate cause of the crash.

I tried to reproduce the issue with:

glibc-2.17-222.el7.x86_64
ruby-libs-2.0.0.648-30.el7.x86_64
rubygem-yajl-ruby-1.3.1-1.el7.x86_64
rubygem-msgpack-1.2.0-1.el7.x86_64
rubygem-cool.io-1.5.3-1.el7.x86_64
rubygem-http_parser.rb-0.6.0-4.el7.x86_64
rubygem-string-scrub-0.0.5-4.el7.x86_64
libyaml-0.1.4-11.el7_0.x86_64
rubygem-bigdecimal-1.2.0-30.el7.x86_64
ruby-irb-2.0.0.648-30.el7.noarch
ruby-2.0.0.648-30.el7.x86_64
rubygem-io-console-0.4.2-30.el7.x86_64
rubygem-rdoc-4.0.0-30.el7.noarch
rubygem-json-2.0.2-3.el7.x86_64
rubygems-2.0.14.1-30.el7.noarch
rubygem-psych-2.0.0-30.el7.x86_64
rubygem-thread_safe-0.3.6-1.el7.noarch
rubygem-tzinfo-1.2.4-1.el7.noarch
rubygem-tzinfo-data-1.2017.3-1.el7.noarch
rubygem-sigdump-0.2.4-1.el7.noarch
fluentd-0.12.42-1.el7.x86_64

However, fluentd starts up:

# LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2 /usr/bin/ruby /usr/bin/fluentd --no-supervisor
2018-02-19 06:08:47 -0500 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-02-19 06:08:47 -0500 [info]: starting fluentd-0.12.42 without supervision
2018-02-19 06:08:47 -0500 [info]: gem 'fluentd' version '0.12.42'
2018-02-19 06:08:47 -0500 [info]: using configuration file: <ROOT>
</ROOT>

And this configuration uses _dl_runtime_resolve_xsavec, not _dl_runtime_profile_avx512. Since comment 17 shows that the _dl_runtime_profile_avx512 symbol is *not* spurious, I wonder why this dynamic linker trampoline is used on your system. This should not happen, and I have not seen any explanation for that yet.
@fweimer re: comment 28 - what information is required? I'm happy to gather it but it wasn't clear what you were looking for. Thanks.
(In reply to Mike Fiedler from comment #29)
> @fweimer re: comment 28 - what information is required? I'm happy to
> gather it but it wasn't clear what you were looking for. Thanks.

I need to replicate the issue, so that I can poke at the running binary with GDB. I tried to replicate your installation, but the issue does not reproduce, so either my configuration is different, or I'm using different software versions/builds.
Further investigation:

Running LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2 /usr/bin/ruby /usr/bin/fluentd --no-supervisor

with glibc 2.17-221.el7 - no crash, fluentd starts
with glibc 2.17-196.el7_4.2 - the crash described here occurs

Lucky? Or something fixed in the later version?

@fweimer I have a system exhibiting the behavior on 2.17-196. If you are still interested, please send/link me your public ssh key via e-mail and I will send you access instructions.
(In reply to Mike Fiedler from comment #34)
> Further investigation:
>
> Running LD_PRELOAD=/usr/libexec/fluentd/lib/libjemalloc.so.2 /usr/bin/ruby
> /usr/bin/fluentd --no-supervisor
>
> with glibc 2.17-221.el7 - no crash, fluentd starts
>
> with glibc 2.17-196.el7_4.2 - the crash described here occurs

I see a different crash on the system you supplied, not matching the one in the description:

(gdb) bt
#0  0x00007ffff7df0faf in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff72fdd9b in start_thread () from /lib64/libpthread.so.0
#2  0x00007ffff66e634d in clone () from /lib64/libc.so.6

Note that the expected ld.so trampoline is used.

This crash happens because Ruby creates the timer thread with a stack size of PTHREAD_STACK_MIN, and glibc subtracts the size of the TLS data from that (which is a glibc bug: we never even get to the user-supplied thread function, so this is a clear conformance issue). jemalloc needs much, much more TLS data than glibc malloc (6104 bytes, I believe), which is why LD_PRELOADing it triggers the crash. And the crash happens in the expected place in the dynamic linker trampoline, right after the stack allocation for the large save area required by AVX-512.

So to summarize: this crash (in comment 34) is the known issue around PTHREAD_STACK_MIN. The workaround in glibc 2.17-221.el7 appears to help here, but with 6104 bytes of TLS data consumed by jemalloc, it is still going to be very tight.

However, there have been *different* backtraces posted in this bug (comment 0 and comment 17), allegedly even with glibc 2.17-221.el7, and those are not explained by the issue discussed in this comment. If you can reproduce the different crash on glibc-2.17-221.el7.x86_64, I would love to see that under GDB and poke around for a bit. I think we may have a second bug here.
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/6ed6a427c4e2c82f775ab31887866d5560b4f8d5
Bug 1544815 - logging-fluentd SEGV_ACCERR crash on init when libjemalloc is used in scalability CI lab cluster
https://bugzilla.redhat.com/show_bug.cgi?id=1544815
Use export LD_BIND_NOW=1 to work around the problem when using jemalloc.

https://github.com/openshift/origin-aggregated-logging/commit/3310075669a0253856cdb018257c037085052434
Merge pull request #977 from richm/bug-1544815
Automatic merge from submit-queue.
Bug 1544815 - logging-fluentd SEGV_ACCERR crash on init when libjemalloc is used in scalability CI lab cluster
Verified on 3.9.2. LD_BIND_NOW is set.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/1774d58f9488046c0ee4f835e986106078d4cb8c
Bug 1544815 - logging-fluentd SEGV_ACCERR crash on init when libjemalloc is used in scalability CI lab cluster
https://bugzilla.redhat.com/show_bug.cgi?id=1544815
Use export LD_BIND_NOW=1 to work around the problem when using jemalloc.

https://github.com/openshift/origin-aggregated-logging/commit/1f36bf5166fcb3900fb0e0647de839f095f0ae0e
Merge pull request #978 from openshift-cherrypick-robot/cherry-pick-977-to-es5.x
[es5.x] Bug 1544815 - logging-fluentd SEGV_ACCERR crash on init when libjemalloc is used in scalability CI lab cluster