
Bug 1692583

Summary: OpenShift Router HAProxy core dumping
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Reporter: Robert Bost <rbost>
Assignee: Dan Mace <dmace>
QA Contact: Hongan Li <hongli>
CC: acomabon, aos-bugs, bperkins, dmace, jtudelag, mfisher, weliang
Status: CLOSED NEXTRELEASE
Severity: unspecified
Priority: urgent
Version: 3.9.0
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2019-10-07 15:22:58 UTC

Description Robert Bost 2019-03-25 23:55:00 UTC
Description of problem: HAProxy core dumps during shutdown. Shutdown happens quite often in the OpenShift router because it is part of the reload sequence. The backtrace below reports a double free in deinit_log_buffers(), which looks similar to the issue resolved in https://github.com/haproxy/haproxy/commit/7c49711d6041d1afc42d5b310ddfd7d6f6817c3c#diff-47179e5db7ed3f2db741c99372ba24f3

Can the patch above be backported?

# gdb /usr/sbin/haproxy core.1101
(gdb) bt
#0  0x00007fe0da09d207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fe0da09e8f8 in __GI_abort () at abort.c:90
#2  0x00007fe0da0dfd27 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fe0da1f1678 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fe0da0e8489 in malloc_printerr (ar_ptr=0x7fe0da42d760 <main_arena>, ptr=<optimized out>, str=0x7fe0da1f17a0 "double free or corruption (!prev)", action=3) at malloc.c:5004
#4  _int_free (av=0x7fe0da42d760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00005565803ab808 in deinit_log_buffers ()
#6  0x00005565803a93cb in run_thread_poll_loop ()
#7  0x00007fe0db1a4dd5 in start_thread (arg=0x7fe0d4cf6700) at pthread_create.c:307
#8  0x00007fe0da164ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
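
Frames #4 to #6 suggest that a per-thread cleanup routine hands the same heap pointer to free() more than once. The C sketch below illustrates one common way to guard against that pattern: an atomic exchange lets exactly one thread take ownership of the shared pointer. It is an illustration only, not HAProxy source and not necessarily the content of the linked patch; `shared_buf`, `deinit_shared_buf`, `worker`, and `run_shutdown_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NB_THREADS 4

/* Hypothetical stand-in for a buffer shared by every thread. */
static _Atomic(char *) shared_buf;
/* Number of times the buffer was actually handed to free(). */
static atomic_int release_count;

/* Runs once per thread at shutdown. The atomic exchange hands the pointer
 * to exactly one caller; every other caller sees NULL and skips free(). */
static void deinit_shared_buf(void)
{
    char *old = atomic_exchange(&shared_buf, (char *)NULL);

    if (old != NULL) {
        free(old);
        atomic_fetch_add(&release_count, 1);
    }
}

static void *worker(void *arg)
{
    (void)arg;
    deinit_shared_buf();
    return NULL;
}

/* Allocates the buffer, lets NB_THREADS threads run the cleanup, and
 * returns how many calls freed it (-1 on allocation failure). */
int run_shutdown_demo(void)
{
    pthread_t th[NB_THREADS];
    int started = 0;
    int i;
    char *buf = malloc(64);

    if (buf == NULL)
        return -1;
    atomic_store(&shared_buf, buf);
    atomic_store(&release_count, 0);

    for (i = 0; i < NB_THREADS; i++) {
        if (pthread_create(&th[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(th[i], NULL);

    deinit_shared_buf(); /* no-op if a worker already freed the buffer */
    assert(atomic_load(&shared_buf) == NULL);
    return atomic_load(&release_count);
}
```

Calling `run_shutdown_demo()` from a small `main` (built with `-pthread`) should report a single release regardless of how many threads ran the cleanup. Replacing the exchange with an unconditional `free()` of a plain global would reproduce the kind of abort shown in the backtrace.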


Version-Release number of selected component (if applicable): 
haproxy18-1.8.17-3.el7.x86_64 
ose-haproxy-router v3.9 (e3d8c2560f68)

How reproducible: Always for the customer.

Comment 1 Alejandro Coma 2019-03-26 13:22:02 UTC
Editing the router dc to use v3.9.68-5 "solves" the issue.

Comment 4 Robert Bost 2019-03-26 16:09:32 UTC
(In reply to Alejandro Coma from comment #1)
> Editing the router dc to use v3.9.68-5 "solves" the issue.

This image doesn't produce core dumps because it contains a build of haproxy that still declares the startup_logs variable as THREAD_LOCAL. Each thread then has its own copy of the pointer, so no double free is possible. Relevant commit:

  https://github.com/haproxy/haproxy/commit/a648399c901485a4985f786075535756946113cc#diff-47179e5db7ed3f2db741c99372ba24f3
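
The thread-local argument above can be illustrated with a small C sketch. It assumes that only one thread ever assigns the buffer and that every thread runs the cleanup; whether this matches the exact HAProxy code paths is an assumption. `tl_buf`, `tl_deinit`, `tl_worker`, and `run_thread_local_demo` are hypothetical names, and C11 `_Thread_local` stands in for HAProxy's THREAD_LOCAL macro.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TL_WORKERS 4

/* Thread-local: every thread gets its own pointer, initially NULL. */
static _Thread_local char *tl_buf;
/* Number of non-NULL pointers handed to free() across all threads. */
static atomic_int tl_free_count;

/* Per-thread cleanup; it only ever touches the calling thread's pointer. */
static void tl_deinit(void)
{
    if (tl_buf != NULL)
        atomic_fetch_add(&tl_free_count, 1);
    free(tl_buf); /* free(NULL) is a no-op */
    tl_buf = NULL;
}

static void *tl_worker(void *arg)
{
    (void)arg;
    tl_deinit(); /* this thread's tl_buf was never assigned */
    return NULL;
}

/* Returns how many real frees happened (-1 on allocation failure). */
int run_thread_local_demo(void)
{
    pthread_t th[TL_WORKERS];
    int started = 0;
    int i;

    atomic_store(&tl_free_count, 0);
    tl_buf = malloc(64); /* assigned in the calling thread only */
    if (tl_buf == NULL)
        return -1;

    for (i = 0; i < TL_WORKERS; i++) {
        if (pthread_create(&th[i], NULL, tl_worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(th[i], NULL);

    tl_deinit(); /* the calling thread frees its own copy */
    assert(tl_buf == NULL);
    return atomic_load(&tl_free_count);
}
```

Because each worker sees its own NULL pointer, only the calling thread's allocation reaches `free()`. Dropping `_Thread_local` from `tl_buf` would make all threads share one pointer, which is the situation in which a per-thread cleanup can free the same address twice.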

Comment 10 Robert Bost 2019-03-29 21:55:04 UTC
Still no luck reproducing this issue myself. Linking these for reference:

https://discourse.haproxy.org/t/haproxy-1-8-18-19-occasional-crashes-with-multi-threading-enabled/3597/6
https://github.com/haproxy/haproxy/issues/58