Bug 737011

Summary: SIGSEGV when select() received async signal (cancel signal)
Product: Red Hat Enterprise Linux 5
Reporter: Marek Polacek <mpolacek>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: CLOSED WONTFIX
QA Contact: qe-baseos-tools-bugs
Severity: medium
Priority: unspecified
Version: 5.7
CC: ashankar, fweimer, law, pfrankli, spoyarek
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-11-26 22:52:39 UTC
Attachments:
Description Flags
Reproducer none

Description Marek Polacek 2011-09-09 11:05:31 UTC
Created attachment 522305
Reproducer

Description of problem:
The reproducer crashes with a segmentation fault if it runs long enough.  This happens only on ppc64.

Version-Release number of selected component (if applicable):
gcc-4.1.2-51.el5

How reproducible:
Sometimes.

Steps to Reproduce:
1. # gcc -O2 -g rep.c -lpthread
2. # ./a.out 180
Segmentation fault
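
The attached rep.c is not reproduced in this report. As a rough illustration only, and not the attached reproducer, a program of the kind the summary describes (a thread sitting in select() when the cancel signal arrives) might look like the sketch below. The names select_worker and cancel_select_threads are hypothetical, and the 1 ms timeout is an arbitrary choice.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>

/* Worker: loop forever in select(), which is a cancellation point.
   The 1 ms timeout is arbitrary; it only keeps the thread re-entering
   the cancellable syscall path. */
static void *select_worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 1000;
        select(0, NULL, NULL, NULL, &tv);
    }
}

/* Hypothetical helper: create a worker, cancel it, join it, and repeat.
   Returns how many workers reported PTHREAD_CANCELED on join. */
int cancel_select_threads(int iterations)
{
    int i;
    int cancelled = 0;

    for (i = 0; i < iterations; i++) {
        pthread_t t;
        void *res = NULL;

        if (pthread_create(&t, NULL, select_worker, NULL) != 0)
            break;
        /* Cancellation is delivered as a signal to the worker and
           triggers the forced unwind seen in the backtrace. */
        pthread_cancel(t);
        if (pthread_join(t, &res) == 0 && res == PTHREAD_CANCELED)
            cancelled++;
    }
    return cancelled;
}
```

Calling cancel_select_threads() repeatedly from main() for the requested number of seconds would approximate `./a.out 180`. On an unaffected system every worker should join as PTHREAD_CANCELED.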
  
Actual results:
Segfault.

Expected results:
No segfault.

Additional info:
Seems like we need to backport some change into libgcc/config/rs6000/linux-unwind.h.

The corefile says:
Core was generated by `./a.out 180'.
Program terminated with signal 11, Segmentation fault.
#0  0x0fbcc838 in get_regs (context=0xf7fede5c, fs=0xf7fed950)
    at ../../gcc/config/rs6000/linux-unwind.h:159
159	  if (*(unsigned int *) (pc + 4) != 0x44000002)
(gdb) bt
#0  0x0fbcc838 in get_regs (context=0xf7fede5c, fs=0xf7fed950)
    at ../../gcc/config/rs6000/linux-unwind.h:159
#1  ppc_fallback_frame_state (context=0xf7fede5c, fs=0xf7fed950)
    at ../../gcc/config/rs6000/linux-unwind.h:227
#2  uw_frame_state_for (context=0xf7fede5c, fs=0xf7fed950) at ../../gcc/unwind-dw2.c:1127
#3  0x0fbce12c in _Unwind_ForcedUnwind_Phase2 (exc=0xf7fef6f0, context=0xf7fede5c)
    at ../../gcc/unwind.inc:159
#4  0x0fbce73c in _Unwind_ForcedUnwind (exc=0xf7fef6f0, stop=0xfdaf260 <unwind_stop>, 
    stop_argument=0xf7feee40) at ../../gcc/unwind.inc:211
#5  0x0fdb24a0 in _Unwind_ForcedUnwind (exc=0xf7fef6f0, stop=0xfdaf260 <unwind_stop>, 
    stop_argument=0xf7feee40) at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:100
#6  0x0fdaf21c in __pthread_unwind (buf=<value optimized out>) at unwind.c:130
#7  0x0fda47cc in __do_cancel (sig=<value optimized out>, si=<value optimized out>, 
    ctx=<value optimized out>) at ../nptl/pthreadP.h:259
#8  sigcancel_handler (sig=<value optimized out>, si=<value optimized out>, 
    ctx=<value optimized out>) at init.c:199
#9  <signal handler called>
#10 0x0ff1153c in __libc_enable_asynccancel () at libc-cancellation.c:76
#11 0x00000400 in ?? ()
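
Frame #11 reports a pc of 0x00000400, which is presumably not a valid code address, so the read of pc + 4 at linux-unwind.h:159 would fault. To my understanding, that line checks whether the second instruction at the return address is the PowerPC `sc` instruction (encoding 0x44000002), which is how the fallback unwinder recognizes a signal-return trampoline. A minimal sketch of that check follows. The helper name second_insn_is_sc is hypothetical, and the sketch assumes a 4-byte unsigned int.

```c
#include <string.h>

/* Encoding of the PowerPC `sc` (system call) instruction, as compared
   against at linux-unwind.h:159 in the backtrace above. */
#define PPC_SC_INSN 0x44000002u

/* Hypothetical helper mirroring the check at line 159: is the
   instruction word at pc + 4 an `sc`?  Like the original, it reads
   through pc unconditionally, so a bogus pc such as 0x400 would fault. */
int second_insn_is_sc(const unsigned char *pc)
{
    unsigned int insn;

    memcpy(&insn, pc + 4, sizeof insn);  /* avoids aliasing/alignment issues */
    return insn == PPC_SC_INSN;
}
```

The sketch uses memcpy rather than the pointer cast in the original so that it stays well-defined on any host. The comparison itself is the same.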

Comment 1 Jeff Law 2013-11-26 20:03:48 UTC
Reassigning to glibc -- this looks a whole lot like some of the problems we had in RHEL 6 with incorrect memory fencing in the glibc unwind-forcedunwind code on modern power hardware.  We haven't completely resolved those in RHEL 6, so backporting anything to RHEL 5 seems premature at this point.

Comment 2 Carlos O'Donell 2013-11-26 20:23:06 UTC
This looks exactly like the memory fencing issues in the glibc forced unwind code. I'm hesitant to commit to fixing this because it involves more than just glibc: it also requires thread-safe PLT stubs from binutils to catch the rest of the niggling issues. I'll have to look into whether binutils for RHEL 5 has those fixes.
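
For readers unfamiliar with this class of bug, the general pattern is a lazily initialized function pointer that one thread publishes and another reads without ordering. On weakly ordered hardware such as POWER, the reader can see the "initialized" flag before the pointer itself. The sketch below is a generic C11 illustration of the fenced version, not the glibc code. All names (unwind_fn, stub_unwind, lazy_init, call_unwind) are hypothetical, and it needs a C11 compiler, which the RHEL 5 gcc 4.1.2 is not.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*unwind_fn)(int);

/* Lazily resolved routine and the flag that publishes it. */
static _Atomic(unwind_fn) forced_unwind_ptr;
static atomic_int initialized;   /* 0 = not ready, 1 = ready */

/* Stand-in for the routine that would be looked up at run time. */
static int stub_unwind(int x)
{
    return x + 1;
}

/* Writer side: store the pointer first, then publish the flag with
   release ordering so the pointer store cannot be reordered after it. */
void lazy_init(void)
{
    atomic_store_explicit(&forced_unwind_ptr, stub_unwind,
                          memory_order_relaxed);
    atomic_store_explicit(&initialized, 1, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store, so a
   thread that sees initialized == 1 also sees the pointer. */
int call_unwind(int x)
{
    unwind_fn fn;

    if (!atomic_load_explicit(&initialized, memory_order_acquire))
        lazy_init();
    fn = atomic_load_explicit(&forced_unwind_ptr, memory_order_relaxed);
    return fn(x);
}
```

Without the release/acquire pair, a reader could observe the flag as set and still load a stale (NULL or garbage) pointer, and then jump through it.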

Comment 3 Jeff Law 2013-11-26 20:36:32 UTC
I doubt binutils has those fixes...  I don't recall backporting them to RHEL5.

My gut says this shouldn't make the cut for RHEL 5.11, but wanted to get it reassigned to the proper component so you could chime in as well.

Comment 4 Carlos O'Donell 2013-11-26 22:39:47 UTC
(In reply to Jeff Law from comment #3)
> I doubt binutils has those fixes...  I don't recall backporting them to
> RHEL5.

It doesn't have them: bfd/elf64-ppc.c (build_plt_stub) lacks all of the thread-safe PLT stub code. Fixing this is therefore going to be much, much harder.

> My gut says this shouldn't make the cut for RHEL 5.11, but wanted to get it
> reassigned to the proper component so you could chime in as well.

I agree. The risk of backporting binutils fixes that affect all binaries by using alternate PLT stubs is just too high to justify fixing this kind of problem.

I could fix the glibc bug, but it wouldn't fix all of these kinds of crashes.

What do I do with this bug? Shall we close this as CLOSED/WONTFIX?

Comment 5 Jeff Law 2013-11-26 22:52:39 UTC
It gets even worse: Alan's original code was too optimistic about when it decided to use thread-safe stubs.  We'd need to turn them on for any shared link, which implies that to be effective we'd need to relink (and therefore rebuild) a large number of libraries on the system (I'd nearly forgotten about this part of the mess).

Given there's no customer case, I say CLOSED/WONTFIX: too risky and invasive at this stage in the RHEL 5 lifecycle.