1180633 – libitm: segfault for the simple testcase

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	gcc
Sub Component:
Version:	7.1
Hardware:	ppc64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Jakub Jelinek
QA Contact:	Michael Petlan
Docs Contact:
URL:
Whiteboard:
Depends On:	1080486
Blocks:	1110700 1191021 1297579 1313485

Reported:	2015-01-09 15:33 UTC by Miroslav Franc
Modified:	2016-11-04 06:25 UTC
CC List:	12 users
Fixed In Version:	gcc-4.8.5-6.el7
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:	1080486
Environment:
Last Closed:	2016-11-04 06:25:27 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:2433	0	normal	SHIPPED_LIVE	gcc bug fix and enhancement update	2016-11-03 14:01:11 UTC

Description Miroslav Franc 2015-01-09 15:33:07 UTC

I'm using the same test case as in Bug #1080486, and after a few seconds I get a segfault.  It seems to happen on ppc64 only.

# gcc -g simple-2.c  -O2 -o T_simple-2 -fgnu-tm -lpthread
# while ./T_simple-2; do :;done
Segmentation fault (core dumped)

backtraces:
Thread 4 (Thread 0x3fffac7ff1c0 (LWP 18858)):
#0  0x00003fffb01708a4 in .__madvise () from /lib64/power8/libc.so.6
#1  0x00003fffb0abc380 in .start_thread () from /lib64/power8/libpthread.so.0
#2  0x00003fffb0177fa0 in .__clone () from /lib64/power8/libc.so.6

Thread 3 (Thread 0x3fffb0b652f0 (LWP 18850)):
#0  0x00003fffb0170780 in .__GI_mmap () from /lib64/power8/libc.so.6
#1  0x00003fffb0abd230 in .pthread_create () from /lib64/power8/libpthread.so.0
#2  0x0000000010000aa0 in start (dummy=<optimized out>) at simple-2.c:8
#3  0x00003fffae7ff1c0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (Thread 0x3fffa7fff1c0 (LWP 18859)):
#0  0x00003fffb0177f58 in .__clone () from /lib64/power8/libc.so.6
#1  0x00003fffb0abadf8 in .do_clone.constprop.4 () from /lib64/power8/libpthread.so.0
#2  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x3fffaefff1c0 (LWP 18853)):
#0  GTM::gtm_thread::trycommit (this=0x0) at ../../../libitm/beginend.cc:502
#1  0x0000000000000000 in ?? ()


disassembly:
499     bool
500     GTM::gtm_thread::trycommit ()
501     {
   0x00003fffb02639f0 <+0>:     mflr    r0
   0x00003fffb02639f4 <+4>:     std     r30,-16(r1)
   0x00003fffb02639f8 <+8>:     std     r28,-32(r1)
   0x00003fffb02639fc <+12>:    std     r31,-8(r1)
   0x00003fffb0263a00 <+16>:    std     r29,-24(r1)
   0x00003fffb0263a04 <+20>:    mr      r31,r3
   0x00003fffb0263a08 <+24>:    std     r0,16(r1)
   0x00003fffb0263a0c <+28>:    stdu    r1,-160(r1)
   0x00003fffb0263a14 <+36>:    ld      r9,-28688(r13)
   0x00003fffb0263a18 <+40>:    std     r9,120(r1)
   0x00003fffb0263a1c <+44>:    li      r9,0

502       nesting--;
=> 0x00003fffb0263a10 <+32>:    lwz     r30,644(r3)
   0x00003fffb0263a20 <+48>:    addi    r30,r30,-1
   0x00003fffb0263a28 <+56>:    stw     r30,644(r3)

503

# rpm -q libitm gcc libstdc++ glibc kernel
libitm-4.8.3-9.el7.ppc64
gcc-4.8.3-9.el7.ppc64
libstdc++-4.8.3-9.el7.ppc64
glibc-2.17-75.el7.ppc64
kernel-3.10.0-219.el7.ppc64


+++ This bug was initially created as a clone of Bug #1080486 +++

Description of problem:
A simple test case fails on s390x.  Not sure whether it's s390x specific or not.


Version-Release number of selected component (if applicable):
libitm-4.8.2-16.el7.s390x
kernel-3.10.0-113.el7.s390x
glibc-2.17-55.el7.s390x


How reproducible:
~20 % of attempts (at least for me)

--- simple-2.c ---
#include <stdlib.h>
#include <pthread.h>

static int x;

static void *start (void *dummy __attribute__((unused)))
{
  __transaction_atomic { x++; }
  return NULL;
}

int main()
{
  pthread_t p[10];
  int i;

  for (i = 0; i < 10; ++i)
    pthread_create (p+i, NULL, start, NULL);

  for (i = 0; i < 10; ++i)
    pthread_join  (p[i], NULL);

  if (x != 10)
    abort ();

  return 0;
}
--- --- ---

Steps to Reproduce:
1. gcc -g simple-2.c  -O2 -o T_simple-2 -fgnu-tm -lpthread
2. ./T_simple-2  # few times (while ./T_simple-2; do :;done)


Actual results:
libitm: futex failed (Operation not permitted)
$? = 1

Comment 1 Marek Polacek 2015-01-12 10:30:09 UTC

I've reproduced the issue on a RHEL7 box; gcc trunk seems to be fine.  Let me see if I can perhaps bisect when it got fixed.

Comment 2 Marek Polacek 2015-01-12 14:58:56 UTC

(In reply to Marek Polacek from comment #1)
> I've reproduced the issue on a RHEL7 box; gcc trunk seems to be fine.  Let
> me see if I can perhaps bisect when it got fixed.

Some new observations.  The above is incorrect; I reproduced the failure even with the FSF trunk and with 4.9/4.8 branches, but only on a RHEL7 system.  On a Fedora 20 box I couldn't reproduce the segv at all.  Both were power8 machines.

Comment 3 Jeff Law 2015-04-02 03:10:43 UTC

Given c#2, I don't think we have any sense of what the real issue is.  It's highly unlikely this will get fixed in RHEL 7.3.  It's not even clear at this point if we have a bug in gcc or something else.

Comment 7 Jakub Jelinek 2015-12-16 14:34:09 UTC

I have added some debugging:
--- libitm/beginend.cc.xx	2013-06-28 05:44:16.000000000 -0400
+++ libitm/beginend.cc	2015-12-16 09:13:52.722248313 -0500
@@ -32,6 +32,8 @@ using namespace GTM;
 extern __thread gtm_thread_tls _gtm_thr_tls;
 #endif
 
+__thread char myflags[6];
+
 gtm_rwlock GTM::gtm_thread::serial_lock;
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
@@ -57,6 +59,12 @@ static pthread_once_t thr_release_once =
 // See gtm_thread::begin_transaction.
 uint32_t GTM::htm_fastpath = 0;
 
+char *
+getmyflags (void)
+{
+  return myflags;
+}
+
 /* Allocate a transaction structure.  */
 void *
 GTM::gtm_thread::operator new (size_t s)
@@ -85,6 +93,7 @@ thread_exit_handler(void *)
   gtm_thread *thr = gtm_thr();
   if (thr)
     delete thr;
+  myflags[0] = 1;
   set_gtm_thr(0);
 }
 
@@ -189,6 +198,7 @@ GTM::gtm_thread::begin_transaction (uint
   // and thus do not trigger the standard retry handling).
   if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
     {
+myflags[3] = 1;
       for (uint32_t t = htm_fastpath; t; t--)
 	{
 	  uint32_t ret = htm_begin();
@@ -219,6 +229,7 @@ GTM::gtm_thread::begin_transaction (uint
 	        {
 	          // See below.
 	          tx = new gtm_thread();
+		  myflags[1] = 1;
 	          set_gtm_thr(tx);
 	        }
 	      // Check whether there is an enclosing serial-mode transaction;
@@ -245,6 +256,7 @@ GTM::gtm_thread::begin_transaction (uint
       // Create the thread object. The constructor will also set up automatic
       // deletion on thread termination.
       tx = new gtm_thread();
+      myflags[2] = 1;
       set_gtm_thr(tx);
     }
 
--- libitm/method-serial.cc.xx	2013-02-06 12:20:04.000000000 -0500
+++ libitm/method-serial.cc	2015-12-16 09:18:18.802254699 -0500
@@ -24,6 +24,9 @@
 
 #include "libitm_i.h"
 
+extern __thread char myflags[6];
+char myflags2[6];
+
 // Avoid a dependency on libstdc++ for the pure virtuals in abi_dispatch.
 extern "C" void HIDDEN
 __cxa_pure_virtual ()
@@ -223,12 +226,16 @@ struct htm_mg : public method_group
     // initially disabled.
 #ifdef USE_HTM_FASTPATH
     htm_fastpath = htm_init();
+myflags[4] = htm_fastpath + 1;
+myflags2[0] = 1;
 #endif
   }
   virtual void fini()
   {
     // Disable the HTM fastpath.
     htm_fastpath = 0;
+myflags[5] = 1;
+myflags2[1] = 1;
   }
 };
 

In the thread that crashed I see only myflags[3] set, while in the global myflags2 array both myflags2[0] and myflags2[1] are set.
So my suspicion is that the first thread that encounters GTM::gtm_thread::begin_transaction (or whatever other routine) at some point invokes htm_mg::init (), which sets a global variable, and later on, for whatever reason, performs htm_mg::fini (), which sets the global variable back to 0.  But other threads access this global variable concurrently, and can e.g. see it set while in beginTransaction but already cleared when in commitTransaction.  Should that variable be TLS too, is some locking or similar needed, or should it never be cleared?
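
The suspected interleaving can be illustrated with a small stand-alone C sketch. It is not libitm code and not the fix that shipped: every name in it (fastpath, tx_desc, tx_state, mg_init, mg_fini, tx_begin, tx_commit_racy, tx_commit_snapshot) is hypothetical, and the concurrent fini is simulated by calling mg_fini between begin and commit on a single thread. It only shows why re-reading a global flag at commit time can disagree with the decision taken at begin time, and how remembering that decision per transaction avoids the mismatch. A real hardware-transaction runtime needs more synchronization than this.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the global flag that init sets and fini clears. */
static _Atomic uint32_t fastpath = 0;

/* Hypothetical per-thread transaction descriptor. */
struct tx_desc { int nesting; };

/* What a thread carries from begin to commit. */
struct tx_state {
  int used_fastpath;     /* decision taken once, at begin */
  struct tx_desc *desc;  /* NULL when the fast path was taken */
};

static void mg_init(void) { atomic_store(&fastpath, 2); }
static void mg_fini(void) { atomic_store(&fastpath, 0); }

/* Begin: the fast path allocates no descriptor, the slow path does. */
static void tx_begin(struct tx_state *st, struct tx_desc *storage)
{
  st->used_fastpath = atomic_load(&fastpath) != 0;
  if (st->used_fastpath) {
    st->desc = NULL;
  } else {
    storage->nesting = 1;
    st->desc = storage;
  }
}

/* Racy commit: consults the global again.  Returns -1 at the point where
   code like trycommit would execute nesting-- through a null pointer. */
static int tx_commit_racy(struct tx_state *st)
{
  if (atomic_load(&fastpath) != 0)
    return 0;              /* fast-path commit */
  if (st->desc == NULL)
    return -1;             /* flag was cleared after begin: no descriptor */
  st->desc->nesting--;
  return 0;
}

/* Commit that reuses the decision recorded at begin. */
static int tx_commit_snapshot(struct tx_state *st)
{
  if (st->used_fastpath)
    return 0;
  st->desc->nesting--;
  return 0;
}
```

Calling mg_init, then tx_begin, then mg_fini reproduces the mismatch: tx_commit_racy reports the null-descriptor case, while tx_commit_snapshot still commits on the path chosen at begin.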

Comment 12 Michael Petlan 2016-07-27 14:53:01 UTC

Tested against 4.8.5-4.el7 and 4.8.5-9.el7.

Reproducible on POWER8 boxes only. New gcc passes everywhere.

VERIFIED

Comment 14 errata-xmlrpc 2016-11-04 06:25:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2433.html