Bug 1317523

Summary: pthread_kill() is not a barrier
Product: [Fedora] Fedora Reporter: Stas Sergeev <stsp2>
Component: glibcAssignee: Carlos O'Donell <codonell>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 24CC: arjun.is, codonell, dj, fweimer, jakub, law, mfabian, pfrankli, siddhesh
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-14 18:54:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
test case none

Description Stas Sergeev 2016-03-14 13:10:20 UTC
Created attachment 1136140 [details]
test case

Description of problem:
The attached test-case prints "1" if compiled with
-O0, and "0" if compiled with -O1.
This is because __THROW now enables the leaf attribute.
As pthread_kill() (and also raise(), kill() etc) invokes
the signal handler, they should use __THROWNL instead.


Version-Release number of selected component (if applicable):


How reproducible:
easily

Steps to Reproduce:
1. compile the test-case
2. run it

Actual results:
depends on an opt level

Expected results:
always "1"

Additional info:
The alternative is to mark the "tst" var volatile.
IMHO volatile should only be used in the async signals,
if at all. It removes too many optimizations, and is in
general undesirable. OTOH the functions that always invoke
the sighandler synchronously, should simply not be marked
leaf.
Please note that this is also a regression.

Comment 1 Stas Sergeev 2016-03-14 13:30:31 UTC
Hmm, seems like its already documented:
---
Note that leaf functions might invoke signals and signal handlers might be defined in the current compilation unit and use static variables. The only compliant way to write such a signal handler is to declare such variables volatile. 
---

I wonder what was the rationale behind adding such a regression...
The volatile was not needed for the sync sighandlers in the past,
and in async sighandlers it was helpful only as long as it is of
an atomic type. So I thought the generic advice is to try to avoid
volatile. And now the use of volatiles is mandated again, just by this? :(

Comment 2 Carlos O'Donell 2016-03-14 18:54:36 UTC
Your function my_h is async-signal unsafe. The function printf may not be called from a signal handler. See 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html for the table of AS-safe functions (starts at "Therefore, applications can invoke them, without restriction, from signal-catching functions:").

Let us assume you were copying the value from one global to another or acting on it directly.

Let us also assume this is always between the thread and it's own signal handler. 

If you don't assume the latter then you have inter-thread memory synchronization issues (The function phtread_kill is not on the list of functions which synchronize memory: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11, so other threads might not see any changes unless you use sem_pot/sem_wait to create a happens-before relationship).

The relationship between a thread's memory and the memory seen by the current thread or other thread signal handlers is an underspecified part of both the ISO C standard and the POSIX standard. To be pedantic you can't say anything about what the signal handler may or may not observe. POSIX provides a weakly worded "synchronizes memory" for the inter-thread case, but not the self-synchronization case of the single thread with it's own signal handler.

None of the above is a problem yet in the practical sense (today). What you are seeing today is a compiler elided global write because of the __THROW markup (as you quote), but there are an equal number of other problems in your example.

In the worst theoretical case signal handlers are not useful for anything today.

Let us assume the standards eventually provide some stronger guarantees, for example, that it appears as if the signal was executed in another concurrent execution context (not quite a thread). In which case you'd have to treat it like a distinct thread of execution and use sem_post/sem_wait to synchronize your operations (or C11 atomics) and both of those options would fix your problem here and they would be robust.

Sent documentation clarification patch to gcc:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00788.html

I have started a discussion in upstream glibc here:
https://www.sourceware.org/ml/libc-alpha/2016-03/msg00400.html

In the meantime this is not really a bug. So I'm closing as notabug for now.

Comment 3 Stas Sergeev 2016-03-14 20:54:14 UTC
> Your function my_h is async-signal unsafe. The function printf may
> not be called from a signal handler. See
I think you contradict yourself a bit.
my_h is indeed async-signal-unsafe, but it is invoked
synchronously. I would definitely not call printf() in
a SIGALRM handler, or the like. But doing pthread_kill() -
is a synchronous signal invocation. Also doing
asm volatile ("hlt\n" ::: "memory")
is a way to get SIGSEGV synchronously.
I believe using any functions there, is safe. There are
many programs that rely on this. Even if it is not valid
by some interpretation of posix (posix always allows the
unlimited number of interpretations), people do not consult
with posix at all. People use glibc extensions. :) Now that
this volatile requirement was documented in glibc, I find this
a very bad incident. How can the one trust glibc extensions
if they silently disappear?


> Let us also assume this is always between the thread and it's
> own signal handler.
That's why I pass pthread_self() to pthread_kill().

> POSIX provides a weakly worded "synchronizes memory" for the inter-thread
> case, but not the self-synchronization case of the single thread with it's
> own signal handler.
Indeed, but... why not to simply consider the sync signal
invocation as a call-back? Surely for the async signals the
problem does exist and people are well aware of it. I wouldn't
be compaining with some SIGALRM case. But when we are talking
about pthread_kill() that just invokes the call-back (a handler),
why would we bother about what posix says of _async_ handlers?
They are just the entirely different anymals IMHO.

> None of the above is a problem yet in the practical sense (today).
> What you are seeing today is a compiler elided global write because
> of the __THROW markup (as you quote), but there are an equal number
> of other problems in your example.
Again, I don't believe this is the case.
Be it the SIGALRM example then yes.


> In the worst theoretical case signal handlers are not useful
> for anything today.
They are very good for trapping the guest OS events and
swapping the contexts by replacing the sigcontext structs
in hypervisor.

> Sent documentation clarification patch to gcc:
Oh, you complicated it a lot, poor users who'd read it. :)

> +in the current compilation unit and use static variables. There
> is no standards
> +compliant way to write such a signal handler, resolver function, or
> +implementation function, and the best that you can do is to remove the leaf
> +attribute
From pthread_kill() defined in a glibc header?

OK, thanks for bringing this to upstream.
I am pretty sure it is quite a bug, or at least a regression.
So I'd rather leave it open so that its not forgotten, but as you wish.

Comment 4 Stas Sergeev 2016-03-15 06:01:34 UTC
Of course, what I say only applies to the case
when sending a signal to pthread_self(), which is
what my test-case does. And what you say applies
to sending the signal to other thread, which is what
my test-case does not.
Please note that when sending to pthread_self(),
the pthread_kill() will return only after the sighandler
returned, so it is a fully synchronous invocation
(a callback basically).
The biggest problem I see with this all, is that glibc
took the liberty to document this the way it did.
It now mandates the use of the volatile - something people
tried hard to avoid. IMHO glibc should make it clear
that "The only compliant way" is actually
"The only glibc-compliant way", because there are many
other libs out there that give you the different guarantees
for this case, and even the glibc itself did it differently
in the past. Maybe it is also "The only POSIX-compliant way",
but the very careful study is needed to assert that.
So I would really suggest to documented it as the glibc's
opinion only. glibc should not enforce its vision on the
programmers or on the competing libs. So it should either
refer to POSIX (if the case is researched well enough for
such a reference), or refer to itself, but not to make the
assertion universally.