Bug 2172869

Summary: inconsistent documentation: restrict the child process, forked from an MT program, to AS-safe functions, or not?
Product: Red Hat Enterprise Linux 9 Reporter: Laszlo Ersek <lersek>
Component: glibcAssignee: glibc team <glibc-bugzilla>
Status: CLOSED UPSTREAM QA Contact: qe-baseos-tools-bugs
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.1CC: ashankar, codonell, dj, eblake, fweimer, mnewsome, pfrankli, sipoyare
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-02-23 20:07:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Laszlo Ersek 2023-02-23 11:16:48 UTC
I'm filing this bug for downstream glibc, although the fix certainly needs an upstream patch, *and* coordination with the Linux Manual Pages upstream project. The reason I'm filing this issue here because I cannot take on the task of coordinating with multiple projects; I'm not a developer for either glibc or the linux manual pages.

* Description of problem:

- The Single Unix Specification v4 (aka POSIX 2018) explains that the child process forked from a multi-threaded parent process may only call such functions that are async-signal-safe. The table of async-signal-safe functions does not include functions like malloc() or execvp(). Therefore a program targeting POSIX will avoid these functions in a child process like described above.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03

- The Linux Manual Pages confirm this. The fork(2) manual page repeats the same restriction, and the table in the signal-safety(7) manual page does not include malloc() or execvp(). This applies to both "man-pages-5.10-5.el9.noarch", and to the currently visible upstream pages:

https://man7.org/linux/man-pages/man2/fork.2.html
https://man7.org/linux/man-pages/man7/signal-safety.7.html

- At the same time, the GNU C Library manual *avoids* the same async-signal-safety restriction in its specification of fork(). Note that the child process is not *expressly* licensed to call malloc() or execvp(), instead, the relaxation of the POSIX restriction becomes apparent when the reader compares the specification of fork() with that of _Fork(). The latter writes:

     The ‘_Fork’ function is similar to ‘fork’, but it does not invoke
     any callbacks registered with ‘pthread_atfork’, nor does it reset
     any internal state or locks (such as the ‘malloc’ locks).  In the
     new subprocess, only async-signal-safe functions may be called,
     such as ‘dup2’ or ‘execve’.

(This applies to both "info libc" (which comes from "glibc-doc-2.34-40.el9_1.1.noarch"), and to the manual that's currently visible on the web: <https://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html>.)

This implies that glibc fork() *does* make it safe for the child to call malloc() and/or execvp(), contradicting (more precisely, going beyond) the requirements in POSIX and the Linux Manual Pages.

Indeed, the *code* in glibc agrees with that implication, using special atfork handlers.

In fact, explicit work has been done to maintain as many liberties for the child process as possible; see for example bug 906468.

* Expected results:

*Both* the linux manual pages *and* the glibc info should clearly and *consistently* state that glibc provides the child process, forked from a multi-theaded process, more guarantees than POSIX requires. These extra guarantees should be individually enumerated, in *both* documentation sets.

Thanks!

Comment 1 Laszlo Ersek 2023-02-23 11:19:04 UTC
(Side observation: the GNU manual also says about _Fork(): "The _Fork function is an async-signal-safe replacement of fork. It is a GNU extension."

This makes no sense to me, as POSIX *already* requires fork() to be AS-Safe -- find it in the table at <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>.)

Comment 2 Eric Blake 2023-02-23 15:18:10 UTC
To make things more complicated,
https://www.austingroupbugs.net/view.php?id=62#c193
POSIX folks said that in the next version of POSIX:

Add to the end of fork() DESCRIPTION:
The _Fork() function shall be equivalent to fork(), except that fork handlers
established by means of the pthread_atfork() function shall not be called and
_Fork() shall be async-signal-safe.

Add to SYNOPSIS on fork() page: pid_t _Fork(void);
and also to unistd.h

Add _Fork() to the list of async-signal-safe functions
Remove fork() from the list of async-signal-safe functions.

Comment 3 Eric Blake 2023-02-23 15:49:58 UTC
(In reply to Eric Blake from comment #2)
> To make things more complicated,
> https://www.austingroupbugs.net/view.php?id=62#c193
> POSIX folks said that in the next version of POSIX:
> 
> Add to the end of fork() DESCRIPTION:
> The _Fork() function shall be equivalent to fork(), except that fork handlers
> established by means of the pthread_atfork() function shall not be called and
> _Fork() shall be async-signal-safe.
> 
> Add to SYNOPSIS on fork() page: pid_t _Fork(void);
> and also to unistd.h
> 
> Add _Fork() to the list of async-signal-safe functions
> Remove fork() from the list of async-signal-safe functions.

Re-reading more of the context from the Austin Group discussion at the time, the entire notion of pthread_atfork() handlers does not play nicely with multi-threaded applications (not only is it difficult to write handlers that play nice with async-signal-safety, but issues of topologically sorting handlers and handlers installed by a dlopen'd library come into play), so the introduction of _Fork() is intended to provide a way to fork a process that intentionally does not use any atfork handling but thereby has a stricter limitation on safe functions to call before exec*().  Before _Fork()'s invention, it was previously possible to attempt to write a single-threaded server that calls fork() from a signal handler context (that is, with SIGIO or even fcntl(F_SETSIG) for realtime signals, you can write an asynchronous style servicing application without the use of poll(), where the handler spawns child processes).  But safely using fork() in a signal handler has turned out to be such a can of worms, especially now that multi-threaded processes are more common than single-threaded, that it is now deemed more practical to only call fork() from a primary thread of execution rather than a signal handler, which in turn impacts the set of functions useful to call between fork() and exec*().  As for whether async-signal-safe functions is the right set of limited-use APIs, or whether we should maintain two very-similar sets of functions (those functions safe to call in an async signal handler, vs. those functions safe to call between fork() and exec(), where malloc() can be in the latter but not the former), may then be a quality-of-implementation extension beyond POSIX - but the point remains that CLEAR and CONSISTENT documentation about this will help both programmers trying to exploit the full power of glibc/Linux, and programmers trying to be portable to just POSIX.

Comment 4 Carlos O'Donell 2023-02-23 20:07:32 UTC
We follow the POSIX standard in this regard.

In glibc the forked child of an MT process may only call AS-safe functions. I believe that answers your first question.

There is no relaxation of the POSIX restriction. The `info libc fork` information is not explicit about what you can or can't call in the child, and that could be improved.

Example:
~~~
diff --git a/manual/process.texi b/manual/process.texi
index 9307379194..dbfcc7d75f 100644
--- a/manual/process.texi
+++ b/manual/process.texi
@@ -316,6 +316,12 @@ The child doesn't inherit alarms set by the parent process.
 The set of pending signals (@pxref{Delivery of Signal}) for the child
 process is cleared.  (The child process inherits its mask of blocked
 signals and signal actions from the parent process.)
+
+@item
+If the parent was multi-threaded at the time of the call then the
+child process may only call asynchronous-signal-safe functions until
+the process is replaced with a call to an @code{exec} family function.
+
 @end itemize
 
 @deftypefun pid_t _Fork (void)
~~~

A developer may choose to use _Fork() to avoid calling any of the registered pthread_atfork() handlers, but this has nothing to do with allowing non-AS-safe functions.

Fixes like those in bug 906468 are about making fork MT-safe. An MT process must allow any thread to call fork() safely regardless of the state of any other thread. The bug itself is about ensuring that internal library book-keeping is always in a consistent state that allows a fork() to make forward progress. The "implementation" may internally do things that you cannot do as a developer.

The Issue 8 update for POSIX makes fork() AS-unsafe, which is the correct change, it allows libraries to use pthread_atfork() to register handlers that can call AS-unsafe functions to reset internal state. If a developer wants to call fork in a signal handler then hey must use _Fork().

I'm marking this CLOSED/UPSTREAM and I'll be tracking it here:
https://sourceware.org/bugzilla/show_bug.cgi?id=30159

I believe only the glibc manual change is required.

Comment 5 Laszlo Ersek 2023-02-24 05:18:48 UTC
(In reply to Carlos O'Donell from comment #4)
> We follow the POSIX standard in this regard.
>
> In glibc the forked child of an MT process may only call AS-safe
> functions.
> I believe that answers your first question.

It totally does! (What a U-Turn!)

> There is no relaxation of the POSIX restriction. The `info libc fork`
> information is not explicit about what you can or can't call in the
> child, and that could be improved.
>
> Example:
> ~~~
> diff --git a/manual/process.texi b/manual/process.texi
> index 9307379194..dbfcc7d75f 100644
> --- a/manual/process.texi
> +++ b/manual/process.texi
> @@ -316,6 +316,12 @@ The child doesn't inherit alarms set by the parent
> process.
>  The set of pending signals (@pxref{Delivery of Signal}) for the child
>  process is cleared.  (The child process inherits its mask of blocked
>  signals and signal actions from the parent process.)
> +
> +@item
> +If the parent was multi-threaded at the time of the call then the
> +child process may only call asynchronous-signal-safe functions until
> +the process is replaced with a call to an @code{exec} family function.
> +
>  @end itemize
>
>  @deftypefun pid_t _Fork (void)
> ~~~

Looks good to me; much appreciated.

Perhaps the note could mention that the exec*p*() sub-family of the
exec*() family is not AS-Safe.

(... Digression: upon reviewing
<https://www.gnu.org/software/libc/manual/html_node/Executing-a-File.html>,
something else catches my eye: per POSIX, the following functions are
AS-Safe: execl(), execle(), execv(), execve(). But in the glibc manual,
execl() and execle() are marked "AS-Unsafe heap". Is that intentional?)

> A developer may choose to use _Fork() to avoid calling any of the
> registered pthread_atfork() handlers, but this has nothing to do with
> allowing non-AS-safe functions.
>
> Fixes like those in bug 906468 are about making fork MT-safe. An MT
> process must allow any thread to call fork() safely regardless of the
> state of any other thread. The bug itself is about ensuring that
> internal library book-keeping is always in a consistent state that
> allows a fork() to make forward progress. The "implementation" may
> internally do things that you cannot do as a developer.

Many thanks for this very clear explanation.

> The Issue 8 update for POSIX makes fork() AS-unsafe, which is the
> correct change, it allows libraries to use pthread_atfork() to
> register handlers that can call AS-unsafe functions to reset internal
> state. If a developer wants to call fork in a signal handler then hey
> must use _Fork().

Understood -- the atfork handlers effectively transfer their AS-unsafety
to fork().

However (separate topic! please allow the digression), I think it raises
an interesting question. If the update renders fork() AS-Unsafe, then
calling fork() for a *second* time, that is, in a child process
originally forked from a multi-threaded process, becomes forbidden. I
wonder if this has been considered, and/or if it's going to have any
particular consequences for existing applications. (End of digression.)

> I'm marking this CLOSED/UPSTREAM and I'll be tracking it here:
> https://sourceware.org/bugzilla/show_bug.cgi?id=30159
>
> I believe only the glibc manual change is required.

Yes, thank you. The proposed language brings the glibc manual in sync
with the Linux Manual Pages -- not in the direction that I had expected,
but in reverse; and I'm totally OK with that. Thanks!

Comment 6 Laszlo Ersek 2023-02-24 05:25:06 UTC
I notice <https://sourceware.org/bugzilla/show_bug.cgi?id=30159#c1> -->
<https://patchwork.sourceware.org/project/glibc/patch/20230223223728.423748-1-carlos@redhat.com/>
mentions that malloc() *is* allowed as an extension. I think that level
of detail in the docs is great.

Very relevant upstream discussion under the v1 patch:

https://patchwork.sourceware.org/project/glibc/patch/20230223204959.408243-1-carlos@redhat.com/

Anyway, exec*p*() remains forbidden. (This is not a complaint at all,
just an observation, because that particular use case is relevant to me.)

Comment 7 Laszlo Ersek 2023-02-24 05:27:51 UTC
With the v2 upstream patch however, where the extension list is ultimately non-empty, it remains necessary (in my opinion) to update the Linux Manual Pages as well.