Bug 1328723 - glibc: posix_spawn bug breaks recursive mutexes
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Florian Weimer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-20 07:30 UTC by Sam P.
Modified: 2016-05-02 05:10 UTC
CC List: 14 users

Fixed In Version: glibc-2.23.90-13.fc25
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-02 05:10:55 UTC
Type: Bug



Links
System ID         Private  Priority  Status  Summary  Last Updated
Sourceware 19957  0        None      None    None     2016-04-25 14:33:10 UTC

Description Sam P. 2016-04-20 07:30:35 UTC
Description of problem:

Fish shell hangs on startup.  It hangs before the prompt is printed.


Version-Release number of selected component (if applicable):

fish-2.2.0-11.fc25.x86_64
glibc-2.23.90-11.fc25.x86_64
kernel 4.6.0-0.rc4.git0.1.fc25.x86_64

How reproducible:

Every time.

Steps to Reproduce:
1.  Open gnome-terminal
2.  Run the "fish" command

Actual results:

Terminal hangs, does not respond to interrupts.


Expected results:

Prompt is printed, commands can be run.


Additional info:

strace:  http://paste.fedoraproject.org/357633/61136409
The hang is in the last call to futex (the last line of the file).

Comment 1 Sam P. 2016-04-20 07:33:56 UTC
Additionally, the bug was not present when I was using:

the same fish version as above
glibc-2.23.90-4.fc25.x86_64 (compared to .90-11)
kernel-4.6.0-0.rc2.git2.2.fc25.x86_64   @fedora-rawhide-kernel-nodebug  (compared to rc4.git0.1)

Comment 2 Andy Lutomirski 2016-04-20 22:09:01 UTC
Hi, glibc people:

I won't have a chance to debug this in depth for a while, but this looks like a glibc issue.

However, Sam, is there any chance you can check whether the new glibc plus older kernel works?  I'm running 4.6-rc2-ish here, and everything's fine.  I'll try -rc4.

Comment 3 Florian Weimer 2016-04-21 15:04:53 UTC
Sam, would you please run strace again, this time with -f so that child processes are traced as well?  The current trace doesn't show much; it is just waiting on a subprocess.

Comment 4 Sam P. 2016-04-21 21:41:45 UTC
Florian,  Here is the strace -f output:  http://paste.fedoraproject.org/358365/12747981/

Andy, I'll look into downgrading the kernel.

Comment 5 Florian Weimer 2016-04-23 13:45:54 UTC
Thanks, Sam.  It does look like a kernel bug to me because the pipe is created without O_NONBLOCK (so reads should not result in EAGAIN), and the read is scheduled after the subprocess closed the write end of the pipe and has exited (so it should return the end-of-file indicator (return value 0), and not an error).

Comment 6 Andy Lutomirski 2016-04-24 19:56:37 UTC
Florian, I'm confused.  Are you referring to:

    read(7, 0x7ffe54910f00, 4096)           = -1 EAGAIN (Resource temporarily unavailable)
    [...]
    close(8)                                = 0
    [...]
    read(7, "", 4096)                       = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    close(7)                                = 0
    getpid()                                = 9177
    getpid()                                = 9177
    getpid()                                = 9177
    futex(0x56452ae7ab20, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 9177 detached

I see nothing wrong with that EAGAIN: the write side (fd 8) wasn't closed yet.
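
The pipe behaviour described here can be checked in isolation. The following is a minimal Python sketch, assuming the read end is in non-blocking mode (which the EAGAIN in the trace implies); the helper names `read_status` and `demo` are made up for illustration and are not taken from fish or glibc. A read on an empty pipe fails with EAGAIN while a writer still holds the pipe open, and returns end-of-file (0 bytes) once the last write end is closed.

```python
import os


def read_status(fd):
    """Classify one read on a non-blocking fd: 'EAGAIN', 'EOF', or the data."""
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:
        # errno is EAGAIN: the pipe is empty but a write end is still open
        return "EAGAIN"
    return "EOF" if data == b"" else data


def demo():
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)        # put the read end in O_NONBLOCK mode
    results = []
    results.append(read_status(read_fd))   # write end still open, no data
    os.close(write_fd)                     # close the only write end
    results.append(read_status(read_fd))   # no writers left: end-of-file
    os.close(read_fd)
    return results


if __name__ == "__main__":
    print(demo())
```

The two results correspond to the two read(7, ...) calls in the trace: the first comes before close(8), the second after it.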

In any event, this is getting stuck waiting for a futex.  The program is single-threaded at this point -- what's it waiting for?

Comment 7 Andy Lutomirski 2016-04-24 20:09:51 UTC
Sam, I don't suppose you could get a backtrace of the hang?  It should be as simple as:

$ gdb fish
(gdb) run
... wait for hang and press Ctrl-C ...
(gdb) backtrace

You may need to install fish-debuginfo and glibc-debuginfo for the trace to be any good.

Comment 8 Florian Weimer 2016-04-25 12:58:44 UTC
(In reply to Andy Lutomirski from comment #6)
> Florian, I'm confused.

No, I'm confused, please disregard comment 5.  Thanks.

Comment 9 Florian Weimer 2016-04-25 14:33:11 UTC
Never mind, it's swbz#19957.  At the point of the hang:

(gdb) print ((struct pthread *)(pthread_self ()))->tid
$11 = -1

fish uses posix_spawn, so it all adds up.
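
One way to see why a bogus cached thread ID can hang a single-threaded process: a recursive mutex has to decide whether a lock attempt is a re-entry by its current owner, and it does so by comparing the recorded owner with the caller's thread ID. The sketch below is a toy Python model of that check under this assumption. It is not glibc's implementation; the class names, the `try_lock` method, and the thread ID 9177 (borrowed from the strace output above) are purely illustrative.

```python
class ToyThread:
    """Stand-in for a thread descriptor that caches its kernel thread ID."""

    def __init__(self, tid):
        self.tid = tid


class ToyRecursiveMutex:
    """Toy recursive mutex: re-entry is detected by comparing cached TIDs."""

    def __init__(self):
        self.owner = 0   # 0 means "not locked"
        self.count = 0   # recursion depth

    def try_lock(self, thread):
        if self.count > 0 and self.owner == thread.tid:
            self.count += 1          # re-entry by the recorded owner
            return True
        if self.count == 0:
            self.owner = thread.tid  # first acquisition
            self.count = 1
            return True
        # Looks like another thread holds the lock; a real mutex would
        # block here (for example in a futex wait) until it is released.
        return False


def demo(clobber_tid):
    thread = ToyThread(tid=9177)
    mutex = ToyRecursiveMutex()
    first = mutex.try_lock(thread)
    if clobber_tid:
        thread.tid = -1              # cached TID overwritten, as in the gdb output
    second = mutex.try_lock(thread)  # same thread tries to re-enter
    return first, second


if __name__ == "__main__":
    print("healthy:", demo(clobber_tid=False))
    print("clobbered:", demo(clobber_tid=True))
```

In the healthy case the second lock attempt is recognised as a re-entry. With the cached ID overwritten, the same thread no longer matches the recorded owner, so in this model it would wait on a lock that only it can release, which is consistent with the futex wait seen in a single-threaded process.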

An upstream fix is in the works.  If it does not land soon in upstream master, I will back out the problematic commit in rawhide.

