| Summary: | glibc: posix_spawn bug breaks recursive mutexes | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Sam P. <sam> |
| Component: | glibc | Assignee: | Florian Weimer <fweimer> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | arjun.is, codonell, dj, fweimer, jakub, jaswinder, law, luto, mfabian, oliver, oliver, pfrankli, sam, siddhesh |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | glibc-2.23.90-13.fc25 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-02 05:10:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Sam P.
2016-04-20 07:30:35 UTC
Additionally, the bug was not present when I was using:
- the same fish version as above
- glibc-2.23.90-4.fc25.x86_64 (compared to .90-11)
- kernel-4.6.0-0.rc2.git2.2.fc25.x86_64 @fedora-rawhide-kernel-nodebug (compared to rc4.git0.1)

Hi, glibc people: I won't have a chance to debug this in depth for a while, but this looks like a glibc issue. However, Sam, is there any chance you can check whether the new glibc plus older kernel works? I'm running 4.6-rc2-ish here, and everything's fine. I'll try -rc4.

Sam, would you please run strace again, this time with -f? The current trace doesn't show much; it's waiting on a subprocess.

Florian, here is the strace -f output: http://paste.fedoraproject.org/358365/12747981/

Andy, I'll look into downgrading the kernel.

Thanks, Sam. It does look like a kernel bug to me because the pipe is created with O_NONBLOCK (so reads should not result in EAGAIN), and the read is scheduled after the subprocess closed the read end of the pipe and has exited (so it should return the end-of-file indicator (return value 0), and not an error).

Florian, I'm confused. Are you referring to:
read(7, 0x7ffe54910f00, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[...]
close(8) = 0
[...]
read(7, "", 4096) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(7) = 0
getpid() = 9177
getpid() = 9177
getpid() = 9177
futex(0x56452ae7ab20, FUTEX_WAIT_PRIVATE, 2, NULL
strace: Process 9177 detached
I see nothing wrong with that EAGAIN: the write side (fd 8) wasn't closed yet.
In any event, this is getting stuck waiting for a futex. The program is single-threaded at this point -- what's it waiting for?
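The EAGAIN-versus-EOF point in the last two comments is ordinary pipe behaviour, independent of fish or glibc. A minimal sketch (the helper name `check_nonblocking_pipe_semantics` is hypothetical) showing that a non-blocking read end returns EAGAIN while a write end is still open, and 0 (end-of-file) once the last write end is closed:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: checks both pipe behaviours discussed in the comments.
   Returns 0 if both hold, -1 otherwise. */
int check_nonblocking_pipe_semantics(void)
{
    int fds[2];
    char buf[16];

    if (pipe(fds) != 0)
        return -1;

    /* Non-blocking read end: a read on a pipe can only fail with EAGAIN
       when O_NONBLOCK is set, so this is what the trace implies. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Pipe empty, write end (fds[1]) still open: read fails with EAGAIN. */
    errno = 0;
    ssize_t n = read(fds[0], buf, sizeof buf);
    int saw_eagain = (n == -1 && errno == EAGAIN);

    /* After the last write end is closed, the same read reports EOF (0). */
    close(fds[1]);
    n = read(fds[0], buf, sizeof buf);
    int saw_eof = (n == 0);

    close(fds[0]);
    return (saw_eagain && saw_eof) ? 0 : -1;
}
```

This mirrors the trace: the first read(7, ...) returns EAGAIN while fd 8 is open, and after close(8) the read returns 0, so neither result points at a kernel problem.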
Sam, I don't suppose you could get a backtrace of the hang? It should be as simple as:

$ gdb fish
(gdb) run
... wait for hang and press Ctrl-C ...
(gdb) backtrace

You may need to install fish-debuginfo and glibc-debuginfo for the trace to be any good.

(In reply to Andy Lutomirski from comment #6)
> Florian, I'm confused.

No, I'm confused, please disregard comment 5.

Thanks.

Never mind, it's swbz#19957. At the point of the hang:

(gdb) print ((struct pthread *)(pthread_self ()))->tid
$11 = -1

fish uses posix_spawn, so it all adds up.

An upstream fix is in the works. If it does not land soon in upstream master, I will back out the problematic commit in rawhide.