Bug 1465147

Summary: bash: segfault in do_redirection_internal on ppc64le
Product: [Fedora] Fedora
Reporter: Robbie Harwood <rharwood>
Component: socket_wrapper
Assignee: Andreas Schneider <asn>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: rawhide
CC: admiller, asn, jhrozek, kdudka, madam, rharwood, svashisht
Keywords: Reopened
Hardware: ppc64le
OS: Unspecified
Fixed In Version: socket_wrapper-1.1.7-3.fc26
Last Closed: 2017-08-10 16:54:24 UTC
Type: Bug
Attachments:
Description       Flags
coredump          none
valgrind output   none

Description Robbie Harwood 2017-06-26 19:01:05 UTC
Created attachment 1292049
coredump

Here's the backtrace:

(gdb) bt
#0  do_redirection_internal (redirect=0x1b6, flags=3) at redir.c:865
#1  0x000000002ec0eadc in do_redirections (list=<optimized out>, flags=<optimized out>) at redir.c:234
#2  0x000000002ebb04a4 in execute_command_internal (command=0x1000fe2f870, asynchronous=0, pipe_in=-1, pipe_out=-1, fds_to_close=0x1000fe2f960) at execute_cmd.c:749
#3  0x000000002ebb0bc0 in execute_connection (fds_to_close=0x1000fe2f960, pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x1000fe2f930) at execute_cmd.c:2606
#4  execute_command_internal (command=0x1000fe2f930, asynchronous=<optimized out>, pipe_in=<optimized out>, pipe_out=<optimized out>, fds_to_close=0x1000fe2f960) at execute_cmd.c:983
#5  0x000000002ec1d058 in parse_and_execute (string=<optimized out>, from_file=0x2ec83a28 "-c", flags=<optimized out>) at evalstring.c:421
#6  0x000000002eb8d484 in run_one_command (command=0x7fffd126efbe "LD_LIBRARY_PATH=`echo -L./lib | sed -e \"s/-L//g\" -e \"s/ /:/g\"`; \\\nfor i in LD_LIBRARY_PATH; do \\\n\teval echo 'env['\\\\\\'$i\\\\\\''] = '\\\\\\'\\$$i\\\\\\'; \\\ndone > pyrunenv.vals") at shell.c:1409
#7  0x000000002eb923bc in main (argc=3, argv=0x7fffd12696a8, env=<optimized out>) at shell.c:734
(gdb) 

Coredump is attached.

I'm hitting this whenever I try to set some environment variables in krb5's test suite; you can see it for instance here:

https://koji.fedoraproject.org/koji/getfile?taskID=20099674&volume=DEFAULT&name=build.log&offset=-4000

If needed, this can be reproduced by running `rpmbuild -ba SPECS/krb5.spec` on this version of krb5 on ppc64le hardware.

Thanks!

Comment 1 Kamil Dudka 2017-06-27 07:47:06 UTC
As long as "some environment variables" include LD_PRELOAD, anything can happen.  

Minimal example:

# dnf install -y socket_wrapper
# LD_PRELOAD=libsocket_wrapper.so bash -c '>/dev/null'
Segmentation fault (core dumped)

# rpm -q bash socket_wrapper
bash-4.4.12-5.fc27.ppc64le
socket_wrapper-1.1.7-2.fc26.ppc64le
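The minimal example above can be scripted so that a given socket_wrapper build can be checked for the crash, for instance before and after installing an update. This is a minimal sketch, assuming bash is on PATH; `check_repro` is a hypothetical helper name, not part of bash or socket_wrapper.

```shell
#!/bin/bash
# Sketch: re-run the minimal reproducer from comment 1 and report whether
# bash was killed by a signal. check_repro is a hypothetical helper name.
check_repro() {
    # Library to preload; defaults to the one used in the reproducer.
    local lib=${1:-libsocket_wrapper.so}
    # Run the reproducer in a child bash so a crash cannot take down this
    # shell; the child's stderr (including loader warnings) is discarded.
    LD_PRELOAD="$lib" bash -c '>/dev/null' 2>/dev/null
    local status=$?
    # A status above 128 means the child was killed by signal
    # (status - 128); a segfault (SIGSEGV, signal 11) shows up as 139.
    if [ "$status" -gt 128 ]; then
        echo "crashed: signal $((status - 128))"
        return 1
    fi
    echo "ok"
}
```

Called with no argument, it preloads libsocket_wrapper.so exactly as in the reproducer; if the crash reproduces, it should print `crashed: signal 11`, matching the SIGSEGV reported by valgrind later in this bug.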

Comment 2 Siteshwar Vashisht 2017-07-02 23:17:19 UTC
Setting LD_PRELOAD changes the behavior of bash, so I am closing this as NOTABUG.

Comment 3 Robbie Harwood 2017-07-06 15:04:56 UTC
Um.  No.

This is working fine on all architectures, and has been working fine for almost a decade now.  ppc64le is not special here.  Preloading libsocket_wrapper is a standard operation for test suites.

Comment 4 Kamil Dudka 2017-07-07 07:48:02 UTC
(In reply to Robbie Harwood from comment #3)
> Preloading libsocket_wrapper is a standard operation for test suites.

The test suites of bash, ksh, tcsh, and zsh do not use socket_wrapper, so it is probably not well tested for instrumenting shell interpreters (and may never have been suitable for it).

Feel free to use any instrumentation tools you find appropriate in your test suite, but do not expect bash maintainers to debug issues like this.  The burden is really on you to prove that this is a bug of bash and that the bug is also reproducible in the native execution environment.  I have isolated a minimal example from your high-level bug description, so you can start from there.

Hint: If you do not want to debug the bash interpreter, there is a way to apply the instrumentation to your binaries only, while bash itself runs natively.
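One possible reading of this hint is to set LD_PRELOAD only in the environment of the program under test, so the shell that parses the command line and performs redirections such as `>/dev/null` runs without socket_wrapper. This is a minimal sketch, assuming the test binaries are started from a bash script; `run_wrapped` and `PRELOAD_LIB` are hypothetical names, not part of socket_wrapper or the krb5 test suite.

```shell
#!/bin/bash
# Sketch: keep the calling shell uninstrumented and preload socket_wrapper
# only into the program it starts. run_wrapped and PRELOAD_LIB are
# hypothetical names chosen for this example.
PRELOAD_LIB=${PRELOAD_LIB:-libsocket_wrapper.so}

# run_wrapped CMD [ARGS...]
# The prefix assignment places LD_PRELOAD in the environment of the external
# command being run; the calling shell's own environment is left unchanged.
run_wrapped() {
    LD_PRELOAD="$PRELOAD_LIB" "$@"
}
```

For example, with `run_wrapped ./my_test_binary >/dev/null` (where `./my_test_binary` is a placeholder), the uninstrumented bash sets up the redirection and only `./my_test_binary` loads the wrapper. If the wrapped command is itself a shell script, that shell is preloaded as well, so the wrapping has to be applied to the individual test binaries rather than to a script that launches them.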

Comment 5 Robbie Harwood 2017-07-07 16:59:57 UTC
Okay, let's do a quick lesson in how to use bugzilla.

1. If it is not your bug, don't touch its status, or you risk irritating the maintainer(s).

2. If you are responsible for the package, take ownership (the assigned field) before closing a bug.  This lets the reporter know that the bug is being managed, and who actually has authority.

3. This is the most important.  If the bug is not in your package but instead (you think) in someone else's, assign it there.  Don't close it out of hand.

4. Don't expect users to debug your package for you.  This is especially true when you have a reproducer.

5. The NOTABUG status is reserved for things which are not bugs.  If something is a bug but you don't feel like fixing it, close with a different status.

6. Something which is not your use case is not invalid by virtue of being different.

7. If something was previously working, and is now not, and was not intentionally broken, what you have is a bug.

I have assigned this bug to socket_wrapper for investigation.  I hope that if it turns out to be a problem in bash you will be co-operative in debugging and fixing it.

Comment 6 Andreas Schneider 2017-07-10 09:27:05 UTC
I've looked at the bash code, and it fails when dereferencing an invalid pointer.

Robbie, could you run the command with valgrind?

Comment 7 Robbie Harwood 2017-07-10 18:00:20 UTC
Created attachment 1295901
valgrind output

Sure, it's attached.

Comment 8 Andreas Schneider 2017-07-31 16:44:45 UTC
The issue is triggered by the redirection (> /dev/null), which redirects stdout ...

The crash happens when bash accesses a field (redirect->flags) of an entry in its linked list of redirections.

==9507== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==9507==  Access not within mapped region at address 0x1C6
==9507==    at 0x22D7A0: do_redirection_internal (redir.c:865)
==9507==    by 0x22EADB: do_redirections (redir.c:234)
==9507==    by 0x1AEA5B: execute_null_command (execute_cmd.c:3905)
==9507==    by 0x1AEA5B: execute_simple_command (execute_cmd.c:4179)
==9507==    by 0x1D12FB: execute_command_internal (execute_cmd.c:814)
==9507==    by 0x23D057: parse_and_execute (evalstring.c:421)
==9507==    by 0x1AD483: run_one_command (shell.c:1409)
==9507==    by 0x1B23BB: main (shell.c:734)
==9507==  If you believe this happened as a result of a stack
==9507==  overflow in your program's main thread (unlikely but
==9507==  possible), you can try to increase the size of the
==9507==  main thread stack using the --main-stacksize= flag.
==9507==  The main thread stack size used in this run was 8388608.

The strange thing is that the first dereference of redirect->flags does not segfault.

Comment 9 Andreas Schneider 2017-08-01 10:23:48 UTC
I had an idea and improved several things in socket_wrapper. However, I then decided to reproduce the issue first and then check whether it is fixed.

So I started a scratch build:

https://kojipkgs.fedoraproject.org//work/tasks/2006/20942006/build.log

It did not segfault; the build passed ...

Comment 10 Robbie Harwood 2017-08-01 14:58:11 UTC
socket_wrapper-1.1.7-3.fc27 failed to build on ppc64le during the mass rebuild: https://koji.fedoraproject.org/koji/buildinfo?buildID=932543

If you have a build you would like me to check I am happy to.

Comment 11 Andreas Schneider 2017-08-02 11:55:07 UTC
I've fixed the issue. Rawhide has a new build; you can also find it here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=20966998

Comment 12 Fedora Update System 2017-08-02 12:08:15 UTC
socket_wrapper-1.1.7-3.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-969354ba0e

Comment 13 Fedora Update System 2017-08-03 00:53:29 UTC
socket_wrapper-1.1.7-3.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-969354ba0e

Comment 14 Fedora Update System 2017-08-10 16:54:24 UTC
socket_wrapper-1.1.7-3.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.