Description of problem:
I typed CTRL-D to end my session, and it crashed. This is happening constantly, and ABRT does not identify all of the crashes as the same type of crash.

Version-Release number of selected component:
gnuplot-4.6.3-6.fc20

Additional info:
reporter: libreport-2.2.1
backtrace_rating: 4
cmdline: gnuplot
crash_function: el_wgets
executable: /usr/bin/gnuplot-wx
kernel: 3.13.10-200.fc20.x86_64
runlevel: N 5
type: CCpp
uid: 13013

Truncated backtrace:
Thread no. 1 (8 frames)
 #0 el_wgets at read.c:717
 #1 el_gets at eln.c:80
 #2 readline at readline.c:420
 #3 readline_ipc at ../../src/readline.c:104
 #4 rlgets at ../../src/command.c:2669
 #5 gp_get_string at ../../src/command.c:2865
 #6 read_line at ../../src/command.c:2897
 #7 com_line at ../../src/command.c:314
Created attachment 888812 [details] File: backtrace
Created attachment 888813 [details] File: cgroup
Created attachment 888814 [details] File: core_backtrace
Created attachment 888815 [details] File: dso_list
Created attachment 888816 [details] File: exploitable
Created attachment 888817 [details] File: limits
Created attachment 888818 [details] File: maps
Created attachment 888819 [details] File: open_fds
Created attachment 888820 [details] File: proc_pid_status
See bug 1081764. Both of these are occurring very often and I cannot figure out why. It all started with F20.
Could you give this build a try? http://koji.fedoraproject.org/koji/taskinfo?taskID=6770153 No particular expectation of a fix, but worth a shot.
Sure. I tried the packages you built and the symptoms are unchanged. This has cost me another weekend of system rebuilds. I thought it might be memory corruption because the backtraces are not always the same, even though I'm able to reproduce the crash using the same "technique" each time. But now I have been able to reproduce the crash on a completely different hardware platform (my netbook), so there is a software bug. I just can't tell you where it is. I would love to get this fixed as it's costing me a lot of time. Please let me know how I can help you find the problem.
FWIW, I am not able to reproduce the crash on XUbuntu.
Can you run it under valgrind and attach the output from a session that crashed? I'm not seeing anything obvious in the backtrace. Might be a libedit issue as well.
Created attachment 891277 [details] valgrind output Full valgrind output from crashed gnuplot session. I'm very confused by the results, though; I don't know how to interpret it.
Sorry, that isn't going to be much use without the needed debuginfo packages installed: debuginfo-install gnuplot
Created attachment 891341 [details] valgrind results Full valgrind output for session that crashed. To cause this crash, all I did was: > plot sin(x) then I hit CTRL-C three or four times very quickly.
(In reply to Paul DeStefano from comment #17)
> Created attachment 891341 [details]
> valgrind results
>
> Full valgrind output for session that crashed.

Hmm, not very instructive I'm afraid.

> then I hit CTRL-C three or four times very quickly.

I think reports from crashes that happen from normal behavior might be more interesting.
You're joking, right? After I did what you asked, you are going to dismiss this reproducible crash? I use CTRL-C a lot to cancel command edits and start over or go to a different command in my history. This is perfectly reasonable user behavior. I may be accelerating the process a bit to save time for this report, but that doesn't change the fact that gnuplot crashes constantly. Besides, I've already explained that it happens with a variety of use cases. This is just the most recent way it crashed. gnuplot crashes so often that I've had to stop using it. Can you please help?
First off - we are all volunteers here and I *am* trying to help, so let's keep it civil, okay. I'm sorry, but the reports you've sent (through no fault of your own) are not useful for debugging the crashes. The ctrl-c behavior seems strange and may be why this last report is not useful. I'm hopeful that a report from a session that crashed in a more natural way will be more helpful. But maybe it won't, and maybe this will take longer. A gdb backtrace would be helpful too. Run gnuplot under gdb. After the crash, enter "thread apply all bt" at the (gdb) prompt. You might get better help here: https://sourceforge.net/p/gnuplot/bugs/ since they are more familiar with the code.
(In reply to Orion Poplawski from comment #20)
> First off - we are all volunteers here and I *am* trying to help, so let's
> keep it civil, okay.

Okay, good. I honestly thought you were blowing off this report. If not, then I'll just say I'm a volunteer, too, and I'm sorry if you felt I was uncivil.

> I'm sorry, but the reports you've sent (from not fault of your own) are not
> useful for debugging the crashes. The ctrl-c behavior seems strange and may
> be why this last report is not useful. I'm hopeful that a report from a
> session that crashed in a more natural way will be more helpful. But maybe
> it won't, and maybe this will take longer.

Are you saying that CTRL-C might not be useful because it was running under valgrind at the time? Sort of like CTRL-C might have gone to valgrind, and not gnuplot? Because I can understand that; I don't know if valgrind gets CTRL-C and, if it does, whether it passes it on or what. But I don't know why you think this is such an unusual case. If you are editing a command in your history and you decide not to run that command, how do you cancel the edit and start over? This happens to me constantly. And I never noticed a correlation with crashing before F20.

> A gdb backtrace would be helpful too. Run gnuplot under gdb. After crash
> enter "thread apply all bt" at the (gdb) prompt.

Okay! Yes, I can do this.

> You might get better help here: https://sourceforge.net/p/gnuplot/bugs/
> since they are more familiar with the code.

Maybe, but this all started with F20, and, moreover, it doesn't happen on Ubuntu and that's the same version of gnuplot.
What version of libedit is on your Ubuntu install?
The libedit2 pkg says version 3.1-2013712-1. I just got the notice of a new major version, so I'm upgrading my XUbuntu system now.
Can you try installing an updated libedit from here: http://koji.fedoraproject.org/koji/taskinfo?taskID=6832084 and see if that helps?
Sorry, I didn't forget; it just took me a while to get back to this. I'm using a VM to do some F20 testing since I've had a couple of very sudden and strange problems. I installed F20 fresh and then installed the koji build libedit-3.1-5.20140213cvs.fc21.x86_64. The problem was very easy to reproduce using the "plot sin(x) then fast CTRL-C" method I've been using. I also got you a backtrace with all debuginfos:

Program received signal SIGSEGV, Segmentation fault.
0x00007fada440c0a2 in el_wgets (el=el@entry=0x11f9a60, nread=0x0, nread@entry=0x7fffb47f17f4) at read.c:717
717 *nread = num != -1 ? num : 0;
(gdb) bt
#0 0x00007fada440c0a2 in el_wgets (el=el@entry=0x11f9a60, nread=0x0, nread@entry=0x7fffb47f17f4) at read.c:717
#1 0x00007fada441c10d in el_gets (el=0x11f9a60, nread=nread@entry=0x7fffb47f17f4) at eln.c:80
#2 0x00007fada4417470 in readline (p=0x50e30e "gnuplot> ") at readline.c:427
#3 0x000000000047e875 in readline_ipc (prompt=<optimized out>) at ../../src/readline.c:104
#4 0x000000000041eaa7 in rlgets (prompt=0x50e30e "gnuplot> ", n=1024, s=0x11f3100 "") at ../../src/command.c:2669
#5 gp_get_string (prompt=0x50e30e "gnuplot> ", len=1024, buffer=0x11f3100 "") at ../../src/command.c:2865
#6 read_line (prompt=prompt@entry=0x50e30e "gnuplot> ", start=<optimized out>, start@entry=0) at ../../src/command.c:2897
#7 0x0000000000421e7c in com_line () at ../../src/command.c:314
#8 0x0000000000415a76 in main (argc=0, argv=0x7fffb47f1b78) at ../../src/plot.c:684

second try:

Program received signal SIGSEGV, Segmentation fault.
0x00007fe9f98330a2 in el_wgets (el=el@entry=0xeb6a60, nread=0x0, nread@entry=0x7fffdf23afd4) at read.c:717
717 *nread = num != -1 ? num : 0;
(gdb) bt
#0 0x00007fe9f98330a2 in el_wgets (el=el@entry=0xeb6a60, nread=0x0, nread@entry=0x7fffdf23afd4) at read.c:717
#1 0x00007fe9f984310d in el_gets (el=0xeb6a60, nread=nread@entry=0x7fffdf23afd4) at eln.c:80
#2 0x00007fe9f983e470 in readline (p=0x50e30e "gnuplot> ") at readline.c:427
#3 0x000000000047e875 in readline_ipc (prompt=<optimized out>) at ../../src/readline.c:104
#4 0x000000000041eaa7 in rlgets (prompt=0x50e30e "gnuplot> ", n=1024, s=0xeb0100 "") at ../../src/command.c:2669
#5 gp_get_string (prompt=0x50e30e "gnuplot> ", len=1024, buffer=0xeb0100 "") at ../../src/command.c:2865
#6 read_line (prompt=prompt@entry=0x50e30e "gnuplot> ", start=<optimized out>, start@entry=0) at ../../src/command.c:2897
#7 0x0000000000421e7c in com_line () at ../../src/command.c:314
#8 0x0000000000415a76 in main (argc=0, argv=0x7fffdf23b358) at ../../src/plot.c:684

Does this help?
Re-assigning to libedit to get some more eyes on this, but I don't understand it. First thought is "oh, nread is set to null", but read.c has this at the top:

    if (nread == NULL)
        nread = &nrb;
    *nread = 0;

So it should already handle this case. So it still makes no sense to me. Next time in gdb, do a "print nread".
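The guard quoted from read.c can be looked at in isolation. What follows is only a minimal sketch, not libedit's code: `sketch_wgets` and its `num` parameter are hypothetical stand-ins, and only the NULL check and the final assignment follow the lines quoted in this thread. It illustrates why a NULL `nread` at the last assignment suggests the pointer was overwritten after the guard had already run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for el_wgets(); only the nread handling mirrors
 * the lines quoted from read.c in this thread. */
int sketch_wgets(int *nread, int num)
{
    int nrb;                /* local fallback storage for the count */

    if (nread == NULL)
        nread = &nrb;       /* from here on nread points at valid storage */
    *nread = 0;

    /* ... the real function would run the line editor here ... */

    /* With the guard above, this can only fail if something overwrote
     * the local pointer in the meantime (for example, stack corruption). */
    assert(nread != NULL);
    *nread = num != -1 ? num : 0;
    return *nread;
}
```

In the reported backtraces the value at entry (`nread@entry`) is a valid stack address while the value at the crash is `0x0`, which would be consistent with this reading.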
Thread 1 is the crasher:

Thread 1 (Thread 0x7f68f2773a40 (LWP 20072)):
#0 0x0000003c27213f92 in el_wgets (el=el@entry=0x117df80, nread=0x0, nread@entry=0x7fff5bc158b4) at read.c:717
        retval = <optimized out>
        cmdnum = <optimized out>
        num = 185
        ch = 10 L'\n'
        cp = <optimized out>
        crlf = 0
        nrb = 32616

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000003c27213f92 in el_wgets (el=el@entry=0x117df80, nread=0x0, nread@entry=0x7fff5bc158b4) at read.c:717
717 *nread = num != -1 ? num : 0;

And valgrind reports:

==27740== Invalid write of size 8
==27740== at 0x3C27E0EC18: recv (recv.c:35)
==27740== by 0x4D6C067: ???
==27740== by 0xFFEFFF57F: ???
==27740== by 0xFFEFFF58F: ???
==27740== Address 0xffefff448 is on thread 1's stack

which must be how nread is being set to NULL, because it isn't possible for it to be NULL under normal circumstances.

I have tried for about 10 minutes to reproduce this, and have not been able to get it to crash, so clearly I'm not doing whatever causes that recv() invocation. Can you reproduce under valgrind again but with debuginfo packages installed, so we can see where that recv() call is made? I don't see any recv() calls in either libedit or gnuplot.
(In reply to Orion Poplawski from comment #26)
> Next time in gdb, do a "print nread".

Okay, I can do that.

(In reply to Jerry James from comment #27)
> Can you reproduce under valgrind again but with debuginfo packages
> installed, so we can see where that recv() call is made? I don't see any
> recv() calls in either libedit or gnuplot.

Yes, I can do that. But I installed a ton of debuginfos already. I can't tell what infos I'm still missing. I actually went through three stages of gdb telling me "use debuginfo-install ...". It stopped saying that, so I figured I had finally got them all, but I guess not. I thought debuginfo-install was supposed to get everything. Is there a better way to install debuginfos for a particular application?
Oh, I'm sorry. My mistake. I was looking at the first valgrind run you did, before you installed debuginfo packages. The valgrind run from after installing debuginfo packages isn't much help, either. It shows the probable cause of the stack corruption that leads to the crash:

==23580== Invalid write of size 8
==23580== at 0x3C276EA9E4: ??? (syscall-template.S:81)
==23580== by 0xFFEFFF79F: ???
==23580== Address 0xffefff648 is on thread 1's stack

but without any useful information to show where that call came from. Hmmm. Is something using a separate signal stack, perhaps?
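One way to answer the signal-stack question from inside a process is to ask the kernel via POSIX `sigaltstack()`. The following is a minimal sketch under stated assumptions: `has_alt_signal_stack` is a hypothetical helper name, and nothing in this thread establishes whether gnuplot, wxWidgets, or valgrind actually installs an alternate stack.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if an alternate signal stack is installed
 * and enabled for the calling thread, 0 if none is, -1 if the query fails. */
int has_alt_signal_stack(void)
{
    stack_t current;

    /* Passing NULL as the new stack only queries the current setting. */
    if (sigaltstack(NULL, &current) != 0)
        return -1;

    /* SS_DISABLE in ss_flags means no alternate stack is in effect. */
    return (current.ss_flags & SS_DISABLE) ? 0 : 1;
}
```

A handler only runs on that stack if it was registered with `SA_ONSTACK`, so a positive answer here would be a hint to look at how the SIGINT handler is installed, not proof of anything on its own.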
I just took a quick look at bug 1081764. It appears similar: a variable that has already been dereferenced is suddenly NULL, leading to the crash. Something seems to be writing zeroes over memory it shouldn't be touching. I still cannot reproduce the crash, by the way. I've tried quite a few times now, and I never get the crash you are seeing. I'm doing "plot sin(x)" and hitting Ctrl-C like mad. That's your recipe, right? If you feel adventuresome, you might try passing the --vgdb and --vgdb-error options to valgrind to see if you can catch that invalid write in action and try to figure out what is causing it. It was the 3rd error in the first valgrind output you attached, and the 4th error in the second.
(In reply to Jerry James from comment #29)
> Oh, I'm sorry. My mistake. I was looking at the first valgrind run you
> did, before you installed debuginfo packages.

Actually I think the mistake was mine: you said run *valgrind* again. You're right, I didn't have many debuginfos installed for my valgrind run. I was thinking of running gdb again. So, no problem! I'm running gdb just to get it to complain about missing debuginfo packages. When it stops, I'll run valgrind and hopefully that will give us what we need. (I don't understand why 'debuginfo-install gnuplot' says everything is installed and even gdb doesn't complain on start up. But after the crash, gdb says it needs more infos.)

(In reply to Jerry James from comment #30)
> I still cannot reproduce the crash, by the way. I've tried quite a few
> times now, and I never get the crash you are seeing. I'm doing "plot
> sin(x)" and hitting Ctrl-C like mad. That's your recipe, right?

Yes, that's about it. It typically happens after a handful of CTRL-C's, but if not, I just keep it depressed and let the keyboard repeat rate kick in, and that does it.

> If you feel adventuresome, you might try passing the --vgdb and --vgdb-error
> options to valgrind to see if you can catch that invalid write in action and
> try to figure out what is causing it. It was the 3rd error in the first
> valgrind output you attached, and the 4th error in the second.

Sure, no problem.
Well, I take it back. I'm having a hard time putting all these pieces together. When I run gnuplot under valgrind now, it just dies when I CTRL-C; it doesn't segfault. Looks like this:

gnuplot> plot sin(x)
gnuplot> Killed
$

Under gdb, I can get you a new bt with more debuginfos:

gnuplot>
Program received signal SIGSEGV, Segmentation fault.
0x00000031d56ec703 in select () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) ^CQuit
(gdb) ^CQuit
(gdb) ^CQuit
(gdb) bt
#0 0x00000031d56ec703 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000000000503bd4 in wxt_waitforinput () at ../../src/wxterminal/wxt_gui.cpp:3458
#2 0x000000000047e85a in getc_wrapper (fp=0x0) at ../../src/readline.c:82
#3 0x00000031e722029f in _getc_function (el=<optimized out>, c=0x7fffffffdcc8 "\266\200E") at readline.c:221
#4 0x00000031e721392e in el_wgetc (el=el@entry=0xa0aa60, cp=cp@entry=0x7fffffffdcc8 L"\x4580b6") at read.c:439
#5 0x00000031e7213bcf in read_getcmd (ch=0x7fffffffdcc8 L"\x4580b6", cmdnum=<synthetic pointer>, el=0xa0aa60) at read.c:247
#6 el_wgets (el=el@entry=0xa0aa60, nread=nread@entry=0x7fffffffdd34) at read.c:586
#7 0x00000031e722407d in el_gets (el=0xa0aa60, nread=nread@entry=0x7fffffffdd34) at eln.c:80
#8 0x00000031e7220b20 in readline (p=0x50e30e "gnuplot> ") at readline.c:420
#9 0x000000000041eaa7 in read_line (prompt=<optimized out>, start=<optimized out>) at ../../src/command.c:2669
#10 0x0000000000421e7c in com_line () at ../../src/command.c:314
#11 0x0000000000415a76 in main (argc=0, argv=0x7fffffffe0b8) at ../../src/plot.c:684

I was able to get valgrind w/ gdbserver running and talking to gdb. But I have to do a number of machinations to get gdb NOT to convert interrupts into stop events. And, even then, I see a lot of SIGTRAPs before it finally dies, and it's not the same as under normal conditions.

-- Here is the GDB end of it: --

(gdb)
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
[New Thread 1932]

Program received signal SIGINT, Interrupt.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000031d5e0ec18 in __libc_recv (fd=0, buf=0x4d10a04, n=4096, flags=-706679797) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:35
35 LIBC_CANCEL_RESET (oldtype);
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
__pthread_disable_asynccancel () at ../nptl/sysdeps/unix/sysv/linux/x86_64/cancellation.S:104
104 1: ret
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
__libc_recv (fd=0, buf=0x4d10a04, n=4096, flags=-706679797) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:38
38 }
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000031d5e0ec25 in __libc_recv (fd=0, buf=0x4d10a04, n=4096, flags=-706679797) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:38
38 }
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000031d5e0ec26 in __libc_recv (fd=0, buf=0x4d10a04, n=4096, flags=-706679797) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:38
38 }
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000ffefff988 in ?? ()
(gdb)
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000ffefff988 in ?? ()
(gdb) bt
#0 0x0000000ffefff988 in ?? ()
#1 0x0000000ffefff8b0 in ?? ()
#2 0x0000000ffefff8a0 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb)

-- Here is part of the valgrind file: --

==1931== Memcheck, a memory error detector
==1931== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1931== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==1931== Command: gnuplot
==1931== Parent PID: 1461
==1931==
==1931==
==1931== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==1931== /path/to/gdb gnuplot
==1931== and then give GDB the following command
==1931== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=1931
==1931== --pid is optional if only one valgrind process is running
==1931==
==1931==
==1931== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==1931== /path/to/gdb gnuplot
==1931== and then give GDB the following command
==1931== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=1931
==1931== --pid is optional if only one valgrind process is running
==1931==
==1931== Invalid write of size 8
==1931== at 0x31D5E0EC18: recv (recv.c:35)
==1931== by 0x4CBC9C7: ???
==1931== by 0xFFEFFF97F: ???
==1931== by 0xFFEFFF98F: ???
==1931== Address 0xffefff848 is on thread 1's stack
==1931==
==1931== (action on error) vgdb me ...
==1931== Continuing ...
==1931== Invalid read of size 8
==1931== at 0x31D5E0E51F: __pthread_disable_asynccancel (cancellation.S:104)
==1931== by 0x31D5E0EC1C: recv (recv.c:35)
==1931== by 0xFFEFFF987: ???
==1931== by 0xFFEFFF8AF: ???
==1931== by 0xFFEFFF89F: ???
==1931== Address 0xffefff848 is on thread 1's stack
... repeats many times ...
... heap summary ...
==1931== LEAK SUMMARY:
==1931== definitely lost: 4,048 bytes in 9 blocks
==1931== indirectly lost: 21,728 bytes in 890 blocks
==1931== possibly lost: 69,992 bytes in 795 blocks
==1931== still reachable: 2,819,834 bytes in 17,474 blocks
==1931== suppressed: 0 bytes in 0 blocks
==1931== Reachable blocks (those to which a pointer was found) are not shown.
==1931== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1931==
==1931== For counts of detected and suppressed errors, rerun with: -v
==1931== ERROR SUMMARY: 760 errors from 760 contexts (suppressed: 2 from 2)

See the test8 attachment for full valgrind output. I see that there are still debuginfos missing, but I don't know who recv.c belongs to. Is this recv(2) as in sockets?
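On the recv question: the `__libc_recv` frame and the `../sysdeps/unix/sysv/linux/x86_64/recv.c` path in the gdb output look like the C library's wrapper for recv(2), the socket call. As a small sketch of what that call does on a descriptor that is not a socket, the hypothetical helper below (`recv_errno_on_pipe` is a name made up for illustration) calls `recv()` on a pipe and reports the resulting errno. It says nothing about what descriptor 0 actually was in the traced session.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: calls recv() on the read end of a pipe, which is
 * not a socket, and returns the errno value the call reports
 * (0 if the call unexpectedly succeeds). */
int recv_errno_on_pipe(void)
{
    int fds[2];
    char buf[16];
    int saved = 0;

    if (pipe(fds) != 0)
        return errno;

    /* recv() is only defined for sockets; on a pipe it fails at once. */
    if (recv(fds[0], buf, sizeof buf, 0) == -1)
        saved = errno;

    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

On Linux this is expected to report `ENOTSOCK`, so a `recv()` on descriptor 0 would only do useful work if that descriptor were a socket in the process making the call.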
OK, so a colleague of mine and I took a look at this, and here is what we came up with:

1.) The issue was introduced with the following patch: http://pkgs.fedoraproject.org/cgit/gnuplot.git/tree/gnuplot-4.6.4-singlethread.patch?h=f20 If you compile gnuplot without it, the problem goes away.

2.) It does not seem to be libedit related at all, as I got a few call traces where it crashed without any libedit involvement whatsoever.

Afterwards, I was able to find the problem that the patch was supposed to fix: http://gnuplot.10905.n7.nabble.com/wxGtk-crash-on-haswell-td17944.html

Considering the fact that the issue was Haswell-specific and the report specifically mentions the problematic xbegin/xend instructions, I'm inclined to believe that it was related to the following bug (although the bz is filed against f21, maybe the reporter updated the microcode manually?): https://bugzilla.redhat.com/show_bug.cgi?id=1146967

I could not hit the issue specified in the nabble.com link when I compiled gnuplot without the patch (e.g. plot sin(x) works perfectly fine on my Haswell machine without the patch), so hopefully it is safe to revert the patch now.

-> reassigning back to gnuplot.
This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.