Bug 541386 - canto segmentation fault
Summary: canto segmentation fault
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: python
Version: 12
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Andreas Osowski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 558115 558301 563028
Depends On: 539917
Blocks: 545294
 
Reported: 2009-11-25 18:22 UTC by Mark Knoop
Modified: 2010-03-09 03:15 UTC (History)

Fixed In Version: python-2.6.2-4.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-09 03:15:25 UTC
Type: ---
Embargoed:


Attachments
strace canto &>canto-strace (359.68 KB, text/plain)
2009-11-25 18:22 UTC, Mark Knoop
abrt backtrace (22.92 KB, text/plain)
2009-11-25 20:43 UTC, Andreas Osowski
gdb backtrace (11.10 KB, text/plain)
2009-12-02 09:57 UTC, Mark Knoop
patch to python-2.6.2-config.patch (919 bytes, patch)
2010-01-16 16:58 UTC, Mark Knoop

Description Mark Knoop 2009-11-25 18:22:40 UTC
Created attachment 373806 [details]
strace canto &>canto-strace

Description of problem:
Canto crashes with segmentation fault on startup. 

Version-Release number of selected component (if applicable):
canto-0.7.4-1.fc12.x86_64

How reproducible:
100%

Steps to Reproduce:
1. canto
2. Application starts, loads feeds
3. Crashes at the beginning of the main loop
  
Additional info:
- No change when run without existing ~/.canto
- Also crashes with self-built canto-0.7.5
- see attached strace

~/.canto/log contains:

Canto v 0.7.4 (sh:)
Time: Wed Nov 25 18:15:37 2009
Config parsed successfully.
Populating feeds...
Precaching: []
Curses initialized.
GUI initialized.
Signals set.
Beginning main loop.

Comment 1 Andreas Osowski 2009-11-25 20:43:12 UTC
Created attachment 373823 [details]
abrt backtrace

Hello,
thanks for reporting the bug.
I forgot to push the canto 0.7.5 update this past weekend,
but, as you already stated, the crash occurs here (i686) too.
However, judging by the backtraces, the crash appears to be
caused by Python internals.

@dmalcolm:
This is the backtrace on my system (i686, fc12) for canto-0.7.4.
abadger1999 unfortunately was unable to help me.

Comment 2 Mark Knoop 2009-11-30 19:38:13 UTC
Ref canto issue tracker: http://github.com/themoken/Canto/issues#issue/5

Comment 3 Dave Malcolm 2009-12-01 03:52:21 UTC
Sorry for the belated response.

This is possibly a symptom of bug 539917

FWIW I'm seeing various "random" crashes inside Python running canto on F11 (canto-0.7.4-1.fc11.i586).

I reproduced the backtrace from comment #1;
frame #2 (PyEval_EvalFrameEx) is at:
/usr/lib/python2.6/site-packages/canto/interface_draw.py (436): status
which is here:
    def status(self, bar, height, width, str):
        self.simple_out([(str, u" ", u"")], 0, height, width, [bar])

frame #1 (call_function) is the call to "simple_out"
frame #0 (list_dealloc) is crashing, decreffing a PyListObject, "w" below:

	while ((*pp_stack) > pfunc) {
		w = EXT_POP(*pp_stack);
		Py_DECREF(w);
		PCALL(PCALL_POP);
	}
	return x;

(gdb) pyo op
object  : <refcnt 0 at 0xb704f12c>
type    : list
refcount: 0
address : 0xb704f12c

and the list's ob_item seems to be corrupt

Comment 4 Dave Malcolm 2009-12-01 20:11:27 UTC
I spent a little time trying to track this down.

It looks like the heap is getting corrupted either during the call to canto's canto/widecurse.c:mvw (implementation of widecurse.core), or shortly afterwards.

I'm not sure where the specific problem is.

However, I did notice that in widecurse.c, various functions ("disable_color" etc.) return Py_None without doing an INCREF on that object; this _is_ a bug, though when I set breakpoints on these functions they did not appear to be called.  They need to have a:
    Py_INCREF(Py_None);
before the 
    return Py_None;
or to replace it with this macro:
    return Py_RETURN_NONE;
which does the INCREF

(FWIW, at the time of the crash,
(gdb) p _Py_NoneStruct 
$70 = {ob_refcnt = 5849, ob_type = 0x672c820}
which is much greater than 0, so it looks like None isn't getting freed, so this is a different bug)

Hope this is helpful.

Comment 5 Dave Malcolm 2009-12-01 20:36:35 UTC
> or to replace it with this macro:
>     return Py_RETURN_NONE;
> which does the INCREF
Sorry, this should simply read:
      Py_RETURN_NONE;
as the macro contains the "return" statement (it's in python's object.h)
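
The ownership rule behind this fix can be illustrated with a toy model. The sketch below is not CPython's implementation: ToyObject, incref, decref and TOY_NONE are hypothetical names invented here, and the starting count of 100 is arbitrary. It only shows why a function that returns None without an INCREF makes the count drift down by one per call, since the caller releases a reference it was never given.

```python
class ToyObject:
    """Stand-in for a reference-counted C object (illustrative only)."""

    def __init__(self, name, refcnt=1):
        self.name = name
        self.refcnt = refcnt


def incref(obj):
    # Models Py_INCREF: the caller takes a new reference.
    obj.refcnt += 1


def decref(obj):
    # Models Py_DECREF: the caller gives up a reference.
    # In C, reaching zero would trigger deallocation of the object.
    obj.refcnt -= 1


# Hypothetical singleton playing the role of Py_None.
TOY_NONE = ToyObject("None", refcnt=100)


def buggy_return_none():
    # Mirrors `return Py_None;` with no INCREF:
    # hands out a reference the function does not own.
    return TOY_NONE


def fixed_return_none():
    # Mirrors `Py_RETURN_NONE;`: INCREF first, then return.
    incref(TOY_NONE)
    return TOY_NONE


def call_and_discard(func):
    # The caller treats the result as an owned reference
    # and releases it when finished.
    result = func()
    decref(result)
```

In this model each call through buggy_return_none lowers the count by one while fixed_return_none leaves it unchanged. That fits the observation in comment 4: a leak of this kind only becomes fatal once the count reaches zero, and a count of 5849 is nowhere near that.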

Comment 6 Jack Miller 2009-12-01 21:14:02 UTC
(In reply to comment #5)
> > or to replace it with this macro:
> >     return Py_RETURN_NONE;
> > which does the INCREF
> Sorry, this should simply read:
>       Py_RETURN_NONE;
> as the macro contains the "return" statement (it's in python's object.h)  

Thanks a lot Dave, I've fixed this in git. As you said, different bug though. If there's any help I can offer wrt this bug, let me know (I'm the author of canto).

Comment 7 Dave Malcolm 2009-12-02 00:44:34 UTC
I'm reinstalling my primary machine; I hope to have another look at this when that's done.

This may well be simply a problem with Fedora's python curses module; see bug 539917.

If it isn't that, then I believe that something is corrupting an internal data structure, and the program later crashes when that structure is read (threads? heap corruption?).

If that's the case, then I don't think the strace approach described in the upstream report is going to help, and it's going to be hard to track this down.

Some approaches for locating this:
  (difficult) use gdb to try to track down the segfault.  Invoke it thus:
[david@brick ~]$ gdb --args python /usr/bin/canto
(gdb) run

when it segfaults:
(gdb) bt 

though it's somewhat tricky dealing with this due to the way curses has reset the terminal.

  (much more involved): rebuild python without memory arenas (i.e. configure --without-pymalloc), and run it under valgrind:
valgrind python /usr/bin/canto
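
Because curses leaves the terminal in an awkward state, one option is to run gdb non-interactively and have it log the backtrace to a file. The sketch below only builds the argument lists and does not run anything. The helper names and the log file name canto-gdb.txt are made up for illustration, and it assumes a gdb that accepts `set logging on` (newer releases spell this `set logging enabled on`).

```python
def gdb_backtrace_command(script="/usr/bin/canto",
                          log_file="canto-gdb.txt",
                          interpreter="python"):
    """Build a gdb command line that runs the script and logs a backtrace.

    --batch makes gdb exit once the listed commands have run;
    each -ex supplies one gdb command, executed in order.
    """
    return [
        "gdb", "--batch",
        "-ex", "set logging file " + log_file,  # where gdb output is copied
        "-ex", "set logging on",                # start copying output to the log
        "-ex", "run",                           # start the program; stops on SIGSEGV
        "-ex", "bt",                            # print the backtrace after the stop
        "--args", interpreter, script,
    ]


def valgrind_command(script="/usr/bin/canto", interpreter="python"):
    """Build the valgrind invocation quoted above (no extra flags assumed)."""
    return ["valgrind", interpreter, script]
```

Either list can be passed to subprocess.call from a normal terminal; the valgrind run is only likely to be useful with a python built using --without-pymalloc, as noted above.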

Comment 8 Mark Knoop 2009-12-02 09:57:11 UTC
Created attachment 375383 [details]
gdb backtrace

OK, this is a backtrace of canto from git (commit 0b2b790a9870ac8703eb006c8f43f01c2b574723) run with gdb. Hope it might help.

Comment 9 Mark Knoop 2009-12-02 10:02:02 UTC
I should add that gdb requested PyXML-debuginfo which wasn't in the repos, so I downloaded the fc12 updates-candidates from koji and installed them.

PyXML-debuginfo-0.8.4-17.fc12.x86_64
PyXML-0.8.4-17.fc12.x86_64

Comment 10 Andreas Osowski 2009-12-21 23:04:27 UTC
Hello,
could I get some update on this?

Apparently, the last activity on the GitHub bug tracker was also on 12-01.

Unfortunately, the problem still persists.

Comment 11 Jack Miller 2010-01-16 02:43:46 UTC
I finally got around to debugging this myself in an FC12 VM. I can confirm that this is most likely a symptom of #539917. I messed around with the window object passed to the extension and its behavior is very strange. Removing the addch calls removes the segfault (unsurprisingly, considering they're the only calls made from the extension into curses), but the coordinates given to all of the calls are valid (inside the coordinates) so they shouldn't cause a problem (even though they do).

Compiling and installing a fresh copy of 2.6.4 and ensuring that the _curses.so is linked against ncursesw solved the problem. As such, I'm willing to bet that closing #539917 will fix this.
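
A rough way to check which curses library a given _curses.so is linked against is to look at its `ldd` output. The sketch below is written for Python 3 (subprocess.check_output does not exist in 2.6) and its function names are invented for illustration. It assumes _curses is built as a shared module with a __file__ attribute and that `ldd` is available.

```python
import re
import subprocess

# Matches a library name such as libncurses.so.5 or libncursesw.so.5
# at the start of an ldd output line.
_CURSES_LIB = re.compile(r"^\s*(libncursesw?\.so[\w.]*)", re.MULTILINE)


def curses_libraries(ldd_output):
    """Return the ncurses library names found in ldd output text."""
    return _CURSES_LIB.findall(ldd_output)


def uses_wide_curses(ldd_output):
    """True if any linked curses library is the wide-character ncursesw."""
    return any(name.startswith("libncursesw")
               for name in curses_libraries(ldd_output))


def ldd_output_for_curses():
    """Run ldd on the installed _curses extension module.

    Assumption: _curses is a shared object on disk; this fails if the
    module is compiled into the interpreter or ldd is missing.
    """
    import _curses
    return subprocess.check_output(["ldd", _curses.__file__],
                                   universal_newlines=True)
```

Calling uses_wide_curses(ldd_output_for_curses()) on an affected system would be expected to return False if the module is linked against plain ncurses, and True after a rebuild that links against ncursesw.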

Comment 12 Dave Malcolm 2010-01-16 15:11:59 UTC
Thanks for checking into this; sorry about the lack of activity here.

Marking bug 539917 as blocking this (also to generate a URL linking to that bug).

Comment 13 Andreas Osowski 2010-01-16 16:37:21 UTC
Thanks for your efforts, Jack.
I hope with your information, we'll be able to solve this bug rather soon.

Comment 14 Mark Knoop 2010-01-16 16:58:48 UTC
Created attachment 384817 [details]
patch to python-2.6.2-config.patch

Comment 15 Mark Knoop 2010-01-16 16:59:21 UTC
Just to confirm, I've successfully rebuilt python with the above changes to the config patch (essentially the same as in bug 242583) and canto is once again working.

Comment 16 Dave Malcolm 2010-01-17 16:10:12 UTC
Thanks Jack and Mark!

I'm writing up some more notes on this in bug 539917.

Comment 17 Andreas Osowski 2010-01-25 06:54:30 UTC
*** Bug 558301 has been marked as a duplicate of this bug. ***

Comment 18 Andreas Osowski 2010-01-25 06:54:40 UTC
*** Bug 558115 has been marked as a duplicate of this bug. ***

Comment 19 Andreas Osowski 2010-01-25 06:56:05 UTC
Hello everyone,
what is the status of this bug?
Given that the patch seems to be working,
it'd be great if we could fix it soon, as I've got an increasing number of
users reporting the bug :(

Comment 20 Dave Malcolm 2010-01-25 19:04:10 UTC
Sorry about the delay.

I've submitted https://admin.fedoraproject.org/updates/F12/FEDORA-2010-0393 to "testing", which has a fix for this bug.  See also bug 539917

Comment 21 Fedora Update System 2010-01-27 01:12:22 UTC
python-2.6.2-4.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update python'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-0393

Comment 22 Pierre Dorbais 2010-01-30 19:42:57 UTC
canto works with python-2.6.2-4 update

Comment 23 Andreas Osowski 2010-02-21 12:01:30 UTC
*** Bug 563028 has been marked as a duplicate of this bug. ***

Comment 24 Fedora Update System 2010-03-09 03:15:09 UTC
python-2.6.2-4.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

