Bug 133662

Summary: [x86_64] Python plugins cause gnumeric to abort on exit
Product: Fedora
Component: gnumeric
Version: 3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Robert Walsh <rjwalsh>
Assignee: Caolan McNamara <caolanm>
CC: michal
Doc Type: Bug Fix
Last Closed: 2004-12-22 09:14:16 UTC
Attachments: example that uses py_printf

Description Robert Walsh 2004-09-25 21:20:51 UTC
Description of problem:
If I create a spreadsheet that uses a python plugin, gnumeric crashes
on exit.  It's pretty easy to reproduce.  Run gnumeric and put the
following formula into a cell:

  =py_printf("Hello")

Now exit gnumeric (click Discard when prompted what to do with the
changes) and it will dump core.  Here's what gdb says:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894325888 (LWP 13521)]
0x0000002a986d11e0 in ?? ()
(gdb) bt
#0  0x0000002a986d11e0 in ?? ()
#1  0x00000031b45171d9 in g_datalist_clear () from
/usr/lib64/libglib-2.0.so.0
#2  0x00000031b4d0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#3  0x0000000000485a97 in g_slist_free_custom ()
#4  0x00000000004ac94c in gnm_plugin_get_type ()
#5  0x00000031b4d0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#6  0x0000000000485a97 in g_slist_free_custom ()
#7  0x00000000004b007d in plugins_shutdown ()
#8  0x000000000048f758 in gnm_shutdown ()
#9  0x00000000005130ba in main ()

Looks like some boogums in the plugin-cleanup code.

Version-Release number of selected component (if applicable):
gnumeric-1.2.13-2.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Run gnumeric.
2. Place a python call in a cell (e.g. =py_printf("Hello"))
3. Exit gnumeric.
  
Actual results:
Program gets a SEGV.

Expected results:
Program should exit normally.

Additional info:
Leaving this at severity "normal" since it doesn't cause loss of data
- everything is saved by the time the crash occurs and the only thing
that doesn't get saved is the file history.

Comment 1 Caolan McNamara 2004-09-27 09:46:02 UTC
hmm, gnumeric-1.2.13-2 x86 works for me, so this might be an x64 only
issue. gnumeric bugzilla has similar bugs, but marked as resolved.

Comment 2 Daniel Veillard 2004-09-27 10:12:03 UTC
I can reproduce this on rawhide.  After activating the python
plug-in, the cell displayed the Hello string correctly, but
gnumeric crashed on exit as reported.
  gnumeric-1.2.13-2
  python-2.3.4-10

----------------------------------------
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182895255680 (LWP 3735)]
0x0000002a98ec41e0 in ?? ()
(gdb) where
#0  0x0000002a98ec41e0 in ?? ()
#1  0x000000389fa171d9 in g_datalist_clear () from
/usr/lib64/libglib-2.0.so.0
#2  0x000000389fc0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#3  0x0000000000485a97 in g_slist_free_custom ()
#4  0x00000000004ac94c in gnm_plugin_get_type ()
#5  0x000000389fc0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#6  0x0000000000485a97 in g_slist_free_custom ()
#7  0x00000000004b007d in plugins_shutdown ()
#8  0x000000000048f758 in gnm_shutdown ()
#9  0x00000000005130ba in main ()
---------------------------------------------

Daniel

Comment 4 Caolan McNamara 2004-09-28 07:36:33 UTC
Can you give the x64 gnumeric at
http://people.redhat.com/caolanm/gnumeric/ a test to see if my first
attempt at a simple fix works for you?

Comment 5 Robert Walsh 2004-09-28 18:34:33 UTC
I'll give it a spin later this evening when I get home and let you
know how it goes.  Thanks for hopping on this so quickly!

Comment 6 Robert Walsh 2004-09-29 02:42:42 UTC
No joy.  I get the same crash.  I installed the debuginfo, so the
backtrace now has some extra information:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894325888 (LWP 1646)]
0x0000002a986d01e0 in ?? ()
(gdb) bt
#0  0x0000002a986d01e0 in ?? ()
#1  0x00000031b45171d9 in g_datalist_clear () from
/usr/lib64/libglib-2.0.so.0
#2  0x00000031b4d0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#3  0x0000000000485a97 in g_slist_free_custom (list=0x954300, 
    free_func=0x446120 <g_object_unref>) at gutils.c:224
#4  0x00000000004ac94c in gnm_plugin_finalize (obj=0x946bb0) at
plugin.c:155
#5  0x00000031b4d0e43f in g_object_unref () from
/usr/lib64/libgobject-2.0.so.0
#6  0x0000000000485a97 in g_slist_free_custom (list=0x93e980, 
    free_func=0x4aff30 <gnm_plugin_try_unref>) at gutils.c:224
#7  0x00000000004b007d in plugins_shutdown () at plugin.c:1843
#8  0x000000000048f758 in gnm_shutdown () at libgnumeric.c:158
#9  0x00000000005130ba in main (argc=8491024, argv=0x415a1f81)
    at main-application.c:243
(gdb) f 3
#3  0x0000000000485a97 in g_slist_free_custom (list=0x954300, 
    free_func=0x446120 <g_object_unref>) at gutils.c:224
224                     free_func (l->data);
(gdb) p l 
$1 = (GSList *) 0x954300
(gdb) p l->data
$2 = 0x990260
(gdb) p *l
$3 = {data = 0x990260, next = 0x0}
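
For readers following frame #3: the loop there walks a singly linked list and hands each element to free_func.  A minimal sketch of that pattern follows, assuming a node shaped like the {data, next} struct gdb printed above.  Node, list_prepend, list_free_custom and count_release are hypothetical stand-ins for illustration, not the actual gutils.c or glib code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for GSList: just the two fields gdb printed. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

typedef void (*FreeFunc)(void *data);

/* Prepend an element; returns the new head (or the old head if malloc fails). */
static Node *list_prepend(Node *head, void *data)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->data = data;
    n->next = head;
    return n;
}

/* Call free_func on every element, then release the list nodes. */
static void list_free_custom(Node *list, FreeFunc free_func)
{
    Node *l = list;
    while (l != NULL) {
        Node *next = l->next;   /* save before the node is freed */
        free_func(l->data);     /* jumps into the callback; a stale data
                                   pointer would blow up here or deeper */
        free(l);
        l = next;
    }
}

/* Hypothetical callback that only counts how often it was invoked. */
static int released = 0;
static void count_release(void *data)
{
    (void)data;
    released++;
}
```

In this sketch the traversal itself is harmless.  If l->data referred to an object whose code or memory had already gone away, the failure would show up inside the callback, which is roughly what the ?? frame at #0 suggests, though that reading is a guess.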


Comment 7 Caolan McNamara 2004-09-29 14:34:50 UTC
This will be a bit tricky to figure out. Running from a remote and
slow x86_64 didn't blow up for me, which *might* imply a threading
issue. For the record, what does rpm -q python report, in the unlikely
case it turns out to be python's fault?

Comment 8 Robert Walsh 2004-09-29 16:56:14 UTC
python-2.3.4-10

Comment 9 Caolan McNamara 2004-09-30 09:52:42 UTC
Created attachment 104573 [details]
example that uses py_printf

> gdb gnumeric
(gdb) run --disable-crash-dialog --quit Book1.gnumeric

Comment 10 Caolan McNamara 2004-12-15 20:07:14 UTC
I could never reproduce this problem on the x86_64 machines available
to me. Do you want to give 1.4.1-1 a try? If so, grab the rpms from
http://people.redhat.com/caolanm/gnumeric/ or soon from rawhide.

Comment 11 Robert Walsh 2004-12-15 20:17:45 UTC
Sure - I'll give it a spin later on tonight when I get home.

Comment 12 Robert Walsh 2004-12-16 03:15:46 UTC
Gnee - I can't install without also updating python to 2.4 (I'm
running FC3 here, which has python 2.3.)  I'll poke around in rawhide
later on, but I'm a bit nervous about doing this and causing other
dependency problems (I don't want to update my relatively stable FC3
to rawhide...)

Comment 13 Robert Walsh 2004-12-16 05:05:38 UTC
Eep - I'm going to pass on testing this.  gnumeric needs a new python,
which needs a new compat-db, which needs a new gnome
something-or-other, etc.  Too many changes for my liking at the
moment.  I'll wait until FC4 to try this out, I think.  I'll run
Valgrind on it then, too.

Comment 14 Caolan McNamara 2004-12-17 14:52:12 UTC
I've also stuffed a gnumeric-1.2.13-9.fc3.x86_64 into
http://people.redhat.com/caolanm/gnumeric/ (it's just a rebuild of the
gnumeric-1.2.13-8.fc3.x86_64 that fixes a worrying quirk I saw with
python libraries during the build). I seriously doubt that it makes any
difference, but you never know.

Comment 15 Robert Walsh 2004-12-17 19:01:03 UTC
Well, it's worth trying.  I'll install it tonight when I get home
(becoming a familiar refrain :-)

Comment 16 Robert Walsh 2004-12-18 03:21:50 UTC
Huh!  What do you know - those FC3 rpms actually solved the problem! 
Thanks!

Comment 17 Michal Jaegermann 2004-12-19 05:56:30 UTC
I cannot reproduce that on x86_64 with gnumeric-1.4.1-3, either
with the sample from comment #9 or with something which I produced
myself.

OTOH with gnumeric-1.2.13-8.fc3 on an FC3 installation, after
I had to turn on "Python functions" in plugins (is that expected?),
I can reproduce that crash every time.  Oh, both installations
are _not_ SMP.

Here is a backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894258848 (LWP 28385)]
0x000000000049d22d in mstyle_unref ()
(gdb) bt
#0  0x000000000049d22d in mstyle_unref ()
#1  0x000000000047bfa0 in gnm_conf_shutdown ()
#2  0x000000000048f79f in gnm_shutdown ()
#3  0x00000000005130ca in main ()

Is this still of interest?


Comment 18 Michal Jaegermann 2004-12-19 06:27:41 UTC
Responding to my own question it appears that this interest is
still there. 1/2 :-)

I decided to try with gnumeric-1.2.13-9.fc3.x86_64 and I got
exactly the same segmentation fault.  This time I loaded debuginfo
as well.  Here is what I see:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894258848 (LWP 28650)]
mstyle_unref (style=0x0) at mstyle.c:762
762             g_return_if_fail (style->ref_count > 0);
(gdb) l
757     }
758
759     void
760     mstyle_unref (GnmStyle *style)
761     {
762             g_return_if_fail (style->ref_count > 0);
763
764             d(("unref %p = %d\n", style, style->ref_count-1));
765             if (style->ref_count-- <= 1) {
766                     g_return_if_fail (style->link_count == 0);
(gdb) p style
$1 = (GnmStyle *) 0x0
(gdb) up
#1  0x000000000047bfa0 in gnm_conf_shutdown () at gnumeric-gconf.c:289
289             mstyle_unref (prefs.printer_decoration_font);
(gdb) l
284     }
285
286     void
287     gnm_conf_shutdown (void)
288     {
289             mstyle_unref (prefs.printer_decoration_font);
290             prefs.printer_decoration_font = NULL;
291     }
292
293     void

I wonder if this is related to these warnings:

GnomePrintCupsPlugin-Message: The ppd file for the CUPS printer lp
could not be loaded

No idea why it "could not be loaded".  That printer is remote
but a bog-standard Lexmark Optra and configured that way.

BTW - CUPS bitching is followed immediately by

** (gnumeric:28650): CRITICAL **: file func.c: line 327
(gnm_func_free): assertion `func->ref_count == 0' failed

plus a bunch of warnings about assorted leaks.

With all of that said and done now I am not sure if this is
the same SIGSEGV as the one originally reported by Robert.
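
The listing above shows why the check at mstyle.c:762 cannot protect against style=0x0: the guard expression itself reads style->ref_count.  Below is a minimal sketch of an unref that tests the pointer before touching any field.  Style, style_new and style_unref_checked are hypothetical names used for illustration only; this is not the patch that went into Gnumeric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for GnmStyle; only the reference count is modeled. */
typedef struct {
    int ref_count;
} Style;

/* Allocate a style holding one reference; returns NULL if malloc fails. */
static Style *style_new(void)
{
    Style *s = malloc(sizeof *s);
    if (s != NULL)
        s->ref_count = 1;
    return s;
}

/* Drop one reference.  Returns 1 if the style was freed, 0 otherwise.
 * The NULL test comes first, so no field is read through a null pointer. */
static int style_unref_checked(Style *style)
{
    if (style == NULL)
        return 0;               /* nothing was ever assigned: do nothing */
    if (style->ref_count <= 0)
        return 0;               /* corrupt count: refuse to touch it */
    if (--style->ref_count == 0) {
        free(style);
        return 1;
    }
    return 0;
}
```

With a guard of this shape, a shutdown call on a preference that was never initialized becomes a no-op instead of a SIGSEGV.  It hides the missing initialization rather than fixing it, so it is at best a second line of defence.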


Comment 19 Robert Walsh 2004-12-19 06:54:11 UTC
Looks a bit different to what I was seeing all right.  I can't
reproduce that myself, and I don't see any CUPS errors, either.

Comment 20 Michal Jaegermann 2004-12-19 16:41:22 UTC
There still could be a common cause.  Namely exit functions
trying to clean up some resources which for some reasons, legitimate
or not, were not assigned.  That would explain why we are seeing
different bombs or not at all.  If this is really the issue then
quite possibly it is not processor dependent.

The case which I am seeing is a clear and obvious bug.  Even if
you should never be able to pass a null pointer to
g_return_if_fail(), the real bug is then somewhere else, allowing this
to happen.  A pointer cut down somewhere to 32 bits which are all 0?
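
To make that last guess concrete: if a 64-bit address passes through a 32-bit integer somewhere, only the low half survives.  The sketch below uses a plain uint64_t so it behaves the same on any platform.  truncate_to_32_bits is a hypothetical helper, and nothing in this report confirms that such a truncation actually happens in gnumeric.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit address value, as would happen
 * if the address were stored in (or returned through) a 32-bit integer. */
static uint64_t truncate_to_32_bits(uint64_t address)
{
    return (uint64_t)(uint32_t)address;
}
```

An address such as 0x0000002a00000000 would come out as 0 and look like a null pointer, while 0x0000002a986d11e0 (the crash address from the original report) would come out as 0x986d11e0, which is nonzero but no longer points at the right place.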

Comment 21 Caolan McNamara 2004-12-20 09:15:05 UTC
The prefs.printer_decoration_font problem sounds suspiciously related
to the fix "src/gnumeric-gconf.c
(gnm_conf_init_printer_decoration_font) : add a default in case conf
lookup fails." at
http://cvs.gnome.org/viewcvs/gnumeric/src/gnumeric-gconf.c?r1=1.61&r2=1.62

so I'd say it's not the same as the python issue which looks in good
shape all of a sudden. I'll either push a 1.2 update with the python
fix and the init_printer fix applied, or push a 1.4.1 as an update
with the python fix.
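
For illustration, the idea named in that changelog entry ("add a default in case conf lookup fails") can be sketched as follows.  This is not the code from the CVS revision linked above: font_name_or_default, the LookupFunc callback type, the two sample lookups and the "Sans" default are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed fallback value; the real default in gnumeric is not given here. */
#define DEFAULT_FONT_NAME "Sans"

/* Hypothetical lookup callback: returns NULL when the key is not set. */
typedef const char *(*LookupFunc)(const char *key);

/* Return the configured font name, or the default if the lookup fails
 * or yields an empty string.  The result is never NULL. */
static const char *font_name_or_default(LookupFunc lookup, const char *key)
{
    const char *name = (lookup != NULL) ? lookup(key) : NULL;
    if (name == NULL || name[0] == '\0')
        return DEFAULT_FONT_NAME;
    return name;
}

/* Two hypothetical lookups used to exercise both paths. */
static const char *lookup_missing(const char *key)
{
    (void)key;
    return NULL;
}

static const char *lookup_present(const char *key)
{
    (void)key;
    return "Courier";
}
```

Because the init path always ends up with a usable value, a later shutdown routine has something valid to release even when the configuration backend returned nothing.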

Comment 23 Caolan McNamara 2004-12-22 09:14:16 UTC
caolanm->rjwalsh: I've pushed 1.2.13-10 as an update to fc-3. It is
the same as gnumeric-1.2.13-9.fc3, which was reported not to crash on
exit with python plugins activated [a problem which I hope stays
squished :-)]

caolanm->michal:
As a bonus, 1.2.13-10 adds a backport of the patch which will hopefully
fix the separate mstyle_unref crash. If there is still a crash
with mstyle_unref, then open a separate bug against me about it, as it's
(apparently) unrelated to my bête noire.

Comment 24 Michal Jaegermann 2004-12-22 17:30:21 UTC
I tried gnumeric-1.2.13-10 and tried to cause the SEGV the same way as before.
It did not happen.  As a matter of fact various complaints also
vanished. "The ppd file for the CUPS printer ..." is still there but
"CRITICAL" and assorted leak complaints are gone.  I do not know
if they were simply turned off... :-)