Bug 240191 - pirut crashes with SIGSEGV
Summary: pirut crashes with SIGSEGV
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: pirut
Version: 7
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-05-15 18:08 UTC by Stas Sergeev
Modified: 2007-11-30 22:12 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-17 14:27:24 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
/etc/yum.repos.d (1.73 KB, application/octet-stream)
2007-05-15 18:08 UTC, Stas Sergeev
no flags Details
stack trace (18.66 KB, text/plain)
2007-05-15 18:12 UTC, Stas Sergeev
no flags Details
another stack trace (23.16 KB, text/plain)
2007-05-15 19:24 UTC, Stas Sergeev
no flags Details
yet a better stack trace (23.48 KB, text/plain)
2007-05-15 19:33 UTC, Stas Sergeev
no flags Details
glib2 patch (3.27 KB, patch)
2007-08-03 13:00 UTC, Stas Sergeev
no flags Details | Diff
pigobject patch (813 bytes, patch)
2007-08-03 13:00 UTC, Stas Sergeev
no flags Details | Diff
pirut patch (538 bytes, patch)
2007-08-03 13:01 UTC, Stas Sergeev
no flags Details | Diff
backtrace (1.01 KB, text/plain)
2007-08-16 09:31 UTC, Stas Sergeev
no flags Details

Description Stas Sergeev 2007-05-15 18:08:03 UTC
Description of problem:
If in a package installer (pirut) I open
the "List" tab, it crashes with SIGSEGV.

Version-Release number of selected component (if applicable):
python-2.4.4-1.fc6

How reproducible:
Always

Steps to Reproduce:
1. Unpack the attached repos.tar.gz into /etc/yum.repos.d
2. Run pirut
3. Open the "List" tab
  
Actual results:
python sigsegves

Expected results:
the complete package list

Additional info:
I'll attach the content of my /etc/yum.repos.d
here as well as the stack trace of the python.
This bug beats me for rather long already. I
think it was present in fc6 without updates.

Comment 1 Stas Sergeev 2007-05-15 18:08:03 UTC
Created attachment 154758 [details]
/etc/yum.repos.d

Comment 2 Stas Sergeev 2007-05-15 18:12:12 UTC
Created attachment 154759 [details]
stack trace

Comment 3 Matthias Clasen 2007-05-15 18:55:28 UTC
Not convinced this is a glib problem. Of course, I'd need a stacktrace that
shows what arguments are passed to g_markup_escape_text() to say for sure.

Comment 4 Stas Sergeev 2007-05-15 19:24:08 UTC
Created attachment 154764 [details]
another stack trace

> Not convinced this is a glib problem. Of course, I'd need a stacktrace that
> shows what arguments are passed to g_markup_escape_text() to say for sure.
Done.

Comment 5 Stas Sergeev 2007-05-15 19:33:12 UTC
Created attachment 154766 [details]
yet a better stack trace

Comment 6 Matthias Clasen 2007-05-15 19:36:18 UTC
Looks to me like someone is passing garbage to g_markup_escape_text()

Comment 7 Stas Sergeev 2007-08-03 05:47:31 UTC
> Looks to me like someone is passing garbage to g_markup_escape_text()
Maybe not:
---
(gdb) up
#1  0x00002aaab036b3e0 in pyg_markup_escape_text (
    unused=<value optimized out>, args=<value optimized out>, 
    kwargs=<value optimized out>) at gobjectmodule.c:2524
2524        text_out = g_markup_escape_text(text_in, text_size);
(gdb) p text_in
$2 = 0xd1238dc "�������: �������� ������� ������ �������������� ����� ����"
(gdb) p strlen(text_in)
$3 = 58
(gdb) p text_size
$4 = 58
(gdb) 
---
Looks like a glib2 screwup after all.

Is this really difficult to reproduce
that problem?

Comment 8 Stas Sergeev 2007-08-03 05:51:04 UTC
> Is this really difficult to reproduce
> that problem?
Oh, I bet.
You need LC_ALL="ru_RU.UTF-8", and then
you can reproduce.

Comment 9 Stas Sergeev 2007-08-03 12:56:42 UTC
OK, the problem was caused by the
third-party packages which had the
description not in UTF-8.
So after all, yes, the string passed
to glib, was not good.

Here are the patches needed to fix
the problem.
Is there anyone interested in applying
them?

Comment 10 Stas Sergeev 2007-08-03 13:00:08 UTC
Created attachment 160603 [details]
glib2 patch

Comment 11 Stas Sergeev 2007-08-03 13:00:55 UTC
Created attachment 160604 [details]
pigobject patch

Comment 12 Stas Sergeev 2007-08-03 13:01:36 UTC
Created attachment 160606 [details]
pirut patch

Comment 13 Matthias Clasen 2007-08-03 13:16:27 UTC
Comment on attachment 160603 [details]
glib2 patch

The glib patch will not be accepted. It is the callers responsibility to pass
in valid utf-8.

Comment 14 Stas Sergeev 2007-08-03 13:32:52 UTC
> The glib patch will not be accepted. It is the callers responsibility to pass
> in valid utf-8.
I only added the function that reports
an error if this is not the case, and I
added an assertion to the old function.
Which one of the above two you don't like?

It doesn't do any "recovery" or whatever -
its still a caller's responsibility to pass
the valid strings, but at least there are
no SIGSEGV now. There already was a check
for text!=NULL to avoid a SIGSEGV, so why
just another small check will hurt? Or if
it will - feel free to remove it. I only
need the _errcheck() variant to stay.

Comment 15 Matthias Clasen 2007-08-03 13:39:39 UTC
The utf validation is what I object to. There are a lot of functions in glib
that expect valid utf8 input, and they are calling each oher. If we start adding
validation to these entry points, we'll end up validating the same text over and
over and over. Therefore, it is the callers responsibility to validate. And if he
does that already, there is no need for the errcheck variant either.

Comment 16 Stas Sergeev 2007-08-03 14:22:23 UTC
> The utf validation is what I object to. There are a lot of functions in glib
> that expect valid utf8 input, and they are calling each oher.
If they are, for example, private to the
library, then I believe they do not need
the validation. I added it to the public
function.
There are probably the public functions
that do not check the string, but at least
it seems the converting functions (g_convert_xxx)
always return an error, and the markup_escape()
is a kinda conversion function either I think.
And besides, it was not me who invented the
definition of G_MARKUP_ERROR_BAD_UTF8.
It was already there and used in a few places.
What is the difference between the places it
was used before, and where I wanted to add it?

Also:
- What exactly public functions do not check the
string UTF8 validity?
- Will the memory corruption happen if they are
passed an invalid-UTF8 but a valid ASCIIZ string,
or maybe they are safe by some other means?
- How the one can distinguish whether the
particular function will memory-corrupt or
report an error? Is there some rule for that,
so that the user can always avoid the mistake?

> If we start adding
> validation to these entry points, we'll end up
> validating the same text over and
> over and over.
This sometimes can be avoided by making an internal
functions like this:
public_func()
{
  validate_string();
  internal_func_process_string();
}
and not call the public ones from the internal
ones.
So at least in theory this is solvable (not sure
how much work will it take, of course).

> Therefore, it is the callers responsibility to validate.
But then you can just as well request the caller
to validate the strings before any conversion.
Or to check the file permissions before trying
to open the file. I think this would be an inconvenience.

> And if he
> does that already, there is no need for the errcheck variant either.
I agree there is a redundancy, but IMHO it would
be better to have only an error-checking one. But
since this will break an API, I of course have not
even tried to do this. I think having both is a
reasonable compromise.

So, to sum up. I don't think you can force the
user to always check his input, because:
1. Many functions (esp the converting ones) do
the checking already, and as such, there must
be a real reason doing an exceptions from that,
if you want a consistent API.
2. This is very inconvenient, if not impossible
sometimes.

Comment 17 Matthias Clasen 2007-08-03 14:27:16 UTC
You can argue for it in upstream bugzilla, if you want. 
But it is not going to happen...

Comment 18 Stas Sergeev 2007-08-03 14:36:17 UTC
Well, I more wanted an explanation rather
than arguing, and hence the concrete
questions I need to have an answers to,
to understand your point.

OK, I'll try the upstream too, however, please
note that I am fixing the bug assigned to you. :)
So if you have your implementation at hands -
I won't argue against it that's for sure
(provided it works fine, of course).

Comment 19 Stas Sergeev 2007-08-03 15:25:32 UTC
> note that I am fixing the bug assigned to you. :)
> So if you have your implementation at hands -
... or at least re-assign that bug to someone
responsible for the faulty component, whatever
you think it is (python? pirut? pygobject?)

Comment 20 Jeremy Katz 2007-08-10 23:15:37 UTC
Should be fixed so that we validate before passing things in for pirut > 1.3.10.
 I don't have any packages that trip this up, though, so can't actually test it.

Comment 21 Stas Sergeev 2007-08-16 09:30:10 UTC
Tested - doesn't work.
Crashes with the attached backtrace.

Comment 22 Stas Sergeev 2007-08-16 09:31:35 UTC
Created attachment 161640 [details]
backtrace

Comment 23 Jeremy Katz 2007-08-16 15:58:09 UTC
Can you point me to the repo where this is happening (via mail if you don't want
it in the bug report)?  That's likely to be better than me just taking random
stabs at the problem

Comment 24 Stas Sergeev 2007-08-17 07:55:32 UTC
> Can you point me to the repo where this is happening (via mail if you don't want
> it in the bug report)?
I've mailed you an rpm that you need to
install first. Its 10Mb, so not sure if
you'll get it.
Then set LC_ALL=ru_RU.UTF-8 and run pirut.
Then go to the "List" tab.
If it doesn't crash, then I don't know what
this stuff is all about. :)

Comment 25 Jeremy Katz 2007-08-17 14:27:24 UTC
Thanks, that helped a lot.  Fixed in CVS to not crash.  We have to bail on
displaying the string, but that's because there's no defined encoding so we'd
have to do some ugly and not-at-all-reliable heuristics to do the conversion

Comment 26 Stas Sergeev 2007-08-17 15:15:00 UTC
> We have to bail on
> displaying the string
Do you mean aborting an execution, or something else?

I know it can't be decoded, but the other
way around (IMHO more user-friendly) is to
just substitute the string with something bogus,
like "!!! WRONG ENCODING IN PACKAGE DESCRIPTION !!!",
and keep functioning normally.
Aborting just because some nasty package is
installed, is IMHO not very good. The user
might want to uninstall that package with pirut,
but he can't.
Unless I misunderstand what "bail" means here.

Comment 27 Jeremy Katz 2007-08-17 15:31:19 UTC
Not aborting, just making the string empty.  Otherwise, it pulls in needs for
translations and gets to be a little overkill.

Comment 28 Stas Sergeev 2007-08-17 15:43:25 UTC
I think some bogus string to alert
the user would be nice. Not a big deal
though.


Note You need to log in before you can comment on or make changes to this bug.