Red Hat Bugzilla – Bug 240191
pirut crashes with SIGSEGV
Last modified: 2007-11-30 17:12:04 EST
Description of problem:
If in a package installer (pirut) I open
the "List" tab, it crashes with SIGSEGV.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Unpack the attached repos.tar.gz into /etc/yum.repos.d
2. Run pirut
3. Open the "List" tab
the complete package list
I'll attach the content of my /etc/yum.repos.d
here as well as the stack trace of the python.
This bug beats me for rather long already. I
think it was present in fc6 without updates.
Created attachment 154758 [details]
Created attachment 154759 [details]
Not convinced this is a glib problem. Of course, I'd need a stacktrace that
shows what arguments are passed to g_markup_escape_text() to say for sure.
Created attachment 154764 [details]
another stack trace
> Not convinced this is a glib problem. Of course, I'd need a stacktrace that
> shows what arguments are passed to g_markup_escape_text() to say for sure.
Created attachment 154766 [details]
yet a better stack trace
Looks to me like someone is passing garbage to g_markup_escape_text()
> Looks to me like someone is passing garbage to g_markup_escape_text()
#1 0x00002aaab036b3e0 in pyg_markup_escape_text (
unused=<value optimized out>, args=<value optimized out>,
kwargs=<value optimized out>) at gobjectmodule.c:2524
2524 text_out = g_markup_escape_text(text_in, text_size);
(gdb) p text_in
$2 = 0xd1238dc "�������: �������� ������� ������ �������������� ����� ����"
(gdb) p strlen(text_in)
$3 = 58
(gdb) p text_size
$4 = 58
Looks like a glib2 screwup after all.
Is this really difficult to reproduce
> Is this really difficult to reproduce
> that problem?
Oh, I bet.
You need LC_ALL="ru_RU.UTF-8", and then
you can reproduce.
OK, the problem was caused by the
third-party packages which had the
description not in UTF-8.
So after all, yes, the string passed
to glib, was not good.
Here are the patches needed to fix
Is there anyone interested in applying
Created attachment 160603 [details]
Created attachment 160604 [details]
Created attachment 160606 [details]
Comment on attachment 160603 [details]
The glib patch will not be accepted. It is the callers responsibility to pass
in valid utf-8.
> The glib patch will not be accepted. It is the callers responsibility to pass
> in valid utf-8.
I only added the function that reports
an error if this is not the case, and I
added an assertion to the old function.
Which one of the above two you don't like?
It doesn't do any "recovery" or whatever -
its still a caller's responsibility to pass
the valid strings, but at least there are
no SIGSEGV now. There already was a check
for text!=NULL to avoid a SIGSEGV, so why
just another small check will hurt? Or if
it will - feel free to remove it. I only
need the _errcheck() variant to stay.
The utf validation is what I object to. There are a lot of functions in glib
that expect valid utf8 input, and they are calling each oher. If we start adding
validation to these entry points, we'll end up validating the same text over and
over and over. Therefore, it is the callers responsibility to validate. And if he
does that already, there is no need for the errcheck variant either.
> The utf validation is what I object to. There are a lot of functions in glib
> that expect valid utf8 input, and they are calling each oher.
If they are, for example, private to the
library, then I believe they do not need
the validation. I added it to the public
There are probably the public functions
that do not check the string, but at least
it seems the converting functions (g_convert_xxx)
always return an error, and the markup_escape()
is a kinda conversion function either I think.
And besides, it was not me who invented the
definition of G_MARKUP_ERROR_BAD_UTF8.
It was already there and used in a few places.
What is the difference between the places it
was used before, and where I wanted to add it?
- What exactly public functions do not check the
string UTF8 validity?
- Will the memory corruption happen if they are
passed an invalid-UTF8 but a valid ASCIIZ string,
or maybe they are safe by some other means?
- How the one can distinguish whether the
particular function will memory-corrupt or
report an error? Is there some rule for that,
so that the user can always avoid the mistake?
> If we start adding
> validation to these entry points, we'll end up
> validating the same text over and
> over and over.
This sometimes can be avoided by making an internal
functions like this:
and not call the public ones from the internal
So at least in theory this is solvable (not sure
how much work will it take, of course).
> Therefore, it is the callers responsibility to validate.
But then you can just as well request the caller
to validate the strings before any conversion.
Or to check the file permissions before trying
to open the file. I think this would be an inconvenience.
> And if he
> does that already, there is no need for the errcheck variant either.
I agree there is a redundancy, but IMHO it would
be better to have only an error-checking one. But
since this will break an API, I of course have not
even tried to do this. I think having both is a
So, to sum up. I don't think you can force the
user to always check his input, because:
1. Many functions (esp the converting ones) do
the checking already, and as such, there must
be a real reason doing an exceptions from that,
if you want a consistent API.
2. This is very inconvenient, if not impossible
You can argue for it in upstream bugzilla, if you want.
But it is not going to happen...
Well, I more wanted an explanation rather
than arguing, and hence the concrete
questions I need to have an answers to,
to understand your point.
OK, I'll try the upstream too, however, please
note that I am fixing the bug assigned to you. :)
So if you have your implementation at hands -
I won't argue against it that's for sure
(provided it works fine, of course).
> note that I am fixing the bug assigned to you. :)
> So if you have your implementation at hands -
... or at least re-assign that bug to someone
responsible for the faulty component, whatever
you think it is (python? pirut? pygobject?)
Should be fixed so that we validate before passing things in for pirut > 1.3.10.
I don't have any packages that trip this up, though, so can't actually test it.
Tested - doesn't work.
Crashes with the attached backtrace.
Created attachment 161640 [details]
Can you point me to the repo where this is happening (via mail if you don't want
it in the bug report)? That's likely to be better than me just taking random
stabs at the problem
> Can you point me to the repo where this is happening (via mail if you don't want
> it in the bug report)?
I've mailed you an rpm that you need to
install first. Its 10Mb, so not sure if
you'll get it.
Then set LC_ALL=ru_RU.UTF-8 and run pirut.
Then go to the "List" tab.
If it doesn't crash, then I don't know what
this stuff is all about. :)
Thanks, that helped a lot. Fixed in CVS to not crash. We have to bail on
displaying the string, but that's because there's no defined encoding so we'd
have to do some ugly and not-at-all-reliable heuristics to do the conversion
> We have to bail on
> displaying the string
Do you mean aborting an execution, or something else?
I know it can't be decoded, but the other
way around (IMHO more user-friendly) is to
just substitute the string with something bogus,
like "!!! WRONG ENCODING IN PACKAGE DESCRIPTION !!!",
and keep functioning normally.
Aborting just because some nasty package is
installed, is IMHO not very good. The user
might want to uninstall that package with pirut,
but he can't.
Unless I misunderstand what "bail" means here.
Not aborting, just making the string empty. Otherwise, it pulls in needs for
translations and gets to be a little overkill.
I think some bogus string to alert
the user would be nice. Not a big deal