Description of problem: If in a package installer (pirut) I open the "List" tab, it crashes with SIGSEGV. Version-Release number of selected component (if applicable): python-2.4.4-1.fc6 How reproducible: Always Steps to Reproduce: 1. Unpack the attached repos.tar.gz into /etc/yum.repos.d 2. Run pirut 3. Open the "List" tab Actual results: python sigsegves Expected results: the complete package list Additional info: I'll attach the content of my /etc/yum.repos.d here as well as the stack trace of the python. This bug beats me for rather long already. I think it was present in fc6 without updates.
Created attachment 154758 [details] /etc/yum.repos.d
Created attachment 154759 [details] stack trace
Not convinced this is a glib problem. Of course, I'd need a stacktrace that shows what arguments are passed to g_markup_escape_text() to say for sure.
Created attachment 154764 [details] another stack trace > Not convinced this is a glib problem. Of course, I'd need a stacktrace that > shows what arguments are passed to g_markup_escape_text() to say for sure. Done.
Created attachment 154766 [details] yet a better stack trace
Looks to me like someone is passing garbage to g_markup_escape_text()
> Looks to me like someone is passing garbage to g_markup_escape_text() Maybe not: --- (gdb) up #1 0x00002aaab036b3e0 in pyg_markup_escape_text ( unused=<value optimized out>, args=<value optimized out>, kwargs=<value optimized out>) at gobjectmodule.c:2524 2524 text_out = g_markup_escape_text(text_in, text_size); (gdb) p text_in $2 = 0xd1238dc "�������: �������� ������� ������ �������������� ����� ����" (gdb) p strlen(text_in) $3 = 58 (gdb) p text_size $4 = 58 (gdb) --- Looks like a glib2 screwup after all. Is this really difficult to reproduce that problem?
> Is this really difficult to reproduce > that problem? Oh, I bet. You need LC_ALL="ru_RU.UTF-8", and then you can reproduce.
OK, the problem was caused by the third-party packages which had the description not in UTF-8. So after all, yes, the string passed to glib, was not good. Here are the patches needed to fix the problem. Is there anyone interested in applying them?
Created attachment 160603 [details] glib2 patch
Created attachment 160604 [details] pigobject patch
Created attachment 160606 [details] pirut patch
Comment on attachment 160603 [details] glib2 patch The glib patch will not be accepted. It is the callers responsibility to pass in valid utf-8.
> The glib patch will not be accepted. It is the callers responsibility to pass > in valid utf-8. I only added the function that reports an error if this is not the case, and I added an assertion to the old function. Which one of the above two you don't like? It doesn't do any "recovery" or whatever - its still a caller's responsibility to pass the valid strings, but at least there are no SIGSEGV now. There already was a check for text!=NULL to avoid a SIGSEGV, so why just another small check will hurt? Or if it will - feel free to remove it. I only need the _errcheck() variant to stay.
The utf validation is what I object to. There are a lot of functions in glib that expect valid utf8 input, and they are calling each oher. If we start adding validation to these entry points, we'll end up validating the same text over and over and over. Therefore, it is the callers responsibility to validate. And if he does that already, there is no need for the errcheck variant either.
> The utf validation is what I object to. There are a lot of functions in glib > that expect valid utf8 input, and they are calling each oher. If they are, for example, private to the library, then I believe they do not need the validation. I added it to the public function. There are probably the public functions that do not check the string, but at least it seems the converting functions (g_convert_xxx) always return an error, and the markup_escape() is a kinda conversion function either I think. And besides, it was not me who invented the definition of G_MARKUP_ERROR_BAD_UTF8. It was already there and used in a few places. What is the difference between the places it was used before, and where I wanted to add it? Also: - What exactly public functions do not check the string UTF8 validity? - Will the memory corruption happen if they are passed an invalid-UTF8 but a valid ASCIIZ string, or maybe they are safe by some other means? - How the one can distinguish whether the particular function will memory-corrupt or report an error? Is there some rule for that, so that the user can always avoid the mistake? > If we start adding > validation to these entry points, we'll end up > validating the same text over and > over and over. This sometimes can be avoided by making an internal functions like this: public_func() { validate_string(); internal_func_process_string(); } and not call the public ones from the internal ones. So at least in theory this is solvable (not sure how much work will it take, of course). > Therefore, it is the callers responsibility to validate. But then you can just as well request the caller to validate the strings before any conversion. Or to check the file permissions before trying to open the file. I think this would be an inconvenience. > And if he > does that already, there is no need for the errcheck variant either. I agree there is a redundancy, but IMHO it would be better to have only an error-checking one. But since this will break an API, I of course have not even tried to do this. I think having both is a reasonable compromise. So, to sum up. I don't think you can force the user to always check his input, because: 1. Many functions (esp the converting ones) do the checking already, and as such, there must be a real reason doing an exceptions from that, if you want a consistent API. 2. This is very inconvenient, if not impossible sometimes.
You can argue for it in upstream bugzilla, if you want. But it is not going to happen...
Well, I more wanted an explanation rather than arguing, and hence the concrete questions I need to have an answers to, to understand your point. OK, I'll try the upstream too, however, please note that I am fixing the bug assigned to you. :) So if you have your implementation at hands - I won't argue against it that's for sure (provided it works fine, of course).
> note that I am fixing the bug assigned to you. :) > So if you have your implementation at hands - ... or at least re-assign that bug to someone responsible for the faulty component, whatever you think it is (python? pirut? pygobject?)
Should be fixed so that we validate before passing things in for pirut > 1.3.10. I don't have any packages that trip this up, though, so can't actually test it.
Tested - doesn't work. Crashes with the attached backtrace.
Created attachment 161640 [details] backtrace
Can you point me to the repo where this is happening (via mail if you don't want it in the bug report)? That's likely to be better than me just taking random stabs at the problem
> Can you point me to the repo where this is happening (via mail if you don't want > it in the bug report)? I've mailed you an rpm that you need to install first. Its 10Mb, so not sure if you'll get it. Then set LC_ALL=ru_RU.UTF-8 and run pirut. Then go to the "List" tab. If it doesn't crash, then I don't know what this stuff is all about. :)
Thanks, that helped a lot. Fixed in CVS to not crash. We have to bail on displaying the string, but that's because there's no defined encoding so we'd have to do some ugly and not-at-all-reliable heuristics to do the conversion
> We have to bail on > displaying the string Do you mean aborting an execution, or something else? I know it can't be decoded, but the other way around (IMHO more user-friendly) is to just substitute the string with something bogus, like "!!! WRONG ENCODING IN PACKAGE DESCRIPTION !!!", and keep functioning normally. Aborting just because some nasty package is installed, is IMHO not very good. The user might want to uninstall that package with pirut, but he can't. Unless I misunderstand what "bail" means here.
Not aborting, just making the string empty. Otherwise, it pulls in needs for translations and gets to be a little overkill.
I think some bogus string to alert the user would be nice. Not a big deal though.