Description of problem: # yum -y upgrade yum Loaded plugins: downloadonly CoRA | 2.1 kB 00:00 addons | 951 B 00:00 adobe-linux | 951 B 00:00 base | 2.1 kB 00:00 epel | 3.7 kB 00:00 extras | 2.1 kB 00:00 rpmforge | 1.1 kB 00:00 rpmforge: [### ] 471/10722Segmentation fault # yum -y upgrade yum -d9 Loading "downloadonly" plugin Not loading "fastestmirror" plugin, as it is disabled Running "config" handler for "downloadonly" plugin Config time: 0.164 Yum Version: 3.2.22 COMMAND: yum -y upgrade yum -d9 Installroot: / Ext Commands: yum Setting up Package Sacks Running "postreposetup" handler for "downloadonly" plugin Segmentation fault Version-Release number of selected component (if applicable): yum-3.2.22-33.el5.centos How reproducible: Every time
I can confirm this happening on over 25 of my RHEL5.6 boxes. And not just with yum but with yum-updatesd and rhn_check: Jul 26 14:13:56 raine kernel: yum-updatesd-he[18464]: segfault at 0000000000000000 rip 0000003360479b80 rsp 00007fff5f282268 error 4 Jul 26 15:13:55 raine kernel: yum-updatesd-he[25007]: segfault at 0000000000000000 rip 0000003360479b80 rsp 00007fff0e44d6b8 error 4 Jul 26 15:17:05 raine kernel: rhn_check[25672]: segfault at 0000000000000000 rip 0000003360479b80 rsp 00007fff5401a2d8 error 4 Jul 26 16:13:55 raine kernel: yum-updatesd-he[31721]: segfault at 0000000000000000 rip 0000003360479b80 rsp 00007fffdc7e8148 error 4 Jul 26 16:13:55 raine kernel: yum-updatesd-he[31721]: segfault at 0000000000000000 rip 0000003360479b80 rsp 00007fffdc7e8148 error 4
(In reply to comment #1) Apologies, I meant to add: All of these servers also have the rpmforge repo enabled.
(gdb) bt #0 0x000000343c079b80 in strlen () from /lib64/libc.so.6 #1 0x00002aaab3bfb958 in IA__g_string_chunk_insert_len (chunk=0x1091b60, string=0x0, len=-1) at gstring.c:224 #2 0x00002aaab39a9d2b in update_package_cb (p=0x10d8920, user_data=<value optimized out>) at sqlitecache.c:361 #3 0x00002aaab39a6b33 in primary_parser_package_end (data=0x7fffffffc220, name=0x10b25fb "package") at xml-parser.c:365 #4 primary_sax_end_element (data=0x7fffffffc220, name=0x10b25fb "package") at xml-parser.c:473 #5 0x00002aaab3e8c988 in xmlParseEndTag1 (ctxt=0x1097600, line=1785) at parser.c:7428 #6 0x00002aaab3e9a84a in xmlParseElement__internal_alias (ctxt=0x1097600) at parser.c:8750 #7 0x00002aaab3e9662e in xmlParseContent__internal_alias (ctxt=0x1097600) at parser.c:8564 #8 0x00002aaab3e9a7e2 in xmlParseElement__internal_alias (ctxt=0x1097600) at parser.c:8724 #9 0x00002aaab3e9ac2a in xmlParseDocument__internal_alias (ctxt=0x1097600) at parser.c:9332 #10 0x00002aaab3e9b8ec in xmlSAXUserParseFile__internal_alias (sax=0x2aaab3baca40, user_data=0x7fffffffc220, filename=<value optimized out>) at parser.c:12572 #11 0x00002aaab39a77eb in yum_xml_parse_primary ( filename=0xfffb94 "/var/cache/yum/rpmforge/primary.xml.gz", count_callback=<value optimized out>, package_callback=<value optimized out>, user_data=<value optimized out>, err=<value optimized out>) at xml-parser.c:593 #12 0x00002aaab39a9919 in update_packages (self=<value optimized out>, args=<value optimized out>, update_info=0x7fffffffc330) at sqlitecache.c:412 #13 py_update (self=<value optimized out>, args=<value optimized out>, update_info=0x7fffffffc330) at sqlitecache.c:537 #14 0x00002aaab39a9be5 in py_update_primary (self=0x0, args=0xaf6730) at sqlitecache.c:565 (gdb) up #2 0x00002aaab39a9d2b in update_package_cb (p=0x10d8920, user_data=<value optimized out>) at sqlitecache.c:361 361 g_hash_table_insert (update_info->all_packages, (gdb) list 356 static void 357 update_package_cb (Package *p, gpointer user_data) 358 { 359 UpdateInfo *update_info = (UpdateInfo *) user_data; 360 361 g_hash_table_insert (update_info->all_packages, 362 g_string_chunk_insert (update_info->package_ids_chunk, 363 p->pkgId), 364 GINT_TO_POINTER (1)); 365 (gdb) print update_info $1 = (UpdateInfo *) 0x7fffffffc330 (gdb) print *update_info $2 = {db = 0x1087e90, remove_handle = 0x1090fd0, count_from_md = 10722, packages_seen = 471, add_count = 471, del_count = 0, current_packages = 0x1092840, all_packages = 0x1092800, package_ids_chunk = 0x1091b60, timer = 0x108e280, python_callback = 0xfee960, info_init = 0x2aaab39aa0f0 <package_writer_info_init>, info_clean = 0x2aaab39a94e0 <package_writer_info_clean>, create_tables = 0x2aaab39a8900 <yum_db_create_primary_tables>, write_package = 0x2aaab39a9ff0 <write_package_to_db>, xml_parse = 0x2aaab39a7790 <yum_xml_parse_primary>, user_data = 0xef9120} (gdb) print *p $3 = {pkgKey = 0, pkgId = 0x0, name = 0x12afd90 "mhash-devel", arch = 0x12afd9c "x86_64", version = 0x12afda5 "0.9.9", epoch = 0x12afda3 "0", release = 0x12afdab "1.el5.rf", summary = 0x12afdb8 "Header files and libraries for developing apps which will use mhash", description = 0x12afdfc "The mhash-devel package contains the header files and libraries needed\nto develop programs that use the mhash library.\n\nInstall the mhash-devel package if you want to develop applications that\nwill us"..., url = 0x12afef5 "http://mhash.sourceforge.net/", time_file = 1311155083, time_build = 1311154870, rpm_license = 0x12aff3e "LGPL", rpm_vendor = 0x12aff43 "Dag Apt Repository, http://dag.wieers.com/apt/", rpm_group = 0x12aff72 "Development/Libraries", rpm_buildhost = 0x12aff88 "lisse.hasselt.wieers.com", rpm_sourcerpm = 0x12affa1 "mhash-0.9.9-1.el5.rf.src.rpm", rpm_header_start = 440, rpm_header_end = 4747, rpm_packager = 0x12afed9 "Dag Wieers <dag>", size_package = 19967, size_installed = 36750, size_archive = 38772, location_href = 0x12aff13 "RPMS/mhash-devel-0.9.9-1.el5.rf.x86_64.rpm", location_base = 0x0, checksum_type = 0x12afdb4 "sha", requires = 0x10de260, provides = 0x10e2780, conflicts = 0x0, obsoletes = 0x10e2790, files = 0x0, changelogs = 0x0, chunk = 0x11ca900}
According to investigations on IRC this is almost certainly: commit 6dd9030c162d484f3871a6cee267980b94c9c902 Author: Paul Nasrat <pauln> Date: Tue Nov 27 14:07:31 2007 +0000 Fix segmentation fault experienced with a malformed primary.xml diff --git a/sqlitecache.c b/sqlitecache.c index 749cbc1..bc20b56 100644 --- a/sqlitecache.c +++ b/sqlitecache.c @@ -358,6 +358,11 @@ update_package_cb (Package *p, gpointer user_data) { UpdateInfo *update_info = (UpdateInfo *) user_data; + /* TODO: Wire in logging of skipped packages */ + if (p->pkgId == NULL) { + return; + } + g_hash_table_insert (update_info->all_packages, g_string_chunk_insert (update_info->package_ids_chunk, p->pkgId),
Yeh, that traceback pretty much confirms it.
And from the metadata: <package type="rpm"> <name>mhash-devel</name> <arch>x86_64</arch> <version epoch="0" ver="0.9.9" rel="1.el5.rf"/> <checksum type="sha" pkgid="YES"/>
Are you suggesting the test itself is the problem? Seems more like the test should be extended to catch this case where pkgid=YES but no pkgid is actually set: <checksum type="sha" pkgid="YES"/></checksum> The empty checksum is the problem. p->pkgId is set to 0x0 in this case, reading from the trace in comment 3, so the test for NULL does not catch it. Extending the test in the patch mentioned in comment #4 to if (p->pkgId == NULL || p->pkgId == 0) { return; } seems a possible solution.
The root cause here is createrepo --cachedir. The cache for specific pkgs is damaged and is just an empty file. It appears to be happening on dag's system for one reason or another. He knows how to correct it. Additionally, yum-metadata-parser upstream already has a patch for the segfault and has so for a while. Hope this helps.
I misread comment 4. Thought you indicated that patch as the culprit, but since that patch is not in the code I now understand it is the proposed fix. You can also disregard my confusion about 0x0 which is an address (so indeed NULL not 0) not a value.
Let me clarify that this 'corrupt' metadata is caused by using RHEL5's createrepo using the --cachedir option. With the help of Seth Vidal we found no less than 86 zero-length cache files in the cachedir for various repositories (we have more than 40 large repositories and almost each of them was affected in some way by this during 19/7 and 25/7), even when there has not been any problems reported with the filesystem, the disk-backend or the KVM virtualization. One way to prevent this, is to make createrepo report empty cache-files and not take them into account or replace them instantly and use that. It should not be using empty checksums to generate repository metadata or generate corrupt metadata.
Ok, apparently I am slow to comment as Seth already mentioned it more than 30 minutes :-/
Sorry Dag, but is there a chanse to fix ASAP?
No, we are waiting for our main mirror (HEAnet) to pull the latest changes. HEAnet's rsync however is pretty slow lately and takes 9 hours for an unchanged repository. It takes 16 hours when pushing updates as we recently did. We don't control it :-/
Thanks Dag. I am happy to hear that!
Is disabling rpmforge temporarily an acceptable workaround here?
yes, you can work around it by disabling the repo.
The problem has been fixed since 5 am (CET) this morning, now 18 hours ago. If you still have this problem, you should clean your yum cache. Probably safest to do 'yum clean all'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-1115.html