Red Hat Bugzilla – Bug 211254
Last modified: 2007-11-30 17:11:46 EST
Description of problem:
apt-get update or upgrade segfaults
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. run apt-get update
List of repositories updating
Rebuilding the source code seems to have fixed the problem.
I've rebuilt the source RPMs for 4 computers. Three of them seem to have no problem
with apt-get the fourth one continues to segfault.
I expect that in the not too distant future, the others will continue to segfault. I will let you
Please add some more information about the segfault, for example backtraces with
the debuginfo package installed. For self-rebuilt packages you need to use the
debuginfo package that you built, otherwise use that from the repos.
OK, I have the debug package installed. But when running apt-get update I still
get a segmentation fault, without any additional info. How do I generate or find
the backtrace info?
With debuginfo package installed, you need to run it under gdb:
# gdb apt-get
(gdb) run update
... and when it crashes, get the backtrace:
(gdb) bt
Then copy-paste the full gdb session here. Oh and btw, I'd prefer getting the
backtrace from FE built package rather than self-rebuilt versions to eliminate
unnecessary variables from the equation.
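The interactive session described above can also be captured in one go, which makes it easier to paste a complete backtrace into the bug. This is only a minimal sketch, assuming gdb's batch mode; the helper name capture_backtrace and the GDB override variable are made up for illustration.

```shell
# Run a command under gdb in batch mode and print a backtrace if it crashes.
# GDB is a hypothetical override knob (handy for dry runs); it defaults to gdb.
capture_backtrace() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Example (as root, with the matching apt debuginfo package installed):
#   capture_backtrace apt-get update > apt-backtrace.txt 2>&1
```

Redirecting both stdout and stderr keeps the program output and the gdb messages in the same file, in the order they appeared.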
I would be very happy to comply with the request to get debug info from the
prebuilt FE packages, but the debug packages are not supplied. I will let you
know soon how things go.
Here is the backtrace
Reading Package Lists... 0%
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208555024 (LWP 24180)]
0x0054eb43 in strlen () from /lib/libc.so.6
#0 0x0054eb43 in strlen () from /lib/libc.so.6
#1 0x0095d7d6 in std::string::compare () from /usr/lib/libstdc++.so.6
#2 0x00c4a4e1 in rpmPkgListIndex::FindInCache (this=0x9526f38,
#3 0x00cb4fa0 in CheckValidity (CacheFile=Variable "CacheFile" is not available.
) at pkgcachegen.cc:654
#4 0x00cb55b9 in pkgMakeStatusCache (List=@0xbf8a7884, Progress=@0xbf8a7918,
OutMap=0xbf8a7a18, AllowMem=false) at pkgcachegen.cc:789
#5 0x00c9e1e1 in pkgCacheFile::BuildCaches (this=0xbf8a7a18,
Progress=@0xbf8a7918, WithLock=false) at cachefile.cc:74
#6 0x00c9e304 in pkgCacheFile::Open (this=0xbf8a7a18, Progress=@0xbf8a7918,
WithLock=true) at cachefile.cc:94
#7 0x0806543a in CacheFile::Open (this=0xbf8a7a18, WithLock=true)
#8 0x080566c1 in DoUpdate (CmdL=@0xbf8a829c) at apt-get.cc:1748
#9 0x00c2506b in CommandLine::DispatchArg (this=0xbf8a829c, Map=0xbf8a8210,
NoMatch=true) at contrib/cmndline.cc:340
#10 0x0805dbc5 in main (argc=2, argv=Cannot access memory at address 0x4
) at apt-get.cc:3312
#11 0x004fa4e4 in __libc_start_main () from /lib/libc.so.6
#12 0x0804d221 in _start ()
OK.... Just blind as a bat. I did not see the debug folder in extras. So I reinstalled all
packages from FE including the debug folder. I got exactly the same results.
OK --- I just downloaded and compiled and installed the latest development release
apt-0.5.15lorg3.90 from apt-rpm.org.
On the machines that were giving me trouble, apt-get seemed to work as expected at
least once. I will let you know next weekend if it continues to run trouble free, unless
new packages are provided. If that happens, I will report on how apt-get works with
the new packages when provided.
I appreciate very much the work that you are doing
I cannot reproduce it, neither on FC5, nor on FC6. I wonder if your metadata
under /var has some trouble/corruption. Could you nuke that? Best done by
uninstalling apt, removing /var/cache/apt and /var/state/apt and reinstalling apt.
I can't reproduce it either, and the traceback suggests, as Axel says, that
there's something very strange about the metadata.
What repositories are in use on the systems where apt crashes? (sources.list and
Funny... Axel, I did as suggested, and everything seemed to work at least once. I'm sure
there will be ample opportunity to test things out with kde-redhat updating from 3.5.4 to
3.5.5. I will let you know by the end of next week if things continue to function as
expected. The question remains, how do three computers out of five have problematic
Panu - It doesn't crash while downloading package data. It crashes either at the very
beginning (before reading the data), or during the "Reading Package Lists" stage or
the "Building Dependency Tree" stage.
Oh.. one other thing maybe I should mention, I am getting a few packages from Livna
as required by kde-redhat. They use repomd exclusively having dropped support for
Eli, can you attach (or make somehow accessible) the exact contents of
/etc/apt/sources.list and sources.list.d directory on a system where apt crashes?
According to the traceback, the crash occurs on an old-style apt-rpm repository,
so that rules out all the repomd repositories such as FC+FE, Livna and kde-redhat.
Created attachment 139053
Here you go
Panu, don't you also need a tarball of Eli's /var/*/apt contents to reproduce
it? If yes, then Eli please make them available through some URL instead of
attaching them to the bug :)
(but wait to see if Panu really needs them, maybe he doesn't)
I was basically hoping for an easy reproducer with just the info about
repositories :) Alas, no such luck.
So yes, to further track this I'd need the following bits from a system that crashes:
/var/lib/apt/ (can be at /var/state/apt in some cases) contents in their
entirety, although the problem is most likely in the *.bin files.
The *.bin files are the key to pretty much everything in apt and quite often
just removing them fixes various more-or-less mysterious problems, would seem to
be the case here as well according to comment #12. So, after backing up the
current cached files, do the cleanup steps on each problematic box and let's see
if that helps. Even if that cures the problem, the corrupted cache data is
interesting to me as garbage data shouldn't segfault, just error out cleanly.
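Backing up the cached files before cleaning them might be done along these lines. This is a rough sketch, not an official procedure: the helper name backup_apt_state and the tarball location are assumptions, and the directories are the ones named in this bug.

```shell
# Archive apt's state and cache directories before cleaning them up.
# $1: output tarball; remaining arguments: directories to save.
backup_apt_state() {
    out=$1
    shift
    tar -czf "$out" "$@"
}

# Example (as root; use /var/state/apt instead if that is where your system keeps it):
#   backup_apt_state /root/apt-state-backup.tar.gz /var/lib/apt /var/cache/apt
```

Note that /var/cache/apt may also hold downloaded packages, so the tarball can get large.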
Sorry Panu. Those files have been cleaned way back by comment #12. If this should
happen again, I will indeed send the files.
Axel, could you keep this open for a week? Like I said, in about a week's time, if everything
is OK, I will post a comment to indicate whether all is well or not.
No, I won't close it. I'm lowering the severity as it seems to only affect your
system (and there are many apt users still). If you find that there is no issue
anymore you can close it, too. If the issue doesn't pop up again or only pops up
on one system I suspect that you may have some bad ram somewhere corrupting the
BTW some people sync their contents of /var/*/apt to save some bandwidth. Don't
do it; otherwise, if something eats up your bits on one system, you mirror the
issue to the other healthy systems, too.
Created attachment 139074
/var/lib/apt and /var/cache/apt
It happened again
I'm getting pretty convinced that this is a memory issue and it may not be an apt
problem per se. However, I'm seeing this happen only with apt. And, it's only been
happening since the last kernel update. So, I suspect that there is something buggy
about the latest kernel release. But how do I figure out where?
I'm dealing with 4 FC5 boxes. On three of them I have 512 MBytes of RAM; on the 4th, I
have 1 GB. On the 1 GB box I have yet to see this problem. On the other three boxes
this is a continuing and ongoing nag. Running RPM commands by themselves does
not seem to be a problem. Only, when running apt.
Apt uses quite a bit of memory, especially with repomd repositories, so it's
possible it triggers something more easily than others. Do you remember at which
kernel update this started happening and if you downgrade the kernel back to
some older version does it actually stop this from happening? Also, is there
anything even remotely relevant in /var/log/messages from the time when this
Sorry, I haven't had a chance to look at your cache data yet, been a bit busy
with other things :-/
It'll be a while before I can get back to this. I'm trying to upgrade my main desktop from
fc5 i386 to fc6 x86_64. It's a real nightmare. I will try to get back to this within a week,
that is if I don't need to reinstall my desktop.
The kernel that is currently installed on the machines that are giving me trouble, are
One correction: one of the computers does not have 1/2 GB but rather
256 MBytes. And before I can use apt-get successfully on this machine, I have to restart
the computer, run
rm -f /var/lib/rpm/__db*
because the rpmdb gets corrupted.
also, I have to remove the bin files from /var/cache/apt and sometimes the files
This can't be good for this machine.
OK... I'm back. Had a bit of trouble getting the upgraded machine to function properly
but that's another bug.
In the meantime I've had a little time to do some research, and found that the problem
does not affect only apt-get, but rpm and yum as well. A cursory search of this bugzilla
site for rpm segfault or yum segfault reveals several reports.
An example would be https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213963
I really would like to help getting this problem solved.
Thanks for your patience
There are known bugs in yum on FC6 due to opening/closing rpmdb too often, or
one can argue that yum only exhibits bugs that were previously in the rpmlib
code but undetected, either way you look at it, one would have to uninstall yum,
fix up any broken rpm metadata (e.g. rm -f /var/lib/rpm/__db*; rpm --rebuilddb)
and only use apt for a while.
Note that yum is automatically invoked by applets, cron jobs, daemons and the
like if present, so you really need to uninstall it for the sake of testing. Can
you please do so for say a week and report back with positive or negative
OK... Here's the situation. I don't like yum and if I can avoid using it I do. And I have
been avoiding it for several years. I don't invoke the automatic scripts, because I like to
do kernel updates manually. And, on occasion I like to change runlevels in order to
ensure that I get trouble free updates (another story).
I deal with 5 fc computers. Three of them are fc6 and 2 are fc5. The fc6 computers
function as desktops while the fc5 function as servers.
FC6 computers Memory Composition
FC5 computers Memory Composition
The only computer that is giving me consistent headaches is the 256 FC5 machine. It is
utilizing several services. It provides gateway, dns, mail gateway, ftp, http, and ssh
services. So, it utilizes a considerable amount of memory, but, I have never, until
recently had a problem with apt-get. In order to fix the problems, for the last month or
so, I've had to:
rm -f /var/lib/rpm/__db*
rm -f /var/lib/apt/*.bin
rm -f /var/cache/apt/lists/*.*
Then reboot the computer and run rpm --rebuilddb
Usually worked but a real pain in the butt.
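The cleanup routine described above could be wrapped in a small helper so it is done the same way every time. This is a hedged sketch: the helper name clean_apt_rpm_caches and the optional root prefix are invented for illustration, and because this thread mentions the apt files under both /var/lib/apt and /var/cache/apt, the sketch simply checks both.

```shell
# Remove stale rpmdb environment files and apt's caches.
# $1: optional root prefix (hypothetical knob for trying this on a copy);
#     empty means the live system.
clean_apt_rpm_caches() {
    root=${1:-}
    # Stale Berkeley DB environment files of the rpm database
    rm -f "$root"/var/lib/rpm/__db*
    # apt's binary caches and downloaded package lists, in either location
    for dir in "$root/var/lib/apt" "$root/var/cache/apt"; do
        rm -f "$dir"/*.bin "$dir"/lists/*.*
    done
}

# Example (as root): clean_apt_rpm_caches && rpm --rebuilddb
```

Only the __db* files are removed from /var/lib/rpm; the actual database files are left alone.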
A couple of days ago, that didn't work either. So, I tried yum, because I need to keep
the system patched, especially with security updates since this computer constantly
faces the internet.
First invocation, yum segfaulted. After simply running
rm -f /var/lib/rpm/__db*
and running yum update again, everything worked the way it was supposed to. However, I
started getting packages from repos I did not want the packages to come from. And
this is why I prefer apt-get to yum. I love the pinning feature available in apt-get. It
saves my brain a great deal of confusion.
Anyway. On the other machines I am still able to run apt-get. When I see the segfault,
rm -f /var/cache/apt/*.bin fixes the problem.
I have never had a problem like this until recently. Which makes me suspect the
upgrading of critical libraries as the main culprit. This brings me back to my suspicion
that the main instigator is a kernel update. Why? Because, if I recall correctly, one of the
updates to the 2.6.18 kernel changed the way the kernel manages memory and, if I'm
not mistaken, a segfault is some kind of screwup in memory.
Oh... I've copied this message over to the yum bug.
OK .... I just installed apt 0.5.15lorg3.2-8 for fc5 and fc6 so far so good. On the
machine giving me the most trouble, apt-get worked on the first run. If I continue to run
trouble free, I will let you know. If of course I continue to get segfaults, then I will let you
Thank you so much for your work.
Oh... one other thing. Looks like synaptic needs to be rebuilt on fc6 platforms.
(In reply to comment #28)
> OK .... I just installed apt 0.5.15lorg3.2-8 for fc5 and fc6 so far so good.
There were only ppc related changes in this release, this will not work
better/worse in your context than the previous one. :/
Please do the testing with only rpm involved as mentioned bug #213963.
Sorry, Axel, but this is about as far as I can go with the previous version. I have
provided a traceback, requested information and all the observations that I know how
to provide. There is nothing for me left to test. The previous version under current
circumstances is useless for me.
I have to move on. Hopefully the current version fixes the problem since, no doubt it
was compiled against the most recent libraries. Like I said, I will let you know how
You indicated in this and other reports that this seems to stem from other
non-apt related parts of the system (yum and/or rpm/kernel), which suggested
that apt may never have been at fault. Therefore the isolated testing is
necessary to see whether apt was ever at fault.
I completely understand lack of time, so if you can't proceed with debugging,
let's close this as WORKSFORME, as there hasn't been anyone (including Panu and
myself) that could reproduce it in the sense of apt being responsible for
corrupting rpmdb. I'll put it into NEEDINFO for now.
No joy. I attempted a second run of apt-get on the "more trouble than it's worth" machine
and I get segfaults and the appearance of a corrupt rpm database. I guess I will have
to use yum (yech) on that computer.
Please, you, or anyone else, let me know if you have anything else to suggest that I
can do to troubleshoot or help to resolve this issue.
I already suggested a way to help in bug #213963 comment #4.
Should the rpm-stress test fail, then the bug is in rpm/kernel. If it succeeds,
then one needs to check one depsolver at a time (e.g. only apt-get installed or
only yum installed) by a similar stress test.
Also did you check your memory hardware?
What I mean to suggest, is that there may be a relationship between the new memory
management scheme in the newer kernels and the problems that I have experienced.
They appear to be significantly more severe when using apt-get.
I did not test memory, because I find it improbable that I am experiencing a hardware
difficulty when the problem appears on five separate computers. All of which
experience the same difficulty to varying degrees depending on the amount of memory
and the load on memory. The most severe is a gateway server with 256 MBytes RAM
I have already sent all the relevant information, and they are attached to this bug
report. I have yet to hear about the results from Panu.
I do not believe that the RPM database is corrupted by apt-get but it appears that way
to apt-get. The reason that I say this, is that if I experience the rare segfault with yum
then removing the lock files (__db*) as suggested in the link I provided about the
problem in yum and then run yum update everything works OK. Only, I'm getting
packages from repos that I don't want to get them from. This is the reason that I prefer
I have tried to run apt-get with only headers coming from Freshrpms (core, updates,
extras, freshrpms). Same results. If I use the repomd configuration linking to Fedora
itself, then I may as well use yum, because then all pinning goes out the window,
because, as far as I know, pinning is not supported with repomd data.
It's not that I don't have the time to continue. I'm terrified that I may actually do
irreparable damage to the RPM database if I continue to use apt-get on that machine.
I'm willing to continue to troubleshoot, but let's not go around in circles, making me
write the same detailed report over and over again. I do not have the time for that :).
Eli, it is important to remove yum, if you want to do any testing with rpm
and/or apt-get. yum is called by applets, daemons, cron jobs and who knows what
else. So you may think that you're not using it, but in fact you do. That's why
the instructions on bug #213963 comment #4 asked you to remove even both yum and
And grepping this bug report for yum shows that you've been using it to
cross-check the results/failures you have been encountering all along, so the
results you quote are a mixed use of rpm/apt/yum and we can't put a finger on
one of them. In order to see which component is at fault, rpm/kernel, apt or
yum, you need to isolate the problem.
You suspect kernel/rpm, then please follow the instructions and start with a
good rpmdb and w/o any apt/yum/etc. tools around. If you manage to break the
rpmdb, then you'll have proved that it's rpm/kernel that is the rogue character.
Otherwise you would have to add apt to it and repeat. If it breaks now, it's apt.
And if it doesn't it was yum all along.
Please explain to me how in the world will I be able to maintain the computer without
yum or apt-get?
How in the world am I to determine if the rpmdb is good or not?
I have used apt-get exclusively for several years. I have never had yum-updateonboot
and yum does not exist in any of the cron jobs. On the most troublesome computer. No
other process, that I'm aware of does an automatic update. I don't like and don't use
gnome except for a few applications. Primarily synaptic.
Which daemons call yum? Maybe I can configure them out of running automatically?
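One quick way to look for automatic yum invocations is to search the cron directories for scripts that mention it. This is just a sketch under assumptions: the helper name find_yum_callers is made up, the directory list is a guess at typical locations, and it says nothing about applets or daemons that call yum without a cron entry.

```shell
# List files under the given directories that mention yum as a whole word.
find_yum_callers() {
    grep -rlw yum "$@" 2>/dev/null
}

# Example:
#   find_yum_callers /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly
```

An empty result only rules out the directories that were searched.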
Eli, I didn't imply staying w/o yum/apt for the rest of your life :)
The rpmdb stress tests wouldn't take longer than 5 minutes each in any testing
of yours, since the bug seems to hit you so often with regular updates.
Anyway, we're not really pushing this any further, maybe Panu will have
something to say when he looks at the apt cache, or maybe the bug will vanish
once the pure yum bugs elsewhere in this bugzilla get fixed. If a yum/kernel
update makes your problem vanish, please note this in this bug.
For reference here are some bugs in rpm/yum of which this may be a duplicate:
The long and the short of this is that this is not an easy "ahhah, there's the
NULL pointer dereference" type of bug, don't expect it to be fixed "just like
that". Something is causing corruption in apt's main data structure (which is a
memory-mapped cache file) and whether it's the combination of 2.6.18 kernel + low
memory + apt-rpm's mmap() usage patterns or something else remains to be seen.
I'm reinstalling my 32bit testbox now and try to see if I can (eventually)
reproduce it by limiting available memory.
Thank you Panu. I'm looking forward to hearing your results.
*** Bug 214846 has been marked as a duplicate of this bug. ***
*** Bug 217707 has been marked as a duplicate of this bug. ***
Since this is now the central bug for tracking this issue, here are my findings
I managed to reproduce the second backtrace here. The steps:
- install fresh FC6-i386, pretty much default installation
- boot with mem=256M, disable swap
- # apt-get update
- # apt-get dist-upgrade
- the dist-upgrade died in middle of transaction after first package upgrade
- consecutive apt-get dist-upgrade runs crash in FindInCache
After rebuilding apt cache it's not segfaulting anymore (and can't reproduce
that at will, so it doesn't happen *always*) but dist-upgrade with over hundred
packages keeps exiting "normally" after just one package upgrade (rpmlib calls
exit(0) at some signal apparently, I'll need to talk to JBJ about that). After
re-enabling swap dist-upgrade appears to continue normally now.
So, it would appear that this is at least related to systems being tight on
memory. Why this has only appeared now ... is it a matter of repositories
getting bigger, kernel changes or what remains to be seen. I should have time to
look properly at this today with wife and the kid out for the evening :)
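When trying to recreate the low-memory setup from the steps above, it helps to confirm that the limits really took effect before running apt. A minimal sketch, assuming the usual /proc/meminfo layout; the helper name mem_and_swap_kb is invented for illustration.

```shell
# Print MemTotal and SwapTotal (both in kB) from a meminfo-style file.
# $1: file to read, defaults to /proc/meminfo
mem_and_swap_kb() {
    awk '/^MemTotal:/ { mem = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { print mem, swap }' "${1:-/proc/meminfo}"
}

# Example: after booting with mem=256M on the kernel command line and running
# "swapoff -a" as root, the second number printed should be 0:
#   mem_and_swap_kb
```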
Now, couple of things that *might* help, and on which I'd like to hear test
results (after clearing the various potentially corrupted caches):
1) Try adding (temporarily) more swap to the system, for example just double
what you have now. Swapfile will do just fine as it's intended for just a
2) Try setting 'RPM::PM "external";' in /etc/apt/apt.conf. That causes apt to
use external rpm process to run the transactions which has the side-effect of
essentially splitting the memory usage between two processes, making kernel's
OOM killer less trigger happy to terminate the upgrade process.
1) is the test I'm more interested in.
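For test 1), a temporary swap file can be set up with the standard tools. The sketch below only prints the commands so they can be reviewed first; the helper name swapfile_commands and the path /swapfile.tmp are assumptions, not something prescribed in this bug.

```shell
# Print the commands that would create and enable a temporary swap file.
# $1: swap file path, $2: size in MB. Nothing is executed here.
swapfile_commands() {
    printf 'dd if=/dev/zero of=%s bs=1M count=%s\n' "$1" "$2"
    printf 'chmod 600 %s\n' "$1"
    printf 'mkswap %s\n' "$1"
    printf 'swapon %s\n' "$1"
}

# Example: swapfile_commands /swapfile.tmp 512
# Review the printed lines, then run them as root.
```

Running "swapoff" on the same path and deleting the file undoes the change once testing is over.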
(duh, previous post while logged in to "wrong" account, sorry about that)
One thing I forgot to mention: try to keep an eye on how apt installs/upgrades
finish - you should always see "Done." at the end of "Commiting changes..."
output, if you don't, then it has died abnormally in middle of transaction
(because rpmlib has called it quits without giving a chance for apt to do
anything about it). That abnormal exit is at least one possible cause for this
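Checking for the final "Done." can be automated if the apt output is saved to a file. This is a rough sketch that assumes the last non-blank line of a clean run ends in "Done."; the exact output layout and the helper name apt_finished_cleanly are assumptions.

```shell
# Succeed if the last non-blank line of a captured apt-get log ends in "Done."
# $1: file holding the captured output
apt_finished_cleanly() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case $last in
        *Done.) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   apt-get dist-upgrade 2>&1 | tee /tmp/apt-run.log
#   apt_finished_cleanly /tmp/apt-run.log || echo "apt did not finish normally"
```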
Thanks Panu for what sounds like very reasonable suggestions and tests. I will be
trying these things first thing in the morning. It's been a very long week and I need
some food and sleep. I will let you know.
I just ran test 1) as per comment 44. Apt-get worked and exited normally as done. I
added the temporary swap file (512MBytes) to fstab for the time being. This was a first
One more bit of behavior that I have noticed; after cleaning out the caches and lock
files, then rebuilding the rpm database, I usually get things to work for one run. After
that run, if I immediately run apt-get update / upgrade no segfaults. Mind you, there is
nothing to upgrade. However when the next set of packages are ready to be
upgraded, then apt-get will segfault.
As per test 1). I had the opportunity last night to make the second consecutive run with
apt-get. The problem continues. So providing more swap memory didn't help. I will be
testing 2) next.
Test 2) better. I first had to
rm -f /var/lib/rpm/__db*
rm -f /var/lib/apt/lists/*.*
rm -f /var/cache/apt/*.bin
I was able to run apt-get update / upgrade without a segfault and without having to
reboot and run rpm --rebuilddb
Mind you, there were no packages to upgrade. So, the test is incomplete. I will let you
know how things go once there are new packages available.
As per test 2), The problem persists.
Ok, pretty much expected, I have been able to reproduce the problem with gobs of
memory available so it apparently wasn't related to that after all. I've gotten
a bit further in my investigations now, it IS related to rpmdb, but just exactly
how is a bit of a mystery. The crash occurs because something causes a pointer
in apt's cache to what should be a string containing rpm database path to be
NULL, but the rpm database itself seems to be intact. That's a side-effect of
*something* - what exactly I dunno yet.
The nasty thing here is that the segmentation fault happens on the run *after*
the damage has been done already, so debugging it is somewhat like post-mortem
Well well well, this also seems to be happening on Debian apt:
All of those are reasonably recent and crash occurs in the very same place (if
you ignore the used repository type) - something has corrupted (one of) the
index file names in cache. Could well be a long standing bug in apt cache
handling, only triggered by some of the changes in latest kernels. One
possibility could be some of the address-space randomization things, just a wild
Looking forward to hearing that you've found and fixed the problem. As always, I
remain available for testing purposes.
*** Bug 219134 has been marked as a duplicate of this bug. ***
Dunno, but this sounds eerily familiar:
- Linus' test program on FC6 2.6.18-1.2849.fc6 kernel behaves the way he expects
2.6.19 to work
- For 2.6.18 you need to be unlucky and under memory pressure
- "Some data on mmaped file appears zeroed" is exactly the kind of corruption
that triggers this crash
I would really appreciate if the people who are hitting this on any sort of
regularity could try downgrading their kernel to something older (for example
2.6.17-1.2187_FC5 with Linus' test doesn't exhibit zeroes in the middle) for a
while and see if you can still reproduce the crash eventually.
Meanwhile I'll have a look at apt's mmap code and see if there's anything
resembling the "trigger pattern."
Sorry Panu, downgrading the kernel is the one thing that I cannot comply with :( The
computer that's giving me the most problems is a server facing the internet and I
cannot, in good faith to the company that I'm working for, compromise kernel security
Sure, I'm not expecting anybody to mess around with a production environment to get
this sorted out. If others can test older kernels that would be much appreciated.
I am going to test this issue with an older kernel.
I am using FC6 and going to use the 2.6.17-1.2187_FC5 kernel as you have mentioned.
I hope to come up with some results within few days.
I've been reading through the long, long thread on linux-kernel mailinglist
about the mmap file corruption issue I mentioned in comment #55, and oh boy is
it hairy. Nobody really knows whether it's really just an application bug, only
triggered by recent kernels, a kernel bug triggered by some rare application
usage patterns or combination of both.
The short summary however is that there are basically two applications people
have seen corruption with: (Debian) apt and a bittorrent client. The mmap code
is identical in Debian apt and apt-rpm (it's been unchanged for years AFAIK), so
that kind of confirms that this has indeed been triggered by something in recent
kernels like Eli suspected early on.
Created attachment 144319
Band-aid patch for the cache corruption
While looking for the real cause and solution, here's a band-aid patch to help
the situation. The patch does NOT fix the real issue, it only detects a
specific symptom and forces a cache rebuild when corruption is detected and
issues warnings. That effectively cures the segfault unless you're very
The corruption seems to always happen in the area regarding rpmdb itself, which
is special in the sense that it's always the last one of all "repositories" to
be processed. That's another hint towards some of the findings/speculations on
BTW if somebody can capture a full strace of an 'apt-get update' run where the
segfault *initially* happens (the crashes afterwards aren't that interesting)
that might have some interesting data in it.
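Since the interesting run cannot be predicted, one option is to trace every 'apt-get update' and keep the logs. A minimal sketch, assuming strace is installed; the helper name trace_apt_update and the STRACE override variable are made up for illustration.

```shell
# Run 'apt-get update' under strace, following child processes (-f) and
# timestamping each call (-tt); the trace is written to the file given as $1.
# STRACE is a hypothetical override knob; it defaults to strace.
trace_apt_update() {
    "${STRACE:-strace}" -f -tt -o "$1" apt-get update
}

# Example (as root), one log per run:
#   trace_apt_update "/root/apt-update-$(date +%Y%m%d-%H%M%S).strace"
```

The trace worth attaching is the one from the run just before the first crash or warning, so older logs should be kept until then.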
Do you want this to be added to the package?
Perhaps the mmap issue is also present in rpm itself? That could explain the
yum/rpm bug in fc6.
Might as well add it to the package, besides avoiding crashes in rpm-related
code (which is always a bit nasty) it should help collect people seeing the
problem to this bug :)
Berkeley DB does use mmap so I suppose it's at least possible the same thing
affects rpm itself as well.
Today I used apt-get and synaptic on my kernel "2.6.18-1.2798.fc6" and
astonishingly they did not crash.
Looks like the extras and updates repo files of Fedora are good (ok !!!).
Previously, apt-get update crashed while working on extras / updates repo.
So, now I am going to wait for new crash happening on "2.6.18-1.2798.fc6" and
then will test with the "2.6.17-1.2187_FC6". That kernel is ready now.
So far so good. The bandaid patch allowed me to get at least one run with apt-get. I will
let you know if things don't work out.
Do note that with the bandaid patch, you'll get loud warnings if the corruption
triggers, it just doesn't (or shouldn't ;) crash anymore because of it. So
people who used to see the crash should see "Cache corruption detected, band-aid
applied" now just as often as they did see the crashes.
Yet another thing people can try: some folks on lkml reported that mounting the
filesystem in question (in apt's case wherever /var is located) with
data=writeback option (assuming ext3 filesystem is used) seems to cure the
corruption issue. If people can try that and see if they still get crashes (or
with the bandaid patch, warnings about corruption) or not, that'd be a useful
datapoint as well. Check 'man mount' for what the option does in detail and, if
on a production environment, whether the implications matter to you or not.
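Before and after changing the mount option, it may be worth checking what the filesystem is actually mounted with. This is only a sketch: the helper names mount_options and has_writeback are invented, and it assumes the kernel lists the data= mode in /proc/mounts, which may not hold for every version.

```shell
# Print the option string of a mount point from a mounts table.
# $1: mount point, $2: table to read (defaults to /proc/mounts)
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 }
                    END { if (opts != "") print opts }' "${2:-/proc/mounts}"
}

# Succeed if that mount point lists data=writeback among its options.
has_writeback() {
    mount_options "$@" | tr ',' '\n' | grep -qx 'data=writeback'
}

# Example (use / instead if /var is not a separate filesystem):
#   has_writeback /var && echo "/var is mounted with data=writeback"
```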
Oh and remember, a single successful run (meaning no crashes and no warnings)
doesn't mean anything at all, this doesn't trigger anywhere near 100% reliably
so it's going to take quite a bit of time to be convinced it (be it the mount
option or whatever) made a real difference.
Ok, the band-aid patch helped to get apt-rpm running on my AMD Duron test box but
maybe this bug is related to a VM kernel bug (i don't have the knowledge to
Yup, I'm fairly convinced by now it's that kernel VM bug that's been hitting
apt(-rpm). Now we just need to verify the above kernel patch cures the crashes
(or with the bandaid patch to apt-rpm, the warnings).
If somebody can test it, that'd be great :)
Just FYI, there's a kernel update coming out "next week or so" including a fix
for the mmap file corruption issue we have here, so this should get resolved
I haven't been able to reproduce this since updating to the latest kernel
(2.6.19-1.2895.fc6). Dunno if that's available for FC5 though.
Mind you, it's possible you'll see the warning *once* after rebooting to the
updated kernel: if the previous run on old kernel has corrupted the cache it'll
hit you the next time you run apt, fixed kernel or no. I'd say it's best to
force the cache rebuild ('rm -f /var/cache/apt/*.bin') after booting to the new
kernel just in case.
From my POV I consider this case closed. Axel, I suggest you leave the bandaid
patch in place for FC5 and 6 as there could be lots and lots of people running
those with older kernels, for rawhide it can go at this point I think.
I can pretty much confirm that the kernel fixed the problem on my fc6 machines.
However, as Panu pointed out, there is yet to be a kernel update for fc5. And one in
particular is giving me no end of headaches.
Anyone have any idea when there will be a kernel update for fc5 incorporating the fix?
Dave, any idea when FC5 will get an updated kernel fixing the mmap corruption
thingy (which this bug is all about)?
> From my POV I consider this case closed. Axel, I suggest you leave the bandaid
> patch in place for FC5 and 6 as there could be lots and lots of people running
> those with older kernels, for rawhide it can go at this point I think.
I'll keep the patch, wouldn't it even be nice to keep it upstream? It's a
failsafe path that is usually not taken unless something screws up and such a
net is nice :)
Now, how do I close this bug? It wasn't FIXED in apt, but it's also not NOTABUG.
Since it was fixed elsewhere, it's also not WONTFIX or CANTFIX. Technically
I'd move it to the kernel and close it there, but I don't want the kernel guys
to be confused.
I'll try fixed in CURRENTRELEASE, since there is the bandaid fix for FC5, too.
Axel. The "not a bug" shouldn't be closed quite yet. Maybe it should be transferred over
to the kernel boys since we still do not have a fix for fc5. Which is why I opened the
bug in the first place.
Doesn't the bandaid patch fix any issues with FC5? Agreed, it is not fixing the
cause, but it is a workaround fixing the outcome, e.g. the bug is dealt with.
As Panu wrote in an earlier post, it depends on how memory stressed the system is.
And I can confirm that this indeed is the case. One of my fc5 boxes sometimes requires
rm -f /var/cache/apt/*.bin
rm -f /var/lib/apt/lists/*.*
rm -f /var/lib/apt/lists/lock
rm -f /var/lib/rpm/__db*
before apt-get will complete its cycle successfully. The thing about the band aid, is
eventually, apt-get will work on the box giving me my biggest headache without me
having to reboot the system (most of the time).
In other words, you still get the bug on your FC5 system even though the bandaid
is supposed to workaround/fix that on the fly? Perhaps the bandaid patch does
not always detect the corruption. Panu, can really something slip past the
If you have installed the latest apt (that contains the bandaid) and the system
still gets chewed (which comment #75 suggests) please reopen the bug.
Whether the bandaid patch reliably detects and corrects the problem is
irrelevant (it's called bandaid for a reason :) There's a real fix to the
problem, getting an updated kernel to the users is the only thing that matters
anymore. That's what I meant with the "from my POV this case is closed" comment,
no amount of bandaid in apt is going to make it reliable if the kernel can't be
trusted to keep our data intact.
Eli, either the bandaid simply isn't working for you or there's a
misunderstanding here: you only need to do the rm -f stuff if you get segfaults
(which means the bandaid didn't help), otherwise the warning is just that: a
warning about this issue being present on the system.
Hi guys, I made a request at
to find out when a new kernel release will be available for fc5.
*** This bug has been marked as a duplicate of 214495 ***
OK... I just noticed that there are 2.6.19 kernels in fc5 update testing. I just installed it
in one of my fc5 boxes and it booted OK. I will be seeing how it goes for a couple of
days before I try to get it installed in the pain in the ass box unless of course new
kernels are made generally available.
For those of you wanting to install it, remember that the kernel is a "testing" kernel, so
rpm -ivh is in order just in case you need to fall back to the older kernel.
Even better, installed new, generally available 2.6.19 kernel for fc5. Things seem to be
working. I'd give it a couple of more real updates, and if there are no more problems
then I think that we can call this genuinely done.
(In reply to comment #36)
> Eli, it is important to remove yum, if you want to do any testing with rpm
> and/or apt-get. yum is called by applets, daemons, cron jobs and who knows what
> else. So you may think that you're not using it, but in fact you do. That's why
> the instructions on bug #213963 comment #4 asked you to remove even both yum and
If I try to remove yum using synaptic, it says that the following are dependent
on yum and need to be removed also:
Since I want to keep synaptic, how should I go about this? Remove synaptic with
apt CLI, then add it back later.
I'm switching to synaptic because yum has never worked properly in FC5, and I
have had good experiences with synaptic in the past.