Created attachment 326405 [details]
strace -r -f yum -y update kernel
Description of problem:
While trying to upgrade kernel 2.6.28-0.121.rc7.git5.fc11, I found that yum would simply exit after running the transaction test, with no indication of what the problem was.
# rpm -qa kernel
# yum -y update kernel
Loaded plugins: refresh-packagekit, refresh-updatesd
Excluding Packages in global exclude list
Setting up Update Process
There are unfinished transactions remaining. You might consider running yum-complete-transaction first to finish them.
--> Running transaction check
---> Package kernel.x86_64 0:2.6.28-0.121.rc7.git5.fc11 set to be installed
--> Finished Dependency Resolution
--> Running transaction check
---> Package kernel.x86_64 0:2.6.28-0.110.rc7.git3.fc11 set to be erased
--> Finished Dependency Resolution
Package Arch Version Repository Size
kernel x86_64 2.6.28-0.121.rc7.git5.fc11 rawhide 22 M
kernel x86_64 2.6.28-0.110.rc7.git3.fc11 installed 173 M
Install 1 Package(s)
Update 0 Package(s)
Remove 1 Package(s)
Total size: 22 M
============================== Entering rpm code ===============================
Running Transaction Test
Finished Transaction Test
And that's it. It burps out several blank lines and quits. (Note that I had already upgraded kernel-firmware and kernel-headers separately - those upgraded just fine, it's just the 'kernel' RPM exhibiting this).
I've attached the tail end of an 'strace -r -f yum update -y kernel', and from that it's pretty obvious that after it prints 'Finished transaction test', it's ready to pack it in and go home - about all that happens after that are closing files. Oddest thing - what's writing those 3 newlines out, and why?
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. check 'dmesg' or your logs and see if the process is being oom-killed
2. how much ram do you have on this machine.
3. does it happen for anything else or just kernels?
1) Absolutely nothing in dmesg and /var/log/messages. /var/log/yum just has lines listing the RPMs that successfully updated, nothing resembling an error message. Plus, if a child process had OOM'ed or taken a signal, we'd have seen a SIGCHLD in the strace.
2) 2G of RAM, another 2G of swap, 0% used. /proc/meminfo and related show no memory pressure.
3) I've updated 31 other RPMs from Rawhide today, including kernel-headers and kernel-firmware. It's only the kernel RPM that causes issue. I wonder if it's the fact that the kernel is 'installonly' causing a problem.
The code looks like:
_('Finished Transaction Test'))
if len(tserrors) > 0:
errstring = _('Transaction Check Error:\n')
for descr in tserrors:
errstring += ' %s\n' % to_unicode(descr)
raise yum.Errors.YumBaseError, errstring + '\n' + \
_('Transaction Test Succeeded'))
...so I'll put money on the fact that "transaction check error" is being raised, but something is going wrong and the output turns into '\n\n\n' (and I'd put money on a unicode thing upsetting the logging there).
So your best bet to get to the bottom of this is to add print statements to find out what the problem is (just before the raise, line 440 in cli.py).
Yeh, this is where the raise ends up:
...and the logger puts a \n on the end automatically ... and if it doesn't like the unicode => str conversion it just replaces the %s to '' ... which gives you \n\n\n.
I would guess your /boot is getting low on space.
Actually, it wasn't /boot, it looks like / is a tad tight:
# df /
Filesystem 1K-blocks Used Available Use% Mounted on
1566576 1358472 129464 92% /
Which looked like plenty of space because I "knew" that a kernel would use about 90M in /lib/modules, leaving 40M to spare. However....
# du -s /lib/modules/2.6.2*.x86_64
Those kernel modules got awfully piggy. Compiled with -g?
# size 184.108.40.206-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
text data bss dec hex filename
19269 10752 736 30757 7825 220.127.116.11-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
# size 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko
text data bss dec hex filename
22741 68784 864 92389 168e5 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko
So about the same binary image size...
# ls -l 18.104.22.168-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
-rwxr--r-- 1 root root 67552 2008-11-17 04:22 22.214.171.124-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
# ls -l 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko
-rwxr--r-- 1 root root 134216 2008-12-07 22:49 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko
But the .ko's have doubled in size.
Yes, this looks like a kernel RPM packaging issue - but yum should be able to output the message that it needs more room on '/'. It manages to do so for other filesystems. Is the fact it's '/' with no chars before-after causing a Python indigestion?
1. What lang/locale are you running in?
2. The 3 blank lines ARE that warning message.
# echo $LANG
And yes, I realize the blank lines are what little of the message made it out. Just commenting that the *rest* of the text would have been nice to have. ;)
what ver of yum were you using when this happened? I'm trying to replicate it and I just cannot.
Wow, I thought I included it - yum-3.2.20-7.fc11.noarch
Is this python 2.6?
I just replicated it - 'df' reports 120M free on /. I was able to successfully install one 'kernel' update the other day after 'rpm -e' an older kernel so there was enough free space on /. So it's definitely root-caused by not enough space for the /lib/modules/<kernel> directory, but unclear why the error message gets bollixed up. (Incidentally, I was able to verify that out-of-space on some *other* filesystem (/usr/share in this case) properly reports the error, so it's something having to do with '/' in particular.
Is some code someplace treating the filesystem name as '/$something' and going south when $something is ""? That's my first guess...
Okay in python 2.6 only, so far the following is happening:
We're passing an exception instance to yum.misc.to_unicode() which should just return the object b/c the exception instance is not also an instance of basestring.
Then we're logger.critical('\n\n%s', to_unicode(e)). Which is not, apparently able to do the string assertion any longer.
I tested this with a simple test and it looks like the logging module logger.critical() which I eventually traced back to
logging/__init__.py LogRecord.getMessage() method.
msg % self.args
which, apparently on python 2.6 no longer does the string assertion for exception objects.
I scanned through the python 2.6 whats new pages looking for something related to this and I couldn't quite find anything.
I can make a change to yum's to_unicode() which sorta 'handles' this. But I ultimately think it's an issue in logging or maybe in python's str assertions itself.
Welcoming input from any of the other python folks about this.
Okay, Ignacio found the specific problem
bug report to python: http://bugs.python.org/issue1551432
patch to solve this in yum:
it'll be out in the next yum release (3.2.21)
If somebody needs it I have a repeatable case with tons of a disk space free and not a kernel. Packaging for gawk is broken in 3.1.6-3.fc11 (see bug 476240).
'yum -y -d 10 localupdate gawk-3.1.6-3.fc11.x86_64.rpm' prints the usual stuff,
nothing really relevant to the problem on hand, and after that:
Total download size: 2.6 M
and this is it (including all those newlines).
right, it is probably a transaction failure due to fileconflicts or something like that.
It doesn't really matter, the issue is not related to the transaction at all and I'm pretty sure the patch I posted in comment 14 solves it.
have you tried it?
> it is probably a transaction failure due to fileconflicts or something
> like that.
Actually in the case of bug 476240 it is a broken attempt to create a broken cross-device hard link.
> and I'm pretty sure the patch I posted in comment 14 solves it.
> have you tried it?
One apparently needs git to retrieve it; it does not matter as it is small and
I applied it by hand. That indeed printed "Package gawk-3.1.6-3.fc11.x86_64.rpm is not signed" which I forgotten about.
Only with 'yum -y -d 10 --nogpgcheck localupdate gawk-3.1.6-3.fc11.x86_64.rpm' I see now:
Updating : gawk 1/2
Error unpacking rpm package gawk-3.1.6-3.fc11.x86_64
error: unpacking of archive failed on file /bin/pgawk-3.1.6a;49497911: cpio: link
Warning: scriptlet or other non-fatal errors occurred during transaction.
Running "posttrans" handler for "refresh-updatesd" plugin
yum-updatesd not on the bus
Running "posttrans" handler for "refresh-packagekit" plugin
Transaction time: 1.641
=============================== Leaving rpm code ===============================
Errors are actually pretty fatal and I am left with gawk-3.1.6-2.fc11.x86_64. I am not sure if this is much of an improvement. :-)
Since the goal was to output the error report and not have it blankly disappear I think we achieved our goal.
If you have another bug please file it separately.
> If you have another bug please file it separately.
Ah, but that another kind of bugs was already reported in the past (see, for example, bug 216221; although this one was reassigned to rpm) and closed.
In general the issue is that yum reports something done while it was not.
*** Bug 477884 has been marked as a duplicate of this bug. ***
yum-3.2.21-2.fc10 has been submitted as an update for Fedora 10.
yum-3.2.21-2.fc9 has been submitted as an update for Fedora 9.
yum-3.2.21-2.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.
yum-3.2.21-2.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.