Bug 475633

Summary: yum silently bails out trying to update kernel
Product: [Fedora] Fedora Reporter: Valdis Kletnieks <valdis.kletnieks>
Component: yumAssignee: Seth Vidal <skvidal>
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: ffesti, james.antill, katzj, michal, pmatilai, tim.lauridsen, yaneti, zaitcev
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-12-16 16:11:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
strace -r -f yum -y update kernel none

Description Valdis Kletnieks 2008-12-09 21:07:47 UTC
Created attachment 326405 [details]
strace -r -f yum -y update kernel

Description of problem:
While trying to upgrade kernel 2.6.28-0.121.rc7.git5.fc11, I found that yum would simply exit after running the transaction test, with no indication of what the problem was.

# rpm -qa kernel
kernel-2.6.28-0.114.rc7.git5.fc11.x86_64
kernel-2.6.28-0.110.rc7.git3.fc11.x86_64
kernel-2.6.28-0.113.rc7.git5.fc11.x86_64
#  yum -y update kernel
Loaded plugins: refresh-packagekit, refresh-updatesd
Excluding Packages in global exclude list
Finished
Setting up Update Process
Resolving Dependencies
There are unfinished transactions remaining. You might consider running yum-complete-transaction first to finish them.
--> Running transaction check
---> Package kernel.x86_64 0:2.6.28-0.121.rc7.git5.fc11 set to be installed
--> Finished Dependency Resolution
--> Running transaction check
---> Package kernel.x86_64 0:2.6.28-0.110.rc7.git3.fc11 set to be erased
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package     Arch        Version                           Repository      Size
================================================================================
Installing:
 kernel      x86_64      2.6.28-0.121.rc7.git5.fc11        rawhide         22 M
Removing:
 kernel      x86_64      2.6.28-0.110.rc7.git3.fc11        installed      173 M

Transaction Summary
================================================================================
Install      1 Package(s)         
Update       0 Package(s)         
Remove       1 Package(s)         

Total size: 22 M
Downloading Packages:
============================== Entering rpm code ===============================
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test



#

And that's it.  It burps out several blank lines and quits. (Note that I had already upgraded kernel-firmware and kernel-headers separately - those upgraded just fine, it's just the 'kernel' RPM exhibiting this).

I've attached the tail end of an 'strace -r -f yum update -y kernel', and from that it's pretty obvious that after it prints 'Finished transaction test', it's ready to pack it in and go home - about all that happens after that are closing files.  Oddest thing - what's writing those 3 newlines out, and why?

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 seth vidal 2008-12-09 21:28:27 UTC
So:
1. check 'dmesg' or your logs and see if the process is being oom-killed
2. how much ram do you have on this machine.
3. does it happen for anything else or just kernels?

Comment 2 Valdis Kletnieks 2008-12-09 22:35:48 UTC
1) Absolutely nothing in dmesg and /var/log/messages. /var/log/yum just has lines listing the RPMs that successfully updated, nothing resembling an error message. Plus, if a child process had OOM'ed or taken a signal, we'd have seen a SIGCHLD in the strace.

2) 2G of RAM, another 2G of swap, 0% used. /proc/meminfo and related show no memory pressure.

3) I've updated 31 other RPMs from Rawhide today, including kernel-headers and kernel-firmware.  It's only the kernel RPM that causes issue.  I wonder if it's the fact that the kernel is 'installonly' causing a problem.

Comment 3 James Antill 2008-12-10 00:04:31 UTC
The code looks like:

        self.verbose_logger.log(yum.logginglevels.INFO_2,
            _('Finished Transaction Test'))
        if len(tserrors) > 0:
            errstring = _('Transaction Check Error:\n')
            for descr in tserrors:
                errstring += '  %s\n' % to_unicode(descr)
            
            raise yum.Errors.YumBaseError, errstring + '\n' + \
                 self.errorSummary(errstring)
        self.verbose_logger.log(yum.logginglevels.INFO_2,
             _('Transaction Test Succeeded'))

...so I'll put money on the fact that "transaction check error" is being raised, but something is going wrong and the output turns into '\n\n\n' (and I'd put money on a unicode thing upsetting the logging there).
 So your best bet to get to the bottom of this is to add print statements to find out what the problem is (just before the raise, line 440 in cli.py).

Comment 4 James Antill 2008-12-10 00:06:03 UTC
 Yeh, this is where the raise ends up:

    def exFatal(e):
        logger.critical('\n\n%s', to_unicode(e))

...and the logger puts a \n on the end automatically ... and if it doesn't like the unicode => str conversion it just replaces the %s to '' ... which gives you \n\n\n.

Comment 5 Yanko Kaneti 2008-12-10 08:05:13 UTC
I would guess your /boot is getting low on space.

Comment 6 Valdis Kletnieks 2008-12-11 20:27:22 UTC
Actually, it wasn't /boot, it looks like / is a tad tight:

#  df /
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-root
                       1566576   1358472    129464  92% /

Which looked like plenty of space because I "knew" that a kernel would use about 90M in /lib/modules, leaving 40M to spare.  However....

# du -s /lib/modules/2.6.2*.x86_64
85948   /lib/modules/2.6.27.5-113.fc10.x86_64
85972   /lib/modules/2.6.27.5-117.fc10.x86_64
195384  /lib/modules/2.6.28-0.110.rc7.git3.fc11.x86_64
195384  /lib/modules/2.6.28-0.113.rc7.git5.fc11.x86_64
195788  /lib/modules/2.6.28-0.114.rc7.git5.fc11.x86_64

Those kernel modules got awfully piggy.  Compiled with -g?

# size 2.6.27.5-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko 
   text    data     bss     dec     hex filename
  19269   10752     736   30757    7825 2.6.27.5-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
# size 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko 
   text    data     bss     dec     hex filename
  22741   68784     864   92389   168e5 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko

So about the same binary image size...

# ls -l 2.6.27.5-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
-rwxr--r-- 1 root root 67552 2008-11-17 04:22 2.6.27.5-113.fc10.x86_64/kernel/arch/x86/oprofile/oprofile.ko
# ls -l 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko 
-rwxr--r-- 1 root root 134216 2008-12-07 22:49 2.6.28-0.114.rc7.git5.fc11.x86_64/kernel/arch/x86/oprofile/oprofile.ko

But the .ko's have doubled in size.

Yes, this looks like a kernel RPM packaging issue - but yum should be able to output the message that it needs more room on '/'.  It manages to do so for other filesystems.  Is the fact it's '/' with no chars before-after causing a Python indigestion?

Comment 7 seth vidal 2008-12-11 20:35:57 UTC
1. What lang/locale are you running in?
2. The 3 blank lines ARE that warning message.

Comment 8 Valdis Kletnieks 2008-12-11 20:46:39 UTC
# echo $LANG
en_US.UTF-8

And yes, I realize the blank lines are what little of the message made it out. Just commenting that the *rest* of the text would have been nice to have. ;)

Comment 9 seth vidal 2008-12-15 18:51:45 UTC
what ver of yum were you using when this happened? I'm trying to replicate it and I just cannot.

Comment 10 Valdis Kletnieks 2008-12-15 19:13:55 UTC
Wow, I thought I included it - yum-3.2.20-7.fc11.noarch

Comment 11 seth vidal 2008-12-15 19:18:16 UTC
Is this python 2.6?

Comment 12 Valdis Kletnieks 2008-12-15 19:43:12 UTC
python-2.6-2.fc11.x86_64

I just replicated it - 'df' reports 120M free on /.  I was able to successfully install one 'kernel' update the other day after 'rpm -e' an older kernel so there was enough free space on /.  So it's definitely root-caused by not enough space for the /lib/modules/<kernel> directory, but unclear why the error message gets bollixed up.  (Incidentally, I was able to verify that out-of-space on some *other* filesystem (/usr/share in this case) properly reports the error, so it's something having to do with '/' in particular.

Is some code someplace treating the filesystem name as '/$something' and going south when $something is ""?  That's my first guess...

Comment 13 seth vidal 2008-12-16 06:33:40 UTC
Okay in python 2.6 only, so far the following is happening:

We're passing an exception instance to yum.misc.to_unicode() which should just return the object b/c the exception instance is not also an instance of basestring.

Then we're logger.critical('\n\n%s', to_unicode(e)). Which is not, apparently able to do the string assertion any longer.

I tested this with a simple test and it looks like the logging module logger.critical() which I eventually traced back to

logging/__init__.py LogRecord.getMessage() method.

which does:
 msg % self.args

which, apparently on python 2.6 no longer does the string assertion for exception objects.

I scanned through the python 2.6 whats new pages looking for something related to this and I couldn't quite find anything.

I can make a change to yum's to_unicode() which sorta 'handles' this. But I ultimately think it's an issue in logging or maybe in python's str assertions itself.

Welcoming input from any of the other python folks about this.

Comment 14 seth vidal 2008-12-16 16:11:31 UTC
Okay, Ignacio found the specific problem

bug report to python: http://bugs.python.org/issue1551432


patch to solve this in yum:
http://yum.baseurl.org/gitweb?p=yum.git;a=commitdiff;h=5c4577185666afe7928073924f8c1ebd71de07ac

it'll be out in the next yum release (3.2.21)

Comment 15 Michal Jaegermann 2008-12-17 21:32:51 UTC
If somebody needs it I have a repeatable case with tons of a disk space free and not a kernel. Packaging for gawk is broken in 3.1.6-3.fc11 (see bug 476240).
'yum -y -d 10  localupdate gawk-3.1.6-3.fc11.x86_64.rpm' prints the usual stuff,
nothing really relevant to the problem on hand, and after that:

Total download size: 2.6 M
Downloading Packages:



and this is it (including all those newlines).

Comment 16 seth vidal 2008-12-17 21:42:28 UTC
Michal,
right, it is probably a transaction failure due to fileconflicts or something like that.

It doesn't really matter, the issue is not related to the transaction at all and I'm pretty sure the patch I posted in comment 14 solves it.

have you tried it?

Comment 17 Michal Jaegermann 2008-12-17 22:22:36 UTC
> it is probably a transaction failure due to fileconflicts or something
> like that.

Actually in the case of bug 476240 it is a broken attempt to create a broken cross-device hard link.

> and I'm pretty sure the patch I posted in comment 14 solves it.
> have you tried it?

One apparently needs git to retrieve it; it does not matter as it is small and
I applied it by hand.  That indeed printed "Package gawk-3.1.6-3.fc11.x86_64.rpm is not signed" which I forgotten about.

Only with 'yum -y -d 10  --nogpgcheck localupdate gawk-3.1.6-3.fc11.x86_64.rpm' I see now:
....
Running Transaction
  Updating       : gawk                                                     1/2 
Error unpacking rpm package gawk-3.1.6-3.fc11.x86_64
error: unpacking of archive failed on file /bin/pgawk-3.1.6a;49497911: cpio: link
Warning: scriptlet or other non-fatal errors occurred during transaction.
Running "posttrans" handler for "refresh-updatesd" plugin
yum-updatesd not on the bus
Running "posttrans" handler for "refresh-packagekit" plugin
Transaction time: 1.641
=============================== Leaving rpm code ===============================

Updated:
  gawk.x86_64 0:3.1.6-3.fc11                                                    

Complete!

Errors are actually pretty fatal and I am left with gawk-3.1.6-2.fc11.x86_64. I am not sure if this is much of an improvement. :-)

Comment 18 seth vidal 2008-12-18 02:42:33 UTC
Since the goal was to output the error report and not have it blankly disappear I think we achieved our goal.

If you have another bug please file it separately.

Comment 19 Michal Jaegermann 2008-12-18 04:48:35 UTC
> If you have another bug please file it separately.

Ah, but that another kind of bugs was already reported in the past (see, for example, bug 216221; although this one was reassigned to rpm) and closed.
In general the issue is that yum reports something done while it was not.

Comment 20 seth vidal 2008-12-24 20:38:33 UTC
*** Bug 477884 has been marked as a duplicate of this bug. ***

Comment 21 Fedora Update System 2009-01-12 19:20:34 UTC
yum-3.2.21-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/yum-3.2.21-2.fc10

Comment 22 Fedora Update System 2009-01-12 19:27:56 UTC
yum-3.2.21-2.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/yum-3.2.21-2.fc9

Comment 23 Fedora Update System 2009-02-05 02:18:47 UTC
yum-3.2.21-2.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2009-02-05 02:25:34 UTC
yum-3.2.21-2.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.