Bug 603026 - CPU save version is now 9, but the format is _very_ different from non-RHEL5 version 9
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2 603027 603142
Reported: 2010-06-11 10:35 UTC by Paolo Bonzini
Modified: 2011-01-13 23:36 UTC

Fixed In Version: kvm-83-198.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 603027 (view as bug list)
Environment:
Last Closed: 2011-01-13 23:36:01 UTC
Target Upstream Version:
Embargoed:


Attachments
qemu patch (1.21 KB, patch)
2010-06-11 11:48 UTC, Paolo Bonzini
qemu patch v2 (4.95 KB, patch)
2010-07-26 18:10 UTC, Paolo Bonzini


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0028 0 normal SHIPPED_LIVE Low: kvm security and bug fix update 2011-01-13 11:03:39 UTC

Description Paolo Bonzini 2010-06-11 10:35:54 UTC
Commit a82c8e4d836121cec49ccd9031438a3110f2e192 bumped the CPU version to 9, however the format is very different from the version 9 of upstream QEMU.  This causes problems in crash, which uses QEMU's savefiles as kvm core dumps.

Until now, the differences caused no problems, but for version 9 upstream does this:

                int32_t pending_irq = (int32_t) get_be32 (fp);
                if (pending_irq >= 0)
                        dx86->kvm.int_bitmap[pending_irq / 64] |=

instead of this:

                for (i = 0; i < 4; i++)
                        dx86->kvm.int_bitmap[i] = get_be64 (fp);

(Source code from qemu-load.c in git://git.engineering.redhat.com/users/pbonzini/qemu-reader.git).  In other words, the first 32 bits of the bitmap are treated as an index, causing an out-of-bounds access.
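To make the mismatch concrete, here is a minimal, self-contained sketch of the two readers operating on a byte buffer. It is not the qemu-reader code: the helper names (load_be32, load_be64, read_pending_irq, read_int_bitmap) are hypothetical, the bit-setting expression is an assumption about how the truncated upstream line continues, and the range check is only one possible way to reject a bad index.

```c
#include <stddef.h>
#include <stdint.h>

#define INT_BITMAP_WORDS 4          /* 4 x 64 bits = 256 interrupt vectors */

/* Hypothetical helpers: decode big-endian values from a byte buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/*
 * Upstream-style field: a single signed 32-bit pending IRQ number.
 * Returns 0 on success and -1 if the value cannot index the bitmap.
 * Setting one bit per IRQ is an assumption; the quoted line is truncated.
 */
static int read_pending_irq(const unsigned char *p,
                            uint64_t bitmap[INT_BITMAP_WORDS])
{
    int32_t pending_irq = (int32_t)load_be32(p);

    if (pending_irq < 0)
        return 0;                   /* negative means nothing pending */
    if (pending_irq > 255)
        return -1;                  /* would index past the bitmap */
    bitmap[pending_irq / 64] |= (uint64_t)1 << (pending_irq % 64);
    return 0;
}

/* RHEL5-style field: the whole 256-bit bitmap as four 64-bit words. */
static void read_int_bitmap(const unsigned char *p,
                            uint64_t bitmap[INT_BITMAP_WORDS])
{
    for (size_t i = 0; i < INT_BITMAP_WORDS; i++)
        bitmap[i] = load_be64(p + 8 * i);
}
```

For example, a RHEL5 dump with only vector 48 pending stores the bytes 00 01 00 00 ... at the start of the bitmap. Read as an upstream field, those four bytes decode to the index 65536, which is far outside a 256-entry bitmap.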

Of course, adding a "<= 255" check is easy, but it's only a matter of time until RHEL5's version hits 12, and then we'll have serious problems handling both RHEL5 and RHEL6 dumps.

I suggest adding a fake __rhel5 section in the dumps for 5.5.z and 5.6, so that we can look for that in crash.  I'll attach the patch soon.

Comment 1 Paolo Bonzini 2010-06-11 11:48:04 UTC
Created attachment 423249 [details]
qemu patch

Comment 2 Lawrence Lim 2010-07-13 08:40:28 UTC
Hi Paolo,
Could you please suggest how we could verify this patch effectively?

Thanks.

Comment 3 Paolo Bonzini 2010-07-25 22:32:51 UTC
You can try grepping a dump for the string __rhel5.  If you do the dump early enough, possibly while grub is running, the chance of a false positive is ~zero (and it is pretty unlikely even if the system has already finished booting).
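The same check can be done programmatically. The following is a rough sketch of a byte-wise search for the marker in a buffer that may contain NUL bytes; the function name buffer_contains is hypothetical, and a real tool would have to read the dump in chunks and handle a marker that straddles a chunk boundary.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the bytes of `marker` occur anywhere in buf[0..len),
 * 0 otherwise. Works on binary data because it compares with memcmp
 * instead of relying on NUL-terminated string functions.
 */
static int buffer_contains(const unsigned char *buf, size_t len,
                           const char *marker)
{
    size_t mlen = strlen(marker);

    if (mlen == 0 || mlen > len)
        return 0;
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return 1;
    }
    return 0;
}
```

Calling buffer_contains(data, size, "__rhel5") on the contents of a dump corresponds to the grep described above.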

Comment 4 Paolo Bonzini 2010-07-26 18:10:07 UTC
Created attachment 434483 [details]
qemu patch v2

Unlike the previous one, this patch doesn't break backwards migration.

Comment 8 Cao, Chen 2010-11-15 02:54:33 UTC
Verified on:

# rpm -q kvm
kvm-83-207.el5

# uname -r
2.6.18-231.el5

# grep __rhel5 /var/crash/2010-11-15-10:36/vmcore 
Binary file vmcore matches


host dmesg:
# dmesg |grep crashkernel
Command line: ro root=LABEL=/ crashkernel=128M@16M
Kernel command line: ro root=LABEL=/ crashkernel=128M@16M

and /proc/iomem
# grep -i crash /proc/iomem 
  01000000-08ffffff : Crash kernel
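As a sanity check, the /proc/iomem range is consistent with the crashkernel=128M@16M parameter (size 128 MiB at offset 16 MiB). A small sketch of the arithmetic, with a hypothetical helper crash_region:

```c
#include <stdint.h>

#define MIB ((uint64_t)1 << 20)

struct mem_range {
    uint64_t start;     /* first byte of the region */
    uint64_t end;       /* last byte of the region (inclusive) */
};

/* Convert a crashkernel=<size>M@<offset>M pair into an address range. */
static struct mem_range crash_region(uint64_t size_mib, uint64_t offset_mib)
{
    struct mem_range r;

    r.start = offset_mib * MIB;
    r.end   = r.start + size_mib * MIB - 1;
    return r;
}
```

crash_region(128, 16) gives start 0x01000000 and end 0x08ffffff, which matches the "Crash kernel" line.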

guest launching cmd:
/usr/libexec/qemu-kvm -name 'vm1' -monitor stdio -drive file='/home/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=virtio,media=disk,cache=none,boot=on,format=qcow2 -net nic,vlan=0,model=virtio,macaddr='9a:30:70:9c:34:b4' -net tap,vlan=0,ifname='virtio_xxx_5900',script='/home/qemu-ifup-switch',downscript='no' -m 4096 -smp 2 -soundhw ac97 -vnc :0  -rtc-td-hack -M rhel5.6.0 -usbdevice tablet

Comment 11 errata-xmlrpc 2011-01-13 23:36:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html

