Bug 591176

Summary: migration fails since virtio-serial-bus is using uninitialized memory
Product: Red Hat Enterprise Linux 6 Reporter: Alon Levy <alevy>
Component: qemu-kvmAssignee: Alon Levy <alevy>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: low    
Version: 6.0CC: amit.shah, dblechte, ehabkost, lihuang, llim, syeghiay, virt-maint
Target Milestone: beta   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.62.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-23 10:17:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alon Levy 2010-05-11 15:35:39 UTC
Description of problem:
migration, also probably save to file (migration to file) fails with a virtio-serial device. It fails because the ports_map is filled with garbage since it is allocated too small by a factor of 4 (sizeof(unit32)).

Version-Release number of selected component (if applicable):


How reproducible:
always


Steps to Reproduce:
1. start a vm with -device virtio-serial -device virt-console
2. start a vm with same and -incoming tcp:localhost:9000 (assuming same machine)
3. migrate first vm to second
  
Actual results:
Migration fail with a complaint about virt-console device failing


Expected results:
Migration success


Additional info:
Patch below solves this problem (tested):

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3284d85..76c0153 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -810,7 +810,8 @@ VirtIODevice *virtio_serial_init(DeviceState *dev, uint32_t max_nr_ports)
     }

     vser->config.max_nr_ports = max_nr_ports;
-    vser->ports_map = qemu_mallocz((max_nr_ports + 31) / 32);
+    vser->ports_map = qemu_mallocz(((max_nr_ports + 31) / 32)
+        * sizeof(vser->ports_map[0]));
     /*
      * Reserve location 0 for a console port for backward compat
      * (old kernel, new qemu)
--
1.7.0.1

Comment 3 RHEL Program Management 2010-05-11 16:44:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 7 lihuang 2010-05-25 03:39:39 UTC
Hi Alon
I can not reproduce the failure on qemu-kvm-0.12.1.2-2.52.el6.x86_64.
could please you have a look ?


src cmd line :
root     16082 12108  6 03:11 pts/0    01:24:17 /usr/libexec/qemu-kvm -m 4096 -smp 2 -monitor stdio -boot c -drive file=rhel6-64.qcow2,if=virtio,boot=on,cache=none -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=4,bus=pci.0,addr=0x4 -device virtserialport,bus=virtio-serial0.0,chardev=channel0,name=org.linux-kvm.port.0 -chardev socket,path=channel0,server,nowait,id=channel0 -net nic -net tap -vnc :1 -usbdevice tablet -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8

dst cmd line
root      5898 13365  6 23:19 pts/1    00:00:13 /usr/libexec/qemu-kvm -m 4096 -smp 2 -monitor stdio -boot c -drive file=rhel6-64.qcow2,if=virtio,boot=on,cache=none -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=4,bus=pci.0,addr=0x4 -device virtserialport,bus=virtio-serial0.0,chardev=channel0,name=org.linux-kvm.port.0 -chardev socket,path=channel01,server,nowait,id=channel0 -net nic -net tap -vnc :2 -usbdevice tablet -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -incoming tcp:0:6789

after migration :
from src qemu monitor
(qemu) info migrate
Migration status: completed
(qemu) info status
VM status: paused
(qemu)

from dst qemu monitor
(qemu) info status
VM status: running

Comment 8 Alon Levy 2010-05-25 13:53:04 UTC
ok, I have no idea why I can't reproduce it either right now. I'm sure this is a bug because
 a. I saw it fail in virtio-serial-bus.c on virtio_serial_load when comparing the port_map gotten and the one initialized (the check is that they are the same, the bug was that the migrating machine which created the device allocated too few bytes so some of that memory was uninitialized, i.e. garbage).
 b. The original (before the patch) allocates too little space.

But for some reason I'm not seeing the bug now. Actually I'm seeing values which are weird (port_maps[0] should be 0x1, but it is 2866261505), but it is the same weird at the migrator and the target..

 It could be that migration changed a little. I will try going back a few commits. I am trying to use debug (--enable-debug) since I think I was running a debug version.

I'll update you.

Alon

Comment 9 Amit Shah 2010-05-25 16:09:22 UTC
Please try without the ,max_ports= parameter.

Comment 10 lihuang 2010-05-26 06:10:36 UTC
(In reply to comment #9)
> Please try without the ,max_ports= parameter.    

without max_ports also pass on qemu-kvm-0.12.1.2-2.52.el6.x86_64.

Comment 11 Alon Levy 2010-05-26 13:33:49 UTC
ok, it's very easy to see the erronous memory, since ports_map[0] should be 1 with a single port, and it is actually GGG1, where G stands for "garbage". The problem is that the garbage seems pretty consistent, so migration succeeds.

I'm going to stop trying to recreate the original problem (migration fails) that prompted me to look for this and find this bug.

I mean:
 a) this is clearly a bug (no one is disputing this - it is a clear that allocation is too small)
 b) the only reason it isn't happening now is the behavior of the memory - trying to run many instances of qemu I always get both the same address allocated to ports_map and the same contents at that address. This causes the comparison, which failed for me previously, to succeed, since the memory is actually the same - uninitialized in the same way.
 c) so to summarize the summary, migration doesn't fail now, but this is still a bug. And yes, if you still must have me recreate I will, it's just that it more work and I don't see the point. (i.e. I already tried and failed and am giving up right now unless you don't accept my explanation as enough). (btw, I haven't tested this, but I assume that two *different* machines will have different garbage).

Alon

Comment 14 lihuang 2010-06-23 10:16:53 UTC
(In reply to comment #11)
> ok, it's very easy to see the erronous memory, since ports_map[0] should be 1
> with a single port, and it is actually GGG1, where G stands for "garbage". The
> problem is that the garbage seems pretty consistent, so migration succeeds.
> 
> I'm going to stop trying to recreate the original problem (migration fails)
> that prompted me to look for this and find this bug.
> 
> I mean:
>  a) this is clearly a bug (no one is disputing this - it is a clear that
> allocation is too small)
>  b) the only reason it isn't happening now is the behavior of the memory -
> trying to run many instances of qemu I always get both the same address
> allocated to ports_map and the same contents at that address. This causes the
> comparison, which failed for me previously, to succeed, since the memory is
> actually the same - uninitialized in the same way.
>  c) so to summarize the summary, migration doesn't fail now, but this is still
> a bug. And yes, if you still must have me recreate I will, it's just that it
> more work and I don't see the point. (i.e. I already tried and failed and am
> giving up right now unless you don't accept my explanation as enough). (btw, I
> haven't tested this, but I assume that two *different* machines will have
> different garbage).
> 
> Alon    

Hi Alon
Thanks for the analyse. 
I have reproduced it on -2.49.el6 and verified in -2.62.el6.