Bug 1404673 - [ppc64le] reset vm during migration, HMP in src host prints "tcmalloc: large alloc 1073872896 bytes..."
Summary: [ppc64le] reset vm during migration, HMP in src host prints "tcmalloc: large a...
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Keywords: ZStream
Depends On:
Blocks: 1420456
 
Reported: 2016-12-14 12:07 UTC by xianwang
Modified: 2017-08-02 03:17 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.8.0-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1420456
Environment:
Last Closed: 2017-08-01 23:42:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
"dmesg" command in vm (19.00 KB, text/plain)
2016-12-14 12:07 UTC, xianwang
tcmalloc_src (146.78 KB, image/png)
2016-12-14 12:10 UTC, xianwang


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC

Description xianwang 2016-12-14 12:07:03 UTC
Created attachment 1231641 [details]
"dmesg" command in vm

Description of problem:
With the qemu-img-rhev-2.8.0-0.el7_upstream package installed on a ppc64le (Pegas1.0) host, start a migration with (qemu) migrate -d tcp:$dst:$port and then reset the VM with (qemu) system_reset. The src HMP then prints the message "tcmalloc: large alloc 1073872896 bytes...".

Note: this bug reproduces both when migrating between two hosts and when migrating on the local host.

Version-Release number of selected component (if applicable):
host:
distro:Pegas-7.4-20161201.n.0 Server ppc64le
kernel:4.8.0-1.el7.ppc64le
qemu-kvm-rhev-2.8.0-0.el7.lvivier201612021459.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

guest:
distro:Pegas-7.4-20161201.n.0-Server-ppc64le-dvd1.iso
kernel:4.8.0-1.el7.ppc64le

How reproducible:
5/5

Steps to Reproduce:
1. Boot a VM on the src host; the full qemu command is as follows:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -device usb-ehci,id=xhci,bus=pci.0 \
    -drive file=/root/pegas1.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,netdev=idlkwV8e \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4G \
    -smp 4 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -incoming tcp:0:5801 \
    -msg timestamp=on \
    -rtc base=utc,clock=host,driftfix=slew  \
    -monitor stdio \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm
2. Boot a VM on the dst host with the same command as on src, appending "-incoming tcp:0:5801".
3. Start the migration and then reset the VM with the following commands:
(qemu) migrate -d tcp:$dst:$port
(qemu) system_reset

Actual results:
After the step 3 commands, the following message appears in the src HMP:
(qemu) tcmalloc: large alloc 1073872896 bytes == 0x1001abc0000 @  0x3fffa7765fa0 0x3fffa7795fa4 0x3fffa789ab0c 0x3cc31ef4 0x3cc355e8 0x3cc16ddc 0x3cc14ea4 0x3cc2c810 0x3cdf6614 0x3cbdae94 0x3cbd7e70 0x3cbdc66c 0x3cb6e5c8 0x3cb73f30 0x3cb74420 0x3cc4a914 0x3cc4ca98 0x3ccfb594 0x3cbd6a58 0x3cbbd110 0x3fffa7408728 0x3fffa733d210
After a moment, the migration completes and the VM works correctly.

Attachment:
(1) dmesg.txt is the output of the "dmesg" command run in the VM.

Expected results:
No "tcmalloc..." message is printed, the migration completes, and the VM works correctly.

Additional info:
(1) The same host with "qemu-kvm-rhev-2.6.0-28.el7_3.1.ppc64le" installed does not reproduce this bug: no "tcmalloc..." message is printed, the migration completes, and the VM works correctly.
(2) A host with the following versions installed does not reproduce this bug either: no "tcmalloc..." message is printed, the migration completes, and the VM works correctly.
host:
distro:RHEL-7.3-20161019.0 Server ppc64le
kernel:3.10.0-514.el7.ppc64le
qemu-kvm-rhev-2.6.0-27.el7.ppc64le
guest:
RHEL-Server-7.3-ppc64le-virtio-scsi.qcow2
3.10.0-418.el7.ppc64le

Comment 1 xianwang 2016-12-14 12:10 UTC
Created attachment 1231643 [details]
tcmalloc_src

Comment 3 Laurent Vivier 2016-12-15 09:54:43 UTC
I'm not able to reproduce it.

Could you try the latest package, qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930, that I provided to Qunfang?

Thanks

Comment 4 xianwang 2016-12-15 11:38:41 UTC
(In reply to Laurent Vivier from comment #3)
> I'm not able to reproduce it.
> 
> Could you try latest package qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930 I
> have provided to Qunfang?
> 
> Thanks

Hi lvivier,
I retested this case with "qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930.ppc64le" and reproduced the bug 4/4.

The memory info of the host is:
[root@ibm-p8-rhevm-05 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           255G        2.3G        233G         30M         19G        251G
Swap:          4.0G          0B        4.0G

Comment 5 Laurent Vivier 2016-12-15 18:32:51 UTC
Apparently this message is triggered by an environment variable:

TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD

"Allocations larger than this value cause a stack trace to be dumped to stderr. The threshold for dumping stack traces is increased by a factor of 1.125 every time we print a message so that the threshold automatically goes up by a factor of ~1000 every 60 messages. This bounds the amount of extra logging generated by this flag. Default value of this flag is very large and therefore you should see no extra logging unless the flag is overridden."

Could you check your host environment?

The numbers after the
"tcmalloc: large alloc 1073872896 bytes == 0x1001abc0000 @" are the backtrace.

Could you give me these numbers for qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930.ppc64le?
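
The growth rule quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not tcmalloc code: threshold_after() is a hypothetical helper that only applies the documented factor of 1.125 once per printed message.

```c
#include <assert.h>

/* Hypothetical helper, not part of tcmalloc: models the documented rule
 * that the report threshold is multiplied by 1.125 after every message. */
double threshold_after(double initial_threshold, unsigned messages)
{
    double threshold = initial_threshold;

    assert(initial_threshold > 0.0);
    for (unsigned i = 0; i < messages; ++i) {
        threshold *= 1.125;   /* one printed message raises the bar by 12.5% */
    }
    return threshold;
}
```

Calling threshold_after(1.0, 60) gives the growth factor after 60 messages, which lands a little above 1000 and so agrees with the "factor of ~1000 every 60 messages" wording.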

Comment 6 xianwang 2016-12-16 01:59:54 UTC
(In reply to Laurent Vivier from comment #5)
> Apparently this message is triggered by an environment variable:
> 
> TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD
> 
> "Allocations larger than this value cause a stack trace to be dumped to
> stderr. The threshold for dumping stack traces is increased by a factor of
> 1.125 every time we print a message so that the threshold automatically goes
> up by a factor of ~1000 every 60 messages. This bounds the amount of extra
> logging generated by this flag. Default value of this flag is very large and
> therefore you should see no extra logging unless the flag is overridden."
> 
> Could you check your host environment?
> 
> The numbers after the
> "tcmalloc: large alloc 1073872896 bytes == 0x1001abc0000 @" are the
> backtrace.
> 
> Could you give me these numbers for
> qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930.ppc64le?

(1) With qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930.ppc64le, host and guest kernel both 4.8.0-1.el7.ppc64le:

(qemu) tcmalloc: large alloc 1073872896 bytes == 0x1000adb0000 @  0x3fff91695fa0 0x3fff916c5fa4 0x3fff917cab0c 0x37641f94 0x37645688 0x37626e7c 0x37624f44 0x3763c8b0 0x378068b4 0x375eaf34 0x375e7f10 0x375ec70c 0x3757e668 0x37583fd0 0x375844c0 0x3765ab14 0x3765cc98 0x3770b814 0x375e6af8 0x375cd1b0 0x3fff91338728 0x3fff9126d210

(2) With qemu-kvm-rhev-2.8.0-0.el7.lvivier201612141930.ppc64le, host and guest kernel both 4.9.0-1.el7.ppc64le:

(qemu) tcmalloc: large alloc 1073872896 bytes == 0x100390b0000 @  0x3fff7ee25fa0 0x3fff7ee55fa4 0x3fff7ef5ab0c 0x570d1f94 0x570d5688 0x570b6e7c 0x570b4f44 0x570cc8b0 0x572968b4 0x5707af34 0x57077f10 0x5707c70c 0x5700e668 0x57013fd0 0x570144c0 0x570eab14 0x570ecc98 0x5719b814 0x57076af8 0x5705d1b0 0x3fff7eac8728 0x3fff7e9fd210
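
When comparing several of these messages, it can help to split each line into the reported size, the allocation address, and the number of backtrace entries after the "@". The following is a rough sketch under the assumption that the line has exactly the layout shown above; struct large_alloc_report and parse_large_alloc() are hypothetical names, not part of tcmalloc or QEMU.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one "tcmalloc: large alloc" line. */
struct large_alloc_report {
    unsigned long long bytes;    /* size printed by tcmalloc */
    unsigned long long address;  /* address of the allocation */
    size_t frames;               /* number of return addresses after '@' */
};

/* Returns 0 on success, -1 if the line is not a large-alloc report. */
int parse_large_alloc(const char *line, struct large_alloc_report *out)
{
    const char *msg, *at;
    char *end;

    assert(line != NULL && out != NULL);

    /* strstr() skips a leading "(qemu) " prompt if there is one. */
    msg = strstr(line, "tcmalloc: large alloc ");
    if (msg == NULL ||
        sscanf(msg, "tcmalloc: large alloc %llu bytes == %llx",
               &out->bytes, &out->address) != 2) {
        return -1;
    }

    at = strchr(msg, '@');
    if (at == NULL) {
        return -1;
    }

    /* Count the hexadecimal return addresses that follow the '@'. */
    out->frames = 0;
    for (const char *p = at + 1;; p = end) {
        (void)strtoull(p, &end, 16);
        if (end == p) {
            break;               /* no more numbers on the line */
        }
        out->frames++;
    }
    return 0;
}
```

The parser only extracts the numbers; resolving the return addresses to symbols would still need the load address of the binary, which the message does not contain.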

Comment 7 Laurent Vivier 2016-12-16 09:18:37 UTC
Thank you. I'm not able to find any of these addresses in the binary...

But I'm now able to reproduce it. The problem is triggered by the virtio-net-pci device with a tap netdev (I was using a bridge, and I have checked that the problem does not occur with the spapr-vlan interface):

dhcp6-56 login: QEMU 2.7.93 monitor - type 'help' for more information
(qemu) migrate -d tcp:localhost:4444
(qemu) system_reset 
(qemu) 

SLOF **********************************************************************
QEMU Starting
 Build Date = Aug  5 2016 01:08:50
 FW Version = mockbuild@ release 20160223
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/nvram@71000000
Populating /vdevice/vty@71000001
Populating /pci@800000020000000
                     00 2000 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@4
       SCSI: Looking for devices
                     00 0800 (D) : 1af4 1001    virtio [ block ]
                     00 0000 (D) : 1af4 1000    virtio [ net ]
tcmalloc: large alloc 1073872896 bytes == 0x1000cb40000 @  0x3fff87f45fa0 0x3fff87f75fa4 0x3fff8807ab0c 0x28941f94 0x28945688 0x28926e7c 0x28924f44 0x2893c8b0 0x28b068b4 0x288eaf34 0x288e7f10 0x288ec70c 0x2887e668 0x28883fd0 0x288844c0 0x2895ab14 0x2895cc98 0x28a0b814 0x288e6af8 0x288cd1b0 0x3fff87be8728 0x3fff87b1d210
Scanning USB 
Using default console: /vdevice/vty@71000001
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@1 ...   Successfully loaded

Comment 8 Laurent Vivier 2016-12-16 11:24:19 UTC
QEMU bisected to SLOF update:

f77d4ff8506ca4f608052486d87f8a3ed03d5202 is the first bad commit
commit f77d4ff8506ca4f608052486d87f8a3ed03d5202
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Wed Oct 19 10:05:26 2016 +1100

    pseries: Update SLOF firmware image to 20161019

SLOF bisected to:

d78d7322efbe81027dbc2d11635f5d68fb261c29 is the first bad commit
commit d78d7322efbe81027dbc2d11635f5d68fb261c29
Author: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Date:   Thu Mar 10 13:30:56 2016 +0530

    virtio-net: initialize to populate mac address
    
    With commit aa9566d2e(virtio-net: move setup-mac to the open routine)
    local-mac-address property started getting set during open routine. So
    the netboot workflow was addressed. This was required as the device
    needs to be probed before reading, after virtio 1.0 changes.
    
    While boot from the disk and grub is set to get kernel over network, it
    breaks. As grub looks for local-mac-address property first, which is not
    there. Fix this by creating an instance and closing it. setup-mac in the
    open will populate the local-mac-addres property
    
    Reported-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>


This commit only adds the virtio-net-pci card initialization in SLOF.

Bisecting QEMU while always using the latest revision of SLOF (with the '-bios' parameter) gives:

commit 357d1e3bc7d2d80e5271bc4f3ac8537e30dc8046
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Sun Oct 16 12:04:15 2016 +1100

    spapr: Improved placement of PCI host bridges in guest memory map

Comment 9 Laurent Vivier 2016-12-17 11:37:47 UTC
This warning comes from:

vhost_dev_start()
  -> vhost_log_get()
    -> vhost_log_alloc()
           ...
           log->log = g_malloc0(logsize);
           ...

and logsize is given by vhost_get_log_size().


static uint64_t vhost_get_log_size(struct vhost_dev *dev)
{
    uint64_t log_size = 0;
    int i;
    for (i = 0; i < dev->mem->nregions; ++i) {
        struct vhost_memory_region *reg = dev->mem->regions + i;
        uint64_t last = range_get_last(reg->guest_phys_addr,
                                       reg->memory_size);
        log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
    }
    for (i = 0; i < dev->nvqs; ++i) {
        struct vhost_virtqueue *vq = dev->vqs + i;
        uint64_t last = vq->used_phys + vq->used_size - 1;
        log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
    }
    return log_size;
}

So the logsize depends on the highest guest-physical address covered by a memory region, not on how much memory the regions hold.

Before commit 357d1e3b:

         last            (= guest_phys_addr + memory_size - 1)      log_size
REGION 0 0x3ffffffff     (= 0x0 + 0x400000000 - 1)                     65536
REGION 1 0x100e007ffff   (= 0x100e0040000 + 0x40000 - 1)             4208642

After commit 357d1e3b:

         last            (= guest_phys_addr + memory_size - 1)      log_size
REGION 0 0x3ffffffff     (= 0x0 + 0x400000000 - 1)                     65536
REGION 1 0x2000c001ffff  (= 0x2000c0010000 + 0x10000 - 1)          134230017
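
The per-region figures can be reproduced with the same arithmetic as the first loop of vhost_get_log_size(). This is a small sketch, assuming one log bit per 4 KiB page and 64 bits per vhost_log_chunk_t (so each chunk spans 0x40000 bytes of guest-physical address space); the macro and function names are illustrative, not QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 4 KiB log pages, 64 log bits per chunk. */
#define LOG_PAGE_SIZE   0x1000ULL
#define LOG_CHUNK_BITS  64ULL
#define LOG_CHUNK_SPAN  (LOG_PAGE_SIZE * LOG_CHUNK_BITS)   /* 0x40000 bytes */

/* Number of log chunks needed to cover address 0 up to the end of a region. */
uint64_t log_chunks_for_region(uint64_t guest_phys_addr, uint64_t memory_size)
{
    uint64_t last;

    assert(memory_size > 0);
    last = guest_phys_addr + memory_size - 1;   /* last byte of the region */
    return last / LOG_CHUNK_SPAN + 1;
}
```

With the inputs listed for the two regions, this yields 65536 for the RAM region in both cases, 4208642 for region 1 before the commit, and 134230017 for region 1 after it. Only the start address of region 1 changed, which is enough to grow the log by a factor of about 32.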

the mtree before commit is:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-00000003ffffffff (prio 0, RW): ppc_spapr.ram
    0000010080000000-000001008000ffff (prio 0, RW): pci@800000020000000.io
    00000100a0000000-000001011fffffff (prio 0, RW): pci@800000020000000.mmio 
--->   0x100e0040000
    0000010120000000-000001101fffffff (prio 0, RW): alias pci@800000020000

after the commit:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-00000003ffffffff (prio 0, RW): ppc_spapr.ram
    0000200000000000-000020000000ffff (prio 0, RW): pci@800000020000000.io 
    0000200080000000-00002000ffffffff (prio 0, RW): pci@800000020000000.mmio
--->  0x2000c001ffff
    0000210000000000-000021ffffffffff (prio 0, RW): pci@800000020000000.mmio64

Because guest_phys_addr is higher for the MMIO address space after the commit, the g_malloc0(logsize) request is bigger and triggers the warning.

Michael, is it normal to allocate log space covering everything from address 0 to the end of the device MMIO memory (in our case 32 terabytes!)?

Comment 10 Michael S. Tsirkin 2017-01-06 19:39:59 UTC
I don't think it's normal. We normally skip mmio because it is
not a ram region:
static void vhost_region_add(MemoryListener *listener,
                             MemoryRegionSection *section)
{   
    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                         memory_listener);
    
    if (!vhost_section(section)) {
        return;
    }
    
...
}

Maybe mmio is marked as ram for some reason?

Comment 11 Laurent Vivier 2017-01-09 14:18:38 UTC
(In reply to Michael S. Tsirkin from comment #10)
> I don't think it's normal. We normally skip mmio because it is
> not a ram region:
> static void vhost_region_add(MemoryListener *listener,
>                              MemoryRegionSection *section)
> {   
>     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
>                                          memory_listener);
>     
>     if (!vhost_section(section)) {
>         return;
>     }
>     
> ...
> }
> 
> Maybe mmio is marked as ram for some reason?

In fact the problem is caused by "virtio-net-pci.rom", which is marked as RAM, but its guest_phys_addr is 0x2000c0040000 (it lies in the PCI MMIO address space) and its size is 0x40000.

So vhost_get_log_size() computes the log size using:

last = range_get_last(0x2000c0040000, 0x40000);

last is then 0x2000c007ffff, so log_size is 0x8003002 and g_malloc0() requests 0x8003002 * sizeof(vhost_log_chunk_t) = 1073840144 bytes, which tcmalloc reports as 1073872896 bytes.

Michael, why do we use the vhost_memory_region guest_phys_addr to compute the logsize? Is it normal?
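
The step from chunk count to the byte count in the tcmalloc message can be worked through as follows. This is a sketch under two assumptions: sizeof(vhost_log_chunk_t) is 8 bytes on this host, and tcmalloc rounds the request up to a multiple of 64 KiB before reporting it. The second assumption is only inferred from the numbers; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes requested for the log, assuming 8-byte log chunks. */
uint64_t log_bytes(uint64_t chunks)
{
    return chunks * sizeof(uint64_t);
}

/* Round value up to the next multiple of align (align must be a power of two). */
uint64_t round_up_pow2(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}
```

For 0x8003002 chunks, log_bytes() gives 1073840144, and rounding that up to a 64 KiB multiple gives 1073872896, the size printed by tcmalloc.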

Comment 12 Michael S. Tsirkin 2017-01-09 21:58:36 UTC
yes gpa is what guests use to address the log so this makes sense.
I can't find virtio-net-pci.rom in source.
is it part of a PCI BAR?

Comment 13 Laurent Vivier 2017-01-10 11:29:44 UTC
(In reply to Michael S. Tsirkin from comment #12)
> yes gpa is what guests use to address the log so this makes sense.
> I can't find virtio-net-pci.rom in source.
> is it part of a PCI BAR?

I've added a trace in vhost_get_log_size() to have the name of memory regions used for the calculation: (dev->mem_sections + i)->mr->name and (dev->mem->regions + i)->guest_phys_addr give the name and the guest_phys_addr value.

But I don't know where it comes from:

(qemu) info qom-tree:

/machine (pseries-2.8-machine)
  ...
  /peripheral (container)
  ...
    /idtlLxAk (virtio-net-pci)
      /virtio-net-pci-msix[0] (qemu:memory-region)
      /msix-table[0] (qemu:memory-region)
      /virtio-pci[0] (qemu:memory-region)
      /bus master[0] (qemu:memory-region)
      /virtio-pci-cfg[0] (qemu:memory-region)
      /virtio-pci-device[0] (qemu:memory-region)
      /virtio-backend (virtio-net-device)
      /virtio-pci-notify-pio[0] (qemu:memory-region)
      /virtio-pci[1] (qemu:memory-region)
      /msix-pba[0] (qemu:memory-region)
      /virtio-pci-common[0] (qemu:memory-region)
      /virtio-pci-isr[0] (qemu:memory-region)
      /virtio-bus (virtio-pci-bus)
      /virtio-pci-notify[0] (qemu:memory-region)
      /virtio-net-pci.rom[0] (qemu:memory-region)

(qemu) info pci:

  Bus  0, device   1, function 0:
    Ethernet controller: PCI device 1af4:1000
      IRQ 0.
      BAR0: I/O at 0x0100 [0x011f].
      BAR1: 32 bit memory at 0xc0004000 [0xc0004fff].
      BAR4: 64 bit prefetchable memory at 0x210000010000 [0x210000013fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id "idtlLxAk"

but it doesn't appear in "info mtree", but should be:

memory-region: pci@800000020000000.mmio
  0000000000000000-ffffffffffffffff (prio 0, RW): pci@800000020000000.mmio
...
    0000210000010000-0000210000013fff (prio 1, RW): virtio-pci
      0000210000010000-0000210000010fff (prio 0, RW): virtio-pci-common
      0000210000011000-0000210000011fff (prio 0, RW): virtio-pci-isr
      0000210000012000-0000210000012fff (prio 0, RW): virtio-pci-device
      0000210000013000-0000210000013fff (prio 0, RW): virtio-pci-notify

but it's inside the 32 bit BAR:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-00000003ffffffff (prio 0, RW): ppc_spapr.ram
    0000200000000000-000020000000ffff (prio 0, RW): alias pci@800000020000000.io-alias @pci@800000020000000.io 0000000000000000-000000000000ffff
    0000200080000000-00002000ffffffff (prio 0, RW): alias pci@800000020000000.mmio32-alias @pci@800000020000000.mmio 0000000080000000-00000000ffffffff
    0000210000000000-000021ffffffffff (prio 0, RW): alias pci@800000020000000.mmio64-alias @pci@800000020000000.mmio 0000210000000000-000021ffffffffff


Could it be added automatically by "pci_add_option_rom()"?

Comment 14 Laurent Vivier 2017-01-20 10:17:03 UTC
Michael has proposed this to fix the problem and it seems to work:

--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2199,7 +2199,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
         snprintf(name, sizeof(name), "%s.rom", object_get_typename(OBJECT(pdev)));
     }
     pdev->has_rom = true;
-    memory_region_init_ram(&pdev->rom, OBJECT(pdev), name, size, &error_fatal);
+    memory_region_init_rom(&pdev->rom, OBJECT(pdev), name, size, &error_fatal);
     vmstate_register_ram(&pdev->rom, &pdev->qdev);
     ptr = memory_region_get_ram_ptr(&pdev->rom);
     load_image(path, ptr);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index d396b22..cca4838 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -583,7 +583,8 @@ static void vhost_set_memory(MemoryListener *listener,
 
 static bool vhost_section(MemoryRegionSection *section)
 {
-    return memory_region_is_ram(section->mr);
+    return memory_region_is_ram(section->mr) &&
+           !memory_region_is_rom(section->mr);
 }
 
 static void vhost_begin(MemoryListener *listener)

Comment 15 Miroslav Rezanina 2017-02-03 09:09:01 UTC
Fix included in qemu-kvm-rhev-2.8.0-3.el7

Comment 17 Laurent Vivier 2017-02-03 10:33:59 UTC
I'm removing the rhel-7.3.z? flag, as this change appears to be unnecessary for qemu-2.6.0.

Comment 18 Laurent Vivier 2017-02-03 11:39:34 UTC
(In reply to Laurent Vivier from comment #17)
> I remove the rhel-7.3.z? flag as it seems this change is useless for
> qemu-2.6.0.

The virtio-net PCI ROM is enabled by the following SLOF update in QEMU:

commit f77d4ff8506ca4f608052486d87f8a3ed03d5202
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Wed Oct 19 10:05:26 2016 +1100

    pseries: Update SLOF firmware image to 20161019

And in SLOF, this commit is enabling the virtio-net-pci ROM:

commit d78d7322efbe81027dbc2d11635f5d68fb261c29
Author: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Date:   Thu Mar 10 13:30:56 2016 +0530

    virtio-net: initialize to populate mac address
    
    With commit aa9566d2e(virtio-net: move setup-mac to the open routine)
    local-mac-address property started getting set during open routine. So
    the netboot workflow was addressed. This was required as the device
    needs to be probed before reading, after virtio 1.0 changes.
    
    While boot from the disk and grub is set to get kernel over network, it
    breaks. As grub looks for local-mac-address property first, which is not
    there. Fix this by creating an instance and closing it. setup-mac in the
    open will populate the local-mac-addres property
    
    Reported-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

This is why the PCI ROM appears in the memory regions list in 2.8.0 and not in 2.6.0.
So we don't need the 2.8.0 fix in 2.6.0 as long as we don't update SLOF to 20161019.

Comment 22 xianwang 2017-02-13 06:24:12 UTC
This bug is verified as fixed on qemu-kvm-rhev-2.8.0-3.el7.ppc64le.

Reproduced this bug on qemu-kvm-rhev-2.8.0-1.el7.ppc64le with the following packages:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.8.0-1.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

1) Install the package "kernel-devel-3.10.0-558.el7.ppc64le.rpm"
2) Create a script and start systemtap
[root@ibm-p8-rhevm-13 ~]# vim qemu-watch.stp
probe glib.mem_alloc {
    if (n_bytes > 32000000)
        printf ("g_malloc: pid=%d n_bytes=%d\n", pid(), n_bytes);
}
[root@ibm-p8-rhevm-13 ~]# stap -v ./qemu-watch.stp 
Pass 1: ...
Pass 2: ...
Pass 3: ...
Pass 4: ...
Pass 5: starting run.
3) Open a new shell on the src host and boot a guest with the following qemu command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
    -device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0  \
    -device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 8G \
    -smp 2 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm
4) Boot a guest on the dst host with the same qemu command line as on src, appending
"-incoming tcp:0:5801"
5) Start the migration and then reset the VM with the following commands:
(qemu) migrate -d tcp:10.19.112.39:5801
(qemu) system_reset

Actual result:
The migration completes and the VM works well, but the src host shows a "tcmalloc..." line and a "g_malloc: ..." line from the systemtap script, as follows:
(qemu) tcmalloc: large alloc 1073872896 bytes == 0x100407a0000 @  0x3fffa0225fa0 0x3fffa0255fa4 0x3fffa035ab0c 0x24091cb4 0x240953a8 0x24076b5c 0x24074c24 0x2408c5d0 0x242560f4 0x2403ac14 0x24037bf0 0x2403c3ec 0x23fce348 0x23fd3cb0 0x23fd41a0 0x240aa354 0x240ac4d8 0x2415b054 0x240367d8 0x2401ce90 0x3fff9fec8728 0x3fff9fdfd210

...
Pass 5: starting run.
g_malloc: pid=51753 n_bytes=1073840144


The fix was verified with the following packages:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.8.0-3.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

The test steps are the same as for the reproduction.

Result:
The migration completes and the VM works well; there are no "tcmalloc..." lines and no "g_malloc: ..." lines on the src host.

So, this bug is fixed.

Comment 24 errata-xmlrpc 2017-08-01 23:42:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392


