Bug 1262670 - [PowerKVM]SIGSEGV when boot up guest with -numa node and set up the cpus in one node to the boundary
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Thomas Huth
QA Contact: Virtualization Bugs
Keywords: Regression
Blocks: RHEV3.6PPC 1277183 1277184
Reported: 2015-09-13 23:31 EDT by Shuang Yu
Modified: 2016-02-21 06:15 EST
Fixed In Version: qemu-kvm-rhev-2.3.0-24.el7
Doc Type: Bug Fix
Last Closed: 2015-12-04 11:57:53 EST
Type: Bug


Attachments
Numa boundary screenshot (27.13 KB, image/png)
2015-09-13 23:33 EDT, Shuang Yu


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2015:2546
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix and enhancement update
Last Updated: 2015-12-04 16:11:56 EST

Description Shuang Yu 2015-09-13 23:31:44 EDT
Description of problem:
When booting up a guest with -numa node and setting up the cpus in one node
to the boundary, the guest hits "signal SIGSEGV, Segmentation fault."

Version-Release number of selected component (if applicable):
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest version:
RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso

How reproducible:
1/3

Steps to Reproduce:

1. Check the host CPU info:
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

2. Boot up the guest under gdb with -numa node, with the cpus in one node set up to the boundary (80):
# gdb /usr/libexec/qemu-kvm
(gdb)  r -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

Actual results:
The guest will hit "signal SIGSEGV, Segmentation fault."

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 73249)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4


Expected results:
The guest should boot up normally.

Additional info:
(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free () from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x00000000347fa29c in free_and_trace (mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00000000348c00e4 in container_get (root=<optimized out>, 
    path=<optimized out>) at qom/container.c:46
        obj = 0x357f0570
        child = <optimized out>
        parts = 0x37417a80
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x000000003464f384 in memory_region_init (mr=0x37866c00, owner=0x0, 
    name=0x0, size=4096) at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x000000003464f9a8 in memory_region_init_io (mr=0x37866c00, 
    owner=<optimized out>, ops=0x34ad84c8 <subpage_ops>, opaque=0x37866c00, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x00000000345f139c in subpage_init (base=1101659111424, 
    as=0x34b60cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x3755c530, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, offset_within_region = 0, 
          size = {lo = 4096, hi = 0}, 
          offset_within_address_space = 1101659111424, readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x00000000345f1624 in mem_add (listener=<optimized out>, 
    section=<optimized out>) at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x3755c530
        now = {mr = 0x36c969e0, 
          address_space = 0x34b60cf0 <address_space_memory>, 
          offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, readonly = false}
        remain = {mr = <optimized out>, address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x000000003464e630 in address_space_update_topology_pass (
    as=0x34b60cf0 <address_space_memory>, adding=<optimized out>, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=<optimized out>, old_view=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x34b60d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x373a3638
        frnew = 0x373a31b8
#13 0x00000000346510dc in address_space_update_topology (
    as=0x34b60cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x357f7aa0
        new_view = 0x357f7dd0
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x34b60cf0 <address_space_memory>
#15 0x000000003485b74c in pci_update_mappings (d=0x37279000)
    at hw/pci/pci.c:1167
        r = 0x37279118
        i = 1
        new_addr = 3221360640
#16 0x000000003485bf90 in pci_default_write_config (d=0x37279000, 
    addr=<optimized out>, val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x0000000034899c80 in virtio_write_config (pci_dev=0x37279000, 
    address=<optimized out>, val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x37279000
        vdev = 0x37280f40
#18 0x0000000034863b58 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, limit=<optimized out>, 
    val=<optimized out>, len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x00000000346aa460 in finish_write_pci_config (spapr=<optimized out>, 
    buid=<optimized out>, addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120320504)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x00000000346a65f4 in spapr_rtas_call (cpu=<optimized out>, 
    spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x00000000346a23c4 in h_rtas (cpu=0x36430000, spapr=0x35ac0000, 
    opcode=<optimized out>, args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120320472
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x00000000346a4368 in spapr_hypercall (cpu=0x36430000, opcode=61440, 
    args=0x3fffb3b30030) at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x00000000347ad54c in kvm_arch_handle_exit (cs=0x36430000, 
    run=0x3fffb3b30000) at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x000000003464bec0 in kvm_cpu_exec (cpu=0x36430000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1908
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x00000000346319b0 in qemu_kvm_cpu_thread_fn (arg=0x36430000)
    at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x36430000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread () from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone () from /lib64/power8/libc.so.6
No symbol table info available.
(gdb) 


The attachment is a screenshot of the guest at the time it hit the SIGSEGV.
Comment 1 Shuang Yu 2015-09-13 23:33:12 EDT
Created attachment 1073047 [details]
Numa boundary screenshot
Comment 3 Thomas Huth 2015-09-14 10:32:27 EDT
FWIW, I can re-create the crash even with a somewhat shorter command line (to rule out some of the options):

/usr/libexec/qemu-kvm -machine pseries,accel=kvm,usb=off -drive file=/var/lib/libvirt/images/isos/RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -m 8G -smp 80 -numa node,cpus=0-79 -nographic -vga none
Comment 4 Shuang Yu 2015-09-15 06:13:43 EDT
Booting up a guest with the 'mem' of its single NUMA node set to the valid boundary value hits the SIGSEGV problem again.

Version-Release number of selected component (if applicable):
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest version:
RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso

How reproducible:
1/5

Steps to Reproduce:
1. Check the memory info on the host:
# free -m
              total        used        free      shared  buff/cache   available
Mem:          31636         548       23439          27        7648       30236
Swap:         16255           0       16255

2. Boot up the guest with one NUMA node and set its 'mem' to the valid boundary value:
/usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 32768 -smp 8 -numa node,mem=32768


Actual results:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 42365)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
    () from /lib64/libtcmalloc.so.4



Expected results:
The guest should boot up normally.

Additional info:
(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x000000002855a29c in free_and_trace (
    mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00000000286200e4 in container_get (
    root=<optimized out>, path=<optimized out>)
    at qom/container.c:46
        obj = 0x29550570
        child = <optimized out>
        parts = 0x2a647680
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x00000000283af384 in memory_region_init (
    mr=0x2af29000, owner=0x0, name=0x0, size=4096)
    at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x00000000283af9a8 in memory_region_init_io (
    mr=0x2af29000, owner=<optimized out>, 
    ops=0x288384c8 <subpage_ops>, opaque=0x2af29000, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x000000002835139c in subpage_init (base=1101659111424, 
    as=0x288c0cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x2a9b9f60, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, 
          offset_within_region = 0, size = {lo = 4096, 
            hi = 0}, 
          offset_within_address_space = 1101659111424, 
          readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x0000000028351624 in mem_add (
    listener=<optimized out>, section=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x2a9b9f60
        now = {mr = 0x29628f00, 
          address_space = 0x288c0cf0 <address_space_memory>, offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, 
          readonly = false}
        remain = {mr = <optimized out>, 
          address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, 
            hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x00000000283ae630 in address_space_update_topology_pass
    (as=0x288c0cf0 <address_space_memory>, adding=true, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=0x295518c0, old_view=0x295518c0)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x288c0d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x2ab304f0
        frnew = 0x2ab32438
#13 0x00000000283b10dc in address_space_update_topology (
    as=0x288c0cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x295518c0
        new_view = 0x29552d60
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x288c0cf0 <address_space_memory>
#15 0x00000000285bb70c in pci_update_mappings (d=0x2ab09000)
    at hw/pci/pci.c:1159
        r = 0x2ab090f0
        i = 0
        new_addr = 18446744073709551615
#16 0x00000000285bbf90 in pci_default_write_config (
    d=0x2ab09000, addr=<optimized out>, 
    val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x00000000285f9c80 in virtio_write_config (
    pci_dev=0x2ab09000, address=<optimized out>, 
    val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x2ab09000
        vdev = 0x2ab10f40
#18 0x00000000285c3b58 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, 
    limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x000000002840a460 in finish_write_pci_config (
    spapr=<optimized out>, buid=<optimized out>, 
    addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120181232)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x00000000284065f4 in spapr_rtas_call (
    cpu=<optimized out>, spapr=<optimized out>, 
    token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x00000000284023c4 in h_rtas (cpu=0x2a0e0000, 
    spapr=0x29820000, opcode=<optimized out>, 
    args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120181200
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x0000000028404368 in spapr_hypercall (cpu=0x2a0e0000, 
    opcode=61440, args=0x3fffb3b30030)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x000000002850d54c in kvm_arch_handle_exit (
    cs=0x2a0e0000, run=0x3fffb3b30000)
    at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x00000000283abec0 in kvm_cpu_exec (cpu=0x2a0e0000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1908
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x00000000283919b0 in qemu_kvm_cpu_thread_fn (
    arg=0x2a0e0000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x2a0e0000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread ()
   from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone ()
   from /lib64/power8/libc.so.6
No symbol table info available.
(gdb)
Comment 5 Thomas Huth 2015-09-15 06:27:46 EDT
Since the crash always occurs within the tcmalloc library, I've now built a version of qemu-kvm without this library, and indeed, I have not seen a crash with that version yet. So the problem might be related to tcmalloc.
Comment 6 Thomas Huth 2015-09-15 13:09:59 EDT
I think I likely found the problem. Linking against ElectricFence instead of tcmalloc revealed the following:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffef01aeb20 (LWP 107603)]
0x000000004f39f148 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
51	  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) bt
#0  0x000000004f39f148 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#1  spapr_populate_drconf_memory (fdt=0x3ffddd8f8000, spapr=0x3fff957cfd40) at /home/thuth/devel/qemu/hw/ppc/spapr.c:795
#2  spapr_h_cas_compose_response (spapr=0x3fff957cfd40, addr=508403712, size=32764, cpu_update=<optimized out>, memory_update=<optimized out>)
    at /home/thuth/devel/qemu/hw/ppc/spapr.c:832
#3  0x000000004f3a35b0 in h_client_architecture_support (cpu_=<optimized out>, spapr=0x3fff957cfd40, opcode=<optimized out>, args=0x3ffeef990030)
    at /home/thuth/devel/qemu/hw/ppc/spapr_hcall.c:963
#4  0x000000004f3a43c8 in spapr_hypercall (cpu=0x3ffef10f8c50, opcode=61442, args=0x3ffeef990030) at /home/thuth/devel/qemu/hw/ppc/spapr_hcall.c:1009
#5  0x000000004f4ad60c in kvm_arch_handle_exit (cs=0x3ffef10f8c50, run=0x3ffeef990000) at /home/thuth/devel/qemu/target-ppc/kvm.c:1588
#6  0x000000004f34bee0 in kvm_cpu_exec (cpu=0x3ffef10f8c50) at /home/thuth/devel/qemu/kvm-all.c:1908
#7  0x000000004f3319d0 in qemu_kvm_cpu_thread_fn (arg=0x3ffef10f8c50) at /home/thuth/devel/qemu/cpus.c:944
#8  0x00003fffb7bf8728 in start_thread (arg=0x3ffef01aeb20) at pthread_create.c:310
#9  0x00003fffb6db7ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

And indeed, there is a bug in spapr_populate_drconf_memory(): the "int_buf" buffer is used twice, once for setting the "ibm,dynamic-memory" property and once for the "ibm,associativity-lookup-arrays" property. But the size of the buffer is calculated based only on the size of the first property! So if the second property is bigger than the first one, you get a "nice" buffer overrun, which then eventually leads to the segmentation fault later. ... I'll write a patch to fix this issue ...
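
The sizing problem described above can be sketched in isolation. This is only an illustration under stated assumptions, not the QEMU code and not the actual patch: every function name below and both cell-count formulas are hypothetical, chosen just to show how a scratch buffer sized for the first property can be too small for the second, and how sizing it for the larger of the two avoids the overrun.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical cell counts for two properties that share one scratch
 * buffer. The formulas are illustrative only, not taken from QEMU.
 */
static size_t first_prop_cells(size_t nr_entries, size_t cells_per_entry)
{
    return nr_entries * cells_per_entry + 1;  /* +1: leading count cell */
}

static size_t second_prop_cells(size_t nr_nodes, size_t cells_per_node)
{
    return nr_nodes * cells_per_node + 2;     /* +2: two header cells */
}

/* Buggy pattern: the shared buffer is sized from the first property only. */
static size_t buggy_buf_cells(size_t first, size_t second)
{
    (void)second;                             /* second user is ignored */
    return first;
}

/* Safer pattern: size the shared buffer for the larger of the two users. */
static size_t shared_buf_cells(size_t first, size_t second)
{
    return first > second ? first : second;
}

/* Returns 1 if writing `needed` cells would overrun a buffer of `have` cells. */
static int would_overrun(size_t have, size_t needed)
{
    return needed > have;
}

/* Guard that could be called before each copy into the shared buffer. */
static void check_fits(size_t have, size_t needed)
{
    assert(needed <= have);
}
```

With the made-up inputs first_prop_cells(0, 6) == 1 and second_prop_cells(1, 4) == 6, the buggy sizing gives a 1-cell buffer that the second property would overrun, while the max-based sizing gives 6 cells, which fits both users. An overrun of this kind corrupts neighbouring heap data, which would be consistent with the crash surfacing later inside the allocator rather than at the memcpy itself.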
Comment 7 Shuang Yu 2015-09-17 05:54:02 EDT
Retested this issue with "qemu-kvm-rhev-2.3.0-12.el7.ppc64le" 10 times: did not hit this bug.
Retested this issue with "qemu-kvm-rhev-2.3.0-13.el7.ppc64le": hit this bug.
Retested this issue with "qemu-kvm-rhev-2.3.0-14.el7.ppc64le": hit this bug.

Host kernel and SLOF version:
kernel-3.10.0-313.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch

Steps:
1. Check the host CPU info:
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

2. Boot up the guest under gdb with -numa node, with the cpus in one node set up to the boundary (80):
# gdb /usr/libexec/qemu-kvm
(gdb)  r -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

Actual results:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 35595)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
() from /lib64/libtcmalloc.so.4

(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x000000003bf1a1fc in free_and_trace (
    mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x000000003bfe0044 in container_get (
    root=<optimized out>, path=<optimized out>)
    at qom/container.c:46
        obj = 0x3cf10570
        child = <optimized out>
        parts = 0x3eb36fa0
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x000000003bd6f2e4 in memory_region_init (
    mr=0x3ef8d800, owner=0x0, name=0x0, size=4096)
    at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x000000003bd6f908 in memory_region_init_io (
    mr=0x3ef8d800, owner=<optimized out>, 
    ops=0x3c1f84d8 <subpage_ops>, opaque=0x3ef8d800, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x000000003bd1133c in subpage_init (base=1101659111424, 
    as=0x3c280cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x3ec7c530, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, 
          offset_within_region = 0, size = {lo = 4096, 
            hi = 0}, 
          offset_within_address_space = 1101659111424, 
          readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x000000003bd115c4 in mem_add (
    listener=<optimized out>, section=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x3ec7c530
        now = {mr = 0x3e3b69e0, 
          address_space = 0x3c280cf0 <address_space_memory>, offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, 
          readonly = false}
        remain = {mr = <optimized out>, 
          address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, 
            hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x000000003bd6e590 in address_space_update_topology_pass
    (as=0x3c280cf0 <address_space_memory>, adding=true, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=0x3cf17aa0, old_view=0x3cf17aa0)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x3c280d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x3eac3638
        frnew = 0x3eac31b8
#13 0x000000003bd7103c in address_space_update_topology (
    as=0x3c280cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x3cf17aa0
        new_view = 0x3cf17dd0
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x3c280cf0 <address_space_memory>
#15 0x000000003bf7b6ac in pci_update_mappings (d=0x3e999000)
    at hw/pci/pci.c:1167
        r = 0x3e999118
        i = 1
        new_addr = 3221360640
#16 0x000000003bf7bef0 in pci_default_write_config (
    d=0x3e999000, addr=<optimized out>, 
    val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x000000003bfb9be0 in virtio_write_config (
    pci_dev=0x3e999000, address=<optimized out>, 
    val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x3e999000
        vdev = 0x3e9a0f40
#18 0x000000003bf83ab8 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, 
    limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x000000003bdca3c0 in finish_write_pci_config (
    spapr=<optimized out>, buid=<optimized out>, 
    addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120320504)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x000000003bdc6554 in spapr_rtas_call (
    cpu=<optimized out>, spapr=<optimized out>, 
    token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x000000003bdc2324 in h_rtas (cpu=0x3db50000, 
    spapr=0x3d1e0000, opcode=<optimized out>, 
    args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120320472
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x000000003bdc42c8 in spapr_hypercall (cpu=0x3db50000, 
    opcode=61440, args=0x3fffb3b30030)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x000000003becd4ac in kvm_arch_handle_exit (
    cs=0x3db50000, run=0x3fffb3b30000)
    at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x000000003bd6be20 in kvm_cpu_exec (cpu=0x3db50000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1889
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x000000003bd51950 in qemu_kvm_cpu_thread_fn (
    arg=0x3db50000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x3db50000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread ()
   from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone ()
   from /lib64/power8/libc.so.6
No symbol table info available.
(gdb)
Comment 9 Miroslav Rezanina 2015-09-18 07:54:38 EDT
Fix included in qemu-kvm-rhev-2.3.0-24.el7
Comment 10 Shuang Yu 2015-09-22 03:39:33 EDT
Retested this issue with "qemu-kvm-rhev-2.3.0-24.el7" and "SLOF-20150313-5.gitc89b0df.el7.noarch": the guest boots up successfully.

Host version:
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-24.el7.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch

1.# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

# free -m
              total        used        free      shared  buff/cache   available
Mem:          31636         530       30176          17         929       30411
Swap:         16255           0       16255


2. /usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

3./usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 45056 -smp 8 -numa node,mem=45056

Actual Result:
After step 2, the guest boots up successfully.
After step 3, the guest boots up successfully.
Comment 11 Qunfang Zhang 2015-09-22 09:27:53 EDT
Setting to VERIFIED according to comment 10.
Comment 13 errata-xmlrpc 2015-12-04 11:57:53 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html
