Bug 1262670 - [PowerKVM]SIGSEGV when boot up guest with -numa node and set up the cpus in one node to the boundary
Summary: [PowerKVM]SIGSEGV when boot up guest with -numa node and set up the cpus in o...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Thomas Huth
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: RHEV3.6PPC 1277183 1277184
Reported: 2015-09-14 03:31 UTC by Shuang Yu
Modified: 2016-02-21 11:15 UTC
CC List: 12 users

Fixed In Version: qemu-kvm-rhev-2.3.0-24.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-04 16:57:53 UTC
Target Upstream Version:
Embargoed:


Attachments
Numa boundary screenshot (27.13 KB, image/png)
2015-09-14 03:33 UTC, Shuang Yu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2546 0 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2015-12-04 21:11:56 UTC

Description Shuang Yu 2015-09-14 03:31:44 UTC
Description of problem:
Boot up a guest with -numa node and set the cpus in one node to the
boundary; the guest hits "signal SIGSEGV, Segmentation fault."

Version-Release number of selected component (if applicable):
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest version:
RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso

How reproducible:
1/3

Steps to Reproduce:

1.Check the host cpu info:
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

2.Boot up the guest with -numa node and set up the cpus in one node to the boundary 80:
# gdb /usr/libexec/qemu-kvm
(gdb)  r -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

3.

Actual results:
The guest will hit "signal SIGSEGV, Segmentation fault."

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 73249)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4


Expected results:
The guest should boot up normally.

Additional info:
(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free () from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x00000000347fa29c in free_and_trace (mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00000000348c00e4 in container_get (root=<optimized out>, 
    path=<optimized out>) at qom/container.c:46
        obj = 0x357f0570
        child = <optimized out>
        parts = 0x37417a80
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x000000003464f384 in memory_region_init (mr=0x37866c00, owner=0x0, 
    name=0x0, size=4096) at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x000000003464f9a8 in memory_region_init_io (mr=0x37866c00, 
    owner=<optimized out>, ops=0x34ad84c8 <subpage_ops>, opaque=0x37866c00, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x00000000345f139c in subpage_init (base=1101659111424, 
    as=0x34b60cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x3755c530, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, offset_within_region = 0, 
          size = {lo = 4096, hi = 0}, 
          offset_within_address_space = 1101659111424, readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x00000000345f1624 in mem_add (listener=<optimized out>, 
    section=<optimized out>) at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x3755c530
        now = {mr = 0x36c969e0, 
          address_space = 0x34b60cf0 <address_space_memory>, 
          offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, readonly = false}
        remain = {mr = <optimized out>, address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x000000003464e630 in address_space_update_topology_pass (
    as=0x34b60cf0 <address_space_memory>, adding=<optimized out>, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=<optimized out>, old_view=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x34b60d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x373a3638
        frnew = 0x373a31b8
#13 0x00000000346510dc in address_space_update_topology (
    as=0x34b60cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x357f7aa0
        new_view = 0x357f7dd0
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x34b60cf0 <address_space_memory>
#15 0x000000003485b74c in pci_update_mappings (d=0x37279000)
    at hw/pci/pci.c:1167
        r = 0x37279118
        i = 1
        new_addr = 3221360640
#16 0x000000003485bf90 in pci_default_write_config (d=0x37279000, 
    addr=<optimized out>, val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x0000000034899c80 in virtio_write_config (pci_dev=0x37279000, 
    address=<optimized out>, val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x37279000
        vdev = 0x37280f40
#18 0x0000000034863b58 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, limit=<optimized out>, 
    val=<optimized out>, len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x00000000346aa460 in finish_write_pci_config (spapr=<optimized out>, 
    buid=<optimized out>, addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120320504)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x00000000346a65f4 in spapr_rtas_call (cpu=<optimized out>, 
    spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x00000000346a23c4 in h_rtas (cpu=0x36430000, spapr=0x35ac0000, 
    opcode=<optimized out>, args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120320472
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x00000000346a4368 in spapr_hypercall (cpu=0x36430000, opcode=61440, 
    args=0x3fffb3b30030) at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x00000000347ad54c in kvm_arch_handle_exit (cs=0x36430000, 
    run=0x3fffb3b30000) at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x000000003464bec0 in kvm_cpu_exec (cpu=0x36430000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1908
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x00000000346319b0 in qemu_kvm_cpu_thread_fn (arg=0x36430000)
    at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x36430000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread () from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone () from /lib64/power8/libc.so.6
No symbol table info available.
(gdb) 


The attachment is a screenshot of the guest at the time it hit the SIGSEGV.

Comment 1 Shuang Yu 2015-09-14 03:33:12 UTC
Created attachment 1073047
Numa boundary screenshot

Comment 3 Thomas Huth 2015-09-14 14:32:27 UTC
FWIW, I can re-create the crash even with a slightly shorter command line (to rule out some of the options):

/usr/libexec/qemu-kvm -machine pseries,accel=kvm,usb=off -drive file=/var/lib/libvirt/images/isos/RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -m 8G -smp 80 -numa node,cpus=0-79 -nographic -vga none

Comment 4 Shuang Yu 2015-09-15 10:13:43 UTC
Booting up a guest with its 'mem' in one NUMA node set to the valid boundary value hits the SIGSEGV problem again.

Version-Release number of selected component (if applicable):
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-22.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest version:
RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso

How reproducible:
1/5

Steps to Reproduce:
1.Check the mem info on the host:
# free -m
              total        used        free      shared  buff/cache   available
Mem:          31636         548       23439          27        7648       30236
Swap:         16255           0       16255

2.Boot up guest with one numa node and set the 'mem' to the valid boundary value:
/usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 32768 -smp 8 -numa node,mem=32768


Actual results:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 42365)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
    () from /lib64/libtcmalloc.so.4



Expected results:
The guest should boot up normally.

Additional info:
(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x000000002855a29c in free_and_trace (
    mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00000000286200e4 in container_get (
    root=<optimized out>, path=<optimized out>)
    at qom/container.c:46
        obj = 0x29550570
        child = <optimized out>
        parts = 0x2a647680
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x00000000283af384 in memory_region_init (
    mr=0x2af29000, owner=0x0, name=0x0, size=4096)
    at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x00000000283af9a8 in memory_region_init_io (
    mr=0x2af29000, owner=<optimized out>, 
    ops=0x288384c8 <subpage_ops>, opaque=0x2af29000, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x000000002835139c in subpage_init (base=1101659111424, 
    as=0x288c0cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x2a9b9f60, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, 
          offset_within_region = 0, size = {lo = 4096, 
            hi = 0}, 
          offset_within_address_space = 1101659111424, 
          readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x0000000028351624 in mem_add (
    listener=<optimized out>, section=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x2a9b9f60
        now = {mr = 0x29628f00, 
          address_space = 0x288c0cf0 <address_space_memory>, offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, 
          readonly = false}
        remain = {mr = <optimized out>, 
          address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, 
            hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x00000000283ae630 in address_space_update_topology_pass
    (as=0x288c0cf0 <address_space_memory>, adding=true, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=0x295518c0, old_view=0x295518c0)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x288c0d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x2ab304f0
        frnew = 0x2ab32438
#13 0x00000000283b10dc in address_space_update_topology (
    as=0x288c0cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x295518c0
        new_view = 0x29552d60
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x288c0cf0 <address_space_memory>
#15 0x00000000285bb70c in pci_update_mappings (d=0x2ab09000)
    at hw/pci/pci.c:1159
        r = 0x2ab090f0
        i = 0
        new_addr = 18446744073709551615
#16 0x00000000285bbf90 in pci_default_write_config (
    d=0x2ab09000, addr=<optimized out>, 
    val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x00000000285f9c80 in virtio_write_config (
    pci_dev=0x2ab09000, address=<optimized out>, 
    val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x2ab09000
        vdev = 0x2ab10f40
#18 0x00000000285c3b58 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, 
    limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x000000002840a460 in finish_write_pci_config (
    spapr=<optimized out>, buid=<optimized out>, 
    addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120181232)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x00000000284065f4 in spapr_rtas_call (
    cpu=<optimized out>, spapr=<optimized out>, 
    token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x00000000284023c4 in h_rtas (cpu=0x2a0e0000, 
    spapr=0x29820000, opcode=<optimized out>, 
    args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120181200
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x0000000028404368 in spapr_hypercall (cpu=0x2a0e0000, 
    opcode=61440, args=0x3fffb3b30030)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x000000002850d54c in kvm_arch_handle_exit (
    cs=0x2a0e0000, run=0x3fffb3b30000)
    at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x00000000283abec0 in kvm_cpu_exec (cpu=0x2a0e0000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1908
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x00000000283919b0 in qemu_kvm_cpu_thread_fn (
    arg=0x2a0e0000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x2a0e0000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread ()
   from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone ()
   from /lib64/power8/libc.so.6
No symbol table info available.
(gdb)

Comment 5 Thomas Huth 2015-09-15 10:27:46 UTC
Since the crash always occurs within the tcmalloc library, I've now built a version of qemu-kvm without this library, and indeed, I have not seen a crash with that version yet. So the problem might be related to tcmalloc.

Comment 6 Thomas Huth 2015-09-15 17:09:59 UTC
I think I likely found the problem. Linking against ElectricFence instead of tcmalloc revealed the following:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffef01aeb20 (LWP 107603)]
0x000000004f39f148 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
51	  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) bt
#0  0x000000004f39f148 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#1  spapr_populate_drconf_memory (fdt=0x3ffddd8f8000, spapr=0x3fff957cfd40) at /home/thuth/devel/qemu/hw/ppc/spapr.c:795
#2  spapr_h_cas_compose_response (spapr=0x3fff957cfd40, addr=508403712, size=32764, cpu_update=<optimized out>, memory_update=<optimized out>)
    at /home/thuth/devel/qemu/hw/ppc/spapr.c:832
#3  0x000000004f3a35b0 in h_client_architecture_support (cpu_=<optimized out>, spapr=0x3fff957cfd40, opcode=<optimized out>, args=0x3ffeef990030)
    at /home/thuth/devel/qemu/hw/ppc/spapr_hcall.c:963
#4  0x000000004f3a43c8 in spapr_hypercall (cpu=0x3ffef10f8c50, opcode=61442, args=0x3ffeef990030) at /home/thuth/devel/qemu/hw/ppc/spapr_hcall.c:1009
#5  0x000000004f4ad60c in kvm_arch_handle_exit (cs=0x3ffef10f8c50, run=0x3ffeef990000) at /home/thuth/devel/qemu/target-ppc/kvm.c:1588
#6  0x000000004f34bee0 in kvm_cpu_exec (cpu=0x3ffef10f8c50) at /home/thuth/devel/qemu/kvm-all.c:1908
#7  0x000000004f3319d0 in qemu_kvm_cpu_thread_fn (arg=0x3ffef10f8c50) at /home/thuth/devel/qemu/cpus.c:944
#8  0x00003fffb7bf8728 in start_thread (arg=0x3ffef01aeb20) at pthread_create.c:310
#9  0x00003fffb6db7ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109
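
For context on why the ElectricFence build faults directly in memcpy instead of later inside the allocator: guard-page allocators place each buffer so that it ends right before an inaccessible page, so the first out-of-bounds write traps immediately. Below is a minimal sketch of that placement, assuming Linux mmap/mprotect. guarded_alloc is a hypothetical helper for illustration only, not ElectricFence's API, and it ignores alignment of the returned pointer.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns a buffer of `len` bytes
 * whose end touches a PROT_NONE guard page. Writing one byte past the end
 * faults at the faulty instruction instead of corrupting the heap silently.
 * The caller releases the whole mapping with munmap(*map_out, *map_len_out).
 */
void *guarded_alloc(size_t len, void **map_out, size_t *map_len_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (len + page - 1) / page;
    if (data_pages == 0) {
        data_pages = 1;                 /* always map at least one data page */
    }
    size_t map_len = (data_pages + 1) * page;   /* data pages + guard page */

    uint8_t *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED) {
        return NULL;
    }

    /* Make the last page inaccessible; it acts as the guard. */
    uint8_t *guard = map + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(map, map_len);
        return NULL;
    }

    *map_out = map;
    *map_len_out = map_len;
    return guard - len;                 /* buffer ends exactly at the guard */
}
```

With the default allocator, a small overrun usually lands in neighbouring heap data, and the damage only shows up at some later malloc/free call.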

And indeed, in spapr_populate_drconf_memory() there is a bug: the "int_buf" is used twice, once for setting the "ibm,dynamic-memory" property and once for the "ibm,associativity-lookup-arrays" property, but the size of the buffer is calculated only from the size of the first property! So if the second property is bigger than the first one, you get a "nice" buffer overrun, which then finally leads to the segmentation fault later. I'll write a patch to fix this issue.
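
The sizing mistake described above can be illustrated with a minimal C sketch. This is not the QEMU code or the actual patch: the function names and the per-property cell counts (6 cells per entry plus 1, and 4 cells per node plus 2) are assumptions chosen only for illustration. The point is that a scratch buffer shared by two properties has to be sized for the larger of the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical property sizes, counted in 32-bit cells (assumed values). */
size_t first_prop_cells(size_t nr_entries) { return nr_entries * 6 + 1; }
size_t second_prop_cells(size_t nr_nodes)  { return nr_nodes * 4 + 2; }

/* Buggy sizing: only the first property is taken into account. */
size_t buggy_buf_cells(size_t nr_entries, size_t nr_nodes)
{
    (void)nr_nodes;                     /* the second property is ignored */
    return first_prop_cells(nr_entries);
}

/* Safer sizing: the shared buffer must fit the larger property. */
size_t safe_buf_cells(size_t nr_entries, size_t nr_nodes)
{
    size_t a = first_prop_cells(nr_entries);
    size_t b = second_prop_cells(nr_nodes);
    return a > b ? a : b;
}

/* Build both properties in one scratch buffer; returns 0 on success. */
int build_props(size_t nr_entries, size_t nr_nodes)
{
    size_t cells = safe_buf_cells(nr_entries, nr_nodes);
    uint32_t *buf = calloc(cells, sizeof(*buf));
    if (buf == NULL) {
        return -1;
    }

    /* Stand-in for filling the first property. */
    size_t n = first_prop_cells(nr_entries);
    assert(n <= cells);
    for (size_t i = 0; i < n; i++) {
        buf[i] = (uint32_t)i;
    }

    /* Stand-in for filling the second property in the same buffer. */
    n = second_prop_cells(nr_nodes);
    assert(n <= cells);
    for (size_t i = 0; i < n; i++) {
        buf[i] = (uint32_t)(i + 1);
    }

    free(buf);
    return 0;
}
```

With these assumed sizes, zero entries and one node give a buggy buffer of 1 cell while the second property needs 6, which is the kind of overrun described in this comment. If build_props() used buggy_buf_cells() for the allocation, the second assert would fire.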

Comment 7 Shuang Yu 2015-09-17 09:54:02 UTC
Retested this issue with "qemu-kvm-rhev-2.3.0-12.el7.ppc64le" 10 times; did not hit this bug.
Retested this issue with "qemu-kvm-rhev-2.3.0-13.el7.ppc64le"; hit this bug.
Retested this issue with "qemu-kvm-rhev-2.3.0-14.el7.ppc64le"; hit this bug.

Host kernel and SLOF version:
kernel-3.10.0-313.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch

Steps:
1.check the host cpu info:
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

2.Boot up the guest with -numa node and set up the cpus in one node to the boundary 80:
# gdb /usr/libexec/qemu-kvm
(gdb)  r -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

Actual results:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb434eb10 (LWP 35595)]
0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
() from /lib64/libtcmalloc.so.4

(gdb) bt full
#0  0x00003fffb77cef60 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
No symbol table info available.
#1  0x00003fffb77cf108 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#2  0x00003fffb77e5c08 in tc_free ()
   from /lib64/libtcmalloc.so.4
No symbol table info available.
#3  0x000000003bf1a1fc in free_and_trace (
    mem=<optimized out>) at vl.c:2590
No locals.
#4  0x00003fffb7a6abe8 in g_free ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00003fffb7a8f874 in g_strfreev ()
   from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x000000003bfe0044 in container_get (
    root=<optimized out>, path=<optimized out>)
    at qom/container.c:46
        obj = 0x3cf10570
        child = <optimized out>
        parts = 0x3eb36fa0
        i = <optimized out>
        __PRETTY_FUNCTION__ = "container_get"
#7  0x000000003bd6f2e4 in memory_region_init (
    mr=0x3ef8d800, owner=0x0, name=0x0, size=4096)
    at /usr/src/debug/qemu-2.3.0/memory.c:917
No locals.
#8  0x000000003bd6f908 in memory_region_init_io (
    mr=0x3ef8d800, owner=<optimized out>, 
    ops=0x3c1f84d8 <subpage_ops>, opaque=0x3ef8d800, 
    name=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/memory.c:1187
No locals.
#9  0x000000003bd1133c in subpage_init (base=1101659111424, 
    as=0x3c280cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/exec.c:2058
        mmio = <optimized out>
#10 register_subpage (d=0x3ec7c530, section=0x3fffb434d9e8)
    at /usr/src/debug/qemu-2.3.0/exec.c:1004
        subpage = <optimized out>
        base = 1101659111424
        existing = <optimized out>
        subsection = {mr = 0x0, address_space = 0x0, 
          offset_within_region = 0, size = {lo = 4096, 
            hi = 0}, 
          offset_within_address_space = 1101659111424, 
          readonly = false}
        start = <optimized out>
        end = <optimized out>
        __PRETTY_FUNCTION__ = "register_subpage"
#11 0x000000003bd115c4 in mem_add (
    listener=<optimized out>, section=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/exec.c:1043
        left = <optimized out>
        as = <optimized out>
        d = 0x3ec7c530
        now = {mr = 0x3e3b69e0, 
          address_space = 0x3c280cf0 <address_space_memory>, offset_within_region = 0, size = {lo = 1, hi = 0}, 
          offset_within_address_space = 1101659111886, 
          readonly = false}
        remain = {mr = <optimized out>, 
          address_space = <optimized out>, 
          offset_within_region = 0, size = {lo = 1, 
            hi = <optimized out>}, 
          offset_within_address_space = <optimized out>, 
          readonly = <optimized out>}
        page_size = {lo = 4096, hi = 0}
#12 0x000000003bd6e590 in address_space_update_topology_pass
    (as=0x3c280cf0 <address_space_memory>, adding=true, 
    new_view=<optimized out>, new_view=<optimized out>, 
    old_view=0x3cf17aa0, old_view=0x3cf17aa0)
    at /usr/src/debug/qemu-2.3.0/memory.c:776
        _listener = 0x3c280d38 <address_space_memory+72>
        iold = <optimized out>
        inew = <optimized out>
        frold = 0x3eac3638
        frnew = 0x3eac31b8
#13 0x000000003bd7103c in address_space_update_topology (
    as=0x3c280cf0 <address_space_memory>)
    at /usr/src/debug/qemu-2.3.0/memory.c:805
        old_view = 0x3cf17aa0
        new_view = 0x3cf17dd0
#14 memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.3.0/memory.c:845
        as = 0x3c280cf0 <address_space_memory>
#15 0x000000003bf7b6ac in pci_update_mappings (d=0x3e999000)
    at hw/pci/pci.c:1167
        r = 0x3e999118
        i = 1
        new_addr = 3221360640
#16 0x000000003bf7bef0 in pci_default_write_config (
    d=0x3e999000, addr=<optimized out>, 
    val_in=<optimized out>, l=<optimized out>)
    at hw/pci/pci.c:1219
        i = <optimized out>
        was_irq_disabled = 0
        val = <optimized out>
        __PRETTY_FUNCTION__ = "pci_default_write_config"
#17 0x000000003bfb9be0 in virtio_write_config (
    pci_dev=0x3e999000, address=<optimized out>, 
    val=<optimized out>, len=<optimized out>)
    at hw/virtio/virtio-pci.c:452
        proxy = 0x3e999000
        vdev = 0x3e9a0f40
#18 0x000000003bf83ab8 in pci_host_config_write_common (
    pci_dev=<optimized out>, addr=<optimized out>, 
    limit=<optimized out>, val=<optimized out>, 
    len=<optimized out>) at hw/pci/pci_host.c:57
        __PRETTY_FUNCTION__ = "pci_host_config_write_common"
#19 0x000000003bdca3c0 in finish_write_pci_config (
    spapr=<optimized out>, buid=<optimized out>, 
    addr=<optimized out>, size=<optimized out>, 
    val=<optimized out>, rets=2120320504)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_pci.c:186
        pci_dev = <optimized out>
#20 0x000000003bdc6554 in spapr_rtas_call (
    cpu=<optimized out>, spapr=<optimized out>, 
    token=<optimized out>, nargs=<optimized out>, 
    args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_rtas.c:613
        call = <error reading variable call (value has been optimized out)>
#21 0x000000003bdc2324 in h_rtas (cpu=0x3db50000, 
    spapr=0x3d1e0000, opcode=<optimized out>, 
    args=<optimized out>)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:579
        rtas_r3 = 2120320472
        token = 8215
        nargs = 5
        nret = <optimized out>
#22 0x000000003bdc42c8 in spapr_hypercall (cpu=0x3db50000, 
    opcode=61440, args=0x3fffb3b30030)
    at /usr/src/debug/qemu-2.3.0/hw/ppc/spapr_hcall.c:1009
        fn = <optimized out>
        spapr = <optimized out>
        __func__ = "spapr_hypercall"
#23 0x000000003becd4ac in kvm_arch_handle_exit (
    cs=0x3db50000, run=0x3fffb3b30000)
    at /usr/src/debug/qemu-2.3.0/target-ppc/kvm.c:1588
        cpu = <optimized out>
        __func__ = "kvm_arch_handle_exit"
        env = <optimized out>
        ret = <optimized out>
#24 0x000000003bd6be20 in kvm_cpu_exec (cpu=0x3db50000)
    at /usr/src/debug/qemu-2.3.0/kvm-all.c:1889
        run = 0x3fffb3b30000
        ret = <optimized out>
        run_ret = <optimized out>
#25 0x000000003bd51950 in qemu_kvm_cpu_thread_fn (
    arg=0x3db50000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x3db50000
        r = <optimized out>
#26 0x00003fffb7bf8728 in start_thread ()
   from /lib64/power8/libpthread.so.0
No symbol table info available.
#27 0x00003fffb6d47ae0 in clone ()
   from /lib64/power8/libc.so.6
No symbol table info available.
(gdb)

Comment 9 Miroslav Rezanina 2015-09-18 11:54:38 UTC
Fix included in qemu-kvm-rhev-2.3.0-24.el7

Comment 10 Shuang Yu 2015-09-22 07:39:33 UTC
Retested this issue with "qemu-kvm-rhev-2.3.0-24.el7" and "SLOF-20150313-5.gitc89b0df.el7.noarch"; the guest boots up successfully.

Host version:
kernel-3.10.0-313.el7.ppc64le
qemu-kvm-rhev-2.3.0-24.el7.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch

1. # lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             2
NUMA node(s):          2
Model:                 8247-21L
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72

# free -m
              total        used        free      shared  buff/cache   available
Mem:          31636         530       30176          17         929       30411
Swap:         16255           0       16255


2. /usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 8G -smp 80 -numa node,cpus=0-79

3. /usr/libexec/qemu-kvm -name numa-test -machine pseries,accel=kvm,usb=off -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi -drive file=RHEL-6.7-20150304.0-Server-ppc64-2.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -drive file=RHEL-6.7-20150304.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 -device virtio-scsi-pci,id=bus2,addr=0x5 -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 -uuid f17c415d-e0b1-4771-b5da-87fb46bc73fd -m 45056 -smp 8 -numa node,mem=45056

Actual Result:
After step 2, the guest boots up successfully.
After step 3, the guest boots up successfully.
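
For anyone re-running this verification, the following is a minimal sketch (not part of the original test procedure) of how one might check whether a given command line actually exercises the boundary case from the bug summary, i.e. a "-numa node" whose "cpus=" range ends at the last vCPU declared by "-smp". The helper names expand_range and numa_boundary_report are hypothetical, and the parsing assumes "-smp" is given as a plain number or as "cpus=N" in its first field.

```python
import shlex


def expand_range(spec):
    """Expand "0-79" or "5" into a list of CPU indexes (hypothetical helper)."""
    first, _, last = spec.partition("-")
    return list(range(int(first), int(last or first) + 1))


def numa_boundary_report(cmdline):
    """Summarise -smp and -numa node,cpus=... options from a qemu-kvm command line.

    Assumption: the first field of -smp is either "N" or "cpus=N".
    """
    args = shlex.split(cmdline)
    smp = None
    nodes = []
    # Walk over (option, value) pairs of adjacent arguments.
    for opt, val in zip(args, args[1:]):
        if opt == "-smp":
            smp = int(val.split(",")[0].split("=")[-1])
        elif opt == "-numa" and val.split(",")[0] == "node":
            cpus = []
            for field in val.split(",")[1:]:
                if field.startswith("cpus="):
                    cpus.extend(expand_range(field[len("cpus="):]))
            nodes.append(sorted(cpus))
    # The boundary case: some node's highest CPU index equals smp - 1.
    reaches = smp is not None and any(n and n[-1] == smp - 1 for n in nodes)
    return {"smp": smp, "nodes": nodes, "reaches_boundary": reaches}


if __name__ == "__main__":
    step2 = "/usr/libexec/qemu-kvm -m 8G -smp 80 -numa node,cpus=0-79"
    step3 = "/usr/libexec/qemu-kvm -m 45056 -smp 8 -numa node,mem=45056"
    for label, cmd in (("step 2", step2), ("step 3", step3)):
        report = numa_boundary_report(cmd)
        print(label, "smp:", report["smp"],
              "reaches_boundary:", report["reaches_boundary"])
```

With the shortened command lines above, step 2 is reported as reaching the boundary (cpus=0-79 with -smp 80), while step 3 specifies only "mem=" for its node and so covers the memory-only variant instead.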

Comment 11 Qunfang Zhang 2015-09-22 13:27:53 UTC
Setting to VERIFIED according to comment 10.

Comment 13 errata-xmlrpc 2015-12-04 16:57:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html

