Bug 1199873 - qemu-kvm core dumped when boot guest with 232 virtio block and multifunction=on
Summary: qemu-kvm core dumped when boot guest with 232 virtio block and multifunction=on
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Fam Zheng
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1005016
Blocks:
Reported: 2015-03-09 03:40 UTC by mazhang
Modified: 2016-09-20 04:41 UTC
CC List: 17 users

Fixed In Version: qemu-kvm-0.12.1.2-2.462.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-22 06:09:19 UTC
Target Upstream Version:
Embargoed:


Attachments
full command line (49.06 KB, text/plain)
2015-03-09 03:40 UTC, mazhang
no flags
back trace (4.12 KB, text/plain)
2015-03-09 04:39 UTC, mazhang
no flags


Links
System: Red Hat Product Errata | ID: RHBA-2015:1275 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: qemu-kvm bug fix and enhancement update | Last Updated: 2015-07-20 17:49:16 UTC

Description mazhang 2015-03-09 03:40:13 UTC
Description of problem:
qemu-kvm dumps core when booting a guest with 232 virtio block devices and multifunction=on.
This seems to be the same issue as bz895436 in RHEL 7.

Version-Release number of selected component (if applicable):

Host:
qemu-kvm-tools-0.12.1.2-2.456.el6.x86_64
gpxe-roms-qemu-0.9.7-6.12.el6.noarch
qemu-img-0.12.1.2-2.456.el6.x86_64
qemu-kvm-0.12.1.2-2.456.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.456.el6.x86_64
2.6.32-539.el6.x86_64

Guest:
RHEL6.6-32 GA

How reproducible:
always

Steps to Reproduce:
1. Boot the guest with 232 virtio block devices and multifunction=on.

Actual results:
qemu-kvm dumps core.

qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:4042: main_loop_wait: Assertion `ioh->fd < 1024' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff4a5f625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.0.22-3.el6.x86_64 celt051-0.5.1.3-0.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6.x86_64 cyrus-sasl-md5-2.1.23-15.el6.x86_64 cyrus-sasl-plain-2.1.23-15.el6.x86_64 db4-4.7.25-18.el6_4.x86_64 dbus-libs-1.2.24-7.el6_3.x86_64 flac-1.2.1-6.1.el6.x86_64 glib2-2.28.8-4.el6.x86_64 glibc-2.12-1.149.el6.x86_64 glusterfs-api-3.6.0.28-2.el6.x86_64 glusterfs-libs-3.6.0.28-2.el6.x86_64 gnutls-2.8.5-14.el6_5.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libICE-1.0.6-1.el6.x86_64 libSM-1.2.1-2.el6.x86_64 libX11-1.6.0-2.2.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXext-1.3.2-2.1.el6.x86_64 libXi-1.7.2-2.2.el6.x86_64 libXtst-1.2.2-2.1.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libasyncns-0.8-1.1.el6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libgcrypt-1.4.5-11.el6_4.x86_64 libgpg-error-1.7-4.el6.x86_64 libjpeg-turbo-1.2.1-3.el6_5.x86_64 libogg-1.1.4-2.1.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libsndfile-1.0.20-5.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 libtasn1-2.3-6.el6_5.x86_64 libuuid-2.17.2-12.18.el6.x86_64 libvorbis-1.2.3-4.el6_2.1.x86_64 libxcb-1.9.1-2.el6.x86_64 lzo-2.03-3.1.el6_5.1.x86_64 nss-softokn-freebl-3.14.3-15.el6.x86_64 openssl-1.0.1e-30.el6.x86_64 pixman-0.32.4-4.el6.x86_64 pulseaudio-libs-0.9.21-17.el6.x86_64 snappy-1.1.0-1.el6.x86_64 spice-server-0.12.4-11.el6.x86_64 tcp_wrappers-libs-7.6-57.el6.x86_64 usbredir-0.5.1-1.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt full
#0  0x00007ffff4a5f625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff4a60e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007ffff4a5874e in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007ffff4a58810 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007ffff7db3c06 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4042
        ioh = 0x7fffe8020fb0
        rfds = {fds_bits = {0 <repeats 16 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = <value optimized out>
        nfds = -1
        tv = {tv_sec = 0, tv_usec = 999482}
        __PRETTY_FUNCTION__ = "main_loop_wait"
#5  0x00007ffff7dd715a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2258
        fds = {492, 493}
        mask = {__val = {268443712, 0 <repeats 15 times>}}
        sigfd = 494
#6  0x00007ffff7db83d7 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4285
        r = <value optimized out>
#7  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6742
        gdbstub_dev = 0x0
        i = <value optimized out>
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x7ffff7f8c10f ""
        boot_devices = "cad", '\000' <repeats 29 times>
        ds = <value optimized out>


Expected results:
qemu-kvm boots the guest without crashing.

Additional info:

Comment 1 mazhang 2015-03-09 03:40:50 UTC
Created attachment 999433
full command line

Comment 3 mazhang 2015-03-09 04:39:40 UTC
Created attachment 999434
back trace

Comment 4 Amos Kong 2015-03-09 13:55:07 UTC
Is this a duplicate of bug 1005016?

Comment 5 Ademar Reis 2015-03-12 20:08:41 UTC
(In reply to Amos Kong from comment #4)
> Is this a duplicate of bug 1005016?

I think it's the same as Bug 895436, which is a dupe of Bug 1003535; both are fixed in RHEL7/upstream.

Marcel: I reassigned it to you. If you have reasons to believe it's a different issue, please reassign it back to me. Thanks.

Comment 6 Vlad Yasevich 2015-03-13 13:35:46 UTC
I think this will all be addressed by the patches for Bug 1005016.

-vlad

Comment 7 Marcel Apfelbaum 2015-03-24 14:19:38 UTC
Hi Ademar,

It is the same symptom but it doesn't look like the same root cause.
Those bugs were caused by the way the address space nodes were duplicated for each device, which filled up the radix tree. That code path involves the MemoryRegion and AddressSpace features, which are not part of RHEL 6.

The problem here is that some file descriptor exceeds a predefined limit, and this is why I think Vlad may be right. I would advise QE to verify it with Vlad's proposed patch from BZ 1005016.

I am not saying that once this problem is solved we will not have a memory issue; we just don't know yet.

Since this is a virtio/block issue, I don't think I am the right person for it, at least until the fd issue is solved.

Thanks,
Marcel

Comment 8 Ademar Reis 2015-03-24 14:55:10 UTC
(In reply to Marcel Apfelbaum from comment #7)
> Hi Ademar,
> 
> It is the same symptom but it doesn't look like the same root cause.
> Those bugs were caused by the way the address space nodes were duplicated
> for each device, which filled up the radix tree. That code path involves
> the MemoryRegion and AddressSpace features, which are not part of RHEL 6.
> 
> The problem here is that some file descriptor exceeds a predefined limit,
> and this is why I think Vlad may be right. I would advise QE to verify it
> with Vlad's proposed patch from BZ 1005016.
> 
> I am not saying that once this problem is solved we will not have a memory
> issue; we just don't know yet.
> 
> Since this is a virtio/block issue, I don't think I am the right person for
> it, at least until the fd issue is solved.
> 

Got it, reassigning it to Fam.

Setting NEEDINFO(QE) for a retest with the patches from BZ 1005016. A new build with the patches should be available in a matter of days (patches fully ack'd already)

Comment 9 Ademar Reis 2015-03-24 14:55:47 UTC
(In reply to Ademar Reis from comment #8)
> 
> Setting NEEDINFO(QE) for a retest with the patches from BZ 1005016. A new
> build with the patches should be available in a matter of days (patches
> fully ack'd already)

Forgot the needinfo.

Comment 10 mazhang 2015-03-25 11:08:32 UTC
Hi Fam,

bz1005016 is still in POST status and I can't find the new build there. Could you paste the link here?

Thanks,
Mazhang.

Comment 14 Fam Zheng 2015-03-26 01:01:00 UTC
Thanks Vlad and Jeff.

Maosheng, could you proceed with the testing?

Comment 18 Fam Zheng 2015-03-26 03:48:17 UTC
Please collect an "strace -f" log of qemu-kvm.

Comment 19 Fam Zheng 2015-03-26 06:27:45 UTC
I've looked at the trace; the ioctl(KVM_IOEVENTFD) failed:

  5082  ioctl(5, 0x4040ae79, 0x7faf7b5fd380) = -1 ENOSPC (No space left on device)

This is because the number of devices on the host kernel's KVM I/O bus hit the limit:

/* Caller must hold slots_lock. */
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
			    struct kvm_io_device *dev)
{
	struct kvm_io_bus *new_bus, *bus;

	bus = kvm->buses[bus_idx];
	/* exclude ioeventfd which is limited by maximum fd */
	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
		return -ENOSPC;
...

qemu-kvm exits because ioeventfd is required for virtio-blk-dataplane. Please test again without dataplane.

Comment 20 Amos Kong 2015-03-26 06:28:59 UTC
(In reply to Fam Zheng from comment #18)
> Please collect an "strace -f" log of qemu-kvm.

unable to map ioeventfd: -28

The error is expected: the host kernel doesn't have enough ioeventfds (kvm_io_bus). virtio-net should fall back to userspace and continue to work.

Comment 21 mazhang 2015-03-26 07:26:14 UTC
(In reply to Fam Zheng from comment #19)
> I've looked at the trace; the ioctl(KVM_IOEVENTFD) failed:
> 
>   5082  ioctl(5, 0x4040ae79, 0x7faf7b5fd380) = -1 ENOSPC (No space left on
> device)
> 
> This is because the number of devices on the host kernel's KVM I/O bus hit the limit:
> 
> /* Caller must hold slots_lock. */
> int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> 			    struct kvm_io_device *dev)
> {
> 	struct kvm_io_bus *new_bus, *bus;
> 
> 	bus = kvm->buses[bus_idx];
> 	/* exclude ioeventfd which is limited by maximum fd */
> 	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
> 		return -ENOSPC;
> ...
> 
> qemu-kvm exits because ioeventfd is required for virtio-blk-dataplane.
> Please test again without dataplane.

Without dataplane, the qemu-kvm HMP shows many of the following messages, but qemu-kvm does not quit.

qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace (slower).
qemu-kvm: virtio_pci_set_host_notifier_internal: unable to map ioeventfd: -28
qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace (slower).
qemu-kvm: virtio_pci_set_host_notifier_internal: unable to map ioeventfd: -28
qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace (slower).

Comment 22 Vlad Yasevich 2015-03-26 12:24:49 UTC
Please be sure that you are testing with kernel-2.6.32-545.el6.  There is a dependency on Bug 1124311 which should be fixed in the -545.el6 kernels.

-vlad

Comment 23 mazhang 2015-03-27 01:55:13 UTC
(In reply to Vlad Yasevich from comment #22)
> Please be sure that you are testing with kernel-2.6.32-545.el6.  There is a
> dependency on Bug 1124311 which should be fixed in the -545.el6 kernels.
> 
> -vlad

With kernel-2.6.32-546.el6 I also hit "unable to map ioeventfd: -28".

Comment 24 Fam Zheng 2015-03-27 02:02:23 UTC
> Without dataplane, the qemu-kvm HMP shows many of the following messages,
> but qemu-kvm does not quit.
> 
> qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace
> (slower).

Yes, this is expected.

Comment 25 mazhang 2015-03-27 02:25:37 UTC
(In reply to Fam Zheng from comment #24)
> > Without dataplane, the qemu-kvm HMP shows many of the following messages,
> > but qemu-kvm does not quit.
> > 
> > qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace
> > (slower).
> 
> Yes, this is expected.

As Vlad said in comment #22, kernel-2.6.32-545.el6 fixed "unable to map ioeventfd: -28", but I still hit the problem in my test. Does this mean Bug 1124311 is not completely fixed?

Comment 26 Amos Kong 2015-03-27 03:22:10 UTC
(In reply to mazhang from comment #25)
> (In reply to Fam Zheng from comment #24)
> > > without dataplane, qemu-kvm hmp got a lot of following messeags, but not
> > > quit.
> > > 
> > > qemu-kvm: virtio_pci_start_ioeventfd: failed. Fallback to a userspace
> > > (slower).
> > 
> > Yes, this is expected.
> 
> As Vlad said in comment #22, kernel-2.6.32-545.el6 fixed "unable to map
> ioeventfd: -28", but I still hit the problem in my test. Does this mean Bug
> 1124311 is not completely fixed?

Yes, bz 1124311 wasn't fixed correctly. The fix might cause a host kernel panic.
I will revert it in bz 1205442.

Please make sure you used kernel-2.6.32-545.el6 or kernel-2.6.32-546.el6.

"unable to map ioeventfd: -28" means the host kernel doesn't have enough io_bus slots.
Right now, unlike the RHEL 7 kernel, the RHEL 6 kernel still doesn't support an unlimited number of ioeventfds.
I will not fix that unless someone really needs it.

You can compile qemu-kvm-rhel6 on RHEL 7 or Fedora, where the host has enough ioeventfds. To verify the fd 1024 issue, using many ioeventfds is one way, but not the only way.

Comment 27 Amos Kong 2015-03-27 04:06:46 UTC
> You can compile qemu-kvm-rhel6 on RHEL 7 or Fedora, where the host has
> enough ioeventfds. To verify the fd 1024 issue, using many ioeventfds is
> one way, but not the only way.

I found an easy way to reproduce this bug: just open a lot of chardevs.

ulimit -n 1200
/usr/libexec/qemu-kvm \
-chardev file,path=/dev/null,id=a1  \
-chardev file,path=/dev/null,id=a2  \
-chardev file,path=/dev/null,id=a3  \
-chardev file,path=/dev/null,id=a4  \
....
-chardev file,path=/dev/null,id=a1023  \
-chardev file,path=/dev/null,id=a1024  \
-chardev file,path=/dev/null,id=a1025  \
-chardev file,path=/dev/null,id=a1026  \
-chardev file,path=/dev/null,id=a1027  \
-chardev file,path=/dev/null,id=a1028  \

Comment 28 Amos Kong 2015-03-27 04:12:25 UTC
BTW:

qemu-kvm-0.12.1.2-2.462.el6  : can't reproduce
qemu-kvm-0.12.1.2-2.459.el6  : can reproduce (main_loop_wait: Assertion `ioh->fd < 1024' failed.)

The bug has been fixed.

Comment 29 mazhang 2015-03-27 07:17:21 UTC
(In reply to Amos Kong from comment #27)
> > You can compile qemu-kvm-rhel6 on RHEL 7 or Fedora, where the host has
> > enough ioeventfds. To verify the fd 1024 issue, using many ioeventfds is
> > one way, but not the only way.
> 
> I found an easy way to reproduce this bug: just open a lot of chardevs.
> 
> ulimit -n 1200
> /usr/libexec/qemu-kvm \
> -chardev file,path=/dev/null,id=a1  \
> -chardev file,path=/dev/null,id=a2  \
> -chardev file,path=/dev/null,id=a3  \
> -chardev file,path=/dev/null,id=a4  \
> ....
> -chardev file,path=/dev/null,id=a1023  \
> -chardev file,path=/dev/null,id=a1024  \
> -chardev file,path=/dev/null,id=a1025  \
> -chardev file,path=/dev/null,id=a1026  \
> -chardev file,path=/dev/null,id=a1027  \
> -chardev file,path=/dev/null,id=a1028  \

Reproduced this bug as above with qemu-kvm-0.12.1.2-2.458.el6:

[root@dhcp-11-16 ~]# sh aaa.sh 
VNC server running on `::1:5900'
qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/vl.c:4042: main_loop_wait: Assertion `ioh->fd < 1024' failed.
aaa.sh: line 1030: 25524 Aborted                 (core dumped) /usr/libexec/qemu-kvm -chardev file,path=/dev/null,id=a0 

Verified this bug with qemu-kvm-0.12.1.2-2.462.el6.

Result:
qemu-kvm no longer dumps core.

This bug has been fixed. Thanks for Amos's help!

Thanks,
Mazhang.

Comment 31 errata-xmlrpc 2015-07-22 06:09:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1275.html

