Bug 892977 - qemu crashed when using macvtap with fd number is over 1024( already ulimit "open files" to 10240)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Amos Kong
QA Contact: Virtualization Bugs
Reported: 2013-01-08 06:02 EST by Qian Guo
Modified: 2015-05-24 20:07 EDT (History)
Fixed In Version: qemu-kvm-1.5.0-1.el7
Doc Type: Bug Fix
Last Closed: 2014-06-13 05:32:01 EDT
Type: Bug


Attachments:
qemu crashes when ulimit "fd" to 10240 (38.87 KB, text/plain), 2013-01-08 06:35 EST, Qian Guo

Description Qian Guo 2013-01-08 06:02:25 EST
Description of problem:
qemu-kvm cannot be started when the command line uses "... -netdev tap,fd= ..." with fd >= 1024. The shell reports:

-bash: 1024: Bad file descriptor

Version-Release number of selected component (if applicable):
# uname -r
3.6.0-0.29.el7.x86_64
# rpm -q qemu-kvm
qemu-kvm-1.3.0-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create macvtap devices until /dev/tap1024 exists.

I created 5000 macvtap devices with this script:
#!/bin/sh
# Create 5000 macvtap endpoints in VEPA mode on top of em1.
for i in $(seq 5000)
do
    ip link add link em1 name "vepa$i" type macvtap mode vepa
    echo "$i"
done

The corresponding tap devices appear under /dev/.

One macvtap device (interface index 1024, so its char device is /dev/tap1024) is listed below:
# ip -d link show vepa1020
1024: vepa1020@em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 52:cd:8d:ce:64:35 brd ff:ff:ff:ff:ff:ff
    macvtap  mode vepa 



2. Start qemu-kvm with the macvtap device, using a command line like this:
#...-netdev tap,id=macvtap_netdev,fd=1024 -device virtio-net-pci,netdev=macvtap_netdev,mac=52:cd:8d:ce:64:35 1024<>/dev/tap1024 ...

  
Actual results:
The qemu process could not be started. After step 2, the shell returns:

-bash: 1024: Bad file descriptor
 

Expected results:
The guest can be launched with a tap device when fd >= 1024.

Additional info:

1. If I delete all the macvtap devices (and the corresponding tap devices), then recreate one macvtap device and verify that its number is still greater than 1024, I hit the same issue. So it is not related to the number of tap devices.

2. When fd < 1024, the guest runs well.


   BTW, is there an upper limit on the number of macvtaps/taps per physical NIC and per host?
Comment 2 Qian Guo 2013-01-08 06:34:19 EST
The system has a limit like this:
# ulimit -a|grep open
open files                      (-n) 1024

so I raised the limit like this:

# ulimit -n 10240 

I then tried to launch a guest with a macvtap device on fd=1032, and qemu crashed like this:

# /usr/libexec/qemu-kvm -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1 -enable-kvm  -name testovs -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/rhel6u4_64.qcow2,if=none,format=qcow2,werror=stop,rerror=stop,cache=none,media=disk,id=drive-scsi0-disk0 -device virtio-scsi-pci,id=scsi0,addr=0x4 -device scsi-hd,scsi-id=0,lun=0,bus=scsi0.0,drive=drive-scsi0-disk0,id=virtio-disk0,bootindex=1 -nodefaults -nodefconfig -monitor stdio   -netdev tap,id=macvtap_netdev,fd=1032 -device virtio-net-pci,netdev=macvtap_netdev,mac=da:da:d3:7d:60:55 1032<>/dev/tap1032 -vnc :10 -vga std -boot menu=on
*** buffer overflow detected ***: /usr/libexec/qemu-kvm terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fb2c15374a7]
/lib64/libc.so.6(+0x3805b08620)[0x7fb2c1535620]
/lib64/libc.so.6(+0x3805b0a417)[0x7fb2c1537417]
/usr/libexec/qemu-kvm(+0x19b0dd)[0x7fb2c6afe0dd]
/usr/libexec/qemu-kvm(+0x1a9238)[0x7fb2c6b0c238]
/usr/libexec/qemu-kvm(main+0x1029)[0x7fb2c69e1379]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb2c144ea05]
/usr/libexec/qemu-kvm(+0x827a9)[0x7fb2c69e57a9]
======= Memory map: ========
7fb22a818000-7fb22a82d000 r-xp 00000000 fd:01 1049991                    /usr/lib64/libgcc_s-4.7.2-20121109.so.1
7fb22a82d000-7fb22aa2c000 ---p 00015000 fd:01 1049991                    /usr/lib64/libgcc_s-4.7.2-20121109.so.1

.......


7fff74e33000-7fff74e54000 rw-p 00000000 00:00 0                          [stack]
7fff74ebc000-7fff74ebd000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted (core dumped)



I will attach this crash message as a text file.
Comment 3 Qian Guo 2013-01-08 06:35:33 EST
Created attachment 674733 [details]
qemu crashes when ulimit "fd" to 10240
Comment 4 Qian Guo 2013-01-09 05:33:07 EST
QEMU does not have direct support for macvtap, so we use the tun/tap configuration interface.

Also, the fd number is not tied to the character device /dev/tapN. These char devices are created along with the macvtap network interfaces, with N being the network interface index of the new macvtap endpoint.


***
If I boot a guest like this, no issue occurs:

<qemu-kvm>... -device virtio-net-pci,netdev=macvtap_netdev,mac=da:da:d3:7d:60:59  -netdev tap,id=macvtap_netdev,fd=10 10<>/dev/tap5010 ...

***
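
The redirection used above (a low fd number such as 10 attached to /dev/tap5010) can be mirrored in C. The following is a minimal sketch, assuming a hypothetical helper open_on_fd() that is not part of QEMU; it opens a path read/write and moves the descriptor to a caller-chosen number with dup2(), which is roughly what the shell does for "10<>/dev/tap5010". It illustrates that the descriptor number is picked by the caller and is independent of the N in /dev/tapN.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: open path read/write and make it
 * available as descriptor number want_fd. Returns want_fd, or -1 on error.
 * Note: dup2() silently replaces whatever was already open on want_fd.
 */
static int open_on_fd(const char *path, int want_fd)
{
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        return -1;
    }
    if (fd == want_fd) {
        return fd;              /* already on the requested number */
    }
    if (dup2(fd, want_fd) < 0) {
        close(fd);
        return -1;              /* e.g. want_fd is above the open files limit */
    }
    close(fd);                  /* keep only the copy on want_fd */
    return want_fd;
}
```

For example, open_on_fd("/dev/tap5010", 10) would correspond to the working command above, after which "fd=10" can be handed to -netdev tap.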

So this bug is related only to the file descriptor number, and I have edited the bug summary to:

"qemu crashed when using macvtap with fd number is over 1024( already ulimit "open files" to 10240)"
Comment 5 Amos Kong 2013-01-24 08:24:57 EST
Create _one_ macvtap device

# ip link add link eth0 name vepa1 type macvtap mode vepa
# ip -d link show vepa1
134: vepa1@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500
    link/ether f2:24:ee:69:4e:ec brd ff:ff:ff:ff:ff:ff
    macvtap  mode vepa 

# ls /dev/tap*
/dev/tap134  /dev/tap7
# file /dev/tap*
/dev/tap134: character special
/dev/tap7:   empty

# qemu-kvm -device virtio-net-pci,netdev=macvtap_netdev,mac=f2:24:ee:69:4e:ec -netdev tap,id=macvtap_netdev,fd=1023 1023<>/dev/tap134 -vnc :0
(fine)

# qemu-kvm -device virtio-net-pci,netdev=macvtap_netdev,mac=f2:24:ee:69:4e:ec -netdev tap,id=macvtap_netdev,fd=1024 1024<>/dev/tap134 -vnc :0
bash: 1024: Bad file descriptor

# ulimit -a
open files                      (-n) 1024
max user processes              (-u) 1024

# ulimit -n 1025
# ulimit -u 1025
# qemu-kvm -device virtio-net-pci,netdev=macvtap_netdev,mac=f2:24:ee:69:4e:ec -netdev tap,id=macvtap_netdev,fd=1024 1024<>/dev/tap134 -vnc :0
(fine)

# qemu-kvm -device virtio-net-pci,netdev=macvtap_netdev,mac=f2:24:ee:69:4e:ec -netdev tap,id=macvtap_netdev,fd=1025 1025<>/dev/tap134 -vnc :0
bash: 1025: Bad file descriptor


In fd=1024, '1024' is an index into the fd table, so the number you use must be smaller than the open files limit (ulimit -n).
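
The rule shown by the transcript (a descriptor number is usable only if it is below the value reported by "ulimit -n") can be checked from C as well. This is a minimal sketch, assuming a hypothetical helper fd_number_within_limit() that does not exist in QEMU; it compares the number against the soft RLIMIT_NOFILE limit, which is the limit that "ulimit -n" displays.

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not QEMU code: returns true when fd_number is a
 * valid index into this process's descriptor table, that is
 * 0 <= fd_number < soft RLIMIT_NOFILE.
 */
static bool fd_number_within_limit(int fd_number)
{
    struct rlimit lim;

    if (fd_number < 0 || getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return true;            /* no soft limit configured */
    }
    return (rlim_t)fd_number < lim.rlim_cur;
}
```

With a soft limit of 1024, fd_number_within_limit(1023) is true and fd_number_within_limit(1024) is false, matching the "Bad file descriptor" boundary seen above.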
Comment 6 Amos Kong 2013-01-24 09:50:52 EST
I tried to reproduce the crash in comment #2:
- fedora 18 : qemu-kvm-1.2.0-23.fc18.x86_64 (can reproduce)
- rhel6: qemu-kvm-0.12.1.2-2.351.el6.x86_64 (couldn't reproduce)
- upstream qemu: (4b274b1603e1d15ef51aedc8b6b7ebbae0b555ce) (could not reproduce)
- rhel7: git://git.app.eng.bos.redhat.com/virt/rhel7/qemu-kvm.git (couldn't reproduce)


>>> rhel7: qemu-kvm-1.2.0-21.el7.x86_64 (official?) (can reproduce)

(gdb) bt
#0  0x00007ffff3356ba5 in raise () from /lib64/libc.so.6
#1  0x00007ffff3358358 in abort () from /lib64/libc.so.6
#2  0x00007ffff33963eb in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff342b4a7 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007ffff3429620 in __chk_fail () from /lib64/libc.so.6
#5  0x00007ffff342b417 in __fdelt_warn () from /lib64/libc.so.6
#6  0x000055555564725d in qemu_iohandler_poll (readfds=readfds@entry=0x555556009b60 <rfds>, writefds=writefds@entry=0x555556009be0 <wfds>, xfds=xfds@entry=
    0x555556009c60 <xfds>, ret=ret@entry=2) at iohandler.c:156
#7  0x00005555556ecae8 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:497
#8  0x00005555555cb6e3 in main_loop () at /usr/src/debug/qemu-kvm-1.2.0/vl.c:1643
#9  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /usr/src/debug/qemu-kvm-1.2.0/vl.c:3790

(gdb) frame 6
#6  0x000055555564725d in qemu_iohandler_poll (readfds=readfds@entry=0x555556009b60 <rfds>, writefds=writefds@entry=0x555556009be0 <wfds>, xfds=xfds@entry=
    0x555556009c60 <xfds>, ret=ret@entry=2) at iohandler.c:156
156	            if (!ioh->deleted && ioh->fd_read && FD_ISSET(ioh->fd, readfds)) {

(gdb) l
151	{
152	    if (ret > 0) {
153	        IOHandlerRecord *pioh, *ioh;
154	
155	        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
156	            if (!ioh->deleted && ioh->fd_read && FD_ISSET(ioh->fd, readfds)) {
157	                ioh->fd_read(ioh->opaque);
158	            }
159	            if (!ioh->deleted && ioh->fd_write && FD_ISSET(ioh->fd, writefds)) {
160	                ioh->fd_write(ioh->opaque);
Comment 7 Amos Kong 2013-01-25 03:20:19 EST
The problem can be reproduced with upstream QEMU.

# ./configure --target-list='x86_64-softmmu'

QEMU adds the tap fd to an fd_set for select()-based I/O polling, so the fd must be less than FD_SETSIZE (1024 on Linux).

glibc definition of __fdelt_warn:
  http://felix-lang.org/$/usr/include/x86_64-linux-gnu/bits/select2.h

So the solution for this bug is to add limit checks when initializing the tap device and when setting the fd handler.

Posted a patch: http://marc.info/?l=qemu-devel&m=135910170408260&w=3
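
The kind of limit check described in this comment can be sketched as follows. This is only an illustration under stated assumptions, not the posted patch: fd_set_add_checked() is a hypothetical name, and the sketch simply refuses any descriptor that the fixed-size fd_set bitmap cannot represent instead of letting FD_SET() or FD_ISSET() trip the fortify check in __fdelt_warn.

```c
#include <stdbool.h>
#include <sys/select.h>

/*
 * Hypothetical helper, not the actual QEMU patch: add fd to set only when
 * the fixed-size fd_set can represent it. Returns false for descriptors
 * outside the range 0 .. FD_SETSIZE - 1.
 */
static bool fd_set_add_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        return false;           /* FD_SET() here would index past the bitmap */
    }
    FD_SET(fd, set);
    return true;
}
```

A caller would report an error for fd=1032 at setup time rather than aborting later inside the main loop.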
Comment 8 Amos Kong 2013-02-01 03:17:06 EST
The crash is due to the fixed size of the fd_set type used for select(2) event polling. Stefan posted a series to convert select() to g_poll().
  http://marc.info/?l=qemu-devel&m=135962966729930&w=3
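
Why a poll-style interface avoids the crash can be shown with plain poll(2), used here as a stand-in for g_poll(). This is a minimal sketch under assumptions: poll_on_fd_number() is a hypothetical demo function, not QEMU code. It places the read end of a pipe on a chosen descriptor number (raising the soft open files limit if needed) and waits on it. Because struct pollfd stores the descriptor as an ordinary int, numbers at or above FD_SETSIZE are handled the same way as small ones.

```c
#include <poll.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical demo, not QEMU code: put the read end of a pipe on
 * descriptor number target_fd, make it readable, and wait on it with
 * poll(). Returns the revents reported for target_fd, or -1 on error.
 * Note: dup2() silently replaces whatever was already open on target_fd.
 */
static int poll_on_fd_number(int target_fd)
{
    struct rlimit lim;
    int pipefd[2];
    int revents = -1;

    if (target_fd < 0 || getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    /* Raise the soft open files limit if target_fd does not fit yet. */
    if (lim.rlim_cur != RLIM_INFINITY && lim.rlim_cur <= (rlim_t)target_fd) {
        lim.rlim_cur = (rlim_t)target_fd + 1;
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
            return -1;          /* e.g. the hard limit is too low */
        }
    }
    if (pipe(pipefd) != 0) {
        return -1;
    }
    if (pipefd[0] != target_fd && pipefd[1] != target_fd &&
        dup2(pipefd[0], target_fd) == target_fd) {
        /* The fd travels as a plain int, so there is no bitmap to overflow. */
        struct pollfd pfd = { .fd = target_fd, .events = POLLIN };

        if (write(pipefd[1], "x", 1) == 1 && poll(&pfd, 1, 1000) == 1) {
            revents = pfd.revents;
        }
        close(target_fd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return revents;
}
```

Calling poll_on_fd_number(1032) is expected to report POLLIN when the hard limit allows that descriptor number, whereas passing the same number to FD_SET() or FD_ISSET() is what triggered the abort in the backtrace.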
Comment 9 Amos Kong 2013-02-25 02:47:15 EST
http://marc.info/?l=qemu-devel&m=136135632516801
[PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

The patchset was applied upstream.
Comment 10 Miroslav Rezanina 2013-05-23 07:58:37 EDT
Built in qemu-kvm-1.5.0-1.el7
Comment 12 Jun Li 2014-01-15 05:46:31 EST
Reproduce this bug:
Version-Release number of selected component (if applicable):
qemu-kvm-1.4.0-4.el7.x86_64
3.10.0-65.el7.x86_64
---
1. Create macvtap devices until /dev/tap1024 exists.

I created 5000 macvtap devices with this script:
#!/bin/sh
# Create 5000 macvtap endpoints in VEPA mode on top of p4096p4.
for i in $(seq 5000)
do
    ip link add link p4096p4 name "vepa$i" type macvtap mode vepa
    echo "$i"
done

The corresponding tap devices appear under /dev/.

One macvtap device (interface index 1029, so its char device is /dev/tap1029) is listed below:
# ip -d link show vepa1020
1029: vepa1020@p4096p4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
    link/ether ea:ab:13:6f:1e:9f brd ff:ff:ff:ff:ff:ff promiscuity 0 
    macvtap  mode vepa 

2. Start qemu-kvm with the macvtap device, using this command line:
# gdb --args /usr/libexec/qemu-kvm -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1 -enable-kvm  -name testovs -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/RHEL-Server-7.0-64.qcow2_v3,if=none,format=qcow2,werror=stop,rerror=stop,cache=none,media=disk,id=drive-scsi0-disk0 -device virtio-scsi-pci,id=scsi0,addr=0x4 -device scsi-hd,scsi-id=0,lun=0,bus=scsi0.0,drive=drive-scsi0-disk0,id=virtio-disk0,bootindex=1 -nodefaults -nodefconfig -monitor stdio   -netdev tap,id=macvtap_netdev,fd=1029 -device virtio-net-pci,netdev=macvtap_netdev,mac=ea:ab:13:6f:1e:9f 1029<>/dev/tap1029 -vnc :10 -vga std -boot menu=on
---
After step 2, qemu-kvm dumps core.
(gdb) bt
#0  0x00007ffff3b35979 in raise () from /lib64/libc.so.6
#1  0x00007ffff3b37088 in abort () from /lib64/libc.so.6
#2  0x00007ffff3b76127 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff3c0db07 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007ffff3c0bcd0 in __chk_fail () from /lib64/libc.so.6
#5  0x00007ffff3c0da77 in __fdelt_warn () from /lib64/libc.so.6
#6  0x00005555556a6069 in qemu_iohandler_poll ()
#7  0x00005555556ab62e in main_loop_wait ()
#8  0x00005555555bba6d in main ()
-------
So the issue is reproduced.
---------
Verify this bug:

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-31.el7.x86_64
3.10.0-65.el7.x86_64

Same steps as under "Reproduce this bug". After step 2, I logged in to the guest and the guest obtained an IP address.
Guest and host work well.
As shown above, this bug has been verified.
Comment 14 Ludek Smid 2014-06-13 05:32:01 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
