Bug 216243

Summary: xm create fails after 8 VMs are running
Product: Red Hat Enterprise Linux 5
Reporter: George Toft <george>
Component: xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Version: 5.0
CC: ddomingo
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-02-26 19:04:36 UTC Type: ---
Bug Blocks: 197865    

Description George Toft 2006-11-17 22:43:38 UTC
Description of problem:
xm create fails once 8 VMs are running (or 4 VMs if each VM also uses a CD-ROM
loopback device).  Initially the limit was 4 VMs.  By editing
/etc/xen/<config_file> and changing this line:
disk = [ 'file:/usr/local/xen/windows2003-new.dsk,hda,w', 'file:/usr/local/xen/ISO/windows2003.iso,hdc:cdrom,r' ]
to:
disk = [ 'file:/usr/local/xen/windows2003-new.dsk,hda,w' ]

we were able to double the number of VMs.  We tried increasing the number of
loopback devices (MAKEDEV -m 128 loop), which created more loopback device
nodes, but xm create still failed.  In fact, we could not create more than 2
VMs after that change.  Rebooting without creating the extra loopback devices
restored our 8-VM limit.


Version-Release number of selected component (if applicable):
# uname -a
Linux xenhvip 2.6.18-1.2732.el5xen #1 SMP Tue Oct 17 18:34:31 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa | grep xen
kernel-xen-2.6.18-1.2732.el5
xen-libs-3.0.3-2.el5
xen-libs-3.0.3-2.el5
xen-3.0.3-2.el5


How reproducible:
# xm create win2k3xen9
Using config file "/etc/xen/win2k3xen9".
Error: Device 768 (vbd) could not be connected. Backend device not found.
#


Steps to Reproduce:
1. Start 8 VMs (or 4, each with 2 drives mapped to files).
2. Issue an xm create command for one more VM (see above).

  
Actual results:
# xm create win2k3xen9
Using config file "/etc/xen/win2k3xen9".
Error: Device 768 (vbd) could not be connected. Backend device not found.
#


Expected results:
# xm create win2k3xen9
# xm create win2k3xen10
# xm create win2k3xen11
...

Additional info:

Comment 1 Brian Stein 2006-11-20 15:18:41 UTC
Please confirm the arch (currently labeled ia64 only).

Comment 2 George Toft 2006-11-20 18:18:39 UTC
We are using RHEL5 on a Dell 2950 server.  It may NOT be IA64-specific, as the
following losetup commands fail on x86 as well.

Further troubleshooting shows that losetup (as invoked by the Xen scripts)
fails.  Once all 8 VMs are up, issuing losetup -f yields:
# losetup -f
losetup: could not find any free loop device
# losetup /dev/loop8
loop: can't open device /dev/loop8: No such device or address
[root@xenhvip xen]# ls -l /dev/loop*
brw-r----- 1 root disk 7, 0 Nov 20 10:38 /dev/loop0
brw-r----- 1 root disk 7, 1 Nov 20 10:38 /dev/loop1
brw-r----- 1 root disk 7, 2 Nov 20 10:38 /dev/loop2
brw-r----- 1 root disk 7, 3 Nov 20 10:38 /dev/loop3
brw-r----- 1 root disk 7, 4 Nov 20 10:38 /dev/loop4
brw-r----- 1 root disk 7, 5 Nov 20 10:38 /dev/loop5
brw-r----- 1 root disk 7, 6 Nov 20 10:38 /dev/loop6
brw-r----- 1 root disk 7, 7 Nov 20 10:38 /dev/loop7
brw-r----- 1 root disk 7, 8 Nov 20 10:38 /dev/loop8
brw-r----- 1 root disk 7, 9 Nov 20 10:38 /dev/loop9
#

The device node exists, but the system does not recognize it.

/etc/udev/makedev.d/50-udev.nodes was modified to include loop8 and loop9 at
line 16.  /boot/grub/grub.conf was modified to add max_loop=64 to the kernel
command line, and the system was rebooted.  Grepping dmesg for "loop" shows the
kernel command line and a message that the loop driver is limited to a maximum
of 8 devices:
# dmesg | grep loop
Kernel command line: ro root=/dev/VolGroup00/LogVol00 max_loop=64
loop: loaded (max 8 devices)
#
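The dmesg output above explains the losetup failure: the /dev/loop8 and /dev/loop9 nodes exist, but the loop driver registered only 8 devices, which fits the "No such device or address" error from losetup /dev/loop8. A minimal bash sketch with hypothetical helper names (nothing here is part of Xen or util-linux) that extracts the limit from the kernel message and lists the device nodes beyond it. It assumes the usual naming in which /dev/loopN has minor number N, as in the ls -l listing above.

```shell
#!/bin/bash
# Hypothetical helpers, not part of any Xen or util-linux tooling.

# Read kernel log text on stdin and print N from the last
# "loop: loaded (max N devices)" message.
loop_driver_max() {
    sed -n 's/.*loop: loaded (max \([0-9][0-9]*\) devices).*/\1/p' | tail -n 1
}

# Print every loopN node under DIR (default /dev) whose index is >= MAX.
# Assumes the index in the name equals the minor number (loop8 -> 7, 8).
nodes_beyond_limit() {
    local max=$1 dir=${2:-/dev} node n
    for node in "$dir"/loop[0-9]*; do
        [ -e "$node" ] || continue               # glob matched nothing
        n=${node##*/loop}
        case $n in *[!0-9]*) continue ;; esac    # skip names like loop1p1
        if [ "$n" -ge "$max" ]; then
            echo "$node"
        fi
    done
}

# Usage on the affected host:
#   max=$(dmesg | loop_driver_max)
#   nodes_beyond_limit "$max"    # would list /dev/loop8 and /dev/loop9 here
```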

File /etc/modules.conf was created with the following contents:
options loop max_loop=64
and the system was rebooted.  Problem still exists.




Comment 3 Daniel Berrangé 2006-11-20 18:44:35 UTC
I have confirmed this behaviour: it appears that each file-backed virtual disk
configured for an HVM guest uses one loopback device. With 1 disk per guest x 8
guests, you'll hit the loopback driver limit.
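The arithmetic above (one loop device per file-backed disk, 8 available by default) can be checked against the guest configs before starting another guest. A minimal bash sketch with hypothetical helper names, assuming one loop device per "file:" entry as this comment suggests, and assuming each config keeps its whole disk = [ ... ] list on a single line.

```shell
#!/bin/bash
# Hypothetical helpers for estimating loop-device usage by Xen guests.

# Count "file:" entries on the disk = [ ... ] line of one guest config.
# Assumes the whole disk list sits on a single line.
count_file_disks() {
    grep -E '^[[:space:]]*disk[[:space:]]*=' "$1" | grep -o 'file:' | wc -l
}

# Sum file-backed disks over the given configs, print the total, and
# return success only if the total fits within LIMIT loop devices.
check_loop_budget() {
    local limit=$1 total=0 cfg n
    shift
    for cfg in "$@"; do
        n=$(count_file_disks "$cfg")
        total=$((total + n))
    done
    echo "$total"
    [ "$total" -le "$limit" ]
}

# Usage: pass the configs of every guest that will be running, e.g.
#   check_loop_budget 8 /etc/xen/win2k3xen[0-9]* || echo "not enough loop devices"
```

With the two-disk config from the original description, four guests already total 8 loop devices, which matches the reported 4-VM limit.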

Now, the interesting question is *why* these loopback devices are getting
created at all. The qemu-dm device model for HVM guests is perfectly happy
accessing the raw files directly; it has no need for the loopback device.

Looking at a running guest shows that no process is actually using the loop devices:

# grep disk /etc/xen/demo 
disk = [ "file:/xen/demo.img,hda,w", "file:/root/boot.iso,hdc:cdrom,r" ]
[root@localhost ~]# ps -axuwf | grep loop
root     18631  0.0  0.0      0     0 ?        S<   13:32   0:00 [loop0]
root     18673  0.0  0.0      0     0 ?        S<   13:32   0:00 [loop1]
[root@localhost ~]# lsof /dev/loop0 
[root@localhost ~]# lsof /dev/loop1
[root@localhost ~]# lsof /root/boot.iso 
COMMAND   PID USER   FD   TYPE DEVICE    SIZE    NODE NAME
qemu-dm 18551 root    6u   REG  253,0 6711296 1933665 /root/boot.iso
[root@localhost ~]# lsof /xen/demo.img 
COMMAND   PID USER   FD   TYPE DEVICE       SIZE    NODE NAME
qemu-dm 18551 root    5u   REG  253,0 4294967297 1277954 /xen/demo.img



Comment 4 George Toft 2006-11-20 19:42:08 UTC
Here is the output on the target machine, showing all 8 loopback devices allocated:

# ps -ef | egrep "loop|xen"
root        21    19  0 10:38 ?        00:00:00 [xenwatch]
root        22    19  0 10:38 ?        00:00:00 [xenbus]
avahi     2696     1  0 10:39 ?        00:00:00 avahi-daemon: running [xenhvip.local]
root      2963     1  0 10:39 ?        00:00:00 xenstored --pid-file /var/run/xenstore.pid
root      2967     1  0 10:39 ?        00:00:00 python /usr/sbin/xend start
root      2969  2967  0 10:39 ?        00:00:02 python /usr/sbin/xend start
root      2971     1  0 10:39 ?        00:00:00 xenconsoled
root      3679  2969  0 10:40 ?        00:00:09 /usr/lib64/xen/bin/qemu-dm -d 1 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen9 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:44,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      3861     1  0 10:40 ?        00:00:00 [loop0]
root      3913  2969  0 10:40 ?        00:00:08 /usr/lib64/xen/bin/qemu-dm -d 2 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen8 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:42,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4038     1  0 10:40 ?        00:00:00 [loop1]
root      4059  2969  0 10:40 ?        00:00:10 /usr/lib64/xen/bin/qemu-dm -d 3 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen7 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:38,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4194     1  0 10:40 ?        00:00:00 [loop2]
root      4273  2969  0 10:41 ?        00:00:11 /usr/lib64/xen/bin/qemu-dm -d 4 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen6 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:36,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4399     1  0 10:41 ?        00:00:00 [loop3]
root      4426  2969  0 10:41 ?        00:00:11 /usr/lib64/xen/bin/qemu-dm -d 5 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen5 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:36,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4554     1  0 10:41 ?        00:00:00 [loop4]
root      4593  2969  0 10:41 ?        00:00:11 /usr/lib64/xen/bin/qemu-dm -d 6 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen4 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:1d,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4742     1  0 10:41 ?        00:00:00 [loop5]
root      4768  2969  0 10:41 ?        00:00:09 /usr/lib64/xen/bin/qemu-dm -d 7 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen3 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:32,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      4923     1  0 10:41 ?        00:00:00 [loop6]
root      4948  2969  0 10:41 ?        00:00:09 /usr/lib64/xen/bin/qemu-dm -d 8 -m 500 -boot c -serial pty -vcpus 1 -acpi -domain-name win2k3xen2 -net nic,vlan=1,macaddr=00:16:3e:4b:f8:30,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vncunused -k en-us -vnclisten 0.0.0.0
root      5110     1  0 10:41 ?        00:00:00 [loop7]
root      6547  4207  0 12:41 pts/5    00:00:00 egrep loop|xen
#


# lsof | grep loop
loop0     3861      root  cwd       DIR              253,0       4096          2 /
loop0     3861      root  rtd       DIR              253,0       4096          2 /
loop0     3861      root  txt   unknown                                        /proc/3861/exe
loop1     4038      root  cwd       DIR              253,0       4096          2 /
loop1     4038      root  rtd       DIR              253,0       4096          2 /
loop1     4038      root  txt   unknown                                        /proc/4038/exe
loop2     4194      root  cwd       DIR              253,0       4096          2 /
loop2     4194      root  rtd       DIR              253,0       4096          2 /
loop2     4194      root  txt   unknown                                        /proc/4194/exe
loop3     4399      root  cwd       DIR              253,0       4096          2 /
loop3     4399      root  rtd       DIR              253,0       4096          2 /
loop3     4399      root  txt   unknown                                        /proc/4399/exe
loop4     4554      root  cwd       DIR              253,0       4096          2 /
loop4     4554      root  rtd       DIR              253,0       4096          2 /
loop4     4554      root  txt   unknown                                        /proc/4554/exe
loop5     4742      root  cwd       DIR              253,0       4096          2 /
loop5     4742      root  rtd       DIR              253,0       4096          2 /
loop5     4742      root  txt   unknown                                        /proc/4742/exe
loop6     4923      root  cwd       DIR              253,0       4096          2 /
loop6     4923      root  rtd       DIR              253,0       4096          2 /
loop6     4923      root  txt   unknown                                        /proc/4923/exe
loop7     5110      root  cwd       DIR              253,0       4096          2 /
loop7     5110      root  rtd       DIR              253,0       4096          2 /
loop7     5110      root  txt   unknown                                        /proc/5110/exe
#


Comment 5 Daniel Berrangé 2006-11-22 00:56:17 UTC
It has so far proved impractical to stop HVM guests from using loop devices,
so we need to make sure max_loop works as advertised.

Regarding the comment:

> File /etc/modules.conf was created with the following contents:
> options loop max_loop=64
> and system was rebooted.  Problem still exists.

I cannot reproduce this behaviour. I added 'options loop max_loop=256' to
modprobe.conf, rebooted, and it configured 256 loop devices without problems:

# grep loop /etc/modprobe.conf
options loop max_loop=256
# dmesg | grep loop
loop: loaded (max 256 devices)
# ls /dev/loop* | wc -l
256

Please try just adding the modprobe.conf setting, without changing the udev or
grub configs.


Comment 6 Daniel Berrangé 2006-11-22 16:24:32 UTC
On re-reading, it appears you used the wrong config file for specifying the
loop device parameters.  'modules.conf' is obsolete and hasn't been used for
many years now (for compatibility it was a symlink to modprobe.conf for a while
too, but even that's gone now).  Please re-test with 'options loop
max_loop=256' in modprobe.conf instead.
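A minimal sketch of the requested change as a bash helper (the name set_loop_max is hypothetical). It assumes GNU sed for in-place editing, replaces any existing "options loop" line in the given file (dropping any other loop options on that line), or appends a new line if none exists. The setting only takes effect the next time the loop module is loaded.

```shell
#!/bin/bash
# Hypothetical helper: ensure CONF contains "options loop max_loop=N".
set_loop_max() {
    local conf=$1 n=$2
    local re='^[[:space:]]*options[[:space:]]+loop[[:space:]]'
    if [ -f "$conf" ] && grep -Eq "$re" "$conf"; then
        # Rewrite the existing line in place (GNU sed -i); any other
        # options on that line are discarded.
        sed -E -i "s/${re}.*/options loop max_loop=${n}/" "$conf"
    else
        # No loop options yet (or no file at all): append a new line.
        echo "options loop max_loop=${n}" >> "$conf"
    fi
}

# Usage (as root):
#   set_loop_max /etc/modprobe.conf 256
#   rmmod loop; modprobe loop   # or reboot; rmmod fails while a loop device is in use
#   dmesg | grep loop           # expect "loop: loaded (max 256 devices)"
```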


Comment 7 George Toft 2006-11-22 16:56:50 UTC
As requested . . .
50-udev.nodes restored to original setting
grub.conf restored to original setting
modprobe.conf line added
modules.conf deleted.
(Yes, I used the wrong config file - sorry)


Last login: Mon Nov 20 14:51:10 2006 from 192.168.111.1
[root@rhel5 ~]# grep loop /etc/modprobe.conf
options loop max_loop=256
[root@rhel5 ~]# dmesg | grep loop
[root@rhel5 ~]# ls /dev/loop* | wc -l
8
[root@rhel5 ~]# modprobe loop
[root@rhel5 ~]# ls /dev/loop* | wc -l
256
[root@rhel5 ~]#

This command now works (it did not previously):
[root@rhel5 ~]# for I in `seq 0 255`; do  dd if=/dev/zero of=/tmp/file$I bs=1k count=10; losetup /dev/loop$I /tmp/file$I; done


Using lsof to validate:
[root@rhel5 ~]# lsof | grep loop | wc -l
768
[root@rhel5 ~]#


Rebooted and ran following commands:
[root@rhel5 ~]# lsof | grep loop | wc -l
0
[root@rhel5 ~]# for I in `seq 0 255`; do  dd if=/dev/zero of=/tmp/file$I bs=1k count=10; losetup /dev/loop$I /tmp/file$I; done
[root@rhel5 ~]# lsof | grep loop | wc -l
768
[root@rhel5 ~]#
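The 768 figure is 256 x 3: as the lsof output in Comment 4 shows, each [loopN] kernel thread contributes three lines (cwd, rtd and txt). A small awk filter wrapped in a hypothetical bash helper counts distinct loop threads instead of raw lines, assuming (as observed in Comment 4) one kernel thread per bound loop device, so it reports 256 here rather than 768.

```shell
#!/bin/bash
# Hypothetical helper: read lsof output on stdin and print how many
# distinct [loopN] kernel threads appear. Each thread is listed three
# times (cwd, rtd, txt), so "lsof | grep loop | wc -l" gives 3x this.
count_loop_threads() {
    awk '$1 ~ /^loop[0-9]+$/ { seen[$1] = 1 }
         END { n = 0; for (k in seen) n++; print n }'
}

# Usage:
#   lsof | count_loop_threads
```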


Problem resolved.  Thank you very much.

George



Comment 8 Daniel Berrangé 2006-11-22 17:03:20 UTC
OK, great.  So it sounds like we just need to add documentation about adding
'options loop max_loop=64' (or larger) to modprobe.conf if you want more than
8 file-backed disks for HVM guests.


Comment 11 RHEL Program Management 2006-11-28 02:46:00 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 12 RHEL Program Management 2006-11-28 02:46:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 13 George Toft 2006-11-30 22:52:21 UTC
Considering our initial error in editing /etc/modules.conf, would it make sense 
to create that file with contents similar to this:

# NOTE *** NOTE *** NOTE *** NOTE
#
# Use of modules.conf is obsolete.  THE CONTENTS OF THIS FILE ARE IGNORED.
# Please edit modprobe.conf to pass parameters to modules.
# For more information, please view the modprobe.conf manpage.
#
# NOTE *** NOTE *** NOTE *** NOTE
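If a stub like this were shipped, installing it should not clobber a modules.conf an administrator has already written. A minimal bash sketch; the helper name is hypothetical, and the path defaults to /etc/modules.conf but can be overridden.

```shell
#!/bin/bash
# Hypothetical helper: write the warning stub suggested above,
# refusing to overwrite a file that already exists.
install_modules_conf_stub() {
    local path=${1:-/etc/modules.conf}
    if [ -e "$path" ]; then
        echo "refusing to overwrite existing $path" >&2
        return 1
    fi
    cat > "$path" <<'EOF'
# NOTE *** NOTE *** NOTE *** NOTE
#
# Use of modules.conf is obsolete.  THE CONTENTS OF THIS FILE ARE IGNORED.
# Please edit modprobe.conf to pass parameters to modules.
# For more information, please view the modprobe.conf manpage.
#
# NOTE *** NOTE *** NOTE *** NOTE
EOF
}
```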