RHEVM seems not to like the <services> block of the deployable.xml. With that block enabled, the rhevm instance never launches (it retries indefinitely) and conductor reports a "pending" state. The same deployable.xml works in ec2, and the same deployable.xml minus the <services> block lets the rhevm instance launch. The best guess right now is that rhevm does not like the userdata in the instance, but that could change as investigation continues.

[root@dell-t7400-01 ~]# rpm -qa | egrep "imagefactory|iwhd|deltacloud|aeolus" | sort
aeolus-all-0.8.0-38.el6.noarch
aeolus-conductor-0.8.0-38.el6.noarch
aeolus-conductor-daemons-0.8.0-38.el6.noarch
aeolus-conductor-doc-0.8.0-38.el6.noarch
aeolus-configure-2.5.0-15.el6.noarch
deltacloud-core-0.5.0-5.el6.noarch
deltacloud-core-ec2-0.5.0-5.el6.noarch
deltacloud-core-rhevm-0.5.0-5.el6.noarch
deltacloud-core-vsphere-0.5.0-5.el6.noarch
imagefactory-1.0.0rc8-1.el6.noarch
imagefactory-jeosconf-ec2-fedora-1.0.0rc8-1.el6.noarch
imagefactory-jeosconf-ec2-rhel-1.0.0rc8-1.el6.noarch
iwhd-1.2-3.el6.x86_64
rubygem-aeolus-cli-0.3.0-11.el6.noarch
rubygem-aeolus-image-0.3.0-10.el6.noarch
rubygem-deltacloud-client-0.5.0-2.el6.noarch
rubygem-imagefactory-console-0.4.0-1.el6.noarch
Created attachment 566630 [details] deployable.xml
Created attachment 566631 [details] rhel_template
I was able to successfully launch this deployable with the rdu rhevm cluster that dradez built. However, we realized that the vdsm-hook-floppyinject RPM was old (didn't contain the base64 decode code). After updating the floppyinject hook on the hypervisors and restarting rhevm, the deployment failed the same way as dgao is reporting in this bug.
We redirected all output of the floppyinject hook to a log file and captured this output:

shahar: /bin/mount -o loop,uid=36,gid=36 /tmp/deltacloud-user-data.txt /tmp/tmpWsCn2Y
floppyinject: error /bin/mount: mount: could not find any free loop device
floppyinject: [unexpected error]: Traceback (most recent call last):
  File "/usr/libexec/vdsm/hooks/before_vm_start/50_floppyinject", line 138, in <module>
    createFloppy(filename, path, content)
  File "/usr/libexec/vdsm/hooks/before_vm_start/50_floppyinject", line 96, in createFloppy
    sys.exit(2)
SystemExit: 2

This is telling us that the hypervisor has eaten through all 8 of its available loopback devices. In other words, by default you can only launch 8 guests in rhevm that have "user_data" before you hit this problem. One possibility is to bump up the number of loopback devices, but that's only a temporary measure. Ultimately, the floppyinject hook needs to clean up after itself. It's just unclear how it can know when to clean up old loopbacks.
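To confirm on a hypervisor that the loop devices really are exhausted, something like the following sketch could help. It assumes the classic `losetup -a` output layout (`/dev/loopN: [dev]:inode (backing file)`), that `losetup` is on the PATH, and the default limit of 8 devices mentioned above. The function names `parse_losetup` and `loop_usage` are made up for illustration and are not part of the hook.

```python
import subprocess


def parse_losetup(output):
    """Map each in-use loop device to its backing file (or None).

    Assumes `losetup -a` style lines such as:
    /dev/loop0: [fd00]:131 (/path/to/backing/file)
    """
    in_use = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("/dev/loop"):
            continue
        device, _, rest = line.partition(":")
        backing = None
        # The backing file is the trailing parenthesised field, if present.
        if rest.endswith(")") and "(" in rest:
            backing = rest[rest.rindex("(") + 1:-1]
        in_use[device] = backing
    return in_use


def loop_usage(max_loop=8, run=subprocess.check_output):
    """Return (used, free) loop device counts; max_loop=8 is the assumed default."""
    text = run(["losetup", "-a"])
    if isinstance(text, bytes):
        text = text.decode("utf-8", "replace")
    used = len(parse_losetup(text))
    return used, max_loop - used
```

If `loop_usage()` reports zero free devices while most of the backing files are `/tmp/deltacloud-user-data.txt`-style paths, that would point at the hook leaking loopbacks rather than at some other consumer.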
The hook should only need a loopback dev while it's assembling the image; it should unmount (and make sure it does that under all error conditions) as soon as the image has been built.
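A minimal sketch of that pattern, not the actual hook code: mount the image only for as long as it takes to populate it, and unmount in a `finally` block so the loop device is released on every error path. The helper name `with_loop_mount`, the `populate` callback, and the injectable `run` parameter are hypothetical. The mount options are copied from the logged command, and `umount -d` (detach the loop device on unmount) is assumed to be supported by the hypervisor's util-linux.

```python
import os
import subprocess
import tempfile


def with_loop_mount(image_path, populate, run=subprocess.check_call):
    """Loop-mount image_path, call populate(mount_dir), and always unmount."""
    mount_dir = tempfile.mkdtemp()
    mounted = False
    try:
        run(["/bin/mount", "-o", "loop,uid=36,gid=36", image_path, mount_dir])
        mounted = True
        populate(mount_dir)
    finally:
        try:
            if mounted:
                # -d asks umount to free the loop device as well.
                run(["/bin/umount", "-d", mount_dir])
        finally:
            # Remove the temporary mount point whether or not umount succeeded.
            os.rmdir(mount_dir)
```

With this shape, an exception raised while writing the user data still reaches the caller, but only after the unmount has been attempted, so a failed launch no longer costs a loop device.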
I agree with David. Audrey should eject the floppy once the user_data has been consumed inside the guest. That will not solve the problem (you still will not be able to launch more than 8 instances at once), but it should make this temporary VDSM workaround we're using more clever. I would also suggest updating the floppy hook so that it unmounts the loop device once the instance is powered off (if it does not do so already).

Additionally, we can make this more robust by increasing the number of loopback devices on the hypervisor:

/etc/modprobe.conf:
    options loop max_loop=64
(or the equivalent kernel param)

and:

    for i in $(seq 0 63); do
        [ -e /dev/loop$i ] || mknod -m0660 /dev/loop$i b 7 $i
        chown root:disk /dev/loop$i
    done

That should make us safe for 64 instances running in parallel.
git hash: 5f0fe23ea7e41f7306e19a77f4eaa9ffb9761f90 git repo: https://github.com/aeolusproject/vdsm-hook-floppyinject
rhel5: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4122953 rhel6: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4122980
Marking this as verified since I was able to successfully launch 10+ rhevm instances w/ userdata