| Summary: | Image push to vSphere fails if all virtual networks are Distributed Virtual Switches | ||
|---|---|---|---|
| Product: | [Retired] CloudForms Cloud Engine | Reporter: | Javier Peña <javier.pena> |
| Component: | imagefactory | Assignee: | Ian McLeod <imcleod> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Kočí <mkoci> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 1.0.0 | CC: | akarol, amoralej, dajohnso, deltacloud-maint, dgao, mkoci, pep, ssachdev, whayutin |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
adding ce-sprint-next bugs to ce-sprint tracker for this release

Good gravy. I'm not sure what to think about this one, but it seems likely that this is a configuration we can't support, unless I'm misunderstanding the semantics of their virtual switches. Assigning to Ian for confirm/deny on that point.

This configuration has become very popular in VMware shops since vSphere 4.0 came out, so not supporting it could be a big issue in the field. I see the current upstream psphere package does provide support for the web services SDK 4.0+ (http://packages.python.org/psphere/intro.html#notes). Would it be possible to upgrade our package and see if we can make it work?

So, we support vSphere version 4.0 when it is present. The very act of uploading an image file via an HTTP lease requires it. The network name should not be an empty string; it should be specified in the config file (or the passed-in configuration JSON) as described in the VMware section here: https://www.aeolusproject.org/redmine/projects/image-factory/wiki/Documentation In my testing there has always been a default network with the name "VM Network". Please set the name value to something valid for your vSphere environment and re-push.

Apologies. I misread a bit of the above report and got the impression the network name was not being set. Switching back to ON_DEV and re-evaluating.

OK. It is not immediately clear to me where in the vSphere image hierarchy I should be looking for these distributed virtual switches. Based on Javier's debugging so far, I suspect it will be a mod at the portion of the code that obtains the initial ComputeResource handle. Need more time to investigate and, ideally, access to a suitable test setup. Javier, can you provide?

Agreed. I need more information on the test setup to proceed with a recreate here. Specifically, I need a pointer to how to configure vSphere as the reporter has it set up.
The vSphere configuration only had one special feature: all vSwitches were Distributed Virtual Switches, rather than the standard per-host switches that were the only option before vSphere 4.0. So the default "VM Network" vSwitch did not exist. Unfortunately I found this at the customer's site, and I'm afraid I will not be there during the next few months. If we have an internal vSphere environment for testing, I'd be happy to dedicate some time to replicating the setup there. I have some vSphere knowledge from past work experience, so I think I could be of help with it.

adding ce-sprint-next bugs to ce-sprint

Recreated the failure, but with a different error, I think:
2012-01-19 15:08:00,249 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed percent complete from 0 to 0
2012-01-19 15:08:00,251 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Image file /var/lib/imagefactory/images/vmware-image-ed16267c-0a8a-4001-bd58-d438815f335e.vmdk already present - skipping warehouse download
2012-01-19 15:08:00,869 DEBUG paste.httpserver.ThreadPool pid(23970) Message: Added task (0 tasks queued)
2012-01-19 15:08:17,416 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Exception caught in ImageFactory
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 245, in push_image_upload
credentials)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 232, in vmware_push_image_upload
guest_id='otherLinux64Guest', imagefilename=input_image)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 110, in import_vm
nic_spec = self.create_nic(target, nic)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 200, in create_nic
networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 132, in get_views
view = eval(str(object_content.obj._type))(mo_ref=object_content.obj,
File "<string>", line 1, in <module>
NameError: name 'DistributedVirtualPortgroup' is not defined
2012-01-19 15:08:17,476 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed status from PUSHING to FAILED
2012-01-19 15:08:17,476 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: 40cc559d-4dc3-43c2-98ee-205c26776f29 for vsphere about to exit None queue...
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Exception caught in ImageFactory
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 203, in push_image
self.push_image_upload(target_image_id, provider, credentials)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 245, in push_image_upload
credentials)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 232, in vmware_push_image_upload
guest_id='otherLinux64Guest', imagefilename=input_image)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 110, in import_vm
nic_spec = self.create_nic(target, nic)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 200, in create_nic
networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 132, in get_views
view = eval(str(object_content.obj._type))(mo_ref=object_content.obj,
File "<string>", line 1, in <module>
NameError: name 'DistributedVirtualPortgroup' is not defined
2012-01-19 15:08:17,477 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed status from FAILED to FAILED
vsphere_deltacloud_provider: 10.16.3434.34
vsphere_username: Administrator
vsphere_password: asdf!
vsphere_datastore: datastore1
vsphere_network_name: "dvSwitch"

I have reproduced the error reported by Wes in Comment 12. The python-psphere-0.1-4 build in brew addresses this.

I was unable to reliably reproduce the error from the original bug text. That error occurs very early in the vSphere push process and essentially indicates that the Python SOAP bindings cannot find the root object in the SOAP hierarchy. This seems to be unrelated to the issue of different virtual switch types. (I was only able to reproduce it by testing the VMWare.py module through the Python interpreter and artificially inducing a timeout in the suds connection by waiting several minutes before executing a query. This cannot happen in the normal flow of execution within the Factory.) Switching to ON_QA.

Able to build/push in my vSphere env with distributed switches; will paste a log shortly.

[root@qeblade30 ~]# rpm -qa | grep imagefactory
imagefactory-jeosconf-ec2-rhel-1.0.0rc3-1.el6.noarch
rubygem-imagefactory-console-0.4.0-1.el6.noarch
imagefactory-1.0.0rc3-1.el6.noarch
imagefactory-jeosconf-ec2-fedora-1.0.0rc3-1.el6.noarch |
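The NameError in the traceback above comes from get_views resolving a managed object type name to a class with eval, so any type the library has no class for (here DistributedVirtualPortgroup) aborts the call. A minimal sketch of a more tolerant lookup follows. It is not psphere's code and not necessarily what python-psphere-0.1-4 changed; ManagedObject, Network, VIEW_CLASSES, view_class_for and make_view are all hypothetical names used for illustration only.

```python
class ManagedObject:
    """Hypothetical generic view wrapping a managed object reference."""

    def __init__(self, mo_ref):
        self.mo_ref = mo_ref


class Network(ManagedObject):
    """Hypothetical view class for a network the library already knows."""


# Hypothetical registry of type names the library has classes for.
VIEW_CLASSES = {"Network": Network}


def view_class_for(type_name):
    """Return the registered class, or the generic view for unknown types."""
    return VIEW_CLASSES.get(type_name, ManagedObject)


def make_view(type_name, mo_ref):
    """Build a view without eval, so an unknown type name cannot raise NameError."""
    return view_class_for(type_name)(mo_ref)
```

With this kind of lookup, a type such as DistributedVirtualPortgroup would come back as a generic view instead of raising, at the cost of losing any type-specific behaviour until a proper class is registered.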
Description of problem:
When pushing an image to vSphere, it fails with the following log message:

2011-11-18 09:06:18,126 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(2504) Message: Exception caught in ImageFactory
2011-11-18 09:06:18,127 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(2504) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 192, in push_image
self.push_image_upload(target_image_id, provider, credentials)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 234, in push_image_upload
credentials)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 221, in vmware_push_image_upload
guest_id='otherLinux64Guest', imagefilename=input_image)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 109, in import_vm
nic_spec = self.create_nic(target, nic)
File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 185, in create_nic
networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 108, in get_views
property_spec.type = str(mo_refs[0]._type)
IndexError: list index out of range

Version-Release number of selected component (if applicable):
aeolus-configserver-proxy-0.4.0-3.el6.noarch
aeolus-configure-2.3.0-0.20111028220920gitf01b051.el6.noarch
imagefactory-jeosconf-ec2-fedora-0.8.0-1.el6.noarch
rubygem-imagefactory-console-0.5.0-4.20110824113238gitd9debef.el6.noarch
rubygem-aeolus-cli-0.1.0-3.20111028152758git7063136.el6.noarch
imagefactory-jeosconf-ec2-rhel-0.8.0-1.el6.noarch
rubygem-rack-mount-0.7.1-3.aeolus.el6.noarch
rubygem-aeolus-image-0.1.0-4.20111024205454git6b2b696.el6.noarch
rubygem-arel-2.0.10-0.aeolus.el6.noarch
aeolus-conductor-doc-0.6.0-0.20111029030732git7410602.el6.noarch
imagefactory-0.8.0-1.el6.noarch
aeolus-conductor-0.6.0-0.20111029030732git7410602.el6.noarch
aeolus-conductor-daemons-0.6.0-0.20111029030732git7410602.el6.noarch
aeolus-all-0.6.0-0.20111029030732git7410602.el6.noarch
rubygem-ZenTest-4.3.3-2.aeolus.el6.noarch
aeolus-configserver-0.4.0-3.el6.noarch
python-psphere-0.1-3.noarch

How reproducible:
Always, if the vSphere environment has no legacy vSwitches and only Distributed Virtual Switches.

Steps to Reproduce:
1. Have a vSphere environment that has no legacy vSwitches and only Distributed Virtual Switches.
2. aeolus-image build ...
3. aeolus-image push
4. Check imagefactory.log for the log messages

Actual results:
Failed image push.

Expected results:
Image imported successfully.

Additional info:
Doing some source code investigation, I see that in the import_vm function at /usr/lib/python2.6/site-packages/imgfac/VMWare.py, after the following piece of code:

target = self.vim.find_entity_view(view_type='ComputeResource')
target.update_view_data(['name', 'datastore', 'network', 'parent', 'resourcePool'])

target.network is an empty array. Passing that empty array to get_views in the psphere library makes it fail, because get_views indexes mo_refs[0] without checking the length (the IndexError above). I see that the psphere version in use only supports the VMware Infrastructure SDK 2.5, while distributed virtual switches were added in vSphere 4.0.

In summary, ImageFactory should properly support vSphere 4.0 and later by providing support for Distributed Virtual Switches.
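The IndexError above is the result of handing an empty network list to code that assumes at least one entry. A minimal sketch of a guard that a caller such as create_nic could apply before selecting a network is shown below. It is an illustration under stated assumptions, not ImageFactory's actual code: Net, NoNetworksError and pick_network are hypothetical names, and the network objects are assumed to expose only a name attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for a network view; only the name is modelled.
Net = namedtuple("Net", ["name"])


class NoNetworksError(Exception):
    """Raised when the compute resource reports no networks at all."""


def pick_network(networks, wanted_name):
    """Return the network called wanted_name, failing with a clear message."""
    if not networks:
        # An empty list is what triggered 'list index out of range' downstream.
        raise NoNetworksError(
            "compute resource reports no networks; cannot attach NIC to %r"
            % wanted_name
        )
    for net in networks:
        if net.name == wanted_name:
            return net
    raise LookupError(
        "network %r not found among: %s"
        % (wanted_name, ", ".join(n.name for n in networks))
    )
```

For example, pick_network([Net("dvSwitch")], "dvSwitch") returns the matching entry, while an empty list raises NoNetworksError with a message that names the requested network instead of an opaque IndexError.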