Bug 754943 - Image push to vSphere fails if all virtual networks are Distributed Virtual Switches
Summary: Image push to vSphere fails if all virtual networks are Distributed Virtual Switches
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: imagefactory
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Ian McLeod
QA Contact: Martin Kočí
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-18 10:13 UTC by Javier Peña
Modified: 2012-08-30 17:18 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Javier Peña 2011-11-18 10:13:46 UTC
Description of problem:

When pushing an image to vSphere, it fails with the following log message:

2011-11-18 09:06:18,126 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(2504) Message: Exception caught in ImageFactory
2011-11-18 09:06:18,127 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(2504) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 192, in push_image
    self.push_image_upload(target_image_id, provider, credentials)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 234, in push_image_upload
    credentials)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 221, in vmware_push_image_upload
    guest_id='otherLinux64Guest', imagefilename=input_image)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 109, in import_vm
    nic_spec = self.create_nic(target, nic)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 185, in create_nic
    networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
  File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 108, in get_views
    property_spec.type = str(mo_refs[0]._type)
IndexError: list index out of range


Version-Release number of selected component (if applicable):

aeolus-configserver-proxy-0.4.0-3.el6.noarch
aeolus-configure-2.3.0-0.20111028220920gitf01b051.el6.noarch
imagefactory-jeosconf-ec2-fedora-0.8.0-1.el6.noarch
rubygem-imagefactory-console-0.5.0-4.20110824113238gitd9debef.el6.noarch
rubygem-aeolus-cli-0.1.0-3.20111028152758git7063136.el6.noarch
imagefactory-jeosconf-ec2-rhel-0.8.0-1.el6.noarch
rubygem-rack-mount-0.7.1-3.aeolus.el6.noarch
rubygem-aeolus-image-0.1.0-4.20111024205454git6b2b696.el6.noarch
rubygem-arel-2.0.10-0.aeolus.el6.noarch
aeolus-conductor-doc-0.6.0-0.20111029030732git7410602.el6.noarch
imagefactory-0.8.0-1.el6.noarch
aeolus-conductor-0.6.0-0.20111029030732git7410602.el6.noarch
aeolus-conductor-daemons-0.6.0-0.20111029030732git7410602.el6.noarch
aeolus-all-0.6.0-0.20111029030732git7410602.el6.noarch
rubygem-ZenTest-4.3.3-2.aeolus.el6.noarch
aeolus-configserver-0.4.0-3.el6.noarch
python-psphere-0.1-3.noarch

How reproducible:
Always, if the vSphere environment has no legacy vSwitches and only Distributed Virtual Switches.

Steps to Reproduce:
1. Have a vSphere environment that has no legacy vSwitches, only Distributed Virtual Switches.
2. aeolus-image build ...
3. aeolus-image push
4. Check imagefactory.log for the log messages
  
Actual results:
Failed image push.

Expected results:

Image imported successfully.

Additional info:

Investigating the source code, I see that in the import_vm function in /usr/lib/python2.6/site-packages/imgfac/VMWare.py, after the following piece of code:

            target = self.vim.find_entity_view(view_type='ComputeResource')
            target.update_view_data(['name', 'datastore', 'network', 'parent',
                                     'resourcePool'])

target.network is an empty list. This makes get_views in the psphere library fail: it reads mo_refs[0]._type to build its property spec, which raises IndexError on an empty list. I also see that the psphere version in use only supports the VMware Infrastructure SDK 2.5, while Distributed Virtual Switches were added in vSphere 4.0.
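A minimal sketch of this failure mode and a defensive guard, assuming a simplified stand-in for psphere's get_views. The `ManagedObjectRef` class, `NoNetworksFound` exception and `get_views_guarded` function are hypothetical illustrations, not psphere's API. The traceback shows get_views deriving the property-spec type from `mo_refs[0]._type`, so an empty `target.network` fails with IndexError before any query is issued; checking for the empty case first turns it into an explicit error.

```python
class ManagedObjectRef(object):
    """Hypothetical stand-in for a vSphere managed object reference."""

    def __init__(self, type_name, value):
        self._type = type_name  # e.g. "Network" or "DistributedVirtualPortgroup"
        self.value = value      # e.g. "network-1"


class NoNetworksFound(Exception):
    """Raised when a ComputeResource reports no networks at all."""


def get_views_guarded(mo_refs, properties):
    """Hypothetical wrapper around the failing line in psphere's get_views.

    The original does ``property_spec.type = str(mo_refs[0]._type)``,
    which raises IndexError when mo_refs is empty. Here the empty case
    is reported explicitly instead.
    """
    if not mo_refs:
        raise NoNetworksFound(
            "target.network is empty; the ComputeResource reported no networks")
    # Like the original, take the property-spec type from the first reference.
    property_type = str(mo_refs[0]._type)
    # A real client would now send a property retrieval request; this
    # sketch only returns what it would have asked for.
    return [(property_type, ref.value, tuple(properties)) for ref in mo_refs]
```

With a guard like this, the push would fail with a message naming the empty network list rather than an IndexError deep inside the SOAP library.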

In summary, ImageFactory should properly support vSphere 4.0 and later, including Distributed Virtual Switches.

Comment 1 wes hayutin 2011-11-28 01:24:08 UTC
adding ce-sprint-next bugs to ce-sprint tracker for this release

Comment 2 jrd 2011-11-28 17:35:49 UTC
Good gravy.  I'm not sure what to think about this one, but it seems likely that this is a configuration we can't support.  Unless I'm misunderstanding the semantics of their virtual switches.

Assigning to Ian for confirm/deny on that point.

Comment 3 Javier Peña 2011-11-28 18:24:07 UTC
This configuration has become very popular in VMware shops since vSphere 4.0 came out, so not supporting it could be a big issue in the field.

I see the current upstream psphere package does provide support for the web services SDK 4.0+ (http://packages.python.org/psphere/intro.html#notes). Would it be possible to upgrade our package and see if we can make it work?

Comment 4 Ian McLeod 2011-11-30 20:27:00 UTC
So, we support vSphere version 4.0 when it is present.  The very act of uploading an image file via an HTTP lease requires it.

The network name should not be an empty string, it should be specified in the config file (or the passed-in configuration JSON) as described in the VMWare section here:

https://www.aeolusproject.org/redmine/projects/image-factory/wiki/Documentation

In my testing there has always been a default network with the name "VM Network".

Could you set the name value to something valid for your vSphere environment and re-push?

Comment 5 Ian McLeod 2011-11-30 20:31:41 UTC
Apologies.  I misread part of the above report and got the impression the network name was not being set.  Switching back to ON_DEV and re-evaluating.

Comment 6 Ian McLeod 2011-11-30 21:03:54 UTC
OK.  It is not immediately clear to me where in the vSphere image hierarchy I should be looking for these distributed virtual switches.  Based on Javier's debugging so far, I suspect the fix will be a modification to the portion of the code that obtains the initial ComputeResource handle.

Need more time to investigate and, ideally, access to a suitable test setup.

Javier, can you provide one?

Comment 7 wes hayutin 2011-12-01 14:32:14 UTC
Agreed. I need more information on the test setup to proceed with a reproduction here.  Specifically, I need a pointer on how to configure vSphere the way the reporter has it set up.

Comment 8 Javier Peña 2011-12-01 15:10:35 UTC
The vSphere configuration had only one special feature: all vSwitches were Distributed Virtual Switches, as opposed to the standard per-host switches that were available before vSphere 4.0. As a result, the default "VM Network" vSwitch did not exist.

Unfortunately, I found this at the customer's site, and I'm afraid I will not be there for the next few months. If we have an internal vSphere environment for testing, I'd be happy to dedicate some time to replicating the environment internally. I have some vSphere knowledge from previous work experience, so I think I could help with it.

Comment 9 wes hayutin 2012-01-03 17:42:19 UTC
adding ce-sprint-next bugs to ce-sprint

Comment 12 wes hayutin 2012-01-19 20:10:49 UTC
Recreated the failure, but with a different error, I think.


2012-01-19 15:08:00,249 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed percent complete from 0 to 0
2012-01-19 15:08:00,251 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Image file /var/lib/imagefactory/images/vmware-image-ed16267c-0a8a-4001-bd58-d438815f335e.vmdk already present - skipping warehouse download
2012-01-19 15:08:00,869 DEBUG paste.httpserver.ThreadPool pid(23970) Message: Added task (0 tasks queued)
2012-01-19 15:08:17,416 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Exception caught in ImageFactory
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 245, in push_image_upload
    credentials)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 232, in vmware_push_image_upload
    guest_id='otherLinux64Guest', imagefilename=input_image)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 110, in import_vm
    nic_spec = self.create_nic(target, nic)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 200, in create_nic
    networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
  File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 132, in get_views
    view = eval(str(object_content.obj._type))(mo_ref=object_content.obj,
  File "<string>", line 1, in <module>
NameError: name 'DistributedVirtualPortgroup' is not defined

2012-01-19 15:08:17,476 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed status from PUSHING to FAILED
2012-01-19 15:08:17,476 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: 40cc559d-4dc3-43c2-98ee-205c26776f29 for vsphere about to exit None queue...
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Exception caught in ImageFactory
2012-01-19 15:08:17,476 DEBUG imgfac.builders.BaseBuilder.RHEL6_vsphere_Builder pid(23970) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 203, in push_image
    self.push_image_upload(target_image_id, provider, credentials)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 245, in push_image_upload
    credentials)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_vsphere_Builder.py", line 232, in vmware_push_image_upload
    guest_id='otherLinux64Guest', imagefilename=input_image)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 110, in import_vm
    nic_spec = self.create_nic(target, nic)
  File "/usr/lib/python2.6/site-packages/imgfac/VMWare.py", line 200, in create_nic
    networks = self.vim.get_views(mo_refs=target.network, properties=['name'])
  File "/usr/lib/python2.6/site-packages/psphere/vim25.py", line 132, in get_views
    view = eval(str(object_content.obj._type))(mo_ref=object_content.obj,
  File "<string>", line 1, in <module>
NameError: name 'DistributedVirtualPortgroup' is not defined

2012-01-19 15:08:17,477 DEBUG imgfac.BuildJob.BuildJob pid(23970) Message: Builder (40cc559d-4dc3-43c2-98ee-205c26776f29) changed status from FAILED to FAILED
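The NameError above comes from psphere's get_views constructing each view with `eval(str(object_content.obj._type))(...)`: the managed object type name returned by the server is evaluated as a Python name, so any type without a same-named class in psphere (here `DistributedVirtualPortgroup`) fails. A minimal sketch of one common alternative, an explicit type-name registry with a generic fallback, is below. All class and function names are hypothetical illustrations rather than psphere's API, and this is not necessarily how the later psphere build resolved the problem.

```python
class ManagedObject(object):
    """Generic view used when no specialised class is registered (hypothetical)."""

    def __init__(self, mo_ref, properties=None):
        self.mo_ref = mo_ref
        self.properties = dict(properties or {})


class Network(ManagedObject):
    """View for a standard network (hypothetical)."""


class DistributedVirtualPortgroup(Network):
    """View for a DV port group; in the vSphere API this type extends Network."""


# Explicit registry instead of eval(): maps a vSphere managed object type
# name to the Python class that should wrap it.
VIEW_CLASSES = {
    "Network": Network,
    "DistributedVirtualPortgroup": DistributedVirtualPortgroup,
}


def make_view(type_name, mo_ref, properties=None):
    """Build a view for type_name, falling back to ManagedObject.

    Unknown types degrade to a generic view instead of raising NameError,
    which is what eval(type_name) does when no class of that name exists.
    """
    cls = VIEW_CLASSES.get(type_name, ManagedObject)
    return cls(mo_ref=mo_ref, properties=properties)
```

Because the DV port group class subclasses the network class, code that only needs a network's `name` can treat both uniformly, while NIC creation can still branch on the concrete type.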

Comment 13 wes hayutin 2012-01-19 20:12:21 UTC
  vsphere_deltacloud_provider: 10.16.3434.34
  vsphere_username: Administrator
  vsphere_password: asdf!
  vsphere_datastore: datastore1
  vsphere_network_name: "dvSwitch"
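For reference, the vSphere API attaches a virtual NIC to the two kinds of network differently: a standard network is referenced by name through `VirtualEthernetCardNetworkBackingInfo.deviceName`, while a DV port group is referenced through `VirtualEthernetCardDistributedVirtualPortBackingInfo`, whose `port` carries the switch UUID and port group key. A minimal sketch of how a lookup of the configured `vsphere_network_name` could branch on the network type follows. The `NetworkInfo` record and `nic_backing_for` function are hypothetical, the returned dicts only describe the backing rather than building real API objects, and the sketch assumes the configured name matches a network or port group name listed in the ComputeResource's `network` property (whether `"dvSwitch"` above names a switch or a port group is not stated).

```python
from collections import namedtuple

# Hypothetical, simplified record of what a client would fetch per network.
# For a DistributedVirtualPortgroup, portgroup_key is the port group's key and
# switch_uuid is its switch's uuid; both are None for a standard network.
NetworkInfo = namedtuple(
    "NetworkInfo", ["name", "type", "portgroup_key", "switch_uuid"])


def nic_backing_for(network_name, networks):
    """Describe the NIC backing needed to attach to network_name."""
    for net in networks:
        if net.name != network_name:
            continue
        if net.type == "DistributedVirtualPortgroup":
            # DV port groups are addressed by switch UUID + port group key.
            return {
                "backing": "VirtualEthernetCardDistributedVirtualPortBackingInfo",
                "port": {
                    "switchUuid": net.switch_uuid,
                    "portgroupKey": net.portgroup_key,
                },
            }
        # Standard networks are addressed by name.
        return {
            "backing": "VirtualEthernetCardNetworkBackingInfo",
            "deviceName": net.name,
        }
    available = ", ".join(sorted(n.name for n in networks)) or "(none)"
    raise ValueError(
        "network %r not found; available: %s" % (network_name, available))
```

Listing the available names in the error makes a mismatched `vsphere_network_name` easy to spot in imagefactory.log.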

Comment 14 Ian McLeod 2012-01-20 20:55:21 UTC
I have reproduced the error reported by Wes in Comment 12.  The python-psphere-0.1-4 build in brew addresses this.

I was unable to reliably reproduce the error from the original bug text.  That error occurs very early in the vSphere push process and essentially indicates that the Python SOAP bindings cannot find the root object in the SOAP hierarchy.  This seems to be unrelated to the issue of different virtual switch types.

(I was only able to reproduce it by testing the VMWare.py module through the python interpreter and artificially inducing a timeout in the suds connection by waiting several minutes before executing a query.  This cannot happen in the normal flow of execution within the Factory.)

Switching to ON_QA

Comment 15 wes hayutin 2012-01-31 13:50:30 UTC
Able to build/push in my vSphere environment with distributed switches.
Will paste a log shortly.

[root@qeblade30 ~]# rpm -qa | grep imagefactory
imagefactory-jeosconf-ec2-rhel-1.0.0rc3-1.el6.noarch
rubygem-imagefactory-console-0.4.0-1.el6.noarch
imagefactory-1.0.0rc3-1.el6.noarch
imagefactory-jeosconf-ec2-fedora-1.0.0rc3-1.el6.noarch

