Description of problem:

I'm trying to deploy an additional node to the self-hosted engine and I'm getting this error:

[ ERROR ] Failed to execute stage 'Setup validation': cannot marshal None unless allow_none is enabled

Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
ovirt-host-deploy-1.3.2-1.el6ev.noarch
vdsm-4.16.32-1.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. On a freshly installed RHEL6 host, install ovirt-hosted-engine-setup
2. Run hosted-engine --deploy
3. Enter the NFS share for RHEVM, scp the answer file from the first node
4. At the end, enter the password for admin@internal

Actual results:

Setup fails with:

[ ERROR ] Failed to execute stage 'Setup validation': cannot marshal None unless allow_none is enabled

Expected results:

The additional node is added to hosted-engine.

Additional info:

In the generated ovirt-hosted-engine-setup-XXXXX.log file I see:

2016-02-13 14:15:59 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 270, in _validation
    imgVolUUID[1],
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1483, in __request
    allow_none=self.__allow_none)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1132, in dumps
    data = m.dumps(params)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 677, in dumps
    dump(v, write)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 699, in __dump
    f(self, value, write)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 703, in dump_nil
    raise TypeError, "cannot marshal None unless allow_none is enabled"
TypeError: cannot marshal None unless allow_none is enabled

In vdsm.log I see:

Thread-37::DEBUG::2016-02-13 14:15:59,192::task::993::Storage.TaskManager.Task::(_decref) Task=`57da5167-6c6f-438a-b19c-af2a3b332d9d`::ref 1 aborting False
Thread-37::ERROR::2016-02-13 14:15:59,192::task::866::Storage.TaskManager.Task::(_setError) Task=`57da5167-6c6f-438a-b19c-af2a3b332d9d`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 621, in spmStop
    pool.stopSpm()
  File "/usr/share/vdsm/storage/securable.py", line 75, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state

As suggested by a colleague, I re-generated /etc/vdsm/vdsm.id (it was missing) with:

dmidecode -s system-uuid > /etc/vdsm/vdsm.id

but it didn't help.
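For reference, the TypeError in the setup traceback is raised by Python's own XML-RPC marshaller, before the request ever reaches vdsm: it refuses to serialize None unless allow_none is enabled. A minimal reproduction with the Python 3 stdlib (xmlrpc.client, the successor of the xmlrpclib module shown in the traceback; the UUID below is just a placeholder):

```python
# Reproduce the marshalling error from the setup log: any None in the
# parameter tuple makes the XML-RPC layer refuse to build the request.
import xmlrpc.client

# One of the prepareImage arguments ended up as None, as in the bug.
params = ('88a8314c-8174-4c88-a931-eb8d577300f1', None)

try:
    xmlrpc.client.dumps(params, methodname='prepareImage')
except TypeError as err:
    print(err)  # cannot marshal None unless allow_none is enabled

# With allow_none=True the same tuple marshals fine, as a <nil/> element.
payload = xmlrpc.client.dumps(params, methodname='prepareImage', allow_none=True)
print('<nil/>' in payload)  # True
```

This is why the failure surfaces as a client-side marshalling error rather than a vdsm-side validation error: the None never leaves the setup process.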
Looks like an issue on the vdsm side, related to system-uuid.
Sandro, why do you think it's an issue with the host UUID?

MainProcess|Thread-14::DEBUG::2016-02-13 14:13:25,644::supervdsmServer::109::SuperVdsm.ServerCallback::(wrapper) return getHardwareInfo with {'systemProductName': 'IBM eServer BladeCenter HS21 -[7995G3U]-', 'systemUUID': 'c0e9acde-1782-b601-4c39-00215e233eb4', 'systemSerialNumber': '99K8813', 'systemManufacturer': 'IBM'}

dmidecode works alright. From the description I don't understand whether something is missing on the host or whether it's reproducible - we don't always use /etc/vdsm/vdsm.id, and if it doesn't exist it shouldn't block the adding flow.

From the log it seems like something suspicious happened with the SPM:

Thread-37::ERROR::2016-02-13 14:15:59,192::task::866::Storage.TaskManager.Task::(_setError) Task=`57da5167-6c6f-438a-b19c-af2a3b332d9d`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 621, in spmStop
    pool.stopSpm()
  File "/usr/share/vdsm/storage/securable.py", line 75, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state

We need a reproduction, and a fuller log if possible.
After checking the logs and source code again, we (thanks Adam) found that validateStorageDomain always returns None. We have this block of code:

    # prepareImage to populate /var/run/vdsm/storage
    for imgVolUUID in [
        [
            self.environment[ohostedcons.StorageEnv.IMG_UUID],
            self.environment[ohostedcons.StorageEnv.VOL_UUID]
        ],
        [
            self.environment[ohostedcons.StorageEnv.METADATA_IMAGE_UUID],
            self.environment[ohostedcons.StorageEnv.METADATA_VOLUME_UUID]
        ],
        [
            self.environment[ohostedcons.StorageEnv.LOCKSPACE_IMAGE_UUID],
            self.environment[ohostedcons.StorageEnv.LOCKSPACE_VOLUME_UUID]
        ],
    ]:
        self.cli.prepareImage(
            self.environment[
                ohostedcons.StorageEnv.SP_UUID
            ],
            self.environment[
                ohostedcons.StorageEnv.SD_UUID
            ],
            imgVolUUID[0],
            imgVolUUID[1],
        )

At least one of those UUIDs (I cannot tell which) is None, which results in an invalid prepareImage call. From the vdsm log I can see that exactly one prepareImage succeeded:

Thread-36::INFO::2016-02-22 08:47:13,471::logUtils::44::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='88a8314c-8174-4c88-a931-eb8d577300f1', spUUID='0a824c8e-3582-4925-b67f-c6a78a864aeb', imgUUID='6ca64c01-e758-4256-96ae-bfc5c4e8bfcb', leafUUID='0551c43d-ecb9-4ce9-b37c-9b1f4089533e')

This probably means that ohostedcons.StorageEnv.METADATA_IMAGE_UUID and ohostedcons.StorageEnv.METADATA_VOLUME_UUID are not being handled properly.
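As an illustration (not the actual fix that went into hosted-engine-setup), a pre-flight check over the environment before the loop would have turned the opaque marshalling error into a message naming the unset keys. The plain dict and key strings below are simplified stand-ins for the real otopi environment, with the key names taken from the parse log:

```python
# Sketch of a pre-flight check for the prepareImage loop: fail fast with the
# names of the missing keys instead of letting xmlrpclib choke on a None.

def find_unset_uuids(environment, required_keys):
    """Return the keys whose value is missing or None."""
    return [k for k in required_keys if environment.get(k) is None]

REQUIRED = [
    'OVEHOSTED_STORAGE/imgUUID',
    'OVEHOSTED_STORAGE/volUUID',
    'OVEHOSTED_STORAGE/metadataImageUUID',
    'OVEHOSTED_STORAGE/metadataVolumeUUID',
    'OVEHOSTED_STORAGE/lockspaceImageUUID',
    'OVEHOSTED_STORAGE/lockspaceVolumeUUID',
]

# Stand-in for the otopi environment as parsed from the broken answerfile:
# the metadata/lockspace entries came back as None.
environment = {
    'OVEHOSTED_STORAGE/imgUUID': '6ca64c01-e758-4256-96ae-bfc5c4e8bfcb',
    'OVEHOSTED_STORAGE/volUUID': '0551c43d-ecb9-4ce9-b37c-9b1f4089533e',
    'OVEHOSTED_STORAGE/metadataImageUUID': None,
    'OVEHOSTED_STORAGE/metadataVolumeUUID': None,
    'OVEHOSTED_STORAGE/lockspaceImageUUID': None,
    'OVEHOSTED_STORAGE/lockspaceVolumeUUID': None,
}

missing = find_unset_uuids(environment, REQUIRED)
if missing:
    print('unset storage UUIDs: %s' % ', '.join(missing))
```

With the environment above, this reports the four metadata/lockspace keys, matching the parse log in the next comment.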
The issue was that some values were missing in the answerfile it downloaded from the first host (robocop01.rhev.stage.mwc.hst.phx2.redhat.com):

2016-02-22 08:46:26 INFO otopi.plugins.ovirt_hosted_engine_setup.core.remote_answerfile remote_answerfile._fetch_answer_file:180 Answer file successfully downloaded
...
2016-02-22 08:46:26 DEBUG otopi.plugins.ovirt_hosted_engine_setup.core.remote_answerfile remote_answerfile._parse_answer_file:189 OVEHOSTED_STORAGE/lockspaceImageUUID=None
2016-02-22 08:46:26 DEBUG otopi.plugins.ovirt_hosted_engine_setup.core.remote_answerfile remote_answerfile._parse_answer_file:189 OVEHOSTED_STORAGE/iSCSILunId=None
2016-02-22 08:46:26 DEBUG otopi.plugins.ovirt_hosted_engine_setup.core.remote_answerfile remote_answerfile._parse_answer_file:189 OVEHOSTED_STORAGE/metadataImageUUID=None
2016-02-22 08:46:26 DEBUG otopi.plugins.ovirt_hosted_engine_setup.core.remote_answerfile remote_answerfile._parse_answer_file:189 OVEHOSTED_STORAGE/imgAlias=None

Vladimir, could you please check what you have in /etc/ovirt-hosted-engine/hosted-engine.conf on your first host, and attach here the hosted-engine-setup logs from that host?
Based on that, moving back to integration.
hosted-engine.conf and answers.conf attached. I see these are empty in hosted-engine.conf:

metadata_volume_UUID=None
metadata_image_UUID=None
lockspace_volume_UUID=None
lockspace_image_UUID=None

And for some reason I don't have them set at all in production, only in staging. Do I need to put correct values in for them, or just remove them altogether?
Correct values, for sure; you can use vdsClient to scan that storage domain and get those values. Feel free to ask for help. Now the question is just why they are empty in that answerfile. Can you please attach the logs from the latest execution of hosted-engine-setup on robocop01?
Setup logs from robocop01 attached. I see that this system was set up at the beginning of 2015, and hosted-engine at that time didn't have these entries in the answer files or in hosted-engine.conf. I just compared with production and it's the same case: hypervisors which were installed a year ago don't have entries for the mentioned UUIDs in either conf file, while new hypervisors added after updating hosted-engine have them as None. If I re-deploy a brand new cluster I will not have this problem. It looks like this happens only when you copy the answer file generated by an older version of hosted-engine.

How do I find the correct UUIDs? When I run:

vdsClient -s 0 getStorageDomainInfo

for the rhevm domain, I'm getting only the uuid (and no vguuid or lockspace). Or do I need just the uuid part from getStorageDomainInfo?
No, they were already there; the issue is just that ovirt-hosted-engine-setup didn't successfully complete on robocop01 when you deployed it on 2015-05-18:

2015-05-18 09:26:35 INFO otopi.plugins.ovirt_hosted_engine_setup.core.answerfile answerfile._save_answers:52 Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150518092635.conf'
2015-05-18 09:26:35 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/core/answerfile.py", line 128, in _save_answers_at_cleanup
    self._save_answers(name)
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/core/answerfile.py", line 73, in _save_answers
    if self.environment[ohostedcons.CoreEnv.NODE_SETUP]:
KeyError: 'OVEHOSTED_CORE/nodeSetup'
2015-05-18 09:26:35 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Clean up': 'OVEHOSTED_CORE/nodeSetup'

Adding an additional host when the first one wasn't correctly deployed isn't a supported path.

You need to run getVolumesList to get the list of all the volumes, and then getVolumeInfo for each of them until you identify all the UUIDs of the lockspace and metadata images and volumes. Depending on where hosted-engine-setup halted when you deployed on robocop01, those volumes may be missing entirely. Keeping the system after hosted-engine-setup failed wasn't a good idea.
Reopening after better understanding the issue. Those values weren't present in the answerfile at 3.4 time. Upgrading existing hosts from 3.4 to 3.5 and from there to 3.6 is not an issue (excluding the case of 3.5 on el6 -> 3.5 on el7 -> 3.6 on el7, since there we ask the user to redeploy the host with 3.5 after reinstalling it with el7). The issue arises only when adding an additional 3.5 host to a system that was initially deployed with 3.4 (so NFS only, since iSCSI was introduced only with 3.5). Probably the easiest solution is to document how to manually gather those values, adding them to the answerfile.
Issue:
metadata and lockspace volumes were not in use on a 3.4 hosted-engine host, and the 3.4 -> 3.5 upgrade doesn't touch the existing answerfile, while deploying an additional 3.5 host (or redeploying an existing host with el7 instead of el6) requires a value != None (an empty string does the job!).

Workaround:
manually edit /etc/ovirt-hosted-engine/answers.conf on the host you are going to download the answerfile from and add:

OVEHOSTED_STORAGE/metadataImageUUID=str:
OVEHOSTED_STORAGE/metadataVolumeUUID=str:
OVEHOSTED_STORAGE/lockspaceImageUUID=str:
OVEHOSTED_STORAGE/lockspaceVolumeUUID=str:

Please ensure there is no whitespace after 'str:'.

Deploy the additional 3.5 host as usual, ensuring you use that answerfile.
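The trailing 'str:' works because otopi answer files store typed values in a 'type:value' notation: 'str:' with nothing after the colon parses to an empty string (which XML-RPC can marshal), whereas an absent key leaves the environment value as None. A simplified sketch of that convention (an illustration of the idea, not the real otopi parser):

```python
# Minimal illustration of otopi's 'type:value' answerfile notation and why
# 'str:' (empty string) differs from a missing key (None). Not the real parser.

def parse_typed_value(typed):
    """Parse an otopi-style 'type:value' string into a Python value."""
    vtype, _, value = typed.partition(':')
    if vtype == 'str':
        return value                  # 'str:' -> '' (empty, but not None)
    if vtype == 'none':
        return None
    if vtype == 'bool':
        return value == 'True'
    if vtype == 'int':
        return int(value)
    raise ValueError('unknown type: %s' % vtype)

def parse_answer_line(line):
    """Split a 'KEY=type:value' answerfile line into (key, value)."""
    key, _, typed = line.partition('=')
    return key, parse_typed_value(typed)

key, value = parse_answer_line('OVEHOSTED_STORAGE/metadataImageUUID=str:')
print(key, repr(value))  # the value is '', which marshals fine
```

Any trailing whitespace after 'str:' would become part of the parsed string, which is why the workaround insists on none.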
(In reply to Simone Tiraboschi from comment #27)
> Issue:
> metadata and lockspace volumes were not in use on 3.4 hosted-engine host and
> the 3.4 -> 3.5 upgrade doesn't touch the existing answerfile.
> While deploying an additional 3.5 host (or redeploying an existing host with
> el7 instead of el6) requires a value != None (an empty string does the job!)
>
> Workaround:
> manually edit /etc/ovirt-hosted-engine/answers.conf on the host where you
> are going to download the answerfile from and add:
>
> OVEHOSTED_STORAGE/metadataImageUUID=str:
> OVEHOSTED_STORAGE/metadataVolumeUUID=str:
> OVEHOSTED_STORAGE/lockspaceImageUUID=str:
> OVEHOSTED_STORAGE/lockspaceVolumeUUID=str:
>
> Please ensure to avoid any white space after 'str:'.
>
> Deploy the additional 3.5 host as usual ensuring to use that answerfile.

Works well for me. I did as follows:

1) Reprovisioned one of my hosts to el7.2/3.5.
2) Added the 3.5 repositories.
3) yum install ovirt-hosted-engine-setup -y
4) yum update -y
5) vi /etc/ovirt-hosted-engine/answers.conf
6) Added:
OVEHOSTED_STORAGE/metadataImageUUID=str:
OVEHOSTED_STORAGE/metadataVolumeUUID=str:
OVEHOSTED_STORAGE/lockspaceImageUUID=str:
OVEHOSTED_STORAGE/lockspaceVolumeUUID=str:
7) Avoided any white space after 'str:'.
8) Saved the file.
9) Ran hosted-engine --deploy.
10) Redeployed the host, taking the answerfile from another host.
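The manual edit in steps 5-8 can also be scripted. This is a sketch that appends the four overrides to a given answerfile; the path is a parameter here so it can be tried on a scratch copy first (on a real host it would be /etc/ovirt-hosted-engine/answers.conf):

```python
# Append the workaround keys from comment #27 to an otopi answerfile.
# Sketch only: run against a copy before touching a production host.

WORKAROUND_KEYS = (
    'OVEHOSTED_STORAGE/metadataImageUUID',
    'OVEHOSTED_STORAGE/metadataVolumeUUID',
    'OVEHOSTED_STORAGE/lockspaceImageUUID',
    'OVEHOSTED_STORAGE/lockspaceVolumeUUID',
)

def apply_workaround(path):
    """Append KEY=str: lines (empty-string values) to the answerfile."""
    with open(path, 'a') as conf:
        for key in WORKAROUND_KEYS:
            # Nothing after 'str:' -- trailing whitespace would end up
            # inside the parsed value and break the deployment.
            conf.write('%s=str:\n' % key)
```

Appending is safe for this format because later assignments simply add the keys that were missing; the existing entries are left untouched.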
It worked the same way on Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160219.0.el7ev).