Description of problem:
I was doing an RHV (self-hosted, 1H) + OCP (2 nodes) deployment. The RHV deployment completed, but the OCP deployment failed at 10% progress. The Ansible log contained the following error:

2017-01-31 16:25:19,493 p=24145 u=foreman | failed: [blaise-rhv-engine.b.b] (item={ '_ansible_parsed': True, '_ansible_no_log': False, u'ansible_job_id': u'659194948587.8620', u'started': 1, '_ansible_item_result': True, 'item': [{ u'nic': { u'boot_protocol': u'dhcp', u'mac': u'72:b0:a4:04:3b:fb' }, u'memory': u'8GiB', u'cpus': 2, u'name': u'blaise-ocp-master1.b.b' }, { u'bootable': u'True', u'size': 30, u'image_path': u'/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2', u'name': u'blaise-ocp-master1.b.b-disk1' }], u'finished': 0, u'results_file': u'/root/.ansible_async/659194948587.8620' }) => { "ansible_job_id": "659194948587.8620", "attempts": 1, "changed": true, "cmd": [ "/etc/qci/scripts/upload_image", "--url=https://blaise-rhv-engine.b.b/ovirt-engine/api", "--username=admin@internal", "--password=changeme", "--disk-name=blaise-ocp-master1.b.b-disk1", "--disk-size=30", "--image=/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2", "--storage-domain=my_storage" ], "delta": "0:00:00.360122", "end": "2017-01-31 16:25:14.344962", "failed": true, "finished": 1, "item": { "ansible_job_id": "659194948587.8620", "finished": 0, "item": [{ "cpus": 2, "memory": "8GiB", "name": "blaise-ocp-master1.b.b", "nic": { "boot_protocol": "dhcp", "mac": "72:b0:a4:04:3b:fb" } }, { "bootable": "True", "image_path": "/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2", "name": "blaise-ocp-master1.b.b-disk1", "size": 30}], "results_file": "/root/.ansible_async/659194948587.8620", "started": 1 }, "rc": 1, "start": "2017-01-31 16:25:13.984840", "stderr": " Traceback (most recent call last):\n File \"/etc/qci/scripts/upload_image\", line 159, in <module>\n main()\n File \"/etc/qci/scripts/upload_image\", line 26, in main\n disk = create_disk(disks_service, args.disk_name, args.disk_size, args.storage_domain)\n File \"/etc/qci/scripts/upload_image\", line 47, in create_disk\n return get_resource(disks_service, disk_name) or disks_service.add(\n File \"/etc/qci/scripts/upload_image\", line 38, in get_resource\n resource = service.list(search='name={}'.format(name))\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 5409, in list\n response = self._connection.send(request)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 276, in send\n return self.__send(request)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 298, in __send\n self._sso_token = self._get_access_token()\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 470, in _get_access_token\n sso_response = self._get_sso_response(self._sso_url, post_data)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 508, in _get_sso_response\n return json.loads(body_buf.getvalue().decode('utf-8'))\n File \"/usr/lib64/python2.7/json/__init__.py\", line 338, in loads\n return _default_decoder.decode(s)\n File \"/usr/lib64/python2.7/json/decoder.py\", line 366, in decode\n obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n File \"/usr/lib64/python2.7/json/decoder.py\", line 384, in raw_decode\n raise ValueError(\"No JSON object could be decoded\")\n ValueError: No JSON object could be decoded", "stdout": "Got: username, disk_size, image_path, url, storage_domain, password, disk_name", "stdout_lines": ["Got: username, disk_size, image_path, url, storage_domain, password, disk_name"], "warnings": []}

As expected with this error, none of the OCP VMs had been created yet. I am running in a nested virtualization environment.

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20170130.t.0

How reproducible:
First time I have seen it.

Steps to Reproduce:
1. Do an RHV (self-hosted, 1H) + OCP (2 nodes) deployment.

Actual results:
The OCP deployment failed early, while creating the master node (blaise-ocp-master1.b.b).

Expected results:
The deployment completes without failures.
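The traceback ends in ovirtsdk4's _get_sso_response, which calls json.loads on the engine's SSO reply, so the ValueError means that reply was not JSON. One way to see what the engine actually returned is to repeat the token request by hand. The following is a hypothetical diagnostic sketch using only the Python standard library. It assumes that the SSO endpoint is the API URL with /api replaced by /sso/oauth/token and that the request uses the password-grant form fields shown; this matches my understanding of how ovirtsdk4 requests tokens, but it is not taken from the QCI scripts. It also assumes you supply the engine CA certificate file for TLS verification.

```python
# Hypothetical SSO probe (not part of QCI or ovirtsdk4): repeat the token
# request and report whether the engine answered with JSON.
import json
import ssl
import urllib.error
import urllib.parse
import urllib.request


def sso_url_from_api_url(api_url):
    # Assumption: https://engine/ovirt-engine/api -> https://engine/ovirt-engine/sso/oauth/token
    base = api_url.rstrip("/")
    if base.endswith("/api"):
        base = base[: -len("/api")]
    return base + "/sso/oauth/token"


def decode_sso_body(body):
    """Return (True, parsed JSON) or (False, first 200 chars of the raw body)."""
    text = body.decode("utf-8", errors="replace")
    try:
        return True, json.loads(text)
    except ValueError:  # the same failure the SDK hit in the traceback
        return False, text[:200]


def probe_sso(api_url, username, password, ca_file):
    # Assumed password-grant form fields for the oVirt SSO token endpoint.
    data = urllib.parse.urlencode({
        "grant_type": "password",
        "scope": "ovirt-app-api",
        "username": username,
        "password": password,
    }).encode("ascii")
    request = urllib.request.Request(
        sso_url_from_api_url(api_url),
        data=data,
        headers={"Accept": "application/json"},
    )
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(request, context=context, timeout=30) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        body = err.read()  # an error page is exactly what we want to inspect
    return decode_sso_body(body)
```

For example, probe_sso("https://blaise-rhv-engine.b.b/ovirt-engine/api", "admin@internal", "<password>", "/path/to/engine-ca.pem") returns (False, snippet) when the engine answers with something other than JSON, such as an HTML error page. The snippet shows what the SDK tried to parse.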
I ran another identical deployment and the error did not occur, so the failure appears to be intermittent.
I was unable to reproduce this. If it happens again, attach the logs and let me know; I will probably need to log in to the host and investigate.
As a manual workaround:
1. SSH into each hypervisor and run: systemctl restart imageio-daemon
2. In the RHV engine UI, select the Disks tab and remove all disks associated with the OCP deployment.
3. Resume the task. The deployment should then continue without issue.
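The disk-removal step can also be scripted against the engine API instead of clicking through the Disks tab. The following is a hypothetical sketch, not part of QCI. It assumes that the oVirt Python SDK (ovirtsdk4) is installed and that the engine CA certificate is available locally. It also assumes that every disk whose name matches a shell-style pattern such as blaise-ocp-* belongs to the failed deployment. The log shows disks named after their VM, for example blaise-ocp-master1.b.b-disk1. The script defaults to a dry run. It covers only the disk cleanup, so imageio-daemon still has to be restarted on each hypervisor first.

```python
# Hypothetical cleanup helper (not part of QCI): remove leftover OCP disks
# through ovirtsdk4. The name pattern and CA file path are assumptions.
import fnmatch


def select_disk_names(names, pattern):
    """Return the non-empty disk names matching a shell-style pattern, sorted."""
    return sorted(n for n in names if n and fnmatch.fnmatchcase(n, pattern))


def remove_matching_disks(url, username, password, ca_file, pattern,
                          dry_run=True):
    # Imported lazily so select_disk_names() works without the SDK installed.
    import ovirtsdk4 as sdk

    connection = sdk.Connection(url=url, username=username,
                                password=password, ca_file=ca_file)
    try:
        disks_service = connection.system_service().disks_service()
        disks = disks_service.list()
        wanted = set(select_disk_names([d.name for d in disks], pattern))
        for disk in disks:
            if disk.name not in wanted:
                continue
            if dry_run:
                print("would remove %s (%s)" % (disk.name, disk.id))
            else:
                disks_service.disk_service(disk.id).remove()
                print("removed %s (%s)" % (disk.name, disk.id))
    finally:
        connection.close()
```

Run it once with the default dry_run=True, check that only the OCP disks are listed, and then run it again with dry_run=False before resuming the task. Matching on the name pattern locally, rather than through the engine's search syntax, keeps the selection logic easy to inspect and test on its own.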
I haven't seen this since, so I am marking it as verified.
Compose: QCI-1.1-RHEL-7-20170203.t.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:0335