1418343 – OCP deployment fails uploading an image from RHV engine.

Summary: OCP deployment fails uploading an image from RHV engine.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Quickstart Cloud Installer
Classification:	Red Hat
Component:	Installation - OpenShift
Sub Component:
Version:	1.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	1.1
Assignee:	Fabian von Feilitzsch
QA Contact:	James Olin Oden
Docs Contact:	Derek
URL:
Whiteboard:
Depends On:	1411491
Blocks:

Reported:	2017-02-01 15:30 UTC by James Olin Oden
Modified:	2017-02-28 01:45 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-02-28 01:45:49 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2017:0335	0	normal	SHIPPED_LIVE	Red Hat Quickstart Installer 1.1	2017-02-28 06:36:13 UTC

Description James Olin Oden 2017-02-01 15:30:16 UTC

Description of problem:
I was doing a RHV (self-hosted, 1H) + OCP (2 nodes) deployment. It got through the RHV deployment and then died at 10% progress of the OCP deployment. The Ansible log contained the following error:

    2017-01-31 16:25:19,493 p=24145 u=foreman |  
    failed: [blaise-rhv-engine.b.b] (item={
            '_ansible_parsed': True,
            '_ansible_no_log': False,
            u'ansible_job_id': u'659194948587.8620',
            u'started': 1,
            '_ansible_item_result': True,
            'item': [{
                    u'nic': {
                            u'boot_protocol': u'dhcp',
                            u'mac': u'72:b0:a4:04:3b:fb'
                    },
                    u'memory': u'8GiB',
                    u'cpus': 2,
                    u'name': u'blaise-ocp-master1.b.b'
            }, {
                    u'bootable': u'True',
                    u'size': 30,
                    u'image_path': u'/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2',
                    u'name': u'blaise-ocp-master1.b.b-disk1'
            }],
            u'finished': 0,
            u'results_file': u'/root/.ansible_async/659194948587.8620'
    }) => {
            "ansible_job_id": "659194948587.8620",
            "attempts": 1,
            "changed": true,
            "cmd": [
                    "/etc/qci/scripts/upload_image",
                    "--url=https://blaise-rhv-engine.b.b/ovirt-engine/api",
                    "--username=admin@internal",
                    "--password=changeme",
                    "--disk-name=blaise-ocp-master1.b.b-disk1",
                    "--disk-size=30",
                    "--image=/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2",
                    "--storage-domain=my_storage"
            ],
            "delta": "0:00:00.360122",
            "end": "2017-01-31 16:25:14.344962",
            "failed": true,
            "finished": 1,
            "item": {
                    "ansible_job_id": "659194948587.8620",
                    "finished": 0,
                    "item": [{
                            "cpus": 2,
                            "memory": "8GiB",
                            "name": "blaise-ocp-master1.b.b",
                            "nic": {
                                    "boot_protocol": "dhcp",
                                    "mac": "72:b0:a4:04:3b:fb"
                            }
                    }, {
                            "bootable": "True",
                            "image_path": "/usr/share/rhel-guest-image-7/rhel-guest-image-7.3-32.x86_64.qcow2",
                            "name": "blaise-ocp-master1.b.b-disk1", "size": 30}],
                            "results_file": "/root/.ansible_async/659194948587.8620",
                            "started": 1
                    },
                    "rc": 1,
                    "start": "2017-01-31 16:25:13.984840",
                    "stderr": "
                            Traceback (most recent call last):\n
                                File \"/etc/qci/scripts/upload_image\", line 159, in <module>\n
                                main()\n
                              File \"/etc/qci/scripts/upload_image\", line 26, in main\n
                                disk = create_disk(disks_service, args.disk_name, args.disk_size,  args.storage_domain)\n
                              File \"/etc/qci/scripts/upload_image\", line 47, in create_disk\n
                                return get_resource(disks_service, disk_name) or disks_service.add(\n
                              File \"/etc/qci/scripts/upload_image\", line 38, in get_resource\n
                                resource = service.list(search='name={}'.format(name))\n
                              File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 5409, in list\n
                                response = self._connection.send(request)\n
                              File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 276, in send\n
                                return self.__send(request)\n
                              File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 298, in __send\n
                                self._sso_token = self._get_access_token()\n
                              File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 470, in _get_access_token\n
                                sso_response = self._get_sso_response(self._sso_url, post_data)\n
                              File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 508, in _get_sso_response\n
                                return json.loads(body_buf.getvalue().decode('utf-8'))\n
                              File \"/usr/lib64/python2.7/json/__init__.py\", line 338, in loads\n
                                return _default_decoder.decode(s)\n
                              File \"/usr/lib64/python2.7/json/decoder.py\", line 366, in decode\n
                                obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n
                              File \"/usr/lib64/python2.7/json/decoder.py\", line 384, in raw_decode\n
                                raise ValueError(\"No JSON object could be decoded\")\n
                            ValueError: No JSON object could be decoded",
                            "stdout": "Got: username, disk_size, image_path, url, storage_domain, password, disk_name",
                            "stdout_lines": ["Got: username, disk_size, image_path, url, storage_domain, password, disk_name"],
                            "warnings": []}

As would be expected with this error, none of the OCP VMs had been created yet.
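
The traceback above ends in json.loads() failing on the SSO response body, which suggests the engine returned something other than JSON (for example an empty body or an HTML error page) when the script requested an access token. The report does not confirm the root cause, so the following is only a minimal sketch of how that failure arises and how a caller might retry it, assuming the condition is transient. parse_sso_body and call_with_retry are hypothetical names, not part of ovirtsdk4 or of /etc/qci/scripts/upload_image, and the sketch targets Python 3 rather than the Python 2.7 shown in the traceback.

```python
import json
import time


def parse_sso_body(body):
    """Decode a response body as JSON, adding context if it is not JSON.

    Hypothetical helper: stands in for the json.loads() call seen in the
    traceback, it is not the SDK's own code.
    """
    try:
        return json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError(
            "SSO response was not JSON: {!r}".format(body[:80])
        ) from exc


def call_with_retry(func, attempts=3, delay=0.0):
    """Call func() up to `attempts` times, retrying only on ValueError.

    Assumes the non-JSON response is transient; re-raises the last error
    if every attempt fails.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ValueError as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error
```

In this sketch a caller would wrap the step that fetches and decodes the token, so that an empty or HTML response is retried a few times before the task is reported as failed.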

I am in a nested virtualization environment.
 
Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20170130.t.0

How reproducible:
First time I have seen it.

Steps to Reproduce:
1. Do a RHV(self-hosted, 1H) + OCP(2 nodes) deployment

Actual results:
The deployment failed early in the OCP phase, while trying to create the
master node.

Expected results:
No failures.

Comment 1 James Olin Oden 2017-02-01 20:33:54 UTC

I ran another deployment just like this and the error did not occur, so the failure appears to be intermittent.

Comment 3 Fabian von Feilitzsch 2017-02-03 15:35:35 UTC

I was unable to replicate this. If it happens again, attach logs and let me know; I will probably need to get onto the host and poke around.

Comment 4 Fabian von Feilitzsch 2017-02-03 16:06:57 UTC

As a manual workaround, ssh into each hypervisor and run

    systemctl restart imageio-daemon

then go to the RHV engine UI, select the Disks tab, and remove all disks associated with the OCP deployment.

Then resume the task, and the deployment should continue without issue.
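
The first step of the workaround above (restarting imageio-daemon on every hypervisor) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes key-based SSH access as root to each hypervisor, the host names in the usage note are placeholders, and build_restart_command and restart_imageio are hypothetical helper names. Removing the disks in the RHV engine UI and resuming the task remain manual steps.

```python
import subprocess


def build_restart_command(host, user="root"):
    """Return the ssh argv that restarts imageio-daemon on one hypervisor.

    The remote command is the one given in the workaround; the ssh user
    is an assumption.
    """
    return ["ssh", "{}@{}".format(user, host),
            "systemctl", "restart", "imageio-daemon"]


def restart_imageio(hosts, run=subprocess.check_call):
    """Run the restart on each host and return a list of (host, error) failures.

    `run` is injectable so the loop can be exercised without real SSH access.
    """
    failed = []
    for host in hosts:
        try:
            run(build_restart_command(host))
        except (subprocess.CalledProcessError, OSError) as exc:
            failed.append((host, exc))
    return failed
```

For example, restart_imageio(["hypervisor1.example.com", "hypervisor2.example.com"]) would attempt the restart on both placeholder hosts and return an empty list if every ssh call succeeded.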

Comment 6 James Olin Oden 2017-02-07 20:16:13 UTC

I haven't seen this since, so I am marking this as verified.

Compose: QCI-1.1-RHEL-7-20170203.t.0

Comment 8 errata-xmlrpc 2017-02-28 01:45:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335