Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1445155

Summary: Deployment fails due to ironic conductor ironic.drivers.modules.agent_base_vendor Stderr: Error: The location 40962 is outside of the device /dev/sda.
Product: Red Hat OpenStack
Component: openstack-ironic
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Gurenko Alex <agurenko>
Assignee: Dmitry Tantsur <dtantsur>
QA Contact: mlammon
CC: agurenko, augol, dtrainor, ebarrera, mburns, racedoro, rhel-osp-director-maint, sasha, srevivo, ukalifon
Target Milestone: ---
Target Release: ---
Type: Bug
Last Closed: 2017-04-26 09:02:29 UTC

Description Gurenko Alex 2017-04-25 06:51:39 UTC
Description of problem: overcloud deployment fails when deploying various setups (HA or non-HA).


Version-Release number of selected component (if applicable):

build 2017-04-20.2

How reproducible:

100% on particular virt hosts

Steps to Reproduce:
1. Perform a deployment with InfraRed v1 or via the OSPd UI.

Actual results:

Overcloud deployment fails with a "no valid host found" error.

Expected results:

deployment completes successfully

Additional info:

After some investigation, 3 virt hosts have the same errors in ironic-conductor.log:

2017-04-24 16:44:44.307 24927 ERROR ironic.drivers.modules.deploy_utils [req-083cb76e-fcda-4a9a-9ca1-6ea59c463fe9 - - - - -] StdErr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:44:44.715 24927 ERROR ironic.drivers.modules.iscsi_deploy Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:44:47.285 24927 ERROR ironic.drivers.modules.agent_base_vendor Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:46:01.323 24927 ERROR ironic.drivers.modules.deploy_utils [req-e70005d9-51db-4db9-b33d-51ffdad8d5a4 - - - - -] StdErr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:46:01.733 24927 ERROR ironic.drivers.modules.iscsi_deploy Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:46:04.366 24927 ERROR ironic.drivers.modules.agent_base_vendor Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:47:10.731 24927 ERROR ironic.drivers.modules.deploy_utils [req-fe3f96a2-5c39-456b-aa20-dc82a01ad468 - - - - -] StdErr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:47:11.170 24927 ERROR ironic.drivers.modules.iscsi_deploy Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'
2017-04-24 16:47:13.790 24927 ERROR ironic.drivers.modules.agent_base_vendor Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'

The initial assumption was that the total size of the virtual disks exceeds the virt host's total free space, but the latest attempt showed that this is not the case.

Some environments are still available if additional logs are required.

Comment 1 Dmitry Tantsur 2017-04-25 12:44:05 UTC
Hi! I suspect the VM disk size is not enough for deployment. Make sure the real disk size is greater than the one in the flavor. Note: greater, not greater-or-equal; we have encountered certain weirdness around parted, so we always leave 1 GiB of padding.

Please let me know if it helps.
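
A quick way to compare the two numbers is shown below (a minimal sketch using the standard openstack CLI; the "baremetal" flavor name is illustrative):

# Disk size requested by the flavor
openstack flavor show baremetal -f value -c disk

# Disk size introspection recorded for each node (local_gb in the properties)
for NODE in $(openstack baremetal node list -f value -c Name); do
    openstack baremetal node show "$NODE" -f value -c properties
done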

Comment 2 Gurenko Alex 2017-04-25 12:59:46 UTC
(In reply to Dmitry Tantsur from comment #1)
> Hi! I suspect the VM disk size is not enough for deployment. Make sure the
> real disk size is greater than one in flavor. Note, greater, not
> greater-or-equal: we have encountered certain weirdness around parted, so we
> always leave 1 GiB of padding.
> 
> Please let me know if it helps.

Here are the disks and flavors:

[root@seal05 images]# ll -h
total 26G
-rw-r--r--. 1 root root  21G Apr 24 01:10 ceph-0-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 23 23:51 ceph-0-disk2.qcow2
-rw-r--r--. 1 root root  41G Apr 24 16:08 compute-0-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 24 16:08 compute-1-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 24 01:11 compute-2-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 24 16:08 controller-0-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 24 16:08 controller-1-disk1.qcow2
-rw-r--r--. 1 root root  41G Apr 24 01:10 controller-2-disk1.qcow2
-rw-r--r--. 1 qemu qemu  71G Apr 25 15:58 ironic-0-disk1.qcow2

[stack@ironic-0 ~]$ openstack flavor list
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| ID                                   | Name          |  RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| 35dd5e11-c64f-45df-baa2-10839b0456dd | block-storage | 4096 |   40 |         0 |     1 | True      |
| 38fa3cee-89a4-4a8d-b783-89ff314ac8c7 | baremetal     | 4096 |   40 |         0 |     1 | True      |
| 89ac1996-c480-4b26-9acb-0fd685054c86 | compute       | 4096 |   40 |         0 |     1 | True      |
| ad00f54c-ee66-403f-a029-ee797ae64d6d | ceph-storage  | 4096 |   40 |         0 |     1 | True      |
| b5568260-2b37-4ee6-93c4-e9c9f719ea6b | control       | 4096 |   40 |         0 |     1 | True      |
| e39d9d2c-b89d-42a2-816d-c4eb1cff95c0 | swift-storage | 4096 |   40 |         0 |     1 | True      |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+

I would say it's +1 GB over the flavor

Comment 3 Dmitry Tantsur 2017-04-25 13:09:19 UTC
Do you set root device hints for the ironic nodes? This has to be done whenever a node has several disks; see http://tripleo.org/advanced_deployment/root_device.html#root-device for details. You may be getting hit by this: I suspect Ironic picks the smaller disk for deployment and fails. Could you please try it?
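
For nodes with more than one disk, the hint is set as a property on the ironic node. A minimal sketch of what this usually looks like (the UUID and hint values are illustrative; see the linked documentation for the authoritative syntax):

# Pin the root device by size (GiB) ...
ironic node-update <node-uuid> add properties/root_device='{"size": 41}'

# ... or by device name
ironic node-update <node-uuid> add properties/root_device='{"name": "/dev/vdb"}'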

Comment 4 Gurenko Alex 2017-04-25 14:27:38 UTC
(In reply to Dmitry Tantsur from comment #3)
> Do you set root device hints for ironic nodes? It has to be done every time
> when you have several disks on a node, see
> http://tripleo.org/advanced_deployment/root_device.html#root-device for
> details. You may get hit by this. I suspect Ironic may pick the smaller disk
> for deployment, and fail. Could you please try it?

I would assume you're referring to the ceph nodes, which, now that I look at it, make sense to fail since the 21G disk is used as the primary. But I'm trying to do a 1 compute + 1 controller deployment on this virtual host with a similar result. Since those nodes have only one virtual disk, I would guess the root device property should not change anything; am I correct?

Comment 6 Dan Trainor 2017-04-25 15:41:17 UTC
I've done a considerable number of deployments on OVB with systems that each provide 41 GB as the root disk (echoing Dmitry's comment #1: flavor size +1 GB) and have seen no issues with the default flavor size requirements.  As Alex points out, if any given node's root disk does not meet these requirements, a failure may happen.

Comment 7 Lucas Alvares Gomes 2017-04-25 15:49:23 UTC
Hi Gurenko,

(In reply to Gurenko Alex from comment #2)
> (In reply to Dmitry Tantsur from comment #1)
> > Hi! I suspect the VM disk size is not enough for deployment. Make sure the
> > real disk size is greater than one in flavor. Note, greater, not
> > greater-or-equal: we have encountered certain weirdness around parted, so we
> > always leave 1 GiB of padding.
> > 
> > Please let me know if it helps.
> 
> Here are the disks and flavors:
> 
> [root@seal05 images]# ll -h
> total 26G
> -rw-r--r--. 1 root root  21G Apr 24 01:10 ceph-0-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 23 23:51 ceph-0-disk2.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 16:08 compute-0-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 16:08 compute-1-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 01:11 compute-2-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 16:08 controller-0-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 16:08 controller-1-disk1.qcow2
> -rw-r--r--. 1 root root  41G Apr 24 01:10 controller-2-disk1.qcow2
> -rw-r--r--. 1 qemu qemu  71G Apr 25 15:58 ironic-0-disk1.qcow2
> 
> [stack@ironic-0 ~]$ openstack flavor list
> +--------------------------------------+---------------+------+------+-----------+-------+-----------+
> | ID                                   | Name          |  RAM | Disk | Ephemeral | VCPUs | Is Public |
> +--------------------------------------+---------------+------+------+-----------+-------+-----------+
> | 35dd5e11-c64f-45df-baa2-10839b0456dd | block-storage | 4096 |   40 |         0 |     1 | True      |
> | 38fa3cee-89a4-4a8d-b783-89ff314ac8c7 | baremetal     | 4096 |   40 |         0 |     1 | True      |
> | 89ac1996-c480-4b26-9acb-0fd685054c86 | compute       | 4096 |   40 |         0 |     1 | True      |
> | ad00f54c-ee66-403f-a029-ee797ae64d6d | ceph-storage  | 4096 |   40 |         0 |     1 | True      |
> | b5568260-2b37-4ee6-93c4-e9c9f719ea6b | control       | 4096 |   40 |         0 |     1 | True      |
> | e39d9d2c-b89d-42a2-816d-c4eb1cff95c0 | swift-storage | 4096 |   40 |         0 |     1 | True      |
> +--------------------------------------+---------------+------+------+-----------+-------+-----------+
> 
> I would say it's +1 GB over the flavor

One thing to keep in mind is that Ironic uses binary units, i.e. MiB and GiB instead of MB and GB [0], so in this case 40 GiB = ~43 GB.

Could you update the flavor to, say, 35 GB and see if that works out for you? That would help us narrow the problem down.

[0] https://github.com/openstack/ironic-lib/blob/87c196d670e5cebe355ac74a7ea7ba319cbfc4bb/ironic_lib/disk_partitioner.py#L78
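
For reference, a rough check of the numbers, assuming the locations parted reports here are MiB offsets (which matches how the disk partitioner invokes parted [0]):

# 40 GiB (the flavor's "Disk" value) expressed in bytes
echo $(( 40 * 1024 * 1024 * 1024 ))   # 42949672960 bytes, i.e. ~42.9 GB

# 40 GiB is 40960 MiB, so the failing location 40962 from the log is
# consistent with a root partition sized from the 40 GiB flavor (plus a
# small alignment offset) ending beyond a disk that reports local_gb=39.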

Thanks,
Lucas

Comment 8 Dmitry Tantsur 2017-04-25 16:03:55 UTC
> I'm trying to do 1 compute + 1 controller deployment on this virtual host with similar result

Depending on how you configure your deployment (--compute-flavor, etc), these instances can still be scheduled on the small nodes. Are you sure it's not the case?
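
One way to make sure each role lands on the intended nodes is profile matching between node capabilities and the deployment flavors. A minimal sketch of the usual TripleO mechanism (profile names are illustrative, nothing here is specific to this environment):

# Tag a node with a profile
openstack baremetal node set --property capabilities='profile:compute,boot_option:local' <node-uuid>

# Make the corresponding flavor require that profile
openstack flavor set --property capabilities:profile='compute' compute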

Comment 9 Dan Trainor 2017-04-25 16:32:34 UTC
I popped into the systems in question and found that introspection shows root disks that do not meet the sizes specified by the assigned node flavors.  I found the following disk data by examining the nodes' ironic properties (note the third key, local_gb):

[stack@ironic-0 ~]$ for NODE in `openstack baremetal node list -fvalue -cName`; do openstack baremetal node show ${NODE} | grep properties; done
| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                   |
| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'}                                                  |
[stack@ironic-0 ~]$ 

All of the flavors assigned to these nodes specify a larger disk size requirement than what the node provides:

[stack@ironic-0 ~]$ openstack flavor list
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| ID                                   | Name          |  RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| 06dddb8e-a1a6-4b9e-aad3-12747db41b46 | swift-storage | 4096 |   40 |         0 |     1 | True      |
| 52822c2b-03bf-42c3-bbc5-9ccf73a144c3 | block-storage | 4096 |   40 |         0 |     1 | True      |
| b6dddc62-21cb-4819-9b15-ca1191a4df56 | control       | 4096 |   40 |         0 |     1 | True      |
| cbcd308d-d2ab-4eb4-8421-cc1f9361caab | ceph-storage  | 4096 |   40 |         0 |     1 | True      |
| cefb6698-4a04-4dd0-a60c-4456cde76621 | baremetal     | 4096 |   40 |         0 |     1 | True      |
| e9b2374b-2891-4c4d-b383-ccd53e15aec9 | compute       | 4096 |   40 |         0 |     1 | True      |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+

Comment 10 Ramon Acedo 2017-04-25 16:58:31 UTC
Thanks, Dan. I think your assessment is right and consistent with the error: if your flavor says 40 GiB and local_gb shows the node's disk is 39, the partition end is outside the device.

Would changing the flavours to something less than 39 work?
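
A Nova flavor's disk size cannot be edited in place, so shrinking it means recreating the flavor; a minimal sketch (the 38 GB value is illustrative, and any extra properties such as capabilities:boot_option have to be re-applied):

openstack flavor delete baremetal
openstack flavor create --ram 4096 --disk 38 --vcpus 1 baremetal
openstack flavor set --property capabilities:boot_option='local' baremetal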

Comment 11 Alexander Chuzhoy 2017-04-25 17:50:29 UTC
I changed the baremetal flavor to 20 GB and was able to successfully deploy an overcloud with 1 controller + 1 compute on a setup that previously showed:
"2017-04-24 16:47:13.790 24927 ERROR ironic.drivers.modules.agent_base_vendor Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'"


[stack@ironic-0 ~]$ openstack flavor show baremetal
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 20                                   |
| id                         | 4ed7bbef-8b1f-4a67-8e5b-8f482c26d220 |
| name                       | baremetal                            |
| os-flavor-access:is_public | True                                 |
| properties                 | capabilities:boot_option='local'     |
| ram                        | 4096                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 1                                    |
+----------------------------+--------------------------------------+



[stack@ironic-0 ~]$ for i in `ironic node-list|awk '/power/ {print $2}'`; do ironic node-show $i|grep -A2 properties; done

| properties             | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19',      |
|                        | u'cpus': u'2', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}                          |

| properties             | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39',     |
|                        | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
|                        | _1g:true,cpu_hugepages:true,boot_option:local'}

Comment 13 Dmitry Tantsur 2017-04-26 09:02:29 UTC
Thanks everyone. Looks like the problem was indeed caused by missing root device hints, so I'm closing it. To avoid such problems in the future, I've filed 2 RFEs for the UI team: bug 1445650 to support setting root device hints and bug 1445662 for validations to check that they are set. I hope that will help avoid such issues in the future. Please let me know if we can do more than that.