Bug 1445155
| Summary: | Deployment fails due to ironic conductor ironic.drivers.modules.agent_base_vendor Stderr: Error: The location 40962 is outside of the device /dev/sda. | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Gurenko Alex <agurenko> |
| Component: | openstack-ironic | Assignee: | Dmitry Tantsur <dtantsur> |
| Status: | CLOSED NOTABUG | QA Contact: | mlammon |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 11.0 (Ocata) | CC: | agurenko, augol, dtrainor, ebarrera, mburns, racedoro, rhel-osp-director-maint, sasha, srevivo, ukalifon |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-04-26 09:02:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Gurenko Alex
2017-04-25 06:51:39 UTC
Hi! I suspect the VM disk size is not enough for deployment. Make sure the real disk size is greater than the one in the flavor. Note: greater, not greater-or-equal. We have encountered certain weirdness around parted, so we always leave 1 GiB of padding.

Please let me know if it helps.

(In reply to Dmitry Tantsur from comment #1)
> Hi! I suspect the VM disk size is not enough for deployment. Make sure the
> real disk size is greater than one in flavor. Note, greater, not
> greater-or-equal: we have encountered certain weirdness around parted, so we
> always leave 1 GiB of padding.
>
> Please let me know if it helps.

Here are the disks and flavors:

[root@seal05 images]# ll -h
total 26G
-rw-r--r--. 1 root root 21G Apr 24 01:10 ceph-0-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 23 23:51 ceph-0-disk2.qcow2
-rw-r--r--. 1 root root 41G Apr 24 16:08 compute-0-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 24 16:08 compute-1-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 24 01:11 compute-2-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 24 16:08 controller-0-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 24 16:08 controller-1-disk1.qcow2
-rw-r--r--. 1 root root 41G Apr 24 01:10 controller-2-disk1.qcow2
-rw-r--r--. 1 qemu qemu 71G Apr 25 15:58 ironic-0-disk1.qcow2

[stack@ironic-0 ~]$ openstack flavor list
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| 35dd5e11-c64f-45df-baa2-10839b0456dd | block-storage | 4096 | 40 | 0 | 1 | True |
| 38fa3cee-89a4-4a8d-b783-89ff314ac8c7 | baremetal | 4096 | 40 | 0 | 1 | True |
| 89ac1996-c480-4b26-9acb-0fd685054c86 | compute | 4096 | 40 | 0 | 1 | True |
| ad00f54c-ee66-403f-a029-ee797ae64d6d | ceph-storage | 4096 | 40 | 0 | 1 | True |
| b5568260-2b37-4ee6-93c4-e9c9f719ea6b | control | 4096 | 40 | 0 | 1 | True |
| e39d9d2c-b89d-42a2-816d-c4eb1cff95c0 | swift-storage | 4096 | 40 | 0 | 1 | True |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+

I would say it's +1 GB to the flavor.

Do you set root device hints for ironic nodes? It has to be done every time you have several disks on a node, see http://tripleo.org/advanced_deployment/root_device.html#root-device for details. You may get hit by this. I suspect Ironic may pick the smaller disk for deployment, and fail. Could you please try it?

(In reply to Dmitry Tantsur from comment #3)
> Do you set root device hints for ironic nodes? It has to be done every time
> when you have several disks on a node, see
> http://tripleo.org/advanced_deployment/root_device.html#root-device for
> details. You may get hit by this. I suspect Ironic may pick the smaller disk
> for deployment, and fail. Could you please try it?

I would assume you're referring to the ceph nodes. Looking at it now, it makes sense for those to fail, since the 21G disk is used as primary. However, I'm trying to do a 1 compute + 1 controller deployment on this virtual host with a similar result. Since those nodes have only 1 virtual disk, I would guess the root device property should not change anything. Am I correct?

I've done a considerable number of deployments on OVB with systems that each provide 41GB as the root disk (to echo Dmitry's comment #1 with +1GB) and have seen no issues between that and the default flavor size requirements. As Alex points out, if any given node's root disk does not meet these requirements, a failure may happen.

Hi Gurenko,

(In reply to Gurenko Alex from comment #2)
> Here are the disks and flavors:
> [...]
> I would say it's +1 Gb to the flavor

One thing to keep in mind is that Ironic treats these sizes as binary units, i.e. MiB and GiB rather than MB and GB [0], so in this case 40 GiB = ~43 GB.

Could you update the flavor to, say, 35 GB and see if that works out for you? That way we can narrow the problem down.

[0] https://github.com/openstack/ironic-lib/blob/87c196d670e5cebe355ac74a7ea7ba319cbfc4bb/ironic_lib/disk_partitioner.py#L78

Thanks,
Lucas

> I'm trying to do 1 compute + 1 controller deployment on this virtual host with similar result
Depending on how you configure your deployment (--compute-flavor, etc), these instances can still be scheduled on the small nodes. Are you sure it's not the case?
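As a side note on the GiB versus GB point raised earlier in this thread, the unit arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and reading the location in the parted error as a value in MiB is an assumption based on its magnitude, not something confirmed in this bug.

```python
# Unit arithmetic only. Helper names are hypothetical and not part of Ironic.
GIB = 1024 ** 3  # bytes in one gibibyte (binary unit)
MIB = 1024 ** 2  # bytes in one mebibyte (binary unit)
GB = 1000 ** 3   # bytes in one gigabyte (decimal unit)


def gib_to_gb(gib):
    """Convert a size in GiB to decimal GB."""
    return gib * GIB / GB


def gib_to_mib(gib):
    """Convert a size in GiB to MiB."""
    return gib * GIB // MIB


if __name__ == "__main__":
    flavor_disk_gib = 40  # Disk column of the flavors listed in this bug
    print(f"{flavor_disk_gib} GiB = {gib_to_gb(flavor_disk_gib):.2f} GB")
    print(f"{flavor_disk_gib} GiB = {gib_to_mib(flavor_disk_gib)} MiB")
```

If the error's location really is in MiB (assumption), 40962 lies just past the 40960 MiB that make up 40 GiB, which would be consistent with a 40 GiB partition not fitting on the target device.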
I popped in to the systems in question and found that introspection shows root disks which are not sufficient as specified by the assigned node flavors. I found the following disk data from examining the nodes' ironic data (note the third key):
[stack@ironic-0 ~]$ for NODE in `openstack baremetal node list -fvalue -cName`; do openstack baremetal node show ${NODE} | grep properties; done
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages_1g:true,cpu_hugepages:true,boot_option:local'} |
[stack@ironic-0 ~]$
All of the flavors assigned to these nodes specify a larger disk size requirement than what the node provides:
[stack@ironic-0 ~]$ openstack flavor list
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
| 06dddb8e-a1a6-4b9e-aad3-12747db41b46 | swift-storage | 4096 | 40 | 0 | 1 | True |
| 52822c2b-03bf-42c3-bbc5-9ccf73a144c3 | block-storage | 4096 | 40 | 0 | 1 | True |
| b6dddc62-21cb-4819-9b15-ca1191a4df56 | control | 4096 | 40 | 0 | 1 | True |
| cbcd308d-d2ab-4eb4-8421-cc1f9361caab | ceph-storage | 4096 | 40 | 0 | 1 | True |
| cefb6698-4a04-4dd0-a60c-4456cde76621 | baremetal | 4096 | 40 | 0 | 1 | True |
| e9b2374b-2891-4c4d-b383-ccd53e15aec9 | compute | 4096 | 40 | 0 | 1 | True |
+--------------------------------------+---------------+------+------+-----------+-------+-----------+
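The comparison made in this comment, each node's `local_gb` against the flavor's Disk value, can be sketched in Python as follows. This is a minimal illustration using the values shown above, not a tool shipped with Ironic or TripleO; `parse_local_gb` and `nodes_too_small` are hypothetical helper names, and the check simply flags nodes whose `local_gb` is below the flavor disk size.

```python
import ast


def parse_local_gb(properties_text):
    """Extract local_gb from the dict literal shown in the properties column."""
    # The u'...' prefixes from the Python 2 era output are still valid literals.
    props = ast.literal_eval(properties_text)
    return int(props["local_gb"])


def nodes_too_small(flavor_disk_gb, local_gb_values):
    """Return the local_gb values that are smaller than the flavor disk size."""
    return [gb for gb in local_gb_values if gb < flavor_disk_gb]


if __name__ == "__main__":
    # Shortened copy of the first properties row from the output above.
    sample = "{u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19'}"
    print(parse_local_gb(sample))

    # local_gb values of the seven nodes listed above.
    local_gbs = [19, 39, 39, 39, 39, 39, 39]
    print(nodes_too_small(40, local_gbs))
```

With the flavor disk set to 40, every one of the seven nodes is flagged, which matches the observation that no node satisfies the assigned flavors.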
Thanks Dan, I think your assessment is right and consistent with the error. If your flavor says 40 GiB and local_gb shows the node's disk is 39, the partition end is outside the device. Would changing the flavours to something less than 39 work?

I changed the baremetal flavor to 20G and was able to deploy the overcloud with 1 controller + 1 compute successfully on a setup that previously showed:
"2017-04-24 16:47:13.790 24927 ERROR ironic.drivers.modules.agent_base_vendor Stderr: u'Error: The location 40962 is outside of the device /dev/sda.\n'"
[stack@ironic-0 ~]$ openstack flavor show baremetal
+----------------------------+--------------------------------------+
| Field | Value |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 20 |
| id | 4ed7bbef-8b1f-4a67-8e5b-8f482c26d220 |
| name | baremetal |
| os-flavor-access:is_public | True |
| properties | capabilities:boot_option='local' |
| ram | 4096 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+--------------------------------------+
[stack@ironic-0 ~]$ for i in `ironic node-list|awk '/power/ {print $2}'`; do ironic node-show $i|grep -A2 properties; done
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', |
| | u'cpus': u'2', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
| properties | {u'memory_mb': u'17000', u'cpu_arch': u'x86_64', u'local_gb': u'39', |
| | u'cpus': u'4', u'capabilities': u'cpu_vt:true,cpu_aes:true,cpu_hugepages |
| | _1g:true,cpu_hugepages:true,boot_option:local'} |
Thanks everyone. Looks like the problem was indeed caused by missing root device hints, so I'm closing it. To help avoid such problems in the future, I've filed 2 RFEs for the UI team: bug 1445650 to support setting root device hints and bug 1445662 for validations that check they are set. Please let me know if we can do more than that.