Bug 1299202

Summary: unable to deploy with nodes larger than 3TB disk
Product: Red Hat OpenStack
Component: openstack-ironic
Version: 7.0 (Kilo)
Target Release: 8.0 (Liberty)
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: ZStream
Reporter: Sanjay Upadhyay <supadhya>
Assignee: Lucas Alvares Gomes <lmartins>
QA Contact: Toure Dunnon <tdunnon>
CC: mburns, rhel-osp-director-maint, sankarshan, srevivo
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-07-05 10:52:41 UTC

Description Sanjay Upadhyay 2016-01-17 10:37:33 UTC
Description of problem:

Nodes with 3T disks are not getting imaged by ironic.

Version-Release number of selected component (if applicable):
openstack-ironic-common-2015.1.2-2.el7ost.noarch
openstack-ironic-conductor-2015.1.2-2.el7ost.noarch
openstack-ironic-api-2015.1.2-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-8.el7ost.noarch

How reproducible:

I tried this with controller nodes with 3T of disk

Steps to Reproduce:
1. Flavor -
openstack flavor list
+--------------------------------------+-----------+--------+------+-----------+-------+-----------+
| ID                                   | Name      |    RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+-----------+--------+------+-----------+-------+-----------+
| 63c0da57-4709-4b4d-af44-6a48787a0efc | control   | 131072 | 3724 |         0 |    32 | True      |
| 66528f20-6c8c-4d5d-87cf-8d5c7845810c | compute   | 131072 |  300 |         0 |    40 | True      |
| dc6e42e0-2ecf-4481-b79f-06c2ff8b6537 | baremetal |   4096 |   40 |         0 |     1 | True      |
| e3974c18-1cdd-4034-a5f5-01c031911972 | ceph      |  65536 |  300 |         0 |    12 | True      |
+--------------------------------------+-----------+--------+------+-----------+-------+-----------+

2. Tag the 3T nodes with the control flavor and check with node-show:
ironic node-show fb2312d1-d996-4cfb-91d8-15a799853bfd
+------------------------+-------------------------------------------------------------------------+
| Property               | Value                                                                   |
+------------------------+-------------------------------------------------------------------------+
| target_power_state     | None                                                                    |
| extra                  | {u'newly_discovered': u'true', u'block_devices': {u'serials':           |
|                        | [u'6b083fe0d7c25a001d2a0b2b10caa024']}, u'hardware_swift_object': u     |
|                        | 'extra_hardware-fb2312d1-d996-4cfb-91d8-15a799853bfd'}                  |
| last_error             | None                                                                    |
| updated_at             | 2016-01-03T12:09:46+00:00                                               |
| maintenance_reason     | None                                                                    |
| provision_state        | available                                                               |
| uuid                   | fb2312d1-d996-4cfb-91d8-15a799853bfd                                    |
| console_enabled        | False                                                                   |
| target_provision_state | None                                                                    |
| maintenance            | False                                                                   |
| inspection_started_at  | None                                                                    |
| inspection_finished_at | None                                                                    |
| power_state            | power off                                                               |
| driver                 | pxe_ipmitool                                                            |
| reservation            | None                                                                    |
| properties             | {u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'3724', |
|                        | u'cpus': u'32', u'capabilities': u'profile:control,boot_option:local'}  |
| instance_uuid          | None                                                                    |
| name                   | None                                                                    |
| driver_info            | {u'ipmi_password': u'******', u'ipmi_address': u'192.168.15.23',        |
|                        | u'ipmi_username': u'root', u'deploy_kernel': u'f368732b-34c8-40b7-b3ff- |
|                        | bbd3bccd99e0', u'deploy_ramdisk': u'0c8831d8-19de-4c25-a42f-            |
|                        | 1bca41612e12'}                                                          |
| created_at             | 2016-01-03T08:34:40+00:00                                               |
| driver_internal_info   | {u'clean_steps': None, u'is_whole_disk_image': False}                   |
| chassis_uuid           |                                                                         |
| instance_info          | {}                                                                      |
+------------------------+-------------------------------------------------------------------------+
3. Run deploy 

Actual results:

Deploying templates in the directory /home/stack/NUS-OSP/nus_ospd_templates
Stack failed with status: Resource CREATE failed: ResourceInError: resources.Controller.resources[1].resources.Controller: Went to status ERROR due to "Message: No valid host was found. Exceeded max scheduling attempts 3 for instance b511975c-89f6-4909-bc3b-c96b96f3e472. Last exception: [u'Traceback (most recent call last): \n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2261, in _do, Code: 500"
ERROR: openstack Heat Stack create failed.



Logs - /var/log/ironic/ironic-conductor.log 
2016-01-03 04:42:07.938 1214 DEBUG ironic.common.utils [-] Execution completed, command line is "ipmitool -I lanplus -H 192.168.15.23 -L ADMINISTRATOR -U root -R 3 -N 5 -f /tmp/tmpR5s3WU power status" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:83
2016-01-03 04:42:07.939 1214 DEBUG ironic.common.utils [-] Command stdout is: "Chassis Power is off
" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:84
2016-01-03 04:42:07.940 1214 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:85
2016-01-03 04:42:07.942 1214 INFO ironic.conductor.utils [-] Successfully set node fb2312d1-d996-4cfb-91d8-15a799853bfd power state to power off.
2016-01-03 04:42:07.980 1214 DEBUG oslo_concurrency.lockutils [-] Lock "master_image" acquired by "clean_up" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:444
2016-01-03 04:42:07.981 1214 DEBUG ironic.drivers.modules.image_cache [-] Starting clean up for master image cache /var/lib/ironic/master_images clean_up /usr/lib/python2.7/site-packages/ironic/drivers/modules/image_cache.py:195
2016-01-03 04:42:07.982 1214 DEBUG oslo_concurrency.lockutils [-] Lock "master_image" released by "clean_up" :: held 0.002s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:456
2016-01-03 04:42:07.983 1214 ERROR ironic.drivers.base [-] vendor_passthru failed with method pass_deploy_info
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base Traceback (most recent call last):
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base   File "/usr/lib/python2.7/site-packages/ironic/drivers/base.py", line 516, in passthru_handler
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base     return func(*args, **kwargs)
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 129, in wrapper
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base     return f(*args, **kwargs)
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/pxe.py", line 598, in pass_deploy_info
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base     uuid_dict = iscsi_deploy.continue_deploy(task, **kwargs)
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 328, in continue_deploy
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base     _fail_deploy(task, msg)
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 301, in _fail_deploy
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base     raise exception.InstanceDeployFailure(msg)
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base InstanceDeployFailure: Deploy failed for instance 82a5e5fd-4aa5-4380-967b-a124657f0c1b. Error: Unexpected error while running command.
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base Command: sudo ironic-rootwrap /etc/ironic/rootwrap.conf parted -a optimal -s /dev/disk/by-path/ip-192.168.16.191:3260-iscsi-iqn.2008-10.org.openstack:fb2312d1-d996-4cfb-91d8-15a799853bfd-lun-1 -- unit MiB mklabel msdos mkpart primary  1 2 mkpart primary  2 3813378 set 2 boot on
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base Exit code: 1
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base Stdout: u''
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base Stderr: u'Error: partition length of 7809794048 sectors exceeds the msdos-partition-table-imposed maximum of 4294967295\n'
2016-01-03 04:42:07.983 1214 TRACE ironic.drivers.base 
2016-01-03 04:42:07.995 1214 DEBUG ironic.conductor.task_manager [-] Successfully released exclusive lock for calling vendor passthru on node fb2312d1-d996-4cfb-91d8-15a799853bfd (lock was held 12.05 sec) release_resources /usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py:283
/var/log/ironic/ironic-conductor.log:2016-01-01 01:13:18.344 1214 ERROR ironic.drivers.modules.iscsi_deploy [-] Deploy failed for instance ca9a6d72-c24a-47d1-8494-b22217a97fec. Error: Unexpected error while running command.
/var/log/ironic/ironic-conductor.log:2016-01-01 01:13:28.728 1214 TRACE ironic.drivers.base InstanceDeployFailure: Deploy failed for instance ca9a6d72-c24a-47d1-8494-b22217a97fec. Error: Unexpected error while running command.
/var/log/ironic/ironic-conductor.log:2016-01-01 01:13:33.811 1214 INFO ironic.conductor.manager [-] Successfully unprovisioned node a9d8b700-129c-40b7-8b1d-b5cfdfae90b4 with instance ca9a6d72-c24a-47d1-8494-b22217a97fec.
/var/log/ironic/ironic-conductor.log:2016-01-01 01:16:58.672 1214 ERROR ironic.drivers.modules.iscsi_deploy [-] Deploy failed for instance ca9a6d72-c24a-47d1-8494-b22217a97fec. Error: Unexpected error while running command.
/var/log/ironic/ironic-conductor.log:2016-01-01 01:17:08.942 1214 TRACE ironic.drivers.base InstanceDeployFailure: Deploy failed for instance ca9a6d72-c24a-47d1-8494-b22217a97fec. Error: Unexpected error while running command.
/var/log/ironic/ironic-conductor.log:2016-01-01 01:17:14.046 1214 INFO ironic.conductor.manager [-] Successfully unprovisioned node 6e53132f-ef9d-4473-827e-349adce4c8f4 with instance ca9a6d72-c24a-47d1-8494-b22217a97fec.



Expected results:
overcloud deployed


Additional info:
When the disk size is reduced to, say, 100G, the same nodes are imaged properly.

Comment 2 Lucas Alvares Gomes 2016-07-05 10:52:41 UTC
Hi Sanjay,

Thanks for the report. The fact that we can't create a partition over 2TiB is a limitation in the MBR [0].
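
The numbers in the parted error can be checked by hand. Below is a minimal sketch, assuming 512-byte logical sectors (the log does not state the sector size, but it is the value that reproduces the reported sector count). The function names are made up for illustration and are not part of ironic or parted.

```python
# Assumption: 512-byte logical sectors; not stated in the log.
SECTOR_BYTES = 512
MIB = 1024 * 1024

# Maximum partition length quoted in the parted error (2**32 - 1 sectors).
MBR_MAX_SECTORS = 2**32 - 1


def partition_sectors(start_mib, end_mib, sector_bytes=SECTOR_BYTES):
    """Length in sectors of a partition given in MiB, as in 'unit MiB mkpart'."""
    return (end_mib - start_mib) * MIB // sector_bytes


def fits_in_msdos_label(start_mib, end_mib, sector_bytes=SECTOR_BYTES):
    """True if the partition length is within the msdos (MBR) label limit."""
    return partition_sectors(start_mib, end_mib, sector_bytes) <= MBR_MAX_SECTORS


# Root partition from the failing command: mkpart primary 2 3813378
print(partition_sectors(2, 3813378))      # 7809794048, as in the parted error
print(fits_in_msdos_label(2, 3813378))    # False

# With 512-byte sectors the limit works out to just under 2 TiB.
print(MBR_MAX_SECTORS * SECTOR_BYTES / 1024**4)
```

The 3813376 MiB partition length is 3724 * 1024, which matches the local_gb value of 3724 on the node.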

Unfortunately, OSP 7 and 8 do not let operators choose the disk label (MBR or GPT). In these versions the disk label is set implicitly according to the boot mode (BIOS vs UEFI), so the only way to create a partition over 2TiB is by using UEFI.

For OSP 9+ we allow operators to opt in [1] and create a GPT partition table for nodes booting in BIOS mode as well.
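
As a rough illustration of the opt-in, the sketch below adds a disk label entry to a node's existing capabilities string without dropping the entries already there. The capability name disk_label comes from the guide linked in [1]; the value "gpt" and the idea of carrying it in the node's capabilities are assumptions based on that guide and should be checked against the documentation for the release in use. The helper set_capability is hypothetical and is not an ironic API.

```python
def set_capability(capabilities, key, value):
    """Return a 'k1:v1,k2:v2' capabilities string with key set to value.

    Hypothetical helper for illustration only; existing entries are kept
    in order and any previous value for key is replaced.
    """
    pairs = []
    for item in capabilities.split(","):
        if not item:
            continue
        name, _, val = item.partition(":")
        if name != key:
            pairs.append((name, val))
    pairs.append((key, value))
    return ",".join(f"{k}:{v}" for k, v in pairs)


# Capabilities of the node shown in the report.
current = "profile:control,boot_option:local"

# Assumed opt-in value; see the guide in [1] for the supported labels.
print(set_capability(current, "disk_label", "gpt"))
# profile:control,boot_option:local,disk_label:gpt
```

The resulting string would still have to be written back to the node's properties/capabilities with the usual client, which is not shown here.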

[0] https://en.wikipedia.org/wiki/Master_boot_record 

[1] http://docs.openstack.org/developer/ironic/deploy/install-guide.html?highlight=disk_label#choosing-the-disk-label