Description of problem:

In several virtual setups, I've been using VMs for the undercloud and overcloud nodes. The ironic nodes use pxe_ssh with an SSH key. This worked fine on OSP8/9/10 until recently (starting around March 9th). As of a few days ago, I have not been able to get a successful deploy on OSP10.

The issue I am running into:
- Ironic nodes introspect fine. I can turn them on or off using ironic node-set-power-state.
- When I attempt a deploy, nova instances get scheduled but the nodes never get powered on for the deploy.

This used to work fine from OSP10 GA until a few days ago. I restored the OSP8 snapshot of my undercloud and was able to deploy without trouble, so the problem seems limited to OSP10 and possibly to recent updates.
The ironic nodes are defined as:

ironic node-create -n ${IRONIC_NODE} \
    -d pxe_ssh \
    -i ssh_address=${HVM_HOST_IP} \
    -i ssh_username=${HVM_USER} \
    -i ssh_virt_type=vbox \
    -i ssh_key_contents="$(cat ${IRONIC_KEY})"

ironic node-update ${IRONIC_UUID} add \
    properties/cpus=${vm_slave_cpu_default} \
    properties/memory_mb=${vm_slave_memory_default} \
    properties/local_gb=62 \
    properties/cpu_arch=x86_64 \
    driver_info/vbox_use_headless=true

And the ironic port is defined as:

ironic port-create -n ${IRONIC_UUID} -a ${IRONIC_MAC}

(Ref: https://www.rdoproject.org/install/tripleo-cli)
The deploy command is (OSP8/9/10):

time openstack overcloud deploy \
    --templates ${TOP_DIR}/templates \
    --control-scale 1 \
    --compute-scale 1 \
    --ceph-storage-scale 1 \
    --swift-storage-scale 0 \
    --control-flavor control \
    --compute-flavor compute \
    --ceph-storage-flavor ceph-storage \
    --swift-storage-flavor swift-storage \
    --ntp-server '10.20.0.1", "10.20.0.2' \
    --validation-errors-fatal \
    -e ${TOP_DIR}/templates/environments/network-isolation.yaml \
    -e ${TOP_DIR}/templates/environments/storage-environment.yaml \
    -e ${TOP_DIR}/net-bond-with-vlans-with-nic4.yaml \
    -e ${TOP_DIR}/rhel-registration-environment.yaml \
    -e ${TOP_DIR}/storage-environment.yaml \
    -e ${TOP_DIR}/krynn-environment.yaml \
    -e ${TOP_DIR}/extraconfig-environment.yaml \
    -e ${TOP_DIR}/enable-tls.yaml \
    -e ${TOP_DIR}/inject-trust-anchor.yaml \
    -e ${TOP_DIR}/local-environment.yaml
On OSP8, the nodes look like this:

+--------------------------------------+-----------------------------------------+
| Node UUID                            | Node Capabilities                       |
+--------------------------------------+-----------------------------------------+
| 4a375e67-0e97-4125-93b7-e28bd2e53cea | profile:control,boot_option:local       |
| 9929d709-268c-4a9a-8d86-66c7ff1c9230 | profile:control,boot_option:local       |
| 8abd67d0-db10-4954-9069-9a0055517f3b | profile:control,boot_option:local       |
| 3f69e60b-6c0f-4924-8aa5-09b3cd46a2f7 | profile:ceph-storage,boot_option:local  |
| 5dfa06f8-3e10-4ca5-9521-7b256f8dac68 | profile:ceph-storage,boot_option:local  |
| 9a41846c-9e81-41a2-8aad-3ab8c5f0195b | profile:ceph-storage,boot_option:local  |
| 9bbfea6b-171a-43db-ab29-3d2c5ac3360b | profile:swift-storage,boot_option:local |
| fa22a008-4472-4710-85d2-4bff5d8396cb | profile:swift-storage,boot_option:local |
| 7ba4044e-d266-4fcc-9ba5-51dc9285967f | profile:swift-storage,boot_option:local |
| bd563d93-decb-4b99-960a-5a2265a5f29d | profile:compute,boot_option:local       |
| 0dd30949-bc0d-4b1e-8f13-40992eb161f6 | profile:compute,boot_option:local       |
| 1a1cffde-c135-4951-a4b9-9070243950ec | profile:compute,boot_option:local       |
| e538092c-39ea-4339-8548-f46c9d6ff089 | profile:compute,boot_option:local       |
| 32ff444b-e920-4174-9e17-187f41b0036e | profile:compute,boot_option:local       |
| e06ba4f5-8c10-4afd-9177-9c872df9c22e | profile:compute,boot_option:local       |
| c813fbfa-09ca-446b-8c07-aa0c3c4189a4 | profile:compute,boot_option:local       |
+--------------------------------------+-----------------------------------------+

On OSP10, I am using the same type of node tagging, which can be seen using "openstack overcloud profiles list".
On OSP8, nodes spawn successfully:

[stack@instack ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks             |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| 2b073652-e7cc-4ab4-995b-d1ea67c56619 | krynn-ceph-0 | ACTIVE | -          | Running     | ctlplane=10.20.0.108 |
| 911d08b3-3da0-404d-b6ca-2cad856c565d | krynn-cmpt-0 | ACTIVE | -          | Running     | ctlplane=10.20.0.107 |
| 42d36d92-81c8-467e-8884-d266a4ef89ed | krynn-ctrl-0 | ACTIVE | -          | Running     | ctlplane=10.20.0.104 |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+

On OSP10, the nodes stay in 'SPAWNING' until the deploy times out; nova instances get scheduled but the nodes never get powered on for the deploy:

[stack@instack ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks             |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| 23c40e7a-688c-440d-8b4c-9a91fe989e22 | krynn-cmpt-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.20.0.104 |
| 7ae97372-8c79-4905-a7cd-cc8bea993dc8 | krynn-ctrl-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.20.0.111 |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+

[stack@instack ~]$ ironic node-list
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name             | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
[....]
| 57b7a089-d3b6-44c9-8c1d-77eba0b1d011 | osp-baremetal-2  | 7ae97372-8c79-4905-a7cd-cc8bea993dc8 | power off   | deploying          | False       |
[....]
| aa04082a-7a58-4b81-bbf9-b8cab8891b9d | osp-baremetal-11 | 23c40e7a-688c-440d-8b4c-9a91fe989e22 | power off   | deploying          | False       |
[....]
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
On OSP10, 'openstack overcloud deploy' stays stuck at:

[....]
2017-03-16 16:20:05Z [overcloud.Compute.0.NovaCompute]: CREATE_IN_PROGRESS state changed
2017-03-16 16:20:06Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed

In nova-compute.log, I see this message scrolling:

2017-03-16 12:26:57.164 1759 DEBUG nova.virt.ironic.driver [-] [instance: 23c40e7a-688c-440d-8b4c-9a91fe989e22] Still waiting for ironic node aa04082a-7a58-4b81-bbf9-b8cab8891b9d to become ACTIVE: power_state="power on", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:120

and in ironic-api.log:

2017-03-16 12:20:13.152 12872 INFO eventlet.wsgi.server [req-cba6bc11-75ca-4583-9cea-e677dd13015d ironic service - - -] 10.20.0.2 "GET /v1/nodes/?instance_uuid=23c40e7a-688c-440d-8b4c-9a91fe989e22&fields=uuid,power_state,target_power_state,provision_state,target_provision_state,last_error,maintenance,properties,instance_uuid HTTP/1.1" status: 200 len: 1007 time: 0.0328538
Please take note that all of the following ironic operations work fine in that OSP10 environment:

[stack@instack ~]$ ironic node-set-boot-device osp-baremetal-1 pxe
[stack@instack ~]$ ironic node-set-power-state osp-baremetal-1 on
[stack@instack ~]$ ironic node-set-power-state osp-baremetal-1 off
[stack@instack ~]$ ironic node-set-boot-device osp-baremetal-1 disk
[stack@instack ~]$ ironic node-set-power-state osp-baremetal-1 on
[stack@instack ~]$ openstack baremetal introspection start osp-baremetal-1
[stack@instack ~]$ ironic node-list
+--------------------------------------+------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name             | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------+---------------+-------------+--------------------+-------------+
| c34ec3ff-fc32-410a-934d-b59c7b281bff | osp-baremetal-1  | None          | power off   | available          | False       |
| 9bc16019-e06e-48a2-bd0d-339bc142c1b3 | osp-baremetal-2  | None          | power off   | available          | False       |
| eb416481-f60a-47fe-a081-2a94b8db621b | osp-baremetal-3  | None          | power off   | available          | False       |
| ddbd07f2-a9ee-44b6-b96c-9804365c3da6 | osp-baremetal-4  | None          | power off   | available          | False       |
| 5a6f6568-a4ad-4b9d-84e6-8ac01deb7f03 | osp-baremetal-5  | None          | power off   | available          | False       |
| f0c5fd84-6c0e-499c-af3c-6d8cdcdb4e64 | osp-baremetal-6  | None          | power off   | available          | False       |
| 4dcb81d1-fc5f-407d-b081-d79dc2e3df3d | osp-baremetal-7  | None          | power off   | available          | False       |
| 3edab1d3-e5e1-431a-8a79-74405f02e614 | osp-baremetal-8  | None          | power off   | available          | False       |
| 11c00cb9-c467-486c-bacb-a8aeb84225ae | osp-baremetal-9  | None          | power off   | available          | False       |
| 7a99deb8-927a-4113-b564-f84de3e68b1b | osp-baremetal-10 | None          | power off   | available          | False       |
| ebbd5862-a404-4488-b210-df755886cb7c | osp-baremetal-11 | None          | power off   | available          | False       |
| f3d39ac9-bc5d-4ef3-974b-d74b06317aa8 | osp-baremetal-12 | None          | power off   | available          | False       |
| 2ab58646-8999-44ff-92d6-45aa0ef59e93 | osp-baremetal-13 | None          | power off   | available          | False       |
| 491e7821-74f0-4ed9-b19a-2a6bb894add1 | osp-baremetal-14 | None          | power off   | available          | False       |
| b36046a3-3beb-4dba-a556-5121fddad739 | osp-baremetal-15 | None          | power off   | available          | False       |
| fab2ca7f-f124-4b65-9704-8c7aa0bb7b31 | osp-baremetal-16 | None          | power off   | available          | False       |
+--------------------------------------+------------------+---------------+-------------+--------------------+-------------+
Deploy started with:

[stack@instack ~]$ ./OSP/osp10/bin/deploy17_micro.sh
+ openstack overcloud deploy --templates --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --swift-storage-scale 0 --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage --swift-storage-flavor swift-storage --ntp-server '10.20.0.1", "10.20.0.2' --validation-errors-fatal -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /home/stack/OSP/osp10/net-bond-with-vlans-with-nic4.yaml -e /home/stack/OSP/osp10/rhel-registration-environment.yaml -e /home/stack/OSP/osp10/storage-environment.yaml -e /home/stack/OSP/osp10/krynn-environment.yaml -e /home/stack/OSP/osp10/extraconfig-environment.yaml -e /home/stack/OSP/osp10/enable-tls.yaml -e /home/stack/OSP/osp10/inject-trust-anchor.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml -e /home/stack/OSP/osp10/local-environment.yaml
2 nodes with profile control won't be used for deployment now
6 nodes with profile compute won't be used for deployment now
2 nodes with profile ceph-storage won't be used for deployment now
Configuration has 3 warnings, fix them before proceeding.
Removing the current plan files
After a little while, I get this:

[....]
2017-03-16 20:41:41Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed
2017-03-16 20:41:42Z [overcloud.CephStorage.0.UserData]: CREATE_COMPLETE state changed
2017-03-16 20:41:42Z [overcloud.CephStorage.0.CephStorage]: CREATE_IN_PROGRESS state changed

And in nova:

[stack@instack ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks             |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+
| 9d746160-11dd-432b-99de-5ef8b05391da | krynn-ceph-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.20.0.107 |
| 1d119d19-eacb-4b09-ab90-80428dfd7a64 | krynn-cmpt-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.20.0.109 |
| b0edc22a-da3f-4beb-bcb1-02e1fb4dfd7e | krynn-ctrl-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.20.0.105 |
+--------------------------------------+--------------+--------+------------+-------------+----------------------+

but these nodes never attempt to power on:

[stack@instack ~]$ ironic node-list | grep -v None
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name             | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
| 9bc16019-e06e-48a2-bd0d-339bc142c1b3 | osp-baremetal-2  | b0edc22a-da3f-4beb-bcb1-02e1fb4dfd7e | power off   | deploying          | False       |
| ddbd07f2-a9ee-44b6-b96c-9804365c3da6 | osp-baremetal-4  | 9d746160-11dd-432b-99de-5ef8b05391da | power off   | deploying          | False       |
| fab2ca7f-f124-4b65-9704-8c7aa0bb7b31 | osp-baremetal-16 | 1d119d19-eacb-4b09-ab90-80428dfd7a64 | power off   | deploying          | False       |
+--------------------------------------+------------------+--------------------------------------+-------------+--------------------+-------------+
If we take the ironic node osp-baremetal-16 (fab2ca7f-f124-4b65-9704-8c7aa0bb7b31), there are messages about it in /var/log/nova/nova-scheduler.log. First we have:

2017-03-16 16:41:41.736 30962 DEBUG oslo_concurrency.lockutils [req-c2af360a-96f3-4ab6-b737-bd6bc73974d0 a1012a438d54465a8299389609938b35 c03f66100ae044c8b205585c01fae8bf - - -] Lock "(u'instack.lasthome.solace.krynn', u'fab2ca7f-f124-4b65-9704-8c7aa0bb7b31')" acquired by "nova.scheduler.host_manager._locked_update" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270

2017-03-16 16:41:41.736 30962 DEBUG nova.scheduler.host_manager [req-c2af360a-96f3-4ab6-b737-bd6bc73974d0 a1012a438d54465a8299389609938b35 c03f66100ae044c8b205585c01fae8bf - - -] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2017-03-16T19:33:04Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=62,free_disk_gb=62,free_ram_mb=8256,host='instack.lasthome.solace.krynn',host_ip=10.0.128.169,hypervisor_hostname='fab2ca7f-f124-4b65-9704-8c7aa0bb7b31',hypervisor_type='ironic',hypervisor_version=1,id=62,local_gb=62,local_gb_used=0,memory_mb=8256,memory_mb_used=0,metrics='[]',numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=0,service_id=None,stats={boot_option='local',cpu_arch='x86_64',profile='compute'},supported_hv_specs=[HVSpec],updated_at=2017-03-16T20:41:04Z,uuid=9bd7f87c-66d7-4835-847f-7a27f0ec979a,vcpus=4,vcpus_used=0) _locked_update /usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py:168

This repeats a few times, and then we have:

2017-03-16 16:41:43.315 30962 DEBUG nova.scheduler.filters.compute_capabilities_filter [req-c27cd857-5778-4487-bade-280cf153e9b4 a1012a438d54465a8299389609938b35 c03f66100ae044c8b205585c01fae8bf - - -] (instack.lasthome.solace.krynn, fab2ca7f-f124-4b65-9704-8c7aa0bb7b31) ram: 4160MB disk: 22528MB io_ops: 0 instances: 0 fails extra_spec requirements. 'control' does not match 'compute' _satisfies_extra_specs /usr/lib/python2.7/site-packages/nova/scheduler/filters/compute_capabilities_filter.py:103

2017-03-16 16:41:43.316 30962 DEBUG nova.scheduler.filters.compute_capabilities_filter [req-c27cd857-5778-4487-bade-280cf153e9b4 a1012a438d54465a8299389609938b35 c03f66100ae044c8b205585c01fae8bf - - -] (instack.lasthome.solace.krynn, fab2ca7f-f124-4b65-9704-8c7aa0bb7b31) ram: 4160MB disk: 22528MB io_ops: 0 instances: 0 fails instance_type extra_specs requirements host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/compute_capabilities_filter.py:112

(Note: these filter rejections appear to show a compute-profile node being evaluated against the control flavor, so they may just be expected scheduler noise rather than the root cause.)
I tried parsing the logs for 'ERROR' but there aren't any relevant errors, except in nova while the undercloud was booting. I'll attach the logs shortly.
There seems to be some strange stuff going on. I just checked ironic-conductor.log and noticed this (base64 blobs trimmed):

[random blob....]
bsz6syET1fOUV5rRZxsS5fOLYzEUms/FPR5M6cFL3Io85L13XGbjqV4r3B2dFefkZl+rOaZ+qKgz3JFBwX9djL3AM59czTGB1okkZjKDIujyqXeiab88r07FwsDFLhMRqnYildK/m2kOJkL66W5c6iGu3X3KswNYXF/8NcY9uH5cxeec6VTifmED7ePzqP5
[....]
AAAAPwN+m/gxomVADALAA==", "swap_mb": "0"}', 61)]
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters     context)
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 146, in execute
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters     result = self._query(query)
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 296, in _query
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters     conn.query(q)
[....]
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 112, in _check_mysql_exception
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters     raise errorclass(errno, errorvalue)
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters DataError: (1406, u"Data too long for column 'instance_info' at row 1")
2017-03-17 14:32:05.886 18764 ERROR oslo_db.sqlalchemy.exc_filters
2017-03-17 14:32:05.912 18764 ERROR oslo_db.sqlalchemy.exc_filters [req-eb000c3c-2ac0-41ec-914d-3e8aae9a2417 - - - - -] DBAPIError exception wrapped from (pymysql.err.DataError) (1406, u"Data too long for column 'last_error' at row 1") [SQL: u'UPDATE nodes SET updated_at=%s, provision_state=%s, provision_updated_at=%s, last_error=%s, instance_info=%s WHERE nodes.id = %s'] [parameters: (datetime.datetime(2017, 3, 17, 18, 32, 5, 902889), u'deploy failed', datetime.datetime(2017, 3, 17, 18, 32, 5, 902312), u'Async execution of do_node_deploy failed with error: (pymysql.err.DataError) (1406, u"Data too long for column \'instance_info\' at row 1") [SQL: u\'UPDATE nodes SET updated_at=%s, provision_state=%s, provision_updated_at=%s, instance_info=%s WHERE nodes.id = %s\'] [parameters: (datetime.datetime(2017, 3, 17, 18, 32, 5, 881427), u\'deploy failed\', datetime.datetime(2017, 3, 17, 18, 32, 5, 880807), \'{"root_gb": "40", "display_name": "krynn-cmpt-0", "image_source": "92b54d1e-99f5-4ea3-9cc9-62d496d4ff0e", "capabilities": "{\\\\"profile\\\\": \\\\"compute\\\\", \\\\"boot_option\\\\": \\\\"local\\\\"}", "memory_mb": "4096", "vcpus": "1", "local_gb": "127", "configdrive": "H4sICJIrzFgC/3RtcHNmNWNFNwDsvemSG8mWHgje29PdGd3WM9N9RzYma3V7gaybmSQCW+4s4TZzQTJB5sJKZDJJFnlhAUQAiEwgAowIAAmysk39Q+qW2Zj+ankDLW8gqWWmB5CeRDb6qX8z5xz3 [random blob....]

There seems to be some kind of binary blob (the base64-encoded config drive) being passed in the parameters, hence the failure.
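A side observation on the two DataErrors above (my reading of the log, not verified against the ironic code): the first UPDATE fails because the serialized instance_info exceeds the TEXT column, and the conductor then tries to record that exception text, which embeds the failed statement's oversized parameters verbatim, in last_error (itself a TEXT column), so that second write overflows as well. A minimal, self-contained Python sketch with made-up sizes:

```python
# Illustrative only: the sizes here are invented; 65535 bytes is the
# MySQL TEXT column limit that instance_info and last_error both share.
TEXT_MAX = 65535

# An oversized serialized instance_info value (e.g. a big config drive).
instance_info = '{"configdrive": "' + 'A' * 70000 + '"}'

# The conductor's failure message embeds the failed UPDATE's parameters,
# i.e. the oversized instance_info value, verbatim.
last_error = (
    'Async execution of do_node_deploy failed with error: '
    '(pymysql.err.DataError) (1406, "Data too long for column '
    "'instance_info' at row 1\") [parameters: (" + instance_info + ',)]'
)

print(len(instance_info) > TEXT_MAX)  # True: the first UPDATE fails
print(len(last_error) > TEXT_MAX)     # True: recording the error fails too
```

This would explain why the second error complains about last_error rather than instance_info.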
This was subsequently tracked down to this issue:

Data too long for column 'instance_info' at row 1

With multiple firstboot scripts, the config drive data grew larger than the instance_info column in the ironic DB could hold. The solution was to remove some scripts from firstboot, after which deployments started working again.

This strikes me as a technical issue for two reasons:
1) A config drive is supposed to be able to grow as large as 64 MB, but here the 'instance_info' column in the ironic DB limits it to a much smaller size.
2) The fact that instance_info has become too large for the DB column should be propagated back to the user. It really wasn't obvious from the symptoms or from the message in ironic-conductor.
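To put rough numbers on the size mismatch, here is a small Python sketch (the payload is simulated, not taken from this deployment; 65535 bytes is MySQL's TEXT column maximum). The ironic-conductor log above shows the config drive stored gzipped and base64-encoded inside instance_info ("configdrive": "H4sIC..."), so the sketch mimics that encoding:

```python
import base64
import gzip
import os

TEXT_MAX = 65535                     # max bytes in a MySQL TEXT column
CONFIGDRIVE_MAX = 64 * 1024 * 1024   # config drive ceiling (64 MB)

# Simulate a config drive carrying a few firstboot scripts: ~100 KiB of
# effectively incompressible content, so gzip gains almost nothing.
raw = os.urandom(100 * 1024)

# Encode the way it appears inside instance_info: gzip, then base64.
stored = base64.b64encode(gzip.compress(raw))

print(len(stored) > TEXT_MAX)         # True: overflows the TEXT column
print(len(stored) < CONFIGDRIVE_MAX)  # True: far below the 64 MB limit
```

So a config drive of barely 100 KiB already exceeds what a TEXT column can store, even though it is nowhere near the 64 MB config drive limit.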
Others seem to have run into this problem and attempted to improve the situation. Here are two (unfortunately abandoned) patches:

https://review.openstack.org/#/c/369617
https://review.openstack.org/#/c/334967
Upstream patch in progress - https://review.openstack.org/#/c/501761/
The upstream Launchpad trail is a bit convoluted: the upstream bug opened for this was https://bugs.launchpad.net/ironic/+bug/1673887, which was closed as a duplicate of https://bugs.launchpad.net/ironic/+bug/1596421, which was in turn closed as Invalid in favor of https://bugs.launchpad.net/ironic/+bug/1575935. Updated the referenced Launchpad bug.
The Nova part is https://blueprints.launchpad.net/openstack/?searchtext=rebuild-ironic-config-drive; we may need an RFE for them.
The fix has not landed. We have updated Nova to pass the configdrive on rebuild, but we haven't stopped storing the configdrive in the database. I'll check upstream what our next steps are.
We have another customer facing this issue in LLR OSP10. Is there a plan to backport the fix to OSP10, since this is a regression? My customer is not willing to upgrade to OSP13 in the short term, and living with this issue in OSP10 is not an option.
> We have another customer facing this issue in LLR OSP10. Is there a plan to backport
> the fix in OSP10 since it is a regression issue? My customer is not willing to upgrade
> to OSP13 in a short term and living with this issue in OSP10 is not an option.

Salim, we currently do not have a fix for this upstream; see https://bugs.launchpad.net/ironic/+bug/1596421

The potential fix is under discussion; once it lands, we'll consider the backport. It is likely this fix can be backported, but we need to review once the upstream fix lands.

Changing the priority of this to High.
Unfortunately, we cannot come up with a solution that can be backported. You may have to change the "instance_info" column of the "nodes" table from TEXT to LONGTEXT manually.
Pierre - this package will be in the OSP-13 beta release on May 9th. Is that OK?
(In reply to Dmitry Tantsur from comment #25)
> Unfortunately, we cannot come up with a solution that can be backported. You
> may have to change the "instance_info" column of the "nodes" table from TEXT
> to LONGTEXT manually.

Hi Dmitry,

As OSP10 deployments become more and more complex, what is the procedure for changing instance_info from TEXT to LONGTEXT on existing underclouds? I know I can go into the DB and change it directly, but is there a local override in the undercloud configuration that would handle this? Is there something better than:

mysql -e 'alter table ironic.nodes modify instance_info longtext;'

and restarting openstack-ironic\* ?
Verified.

Environment:
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python2-ironicclient-2.2.0-1.el7ost.noarch
instack-undercloud-8.4.1-4.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
python-ironic-lib-2.12.1-1.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch

Checked the type of the instance_info field in the nodes table of the ironic DB: it is now 'longtext'. On previous releases it was 'text'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086