Bug 1641653

Summary: [UPGRADES][14] Failed to upgrade uc post ffwd: The database is not compatible with this release of ironic
Product: Red Hat OpenStack
Component: openstack-ironic
Version: 14.0 (Rocky)
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Reporter: Yurii Prokulevych <yprokule>
Assignee: Dmitry Tantsur <dtantsur>
QA Contact: Yurii Prokulevych <yprokule>
CC: augol, bfournie, ccamacho, dtantsur, jjoyce, mburns, sgolovat, yprokule
Keywords: Triaged
Fixed In Version: openstack-ironic-11.1.1-0.20181012152841.el7ost
Doc Type: If docs needed, set a value
Clone Of:
: 1643511 (view as bug list)
Last Closed: 2019-01-11 11:54:09 UTC
Type: Bug
Bug Blocks: 1643511, 1649551

Description Yurii Prokulevych 2018-10-22 12:48:23 UTC
Description of problem:
-----------------------
An attempt to upgrade the undercloud (UC) after a fast-forward upgrade (FFWD) fails:
        "INFO:nova_statedir:Nova statedir ownership complete",
        "stdout: efca5abba7c5a82d0111758b17b0ae5d3e405958d622dd3aa1121cb3bdee1766",
        "stdout: Cell0 is already setup",
        "stdout: da74a3024e136851cd056015e864d74e247df07845996d27986a96fe6a265536",
        "Error running ['docker', 'run', '--name', 'ironic_db_sync', '--label', 'config_id=tripleo_step3', '--label', 'container_name=ironic_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"s
tart_order\": 1, \"image\": \"192.168.24.1:8787/rhosp14/openstack-ironic-api:2018-10-10.3\", \"command\": \"/usr/bin/bootstrap_host_exec ironic_api su ironic -s /bin/bash -c \\'ironic-dbsync --config-file /etc/i
ronic/ironic.conf\\'\", \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/s
ource/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\"
, \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/ironic_api/etc/ironic
:/etc/ironic:ro\", \"/var/log/containers/ironic:/var/log/ironic\", \"/var/log/containers/httpd/ironic-api:/var/log/httpd\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--net=host', '--privil
eged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/so
urce/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.t
rust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--v
olume=/var/lib/config-data/ironic_api/etc/ironic:/etc/ironic:ro', '--volume=/var/log/containers/ironic:/var/log/ironic', '--volume=/var/log/containers/httpd/ironic-api:/var/log/httpd', '192.168.24.1:8787/rhosp14
/openstack-ironic-api:2018-10-10.3', '/usr/bin/bootstrap_host_exec', 'ironic_api', 'su', 'ironic', '-s', '/bin/bash', '-c', \"'ironic-dbsync\", '--config-file', \"/etc/ironic/ironic.conf'\"]. [2]",
        "stderr: The database is not compatible with this release of ironic (11.1.1.dev8). Please run \"ironic-dbsync online_data_migrations\" using the previous release.",
        "stdout: (cellv2) Updating default cell_v2 cell 30c11559-302d-4feb-bb44-338fab6e915e",
        "INFO  [alembic.runtime.migration] Running upgrade 18440d0834af -> 2970d2d44edc, Add manage_boot to nodes",



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.0-0.20181001174822.90afd18.0rc2.el7ost.noarch

python2-ironicclient-2.5.0-0.20180810135843.fb94fb8.el7ost.noarch
python2-ironic-neutron-agent-1.2.1-0.20180831194318.af95c72.el7ost.noarch
puppet-ironic-13.3.1-0.20180911185738.317a1e5.el7ost.noarch
openstack-ironic-staging-drivers-0.10.1-0.20180820161038.39c4e93.el7ost.noarch
openstack-ironic-api-11.1.1-0.20181001152939.4167083.el7ost.noarch
openstack-ironic-common-11.1.1-0.20181001152939.4167083.el7ost.noarch
openstack-ironic-conductor-11.1.1-0.20181001152939.4167083.el7ost.noarch
openstack-ironic-inspector-8.0.1-0.20180924215820.e89450c.el7ost.noarch
python2-ironic-inspector-client-3.3.0-0.20180810080932.53bf4e8.el7ost.noarch
python-ironic-lib-2.14.0-0.20180810074837.344161b.el7ost.noarch

python-tripleoclient-10.6.1-0.20180929200237.1d8dcb6.el7ost.noarch
python-tripleoclient-heat-installer-10.6.1-0.20180929200237.1d8dcb6.el7ost.noarch

Steps to Reproduce:
-------------------
1. Perform FFWD of the RHOS-10 environment
2. Set up repos and prepare the container-image-prepare file for the UC upgrade
3. Follow the upgrade doc for the undercloud upgrade

Actual results:
---------------
Undercloud upgrade failed

Expected results:
-----------------
Undercloud upgrade succeeded

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 ceph nodes

Comment 2 Jiri Stransky 2018-10-22 13:20:25 UTC
This is likely similar to bug 1624899, but it's on *undercloud*, so the fix needs to go into a different repo.

The 13->14 upgrade which failed uses the new t-h-t approach to deploying/upgrading the UC, but it looks like the migration(s) were missed in earlier UC upgrades (12->13, I guess?). So the breakage needs to be fixed in instack-undercloud.

Comment 6 Dmitry Tantsur 2018-10-22 13:56:40 UTC
I don't think that's the cause. The class is included indirectly via the root manifest: https://github.com/openstack/puppet-ironic/blob/stable/queens/manifests/init.pp#L416 if this value is set to true: https://github.com/openstack/instack-undercloud/blob/stable/queens/elements/puppet-stack-config/puppet-stack-config.yaml.template#L495.

Comment 7 Dmitry Tantsur 2018-10-23 14:54:51 UTC
In the sosreport I see signs of online_data_migrations running on Queens:

2018-10-16 11:48:22.645 8618 INFO ironic.db.sqlalchemy.api [req-2dbfe38b-353b-4ece-949a-b84578c9074d - - - - -] Migrating nodes with driver pxe_ipmitool to {'management_interface': 'ipmitool', 'inspect_interface': 'inspector', 'raid_interface': 'no-raid', 'power_interface': 'ipmitool', 'driver': 'ipmi', 'deploy_interface': 'iscsi', 'boot_interface': 'pxe', 'console_interface': 'no-console', 'rescue_interface': 'no-rescue', 'vendor_interface': 'ipmitool'}

This migration only existed in that release.
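
When auditing a sosreport for lines like the one above, the target hardware type and interfaces can be extracted mechanically, because the message carries them as a Python dict literal. The following is a minimal sketch, assuming the message format shown above; `parse_migration` is a hypothetical helper name and not part of ironic.

```python
import ast
import re

# Assumes the message text seen in the conductor log quoted above.
LINE_RE = re.compile(
    r"Migrating nodes with driver (?P<driver>\S+) to (?P<fields>\{.*\})"
)


def parse_migration(line):
    """Return (old_driver, new_fields) for a migration log line, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # The dict is printed as a Python literal, so literal_eval can read it
    # without executing arbitrary code.
    return match.group("driver"), ast.literal_eval(match.group("fields"))


SAMPLE = (
    "2018-10-16 11:48:22.645 8618 INFO ironic.db.sqlalchemy.api "
    "[req-2dbfe38b-353b-4ece-949a-b84578c9074d - - - - -] "
    "Migrating nodes with driver pxe_ipmitool to "
    "{'management_interface': 'ipmitool', 'inspect_interface': 'inspector', "
    "'raid_interface': 'no-raid', 'power_interface': 'ipmitool', "
    "'driver': 'ipmi', 'deploy_interface': 'iscsi', 'boot_interface': 'pxe', "
    "'console_interface': 'no-console', 'rescue_interface': 'no-rescue', "
    "'vendor_interface': 'ipmitool'}"
)

old_driver, new_fields = parse_migration(SAMPLE)
print(old_driver, "->", new_fields["driver"])
```

A line that does not contain the message yields `None`, so the helper can be applied to every line of a log file.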

Comment 8 Dmitry Tantsur 2018-10-23 15:15:00 UTC
However, judging by the nodes table, the online data migrations never ran.

MariaDB [ironic]> select version from nodes;
+---------+
| version |
+---------+
| 1.21    |
| 1.21    |
| 1.21    |
| 1.21    |
| 1.21    |
| 1.21    |
| 1.21    |
| 1.21    |
+---------+

1.21 corresponds to Pike.
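
The same check can be scripted against the output of the query above. This is only a sketch: the helper names are hypothetical, and the `expected` value is a placeholder that has to be replaced with the Node object version of the release whose migrations should already have run.

```python
def parse_version(text):
    """Turn an object version string such as '1.21' into a comparable tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def stale_rows(versions, expected):
    """Return the versions that are older than the expected object version."""
    target = parse_version(expected)
    return [v for v in versions if parse_version(v) < target]


# The eight rows returned by the query above.
rows = ["1.21"] * 8

# HYPOTHETICAL target, used for illustration only: substitute the Node
# object version of the release that was supposed to migrate the data.
expected = "1.22"

print(len(stale_rows(rows, expected)), "of", len(rows), "rows are stale")
```

Versions are compared as integer tuples rather than strings, so that "1.9" sorts before "1.21".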

Comment 9 Dmitry Tantsur 2018-10-23 15:19:27 UTC
Still, the logs insist everything was done by puppet:

$ grep -rn ironic-db undercloud_upgrade_1*.log
undercloud_upgrade_11.log:2647:2018-10-16 11:16:39,245 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully
undercloud_upgrade_11.log:2648:2018-10-16 11:16:40,417 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]: Triggered 'refresh' from 1 events
undercloud_upgrade_12.log:3276:2018-10-16 11:31:37,222 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully
undercloud_upgrade_12.log:3277:2018-10-16 11:31:38,461 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]: Triggered 'refresh' from 1 events
undercloud_upgrade_12.log:3280:2018-10-16 11:31:39,726 INFO: Notice: /Stage[main]/Ironic::Db::Online_data_migrations/Exec[ironic-db-online-data-migrations]/returns: executed successfully
undercloud_upgrade_12.log:3281:2018-10-16 11:31:40,937 INFO: Notice: /Stage[main]/Ironic::Db::Online_data_migrations/Exec[ironic-db-online-data-migrations]: Triggered 'refresh' from 3 events
undercloud_upgrade_13.log:4274:2018-10-16 11:48:18,246 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully
undercloud_upgrade_13.log:4275:2018-10-16 11:48:19,665 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]: Triggered 'refresh' from 2 events
undercloud_upgrade_13.log:4278:2018-10-16 11:48:23,255 INFO: Notice: /Stage[main]/Ironic::Db::Online_data_migrations/Exec[ironic-db-online-data-migrations]/returns: executed successfully
undercloud_upgrade_13.log:4279:2018-10-16 11:48:26,553 INFO: Notice: /Stage[main]/Ironic::Db::Online_data_migrations/Exec[ironic-db-online-data-migrations]: Triggered 'refresh' from 4 events
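
To see at a glance which Exec resources puppet reported as successful in each upgrade log, the grep output above can be grouped per file. This is a rough sketch, assuming the `file:lineno:message` layout produced by `grep -rn`; `execs_per_log` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches "<file>:<lineno>:... Notice: /Stage[main]/<Class>/Exec[<name>]/returns:
# executed successfully" and ignores the "Triggered 'refresh'" lines.
PATTERN = re.compile(
    r"^(?P<file>[^:]+):\d+:.*Notice: /Stage\[main\]/[^/]+"
    r"/Exec\[(?P<name>[^\]]+)\]/returns: executed successfully"
)


def execs_per_log(lines):
    """Map each log file to the Exec resources reported as successful."""
    result = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match:
            result[match.group("file")].append(match.group("name"))
    return dict(result)


# A subset of the grep output above.
SAMPLE = [
    "undercloud_upgrade_11.log:2647:2018-10-16 11:16:39,245 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully",
    "undercloud_upgrade_11.log:2648:2018-10-16 11:16:40,417 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]: Triggered 'refresh' from 1 events",
    "undercloud_upgrade_12.log:3276:2018-10-16 11:31:37,222 INFO: Notice: /Stage[main]/Ironic::Db::Sync/Exec[ironic-dbsync]/returns: executed successfully",
    "undercloud_upgrade_12.log:3280:2018-10-16 11:31:39,726 INFO: Notice: /Stage[main]/Ironic::Db::Online_data_migrations/Exec[ironic-db-online-data-migrations]/returns: executed successfully",
]

for log_file, execs in sorted(execs_per_log(SAMPLE).items()):
    print(log_file, execs)
```

Note that a successful puppet Exec only shows that the command exited cleanly; it says nothing about whether the rows were actually migrated, which is why the nodes table had to be checked separately.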

Comment 10 Dmitry Tantsur 2018-10-23 15:28:51 UTC
Yurii, could you run the same procedure, but please stop before trying an upgrade to 14? Essentially, just do FFU and leave the environment for me to investigate?

Comment 11 Dmitry Tantsur 2018-10-25 09:14:46 UTC
Discussed upstream. Apparently we have an incomplete upgrade procedure in ironic :( Affects all versions from Queens to master. Affects all upgrades, not only FFU.

Comment 17 errata-xmlrpc 2019-01-11 11:54:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045