Bug 1413686 - Upgrade from OSP 8 to OSP 9 Step - major-upgrade-pacemaker-converge.yaml
Summary: Upgrade from OSP 8 to OSP 9 Step - major-upgrade-pacemaker-converge.yaml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Target Milestone: async
Assignee: Sofer Athlan-Guyot
QA Contact: Omri Hochman
URL:
Whiteboard:
: 1382127 1385143 1388521 1396360 1396365 1426253 (view as bug list)
Depends On:
Blocks: 1305654 1373538 1400606 1410575
 
Reported: 2017-01-16 16:39 UTC by Randy Perryman
Modified: 2017-03-30 19:34 UTC
20 users

Fixed In Version: openstack-tripleo-heat-templates-2.0.0-45.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-30 19:34:44 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport partc (986.93 KB, application/octet-stream)
2017-01-20 22:11 UTC, Randy Perryman
sopsreport partb (10.00 MB, application/octet-stream)
2017-01-20 22:12 UTC, Randy Perryman
sosreport part a (10.00 MB, application/x-xz)
2017-01-20 22:14 UTC, Randy Perryman
OS-Collect-Config cntl0 (13.76 MB, text/plain)
2017-01-24 14:25 UTC, Randy Perryman
OS-Collect-Config cntl1 (13.33 MB, text/plain)
2017-01-24 14:25 UTC, Randy Perryman
OS-Collect-Config cntl2 (1.21 MB, text/plain)
2017-01-24 14:26 UTC, Randy Perryman


Links
Launchpad 1661202 (last updated 2017-02-02 12:08:27 UTC)
OpenStack gerrit 422837 (last updated 2017-02-23 22:19:47 UTC)
OpenStack gerrit 428093 (last updated 2017-02-02 12:10:12 UTC)
OpenStack gerrit 435560 (last updated 2017-02-23 16:40:31 UTC)
Red Hat Product Errata RHBA-2017:0859 (SHIPPED_LIVE): Red Hat OpenStack Platform 9 director Bug Fix Advisory, 2017-03-30 23:34:09 UTC

Description Randy Perryman 2017-01-16 16:39:22 UTC
Description of problem:
Completed all steps up to major-upgrade-pacemaker-converge.yaml.
Running the command comes back with a failure, whereas before I had success:


Version-Release number of selected component (if applicable):


How reproducible:

This has happened twice now

Steps to Reproduce:
1. Complete the minor update
2. Follow the upgrade instructions
3. Shut down all non-migrated instances and complete the compute/storage upgrade
4. Run the command from the guide for major-upgrade-pacemaker-converge.yaml

Actual results:

Update Failed

Expected results:
Update Complete
---------------
Error from deployment-show
[rlp@paisley-dir ~]$ heat deployment-show ed372707-bdb5-4707-8136-2025d6a5dbb0
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "f8e038e1-7e52-418c-9fc6-18f2debdd902",
  "config_id": "0b5f1d99-bfa6-4135-bc49-26c869c2d689",
  "output_values": {
    "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-controller-0.localdomain in environment production in 14.42 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key '' --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\nam72XCbNtTCnkNX2PRF8knWdA\npassword\nregionOne\n-1\nTrue\nrabbit\nrcFby9wdYpMbrptBbF49uxh7K\n192.168.140.104,192.168.140.103,192.168.140.105\nredis://:gGydTWau2zVUcKWj3Y7d6KcTg.140.100:6379/\n600\nnotifications\n0.0.0.0\nDefault\nDefault\nTrue\ndatabase\nFalse\nhttp://192.168.140.101:5000/v2.0\ndatabase\n4952\nhttp://192.168.140.101:5000\nhttp://192.168.140.101:35357\n\u001b[mNotice: /Stage[main]/Gnocchi::Storage::Ceph/Package[python-cradox]/ensure: created\u001b[0m\n/var/log/ceilometer\n192.168.140.104\n\u001b[mNotice: /Stage[main]/Aodh::Client/Package[python-aodhclient]/ensure: created\u001b[0m\nservice\nceilometer\n/\n60\nservice\nguest\n2\nrcFby9wdYpMbrptBbF49uxh7K\nceilometer\n-1\nmongodb://192.168.140.104:27017,192.168.140.103:27017,192.168.140.105:27017/ceilometer?replicaSet=tripleo\nFalse\n8777\nservice\nhttp://192.168.140.101:8041\ngnocchi_resources.yaml\nlow\n\u001b[mNotice: /Stage[main]/Swift::Proxy/Swift::Service[swift-proxy-server]/Service[swift-proxy-server]/enable: enable changed 'true' to 'false'\u001b[0m\nGnmeJpcZTBha7R7D3qVNQ9MrY\ninternalURL\n\u001b[mNotice: /Stage[main]/Main/Exec[galera-ready]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:17.808 4211 INFO gnocchi.cli [-] Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x3e34ad0>\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:20.840 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 10 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:33.847 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 9 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:46.863 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 8 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:59.878 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 7 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:43:12.893 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 6 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:43:25.909 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 5 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:43:38.923 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 
4 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:43:51.940 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 3 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:04.955 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:17.971 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 CRITICAL gnocchi [-] DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi Traceback (most recent call last):\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/bin/gnocchi-upgrade\", line 10, in <module>\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     sys.exit(upgrade())\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/gnocchi/cli.py\", line 56, in upgrade\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     create_legacy_resource_types=conf.create_legacy_resource_types)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/gnocchi/indexer/sqlalchemy.py\", line 249, in upgrade\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     with self.facade.writer_connection() as connection:\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     return self.gen.next()\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 759, in _transaction_scope\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     allow_async=self._allow_async) as resource:\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     return self.gen.next()\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 461, in 
_connection\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     mode=self.mode)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 262, in _create_connection\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     self._start()\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 338, in _start\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     engine_args, maker_args)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 362, in _setup_for_connection\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     sql_connection=sql_connection, **engine_kwargs)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 152, in create_engine\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     test_conn = _test_connection(engine, max_retries, retry_interval)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 334, in _test_connection\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi     six.reraise(type(de_ref), de_ref)\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi   File \"<string>\", line 2, in reraise\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')\u001b[0m\n\u001b[mNotice: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 ERROR gnocchi \u001b[0m\n\u001b[mNotice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Triggered 'refresh' from 2 events\u001b[0m\n\u001b[mNotice: Finished catalog run in 162.66 seconds\u001b[0m\n",
    "deploy_stderr": "\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Swift]): swift_hash_suffix has been deprecated and should be replaced with swift_hash_path_suffix, this will be removed as part of the N-cycle\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Api]): The known_stores parameter is deprecated, use stores instead\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Api]): default_store not provided, it will be automatically set to glance.store.http.Store\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Registry]): Execution of db_sync does not depend on $manage_service or $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Api]): ec2_listen_port, ec2_workers and keystone_ec2_url are deprecated and have no effect. Deploy openstack/ec2-api instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Server]): identity_uri, auth_tenant, auth_user, auth_password, auth_region configuration options are deprecated in favor of auth_plugin and related options\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Dhcp]): The dhcp_delete_namespaces parameter was removed in Mitaka, it does not take any affect\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::L3]): parameter external_network_bridge is deprecated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::L3]): parameter router_delete_namespaces was removed in Mitaka, it does not take any affect\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_password parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_tenant parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_url parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer::Api]): The keystone_auth_uri parameter is deprecated. Please use auth_uri instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer::Api]): The keystone_identity_uri parameter is deprecated. 
Please use identity_uri instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): \"admin_user\", \"admin_password\", \"admin_tenant_name\" configuration options are deprecated in favor of auth_plugin and related options\u001b[0m\n\u001b[1;31mWarning: You cannot collect exported resources without storeconfigs being set; the collection will be ignored on line 123 in file /etc/puppet/modules/gnocchi/manifests/api.pp\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]: Failed to call refresh: gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --skip-storage --create-legacy-resource-types returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]: gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --skip-storage --create-legacy-resource-types returned 1 instead of one of [0]\u001b[0m\n",
    "deploy_status_code": 6
  },
  "creation_time": "2017-01-16T15:40:59",
  "updated_time": "2017-01-16T15:44:35",
  "input_values": {
    "step": 3,
    "update_identifier": {
      "deployment_identifier": 1484578322,
      "controller_config": {
        "1": "os-apply-config deployment 4be8e48c-ac35-4f01-9391-93f4c531592d completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "0": "os-apply-config deployment e149bd9d-92ad-4e03-b833-3ec14c8b900b completed,Root CA cert injection not enabled.,TLS not enabled.,None,",
        "2": "os-apply-config deployment df40fd45-7bb8-41e9-bdd6-4f0e3dbf6227 completed,Root CA cert injection not enabled.,TLS not enabled.,None,"
      },
      "allnodes_extra": "none"
    }
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6",
  "id": "ed372707-bdb5-4707-8136-2025d6a5dbb0"



Additional info:

Comment 1 Sofer Athlan-Guyot 2017-01-18 18:21:05 UTC
Hi,

So the error is a failure during /usr/bin/gnocchi-upgrade:

    /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:17.808 4211 INFO gnocchi.cli [-] Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x3e34ad0>
    /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:42:20.840 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 10 attempts left.

    ... attempt to connect ...

    /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:17.971 4211 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.

    ... and finally appears to connect ... but got the (2013, 'Lost connection to MySQL server during query')

    /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]/returns: 2017-01-16 15:44:27.982 4211 CRITICAL gnocchi [-] DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

To start debugging we would need:
 from all three controllers:
   - /var/log/mariadb/mariadb.log
   - the output of journalctl -u mariadb
   - journalctl -u os-collect-config

and the /var/log/gnocchi/gnocchi-upgrade.log, which should be on the
bootstrap controller.
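
A rough way to gather all of that in one pass (just a sketch; it assumes SSH access as heat-admin and the overcloud-controller-N hostnames seen in the logs, adjust to your environment):

    # pull the mariadb log plus the mariadb and os-collect-config journals from each controller
    for n in 0 1 2; do
        host=overcloud-controller-$n
        ssh heat-admin@$host "sudo journalctl -u mariadb --no-pager"           > $host-mariadb-journal.log
        ssh heat-admin@$host "sudo journalctl -u os-collect-config --no-pager" > $host-os-collect-config.log
        ssh heat-admin@$host "sudo cat /var/log/mariadb/mariadb.log"           > $host-mariadb.log
    done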

Comment 2 Randy Perryman 2017-01-18 19:01:10 UTC
The install this was done on is no longer available, but you should be able to replicate the issue very easily in your lab.

Comment 3 Randy Perryman 2017-01-20 22:10:03 UTC
I have replicated the error again.  Pulling an SOS report for you.

Comment 4 Randy Perryman 2017-01-20 22:11:53 UTC
Created attachment 1243026 [details]
sosreport partc

Comment 5 Randy Perryman 2017-01-20 22:12:55 UTC
Created attachment 1243027 [details]
sopsreport partb

Comment 6 Randy Perryman 2017-01-20 22:14:14 UTC
Created attachment 1243028 [details]
sosreport part a

Comment 7 Randy Perryman 2017-01-20 22:15:10 UTC
Error from this run was: 
"deploy_stderr": "\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Swift]): swift_hash_suffix has been deprecated and should be replaced with swift_hash_path_suffix, this will be removed as part of the N-cycle\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Api]): The known_stores parameter is deprecated, use stores instead\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Api]): default_store not provided, it will be automatically set to glance.store.http.Store\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Registry]): Execution of db_sync does not depend on $manage_service or $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Api]): ec2_listen_port, ec2_workers and keystone_ec2_url are deprecated and have no effect. Deploy openstack/ec2-api instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Server]): identity_uri, auth_tenant, auth_user, auth_password, auth_region configuration options are deprecated in favor of auth_plugin and related options\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Dhcp]): The dhcp_delete_namespaces parameter was removed in Mitaka, it does not take any affect\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::L3]): parameter external_network_bridge is deprecated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::L3]): parameter router_delete_namespaces was removed in Mitaka, it does not take any affect\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_password parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_tenant parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Agents::Metadata]): The auth_url parameter is deprecated and was removed in Mitaka release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer::Api]): The keystone_auth_uri parameter is deprecated. Please use auth_uri instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer::Api]): The keystone_identity_uri parameter is deprecated. 
Please use identity_uri instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): \"admin_user\", \"admin_password\", \"admin_tenant_name\" configuration options are deprecated in favor of auth_plugin and related options\u001b[0m\n\u001b[1;31mWarning: You cannot collect exported resources without storeconfigs being set; the collection will be ignored on line 123 in file /etc/puppet/modules/gnocchi/manifests/api.pp\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]: Failed to call refresh: gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --skip-storage --create-legacy-resource-types returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]: gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --skip-storage --create-legacy-resource-types returned 1 instead of one of [0]\u001b[0m\n",

Comment 8 Randy Perryman 2017-01-20 22:16:04 UTC
Command used to upgrade (note that I added --force-postconfig):

 openstack overcloud deploy --log-file ~/pilot/upgrade_converge_deployment.log -t 120 --force-postconfig --templates ~/pilot/templates/overcloud -e ~/pilot/templates/overcloud/environments/network-isolation.yaml -e ~/pilot/templates/overcloud/environments/storage-environment.yaml -e ~/pilot/templates/overcloud/environments/puppet-pacemaker.yaml -e ~/pilot/templates/overcloud/environments/major-upgrade-pacemaker-converge.yaml -e ~/pilot/templates/dell-environment.yaml -e ~/pilot/templates/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage --swift-storage-flavor swift-storage --block-storage-flavor block-storage --neutron-public-interface bond1 --neutron-network-type vlan --neutron-disable-tunneling --control-scale 3 --compute-scale 3 --ceph-storage-scale 3 --ntp-server 10.127.1.3 --neutron-network-vlan-ranges physint:201:220,physext --neutron-bridge-mappings physint:br-tenant,physext:br-ex

Comment 9 Sofer Athlan-Guyot 2017-01-23 19:17:37 UTC
Hi,

The internal CI and I have tested an OSP8 upgrade and we couldn't
reproduce the error.

In the logs I could find a lot of errors, not just the
gnocchi-db-upgrade error.  So I can't explain why the deployment went
on and didn't stop at the first error.  Here are the errors from
Jan 20 (there are others from the days before, but I'll focus on the
latest):

    Jan 20 19:22:18: [0]
     - exit 1:  ControllerLoadBalancerDeployment_Step1: Duplicate declaration: User[hacluster]

    Jan 20 19:22:34: [1]
     - exit 1: ControllerOvercloudServicesDeployment_Step4: Duplicate declaration: User[hacluster]

This duplicate declaration error is very strange.  It indicates that
we use this code from OSP8

https://github.com/openstack/tripleo-heat-templates/blob/b0ba9e8e09d70cb5871a6f343a698e3b481ac297/puppet/manifests/overcloud_controller_pacemaker.pp#L71-L73

but it's not present in Mitaka (OSP9).

In the log from this point on we can see that the DB is not available
on controller-0, and pacemaker is moving resources around, which
indicates a messy pcs cluster status.  Then we have the
gnocchi-db-upgrade error, and the duplicate declaration error again:

   Jan 20 19:25:45: [2]
    - exit 6: ControllerOvercloudServicesDeployment_Step4: gnocchi-db-upgrade error, cannot reach db.

   Jan 20 19:25:50: [3]
    - exit 1: ControllerOvercloudServicesDeployment_Step5: duplicate declaration

Here we have another error with exactly the same resource name:
"ControllerOvercloudServicesDeployment_Step5"

   Jan 20 19:29:29: [4]
    - exit 6: ControllerOvercloudServicesDeployment_Step5: gnocchi-db-upgrade

And so on

   Jan 20 19:29:34: [5]
    - exit 1: ControllerOvercloudServicesDeployment_Step6: duplicate declaration

   Jan 20 19:54:24: [6]
    - exit 6: ControllerOvercloudServicesDeployment_Step6: gnocchi-db-upgrade

   Jan 20 19:54:40: [7]
    - exit 1: ControllerServicesBaseDeployment_Step2: duplicate declaration

   Jan 20 20:08:20: [8]
    - exit 6: ControllerOvercloudServicesDeployment_Step4: gnocchi-db-upgrade

Eventually we have this last gnocchi-db-upgrade error.

   Jan 20 21:23:54: [9]
    - exit 6: ControllerOvercloudServicesDeployment_Step4: gnocchi-db-upgrade

I don't really get why the deployment didn't stop at the first error,
or how we can have the duplicate declaration error, which would
indicate the use of OSP8 templates on the undercloud.

So maybe I'm missing something here.

I hope that it helps, (please have a look at the PPS)

PS:  All those errors should be available from the sos report:

    grep -E 'deploy_status_code[^0-9]+[1-9]' sosreport-overcloud-controller-0.localdomain-20170120212759/sos_commands/pacemaker/crm_report/overcloud-controller-0.localdomain/journal.log

On the controller node you can use:

    journalctl -u os-collect-config | grep -E 'deploy_status_code[^0-9]+[1-9]'


PPS: As a side note, while digging into the log I noticed that this point
was not done: in
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/upgrading-red-hat-openstack-platform/chapter-3-director-based-environments-performing-upgrades-to-major-versions
section 3.4.1, Pre-Upgrade Notes for the Overcloud, you have to
generate a Ceph key:

    key=$(ssh heat-admin@${ceph_node} ceph-authtool --gen-print-key)
    cat > ceph-client-key.yaml <<EOF
    parameter_defaults:
      CephClientKey: '${key}'
    EOF

The ceph key I saw in the logs was empty.

==== ERRORS ====

[0]:

Jan 20 19:22:18 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:22:18,003] (heat-config)
[DEBUG] [2017-01-20 19:22:12,238] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/e83b313e-d02f-41a5-a2ec-4a61243b2829"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerLoadBalancerDeployment_Step1"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/e83b313e-d02f-41a5-a2ec-4a61243b2829.pp

Jan 20 19:22:18 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:22:17,997] (heat-config)
[INFO] Return code 1

Jan 20 19:22:18 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:22:17,997] (heat-config)
[INFO] Error: Duplicate declaration: User[hacluster] is already
declared in file
/var/lib/heat-config/heat-config-puppet/e83b313e-d02f-41a5-a2ec-4a61243b2829.pp:91;
cannot redeclare at
/etc/puppet/modules/pacemaker/manifests/corosync.pp:121 on node
overcloud-controller-0.localdomain



[1]:

Jan 20 19:22:34 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:22:34,452] (heat-config)
[DEBUG] [2017-01-20 19:22:30,590] (heat-config) [DEBUG] Running
FACTER_heat
_outputs_path="/var/run/heat-config/heat-config-puppet/f7e4dfa7-9634-4e19-9669-df33a19bce40"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudS
ervicesDeployment_Step4" puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/f7e4dfa7-9634-4e19-9669-df33a19bce40.pp

Jan 20 19:22:34 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:22:34,446] (heat-config)
[INFO] Return code 1

[2]:

Jan 20 19:25:45 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:25:45,462] (heat-config)
[DEBUG] [2017-01-20 19:22:35,150] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/c64676fa-0b3b-4ae3-b8a7-f1cab5ed6fa5"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step4"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/c64676fa-0b3b-4ae3-b8a7-f1cab5ed6fa5.pp
Jan 20 19:25:45 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:25:45,456] (heat-config)
[INFO] Return code 6

[3]:

Jan 20 19:25:50 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:25:50,022] (heat-config)
[DEBUG] [2017-01-20 19:25:46,227] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/5efb6561-178f-4e15-9767-abb53e2da51f"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step5"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/5efb6561-178f-4e15-9767-abb53e2da51f.pp

Jan 20 19:25:50 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:25:50,017] (heat-config)
[INFO] Return code 1

[4]:

Jan 20 19:29:29 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:29:29,938] (heat-config)
[DEBUG] [2017-01-20 19:25:50,834] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/558bc8c7-90f1-4f74-b489-5bedb84c94c5"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step5"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/558bc8c7-90f1-4f74-b489-5bedb84c94c5.pp

Jan 20 19:29:29 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:29:29,933] (heat-config)
[INFO] Return code 6

[5]:

Jan 20 19:29:34 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:29:34,445] (heat-config)
[DEBUG] [2017-01-20 19:29:30,691] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/318a872d-7ef7-4d26-877c-d87d800e4dbb"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step6"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/318a872d-7ef7-4d26-877c-d87d800e4dbb.pp

Jan 20 19:29:34 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:29:34,441] (heat-config)
[INFO] Return code 1

[6]:

Jan 20 19:54:24 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:54:24,942] (heat-config)
[DEBUG] [2017-01-20 19:29:35,134] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/46062b89-3f61-4bd5-9c0b-313e03fc3682"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step6"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/46062b89-3f61-4bd5-9c0b-313e03fc3682.pp

Jan 20 19:54:24 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:54:24,928] (heat-config)
[INFO] Return code 6

[7]

Jan 20 19:54:40 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:54:40,319] (heat-config)
[DEBUG] [2017-01-20 19:54:36,412] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/5a012e7a-726e-41f1-a273-bf355886703a"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerServicesBaseDeployment_Step2"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/5a012e7a-726e-41f1-a273-bf355886703a.pp

Jan 20 19:54:40 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 19:54:40,313] (heat-config)
[INFO] Return code 1

[8]

Jan 20 20:08:20 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 20:08:20,372] (heat-config)
[DEBUG] [2017-01-20 20:05:12,642] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/ef993960-bd4e-47a4-84b4-fd62c2e7abb7"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step4"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/ef993960-bd4e-47a4-84b4-fd62c2e7abb7.pp

Jan 20 20:08:20 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 20:08:20,366] (heat-config)
[INFO] Return code 6

[9]

Jan 20 21:23:54 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 21:23:54,286] (heat-config)
[DEBUG] [2017-01-20 21:20:47,794] (heat-config) [DEBUG] Running
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/8aff46f5-6f7b-4fda-acbf-0fd02ab3df84"
FACTER_fqdn="overcloud-controller-0.localdomain"
FACTER_deploy_config_name="ControllerOvercloudServicesDeployment_Step4"
puppet apply --detailed-exitcodes
/var/lib/heat-config/heat-config-puppet/8aff46f5-6f7b-4fda-acbf-0fd02ab3df84.pp

Jan 20 21:23:54 overcloud-controller-0.localdomain
os-collect-config[5189]: [2017-01-20 21:23:54,280] (heat-config)
[INFO] Return code 6

Comment 10 Randy Perryman 2017-01-23 19:25:38 UTC
I will look into the Ceph key.  Not sure what to do about the rest.

Comment 11 Sofer Athlan-Guyot 2017-01-24 08:53:07 UTC
Hi,

If you still have the platform, or plan to deploy it again, I would really like to have the output of:

    journalctl -u os-collect-config

right at the end of the deployment on the controller nodes (all the nodes would be even better)

The logs I've seen in the sos report span several days, so maybe I'm missing something about the way it's handled.

The idea here would be to check whether that exact same error pattern reproduces; that
would indicate a problem in the way the upgrade is done and/or in the environment.

Regards,

Comment 12 Randy Perryman 2017-01-24 14:25:02 UTC
Created attachment 1243938 [details]
OS-Collect-Config cntl0

Comment 13 Randy Perryman 2017-01-24 14:25:56 UTC
Created attachment 1243939 [details]
OS-Collect-Config cntl1

Comment 14 Randy Perryman 2017-01-24 14:26:48 UTC
Created attachment 1243940 [details]
OS-Collect-Config cntl2

Comment 15 Sofer Athlan-Guyot 2017-01-24 22:50:37 UTC
Hi,

Thanks for the logs.  The ones for cntl0 are the same as the ones I've
got, so nothing new there.

In cntl1 we have those errors:

   Jan 20 19:25:29 overcloud-controller-1.localdomain
   os-collect-config[4761]: [2017-01-20 19:25:29,577] (heat-config)
   [INFO] {"deploy_stdout": "", "deploy_stderr": "\u001b[1;31mError:
   Duplicate declaration: User[hacluster] is already declared in file
   /var/lib/heat-config/heat-config-puppet/e989f9ea-c90b-4ebf-bd20-53dfaa9f4b53.pp:91;
   cannot redeclare at
   /etc/puppet/modules/pacemaker/manifests/corosync.pp:121 on node
   overcloud-controller-1.localdomain\u001b[0m\n\u001b[1;31mError:
   Duplicate declaration: User[hacluster] is already declared in file
   /var/lib/heat-config/heat-config-puppet/e989f9ea-c90b-4ebf-bd20-53dfaa9f4b53.pp:91;
   cannot redeclare at
   /etc/puppet/modules/pacemaker/manifests/corosync.pp:121 on node
   overcloud-controller-1.localdomain\u001b[0m\n",
   "deploy_status_code": 1}

Which is the same duplicate hacluster error as seen on cntl0.  This again
could indicate the use of the wrong templates on the undercloud during
the upgrade.

If we take a look at
/var/lib/heat-config/heat-config-puppet/e989f9ea-c90b-4ebf-bd20-53dfaa9f4b53.pp
we should see that it matches an OSP8 puppet configuration.
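
A quick way to check that (a sketch; the .pp path is the one quoted in the error above, and "hacluster" is simply what the duplicate-declaration error points at):

    # on cntl1: the OSP8 overcloud_controller_pacemaker.pp manifest declares the
    # hacluster user itself, so a hit here around line 91 would confirm OSP8 content
    sudo grep -n "hacluster" \
        /var/lib/heat-config/heat-config-puppet/e989f9ea-c90b-4ebf-bd20-53dfaa9f4b53.pp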

As another check, I would like to have a look at /var/lib/heat-config/
on cntl0 and cntl1 (we can ignore cntl2, it should be the same as cntl1):

  tar cfJ ctnlX.tar.xz /var/lib/heat-config/

The idea here would be to confirm that we are polluted with OSP8
configuration scripts during the upgrade.

Next would be to redo the deployment using the CephClientKey
configuration, as it will put that issue out of the way.

Comment 16 Randy Perryman 2017-01-24 22:55:59 UTC
I have taken the cluster down and am working on installing again.  Has anyone else made it to this step and been successful?

Comment 17 Randy Perryman 2017-01-31 19:29:59 UTC
We have a cluster ready for testing here.

Comment 18 Michele Baldessari 2017-02-01 16:43:45 UTC
Just recapping a bit what we discussed/observed in the call today.

The symptom that was observed after the convergence step is that a bunch of
pacemaker resources did not start. The reason for this was that there was an /etc/my.cnf.d/server.cnf file generated by puppet that contained bind-address = 127.0.0.1. The investigation will continue tomorrow and we will try to figure out why and at which step such a file would be created by puppet.

The odd thing is that such a file should not even be created/managed. This would only happen if the hiera variable "enable_galera" was set to false (which we observed being set to true on the controllers).

We will do another session where the setup has been through the upgrades up to, but not including, step 6 (the major-upgrade-pacemaker step). Before doing step 6 we will verify what state the controllers are in (is server.cnf present, services, os-collect-config state, etc.).

We observed both today and in the sosreport attached to this case that server.cnf is dated before the galera.cnf file (in today's lab by 30 minutes and in the sosreports by a day), which suggests that server.cnf gets created in a step before the convergence one (although we can't yet be 100% sure). We tried looking at the logs in these sosreports, but they actually got rotated after the server.cnf creation date, so we cannot infer much as to what created it.
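
Something along these lines could be captured on each controller before step 6 (a sketch only; it assumes the stock TripleO hiera setup and MariaDB on port 3306):

    # is the puppet-managed server.cnf there, and what bind-address does it carry?
    ls -l /etc/my.cnf.d/
    grep -H bind-address /etc/my.cnf.d/*.cnf
    # what does hiera report for enable_galera on this node?
    sudo hiera enable_galera
    # current MariaDB listener (should be the internal_api address, not 127.0.0.1)
    sudo ss -tlnp | grep 3306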

Comment 19 Randy Perryman 2017-02-01 17:08:59 UTC
I have asked for a listing of /etc/my.cnf.d between each step during this install, and if a server.cnf appears, to stop the process.

Comment 20 arkady kanevsky 2017-02-01 17:42:41 UTC
Can we do something quick short term?
Like removing the file from puppet control?
Or adding a manual step, which we can later automate in the upgrade script, to change an entry in a file.

We should do the right solution, including pushing it upstream correctly.
But this is a release blocker and we need a workaround for the release yesterday...

Why has QE not bumped into it?

Comment 21 Randy Perryman 2017-02-01 19:15:07 UTC
FYI - server.cnf is installed as part of OSP8.
Here are the contents; as you can see, most settings are commented out:



[heat-admin@overcloud-controller-0 ~]$ cat /etc/my.cnf.d/server.cnf
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#

# this is read by the standalone daemon and embedded servers
[server]

# this is only for the mysqld standalone daemon
[mysqld]

#
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0

# this is only for embedded server
[embedded]

# This group is only read by MariaDB-5.5 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mysqld-5.5]

# These two groups are only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[mariadb-5.5]

[heat-admin@overcloud-controller-0 ~]$

Comment 22 Sofer Athlan-Guyot 2017-02-02 12:08:27 UTC
Hi,

So the root of the problem is the patch for having VM migration
working between controller upgrade and convergence.  It was created
in https://bugzilla.redhat.com/show_bug.cgi?id=1385143 .  This
would explain why QA didn't bump into it, as it's not yet officially
released.

Part of it is applied during the controller upgrade.  It creates the
default /etc/my.cnf.d/server.cnf, but as mysqld is not restarted
at that time it stays unnoticed.

At convergence time, when mysqld is restarted, the new bind-address is
taken into account and breaks the haproxy/mysql link.
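
In other words a controller that already carries the rewritten server.cnf looks healthy
until that restart.  A quick post-convergence sanity check could be (a sketch; port 3306
and the internal_api VIP placeholder are assumptions based on the default TripleO layout):

    # mysqld should be bound to the controller's internal_api address, not 127.0.0.1
    sudo ss -tlnp | grep 3306
    # and the database should still be reachable through the VIP that haproxy fronts
    # (needs valid credentials, shown here as placeholders)
    mysql -h <internal_api VIP> -u root -p -e 'select 1;'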

Adding a new review to:
 - fix this bug (adding the associated upstream bug)
 - ensure working vm migration/creation during all stages.

The code is still WIP; I will update the bugzilla when it is ready for consumption.

So currently the best course of action would be to not apply the patches for having VM migration working between controller upgrade and convergence.

Furthermore, from the sosreport, it appears that you bump into the error during keystone migration due to the version of the puppet modules that gets installed.  The relevant bug is described in https://bugzilla.redhat.com/show_bug.cgi?id=1414784 .  The solution here is to remove point 2.a/2.b where "yum -y update openstack-puppet-modules" is done.

Comment 23 arkady kanevsky 2017-02-02 16:18:15 UTC
Sofer, thank you for getting to the bottom of it.

So how do we deliver fixes for this BZ and https://bugzilla.redhat.com/show_bug.cgi?id=1385143 at the same time?
We need both to deliver an upgrade that minimizes disruption on the data plane.
Without VM migration, the experience gets worse as the number of compute nodes increases.

Solving one without the other is not very useful.

Comment 24 Sofer Athlan-Guyot 2017-02-03 18:02:42 UTC
Hi,

Arkady, yes, the goal is to be able to manipulate VMs in all these states:

 - controllers upgraded/compute not upgraded
 - controllers upgraded/part of compute upgraded
 - controllers upgraded/all computes upgraded/no convergence

Randy, this is still WIP.  It looks like it works for vm creation in
all those states.  VM migration hasn't been tested yet.

It still requires manual intervention.  But it's very close.

With the latest revision of the patch, server.cnf is not updated
anymore, but it still misses a parameter for the nova_api database.

I'm going to continue the work and should have most of it covered by
Monday.

The relevant patch is https://review.openstack.org/#/c/428093/.  When
it's working I will update here about how to apply it on OSP8 cleanly.

Comment 25 Sofer Athlan-Guyot 2017-02-03 18:21:21 UTC
Hi,

Uploaded a new version that should solve the last manual trick.  Still
not fully tested.

Nevertheless here is how you would apply it:

    # BZ 1413686
    curl https://review.openstack.org/changes/408669/revisions/current/patch?download | \
        base64 -d | \
        sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

    curl https://review.openstack.org/changes/422837/revisions/current/patch?download | \
        base64 -d | \
        sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

    curl https://review.openstack.org/changes/428093/revisions/current/patch?download | \
        base64 -d | \
        sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

On the undercloud, and before the controller upgrade step.

Regards,

Comment 26 Mike Orazi 2017-02-03 18:27:32 UTC
(In reply to arkady kanevsky from comment #23)
> Sofer, thank you for getting to the bottom of it.
> 
> So how do we deliver fixes for this BZ and
> https://bugzilla.redhat.com/show_bug.cgi?id=1385143 at the same time?
> We need both to deliver upgrade minimizing disruption on data plane.
> Without VM migration experience is getting worse with the increased #s of
> compute nodes.
> 
> Solving one without another is not very useful.

These should be able to coexist (and Sofer has indicated that he is performing his testing with one layered on top of the other, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1413686#c25).

Comment 27 Randy Perryman 2017-02-03 18:49:15 UTC
Thank You for the update. We are now testing the series of patches in sequence.

Comment 28 arkady kanevsky 2017-02-03 21:47:09 UTC
Thank you Sofer and Mike.
Much appreciate that it was done in such a short time.

One correction to Sofer's instructions for documentation/scripting.

The last patch, https://review.openstack.org/#/c/428093, has not been merged yet and hence can still change before the final merge.
I suggest that for our release we do not add a dependency on unknown code.
Instead, use the current version that Sofer submitted:
https://review.openstack.org/#/c/428093/11/

This is consistent with our philosophy of locked bits and controlling what code is being used by the customer.
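
For reference, the download-and-patch commands from comment 25 can be pinned to that
exact patch set by replacing "current" with the patch set number in the Gerrit URL
(a sketch, untested, same assumptions as in comment 25):

    # pin to patch set 11 of review 428093 instead of whatever "current" resolves to
    curl https://review.openstack.org/changes/428093/revisions/11/patch?download | \
        base64 -d | \
        sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1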

Comment 29 Randy Perryman 2017-02-03 22:05:54 UTC
The tests are in and now we are receiving an error with CephStorage:


{
  "status": "FAILED",
  "server_id": "87c1ee96-8cb1-46ae-ac19-9391c325ce59",
  "config_id": "1dd17b4c-01e0-48dd-a5fd-60d823020397",
  "output_values": {
    "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-cephstorage-0.localdomain in environment production in 0.96 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_revocation_interval]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_url]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_s3_auth_use_keystone]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_init_timeout]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_admin_token]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key '' --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_make_new_tenants]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_accepted_roles]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Conf/Ceph_config[client.radosgw.gateway/rgw_keystone_token_cache size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ntp::Config/File[/etc/ntp.conf]/content: content changed '{md5}04ef455e1ab8ac186bb2055a3ae65754' to '{md5}895e208998c1be1ae515236df50aef64'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ntp::Service/Service[ntp]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: + test -b /dev/sdm\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: + ceph-disk prepare /dev/sdm /dev/sdc\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: Could not create partition 2 from 34 to 20480033\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: Unable to set partition 2's name to 'ceph journal'!\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: Could not change partition 2's type code to 45b0969e-9b03-4f30-b4c6-b4b80ceff106!\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: Error encountered; not saving changes.\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--new=2:0:+10000M', '--change-name=2:ceph journal', '--partition-guid=2:ca5ffc4f-c062-427e-afd7-37c49ad73ab5', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/sdc']' returned non-zero exit status 4\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + ceph-disk activate /dev/sde1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: === osd.10 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: Starting Ceph osd.10 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -b /dev/sdd\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -b /dev/sdd\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -b /dev/sdd1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -b /dev/sdd1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + ceph-disk activate /dev/sdd1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: === osd.0 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: Starting Ceph osd.0 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-activate-/dev/sdm]: Dependency Exec[ceph-osd-prepare-/dev/sdm] has failures: true\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + test -b /dev/sdo\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + test -b /dev/sdo\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + test -b /dev/sdo1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + test -b /dev/sdo1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: + ceph-disk activate 
/dev/sdo1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: === osd.8 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: Starting Ceph osd.8 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdo]/Exec[ceph-osd-activate-/dev/sdo]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + test -b /dev/sdi\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + test -b /dev/sdi\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + test -b /dev/sdi1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + test -b /dev/sdi1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: + ceph-disk activate /dev/sdi1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: === osd.9 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: Starting Ceph osd.9 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdi]/Exec[ceph-osd-activate-/dev/sdi]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + test -b /dev/sdk\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + test -b /dev/sdk\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + test -b /dev/sdk1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + test -b /dev/sdk1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: + ceph-disk activate /dev/sdk1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: === osd.6 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: Starting Ceph osd.6 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdk]/Exec[ceph-osd-activate-/dev/sdk]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + test -b /dev/sdf\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + test -b /dev/sdf\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + test -b /dev/sdf1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + test -f 
/usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + test -b /dev/sdf1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: + ceph-disk activate /dev/sdf1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: === osd.24 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: Starting Ceph osd.24 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdf]/Exec[ceph-osd-activate-/dev/sdf]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + test -b /dev/sdj\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + test -b /dev/sdj\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + test -b /dev/sdj1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + test -b /dev/sdj1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: + ceph-disk activate /dev/sdj1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: === osd.30 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: Starting Ceph osd.30 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdj]/Exec[ceph-osd-activate-/dev/sdj]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + test -b /dev/sdg\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + test -b /dev/sdg\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + test -b /dev/sdg1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + test -b /dev/sdg1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: + ceph-disk activate /dev/sdg1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: === osd.33 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: Starting Ceph osd.33 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdg]/Exec[ceph-osd-activate-/dev/sdg]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: + test -b /dev/sdn\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: + ceph-disk prepare /dev/sdn /dev/sdc\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: Could not create partition 2 from 34 to 20480033\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: Unable to set partition 2's name to 'ceph journal'!\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: Could not change partition 2's type code to 45b0969e-9b03-4f30-b4c6-b4b80ceff106!\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: Error encountered; not saving changes.\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--new=2:0:+10000M', '--change-name=2:ceph journal', '--partition-guid=2:90a0c431-9082-4436-a432-0410c763952c', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/sdc']' returned non-zero exit status 4\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-activate-/dev/sdn]: Dependency Exec[ceph-osd-prepare-/dev/sdn] has failures: true\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + test -b /dev/sdl\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + test -b /dev/sdl\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + test -b /dev/sdl1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + test -b /dev/sdl1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: + ceph-disk activate /dev/sdl1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: === osd.27 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: Starting Ceph osd.27 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdl]/Exec[ceph-osd-activate-/dev/sdl]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + test -b /dev/sdh\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + test -b /dev/sdh\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + test -b /dev/sdh1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + test -f /usr/lib/udev/rules.d/95-ceph-osd.rules.disabled\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + test -b 
/dev/sdh1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: + ceph-disk activate /dev/sdh1\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: === osd.21 === \u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: Starting Ceph osd.21 on overcloud-cephstorage-0...already running\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdh]/Exec[ceph-osd-activate-/dev/sdh]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 10.63 seconds\u001b[0m\n",
    "deploy_stderr": "\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nif ! test -b /dev/sdm ; then\n  mkdir -p /dev/sdm\nfi\nceph-disk prepare  /dev/sdm /dev/sdc\nudevadm settle\n returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-prepare-/dev/sdm]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nif ! test -b /dev/sdm ; then\n  mkdir -p /dev/sdm\nfi\nceph-disk prepare  /dev/sdm /dev/sdc\nudevadm settle\n returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdm]/Exec[ceph-osd-activate-/dev/sdm]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mError: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nif ! test -b /dev/sdn ; then\n  mkdir -p /dev/sdn\nfi\nceph-disk prepare  /dev/sdn /dev/sdc\nudevadm settle\n returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-prepare-/dev/sdn]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements\nset -ex\nif ! test -b /dev/sdn ; then\n  mkdir -p /dev/sdn\nfi\nceph-disk prepare  /dev/sdn /dev/sdc\nudevadm settle\n returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdn]/Exec[ceph-osd-activate-/dev/sdn]: Skipping because of failed dependencies\u001b[0m\n",
    "deploy_status_code": 6
  },
  "creation_time": "2017-02-03T21:24:28",
  "updated_time": "2017-02-03T21:24:53",
  "input_values": {
    "update_identifier": {
      "cephstorage_config": {
        "1": "os-apply-config deployment 6b20a379-046b-462a-95bd-4a9e2614e238 completed,Root CA cert injection not enabled.,None,",
        "0": "os-apply-config deployment 548c2f07-7a31-4aa6-b06b-4256dcf71341 completed,Root CA cert injection not enabled.,None,",
        "2": "os-apply-config deployment fc341e8e-7f56-4131-ba68-7de5228bd740 completed,Root CA cert injection not enabled.,None,"
      },
      "deployment_identifier": 1486155926,
      "allnodes_extra": "none"
    }
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6",
  "id": "c4fa66bd-8c56-45b8-bf55-8deea9fde674"
}
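
For anyone hitting the same ceph-osd-prepare failure, here is a minimal diagnostic sketch (not part of the original report; the device names /dev/sdc, /dev/sdm and /dev/sdn are assumptions taken from the log above) for inspecting why sgdisk could not create another journal partition on the shared journal disk:

# Hedged diagnostic sketch -- inspect the shared journal device that
# ceph-disk prepare failed against (device names assumed from the log above).

# Print the current partition layout of the journal disk; this shows
# whether there is room left for the 10000M journal partition that
# ceph-disk asked sgdisk to create.
sudo sgdisk --print /dev/sdc

# Show how ceph-disk currently maps data and journal devices on this node.
sudo ceph-disk list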

Comment 30 Audra Cooper 2017-02-08 14:45:18 UTC
I ran through the upgrade again, successfully this time!  Looks like the patch worked!
Thank you!

Comment 31 Sofer Athlan-Guyot 2017-02-09 11:30:33 UTC
Hi Audra,

that's really good news.  So using the command in https://bugzilla.redhat.com/show_bug.cgi?id=1413686#c25 you were able, yesterday, to complete a successful upgrade with VM migration during the upgrade.  That is, you must have used https://review.openstack.org/#/c/428093/18 .

In my tests of the latest version of my patch I was able to create VMs on both upgraded and non-upgraded compute nodes between the controller upgrade and the convergence stages.

I'm going to have this merged upstream as soon as possible.  A final check will be done by Red Hat QE before landing, especially for migration during the upgrade, which I could not test myself.

Regards,

Comment 32 Sofer Athlan-Guyot 2017-02-23 16:40:32 UTC
Adding another necessary patch.

Comment 33 Sofer Athlan-Guyot 2017-02-23 22:01:33 UTC
Hi,

so to have this working you need to apply those patches.  Assuming the templates are in /usr/share/openstack-tripleo-heat-templates, the necessary commands are:

curl https://review.openstack.org/changes/408669/revisions/current/patch?download | \
    base64 -d | \
    sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

curl https://review.openstack.org/changes/422837/revisions/current/patch?download | \
    base64 -d | \
    sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

curl https://review.openstack.org/changes/428093/revisions/current/patch?download | \
    base64 -d | \
    sudo patch -d /usr/share/openstack-tripleo-heat-templates -p1

The reviews are merged upstream and won't change anymore.
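
As an illustration only (the loop, the /tmp paths and the --dry-run check are assumptions, not part of the procedure above), the same three downloads can be wrapped so that re-running the step is harmless:

# Illustrative sketch: apply the three reviews idempotently against the
# installed templates.  Assumes the same template path as above.
THT=/usr/share/openstack-tripleo-heat-templates
for change in 408669 422837 428093; do
    curl -s "https://review.openstack.org/changes/${change}/revisions/current/patch?download" \
        | base64 -d > "/tmp/${change}.patch"
    # --dry-run first: only apply the change if it is not already present.
    if sudo patch -d "$THT" -p1 --forward --dry-run -s < "/tmp/${change}.patch"; then
        sudo patch -d "$THT" -p1 --forward < "/tmp/${change}.patch"
    else
        echo "review ${change} already applied or does not apply cleanly; skipping"
    fi
done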

Comment 34 Randy Perryman 2017-02-23 22:31:54 UTC
We just verified that all three patches are applied to our install.

Comment 35 Randy Perryman 2017-02-23 22:35:58 UTC
(In reply to Randy Perryman from comment #34)
> We just verified that all three patches are applied to our Upgrade patch list.
We are going to validate them, make sure they are in the running template directory, and see if we missed one.

Comment 36 Audra Cooper 2017-02-24 17:42:55 UTC
(In reply to Randy Perryman from comment #35)
> (In reply to Randy Perryman from comment #34)
> > We just verified that all three patches are applied to our Upgrade patch list.
> We are going to validate them, make sure they are in the running template
> directory, and see if we missed one.

We had missed one.  After re-running, the upgrade completed successfully!

Comment 37 Michele Baldessari 2017-02-24 17:59:23 UTC
*** Bug 1426253 has been marked as a duplicate of this bug. ***

Comment 38 Michele Baldessari 2017-02-24 18:04:21 UTC
*** Bug 1382127 has been marked as a duplicate of this bug. ***

Comment 39 Mike Burns 2017-03-09 18:15:57 UTC
*** Bug 1385143 has been marked as a duplicate of this bug. ***

Comment 40 Sofer Athlan-Guyot 2017-03-17 10:15:48 UTC
*** Bug 1396360 has been marked as a duplicate of this bug. ***

Comment 41 Sofer Athlan-Guyot 2017-03-17 10:18:45 UTC
*** Bug 1396365 has been marked as a duplicate of this bug. ***

Comment 42 Sofer Athlan-Guyot 2017-03-17 10:39:28 UTC
*** Bug 1388521 has been marked as a duplicate of this bug. ***

Comment 46 errata-xmlrpc 2017-03-30 19:34:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0859

