rhel-osp-director: Scaling of computes post upgrade 7.3->8.0 fails with error: yum -y update returned 1 instead of one of [0]

Environment:
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-2.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-5.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.3 with 1 compute and populate it with objects.
2. Upgrade the setup to 8.0.
3. Attempt to scale computes from 1 to 3.

Result:
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::host'; class ::nova::vncproxy has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_protocol'; class ::nova::vncproxy has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::port'; class ::nova::vncproxy has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_path'; class ::nova::vncproxy has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer::Agent::Compute]): This class is deprecated. Please use ceilometer::agent::polling with compute namespace instead.\u001b[0m\n\u001b[1;31mWarning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release.
If you do not want to allow virtual packages, please explicitly set allow_virtual to false.\n (at /usr/share/ruby/vendor_ruby/puppet/type.rb:816:in `set_default')\u001b[0m\n\u001b[1;31mError: yum -y update returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Tripleo::Packages/Exec[package-upgrade]/returns: change from notrun to 0 failed: yum -y update returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Vswitch::Ovs/Service[openvswitch]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Ntp::Service/Service[ntp]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Ntp/Anchor[ntp::end]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron::Plugins::Ovs::Bridge[datacentre:br-ex]/Vs_bridge[br-ex]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Package[neutron-ovs-agent]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/local_ip]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/extensions]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/enable_distributed_routing]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/tunnel_bridge]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/prevent_arp_spoofing]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/enable_tunneling]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: 
/Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/integration_bridge]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[securitygroup/firewall_driver]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[ovs-cleanup-service]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/bridge_mappings]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/arp_responder]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/polling_interval]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/drop_flows_on_start]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/vxlan_udp_port]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Snmp/Service[snmptrapd]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute::Libvirt/Service[messagebus]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute::Libvirt/Service[libvirt]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/l2_population]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Snmp/Service[snmpd]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/tunnel_types]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: 
/Stage[main]/Neutron::Agents::Ml2::Ovs/Service[neutron-ovs-agent-service]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute/Nova::Generic_service[compute]/Service[nova-compute]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute::Rbd/File[/etc/nova/secret.xml]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute::Rbd/Exec[get-or-set virsh secret]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Nova::Compute::Rbd/Exec[set-secret-value virsh]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Ceilometer::Agent::Compute/Service[ceilometer-agent-compute]: Skipping because of failed dependencies\u001b[0m\n", "deploy_status_code": 6

The issue reproduces. When I attempt to run "yum update" on the new computes, I get:

Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
There are no enabled repos.
 Run "yum repolist all" to see the repos you have.
 You can enable repos with yum-config-manager --enable <repo>

The exit code from the above command is 1. But should that fail the scale-up?
WIP patches (not tested manually yet) submitted to tripleo-common and python-tripleoclient. We'll also need a tripleo-heat-templates patch, which I'm working on now. I'll try to test those together.
Adding my test here as it seems to have the same cause:

Doing openstack overcloud deploy after the upgrade is finished resulted in the following error:

The upgrade command at step 6:

stack@instack:~>>> cat deploy.ha.ceph.ipv6.ssl.8.0-step6
export THT=~/templates/my-overcloud-8.0
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-8.0-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
-e $THT/environments/major-upgrade-pacemaker-converge.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu

Removing major-upgrade-pacemaker-converge.yaml and running deploy:

stack@instack:~>>> cat deploy.ha.ceph.ipv6.ssl.8.0
export THT=~/templates/my-overcloud-8.0
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-8.0-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu

ended with:

{
  "status": "FAILED",
  "server_id": "9e1aca53-99b3-4d00-a825-678ef7c9348c",
  "config_id": "cd969547-ca04-44b8-b819-1adf729e63fd",
  "output_values": {
    "deploy_stdout": "Started yum_update.sh on server 9e1aca53-99b3-4d00-a825-678ef7c9348c at Thu Apr 7 12:52:02 UTC 2016\nDumping Pacemaker config\nChecking for missing constraints\n start openstack-nova-novncproxy-clone then start openstack-nova-api-clone (kind:Mandatory)\n start rabbitmq-clone then start openstack-keystone-clone (kind:Mandatory)\n promote galera-master then start openstack-keystone-clone (kind:Mandatory)\n Clone Set: haproxy-clone [haproxy]\n start haproxy-clone then start openstack-keystone-clone (kind:Mandatory)\n start memcached-clone then start openstack-keystone-clone (kind:Mandatory)\n promote redis-master then start openstack-ceilometer-central-clone (kind:Mandatory) (Options: require-all=false)\n start neutron-server-clone then start neutron-openvswitch-agent-clone (kind:Mandatory)\nresource-stickiness: INFINITY\nSetting resource start/stop timeouts\nMaking sure rabbitmq has the notify=true meta parameter\nApplying new Pacemaker config\nERROR failed to apply new pacemaker config\n",
    "deploy_stderr": "Error: unable to push cib\nCall cib_replace failed (-205): Update was older than existing configuration\n\n",
    "update_managed_packages": "false",
    "deploy_status_code": 1
  },
  "creation_time": "2016-04-06T10:47:19",
  "updated_time": "2016-04-07T12:52:21",
  "input_values": {},
  "action": "UPDATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "37bb7abe-f286-44d3-8089-0e5f798815b0"
}
I'm able to scale down from the failed scale up just fine. Tried several times already.
Not a blocker, since there is a way to revert. A quick 0-day fix, or an async release close to that day, is fine.
The t-h-t (tripleo-heat-templates) part is submitted too [1], but to allow the upgrades job to pass upstream, 2 more patches need to be merged first [2][3]. The upstream upgrades job doesn't test full upgrades just yet, but it does test a stack-update, so it exercises the involved code at least partially.

[1] https://review.openstack.org/#/c/304094
[2] https://review.openstack.org/#/c/296592
[3] https://review.openstack.org/#/c/304592
After adding "UpdateIdentifier:" to the parameter_defaults section of the included environment file, I was able to re-run the deployment command successfully post upgrade.
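The workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file name update-identifier.yaml and the ENV_DIR variable are hypothetical, and the sketch writes a separate environment file, whereas the comment above added the key to an environment file that was already included. The empty "UpdateIdentifier:" value simply mirrors what was reported here.

```shell
#!/bin/sh
# Hypothetical location for the environment file (not from the report).
ENV_DIR="${ENV_DIR:-$(mktemp -d)}"
ENV_FILE="$ENV_DIR/update-identifier.yaml"

# Write an environment file that sets UpdateIdentifier under
# parameter_defaults, left empty exactly as quoted in the comment above.
cat > "$ENV_FILE" <<'EOF'
parameter_defaults:
  UpdateIdentifier:
EOF

echo "Wrote $ENV_FILE"

# The file would then be passed with an extra -e argument to the same
# "openstack overcloud deploy" command that was used for the deployment,
# keeping all other -e files and scale options unchanged.
```

Adding the two lines to an environment file that is already passed with -e should have the same effect and avoids changing the deploy command.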