Bug 1324691
| Summary: | rhel-osp-director: deploy or scaling operations implicitly call update after upgrade | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | python-tripleoclient | Assignee: | Jiri Stransky <jstransk> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Arik Chernetsky <achernet> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.0 (Liberty) | CC: | aschultz, augol, dbecker, dmacpher, hbrock, jcoufal, jschluet, jslagle, jstransk, kbasil, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang |
| Target Milestone: | async | | |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: |
Cause: The upgrade process currently sets a variable in the heat stack which does not get cleared on completion.
Consequence: All deploy or scale commands after an upgrade will trigger the update workflow, which runs yum updates and various other processes that are not expected in deploy/scale workflows.
Workaround (if any): On the first deploy or scale attempt after an upgrade, pass an environment file containing:
parameter_defaults:
  UpdateIdentifier:
Result: The update will not be triggered on that or future deploy/scale operations.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-02 17:47:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
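A quick way to confirm the cause described in the Doc Text above is to check whether the overcloud stack still carries a non-empty UpdateIdentifier parameter after the upgrade. A minimal sketch, assuming the Liberty-era heat CLI on the undercloud and the default stack name "overcloud":

source ~/stackrc
heat stack-show overcloud | grep UpdateIdentifier

If the parameter shows a timestamp-like value rather than an empty string, the next deploy or scale run will trigger the update workflow.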
WIP patches (not tested manually yet) submitted to tripleo-common and python-tripleoclient. We'll also need a tripleo-heat-templates patch, which I'm working on now, and I'll try to test those together.

Adding my test here as it seems to have the same cause:

Running openstack overcloud deploy after the upgrade finished resulted in the following error.
The upgrade command at step 6:
stack@instack:~>>> cat deploy.ha.ceph.ipv6.ssl.8.0-step6
export THT=~/templates/my-overcloud-8.0
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-8.0-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
-e $THT/environments/major-upgrade-pacemaker-converge.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu
Removing major-upgrade-pacemaker-converge.yaml and running deploy:
stack@instack:~>>> cat deploy.ha.ceph.ipv6.ssl.8.0
export THT=~/templates/my-overcloud-8.0
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e ~/templates/network-environment-8.0-v6.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/enable-tls.yaml \
-e ~/templates/inject-trust-anchor.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 2 \
--ntp-server clock.redhat.com \
--libvirt-type qemu
The deploy ended with:
{
"status": "FAILED",
"server_id": "9e1aca53-99b3-4d00-a825-678ef7c9348c",
"config_id": "cd969547-ca04-44b8-b819-1adf729e63fd",
"output_values": {
"deploy_stdout": "Started yum_update.sh on server 9e1aca53-99b3-4d00-a825-678ef7c9348c at Thu Apr 7 12:52:02 UTC 2016\nDumping Pacemaker config\nChecking for missing constraints\n start openstack-nova-novncproxy-clone then start openstack-nova-api-clone (kind:Mandatory)\n start rabbitmq-clone then start openstack-keystone-clone (kind:Mandatory)\n promote galera-master then start openstack-keystone-clone (kind:Mandatory)\n Clone Set: haproxy-clone [haproxy]\n start haproxy-clone then start openstack-keystone-clone (kind:Mandatory)\n start memcached-clone then start openstack-keystone-clone (kind:Mandatory)\n promote redis-master then start openstack-ceilometer-central-clone (kind:Mandatory) (Options: require-all=false)\n start neutron-server-clone then start neutron-openvswitch-agent-clone (kind:Mandatory)\nresource-stickiness: INFINITY\nSetting resource start/stop timeouts\nMaking sure rabbitmq has the notify=true meta parameter\nApplying new Pacemaker config\nERROR failed to apply new pacemaker config\n",
"deploy_stderr": "Error: unable to push cib\nCall cib_replace failed (-205): Update was older than existing configuration\n\n",
"update_managed_packages": "false",
"deploy_status_code": 1
},
"creation_time": "2016-04-06T10:47:19",
"updated_time": "2016-04-07T12:52:21",
"input_values": {},
"action": "UPDATE",
"status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
"id": "37bb7abe-f286-44d3-8089-0e5f798815b0"
}
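For reference, details like the JSON above can be pulled from the undercloud after a failed run; a minimal sketch, assuming the Liberty-era heat CLI (the deployment UUID is the "id" field from the output above):

source ~/stackrc
# list nested resources and narrow down to the failed ones
heat resource-list --nested-depth 5 overcloud | grep -i failed
# dump stdout/stderr of a specific failed software deployment
heat deployment-show 37bb7abe-f286-44d3-8089-0e5f798815b0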
I'm able to scale down from the failed scale up just fine. Tried several times already.

No blocker, there is a revert path; a quick 0-day (or close to that) async is fine.

T-h-t part submitted too [1], but to allow the upgrades job to pass upstream, 2 more patches need to be merged first [2][3]. The upstream upgrades job doesn't test full upgrades just yet, but it tests a stack-update, so it does test the involved code at least partially.

[1] https://review.openstack.org/#/c/304094
[2] https://review.openstack.org/#/c/296592
[3] https://review.openstack.org/#/c/304592

After adding "UpdateIdentifier:" to the parameter_defaults section of the included environment file, I was able to re-run the deployment command successfully post upgrade (see the sketch below).
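For anyone else hitting this, a minimal sketch of that workaround; the file name cleanup-update-identifier.yaml is arbitrary:

stack@instack:~>>> cat ~/templates/cleanup-update-identifier.yaml
# Clear the identifier left behind by the upgrade so deploy/scale
# runs stop triggering the package-update workflow.
parameter_defaults:
  UpdateIdentifier:

Then append -e ~/templates/cleanup-update-identifier.yaml to the first deploy or scale command run after the upgrade.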
rhel-osp-director: Scaling of computes post upgrade 7.3->8.0 fails with error: yum -y update returned 1 instead of one of [0]

Environment:
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-2.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-5.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.3 with 1 compute and populate it with objects.
2. Upgrade the setup to 8.0.
3. Attempt to scale computes from 1 to 3.

Result:
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::host'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_protocol'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::port'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_path'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Ceilometer::Agent::Compute]): This class is deprecated. Please use ceilometer::agent::polling with compute namespace instead.
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false. (at /usr/share/ruby/vendor_ruby/puppet/type.rb:816:in `set_default')
Error: yum -y update returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Packages/Exec[package-upgrade]/returns: change from notrun to 0 failed: yum -y update returned 1 instead of one of [0]
Warning: /Stage[main]/Vswitch::Ovs/Service[openvswitch]: Skipping because of failed dependencies
Warning: /Stage[main]/Ntp::Service/Service[ntp]: Skipping because of failed dependencies
Warning: /Stage[main]/Ntp/Anchor[ntp::end]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron::Plugins::Ovs::Bridge[datacentre:br-ex]/Vs_bridge[br-ex]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Package[neutron-ovs-agent]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/local_ip]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/extensions]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/enable_distributed_routing]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/tunnel_bridge]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/prevent_arp_spoofing]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/enable_tunneling]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/integration_bridge]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[securitygroup/firewall_driver]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[ovs-cleanup-service]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[ovs/bridge_mappings]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/arp_responder]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/polling_interval]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/drop_flows_on_start]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/vxlan_udp_port]: Skipping because of failed dependencies
Warning: /Stage[main]/Snmp/Service[snmptrapd]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute::Libvirt/Service[messagebus]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute::Libvirt/Service[libvirt]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/l2_population]: Skipping because of failed dependencies
Warning: /Stage[main]/Snmp/Service[snmpd]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron_agent_ovs[agent/tunnel_types]: Skipping because of failed dependencies
Warning: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[neutron-ovs-agent-service]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute/Nova::Generic_service[compute]/Service[nova-compute]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova/Exec[networking-refresh]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute::Rbd/File[/etc/nova/secret.xml]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute::Rbd/Exec[get-or-set virsh secret]: Skipping because of failed dependencies
Warning: /Stage[main]/Nova::Compute::Rbd/Exec[set-secret-value virsh]: Skipping because of failed dependencies
Warning: /Stage[main]/Ceilometer::Agent::Compute/Service[ceilometer-agent-compute]: Skipping because of failed dependencies

"deploy_status_code": 6

The issue reproduces. When I attempt to run "yum update" on the new computes, I get:

Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
There are no enabled repos.
 Run "yum repolist all" to see the repos you have.
 You can enable repos with yum-config-manager --enable <repo>

And the exit code from the above command is "1". But should it fail the scale up?
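Since the failure above boils down to the new node not being registered, that can be confirmed from the undercloud before retrying the scale-up. A minimal sketch; the compute IP 192.0.2.10 and the heat-admin user are the usual TripleO defaults, used here as placeholders:

source ~/stackrc
# find the ctlplane IP of the newly added compute
nova list | grep compute
# check registration and repo state on the node
ssh heat-admin@192.0.2.10 'sudo subscription-manager status; sudo yum repolist all'

If subscription-manager reports the system as unregistered, the node needs to be registered and its repos enabled before yum -y update can succeed.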