Bug 1290796 - Set UpdateIdentifier after yum update causes subsequent scale out attempt to fail
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: James Slagle
QA Contact: Udi Kalifon
Reported: 2015-12-11 13:30 UTC by James Slagle
Modified: 2015-12-21 16:53 UTC
CC List: 7 users

Fixed In Version: python-rdomanager-oscplugin-0.0.10-22.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when scaling out the Compute nodes in the Overcloud after an update had been performed, the UpdateIdentifier parameter still set in the Heat stack caused the new Compute node to attempt a package update as soon as it came up. Because the yum repositories were not yet configured on the new Compute node, the update failed, which in turn caused the scale out to fail. With this update, the client, python-rdomanager-oscplugin, clears the UpdateIdentifier parameter on subsequent stack-update attempts (including the scale out) performed after the initial update has completed. As a result, scale out attempts after an update now succeed.
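
For illustration only: the behavioural change amounts to passing an empty value for the parameter on later stack-updates. A rough equivalent with the plain heat CLI is sketched below; this is not the actual python-rdomanager-oscplugin change, and the file paths are placeholders:

# Clear UpdateIdentifier on a subsequent stack-update so that yum_update.sh
# on newly created nodes sees no identifier and exits without running yum.
# Template and environment paths are placeholders.
heat stack-update overcloud \
  -f /path/to/overcloud-template.yaml \
  -e /path/to/saved-environment.yaml \
  -P "UpdateIdentifier="
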
Last Closed: 2015-12-21 16:53:23 UTC




Links
OpenStack gerrit 256670 (Last Updated: Never)
Red Hat Product Errata RHSA-2015:2650 - SHIPPED_LIVE - Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update (Last Updated: 2015-12-21 21:44:54 UTC)

Description James Slagle 2015-12-11 13:30:57 UTC
When doing a package update, we set the UpdateIdentifier parameter to a unique value. This triggers the SoftwareDeployment, and yum_update.sh also has a check to make sure UpdateIdentifier is not empty and is a new unique value it hasn't seen before; only then does it proceed with updating packages.
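
(For context, the guard described above has roughly the following shape. This is a simplified sketch, not the verbatim yum_update.sh, and the state directory name is invented for the example.)

#!/bin/bash
# Sketch of an "update once per identifier" guard.
update_identifier="$1"                    # provided by the SoftwareDeployment
state_dir="/var/lib/example-yum-update"   # hypothetical marker directory

# No identifier set: nothing to do.
if [ -z "$update_identifier" ]; then
    echo "No UpdateIdentifier set, skipping package update."
    exit 0
fi

mkdir -p "$state_dir"

# Identifier already handled on this node: skip.
if [ -e "$state_dir/$update_identifier" ]; then
    echo "UpdateIdentifier $update_identifier already applied, skipping."
    exit 0
fi

# New identifier: run the update and remember it on success.
echo "Running package update for UpdateIdentifier $update_identifier"
if yum -y update --skip-broken; then
    touch "$state_dir/$update_identifier"
    exit 0
fi
exit 1

On a brand new node the marker directory is empty, so any non-empty UpdateIdentifier looks new and triggers the yum run; that is exactly what bites the scale out described next.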

Assuming the package update is successful, if you then immediately try a scale out attempt, it will likely fail. UpdateIdentifier is still set in the saved Heat environment (as expected), so when the new node you're scaling out comes up for the first time, UpdateDeployment is triggered and it sees a value of UpdateIdentifier that is new to this node. It then tries a yum update. That fails because no repos are configured on the node yet. Even when using rhel registration, which is enabled via the NodeExtraConfig resource, this will fail because there is no depends_on set on UpdateDeployment for NodeExtraConfig, so there is no guarantee any repos would even be enabled yet.

The error from UpdateDeployment looks like:

{
  "status": "FAILED", 
  "server_id": "853ce223-2051-4cb5-868c-7cf72c312c2b", 
  "config_id": "51cc12e8-b1bf-4b2a-b318-a977b1fc1a30", 
  "output_values": {
    "deploy_stdout": "Started yum_update.sh on server 853ce223-2051-4cb5-868c-7cf72c312c2b at Thu Dec 10 21:33:33 EST 2015\nExcluding upgrading packages that are handled by config management tooling\nRunning: yum -y update  --skip-broken\nLoaded plugins: product-id, subscription-manager\nThis system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.\nyum return code: 1\nFinished yum_update.sh on server 853ce223-2051-4cb5-868c-7cf72c312c2b at Thu Dec 10 21:33:36 EST 2015\n", 
    "deploy_stderr": "cat: /var/lib/tripleo/installed-packages/*: No such file or directory\nThere are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n You can enable repos with yum-config-manager --enable <repo>\n", 
    "update_managed_packages": "true", 
    "deploy_status_code": 1
  }, 
  "creation_time": "2015-12-11T02:32:26Z", 
  "updated_time": "2015-12-11T02:33:38Z", 
  "input_values": {}, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", 
  "id": "974ac676-67e1-4a1c-947d-92ed68b375f0"
}

Comment 2 Jaromir Coufal 2015-12-14 15:05:14 UTC
James, what is the severity here? Does it affect every scale out after an update? Thanks

Comment 3 James Slagle 2015-12-14 21:43:57 UTC
to verify:

deploy with 7.0 undercloud and 7.0 overcloud, HA, net-iso. Update undercloud to 7.2, update overcloud to 7.2. After the update completes successfully, attempt scale out of compute nodes. The compute node should scale out fine and should not run any yum update. You could verify this by looking in the journalctl for os-collect-config or /var/log/yum.log on the new compute nodes.

repeat, but start at 7.1.
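
A quick way to spot-check on the newly added compute node (root access assumed; exact journal wording may vary):

# Deployment output captured by os-collect-config; after the fix,
# yum_update.sh should report that it skipped the package update.
sudo journalctl -u os-collect-config | grep -i yum_update

# Yum transaction log; there should be no new entries dated at the
# time of the scale out.
sudo cat /var/log/yum.log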

Comment 6 errata-xmlrpc 2015-12-21 16:53:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650

