
Bug 1290796

Summary: Set UpdateIdentifier after yum update causes subsequent scale out attempt to fail
Product: Red Hat OpenStack
Component: python-rdomanager-oscplugin
Version: 7.0 (Kilo)
Target Milestone: y2
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: urgent
Keywords: Triaged
Reporter: James Slagle <jslagle>
Assignee: James Slagle <jslagle>
QA Contact: Udi Kalifon <ukalifon>
CC: calfonso, dnavale, jcoufal, jslagle, mburns, rhel-osp-director-maint, sasha
Fixed In Version: python-rdomanager-oscplugin-0.0.10-22.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, when scaling out the Compute nodes in the Overcloud after an update had been performed, the UpdateIdentifier parameter left set in the Heat stack caused the new Compute node to attempt a package update as soon as it came up. Because the yum repositories were not yet configured on the new Compute node, the update failed, which in turn caused the scale out to fail. With this update, the client (python-rdomanager-oscplugin) clears the UpdateIdentifier parameter on subsequent stack-update attempts (including scale out) once the initial update has completed. As a result, scale out attempts after an update now succeed.
Last Closed: 2015-12-21 16:53:23 UTC
Type: Bug

Description James Slagle 2015-12-11 13:30:57 UTC
When doing a package update, we set the UpdateIdentifier parameter to a unique value. This triggers the SoftwareDeployment, and yum_update.sh also checks that UpdateIdentifier is not empty and is a new unique value it has not seen before; only then does it proceed with updating packages.
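
For context, the guard in yum_update.sh works roughly like the sketch below. This is a simplified illustration, not the actual script; the variable names, marker directory, and exact yum invocation are assumptions.

#!/bin/bash
# Simplified sketch of the UpdateIdentifier guard described above; paths and
# variable names are illustrative, not copied from tripleo-heat-templates.
update_identifier="$1"                         # value of UpdateIdentifier
timestamp_dir=/var/lib/overcloud-yum-update    # hypothetical marker directory

# An empty identifier means "no package update requested": do nothing.
if [ -z "$update_identifier" ]; then
    echo "UpdateIdentifier is empty, skipping package update"
    exit 0
fi

# An identifier this node has already seen means the update already ran here.
mkdir -p "$timestamp_dir"
if [ -e "$timestamp_dir/$update_identifier" ]; then
    echo "Already ran update for identifier $update_identifier, skipping"
    exit 0
fi

# New, non-empty identifier: record it and update packages.
touch "$timestamp_dir/$update_identifier"
yum -y update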

Assuming the package update is successful, an immediately following scale out attempt will likely fail. UpdateIdentifier is still set in the saved Heat environment (as expected), so when the new node you are scaling out comes up for the first time, UpdateDeployment is triggered and sees a value of UpdateIdentifier that is new to this node. It then tries a yum update, which fails because no repos are configured on the node yet. Even when using rhel registration, which is enabled via the NodeExtraConfig resource, this will fail because UpdateDeployment has no depends_on on NodeExtraConfig, so there is no guarantee any repos would even be enabled yet.

The error from UpdateDeployment looks like:

{
  "status": "FAILED", 
  "server_id": "853ce223-2051-4cb5-868c-7cf72c312c2b", 
  "config_id": "51cc12e8-b1bf-4b2a-b318-a977b1fc1a30", 
  "output_values": {
    "deploy_stdout": "Started yum_update.sh on server 853ce223-2051-4cb5-868c-7cf72c312c2b at Thu Dec 10 21:33:33 EST 2015\nExcluding upgrading packages that are handled by config management tooling\nRunning: yum -y update  --skip-broken\nLoaded plugins: product-id, subscription-manager\nThis system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.\nyum return code: 1\nFinished yum_update.sh on server 853ce223-2051-4cb5-868c-7cf72c312c2b at Thu Dec 10 21:33:36 EST 2015\n", 
    "deploy_stderr": "cat: /var/lib/tripleo/installed-packages/*: No such file or directory\nThere are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n You can enable repos with yum-config-manager --enable <repo>\n", 
    "update_managed_packages": "true", 
    "deploy_status_code": 1
  }, 
  "creation_time": "2015-12-11T02:32:26Z", 
  "updated_time": "2015-12-11T02:33:38Z", 
  "input_values": {}, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", 
  "id": "974ac676-67e1-4a1c-947d-92ed68b375f0"
}
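
Given the failure mode above, the idea behind the eventual client-side fix can be pictured as blanking UpdateIdentifier for the scale-out stack-update so the new node's UpdateDeployment becomes a no-op. The sketch below only illustrates that idea as a manual workaround; the environment file name is made up, and the exact deploy arguments depend on the original deployment command.

# Illustrative workaround sketch only; with the fixed client package this is
# handled automatically. The file name here is hypothetical.
cat > clear-update-identifier.yaml <<'EOF'
parameters:
  UpdateIdentifier: ''
EOF

# Re-run the deploy for the scale out with the extra environment file so the
# new compute node sees an empty UpdateIdentifier and skips the yum update.
openstack overcloud deploy --templates \
  --compute-scale 2 \
  -e clear-update-identifier.yaml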

Comment 2 Jaromir Coufal 2015-12-14 15:05:14 UTC
James, what is the severity here? Does it affect every scale out after an update? Thanks

Comment 3 James Slagle 2015-12-14 21:43:57 UTC
To verify:

Deploy with a 7.0 undercloud and a 7.0 overcloud, HA, with network isolation. Update the undercloud to 7.2, then update the overcloud to 7.2. After the update completes successfully, attempt a scale out of the compute nodes. The compute node should scale out fine and should not run any yum update. You can verify this by looking in the journalctl for os-collect-config or in /var/log/yum.log on the new compute node.

Repeat, but start at 7.1.
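
A sketch of the spot checks mentioned above, to be run on the newly added compute node; the grep pattern is illustrative.

# Run on the new compute node after the scale out finishes. If the fix works
# as described, neither check should show a package update run at bring-up.
sudo journalctl -u os-collect-config | grep -i "yum -y update"
sudo cat /var/log/yum.log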

Comment 6 errata-xmlrpc 2015-12-21 16:53:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650