Bug 1572677

Summary: get-occ-config.sh counts "wrong" cause of missing tonumber screwing up the deployment
Product: Red Hat OpenStack Reporter: Sven Michels <svmichel>
Component: openstack-tripleo-heat-templates Assignee: James Slagle <jslagle>
Status: CLOSED ERRATA QA Contact: Gurenko Alex <agurenko>
Severity: urgent Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: aschultz, dwojewod, jslagle, mburns, ohochman, owalsh, slinaber
Target Milestone: z4 Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.12-9.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1573346 1573347 Environment:
Last Closed: 2018-12-05 18:52:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1573346, 1573347    

Description Sven Michels 2018-04-27 14:53:54 UTC
Description of problem:
When running get-occ-config.sh against a split-stack deployment with more than 10 nodes, we noticed that it mixed up the metadata URLs. As a result, a few nodes were configured correctly, while others had the wrong IPs in their deployed configs; for example, my_ip was wrong.


Version-Release number of selected component (if applicable):
7.0.3-22

How reproducible:
Deploy a split-stack environment with more than 10 nodes.


Steps to Reproduce:
1. Set up RHOSP 12
2. Deploy with more than 10 compute nodes
3. Compare the deployed config with the expected config and note the errors


Actual results:
The deployment sometimes got stuck because one node was missing (it never got
its config because of the wrong metadata), or the deployment went fine but one node appeared twice instead of 11 different nodes.

Expected results:
11 different nodes deployed and configured correctly

Additional info:
I checked how and why this happens, and I think I found the issue. In line
67 of the script there is a sort_by on resource_name. If you check what it
generates, you'll notice that the sorting is *not* done as expected, because
it compares the names as strings instead of as integers. The easy fix is to
add a "tonumber" to the sort_by: sort_by(.resource_name | tonumber)
That way the 11th node (resource name 10) ends up at the very end of the
list; without it, the node shows up third (0, 1, 10, 2).

Demo:
(undercloud) [stack@director ~]$ openstack stack resource list 55a9e155-edab-4734-a655-523339d0c8ac -c resource_name -c physical_resource_id
+---------------+--------------------------------------+
| resource_name | physical_resource_id                 |
+---------------+--------------------------------------+
| 10            | e4ee76de-9794-4bf8-81d5-3fd5c54fc33e |
| 1             | 46f3d8f0-8d5d-47a7-ad59-a2487ce55fb5 |
| 0             | ddf8db32-4bf3-4635-9020-28b07c21d66a |
| 3             | 48745fa1-f5e2-41ca-8a3e-e9da4cf2bf4e |
| 2             | f4a586f8-e93a-49aa-bf88-97436da05cb6 |
| 5             | 137a14db-a641-4782-b5c0-5f554cf76f46 |
| 4             | afe87089-683e-4c8d-bf43-92a11c0b1f96 |
| 7             | bc84c066-827a-4bfe-b830-3532ffd24875 |
| 6             | f80f41b9-f89b-443e-b0bf-ee99bea43654 |
| 9             | c0df0d59-9c44-4712-abd2-54f05cdd59c2 |
| 8             | da466290-adac-4c53-b930-d42d634c8436 |
+---------------+--------------------------------------+
(undercloud) [stack@director ~]$ openstack stack resource list 55a9e155-edab-4734-a655-523339d0c8ac -c resource_name -c physical_resource_id -f json | jq -r "sort_by(.resource_name) | .[] | .physical_resource_id"
ddf8db32-4bf3-4635-9020-28b07c21d66a
46f3d8f0-8d5d-47a7-ad59-a2487ce55fb5
e4ee76de-9794-4bf8-81d5-3fd5c54fc33e
f4a586f8-e93a-49aa-bf88-97436da05cb6
48745fa1-f5e2-41ca-8a3e-e9da4cf2bf4e
afe87089-683e-4c8d-bf43-92a11c0b1f96
137a14db-a641-4782-b5c0-5f554cf76f46
f80f41b9-f89b-443e-b0bf-ee99bea43654
bc84c066-827a-4bfe-b830-3532ffd24875
da466290-adac-4c53-b930-d42d634c8436
c0df0d59-9c44-4712-abd2-54f05cdd59c2
(undercloud) [stack@director ~]$ openstack stack resource list 55a9e155-edab-4734-a655-523339d0c8ac -c resource_name -c physical_resource_id -f json | jq -r "sort_by(.resource_name | tonumber) | .[] | .physical_resource_id"
ddf8db32-4bf3-4635-9020-28b07c21d66a
46f3d8f0-8d5d-47a7-ad59-a2487ce55fb5
f4a586f8-e93a-49aa-bf88-97436da05cb6
48745fa1-f5e2-41ca-8a3e-e9da4cf2bf4e
afe87089-683e-4c8d-bf43-92a11c0b1f96
137a14db-a641-4782-b5c0-5f554cf76f46
f80f41b9-f89b-443e-b0bf-ee99bea43654
bc84c066-827a-4bfe-b830-3532ffd24875
da466290-adac-4c53-b930-d42d634c8436
c0df0d59-9c44-4712-abd2-54f05cdd59c2
e4ee76de-9794-4bf8-81d5-3fd5c54fc33e

So you can clearly see the difference with and without "tonumber".
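
The ordering difference can also be reproduced without a running undercloud. The following is a minimal Python sketch, not the code from get-occ-config.sh: it takes the resource names and IDs from the listing above and sorts them once by the raw string and once after integer conversion, which is analogous to jq's sort_by with and without tonumber. The names resources, order_by_string and order_by_number are made up for illustration.

```python
# Resource names and IDs copied from the `openstack stack resource list`
# output in this report. The names are strings, as in the JSON output.
resources = [
    {"resource_name": "10", "physical_resource_id": "e4ee76de-9794-4bf8-81d5-3fd5c54fc33e"},
    {"resource_name": "1", "physical_resource_id": "46f3d8f0-8d5d-47a7-ad59-a2487ce55fb5"},
    {"resource_name": "0", "physical_resource_id": "ddf8db32-4bf3-4635-9020-28b07c21d66a"},
    {"resource_name": "3", "physical_resource_id": "48745fa1-f5e2-41ca-8a3e-e9da4cf2bf4e"},
    {"resource_name": "2", "physical_resource_id": "f4a586f8-e93a-49aa-bf88-97436da05cb6"},
    {"resource_name": "5", "physical_resource_id": "137a14db-a641-4782-b5c0-5f554cf76f46"},
    {"resource_name": "4", "physical_resource_id": "afe87089-683e-4c8d-bf43-92a11c0b1f96"},
    {"resource_name": "7", "physical_resource_id": "bc84c066-827a-4bfe-b830-3532ffd24875"},
    {"resource_name": "6", "physical_resource_id": "f80f41b9-f89b-443e-b0bf-ee99bea43654"},
    {"resource_name": "9", "physical_resource_id": "c0df0d59-9c44-4712-abd2-54f05cdd59c2"},
    {"resource_name": "8", "physical_resource_id": "da466290-adac-4c53-b930-d42d634c8436"},
]


def order_by_string(items):
    """Sort on the raw string, like sort_by(.resource_name)."""
    ordered = sorted(items, key=lambda r: r["resource_name"])
    return [r["resource_name"] for r in ordered]


def order_by_number(items):
    """Sort on the integer value, like sort_by(.resource_name | tonumber)."""
    ordered = sorted(items, key=lambda r: int(r["resource_name"]))
    return [r["resource_name"] for r in ordered]


if __name__ == "__main__":
    print("string sort: ", order_by_string(resources))
    print("numeric sort:", order_by_number(resources))
```

With string comparison, "10" sorts between "1" and "2", so every entry from the third one onward shifts by one position; with integer comparison it lands last.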

Comment 2 James Slagle 2018-10-29 14:17:40 UTC
What's being requested?

Comment 3 Dariusz Wojewódzki 2018-10-29 17:08:29 UTC
I would like to know approximately when the tht 7.0.12-9 package will be available. The next maintenance release (z4) does not appear to have a defined date.

Comment 10 Gurenko Alex 2018-11-21 09:43:48 UTC
Verified on puddle 2018-11-14.1; successfully deployed 12 compute nodes.

Comment 13 errata-xmlrpc 2018-12-05 18:52:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3789

Comment 14 Red Hat Bugzilla 2023-09-15 00:07:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days