Bug 1561255 - FFU: /etc/os-net-config/config.json is empty after updating the stack outputs
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient (Show other bugs)
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 13.0 (Queens)
Assigned To: Marios Andreou
QA Contact: Marius Cornea
Keywords: Triaged
Duplicates: 1574258
Depends On:
Blocks: 1561169
Reported: 2018-03-27 20:32 EDT by Marius Cornea
Modified: 2018-06-27 09:50 EDT (History)

See Also:
Fixed In Version: openstack-tripleo-common-8.6.1-14.el7ost python-tripleoclient-9.2.1-10.el7ost
Last Closed: 2018-06-27 09:49:05 EDT
Type: Bug


Attachments
mistral logs (801.29 KB, application/x-gzip)
2018-05-14 15:27 EDT, Marius Cornea
some excerpts from the logs attached in https://bugzilla.redhat.com/show_bug.cgi?id=1561255#c17 (29.54 KB, text/plain)
2018-05-15 06:37 EDT, Marios Andreou


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 566336 | None | master: NEW | python-tripleoclient: Add .deployment.v1.deploy_on_servers to ffwd-upgrade prepare (I77fdc9deab97c785725f09b8529c5129507... | 2018-05-15 07:53 EDT
OpenStack gerrit | 567538 | None | stable/queens: NEW | tripleo-common: Add special 'all' for deploy_on_servers server name to match all (Ic138c4925c5c96d1cf718af7ae59bcf5f0ba9... | 2018-05-15 07:53 EDT
OpenStack gerrit | 568604 | None | None | None | 2018-05-15 10:53 EDT
Red Hat Product Errata | RHEA-2018:2086 | None | None | None | 2018-06-27 09:50 EDT

Description Marius Cornea 2018-03-27 20:32:02 EDT
Description of problem:
FFU: /etc/os-net-config/config.json is empty after updating the stack outputs.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.0-0.20180304031148.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10
2. Upgrade undercloud to OSP11/12/13
3. Run the overcloud deploy command to update the stack outputs:
#!/bin/bash
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--control-scale 3 \
--control-flavor controller \
--compute-scale 2 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/fast-forward-upgrade.yaml \
-e /home/stack/ffu_repos.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/config-download-environment.yaml \
-e /home/stack/ceph-ansible-env.yaml

4. SSH to any of the overcloud nodes and check /etc/os-net-config/config.json 

Actual results:
empty - 

[root@ceph-0 ~]# wc /etc/os-net-config/config.json
0 0 0 /etc/os-net-config/config.json


Expected results:
/etc/os-net-config/config.json is preserved with the content it had before the overcloud deploy command was run.

Additional info:

Before running the overcloud deploy command for updating the stack outputs /etc/os-net-config/config.json was correctly populated.
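
For anyone who wants to script the check from step 4, here is a minimal sketch. The helper name config_state is made up for illustration and is not part of any TripleO tooling; only the default file path comes from this report.

```shell
#!/bin/bash
# Report whether an os-net-config file is missing, empty, or populated.
# config_state is a hypothetical helper; the default path is the file
# named in this bug report.
config_state() {
    local path="${1:-/etc/os-net-config/config.json}"
    if [ ! -e "$path" ]; then
        echo "missing"
        return 2
    elif [ ! -s "$path" ]; then
        # -s is true only for files larger than zero bytes
        echo "empty"
        return 1
    fi
    echo "populated"
}
```

Called with no argument on an overcloud node, it checks /etc/os-net-config/config.json. For the zero-byte file shown under "Actual results" it would print "empty" and return a non-zero status, so it can be used before and after the deploy command to confirm whether the content survived.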
Comment 3 Marios Andreou 2018-04-04 07:54:39 EDT
o/ took this for triage this week. Is this a duplicate of, or does it have the same root cause as, BZ 1559151? It might be... this is happening after a stack update and coming from an OSP10 env (so the 'old' element-based os-net-config was used on the original deployment, but the OSP13 templates use the new script-based one).

I suspect if we "rm /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true " like [1] at the start of the FFU it should solve the issue, assuming it is the same.

It would be great if you could test that, I mean, manually remove that from overcloud nodes before the FFU stack update for the ansible generation. We might want to carry that in the FFU env [2] or some other suitable place.

I think it makes sense to keep this bz anyway, even if it is the same, since BZ 1559151 is for upgrades and the solution here will be slightly different and land in a different place.

I am going to mark triaged for now, remove if you disagree.

[1] https://review.openstack.org/#/c/556533/2/environments/major-upgrade-composable-steps-docker.yaml@15
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/environments/fast-forward-upgrade.yaml#L17
Comment 4 Marius Cornea 2018-04-04 09:55:38 EDT
(In reply to Marios Andreou from comment #3)
> o/ took this for triage this week. Is this duplicate of/same root cause as
> BZ 1559151 ? It might be... this is happening after a stack update and
> coming from an OSP10 env (so the 'old' element based os-net-config is being
> on the original deployment, but OSP13 templates are using the new script
> based one). 
> 
> I suspect if we "rm
> /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true
> " like [1] at the start of the FFU it should solve the issue, assuming it is
> the same.
> 
> It would be great if you could test that, I mean, manually remove that from
> overcloud nodes before the FFU stack update for the ansible generation. We
> might want to carry that in the FFU env [2] or some other suitable place.

That's right: after manually removing /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json before running the deploy command that updates the stack outputs, /etc/os-net-config/config.json keeps its content.

> I think it makes sense to keep this bz anyway even if it is the same since
> BZ 1559151 is for upgrades and the solution will be slightly different
> here/land in different place
> 

I'd keep this BZ to track this issue for the FFU workflow. BZ#1559151 is related to upgrades, and its consequences are also different from the ones observed during FFU.
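
For reference, the manual workaround confirmed above could look roughly like the sketch below. This is an assumption-laden illustration, not the eventual fix: the function name, the optional root prefix (there only so it can be tried against a scratch directory) and the backup location are all hypothetical. The backup is deliberately written outside the os-apply-config templates tree, so it is not picked up as a template itself.

```shell
#!/bin/bash
# Remove the stale os-apply-config template for os-net-config, keeping a
# copy first. remove_stale_template, the root prefix and the backup
# directory are illustrative assumptions; only the template path comes
# from this bug report.
remove_stale_template() {
    local root="${1:-}"                 # empty means the real filesystem
    local backup_dir="${2:-/var/tmp}"   # must be outside the templates tree
    local tmpl="$root/usr/libexec/os-apply-config/templates/etc/os-net-config/config.json"
    if [ -e "$tmpl" ]; then
        # Stop without deleting anything if the backup cannot be written.
        cp -p "$tmpl" "$backup_dir/os-net-config-template.json.bak" || return 1
        rm -f "$tmpl"
        echo "removed $tmpl"
    else
        echo "nothing to remove at $tmpl"
    fi
}
```

On an overcloud node this would be run as root with no arguments before the stack update. Running it a second time is harmless, since it only reports that there is nothing left to remove.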
Comment 5 Marios Andreou 2018-04-25 10:28:05 EDT
Spent some time looking into this today. Working from the idea "let's remove /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json before running the deploy command for updating the stack output", I initially thought we might be able to use UpgradeInit: set it in ffwd-upgrade-prepare.yaml and unset it on converge, as we do for the major upgrade. However, UpgradeInit is a SoftwareConfig @ [1], but ffwd-upgrade-prepare.yaml sets that to config download @ [2], so we can't expect it to be applied during the ffwd-upgrade prepare heat stack update. It *would* be applied with the ansible playbooks, but I believe it needs to happen before the heat stack update.

A really 'easy' way would be to use something like "openstack overcloud execute" for this [3][4][5], but that would require the operator to run something like:

cat <<EOF > remove_os_net_config.sh
#!/bin/bash

rm /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true
EOF

Then they run it with

     openstack overcloud execute remove_os_net_config.sh --server_name "overcloud"
     ## --server_name does a partial match, so this covers overcloud-controller-x, overcloud-compute-x

Otherwise we will have to work out another way in the client before we call the prepare stack update.

[1]  https://github.com/openstack/tripleo-heat-templates/blob/1bec57e9770f27d44c0768410fc7d4b5926858da/puppet/role.role.j2.yaml#L468-L469
[2] https://github.com/openstack/tripleo-heat-templates/blob/1bec57e9770f27d44c0768410fc7d4b5926858da/environments/lifecycle/ffwd-upgrade-prepare.yaml#L10-L11 
[3] https://github.com/openstack/python-tripleoclient/blob/5c7c923a01d4f8b460fc9481b7c38454cde10f5f/tripleoclient/v1/overcloud_execute.py#L63
[4] https://github.com/openstack/tripleo-common/blob/eb43cf0c2993cb20342ba04563866371ddec3773/workbooks/deployment.yaml#L24
[5] https://github.com/openstack/tripleo-common/blob/eb43cf0c2993cb20342ba04563866371ddec3773/tripleo_common/actions/deployment.py#L97
Comment 6 Marios Andreou 2018-05-04 11:00:18 EDT
o/ so I dug into this a little more today. The way I see it we have 3 options:

  1. semi "manual"/docs way with suggestion in comment #5 , or
  2. add invocation to the client, before the stack update, possibly using the tripleo.deployment.v1.deploy_on_servers , or 
  3. fix it so that UpgradeInit does run during the heat stack update (see comment #5 for why it doesn't). This will need tweaks in tripleo-heat-templates (redirect UpgradeInit to another resource instead of SoftwareConfig).

I played with 2. today and posted https://review.openstack.org/#/c/566336/ as a WIP for discussion to continue next week.
Comment 7 Lukas Bezdicka 2018-05-04 11:50:25 EDT
I threw in an idea, but I'm not sure it's the correct one: https://review.openstack.org/566348
Comment 8 Marius Cornea 2018-05-04 11:56:26 EDT
*** Bug 1574258 has been marked as a duplicate of this bug. ***
Comment 11 Bob Fournier 2018-05-09 15:06:48 EDT
Marios - note that this fix https://review.openstack.org/#/c/560022/ to run-os-net-config.sh for https://bugzilla.redhat.com/show_bug.cgi?id=1514949 explicitly removes  /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json to prevent an overwrite and an empty /etc/os-net-config/config.json.  I wonder why that's not removing /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json?
Comment 12 Marios Andreou 2018-05-10 06:02:59 EDT
o/ Bob, in BZ 1514949 we also landed the removal of the file right at the start of an upgrade (i.e. before the deployment steps that run that os-net-config script and remove the file, as you point out) with https://review.openstack.org/#/c/557739/

So here we need something equivalent but for FFU, i.e. remove the /usr/libexec... template as the very first step in the overcloud FFU.
Comment 13 Marios Andreou 2018-05-11 07:17:39 EDT
Some discussion about this at https://review.openstack.org/#/c/567613/1/common/deploy-steps.j2@657, on the alternative proposal from lbezdick (https://review.openstack.org/#/c/567613) to clean the element files.
Comment 15 Marios Andreou 2018-05-11 11:26:20 EDT
Some update on the discussion today: it seems we agree to go with the patches currently in the trackers, i.e. https://review.openstack.org/566576 & https://review.openstack.org/566336

so we need those merged into queens for starters
Comment 16 Marius Cornea 2018-05-14 15:27:28 EDT
(In reply to Marios Andreou from comment #15)
> some update on the discussion today. it seems we agree on to go with the
> patches currently in trackers i.e. https://review.openstack.org/566576 &
> https://review.openstack.org/566336
> 
> so we need those merged into queens for starters

I tried applying the ^ patches by running:

curl -s -4 https://review.openstack.org/changes/566336/revisions/current/patch?download | base64 -d | sudo patch -d /usr/lib/python2.7/site-packages/ -p1; curl -s -4 https://review.openstack.org/changes/566576/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-common/ -p1; source /home/stack/stackrc; mistral workbook-update /usr/share/openstack-tripleo-common/workbooks/deployment.yaml

but unfortunately, when I ran the openstack overcloud ffwd-upgrade prepare command, it got stuck. Attaching the mistral logs.
Comment 17 Marius Cornea 2018-05-14 15:27 EDT
Created attachment 1436516 [details]
mistral logs
Comment 18 Marios Andreou 2018-05-15 06:37 EDT
Created attachment 1436757 [details]
some excerpts from the logs attached in https://bugzilla.redhat.com/show_bug.cgi?id=1561255#c17

o/ mcornea... I checked through the attached logs and see some db connection errors (in api.log and engine.log). I'm pasting the relevant bits as an attachment here for ease of reference. I wonder if those are the source of the hang. If it is a problem on your undercloud, other services are possibly reporting it too. Also, as a sanity check, can you please run a db populate and restart all the mistral services before you try it again:

    sudo mistral-db-manage  populate
    sudo systemctl restart openstack-mistral-api.service
    sudo systemctl restart openstack-mistral-engine.service
    sudo systemctl restart openstack-mistral-executor.service

I'll sanity check it again on my pike environment too, but on a first pass those db connection errors really stuck out for me. wdyt?
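
If it helps, the populate-and-restart sequence above can be wrapped in a small function. This is only a sketch: reset_mistral and the RUNNER variable are hypothetical names, and RUNNER defaults to echo so that the commands are printed rather than executed. The commands and service names themselves are the ones listed above.

```shell
#!/bin/bash
# Re-populate the Mistral DB and restart the three Mistral services.
# reset_mistral and RUNNER are illustrative assumptions: RUNNER defaults
# to "echo" (dry run, commands are only printed); RUNNER=sudo runs them.
reset_mistral() {
    local runner="${RUNNER:-echo}"
    local svc
    $runner mistral-db-manage populate || return 1
    for svc in api engine executor; do
        # Stop at the first service that fails to restart.
        $runner systemctl restart "openstack-mistral-$svc.service" || return 1
    done
}
```

Calling reset_mistral as-is shows what would be run; RUNNER=sudo reset_mistral on the undercloud would actually perform the populate and the three restarts.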
Comment 19 Marios Andreou 2018-05-15 08:14:08 EDT
update #2 - besides the issues that mcornea's env might have as per comment #18, there is also a nit in the client review @ https://review.openstack.org/#/c/566336/4/tripleoclient/workflows/package_update.py@178. I just commented there and will post an update there momentarily. Can you please try again with the latest today? fwiw it seems to work OK for me.
Comment 20 Marius Cornea 2018-05-15 22:37:23 EDT
(In reply to Marios Andreou from comment #19)
> update #2 - besides the issues that mcornea env might have as per comment
> #18, there is also a nit in the client review @
> https://review.openstack.org/#/c/566336/4/tripleoclient/workflows/
> package_update.py@178 I just commented there and will post an update there
> momentarily. Can you please try again with the latest today - fwiw it seems
> to work OK for me

Yep, it worked fine this time; it was probably an environmental issue with my env.
Comment 31 errata-xmlrpc 2018-06-27 09:49:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
