Bug 1561255
Summary: | FFU: /etc/os-net-config/config.json is empty after updating the stack outputs | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea>
Component: | python-tripleoclient | Assignee: | Marios Andreou <mandreou>
Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea>
Severity: | urgent | Priority: | urgent
Version: | 13.0 (Queens) | Target Release: | 13.0 (Queens)
Target Milestone: | rc | Keywords: | Triaged
Hardware: | Unspecified | OS: | Unspecified
Fixed In Version: | openstack-tripleo-common-8.6.1-14.el7ost python-tripleoclient-9.2.1-10.el7ost | Doc Type: | If docs needed, set a value
Last Closed: | 2018-06-27 13:49:05 UTC | Type: | Bug
Bug Blocks: | 1561169 | CC: | bfournie, ccamacho, dbecker, hbrock, jschluet, jslagle, lbezdick, mandreou, mbultel, mburns, morazi, rhel-osp-director-maint, sathlang, sclewis
Description
Marius Cornea
2018-03-28 00:32:02 UTC
o/ took this for triage this week. Is this a duplicate of, or the same root cause as, BZ 1559151? It might be... this is happening after a stack update and coming from an OSP10 env (so the 'old' element-based os-net-config is being used on the original deployment, but OSP13 templates are using the new script-based one).

I suspect if we "rm /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true" like [1] at the start of the FFU it should solve the issue, assuming it is the same.

It would be great if you could test that, I mean, manually remove that file from the overcloud nodes before the FFU stack update for the ansible generation. We might want to carry that in the FFU env [2] or some other suitable place.

I think it makes sense to keep this bz anyway even if it is the same, since BZ 1559151 is for upgrades and the solution will be slightly different here and land in a different place.

I am going to mark triaged for now, remove if you disagree.

[1] https://review.openstack.org/#/c/556533/2/environments/major-upgrade-composable-steps-docker.yaml@15
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/environments/fast-forward-upgrade.yaml#L17

(In reply to Marios Andreou from comment #3)
> o/ took this for triage this week. Is this a duplicate of, or the same root cause as, BZ 1559151? It might be... this is happening after a stack update and coming from an OSP10 env (so the 'old' element-based os-net-config is being used on the original deployment, but OSP13 templates are using the new script-based one).
>
> I suspect if we "rm /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true" like [1] at the start of the FFU it should solve the issue, assuming it is the same.
>
> It would be great if you could test that, I mean, manually remove that file from the overcloud nodes before the FFU stack update for the ansible generation. We might want to carry that in the FFU env [2] or some other suitable place.
That's right: after manually removing /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json before running the deploy command that updates the stack outputs, /etc/os-net-config/config.json keeps its content.

> I think it makes sense to keep this bz anyway even if it is the same since BZ 1559151 is for upgrades and the solution will be slightly different here/land in different place

I'd keep this BZ to track this issue for the FFU workflow. BZ#1559151 is related to upgrades, and the consequences there are also different from the ones observed during FFU.

Spent some time looking into this today. Working with "let's remove /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json before running the deploy command for updating the stack output", I initially thought we might be able to use UpgradeInit: set it in ffwd-upgrade-prepare.yaml and unset it on converge, as we do for the major upgrade. However, UpgradeInit is a SoftwareConfig @ [1], but ffwd-upgrade-prepare.yaml is setting that to config download @ [2], so we can't expect it to be applied during the ffwd-upgrade prepare heat stack update. It *would* be applied with the ansible playbooks, but I believe it needs to happen before the heat stack update.

A really 'easy' way is to consider something like "openstack overcloud execute" for this [3][4][5], but that would require the operator to run something like:

    cat <<EOF > remove_os_net_config.sh
    #!/bin/bash
    rm /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true
    EOF

Then they run it with:

    openstack overcloud execute remove_os_net_config.sh --server_name "overcloud"
    ## --server_name will do a partial match so overcloud-controller-x, overcloud-compute-x

Otherwise we will have to work out another way in the client before we call the prepare stack update.
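The one-line script in this comment can also be written in a slightly more defensive form. The following is only a sketch of the same workaround, not the fix that was eventually merged: the function name `remove_stale_template` and its optional root-prefix argument are hypothetical additions, included so the removal can be rehearsed against a scratch directory before touching a real overcloud node.

```shell
#!/bin/bash
# Sketch only: remove the stale os-apply-config template left behind by the
# element-based os-net-config, so it cannot overwrite
# /etc/os-net-config/config.json during the FFU stack update.
# The optional first argument (a root prefix) is a hypothetical knob that is
# not part of the original script; leave it empty on a real node.

TEMPLATE_REL="usr/libexec/os-apply-config/templates/etc/os-net-config/config.json"

remove_stale_template() {
    local root_prefix="${1:-}"
    local template="${root_prefix}/${TEMPLATE_REL}"

    if [ -e "$template" ]; then
        if rm -f -- "$template"; then
            echo "removed ${template}"
        else
            echo "could not remove ${template}" >&2
        fi
    else
        echo "not present: ${template}"
    fi
    # Always succeed, mirroring the "|| true" in the original one-liner.
    return 0
}

# On an overcloud node (as root) this would be called with no argument:
#   remove_stale_template
```

Calling the function with a temporary directory as the prefix exercises the same logic without touching the real template path.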
[1] https://github.com/openstack/tripleo-heat-templates/blob/1bec57e9770f27d44c0768410fc7d4b5926858da/puppet/role.role.j2.yaml#L468-L469
[2] https://github.com/openstack/tripleo-heat-templates/blob/1bec57e9770f27d44c0768410fc7d4b5926858da/environments/lifecycle/ffwd-upgrade-prepare.yaml#L10-L11
[3] https://github.com/openstack/python-tripleoclient/blob/5c7c923a01d4f8b460fc9481b7c38454cde10f5f/tripleoclient/v1/overcloud_execute.py#L63
[4] https://github.com/openstack/tripleo-common/blob/eb43cf0c2993cb20342ba04563866371ddec3773/workbooks/deployment.yaml#L24
[5] https://github.com/openstack/tripleo-common/blob/eb43cf0c2993cb20342ba04563866371ddec3773/tripleo_common/actions/deployment.py#L97

o/ so dug into this a little more today. The way I see it we have 3 options:

1. the semi-"manual"/docs way, with the suggestion in comment #5, or
2. add an invocation to the client, before the stack update, possibly using tripleo.deployment.v1.deploy_on_servers, or
3. fix it so that upgradeinit does run during the heat stack update (see comment #5 on why it doesn't). This will need tweaks in tripleo-heat-templates (redirect the upgrade init to another resource from softwareconfig).

I played with 2. today and posted https://review.openstack.org/#/c/566336/ as a WIP for discussion to continue next week.

I threw in an idea, but I'm not sure it is the correct one: https://review.openstack.org/566348

*** Bug 1574258 has been marked as a duplicate of this bug. ***

Marios - note that this fix https://review.openstack.org/#/c/560022/ to run-os-net-config.sh for https://bugzilla.redhat.com/show_bug.cgi?id=1514949 explicitly removes /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json to prevent an overwrite and an empty /etc/os-net-config/config.json. I wonder why that's not removing /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json?

o/ Bob, in BZ 1514949 we also landed the removal of the file right at the start of an upgrade (i.e.
before those deployment steps run, which will run that os-net-config script and remove the file as you point out) with https://review.openstack.org/#/c/557739/

So here we need something equivalent but for FFU, i.e. remove the /usr/libexec... file as the very first step in the overcloud FFU.

Some discussion about this on https://review.openstack.org/#/c/567613/1/common/deploy-steps.j2@657

The alternative proposal from lbezdick, to clean the element files, is at https://review.openstack.org/#/c/567613

Some update on the discussion today: it seems we agree to go with the patches currently in trackers, i.e. https://review.openstack.org/566576 & https://review.openstack.org/566336, so we need those merged into queens for starters.

(In reply to Marios Andreou from comment #15)
> some update on the discussion today. it seems we agree on to go with the patches currently in trackers i.e. https://review.openstack.org/566576 & https://review.openstack.org/566336
>
> so we need those merged into queens for starters

I tried applying the ^ patches by running:

    curl -s -4 "https://review.openstack.org/changes/566336/revisions/current/patch?download" | base64 -d | sudo patch -d /usr/lib/python2.7/site-packages/ -p1
    curl -s -4 "https://review.openstack.org/changes/566576/revisions/current/patch?download" | base64 -d | sudo patch -d /usr/share/openstack-tripleo-common/ -p1
    source /home/stack/stackrc
    mistral workbook-update /usr/share/openstack-tripleo-common/workbooks/deployment.yaml

but unfortunately when I ran the openstack overcloud ffwd-upgrade prepare command it got stuck. Attaching the mistral logs.

Created attachment 1436516: mistral logs
Created attachment 1436757: some excerpts from the logs attached in https://bugzilla.redhat.com/show_bug.cgi?id=1561255#c17

o/ mcornea... I checked through the attached logs and see some db connection errors (in api.log and engine.log); pasting the relevant bits as an attachment here for ease of reference. I wonder if those are the source of the hang. If it is a problem on your undercloud, other services should be reporting it too. Also, as a sanity check, can you please include a db populate and a restart of all mistral services before you try it again:

    sudo mistral-db-manage populate
    sudo systemctl restart openstack-mistral-api.service
    sudo systemctl restart openstack-mistral-engine.service
    sudo systemctl restart openstack-mistral-executor.service

I'll sanity check it again on my pike environment too, but from a first pass those db connection errors really stuck out for me. wdyt?

Update #2 - besides the issues that mcornea's env might have as per comment #18, there is also a nit in the client review @ https://review.openstack.org/#/c/566336/4/tripleoclient/workflows/package_update.py@178 I just commented there and will post an update there momentarily. Can you please try again with the latest today? fwiw it seems to work OK for me.

(In reply to Marios Andreou from comment #19)
> update #2 - besides the issues that mcornea env might have as per comment #18, there is also a nit in the client review @ https://review.openstack.org/#/c/566336/4/tripleoclient/workflows/package_update.py@178 I just commented there and will post an update there momentarily. Can you please try again with the latest today - fwiw it seems to work OK for me

Yep, worked fine this time; it was probably an environmental issue with my env.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
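
The symptom in the summary (an empty /etc/os-net-config/config.json after the stack update) can be checked on a node with a small helper. This is a sketch under the assumption that "empty" means a missing or zero-byte file; the function name `check_os_net_config` is hypothetical, and the check does not validate the JSON content itself.

```shell
#!/bin/bash
# Sketch only: report whether the os-net-config configuration still has
# content after the FFU prepare stack update.
# Assumption: an affected node ends up with a missing or zero-byte file.

check_os_net_config() {
    # Optional argument allows checking a copy of the file elsewhere.
    local config="${1:-/etc/os-net-config/config.json}"

    # "-s" is true only if the file exists and is larger than zero bytes.
    if [ -s "$config" ]; then
        echo "OK: ${config} has content"
        return 0
    fi

    echo "EMPTY OR MISSING: ${config}" >&2
    return 1
}

# On an overcloud node this would be called with no argument:
#   check_os_net_config
```

Running it before and after the stack update on the same node gives a quick comparison of the two states.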