Bug 1506283

Summary: OSP10 minor update fails when using no custom nic profiles
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Gregory Charot <gcharot>
Assignee: anil venkata <vkommadi>
QA Contact: Marius Cornea <mcornea>
CC: dbecker, emacchi, mbultel, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang, slinaber, vkommadi, yprokule
Target Milestone: z7
Target Release: 10.0 (Newton)
Keywords: Triaged, ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-5.3.8-1.el7ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2018-02-27 16:50:40 UTC

Description Gregory Charot 2017-10-25 15:07:55 UTC
Description of problem:

When OSP10 is deployed with no network customisation, the minor update fails.

Version-Release number of selected component (if applicable):

cat /etc/rhosp-release
Red Hat OpenStack Platform release 10.0 (Newton)

(undercloud) rpm -qa | grep tripleo
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-58.el7ost.noarch
openstack-tripleo-validations-5.1.2-1.el7ost.noarch
openstack-tripleo-image-elements-5.3.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-5.3.0-1.el7ost.noarch
openstack-tripleo-heat-templates-5.3.0-6.el7ost.noarch
python-tripleoclient-5.4.3-1.el7ost.noarch
puppet-tripleo-5.6.1-4.el7ost.noarch
openstack-tripleo-ui-1.2.0-1.el7ost.noarch
openstack-tripleo-common-5.4.2-4.el7ost.noarch

(undercloud) rpm -qa | grep director
rhosp-director-images-ipa-10.0-20170920.1.el7ost.noarch
rhosp-director-images-10.0-20170920.1.el7ost.noarch

rhos-release osp10

How reproducible:

Always, in my environment

Steps to Reproduce:
1. openstack overcloud deploy --templates --ntp-server x.x.x.x --control-scale 1 --compute-scale 2 --neutron-tunnel-types vxlan --neutron-network-type vxlan --control-flavor control --compute-flavor compute
2. openstack overcloud update stack -i overcloud


Actual results:

Update fails on the compute node(s)

openstack stack failures list overcloud
overcloud.Controller.0:
  resource_type: OS::TripleO::Controller
  physical_resource_id: 0f6f365f-f37a-43d6-810a-a309a5f29883
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
overcloud.Compute.1.UpdateDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: f71762af-c166-4396-a725-638e58ed5ede
  status: UPDATE_FAILED
  status_reason: |
    Error: resources.UpdateDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    Started yum_update.sh on server 5db9b0c7-513f-493a-9781-9d628be6bdb0 at Tue Oct 17 18:36:28 UTC 2017
    Checking openstack-nova-migration is installed
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered with an entitlement server. You can use subscription-manager to register.
    Metadata Cache Created
    Checking for ceph-osd dependency issues
    ceph-osd package is available from an enabled repo
    Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
    yum update os-net-config return code: 0
    ERROR: os-net-config configuration failed
  deploy_stderr: |
    [2017/10/17 06:37:00 PM] [INFO] Using config file at: /etc/os-net-config/config.json
    [2017/10/17 06:37:00 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
    [2017/10/17 06:37:00 PM] [INFO] Ifcfg net config provider created.
    Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
        sys.exit(main())
      File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 157, in main
        iface_array = yaml.load(cf.read()).get("network_config")
    AttributeError: 'NoneType' object has no attribute 'get'
overcloud.Compute.0.UpdateDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: ae63b2e7-afc4-40e6-86bc-cfe38e8b8f59
  status: UPDATE_FAILED
  status_reason: |
    Error: resources.UpdateDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    Started yum_update.sh on server 4a796a30-986f-4858-806f-6e3a9bd94f93 at Tue Oct 17 18:05:29 UTC 2017
    Checking openstack-nova-migration is installed
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered with an entitlement server. You can use subscription-manager to register.
    Metadata Cache Created
    Checking for ceph-osd dependency issues
    ceph-osd package is available from an enabled repo
    Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
    yum update os-net-config return code: 0
    ERROR: os-net-config configuration failed
  deploy_stderr: |
    [2017/10/17 06:06:03 PM] [INFO] Using config file at: /etc/os-net-config/config.json
    [2017/10/17 06:06:03 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
    [2017/10/17 06:06:03 PM] [INFO] Ifcfg net config provider created.
    Traceback (most recent call last):
      File "/usr/bin/os-net-config", line 10, in <module>
        sys.exit(main())
      File "/usr/lib/python2.7/site-packages/os_net_config/cli.py", line 157, in main
        iface_array = yaml.load(cf.read()).get("network_config")
    AttributeError: 'NoneType' object has no attribute 'get'


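The relevant error is "ERROR: os-net-config configuration failed". It is triggered by yaml.load(cf.read()).get("network_config"): yaml.load returns None for an empty file, so the chained .get() call raises the AttributeError shown in the traceback.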

Expected results:

Update completes successfully

Additional info:

On the compute nodes, /etc/os-net-config/config.json is empty; the file exists on the controller. os-net-config fails to run because config.json has no content.
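A quick way to tell the three cases apart on a node (file missing, file empty, file populated) is sketched below. This is only an illustration: the helper name config_state is hypothetical and is not part of tripleo-heat-templates, and the default path is the one quoted in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of tripleo-heat-templates):
# report whether a config file is missing, empty, or populated.
config_state() {
    local path="$1"
    if [[ ! -e "$path" ]]; then
        echo "missing"
    elif [[ ! -s "$path" ]]; then
        # -s is true only when the file exists and has a size greater than zero
        echo "empty"
    else
        echo "populated"
    fi
}

# Default to the path used by os-net-config in this report.
config_state "${1:-/etc/os-net-config/config.json}"
```

Note that a file containing only whitespace would be reported as "populated" by this check, although it would most likely still parse to nothing.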

Looking at yum_update.sh, it calls update_network, which is declared in pacemaker_common_functions.sh:

    os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes

    RETVAL=$?
    if [[ $RETVAL == 2 ]]; then
        echo "os-net-config: interface configuration files updated successfully"
    elif [[ $RETVAL != 0 ]]; then
        echo "ERROR: os-net-config configuration failed"
        exit $RETVAL
    fi
    set -e

We can see the same error message, "ERROR: os-net-config configuration failed", in the stack failures list.
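To make the branch logic of the excerpt explicit, here is a small sketch that maps a return code to the outcome the script would report. The function name classify_os_net_config_rc and the labels it prints are hypothetical; only the comparison against 2 and 0 comes from the excerpt.

```shell
#!/bin/bash
# Hypothetical helper mirroring the RETVAL checks in the excerpt:
#   2        -> interface configuration files were updated
#   0        -> success, nothing is printed by the original script
#   others   -> treated as a failure, and the script exits with that code
classify_os_net_config_rc() {
    local rc="$1"
    if [[ "$rc" -eq 2 ]]; then
        echo "updated"
    elif [[ "$rc" -ne 0 ]]; then
        echo "failed"
    else
        echo "ok"
    fi
}

for rc in 0 1 2; do
    echo "rc=$rc -> $(classify_os_net_config_rc "$rc")"
done
```

The deployment in this report exited with status code 1, so it falls into the "failed" branch.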

If I instead use the upstream stable/newton versions of both scripts:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/yum_update.sh
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/pacemaker_common_functions.sh

the problem "goes away": config.json is still empty on the computes, but update_network no longer calls os-net-config.

With a custom network configuration the problem does not appear, because the config.json files are not empty.

Comment 3 anil venkata 2017-10-31 06:37:09 UTC
Gregory said that commit
https://github.com/openstack/tripleo-heat-templates/commit/bce61783bc175e98b535c678d90829344dab5c47#diff-002d345b79ce06e07a34abc8da5ade5f
fixes the issue. This commit is part of
openstack-tripleo-heat-templates-5.3.3-1.el7ost

Comment 4 Sofer Athlan-Guyot 2017-10-31 09:58:00 UTC
Hi,

So here is the whole story. Hold tight.

In the current (non-working in this case) update_network function[1], the os-net-config command is run unconditionally, whether or not the package was upgraded. This is the root cause of the problem here, because /etc/os-net-config/config.json is empty.

The change mentioned as solving the problem makes it work because it has the side effect of removing the special os-net-config handling from the update_network function[2], which prevents any unconditional run of os-net-config.

But as I said, it's a side effect: the special os-net-config treatment intended by the original change[3] has been erased, which may not be a good thing. In the patch that "solves" the problem[4], the os-net-config special treatment is kept only for the osp9->osp10 upgrade, and only on the controllers.

So I think we should re-include the special handling of os-net-config for everything, using the new function[5], as it checks whether the package has been upgraded. To be on the safe side, we should also check that the configuration file is non-empty, even when an upgrade has happened.

I am going to post a review in that direction.


[1] https://github.com/openstack/tripleo-heat-templates/blob/9f8ba2c052e04c1ba8db756a48181a54c9cd8f68/extraconfig/tasks/pacemaker_common_functions.sh#L334

[2] https://github.com/openstack/tripleo-heat-templates/blob/bce61783bc175e98b535c678d90829344dab5c47/extraconfig/tasks/pacemaker_common_functions.sh#L375-L378

[3] https://github.com/openstack/tripleo-heat-templates/commit/9f8ba2c052e04c1ba8db756a48181a54c9cd8f68#diff-002d345b79ce06e07a34abc8da5ade5fR326

[4] https://github.com/openstack/tripleo-heat-templates/commit/bce61783bc175e98b535c678d90829344dab5c47#diff-002d345b79ce06e07a34abc8da5ade5f

[5] https://github.com/openstack/tripleo-heat-templates/blob/bce61783bc175e98b535c678d90829344dab5c47/extraconfig/tasks/pacemaker_common_functions.sh#L350-L373
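The direction proposed in this comment (run os-net-config only when the package was upgraded, and only when the configuration file is non-empty) could look roughly like the sketch below. It is not the code from the review: the function name maybe_apply_network_config and the "yes"/"no" upgrade flag are hypothetical, and how the upgrade is detected is left to the caller. The return-code handling follows the excerpt quoted in the description.

```shell
#!/bin/bash
# Sketch only; names are hypothetical and this is not the merged fix.
#   $1    "yes" if the os-net-config package was upgraded during this run
#   $2    path to the os-net-config configuration file
#   $3..  command to run when both guards pass
maybe_apply_network_config() {
    local upgraded="$1" config="$2"
    shift 2

    if [[ "$upgraded" != "yes" ]]; then
        echo "skip: os-net-config package not upgraded"
        return 0
    fi
    if [[ ! -s "$config" ]]; then
        echo "skip: $config is missing or empty"
        return 0
    fi

    # Capture the return code without aborting when the caller uses set -e.
    local rc=0
    "$@" || rc=$?
    if [[ $rc -eq 2 ]]; then
        echo "os-net-config: interface configuration files updated successfully"
    elif [[ $rc -ne 0 ]]; then
        echo "ERROR: os-net-config configuration failed"
        return "$rc"
    fi
}

# Example call (not executed here), using the command line from the description:
# maybe_apply_network_config yes /etc/os-net-config/config.json \
#     os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes
```

Passing the command as arguments keeps the guard logic testable without os-net-config installed. With an empty config.json on the computes, the second guard would skip the run instead of failing the update.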

Comment 7 Sofer Athlan-Guyot 2017-10-31 11:08:55 UTC
This review should be applied on top of the review referenced in that bug[1], namely this one[2]:

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1434621
[2] https://review.openstack.org/#/c/474967/

Comment 11 Marius Cornea 2018-02-26 19:57:39 UTC
Verified on openstack-tripleo-heat-templates-5.3.8-1.el7ost.noarch

Minor update completed successfully on a deployment without network isolation.

Comment 14 errata-xmlrpc 2018-02-27 16:50:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0364