Bug 1263792

Summary: When network validation detects a problem with the network-isolation files, it logs an error but proceeds with a deployment that is likely to fail.
Product: Red Hat OpenStack
Reporter: Omri Hochman <ohochman>
Component: python-tripleoclient
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: Arik Chernetsky <achernet>
Severity: medium
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: dtantsur, hbrock, jcoufal, jslagle, mburns, rhel-osp-director-maint, sasha
Target Milestone: ---
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-17 13:23:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Omri Hochman 2015-09-16 17:38:44 UTC
When network validation detects a problem with the network-isolation files, it logs an error but proceeds with a deployment that is likely to fail.

Environment:
------------
python-rdomanager-oscplugin-0.0.10-3.el7ost.noarch
instack-0.0.7-1.el7ost.noarch
instack-undercloud-2.1.2-26.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch


Steps:
------
(1) Tamper with the network-isolation files (change something in network-environment.yaml).
(2) Attempt to start the deployment.

Results : 
----------
- ERROR: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud Configuration has 1 warnings, fix them before proceeding.

- The deployment proceeds after the error, and it will then fail.


Expected results : 
--------------------
- If validation reports an error, the deployment should not start.


Deployment command view : 
--------------------------
[stack@undercloud ~]$ openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /home/stack/network-environment.yaml --ntp-server 10.5.26.10 --neutron-network-type vxlan --neutron-tunnel-types vxlan --timeout 90
WARNING: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud There are 7 ironic nodes with no profile that will not be used: 332bed1d-ed34-45fe-ad53-5a4f1240fb9c, 40bb78f6-8919-47e4-b897-9e87b78dafdb, 1d45d31a-9e43-4586-bc05-9a5fa2133032, b192a449-96ca-4de8-acea-3fdf151cdea7, 5f5f37e7-dd45-414b-b8dd-fbe8a75960e5, 41ba0cc4-8f44-4e1f-a33d-b20778c1669f, 624df7bb-8d78-4e4c-abaf-2987a85b8e01
ERROR: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud Configuration has 1 warnings, fix them before proceeding. 
Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates

Comment 1 Alexander Chuzhoy 2015-09-16 18:17:40 UTC
Reproduced the issue.
Also noticed that once the hosts are assigned profiles, the issue is gone.

Comment 3 Ryan Brown 2015-09-17 17:38:02 UTC
This specific warning is a false positive - fixed here https://bugzilla.redhat.com/show_bug.cgi?id=1260776 and has both upstream and downstream reviews open. 

The more general issue of continuing by default in the face of validation errors is a design decision. 

It comes down to the "guarantee" we provide within major releases, which is that we won't change APIs or break functionality in major ways. At the time of the 7.0 release, these validations didn't exist, and so a setup that would have failed validation would have continued to deploy. 

This was introduced in 7.1, and we couldn't introduce a change that would automatically exit at that point because it would have been a major behavior change and would have also broken CI. One considered alternative was to pause for a few minutes when there were validation errors to give the user a chance to kill the deploy safely, but still let unattended deploys continue normally.
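The pause alternative mentioned above could look roughly like the following. This is a minimal sketch and not code from python-rdomanager-oscplugin: the names `pause_before_deploy` and `grace_seconds`, the default of 180 seconds, and the injectable `sleep` and `log` parameters are all hypothetical, chosen only for illustration.

```python
import time


def pause_before_deploy(problem_count, grace_seconds=180,
                        sleep=time.sleep, log=print):
    """Hypothetical helper: wait before deploying if validation found problems.

    Returns the number of seconds waited. An attended user can press Ctrl-C
    during the wait; an unattended run simply continues once it elapses.
    """
    if problem_count <= 0:
        # Nothing was reported, so start the deployment immediately.
        return 0
    log("Configuration has %d problem(s); deploying in %d seconds. "
        "Press Ctrl-C to abort." % (problem_count, grace_seconds))
    sleep(grace_seconds)
    return grace_seconds
```

The point of this design is that it keeps unattended deployments and CI working without any new option, at the cost of a delay on every run that has findings.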

Additionally, these validations are not perfect because there are many different supported director configurations, and validations that would be perfect for all cases would be too generic to be useful. For example, contrast virt and physical deployments. 

In 8.0 I would consider changing the default to be "exit on warnings/errors, with an option to force-continue".
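A sketch of that proposed default, under stated assumptions: `should_deploy` and its `force` parameter are hypothetical names and do not correspond to an actual client function or command-line option.

```python
def should_deploy(errors, warnings, force=False, log=print):
    """Hypothetical policy: stop on any validation finding unless forced."""
    if errors == 0 and warnings == 0:
        return True
    summary = "Configuration has %d error(s) and %d warning(s)" % (
        errors, warnings)
    if force:
        # The caller explicitly asked to continue despite the findings.
        log(summary + "; continuing because force was requested.")
        return True
    log(summary + "; fix them before proceeding.")
    return False
```

With this shape, the deploy command would start the stack only when `should_deploy(...)` returns `True`, so the failure mode in this report (an error message followed by a deployment anyway) would require an explicit opt-in.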

Comment 4 Mike Burns 2016-04-07 20:50:54 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 6 Dmitry Tantsur 2016-10-17 13:19:53 UTC
Hi! Warnings should never be fatal; that's why they're warnings. In particular, there is nothing wrong in your example, even though such a warning can be a sign of real problems.

Errors are apparently fatal now, hence moving this bug to MODIFIED.

Comment 7 Dmitry Tantsur 2016-10-17 13:23:34 UTC

*** This bug has been marked as a duplicate of bug 1318445 ***