Bug 1126072 - Deployment fails with multiple compute hosts on first run
Summary: Deployment fails with multiple compute hosts on first run
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rubygem-staypuft
Version: 5.0 (RHEL 7)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: async
Sub Component: Installer
Assignee: Scott Seago
QA Contact: Omri Hochman
URL:
Whiteboard:
Duplicates: 1127766 (view as bug list)
Depends On: 1135079
Blocks:
 
Reported: 2014-08-01 19:05 UTC by Alexander Chuzhoy
Modified: 2016-04-26 15:54 UTC
CC List: 11 users

Fixed In Version: ruby193-rubygem-staypuft-0.2.6-1.el6ost
Doc Type: Bug Fix
Doc Text:
A race condition during the deployment of the first two compute hosts caused the deployment to fail. Fix: the first compute host is now deployed on its own, and the remaining compute hosts then deploy in parallel, avoiding the race condition.
Clone Of:
Environment:
Last Closed: 2014-09-02 18:21:07 UTC


Attachments (Terms of Use)
Nova compute with working settings (1.57 KB, text/plain)
2014-08-04 17:49 UTC, Jason Guiditta


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1138 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Bug Fix Advisory 2014-09-02 22:20:02 UTC

Description Alexander Chuzhoy 2014-08-01 19:05:37 UTC
Rubygem-Staypuft:  Nova deployment with VLAN network type fails installing the compute node: Execution of '/usr/bin/nova-manage network create novanetwork 192.168.100.0/21 6 --vlan 10' returned 1: Command failed, please check log for more info

Environment: 
rhel-osp-installer-0.1.6-5.el6ost.noarch
openstack-foreman-installer-2.0.16-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-19.9.el6ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer
2. Create/run a Nova deployment with VLAN network type:
   a. Vlan range field: 10:15
   b. floating ip range for external network: 10.8.30.100/31
   c. private IP range for tenant networks: 192.168.100.0/21

Result:
The deployment gets paused on error during the deployment of the compute nodes.

The puppet error I get is:

Info: Applying configuration version '1406908167'
Error: Execution of '/usr/bin/nova-manage floating create 10.8.30.100/31' returned 1: Command failed, please check log for more info
Error: /Stage[main]/Nova::Network/Nova::Manage::Floating[nova-vm-floating]/Nova_floating[nova-vm-floating]/ensure: change from absent to present failed: Execution of '/usr/bin/nova-manage floating create 10.8.30.100/31' returned 1: Command failed, please check log for more info
Error: Execution of '/usr/bin/nova-manage network create novanetwork 192.168.100.0/21 6  --vlan 10' returned 1: Command failed, please check log for more info
Error: /Stage[main]/Nova::Network/Nova::Manage::Network[nova-vm-net]/Nova_network[nova-vm-net]/ensure: change from absent to present failed: Execution of '/usr/bin/nova-manage network create novanetwork 192.168.100.0/21 6  --vlan 10' returned 1: Command failed, please check log for more info
Notice: Finished catalog run in 5.72 seconds

Comment 3 Mike Burns 2014-08-01 19:39:43 UTC
This is likely either in puppet (quickstack) or in nova. Moving to OFI for further investigation.

Comment 4 Brent Eagles 2014-08-01 20:33:27 UTC
I've verified the command "nova-manage network create novanetwork 192.168.100.0/21 6 --vlan 10" works fine running from the command line against a packstack install.

Comment 5 Jason Guiditta 2014-08-01 21:38:10 UTC
Sasha, can you see what is in the nova-manage log and attach it?  Also, do you get the same failure if you run that command by hand?  I am also wondering if it would be useful to compare your nova.conf and Brent's.

Comment 6 Jason Guiditta 2014-08-04 14:03:43 UTC
Also, the controller's yaml would make it easier for me to attempt to reproduce this.

Comment 7 Jason Guiditta 2014-08-04 17:49:01 UTC
So, I am not sure if I have the same settings or not, but with floating ip range for external network: 10.0.1.0/31, I got:

Error: Execution of '/usr/bin/nova-manage floating create 10.0.1.0/31' returned 1: Command failed, please check log for more info
Error: /Stage[main]/Nova::Network/Nova::Manage::Floating[nova-vm-floating]/Nova_floating[nova-vm-floating]/ensure: change from absent to present failed: Execution of '/usr/bin/no
va-manage floating create 10.0.1.0/31' returned 1: Command failed, please check log for more info

The error in nova-manage was:

2014-08-04 10:28:05.038 4807 CRITICAL nova [req-26de1d67-aad7-431f-8df9-949bd1bc2bd7 None None] InvalidInput: Invalid input received: /31 should be specified as single address(es) not in cidr format


The other command succeeded without issue (the one Brent tried):
Debug: Executing '/usr/bin/nova-manage network create novanetwork 10.0.0.0/21 1 6 --vlan 10'
Notice: /Stage[main]/Nova::Network/Nova::Manage::Network[nova-vm-net]/Nova_network[nova-vm-net]/ensure: created
Debug: /Stage[main]/Nova::Network/Nova::Manage::Network[nova-vm-net]/Nova_network[nova-vm-net]: The container Nova::Manage::Network[nova-vm-net] will propagate my refresh event
Debug: Nova::Manage::Network[nova-vm-net]: The container Class[Nova::Network] will propagate my refresh event

So, I changed the 10.0.1.0/31 to 10.0.1.0/24, and all succeeded.  I am not sure if /31 is generally invalid, or some special case for nova in this context, but I can show the success output:

Debug: Executing '/usr/bin/nova-manage floating list'
Debug: Executing '/usr/bin/nova-manage floating create 10.0.1.0/24'
Notice: /Stage[main]/Nova::Network/Nova::Manage::Floating[nova-vm-floating]/Nova_floating[nova-vm-floating]/ensure: created

Brent, any thoughts on this?

Comment 8 Jason Guiditta 2014-08-04 17:49:47 UTC
Created attachment 923969 [details]
Nova compute with working settings

Comment 9 Jason Guiditta 2014-08-04 17:57:33 UTC
(In reply to Jason Guiditta from comment #7)
> So, I canted the 10.0.1.0/31 to 10.0.1.0/24, and all succeeded.
Oops, typo here ^ s/canted/changed

Comment 10 Russell Bryant 2014-08-04 18:03:44 UTC
(In reply to Jason Guiditta from comment #7)
> Error: Execution of '/usr/bin/nova-manage floating create 10.0.1.0/31'
> returned 1: Command failed, please check log for more info
> [...]
> I am not sure if /31 is generally invalid, or some special case for nova in
> this context, but I can show the success output:
> [...]
> Brent, any thoughts on this?

The error you found is indeed the issue.  Don't use /31.  Nova will reject a network specification if the number of addresses is fewer than 4, i.e. a /31 or /32.

So, this doesn't appear to be a bug.  The UI could potentially use some validation to ensure a network of a proper size is specified.
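The size check Russell describes can be sketched with Python's stdlib ipaddress module. This is an illustrative stand-in (the function name check_floating_range is hypothetical, not nova's API), mirroring the behavior seen in the tracebacks above:

```python
import ipaddress

def check_floating_range(cidr):
    """Reject CIDR ranges too small for a floating IP pool.

    Hypothetical sketch of the validation reported above: a range with
    fewer than 4 addresses (/31 or /32) is rejected, echoing nova's
    error message from the log.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    if net.num_addresses < 4:
        raise ValueError(
            "/%d should be specified as single address(es) not in cidr format"
            % net.prefixlen)
    return net.num_addresses

# A /24 passes (256 addresses); a /31 (2 addresses) is rejected,
# matching the InvalidInput traceback above.
```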

Comment 11 Omri Hochman 2014-08-04 18:29:37 UTC
In QE we should use /24 (and not /31).

Example: 
external network: 10.8.30.100/24  private IP range for tenant networks: 192.168.100.0/24 


adding nova-manage.log: 
----------------------
2014-08-04 17:58:47.659 12098 CRITICAL nova [req-b1f47aa0-32c5-4233-ba95-5f6d122f2f13 None None] InvalidInput: Invalid input received: /31 should be specified as single address(es) not in cidr format
2014-08-04 17:58:47.659 12098 TRACE nova Traceback (most recent call last):
2014-08-04 17:58:47.659 12098 TRACE nova   File "/usr/bin/nova-manage", line 10, in <module>
2014-08-04 17:58:47.659 12098 TRACE nova     sys.exit(main())
2014-08-04 17:58:47.659 12098 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1374, in main
2014-08-04 17:58:47.659 12098 TRACE nova     ret = fn(*fn_args, **fn_kwargs)
2014-08-04 17:58:47.659 12098 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 439, in create
2014-08-04 17:58:47.659 12098 TRACE nova     for address in self.address_to_hosts(ip_range))
2014-08-04 17:58:47.659 12098 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 415, in address_to_hosts
2014-08-04 17:58:47.659 12098 TRACE nova     raise exception.InvalidInput(reason=reason)
2014-08-04 17:58:47.659 12098 TRACE nova InvalidInput: Invalid input received: /31 should be specified as single address(es) not in cidr format
2014-08-04 17:58:47.659 12098 TRACE nova 
2014-08-04 17:58:50.061 12160 INFO nova.network.driver [-] Loading network driver 'nova.network.linux_net'
2014-08-04 17:58:50.174 12160 CRITICAL nova [req-b98b34c9-681a-48f1-8584-e272498d726b None None] ValueError: The network range is not big enough to fit 6 networks. Network size is 256
2014-08-04 17:58:50.174 12160 TRACE nova Traceback (most recent call last):
2014-08-04 17:58:50.174 12160 TRACE nova   File "/usr/bin/nova-manage", line 10, in <module>
2014-08-04 17:58:50.174 12160 TRACE nova     sys.exit(main())
2014-08-04 17:58:50.174 12160 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1374, in main
2014-08-04 17:58:50.174 12160 TRACE nova     ret = fn(*fn_args, **fn_kwargs)
2014-08-04 17:58:50.174 12160 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 528, in create
2014-08-04 17:58:50.174 12160 TRACE nova     net_manager.create_networks(context.get_admin_context(), **kwargs)
2014-08-04 17:58:50.174 12160 TRACE nova   File "/usr/lib/python2.7/site-packages/nova/network/manager.py", line 1860, in create_networks
2014-08-04 17:58:50.174 12160 TRACE nova     'size is %(network_size)s') % kwargs)
2014-08-04 17:58:50.174 12160 TRACE nova ValueError: The network range is not big enough to fit 6 networks. Network size is 256
2014-08-04 17:58:50.174 12160 TRACE nova 
/var/log/nova/nova-manage.log
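The second traceback in the log is plain arithmetic: the fixed range must hold num_networks subnets of network_size addresses each (256 here, per "Network size is 256"). A hedged sketch of that capacity check (range_fits is an illustrative name, not nova's code):

```python
import ipaddress

def range_fits(fixed_range, num_networks, network_size=256):
    """True if the CIDR can hold num_networks subnets of network_size addresses.

    Illustrative sketch of the check behind "The network range is not
    big enough to fit 6 networks. Network size is 256" above.
    """
    total = ipaddress.ip_network(fixed_range, strict=False).num_addresses
    return num_networks * network_size <= total

# 192.168.100.0/21 -> 2048 addresses; 6 * 256 = 1536 needed: fits.
# 192.168.100.0/24 ->  256 addresses: too small, matching the ValueError above.
```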

Comment 12 Omri Hochman 2014-08-04 20:07:30 UTC
Failed again with : 
external network: 10.8.30.100/24   ,
private IP range for tenant networks: 192.168.100.0/24

Comment 14 Alexander Chuzhoy 2014-08-05 20:47:39 UTC
Omri, did you install it with 2 compute nodes?

Comment 15 Omri Hochman 2014-08-06 08:49:22 UTC
Sasha, it was one compute node in my environment.

Comment 16 Jason Guiditta 2014-08-12 15:48:53 UTC
This cannot be fixed in the puppet; it is a race condition that can occur when both compute nodes try to create a network at almost exactly the same time.  My opinion is that staypuft should orchestrate the compute nodes so they do not run concurrently.  Note that this only applies to the first two computes: after one node is configured, you could configure as many more as you wanted simultaneously.

Comment 17 Ami Jeain 2014-08-19 14:55:06 UTC
It failed for me with:
Floating IP range for external network: 10.35.117.100/21 and
Fixed IP range for tenant networks: 192.168.100.0/22

Comment 18 Alexander Chuzhoy 2014-08-19 18:22:06 UTC
Reproduced with rhelosp-installer-live-6.5-20140818.3.iso

Comment 19 Alexander Chuzhoy 2014-08-19 18:23:00 UTC
Resuming the deployment, which triggers a subsequent puppet run, turns the paused-with-errors deployment into a successful one.

Comment 20 Ami Jeain 2014-08-20 06:50:59 UTC
I wish it would pause, but it goes all the way to a "successful" deployment, not allowing me to resume a failed run. I am basically stuck.

Comment 22 Mike Burns 2014-08-21 13:21:26 UTC
*** Bug 1127766 has been marked as a duplicate of this bug. ***

Comment 23 Scott Seago 2014-08-26 13:58:53 UTC
PR is here: https://github.com/theforeman/staypuft/pull/273

The agreed-upon solution for now is that the first compute node will deploy, then the remainder will deploy in parallel. Once we're using PuppetSSH, this can be refactored to allow provisioning to happen entirely in parallel, so that only the puppet run is staged in this manner (one node completing first, then the remaining compute nodes running puppet in parallel).
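The staged rollout described here can be sketched as follows. This is a simplified illustration of the orchestration, not staypuft's actual code; deploy_one is a hypothetical stand-in for a full host deployment:

```python
from concurrent.futures import ThreadPoolExecutor

def deploy_computes(hosts, deploy_one):
    """Deploy the first compute host alone, then the rest in parallel.

    The first run creates the shared nova network; by the time the
    remaining hosts deploy, the network already exists, so the race
    condition described in comment 16 cannot occur.
    """
    if not hosts:
        return []
    results = [deploy_one(hosts[0])]          # serialized: first host only
    if hosts[1:]:
        with ThreadPoolExecutor() as pool:    # parallel: remaining hosts
            results.extend(pool.map(deploy_one, hosts[1:]))
    return results
```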

Comment 25 Alexander Chuzhoy 2014-08-29 21:59:48 UTC
Verified: 
rhel-osp-installer-0.1.10-2.el6ost.noarch
openstack-foreman-installer-2.0.22-1.el6ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch
openstack-puppet-modules-2014.1-21.7.el6ost.noarch


The issue didn't reproduce; the deployment completed successfully.

Comment 27 errata-xmlrpc 2014-09-02 18:21:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1138.html

