984311 – Cloud-init fails to connect on guest running in Openstack with quantum


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	distribution
Sub Component:
Version:	3.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	3.0
Assignee:	RHOS Maint
QA Contact:	Jaroslav Henner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-07-14 18:56 UTC by Jaroslav Henner
Modified:	2016-04-26 17:05 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-03-23 06:52:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	971188	0	unspecified	CLOSED	Console log lacks dashes.	2021-02-22 00:41:40 UTC

Internal Links: 971188

Description Jaroslav Henner 2013-07-14 18:56:37 UTC

Description of problem:
Cloud-init is adding the route
169.254.0.0/16 dev eth0
which causes the OS to attempt ARP resolution of 169.254.169.254. This address is not assigned to any interface anywhere in the stack; there is only a NAT rule for it. This breaks cloud-init:

20130714 13:58:06,348  url_helper.py[WARNING]: Calling 'http://169.254.169.254/20090404/metadata/instanceid' failed [13/120s]: url error [[Errno 113] No route to host]

This is repeated several times and, AFAIK, makes the VM take 4 minutes to boot.

When I tried to curl the failing URL from the guest, I got the same error. However, I was able to get a valid result after removing the route:
ip r d 169.254.0.0/16 dev eth0
curl http://169.254.169.254/2009-04-04/meta-data/instance-id
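
A minimal sketch of the workaround above, generalized to every device that carries the link-local route. The helper names (has_zeroconf_route, zeroconf_devices) are hypothetical and not part of cloud-init or iproute2; only the `ip route del` invocation mirrors the command used above.

```shell
#!/bin/sh
# Hypothetical helpers; both take `ip route` output as their first argument
# so they can be tried without touching the live routing table.

# Succeeds when the routing table text contains a link-local (zeroconf) route.
has_zeroconf_route() {
    printf '%s\n' "$1" | grep -q '^169\.254\.0\.0/16 dev '
}

# Prints the devices that carry such a route, one per line.
zeroconf_devices() {
    printf '%s\n' "$1" | awk '$1 == "169.254.0.0/16" && $2 == "dev" { print $3 }'
}

# On a live guest (needs root; shown as a comment, not executed here):
#   for dev in $(zeroconf_devices "$(ip route)"); do
#       ip route del 169.254.0.0/16 dev "$dev"
#   done
```

The route removal does not survive a reboot or a network restart, so this is only useful for confirming the diagnosis.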

Note that the dashes are missing from the log as displayed in Horizon (see bug 971188).


Version-Release number of selected component (if applicable):
cloud-init-0.7.1-2.el6.noarch
openstack-quantum-2013.1.2-4.el6ost.noarch
openstack-quantum-openvswitch-2013.1.2-4.el6ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Deploy using packstack
2. Add an image with cloud-init
3. Boot the image
4. Check the console

Actual results:
cloud-init fails, time to login delayed significantly

Expected results:
I am not sure whether cloud-init should be adding the route. If it should, there needs to be a port with the 169.254.169.254 address assigned, or the ARP reply has to be ensured in some other way (an ARP table entry?).

Additional info:
[root@controller ~]# ip netns list | while read ns; do ip netns exec ip a| grep 169.254; done
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
[root@controller ~]# ip netns list | while read ns; do ip netns exec $ns ip a | grep 169.254; done
[root@controller ~]# ip netns list | while read ns; do ip netns exec $ns ip r | grep 169.254; done
[root@controller ~]# ip r
10.ZZZ.XXX.0/23 dev br-ex  proto kernel  scope link  src 10.ZZZ.YYY.XXX 
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
169.254.0.0/16 dev br-eth1  scope link  metric 1004 
169.254.0.0/16 dev br-int  scope link  metric 1005 
169.254.0.0/16 dev br-ex  scope link  metric 1006 
default via 10.ZZZ.YYY.XXX dev br-ex

Comment 2 Jaroslav Henner 2013-07-15 08:36:38 UTC

This can be worked around with:
openstack-config --set /etc/nova/nova.conf DEFAULT force_config_drive always

Comment 5 Jaroslav Henner 2013-09-16 12:14:35 UTC

NOZEROCONF helped. Do we have some notice somewhere that the guest should have it configured?
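
For reference, a sketch of how this setting is usually applied on RHEL-style guests. The comment above does not say where or how it was set; the file path and the value below are assumptions based on the usual network initscripts convention.

```shell
# /etc/sysconfig/network (assumed location; sourced by the network initscripts)
# Disables the automatic 169.254.0.0/16 link-local (zeroconf) route.
NOZEROCONF=yes
```

The setting is read when an interface is brought up, so it would need to be in the image before first boot to avoid the metadata delay.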

Comment 9 lpeer 2014-01-20 16:39:51 UTC

Hi jhenner
Can you please check if this error also appears when you use Havana?

Comment 11 Jaroslav Henner 2014-02-17 14:23:58 UTC

It looks like neither lo nor the tap devices support adding the multicast address, even though multicast is enabled:

[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev qr-d725dbcd-ab
ioctl: Invalid argument
[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev qg-19f7ff97-cc
ioctl: Invalid argument
[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev lo
ioctl: Invalid argument

44: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
146: qr-d725dbcd-ab: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:5a:13:f1 brd ff:ff:ff:ff:ff:ff
148: qg-19f7ff97-cc: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:f1:ca:35 brd ff:ff:ff:ff:ff:ff

However, I was able to assign the address to a router or gateway:
ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip a a 169.254.169.254 dev qr-d725dbcd-ab

The IP was then pingable even though the default route was present (zeroconf enabled).

I believe Neutron should be doing this because of how IPv4 zeroconf is implemented (nodes should check whether the address is already present on the subnet before configuring it on the interface).

Comment 12 lpeer 2014-03-23 06:52:26 UTC

As far as I know we don't have customers using Neutron with Grizzly and there is no customer ticket associated with this bug.

I am closing this bug as WONTFIX as we are very short on resources.
