Bug 984311 - Cloud-init fails to connect on guest running in Openstack with quantum
Cloud-init fails to connect on guest running in Openstack with quantum
Status: CLOSED WONTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: distribution
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.0
Assigned To: RHOS Maint
QA Contact: Jaroslav Henner
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-07-14 14:56 EDT by Jaroslav Henner
Modified: 2016-04-26 13:05 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-23 02:52:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Jaroslav Henner 2013-07-14 14:56:37 EDT
Description of problem:
Cloud-init is adding the route
169.254.0.0/16 dev eth0
which makes the OS treat 169.254.169.254 as on-link and try to resolve it via ARP. This address is not assigned to any interface anywhere in the stack; there is only a NAT rule for it. This breaks cloud-init:
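To illustrate why the guest ARPs for 169.254.169.254 instead of handing the packet to its default gateway (the router, where the NAT rule lives), here is a minimal sketch of IPv4 longest-prefix route selection in bash. It is a simplification that only compares prefix lengths and ignores metrics, scopes and policy routing; the helper names `ip_to_int`, `in_prefix` and `pick_route` are hypothetical, and 192.0.2.1 is a documentation address standing in for the guest's real gateway.

```shell
#!/usr/bin/env bash
# Sketch of IPv4 longest-prefix route selection.
# Simplification (assumption): metrics, scopes and policy routing are ignored.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    local a b c d
    read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 lies inside the CIDR prefix $2 (e.g. 169.254.0.0/16).
in_prefix() {
    local addr net len mask
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    if (( len == 0 )); then
        mask=0            # a /0 prefix (the default route) matches everything
    else
        mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    fi
    (( (addr & mask) == (net & mask) ))
}

# Print the most specific route matching address $1.
# Remaining arguments are routes as printed by `ip route`, e.g.
# "default via 192.0.2.1 dev eth0" or "169.254.0.0/16 dev eth0 scope link".
pick_route() {
    local addr=$1; shift
    local best="" best_len=-1 route prefix len
    for route in "$@"; do
        prefix=${route%% *}
        [[ $prefix == default ]] && prefix=0.0.0.0/0
        [[ $prefix == */* ]] || prefix=$prefix/32   # host routes omit /32
        len=${prefix#*/}
        if in_prefix "$addr" "$prefix" && (( len > best_len )); then
            best=$route
            best_len=$len
        fi
    done
    echo "$best"
}
```

Given the guest's two relevant routes, `pick_route 169.254.169.254 "default via 192.0.2.1 dev eth0" "169.254.0.0/16 dev eth0 scope link"` selects the /16 link route, because a /16 is more specific than the /0 default. A link-scope route has no next hop, so the kernel has to ARP for 169.254.169.254 itself on eth0, and nothing answers.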

20130714 13:58:06,348  url_helper.py[WARNING]: Calling 'http://169.254.169.254/20090404/metadata/instanceid' failed [13/120s]: url error [[Errno 113] No route to host]

This is repeated several times, and as far as I can tell it makes the VM take about 4 minutes to boot.

Running curl against the failing URL on the guest gives the same error. However, I get a valid result after removing the route:
ip r d 169.254.0.0/16 dev eth0
curl http://169.254.169.254/2009-04-04/meta-data/instance-id
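The manual `ip r d` above removes the route from eth0 only. A minimal sketch (the helper name `zeroconf_route_deletes` is hypothetical) that reads `ip route` output and prints one delete command per 169.254.0.0/16 route, so a machine with several interfaces can be cleaned up at once. It only prints the commands; applying them needs root, and the routes would presumably come back the next time an interface is brought up.

```shell
#!/usr/bin/env bash
# Read `ip route` output on stdin and print an `ip route del` command
# for every 169.254.0.0/16 route, mirroring the manual workaround.
zeroconf_route_deletes() {
    local prefix dev_kw dev rest
    while read -r prefix dev_kw dev rest; do
        # Expected line shape: "169.254.0.0/16 dev eth0  scope link  metric 1002"
        if [[ $prefix == 169.254.0.0/16 && $dev_kw == dev ]]; then
            echo "ip route del 169.254.0.0/16 dev $dev"
        fi
    done
}
```

`ip route | zeroconf_route_deletes` previews the commands; `ip route | zeroconf_route_deletes | sh` (as root) applies them, after which the curl above should reach the metadata service through the default gateway.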

Note that the dashes are missing from the log as shown in Horizon.


Version-Release number of selected component (if applicable):
cloud-init-0.7.1-2.el6.noarch
openstack-quantum-2013.1.2-4.el6ost.noarch
openstack-quantum-openvswitch-2013.1.2-4.el6ost.noarch

How reproducible:
always

Steps to Reproduce:
1. deploy using packstack
2. add image with cloud-init
3. boot the image
4. Check the console

Actual results:
cloud-init fails; logging in becomes possible only after a significant delay

Expected results:
Not sure whether cloud-init should be adding the route. If it should, there needs to be a port with the 169.254.169.254 address assigned, or the ARP reply has to be ensured some other way (a static ARP table entry?).

Additional info:
[root@controller ~]# ip netns list | while read ns; do ip netns exec ip a| grep 169.254; done
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
Cannot open network namespace: No such file or directory
[root@controller ~]# ip netns list | while read ns; do ip netns exec $ns ip a | grep 169.254; done
[root@controller ~]# ip netns list | while read ns; do ip netns exec $ns ip r | grep 169.254; done
[root@controller ~]# ip r
10.ZZZ.XXX.0/23 dev br-ex  proto kernel  scope link  src 10.ZZZ.YYY.XXX 
169.254.0.0/16 dev eth0  scope link  metric 1002 
169.254.0.0/16 dev eth1  scope link  metric 1003 
169.254.0.0/16 dev br-eth1  scope link  metric 1004 
169.254.0.0/16 dev br-int  scope link  metric 1005 
169.254.0.0/16 dev br-ex  scope link  metric 1006 
default via 10.ZZZ.YYY.XXX dev br-ex
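The metrics above (1002 through 1006, one per interface) look like 1000 plus the interface index. Assuming the routes are added by the RHEL 6 network initscripts when each interface comes up, rather than by cloud-init itself (which would also fit the later finding that NOZEROCONF helps), a minimal sketch of that logic follows. The function name `zeroconf_route_cmd` is hypothetical, and the interface index is passed in as a parameter so the logic can be tested; on a live system it would come from /sys/class/net/DEV/ifindex.

```shell
#!/usr/bin/env bash
# Sketch (assumption): mimic how RHEL 6 initscripts add the IPv4 zeroconf
# route on ifup. Prints the command instead of running it.
#   $1 = device name, $2 = interface index, $3 = value of NOZEROCONF (optional)
zeroconf_route_cmd() {
    local dev=$1 ifindex=$2 nozeroconf=${3:-}
    # No route for loopback, and none when NOZEROCONF is set.
    if [[ -n $nozeroconf || $dev == lo ]]; then
        return 1
    fi
    # Metric is 1000 + interface index, e.g. eth0 (index 2) gets metric 1002.
    echo "ip route add 169.254.0.0/16 dev $dev metric $((1000 + ifindex)) scope link"
}
```

On a live machine this could be called as `zeroconf_route_cmd eth0 "$(cat /sys/class/net/eth0/ifindex)"` and compared with the `ip r` output.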
Comment 2 Jaroslav Henner 2013-07-15 04:36:38 EDT
This can be worked around by:
openstack-config --set /etc/nova/nova.conf DEFAULT force_config_drive always
Comment 5 Jaroslav Henner 2013-09-16 08:14:35 EDT
Setting NOZEROCONF helped. Is it documented anywhere that the guest should have it configured?
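A minimal sketch of applying the NOZEROCONF setting idempotently, assuming a RHEL 6 style guest whose initscripts read NOZEROCONF from /etc/sysconfig/network. The helper name `disable_zeroconf` is hypothetical, and the file path is a parameter only so the function can be tried on a scratch file. The setting takes effect the next time the network service brings the interface up; an existing route can be removed by hand as in the description.

```shell
#!/usr/bin/env bash
# Ensure NOZEROCONF=yes is present exactly once in the given sysconfig file
# (default: /etc/sysconfig/network). Requires GNU sed for `sed -i`.
disable_zeroconf() {
    local conf=${1:-/etc/sysconfig/network}
    touch "$conf"
    if grep -q '^NOZEROCONF=' "$conf"; then
        # Overwrite any existing value with an explicit "yes".
        sed -i 's/^NOZEROCONF=.*/NOZEROCONF=yes/' "$conf"
    else
        # Avoid gluing the new line onto an unterminated last line.
        if [[ -s $conf && -n $(tail -c 1 "$conf") ]]; then
            echo >> "$conf"
        fi
        echo 'NOZEROCONF=yes' >> "$conf"
    fi
}
```

Rewriting an existing line instead of always appending keeps repeated runs (for example from an image build script) from accumulating duplicate entries.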
Comment 9 lpeer 2014-01-20 11:39:51 EST
Hi jhenner
Can you please check if this error also appears when you use Havana?
Comment 11 Jaroslav Henner 2014-02-17 09:23:58 EST
It looks like neither lo nor the tap devices support adding the multicast address, even where multicast is enabled:

[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev qr-d725dbcd-ab
ioctl: Invalid argument
[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev qg-19f7ff97-cc
ioctl: Invalid argument
[root@controller ~]# ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip maddr add 169.254.169.254 dev lo
ioctl: Invalid argument

44: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
146: qr-d725dbcd-ab: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:5a:13:f1 brd ff:ff:ff:ff:ff:ff
148: qg-19f7ff97-cc: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:f1:ca:35 brd ff:ff:ff:ff:ff:ff

However, I was able to assign the address to a router or gateway interface:
ip netns exec qrouter-98ff3993-d665-4404-860e-ed8a04a5ddc7 ip a a 169.254.169.254 dev qr-d725dbcd-ab

The IP was then pingable even though the zeroconf route was still present (zeroconf enabled).

I believe Neutron should be doing this because of how IPv4 zeroconf is implemented: hosts are supposed to check whether an address is already present on the link before configuring it on an interface.
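One detail that supports this argument: RFC 3927 reserves only the first and last 256 addresses of 169.254.0.0/16, so 169.254.169.254 falls in the range a zeroconf host may pick for itself. Such a host relies on ARP probing to detect a conflict, which only works if something on the link actually answers ARP for that address. A minimal sketch of the range check (the helper names `ip_to_int` and `is_zeroconf_assignable` are hypothetical):

```shell
#!/usr/bin/env bash
# Is an IPv4 address one that an RFC 3927 zeroconf host may select?
# RFC 3927 excludes the first and last 256 addresses of 169.254.0.0/16,
# leaving 169.254.1.0 through 169.254.254.255 for automatic assignment.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    local a b c d
    read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if $1 lies in the assignable link-local range.
is_zeroconf_assignable() {
    local addr lo hi
    addr=$(ip_to_int "$1")
    lo=$(ip_to_int 169.254.1.0)
    hi=$(ip_to_int 169.254.254.255)
    (( addr >= lo && addr <= hi ))
}
```

`is_zeroconf_assignable 169.254.169.254` succeeds, while addresses such as 169.254.0.5 or 169.254.255.1 are rejected as reserved.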
Comment 12 lpeer 2014-03-23 02:52:26 EDT
As far as I know we don't have customers using Neutron with Grizzly and there is no customer ticket associated with this bug.

I am closing this bug as WONTFIX because we are very short on resources.
