Bug 1351795

Summary: Isolated metadata will not work if ipv6 subnet is created first
Product: Red Hat OpenStack
Reporter: kahou <kalei>
Component: openstack-neutron
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED DUPLICATE
QA Contact: Toni Freger <tfreger>
Severity: urgent
Priority: high
Version: 8.0 (Liberty)
CC: amuller, charcrou, chrisw, jdonohue, jlibosva, kalei, nyechiel, skulkarn, srevivo
Target Milestone: async
Keywords: ZStream
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Linux
Whiteboard: hot
Doc Type: If docs needed, set a value
Last Closed: 2017-01-18 14:10:49 UTC
Type: Bug
Bug Blocks: 1194008

Description kahou 2016-06-30 21:28:05 UTC
Description of problem:

We noticed that VMs cannot reach the metadata server while booting.

It turns out the metadata agent cannot talk to the Neutron server to get the port information, because the metadata agent has locked itself up.

If I do either of the following, the metadata server starts working again:
1. Turn off syslog
2. Set the metadata agent workers to 0
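
For reference, the two workarounds above map roughly to these settings (a sketch only: file paths are assumed for a stock RHEL-OSP node, and crudini is assumed to be installed; the option names `use_syslog` and `metadata_workers` are standard Neutron options):

```shell
# Workaround 1: stop logging to syslog
# (neutron.conf is also read by the metadata agent).
sudo crudini --set /etc/neutron/neutron.conf DEFAULT use_syslog False

# Workaround 2: run the metadata agent without separate worker processes.
sudo crudini --set /etc/neutron/metadata_agent.ini DEFAULT metadata_workers 0

sudo systemctl restart neutron-metadata-agent
```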

Version-Release number of selected component (if applicable):

Neutron 7.0

Steps to Reproduce:
1. Boot a cirros instance
2. Run nova console-log <vm>
3. Observe the vm cannot talk to the metadata server
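
The steps above can be sketched with the Liberty-era CLI (image, flavor, and network names below are placeholders, not taken from the reporter's environment):

```shell
# Boot a CirrOS instance on the affected network.
nova boot --image cirros --flavor m1.tiny --nic net-id=<net-id> test-vm

# Watch the console log for the metadata fetch attempts.
nova console-log test-vm | grep -A 3 'checking http://169.254.169.254'
# A healthy boot retrieves the instance-id; here the requests time out.
```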

Actual results:


Expected results:


Additional info:

Comment 1 kahou 2016-06-30 21:29:39 UTC
This may relate to https://bugzilla.redhat.com/show_bug.cgi?id=1330778

Comment 2 Charles Crouch 2016-07-18 21:55:13 UTC
FWIW, Kahou tested this issue with the fix from https://bugzilla.redhat.com/show_bug.cgi?id=1330778 and the problem remained, i.e. running

[admin@mcp1 ~]$ sudo ip netns exec qdhcp-c2c6cb56-d741-4650-9dc6-db0f7e59dea2 curl http://169.254.169.254/

is expected to return immediately, but it hangs.

So this issue and BZ1330778 appear to have different causes.
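
One way to narrow down whether the hang is in the namespace proxy or in the agent behind it (a sketch; the tools are assumed to be present on the network node):

```shell
NS=qdhcp-c2c6cb56-d741-4650-9dc6-db0f7e59dea2

# Is the metadata proxy actually listening on port 80 in the namespace?
sudo ip netns exec "$NS" ss -lntp | grep ':80 '

# Does the request fail fast or hang? Bound it with a timeout to tell.
sudo ip netns exec "$NS" curl --max-time 5 http://169.254.169.254/
```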

Comment 3 Charles Crouch 2016-07-18 21:57:02 UTC
Also here is the corresponding support case: https://access.redhat.com/support/cases/#/case/01640942

Comment 4 Charles Crouch 2016-07-20 14:25:00 UTC
My bad, please ignore the above; that case is for a different issue we have with the neutron metadata agent (BZ1339014). We don't have a support case for this BZ.

Comment 5 Assaf Muller 2016-07-26 21:04:41 UTC
Jakub can you please help triage?

Comment 6 Jakub Libosvar 2016-07-26 21:27:30 UTC
(In reply to Charles Crouch from comment #4)
> My bad please ignore the above, that case is for a different issue we have
> with the neutron metadata agent (BZ1339014). We dont have a support case for
> this BZ

The symptom of bug 1339014 was that RPC was initialized before forking the worker processes. According to the description, this bug looks like the same symptom. Is it possible to test the metadata agent with the same hotfix provided for bug 1339014?
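
The failure class described here, state such as RPC connections or logging locks being set up in the parent and then inherited by forked workers, can be illustrated with a small standalone Python sketch (illustrative only, not Neutron code):

```python
import os
import threading

lock = threading.Lock()   # stands in for a logging/RPC lock
lock.acquire()            # the parent holds it at fork time

pid = os.fork()
if pid == 0:
    # The child inherits a copy of the lock in its acquired state; a
    # blocking acquire() here would deadlock, so probe non-blockingly.
    got_it = lock.acquire(blocking=False)
    os._exit(0 if not got_it else 1)

_, status = os.waitpid(pid, 0)
child_saw_lock_held = (os.WEXITSTATUS(status) == 0)
print("child inherited a held lock:", child_saw_lock_held)
```

If a worker forked in this state later blocks on such a lock, it hangs forever, which matches "the metadata agent locked up itself" in the description.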

Comment 7 kahou 2016-08-02 22:20:37 UTC
Hi,

I installed the suggested build but I still see the issue.

info: initramfs: up at 2.34
GROWROOT: CHANGED: partition=1 start=16065 old: size=64260 end=80325 new: size=2072385,end=2088450
info: initramfs loading root from /dev/vda1
info: /etc/init.d/rc.sysinit: up at 2.44
info: container: none
Starting logging: OK
modprobe: module virtio_blk not found in modules.dep
modprobe: module virtio_net not found in modules.dep
WARN: /etc/rc3.d/S10-load-modules failed
Initializing random number generator... done.
Starting acpid: OK
cirros-ds 'local' up at 2.56
no results found for mode=local. up 2.60. searched: nocloud configdrive ec2
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending select for 10.209.0.3...
Lease of 10.209.0.3 obtained, lease time 14400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "10.209.0.1"
cirros-ds 'net' up at 2.65
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 2.66. request failed
failed 2/20: up 14.70. request failed
failed 3/20: up 26.71. request failed
failed 4/20: up 38.72. request failed
failed 5/20: up 50.73. request failed
failed 6/20: up 62.75. request failed

Comment 8 kahou 2016-08-02 22:20:48 UTC
Hi,

I installed the suggested build but I still see the issue.

info: initramfs: up at 2.34
GROWROOT: CHANGED: partition=1 start=16065 old: size=64260 end=80325 new: size=2072385,end=2088450
info: initramfs loading root from /dev/vda1
info: /etc/init.d/rc.sysinit: up at 2.44
info: container: none
Starting logging: OK
modprobe: module virtio_blk not found in modules.dep
modprobe: module virtio_net not found in modules.dep
WARN: /etc/rc3.d/S10-load-modules failed
Initializing random number generator... done.
Starting acpid: OK
cirros-ds 'local' up at 2.56
no results found for mode=local. up 2.60. searched: nocloud configdrive ec2
Starting network...
udhcpc (v1.20.1) started
Sending discover...
Sending select for 10.209.0.3...
Lease of 10.209.0.3 obtained, lease time 14400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "10.209.0.1"
cirros-ds 'net' up at 2.65
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 2.66. request failed
failed 2/20: up 14.70. request failed
failed 3/20: up 26.71. request failed
failed 4/20: up 38.72. request failed
failed 5/20: up 50.73. request failed
failed 6/20: up 62.75. request failed

Comment 9 Jakub Libosvar 2016-08-05 14:32:53 UTC
(In reply to kahou from comment #8)
> Hi,
> 
> I installed the suggested build but I still see the issue.
> 
> info: initramfs: up at 2.34
> GROWROOT: CHANGED: partition=1 start=16065 old: size=64260 end=80325 new:
> size=2072385,end=2088450
> info: initramfs loading root from /dev/vda1
> info: /etc/init.d/rc.sysinit: up at 2.44
> info: container: none
> Starting logging: OK
> modprobe: module virtio_blk not found in modules.dep
> modprobe: module virtio_net not found in modules.dep
> WARN: /etc/rc3.d/S10-load-modules failed
> Initializing random number generator... done.
> Starting acpid: OK
> cirros-ds 'local' up at 2.56
> no results found for mode=local. up 2.60. searched: nocloud configdrive ec2
> Starting network...
> udhcpc (v1.20.1) started
> Sending discover...
> Sending select for 10.209.0.3...
> Lease of 10.209.0.3 obtained, lease time 14400
> route: SIOCADDRT: File exists
> WARN: failed: route add -net "0.0.0.0/0" gw "10.209.0.1"
> cirros-ds 'net' up at 2.65
> checking http://169.254.169.254/2009-04-04/instance-id
> failed 1/20: up 2.66. request failed
> failed 2/20: up 14.70. request failed
> failed 3/20: up 26.71. request failed
> failed 4/20: up 38.72. request failed
> failed 5/20: up 50.73. request failed
> failed 6/20: up 62.75. request failed

Can you please provide debug logs from metadata agent? Does it work, when you turn off syslog? Do you use "use_syslog" config option?

Comment 10 Charles Crouch 2016-08-09 16:33:02 UTC
[5:28 PM] Kahou Lei: Turn off syslog doesn't work either. Sorry I forgot to update the ticket
[5:29 PM] Charles Crouch: thanks, then I think they will definitely need some logs
[5:29 PM] Kahou Lei: Tell him that metadata log doesn't show anything even I turn on debug log level

Comment 11 Jakub Libosvar 2016-08-10 09:15:16 UTC
(In reply to Charles Crouch from comment #10)
> [5:28 PM] Kahou Lei: Turn off syslog doesn't work either. Sorry I forgot to
> update the ticket
> [5:29 PM] Charles Crouch: thanks, then I think they will definitely need
> some logs
> [5:29 PM] Kahou Lei: Tell him that metadata log doesn't show anything even I
> turn on debug log level

That sounds like a logger configuration issue. If there are no logs, can we get the sos report, please?
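
A possible way to gather what was asked for (paths, the `debug` option location, and the sos plugin name are assumptions for a stock RHEL-OSP node):

```shell
# Enable debug logging for the metadata agent, restart, and reproduce.
sudo crudini --set /etc/neutron/metadata_agent.ini DEFAULT debug True
sudo systemctl restart neutron-metadata-agent

# Then collect logs and configuration into an sos report.
sudo sosreport --batch -o openstack_neutron
```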

Comment 12 kahou 2016-09-19 15:04:46 UTC
Hi Jakub,

Sorry for the late reply. I will generate the sosreport by tomorrow.

Thanks,
Kahou

Comment 13 kahou 2016-09-20 17:22:36 UTC
Hi Jakub,

Due to a technical issue that delayed debugging, I am still trying to gather the sosreport.

Thanks,
Kahou

Comment 14 Charles Crouch 2016-10-04 15:05:48 UTC
Quick update
There has been good progress on this issue in the background between Red Hat and Metacloud engineering. There should be an update posted this week with further details.

Comment 15 Jakub Libosvar 2016-11-21 09:51:53 UTC
Any updates on this?

Comment 16 kahou 2016-11-21 19:15:26 UTC
Hi Jakub,

This is the same issue as the upstream bug: https://bugs.launchpad.net/neutron/+bug/1556991

Thanks,
Kahou
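
For reference, the ordering named in the bug title (and in the upstream report) can be sketched with the Liberty-era CLI as follows; names and CIDRs are placeholders, and isolated metadata assumes enable_isolated_metadata = True in dhcp_agent.ini:

```shell
neutron net-create repro-net

# Create the IPv6 subnet first...
neutron subnet-create --ip-version 6 --name repro-v6 repro-net fd00:dead::/64

# ...then the IPv4 subnet.
neutron subnet-create --name repro-v4 repro-net 10.209.0.0/24

# Boot an instance on repro-net and check its console log for the
# metadata timeouts shown in the comments above.
```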