Bug 1729007
Summary: | Metadata proxy not ready when vm spawns when using DVR | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Slawek Kaplonski <skaplons>
Component: | openstack-neutron | Assignee: | Slawek Kaplonski <skaplons>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eran Kuris <ekuris>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 15.0 (Stein) | CC: | amuller, bcafarel, chrisw, njohnston, ralonsoh, scohen
Target Milestone: | --- | Keywords: | Triaged, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-02-26 07:57:48 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Comment 1
Luigi Toscano
2019-07-11 08:44:58 UTC
Hi Luigi,

It is definitely a different issue, as this BZ is related to an ML2/OVS setup. It also happens in the upstream CI, where we are using Devstack and there are no containers at all.

Once again, the description of the bug: it sometimes happens in our CI that, when DVR is used and a VM is spawned on a host, the VM is unpaused and boots before the L3 agent has prepared the router namespace and the metadata proxy for this router. That causes a connectivity problem from the VM to the metadata service, so e.g. the public key is not configured on the instance, SSH to it is not possible, and the test fails.

Sending select for 10.100.0.13...
Lease of 10.100.0.13 obtained, lease time 86400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "10.100.0.1"
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 3.43. request failed
failed 2/20: up 15.44. request failed
failed 3/20: up 27.46. request failed
failed 4/20: up 39.47. request failed
failed 5/20: up 51.48. request failed
failed 6/20: up 63.50. request failed
failed 7/20: up 75.52. request failed
failed 8/20: up 87.53. request failed
failed 9/20: up 99.54. request failed
failed 10/20: up 111.55. request failed
failed 11/20: up 123.57. request failed
failed 12/20: up 135.58. request failed
failed 13/20: up 147.59. request failed
failed 14/20: up 159.61. request failed
failed 15/20: up 171.62. request failed
failed 16/20: up 183.63. request failed
failed 17/20: up 195.64. request failed[ 205.660296] random: nonblocking pool is initialized
failed 18/20: up 207.67. request failed
failed 19/20: up 219.68. request failed
failed 20/20: up 231.69. request failed
failed to read iid from metadata. tried 20
failed to get instance-id of datasource
Top of dropbear init script

It also happens quite often in the upstream CI. Going to track mlavalle's efforts upstream to debug this issue.

I think I found the reason. It is a race condition when two routers are created within a short time and configured on the same SNAT node. When both routers are configuring the external gateway, it may happen that one of the routers adds the external network to the subscribers list in https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_fip_ns.py#L129 so the second router gets the information that it is not "first" and goes on to update the gateway port instead of creating it. But if the gateway has in fact not been created yet, this causes an exception in: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_fip_ns.py#L332

And if this happens, one of the routers will not have the iptables rules that allow requests to 169.254.169.254 configured properly, so metadata will not work for this instance.
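To make the race easier to reason about, here is a minimal, self-contained Python sketch of that check-then-act pattern. The class and method names (FipNamespaceSketch, configure_gateway, l3_agent_worker) are hypothetical stand-ins, not the actual FipNamespace code from dvr_fip_ns.py; it is only meant to show how the second router can see "not first" before the first router has finished creating the gateway port.

```python
import threading
import time


class FipNamespaceSketch:
    """Simplified sketch of the subscribe/create-vs-update pattern
    described above; NOT the real neutron.agent.l3.dvr_fip_ns code."""

    def __init__(self):
        self._subscribers = set()
        self.gateway_port = None  # created lazily by the "first" router

    def subscribe(self, router_id):
        # Check-then-act: True only for the first subscriber.
        first = not self._subscribers
        self._subscribers.add(router_id)
        return first

    def configure_gateway(self, router_id):
        if self.subscribe(router_id):
            # The "first" router creates the gateway port; the sleep stands
            # in for the time the real port creation takes.
            time.sleep(0.1)
            self.gateway_port = {'id': 'fg-port'}
        else:
            # Any later router only updates the existing gateway port.
            if self.gateway_port is None:
                # Failure mode from the comment: subscribed but not yet
                # created, so this router errors out and never installs
                # its 169.254.169.254 iptables rules.
                raise RuntimeError('gateway port not created yet (%s)'
                                   % router_id)


fip_ns = FipNamespaceSketch()


def l3_agent_worker(router_id):
    try:
        fip_ns.configure_gateway(router_id)
        print('%s: gateway configured' % router_id)
    except RuntimeError as exc:
        print('%s: FAILED: %s' % (router_id, exc))


# Two routers configured on the same SNAT node at (almost) the same time.
threads = [threading.Thread(target=l3_agent_worker, args=('router-%d' % i,))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Running this typically prints one "gateway configured" line and one "FAILED" line: the losing worker hits the missing-gateway error, which mirrors how the losing router in the real race ends up without its metadata iptables rules.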