Bug 1300584
Summary: | Backport: Tracker for IPV6 router is not working with VRRP | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Nir Magnezi <nmagnezi> |
Component: | openstack-neutron | Assignee: | Nir Magnezi <nmagnezi> |
Status: | CLOSED WONTFIX | QA Contact: | Toni Freger <tfreger> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.0 (Juno) | CC: | amuller, chrisw, dcadzow, ihrachys, jschluet, mburns, nyechiel, oblaut, srevivo, tfreger, vcojot |
Target Milestone: | async | Keywords: | FeatureBackport, ZStream |
Target Release: | 6.0 (Juno) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-neutron-2014.2.3-34.el7ost | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 1222775 | Environment: | |
Last Closed: | 2016-05-02 19:03:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1222775 | ||
Bug Blocks: | 1300580 |
Comment 6
Toni Freger
2016-04-13 16:27:37 UTC
I have used the very same setup to debug this issue. The root cause:

After the hosts reboot, ports seem to be wired correctly (neutron-wise). When radvd sends IPv6 router advertisements (which does not always happen, hold on..), they reach the instances right away and the instances obtain an IPv6 address.

Upon reproduction (rebooting the servers), nova instances indeed won't obtain IPv6 addresses. Looking closely, I found the following:

1. Instances won't get router advertisements, hence no IPv6 addresses.
2. radvd takes a very long time (sometimes a lot more than a minute) to send the first router advertisement.
3. In some cases there won't be any router advertisements coming from radvd (even though the radvd process is running at that time).

Looking at /var/log/messages (on both nodes) I saw the following errors:

    radvd[xxxx]: no linklocal address configured for qr-27493a15-58
    radvd[xxxx]: sendmsg: Cannot assign requested address
    radvd[xxxx]: resuming normal operation

and then numerous repeats of:

    radvd[xxxx]: resetting ipv6-allrouters membership on qr-27493a15-58

Backup Node:
============
This error is indeed expected, since neutron won't configure any IP address in the qrouter namespace, hence no link-local address:

    # sudo ip netns exec qrouter-01efcb45-0589-4313-8b3a-49057524656c ifconfig qr-27493a15-58
    qr-27493a15-58: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
            ether fa:16:3e:cf:e6:10 txqueuelen 0 (Ethernet)
            RX packets 543 bytes 59138 (57.7 KiB)
            RX errors 0 dropped 0 overruns 0 frame 0
            TX packets 1 bytes 110 (110.0 B)
            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

-> There is another problem here: radvd is running on the backup node to begin with. I will explain this in a bit.
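The two ifconfig listings in this comment differ in one respect that matters to radvd: whether an fe80:: (link-local) inet6 entry is present on the qr- interface. As a rough illustration only, the following Python sketch parses such output and reports the link-local addresses it finds. The function name and the sample constants are hypothetical and not part of neutron; the sample text is copied from the listings in this bug.

```python
import ipaddress
import re

# Matches the address that follows "inet6" in ifconfig or "ip addr" output.
INET6_RE = re.compile(r"inet6\s+([0-9a-fA-F:]+)")


def link_local_addresses(iface_output):
    """Return the IPv6 link-local addresses found in interface output text.

    Hypothetical helper for illustration; it is not a neutron API.
    """
    found = []
    for match in INET6_RE.finditer(iface_output):
        addr = ipaddress.IPv6Address(match.group(1))
        if addr.is_link_local:  # true for addresses in fe80::/10
            found.append(str(addr))
    return found


# Sample text taken from the backup node listing (no inet6 lines at all).
BACKUP = """qr-27493a15-58: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether fa:16:3e:cf:e6:10 txqueuelen 0 (Ethernet)"""

# Sample text taken from the master node listing.
MASTER = """qr-27493a15-58: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::f816:3eff:fecf:e610 prefixlen 64 scopeid 0x20<link>
        inet6 2001:db3::1 prefixlen 64 scopeid 0x0<global>
        ether fa:16:3e:cf:e6:10 txqueuelen 0 (Ethernet)"""

print(link_local_addresses(BACKUP))
print(link_local_addresses(MASTER))
```

An empty result for an interface corresponds to the condition under which radvd logs 'no linklocal address configured'.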
Master Node:
============
This error is not expected here, since we do have a link-local address configured:

    # ip netns exec qrouter-01efcb45-0589-4313-8b3a-49057524656c ifconfig qr-27493a15-58
    qr-27493a15-58: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
            inet6 fe80::f816:3eff:fecf:e610 prefixlen 64 scopeid 0x20<link>
            inet6 2001:db3::1 prefixlen 64 scopeid 0x0<global>
            ether fa:16:3e:cf:e6:10 txqueuelen 0 (Ethernet)
            RX packets 105 bytes 9574 (9.3 KiB)
            RX errors 0 dropped 0 overruns 0 frame 0
            TX packets 602 bytes 65932 (64.3 KiB)
            TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

-> What happens here is that radvd is spawned too early, causing it to complain with 'no linklocal address configured'. That address is configured afterwards, but radvd takes a long time to recover, if it recovers at all. As long as radvd is in that state, it won't send any router advertisements.

To conclude, we have two issues here:
====================================
1. radvd is spawned on the backup node, which it shouldn't be, since it will enter an error state and chances are that it will fail to recover upon a master/backup transition.
2. radvd is spawned too early on the master node, which leads to an error state and a lack of router advertisements.

There is a fix[1] for this by Sridhar Gaddam, starting from Kilo. If you read the commit message[2] you will see it addresses both issues mentioned above.

There was a major overhaul of the L3 agent code, so we cannot easily cherry-pick Sridhar's fix. We would have to consider whether and how this issue can be fixed in OSP6.

[1] https://review.openstack.org/#/c/179392
[2] https://review.openstack.org/#/c/179392/1//COMMIT_MSG

Hey Toni, the issue you describe in comment #6 is a different bug by itself. Did you manage to verify the keepalived configuration?

Keepalived configuration is correct.

After further investigation, it seems like the fix cannot be implemented in OSP6.
The reason is that the fix[1] for the two issues described in comment 7 requires the neutron l3-agent to be aware of its state, meaning it must be able to determine whether it is currently in MASTER or BACKUP state. In OSP6 this information is external to the neutron codebase; it is held exclusively in keepalived.

Starting from OSP7 (Kilo), thanks to extensive modifications and additions[2] to the l3-agent code base, the agent became aware of its current state, so decisions can be based on that information and on events such as failover.

Therefore, in order to fix this bug in OSP6 we would need to implement a whole new feature. Due to the above, closing as WONTFIX.

[1] https://review.openstack.org/#/c/179392
[2] http://specs.openstack.org/openstack/neutron-specs/specs/kilo/report-ha-router-master.html
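To make the dependency on agent state concrete, here is a minimal Python sketch of the kind of decision a state-aware agent could make on a VRRP transition. It is an illustration under stated assumptions, not the upstream fix: the function names, the "master"/"backup" strings, and the "start"/"stop"/"none" return values are all hypothetical and do not come from the neutron code base.

```python
import ipaddress


def should_run_radvd(ha_state, iface_addresses):
    """Decide whether radvd should run for an HA router interface.

    ha_state: "master" or "backup" (hypothetical representation of the
        VRRP state as seen by the agent).
    iface_addresses: IPv6 address strings configured on the router's
        internal (qr-) interface.
    """
    if ha_state != "master":
        # A backup router has no addresses configured in its namespace,
        # so radvd would only enter an error state there.
        return False
    # radvd needs a link-local address on the interface before it can
    # send router advertisements.
    return any(ipaddress.IPv6Address(a).is_link_local for a in iface_addresses)


def on_state_change(new_state, iface_addresses, radvd_running):
    """Return the action for a VRRP transition: "start", "stop" or "none"."""
    want = should_run_radvd(new_state, iface_addresses)
    if want and not radvd_running:
        return "start"
    if not want and radvd_running:
        return "stop"
    return "none"


print(on_state_change("backup", [], True))
print(on_state_change("master", ["fe80::f816:3eff:fecf:e610", "2001:db3::1"], False))
```

Without a reliable value for ha_state inside the agent, which is the situation described for OSP6, neither branch of this decision can be taken.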