Bug 1478061 - An instance doesn't get an IP after deployment
Summary: An instance doesn't get an IP after deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Sridhar Gaddam
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-03 13:54 UTC by Itzik Brown
Modified: 2018-10-18 07:18 UTC
CC: 5 users

Fixed In Version: opendaylight-8.0.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
N/A
Last Closed: 2018-06-27 13:33:47 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenDaylight Bug 8926 0 None None None 2017-08-03 14:03:57 UTC
OpenDaylight gerrit 61995 0 None None None 2017-08-18 17:26:24 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:34:57 UTC

Description Itzik Brown 2017-08-03 13:54:01 UTC
Description of problem:
After deployment, when launching an instance, it doesn't get an IP.
When launching a second instance on the same node, it gets an IP.
Then, when rebooting the first one, it gets an IP.

Version-Release number of selected component (if applicable):
opendaylight-6.1.0-2.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Deploy an overcloud.
2. Launch an instance.
3. Open the proper security group rules.
4. Verify that it doesn't get an IP (e.g., it cannot be pinged from the DHCP namespace).
5. Launch an instance on the same node as the first one and make sure it gets an IP (example commands below).
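
For illustration, the steps above might look like the following with the
OpenStack CLI (the image, flavor, and network names are placeholders, not
taken from this report):

openstack server create --image cirros --flavor m1.tiny \
    --network tenant-net vm1
openstack server show vm1 -f value -c addresses
# From the DHCP namespace on the controller, check reachability
# (per this bug, the ping to the first instance fails):
sudo ip netns exec qdhcp-<network-uuid> ping -c 3 <vm1-ip>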

Actual results:


Expected results:


Additional info:

Comment 5 Sridhar Gaddam 2017-08-18 17:25:22 UTC
In a fresh multinode deployment with the Controller node running
ODL + dhcp-agent and a Compute node, when we spawn the first VM
on the compute node, the VM does not acquire its IP address. On
debugging, it turned out that the remote broadcast group entries
were not programmed on the Compute node.

Setup details:
1. Multi-node with Controller and a Compute node.
2. Create a tenant neutron network with an IPv4 subnet.
3. Create a neutron router.
4. Associate the ipv4 subnet to the neutron router.

At this stage, you can see that there is no tunnel between the
Controller node and the Compute node.
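
One quick way to confirm this (assuming the usual OVS-based deployment;
the report does not spell out the check) is to list the bridges and
ports on each node and verify that no vxlan/gre tunnel ports exist yet:

sudo ovs-vsctl show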

5. Now spawn a VM on the Compute node (you can explicitly
   specify that the VM has to be spawned on the compute node
   by passing --availability-zone to the nova boot command;
   see the example below).
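
For reference, steps 2-5 might look like the following with the CLI of
that release (network/router names and the compute host are placeholders):

neutron net-create tenant-net
neutron subnet-create tenant-net 10.0.0.0/24 --name tenant-subnet
neutron router-create router1
neutron router-interface-add router1 tenant-subnet
nova boot --image cirros --flavor m1.tiny --nic net-name=tenant-net \
    --availability-zone nova:compute-0 vm1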

When the VM is spawned, the following is the sequence
of events:

t1: Nova creates a tap interface for the VM; this translates
    to an add event for the elanInterface (i.e., ElanInterfaceStateChangeListener
    is invoked, and addElanInterface gets processed)
t2: In addElanInterface, elanManager checks if the interface
    is part of existingElanDpnInterfaces (i.e., DpnInterfaces YANG model)
t3: Since it's a new interface, it invokes createElanInterfacesList(),
    which updates the DpnInterfaces model. At this stage, the
    transaction/information is still not committed to the datastore.
t4: The processing continues to installEntriesForFirstInterfaceonDpn(),
    where we try to program the local/remote BC Group entries.
    In this API, we have an explicit sleep of (300 + 300) ms, after
    which we query getEgressActionsForInterface (which is an API in
    GENIUS). GENIUS returns an empty list with the following reason:
    "Interface information not present in oper DS for the tunnel
    interface".
t5: So the remote BC Group does not include the actions to send the
    packets over the tunnel interface at this stage.
t6: addElanInterface processing continues further and we commit the
    transaction (i.e., DpnInterfaces model is now updated in the datastore).
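
To make the race window concrete, here is a minimal, self-contained Java
sketch of the t1-t6 ordering. Every name below is a simplified stand-in
for illustration, not the real OpenDaylight/GENIUS code:

import java.util.ArrayList;
import java.util.List;

class ElanRaceSketch {

    // Stand-in for the DpnInterfaces entries already committed to the datastore.
    static final List<String> committedDpnInterfaces = new ArrayList<>();

    // Stand-in for the GENIUS getEgressActionsForInterface API: it can only
    // return egress actions once the tunnel interface is present in the
    // operational DS; before that it returns an empty list (the t4 failure).
    static List<String> getEgressActionsForInterface(String tunnelInterface,
                                                     boolean tunnelInOperDs) {
        return tunnelInOperDs ? List.of("output:" + tunnelInterface)
                              : List.of();
    }

    // Models the t1-t6 ordering inside addElanInterface.
    static void addElanInterface(String elanInterface) {
        // t2/t3: new interface -> DpnInterfaces is updated, but only inside
        // a transaction that has not been committed yet.
        List<String> uncommittedTx = new ArrayList<>(committedDpnInterfaces);
        if (!uncommittedTx.contains(elanInterface)) {
            uncommittedTx.add(elanInterface);
        }

        // t4/t5: remote BC group programming runs now; the tunnel interface
        // is not yet in the oper DS, so the egress actions come back empty
        // and the group is programmed without the tunnel output action.
        List<String> actions =
            getEgressActionsForInterface("tun-ctrl-to-cmp", false);
        System.out.println("Remote BC group actions: " + actions);

        // t6: only now is the transaction committed, and DpnInterfaces
        // becomes visible to other listeners.
        committedDpnInterfaces.clear();
        committedDpnInterfaces.addAll(uncommittedTx);
    }

    public static void main(String[] args) {
        addElanInterface("tap-vm1");  // prints an empty action list
    }
}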

While t1 to t6 are in progress, the auto-tunnel code in GENIUS
creates the tunnel interfaces in parallel.

A1: A tunnel interface is created on the Compute node. When the tunnel
    interface state is up, the TunnelsState YANG model is updated in
    GENIUS (ItmTunnelStateUpdateHelper).
A2: A notification is received in ElanTunnelInterfaceStateListener, which
    is handled in the following API - handleInternalTunnelStateEvent.
A3: In this API, when we query ElanDpnInterfaces, it only includes
    the DPNInfo of the Controller and not the Compute node (because of
    the delay in updating the model in steps t3-t6 above).
A4: Due to this, handleInternalTunnelStateEvent does not invoke
    setupElanBroadcastGroups() to program the remote Group entries, so
    the remote Broadcast Group entries on the Compute node never get
    updated.
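
A companion sketch for A1-A4 (again with simplified stand-in names): the
tunnel-up handler acts only on what is already committed, so if it runs
before t6 it sees only the Controller DPN and skips the Compute node:

import java.util.List;

class TunnelStateSketch {

    // A2/A3: fired when the tunnel comes up; dpnsInElanDpnInterfaces is
    // whatever has been committed to the datastore at that moment.
    static void handleInternalTunnelStateEvent(List<String> dpnsInElanDpnInterfaces,
                                               String computeDpn) {
        if (dpnsInElanDpnInterfaces.contains(computeDpn)) {
            System.out.println("setupElanBroadcastGroups() on " + computeDpn);
        } else {
            // A4: the compute DPN is not visible yet, so its remote
            // broadcast groups are never (re)programmed.
            System.out.println("Skipping " + computeDpn
                + ": not present in ElanDpnInterfaces");
        }
    }

    public static void main(String[] args) {
        // Tunnel-up fires while the addElanInterface transaction (t3-t6)
        // is still uncommitted: only the Controller DPN is visible.
        handleInternalTunnelStateEvent(List.of("dpn-controller"), "dpn-compute");
    }
}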

So the fix is to not delay the update of the DpnInterfaces model
until step t6, since this information is used while processing
ElanTunnelInterfaceState.
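
In sketch form, the reordering described above looks roughly like this
(simplified stand-ins again; the actual change is the gerrit patch below):

import java.util.ArrayList;
import java.util.List;

class ElanFixSketch {

    static final List<String> committedDpnInterfaces = new ArrayList<>();

    static void addElanInterface(String elanInterface) {
        // Fixed ordering: make the DpnInterfaces update visible first
        // (the old t6 effectively moves ahead of t4) ...
        committedDpnInterfaces.add(elanInterface);

        // ... and only then program the BC group entries. Even if the
        // tunnel is still missing here, a later tunnel-up event will now
        // find this DPN in ElanDpnInterfaces and invoke
        // setupElanBroadcastGroups() for it.
        installEntriesForFirstInterfaceonDpn(elanInterface);
    }

    static void installEntriesForFirstInterfaceonDpn(String elanInterface) {
        System.out.println("Programming local/remote BC groups for "
            + elanInterface);
    }

    public static void main(String[] args) {
        addElanInterface("tap-vm1");
    }
}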

The following patch would address this issue.
https://git.opendaylight.org/gerrit/#/c/61995/

Comment 6 Sridhar Gaddam 2017-08-28 06:49:38 UTC
Upstream patch is now merged.

Comment 11 Itzik Brown 2018-03-18 11:23:45 UTC
Checked with:
opendaylight-8.0.0-2.el7ost.noarch

Comment 16 errata-xmlrpc 2018-06-27 13:33:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

