Red Hat Bugzilla – Bug 1469780
Instance boots with two IPs although only one net-id is supplied
Last modified: 2018-02-20 13:11:12 EST
Description of problem: An instance is created using the command openstack server create --image 2017-06-21-ocp-3.6.121-atomic --flavor m1.small --nic net-id=85caaf07-bf6f-415a-a616-44e0325baad8 svt-i-1. The instance is expected to have only one IP annd one NIC. However, the instance boots with two IPs and two NICs(eth0 and eth1), however eth1 wasn;t even setup (had no entry in /etc/sysconfig/network-scripts/)
On doing an openstack server list, we see the VM with two ips on the same subnet
| 743427ab-af57-4704-a9d1-1d02f5c9ed3a | svt-ng-23 | ACTIVE | private=172.16.1.225, 172.16.1.222 | 2017-06-28-ocp-126.96.36.199-2-rhel |
The instance id is 743427ab-af57-4704-a9d1-1d02f5c9ed3a
Both neutron ports are active. The port ids of the neutron ports are
The instance is only reachable via the IP 172.16.1.225.
Grepping with the instance ID and port ids in the neutron and nova logs on the 3 controllers we have the following: https://gist.github.com/smalleni/3ae0792440d0276f62f9d865ce0001ad
On doing an ip a on the instance we see https://gist.github.com/smalleni/2214f9ca06e22f4e4ad31a9f2cf5702f
Version-Release number of selected component (if applicable):
Happened on two instances in this environment
Steps to Reproduce:
1. Boot an instance, passing only one net-id
Instance should boot with one IP
Instance boots with two neutron ports and IPs
Can you please attach Nova and Neutron logs from all controllers?
I grepped for instance id, and the two port ids in nova/neutron logs and pasted it here- https://gist.github.com/smalleni/3ae0792440d0276f62f9d865ce0001ad
Do you need the full logs? There's a lot of them.
From the provided logs it seems Neutron is asked twice for the port, we have following ports:
9cbec257-81ac-4b4a-b0b2-bbaa0c13f26c created at: 2017-07-10T22:52:46Z
380883d7-e9c2-4bb0-9c11-a96ae94a0d01 created at: 2017-07-10T22:53:22Z
I can see for each port a POST request:
2017-07-10 22:52:47.453 [c0] 129603 INFO neutron.wsgi [req-26a883fa-f597-4cc5-beac-1334c9ad61a1 accde8c8d00f4ae0a765fa7df5a20d7e 6934d26fc8f64fbdb12a80de7850b958 - - -] 192.168.25.16 - - [10/Jul/2017 22:52:47] "POST /v2.0/ports.json HTTP/1.1" 201 1047 1.111623
2017-07-10 22:53:23.607 [c2] 490233 INFO neutron.wsgi [req-fa3834ea-d3e6-4012-b213-ca991aa9cae2 accde8c8d00f4ae0a765fa7df5a20d7e 6934d26fc8f64fbdb12a80de7850b958 - - -] 192.168.25.16 - - [10/Jul/2017 22:53:23] "POST /v2.0/ports.json HTTP/1.1" 201 1047 1.049874
So either nova or haproxy sends the request twice.
I took a look at the env, debug wasn't universally enabled to it was of limited use. I'll try to use the sosreports from bz 1390199 to figure it out. Comment 5 in that bz  is suggesting that this happens if an instance is rescheduled - this would make it the same as upstream bug 1609526 . If that's indeed what's happening we'll move the bug to opentack-nova and take care of the fix.
I'm flipping this to nova to triage as the symptoms are the same as in upstream bug from comment 4 - https://bugs.launchpad.net/nova/+bug/1639230
But it should be fixed in Newton while this is OSP 11 (Ocata).
This can't be bug 1639230 , because that's been fixed in Ocata (OSP11) and backported to Newton (OSP10).
I took a look at the sosreports in case 01927760. Based on comment #6, I looked for instance 35bed9a2-071b-472d-a3e8-cfb6c823842a in order to trace its creation from nova API accepting the boot request to the instance running on the compute. I found the following line in the compute log (slightly formatted for legibility):
$ grep 2>/dev/null -I 35bed9a2-071b-472d-a3e8-cfb6c823842a.*build -R .
2017-09-08 12:17:39.716 21691 INFO nova.compute.manager
1a4003cd5dd5425799a578968c2b7c1a - - -]
Took 4.79 seconds to build instance.
So it looks like the request ID that we need to trace is req-d532789b-4b1e-4b55-b885-391a055d1c6a. Now the confusion starts. In the conductor logs, that request ID is associated with the following instance IDs:
However, in the compute logs:
* Instance 72fdcf4e-7bd9-4b46-97dd-225c27ff31c4 does not appear for that
* Instance 0bb50738-1d96-43c3-a09c-5656d5859ab0 does appear, thankfully
* Other instance IDs appear for that request ID:
I'm guessing the most reasonable explanation for all this is that:
1. The boot request that created instance 35bed9a2-071b-472d-a3e8-cfb6c823842a
was for multiple instances
2. Some sosreports from other computes and controllers are missing.
For point 1, can I ask you to do some testing and rule out whether this bug needs a multi-VM boot request to be reproduced? In other words, can this bug be reproduced by booting a single VM at a time?
For point 2, I would like to have sosreports from all nodes.
If either point 1 or 2 are wrong - in other words, if the new instance IDs in nova-compute.log are not from a different controller whose sosreports I don't have - then I would think about blaming rabbitmq, perhaps because the compute service is getting instance build requests replayed more than once for a given instance.
Before we go there however, I would like to check points 1 and 2 with you.