Bug 1335277

Summary: dhcp_release isn't run on the originating compute when live migrating a VM from computeA to computeB
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: openstack-nova Assignee: Artom Lifshitz <alifshit>
Status: CLOSED WONTFIX QA Contact: Prasanth Anbalagan <panbalag>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.0 (RHEL 7) CC: akaris, alifshit, awaugama, berrange, byount, dasmith, dhill, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: --- Keywords: ZStream
Target Release: 5.0 (RHEL 7)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1371636 1371664 1371673 1371699 Environment:
Last Closed: 2017-02-03 16:38:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1371636, 1371664, 1371673, 1371699    

Description David Hill 2016-05-11 18:27:14 UTC
Description of problem:
dhcp_release isn't run on the originating compute when live migrating a VM from computeA to computeB. If dhcp_lease_time is set to a high value, the lease will not expire for a long time, so dnsmasq on the original compute retains the DHCP lease and fails to re-allocate the same IP on the originating compute once the original VM is destroyed.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1) launch a VM with a known IP on node 1
2) live-migrate the VM away to node 2
3) delete the VM while it's still on node 2
4) launch a new VM on node 1 using the same IP

Node 1 refuses to give the IP to the new VM, thinking that the IP is owned by the old VM.

It's a live-migration bug. After a VM is live-migrated to node 2, the lease cache on node 1 is not cleared.

This is usually not an issue with the default 2-min dhcp lease time. But in this environment, dhcp_lease_time is set to 604800s (or 7 days).
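
To illustrate why a long lease time matters here, the following is a minimal sketch, not taken from Nova. It assumes lease records in dnsmasq's lease-file line format (expiry epoch, MAC, IP, hostname, client ID) and builds the `dhcp_release <interface> <address> <hwaddr>` command that would clear a still-valid lease for a given IP. The sample lease line, the MAC address, the interface name `br100`, and the helper name `stale_release_commands` are all hypothetical.

```python
LEASE_TIME = 604800  # dhcp_lease_time from this report, in seconds


def stale_release_commands(lease_text, ip, now, interface="br100"):
    """Return dhcp_release argv lists for unexpired leases held on `ip`.

    `interface` is a placeholder; use the bridge dnsmasq actually serves.
    """
    commands = []
    for line in lease_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a regular lease record.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        expiry, mac, address = int(fields[0]), fields[1], fields[2]
        if address == ip and expiry > now:
            commands.append(["dhcp_release", interface, address, mac])
    return commands


# Hypothetical lease left behind on node 1 after the VM migrated away.
SAMPLE = "1463595434 fa:16:3e:aa:bb:cc 10.0.0.5 vm-old *\n"

# One hour after the lease was granted it still has almost 7 days to run.
now = 1463595434 - LEASE_TIME + 3600
cmds = stale_release_commands(SAMPLE, "10.0.0.5", now)

print(LEASE_TIME // 86400, "days")
for argv in cmds:
    print(" ".join(argv))
```

With a 2-minute lease the same record would age out almost immediately, which is consistent with the problem only showing up when dhcp_lease_time is large.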

Actual results:
IP is still in the lease cache of dnsmasq

Expected results:
IP should be free

Additional info:

Comment 1 Artom Lifshitz 2016-05-24 23:10:23 UTC
Hello,

First of all, just to make sure, can we explicitly confirm that nova-network is in use here and not Neutron?

I've reproduced what I think is the same behaviour in Nova upstream master. 

1. Boot an instance
2. Live-migrate it
3. Delete it
4. Boot another instance with the same IP

This fails with "Fixed IP address is already in use on instance"

As a control, I tried:

1. Boot an instance
2. Delete it
3. Boot another instance with the same IP

This succeeds.

However, I'm not sure it has anything to do with the DHCP lease not being released. Rather, it seems live-migrating an instance somehow causes its fixed IPs to remain associated with the instance in the database even after the instance itself has been deleted. To confirm this, would it be possible to attach sosreports to this BZ?
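
A minimal sketch of the kind of check this hypothesis suggests, using an in-memory SQLite database. The two-table schema is a toy modeled loosely on Nova's `instances` and `fixed_ips` tables; the column names, the sample rows, and the helper `orphaned_fixed_ips` are assumptions for illustration, not the real Nova schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER);
CREATE TABLE fixed_ips (
    address TEXT, instance_uuid TEXT, allocated INTEGER, deleted INTEGER
);
-- Hypothetical state: the migrated instance was deleted but kept its IP,
-- while the control instance was deleted and its IP was disassociated.
INSERT INTO instances VALUES ('uuid-migrated', 1), ('uuid-control', 1);
INSERT INTO fixed_ips VALUES ('10.0.0.5', 'uuid-migrated', 1, 0);
INSERT INTO fixed_ips VALUES ('10.0.0.6', NULL, 0, 0);
""")


def orphaned_fixed_ips(conn):
    """List live fixed IPs that still point at a deleted instance."""
    return conn.execute("""
        SELECT f.address, f.instance_uuid
        FROM fixed_ips AS f
        JOIN instances AS i ON i.uuid = f.instance_uuid
        WHERE i.deleted != 0 AND f.deleted = 0
        ORDER BY f.address
    """).fetchall()


orphans = orphaned_fixed_ips(conn)
print(orphans)
```

Any row returned by such a query would match the "Fixed IP address is already in use on instance" failure seen in the reproduction, independently of what dnsmasq holds in its lease cache.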

If we confirm that I've indeed observed the same behaviour in Nova master as you're seeing in RHOS 5, I'll need to submit an upstream bugfix and then do a downstream-only backport to RHOS 5, as Icehouse is no longer supported upstream.

Cheers!

Comment 2 David Hill 2016-05-24 23:13:21 UTC
Hello sir,

   I can confirm it is openstack-nova-network that is being used and that killing dnsmasq and restarting nova-network solves this issue.

Thank you very much,

David Hill