Bug 1459644

Summary: Getting DHCP conflicts while creating multiple hosts at same time via REST API
Product: Red Hat Satellite Reporter: Nithin Thomas <nitthoma>
Component: DHCP & DNSAssignee: Dmitri Dolguikh <ddolguik>
Status: CLOSED ERRATA QA Contact: Roman Plevka <rplevka>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2.9CC: bbuckingham, bkearney, ddolguik, dmitri, egolov, ehelms, inecas, jcallaha, jcoopman, lzap, mhulan, nitthoma, pcreech, rplevka, satellite6-bugs, tstrachota
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: foreman-proxy-1.15.6.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1472891 (view as bug list) Environment:
Last Closed: 2018-02-21 16:54:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1514885    
Bug Blocks: 1472891    

Description Nithin Thomas 2017-06-07 17:08:12 UTC
Description of problem:

Customer getting the following error while building multiple hosts at the same time:

Error Message:

"DHCP records xxxx31174.idm.xxx.xxx.com-14:02:ec:86:8d:0c/10.91.241.31 already exists"

Version-Release number of selected component (if applicable):

satellite 6.2

How reproducible:

Always

Steps to Reproduce:

1. Use REST API to create multiple hosts at same time
   HTTP  POST to https://satellite.server.com/api/v2/hosts


Actual results:

Error Message:

"DHCP records xxxx31174.idm.xxx.xxx.com-14:02:ec:86:8d:0c/10.91.241.31 already exists"

Expected results:

Hosts created successfully.

Additional info:

Customer added a 5-second delay in the DHCP API call before it moves on to the build and that seems to be a workaround to the issue.

Comment 5 Lukas Zapletal 2017-06-10 10:08:17 UTC
Hello,

foreman-proxy process keeps track of unused IPs which were handled over to Satellite, this is kept in temp files:

cat /tmp/foreman-proxy_192.168.220.0_24.tmp
56

This mechanism works, but if there are not enough free IP addresses it will cycle in DHCP range configured in dhcpd.conf for that subnet which can lead to conflicts.

Make sure you have enough free IPs in the give range. Check the temp file. I am unable to reproduce, can you attach proxy.log with debug turned on of a whole transaction which leads to conflict? Also attach information about which IP address caused the conflict, so we can track how free IPs were searched.

In general, to prevent this error make sure you use B or A class subnets with many free IPs for this kind of mass-creation.

Comment 11 Dmitri Dolguikh 2017-06-30 10:37:21 UTC
Created an upstream issue: http://projects.theforeman.org/issues/20173

Comment 12 Lukas Zapletal 2017-06-30 11:36:05 UTC
Ok Dmitri found the race condition in the code, I thought Webrick is single threaded but that's not the case anymore. We have a patch, see the PR above.

Since there is an easy workaround to slower down queries, I am setting this as 6.3 bug. If this needs to be backported create new BZ with 6.2.z flag.

Comment 13 Ivan Necas 2017-08-08 09:10:10 UTC
Connecting redmine issue http://projects.theforeman.org/issues/20173 from this bug

Comment 14 Lukas Zapletal 2017-08-08 14:20:48 UTC
Ivan, there are two issues actually. One in Proxy which was fixed by Dmitri, the other one in Foreman core which I am currently working on: 

http://projects.theforeman.org/issues/20474

Comment 16 Lukas Zapletal 2017-08-09 06:47:50 UTC
I plan to work on this, but haven't started yet. So perhaps NEW would be more appropriate, but I am not moving because the bot will change this anyway to POST.

That's right its two commits, both in proxy. There is another one in core but it is irrelevant (randomizing DB IPAM).

Comment 17 Satellite Program 2017-09-07 22:01:47 UTC
Upstream bug assigned to ddolguik

Comment 18 Satellite Program 2017-09-07 22:01:53 UTC
Upstream bug assigned to ddolguik

Comment 19 Satellite Program 2017-11-03 10:02:20 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20474 has been resolved.

Comment 20 Roman Plevka 2017-11-19 16:40:50 UTC
Failed QA:

I believe something went wrong with 
https://github.com/theforeman/smart-proxy/pull/543
as this is now causing a regression:


- the calls to proxy:9090/unused_ip made by satellite now exit with err 400:
wrong number of arguments (4 for 3)

(this happens on e.g. creating host)

The request looks like the following:
/dhcp/192.168.100.0/unused_ip?mac=de:ad:be:ef:fe:ed&from=192.168.100.10&to=192.168.100.254


the full traceback with the actual request and its parameters are:


D, [2017-11-19T11:26:35.038302 ] DEBUG -- : accept: 10.16.69.103:55582
D, [2017-11-19T11:26:35.039853 ] DEBUG -- : Rack::Handler::WEBrick is invoked.
D, [2017-11-19T11:26:35.041028 ] DEBUG -- : verifying remote client 10.16.69.103 against trusted_hosts ["blah.blah.com"]
E, [2017-11-19T11:26:35.041834 ] ERROR -- : wrong number of arguments (4 for 3)
D, [2017-11-19T11:26:35.041885 ] DEBUG -- : wrong number of arguments (4 for 3) (ArgumentError)
/usr/share/foreman-proxy/modules/dhcp_common/free_ips.rb:54:in `find_free_ip'
/usr/share/foreman-proxy/modules/dhcp_common/server.rb:99:in `unused_ip'
/usr/share/foreman-proxy/modules/dhcp/dhcp_api.rb:34:in `block in <class:DhcpApi>'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1611:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1611:in `block in compile!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:975:in `[]'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:975:in `block (3 levels) in route!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:994:in `route_eval'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:975:in `block (2 levels) in route!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1015:in `block in process_route'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1013:in `catch'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1013:in `process_route'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:973:in `block in route!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:972:in `each'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:972:in `route!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1085:in `block in dispatch!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `block in invoke'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `catch'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `invoke'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1082:in `dispatch!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:907:in `block in call!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `block in invoke'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `catch'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1067:in `invoke'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:907:in `call!'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:895:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/methodoverride.rb:22:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/commonlogger.rb:33:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:219:in `call'
/usr/share/foreman-proxy/lib/proxy/log.rb:109:in `call'
/usr/share/foreman-proxy/lib/proxy/request_id_middleware.rb:9:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in `call'
/usr/share/gems/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/nulllogger.rb:9:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/head.rb:13:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/show_exceptions.rb:25:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:182:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:2013:in `call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1487:in `block in call'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1787:in `synchronize'
/usr/share/gems/gems/sinatra-1.4.7/lib/sinatra/base.rb:1487:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:66:in `block in call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:50:in `each'
/usr/share/gems/gems/rack-1.6.4/lib/rack/urlmap.rb:50:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/builder.rb:153:in `call'
/usr/share/gems/gems/rack-1.6.4/lib/rack/handler/webrick.rb:88:in `service'
/usr/share/ruby/webrick/httpserver.rb:138:in `service'
/usr/share/ruby/webrick/httpserver.rb:94:in `run'
/usr/share/ruby/webrick/server.rb:295:in `block in start_thread'


I, [2017-11-19T11:26:35.042308 ]  INFO -- : 10.16.69.103 - - [19/Nov/2017:11:26:35 -0500] "GET /dhcp/192.168.100.0/unused_ip?mac=de:ad:be:ef:fe:ed&from=192.168.100.10&to=192.168.100.254 HTTP/1.1" 400 35 0.0015

D, [2017-11-19T11:26:35.082673 ] DEBUG -- : close: 10.16.69.103:55582

Comment 21 Ivan Necas 2017-11-20 14:00:49 UTC
Dmitri: this looks like a regression, that seems to be also in upstream, introduced by https://github.com/theforeman/smart-proxy/commit/54f44d807b8540fd3561a0d16ab68c68deb88c0f - different number of args for the find_free_ip method. Could you have a look?

Comment 22 Lukas Zapletal 2017-11-20 15:03:35 UTC
These two bugs are dupe, I will let Dmitri and QAs to decide which one to close as a dupe:

https://bugzilla.redhat.com/show_bug.cgi?id=1514885
https://bugzilla.redhat.com/show_bug.cgi?id=1459644

Comment 23 Dmitri Dolguikh 2017-11-20 20:38:40 UTC
This bug is separate from https://bugzilla.redhat.com/show_bug.cgi?id=1459644 AND is tracked separately upstream, I think this warrants a dedicated BZ.

Comment 24 Ivan Necas 2017-11-21 15:09:12 UTC
The fix for wrong number of arguments (4 for 3) is tracked under http://projects.theforeman.org/issues/21710/

Comment 25 Ivan Necas 2017-11-24 09:56:15 UTC
Putting to modified, as https://bugzilla.redhat.com/show_bug.cgi?id=1514885, that blocks this BZ was moved to Modified as well: once that one in ON_QA, we can move this one as well

Comment 27 Dmitri Dolguikh 2018-01-29 20:05:24 UTC
Smart-proxy randomizes ip addresses it returns when isc backend is used. Depending on the size of the pool it might be awhile before you see it again.

Comment 30 Dmitri Dolguikh 2018-02-06 18:25:59 UTC
The blackout interval was set to be sufficiently large for anyone to complete a manual host creation and such. It is a setting and can be adjusted in accordance with the pool size.

For a normal pool size (tens to hundreds of hosts) nothing needs to be adjusted. We have yet to hear any support requests related to the blackout interval too (although this feature is still new), which gives me some confidence in the 30min number.

Comment 31 Roman Plevka 2018-02-07 14:18:01 UTC
> For a normal pool size (tens to hundreds of hosts) nothing needs to be
> adjusted. We have yet to hear any support requests related to the blackout
> interval too (although this feature is still new), which gives me some
> confidence in the 30min number.

thanks for the info, Dmitri.

I'm putting this to VERIFIED now.
I did not hit the race condition after triggering simultaneous host.create requests.

Comment 32 Satellite Program 2018-02-21 16:54:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> 
> For information on the advisory, and where to find the updated files, follow the link below.
> 
> If the solution does not work for you, open a new bug report.
> 
> https://access.redhat.com/errata/RHSA-2018:0336