Bug 1554597
| Summary: | Provisioning a host with a lease (discovered host) leads to Unable to set DHCP entry (409 Conflict) | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Sandeep MJ <sjayapra> |
| Component: | Provisioning | Assignee: | Lukas Zapletal <lzap> |
| Status: | CLOSED UPSTREAM | QA Contact: | Roman Plevka <rplevka> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.3.0 | CC: | ajambhul, aperotti, bbuckingham, bkearney, fgarciad, gapatil, hprakash, hshukla, inecas, jalviso, ktordeur, logank, lzap, mhulan, mina.asaad, mmccune, rchauhan, vgunasek, vijsingh, vvasilev |
| Target Milestone: | Unspecified | Keywords: | Regression, Triaged |
| Target Release: | Unused | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-03-08 13:32:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Sandeep MJ
2018-03-13 01:06:11 UTC
I can confirm the analysis is correct, good work. The issue http://projects.theforeman.org/issues/19634 should be the root cause of the problem, except that it was merged not in 1.15 but in 1.17 due to a long review. There are TWO CHANGES associated with this problem:

https://github.com/theforeman/foreman/pull/4555/files
https://github.com/theforeman/smart-proxy/pull/532/files

We need to backport both; they are reasonable changes for the 6.3 z-stream, and I consider this an important bug to fix in 6.3.

Affected: all customers.

WORKAROUND: Delete the host and create a new one.

Connecting redmine issue http://projects.theforeman.org/issues/19634 from this bug

Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/19634 has been resolved.

REL-ENG: Note there is a smart-proxy patch needed for the core part to work: https://github.com/theforeman/smart-proxy/pull/532

Any chance I could get the RPMs with the fix once they are available? I'm testing Satellite for deployment at my company and this is a rather annoying bug to deal with every time I rebuild a system. Support case: 02153260

Hello, we are currently increasing the priority of this ticket and we will be able to create a hotfix once it's aligned into a z-stream release. Thanks

After another round of investigation I've found that this particular BZ is fixed for both 6.4.0 (GA) and 6.3.0 (GA), as we rebased on Foreman 1.15.6.2 before the release of 6.3.0. Therefore, if you see 409 DHCP conflicts, it is likely for a different reason. Please investigate and provide more details on how to reproduce, or provide an instance with a reproducer.

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/19634 has been resolved.

Steps to reproduce:
1. Create Host in Satellite
2. Fill out all the relevant fields
3. Build the host
4. Install it via PXE (probably not required, but this is what I've been doing)
5. Once it's built
6. Click Build again in Satellite on that host.
7. DHCP error 409 pops up every time.
8. To fix it, stop dhcpd, remove the host's entry in /var/lib/dhcpd/dhcpd.leases and restart dhcpd, and then you can rebuild the host.

Error:
Failed to enable logank-test1.wolfram.com for installation: ["Create DHCP Settings for logank-test1.wolfram.com task failed with the following error: ERF12-6899 [ProxyAPI::ProxyException]: Unable to set DHCP entry ([RestClient::Conflict]: 409 Conflict) for Capsule https://satellite-tst2.wolfram.com:9090/dhcp", "Failed to perform rollback on Remove DHCP Settings for logank-test1.wolfram.com - ERF12-6899 [ProxyAPI::ProxyException]: Unable to set DHCP entry ([RestClient::Conflict]: 409 Conflict) for Capsule https://satellite-tst2.wolfram.com:9090/dhcp"]

Let me know what other information you want me to provide... this is the same information that I provided initially on my support ticket (02153260), though.

In case it helps, these are the versions we're using:

rpm -qa | grep -i satellite
satellite-tst2.wolfram.com-foreman-proxy-client-1.0-1.noarch
satellite-6.3.5-1.el7sat.noarch
tfm-rubygem-foreman_theme_satellite-1.0.4.19-1.el7sat.noarch
satellite-installer-6.3.0.12-1.el7sat.noarch
satellite-tst2.wolfram.com-qpid-router-client-1.0-1.noarch
satellite-tst2.wolfram.com-tomcat-1.0-1.noarch
satellite-tst2.wolfram.com-qpid-broker-1.0-2.noarch
satellite-tst2.wolfram.com-foreman-proxy-1.0-1.noarch
satellite-tst2.wolfram.com-puppet-client-1.0-1.noarch
satellite-tst2.wolfram.com-qpid-client-cert-1.0-1.noarch
satellite-common-6.3.5-1.el7sat.noarch
satellite-tst2.wolfram.com-apache-1.0-1.noarch
satellite-cli-6.3.5-1.el7sat.noarch
satellite-tst2.wolfram.com-foreman-client-1.0-1.noarch
satellite-tst2.wolfram.com-qpid-router-server-1.0-1.noarch

Logan, thanks for the detailed repro steps and helpful RPM list. However, I have just performed this and it works just fine. There must be something different on your end with configuration.
I tested with:

rpm -q satellite
satellite-6.3.5-1.el7sat.noarch

For the record, here is the proxy log from the moment when I hit the Build button again:

I, [2019-03-08T13:17:19.803213 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:19 +0000] "GET /tftp/serverName HTTP/1.1" 200 30 0.0008
I, [2019-03-08T13:17:19.862013 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:19 +0000] "GET /dhcp/192.168.199.0/mac/52:54:00:60:60:01 HTTP/1.1" 200 247 0.0007
I, [2019-03-08T13:17:19.922321 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:19 +0000] "GET /dhcp/192.168.199.0/ip/192.168.199.154 HTTP/1.1" 200 249 0.0010
I, [2019-03-08T13:17:19.979843 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:19 +0000] "GET /dhcp/192.168.199.0/mac/52:54:00:60:60:01 HTTP/1.1" 200 247 0.0010
I, [2019-03-08T13:17:20.221370 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:20 +0000] "GET /unattended/templateServer HTTP/1.1" 200 46 0.0003
I, [2019-03-08T13:17:20.289092 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:20 +0000] "POST /tftp/PXELinux/52:54:00:60:60:01 HTTP/1.1" 200 - 0.0016
I, [2019-03-08T13:17:20.327690 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:20 +0000] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0018
I, [2019-03-08T13:17:20.349147 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:17:20 +0000] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0023
E, [2019-03-08T13:18:00.993864 ] ERROR -- : Attempt to remove nonexistent client certificate for boyd-boteilho.nat.lan
I, [2019-03-08T13:18:00.994337 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:18:00 +0000] "DELETE /puppet/ca/boyd-boteilho.nat.lan HTTP/1.1" 404 74 1.2325
I, [2019-03-08T13:18:01.055048 ] INFO -- : 127.0.0.1 - - [08/Mar/2019:13:18:01 +0000] "POST /puppet/ca/autosign/boyd-boteilho.nat.lan HTTP/1.1" 200 - 0.0009
I, [2019-03-08T13:18:01.777388 ] INFO -- : 192.168.199.154 - - [08/Mar/2019:13:18:01 +0000] "GET /unattended/provision?token=d4142f85-1ff4-408d-b4fd-ae6493cb2fc1 HTTP/1.1" 200 4748 2.0773

Here is the Satellite log:

2019-03-08 13:17:19 19a447a8 [app] [I] Started PUT "/hosts/boyd-boteilho.nat.lan/setBuild?auth_object=boyd-boteilho.nat.lan&permission=build_hosts" for 192.168.199.1 at 2019-03-08 13:17:19 +0000
2019-03-08 13:17:19 19a447a8 [app] [I] Processing by HostsController#setBuild as HTML
2019-03-08 13:17:19 19a447a8 [app] [I] Parameters: {"utf8"=>"✓", "authenticity_token"=>"xxx", "commit"=>"Build", "auth_object"=>"boyd-boteilho.nat.lan", "permission"=>"build_hosts", "id"=>"boyd-boteilho.nat.lan"}
2019-03-08 13:17:19 19a447a8 [app] [I] Current user: admin (administrator)
2019-03-08 13:17:19 19a447a8 [app] [I] Expire fragment views/tabs_and_title_records-3 (0.1ms)
2019-03-08 13:17:19 19a447a8 [app] [I] Fetching DHCP reservation boyd-boteilho.nat.lan for boyd-boteilho.nat.lan-52:54:00:60:60:01/192.168.199.154
2019-03-08 13:17:20 19a447a8 [templates] [I] Rendering template 'Kickstart default PXELinux'
2019-03-08 13:17:20 19a447a8 [app] [I] Redirected to https://sat63.nat.lan/hosts/boyd-boteilho.nat.lan
2019-03-08 13:17:20 19a447a8 [app] [I] Completed 302 Found in 778ms (ActiveRecord: 79.1ms)

As you can see, in my case the DHCP record is (correctly) not being orchestrated at all. When a user hits the Build button, there is no need to rebuild DHCP because it's expected that the host can keep its reserved IP address; therefore it only checks whether the reservation is present (Fetching DHCP reservation) and carries on.

Now, in my case I have provisioned my host in a subnet managed by Satellite, therefore it was assigned and reserved an IP address: 192.168.199.154. This IP address hasn't changed, so everything works fine. Note that since the server is using a reserved IP, there is no need for a lease, therefore no lease exists to cause a conflict. However, there might be issues when the IP or MAC address changes over time, either via Satellite or manually on that host. Let me know if you have performed anything like that.
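To check whether a dynamic lease exists alongside a reservation, one option is to look the MAC address up in the leases file. The following is only a minimal sketch, assuming the usual ISC dhcpd.leases block syntax with UTC timestamps and no nested braces inside a lease block; the function name, the sample data and the 192.168.199.50/51 addresses are hypothetical and not taken from either system in this thread.

```python
import re
from datetime import datetime, timezone

# One "lease <ip> { ... }" block; assumes no nested braces inside the block.
LEASE_BLOCK = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
ENDS = re.compile(r"ends\s+\d+\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2});")
MAC = re.compile(r"hardware ethernet\s+([0-9A-Fa-f:]+);")
# Anchored at line start so "next binding state ..." is not picked up.
STATE = re.compile(r"^\s*binding state\s+(\w+);", re.MULTILINE)

# Hypothetical sample data, for illustration only.
SAMPLE_LEASES = """
lease 192.168.199.50 {
  starts 5 2019/03/08 12:00:00;
  ends 6 2019/03/09 00:00:00;
  binding state active;
  next binding state free;
  hardware ethernet 52:54:00:60:60:01;
}
lease 192.168.199.51 {
  starts 4 2019/03/07 00:00:00;
  ends 4 2019/03/07 12:00:00;
  binding state free;
  hardware ethernet 52:54:00:60:60:02;
}
"""


def active_leases_for_mac(leases_text, mac, now):
    """Return IPs whose latest lease entry is active, unexpired and owned by `mac`."""
    latest = {}
    for ip, body in LEASE_BLOCK.findall(leases_text):
        latest[ip] = body  # the file is append-only, so later entries supersede earlier ones

    found = []
    for ip, body in latest.items():
        mac_match = MAC.search(body)
        if mac_match is None or mac_match.group(1).lower() != mac.lower():
            continue
        state = STATE.search(body)
        if state is None or state.group(1) != "active":
            continue
        ends = ENDS.search(body)
        if ends is not None:
            end_time = datetime.strptime(ends.group(1), "%Y/%m/%d %H:%M:%S")
            if end_time.replace(tzinfo=timezone.utc) <= now:
                continue  # lease already expired
        found.append(ip)
    return found
```

Calling `active_leases_for_mac(SAMPLE_LEASES, "52:54:00:60:60:01", now)` with a `now` inside the lease window returns the leased address; an empty list means no live lease is left for that MAC.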
Having said that, the scenario you have described is not this bug; this one tracks problems when a host is discovered, a lease is created, and the host is then provisioned. This one was confirmed, fixed and verified. Let's work together in this BZ to identify the problem and we can create a new BZ if needed.

Right, I mean if you have a DHCP reservation set you wouldn't expect there to be a lease to conflict, but if you remove that reservation and it's assigned a lease in the normal fashion of a DHCP request, are you able to replicate the issue? Our setup does not have anything specific configured with DHCP; when it PXE boots, it requests a lease from the server. Once the server finishes installing, if I try to rebuild it, the lease is still there and Satellite can't delete it, producing the 409 error.

I understand. I'm 100% fine moving this discussion to a new BZ if that is preferable; I just need this fixed so that I can put this server in production.

So before I try, let me sum it up:
- install Satellite 6.3
- set up provisioning for PXE
- make sure that the default lease time is longer than the Anaconda installation time (lease time defaults to 10 minutes, so I'd probably need to extend that to 2 hours)
- PXE install a host
- once it completes, ensure the lease for the MAC address is still valid
- click on the Build button
- Satellite throws a DHCP conflict

Once you confirm, I will try to reproduce. Please review and provide me any other details, because I am pretty sure this is a pretty normal workflow which is tested by the Satellite QA department for each release and it will work; a longer lease time might be a deviation from our standard configuration, though.

I'll be honest, I'm surprised it's a bug that exists, but here we are... I'll be thrilled to find out it's a configuration problem, but there really hasn't been a lot of configuration to the server.
Your summary is correct; my DHCP lease time is set to what I think is the default, "default-lease-time 43200". If need be, I can do a screen recording of what I do and upload that as well.

Ok I am reproducing now.

Let's take the conversation back to the case, this BZ is not it.
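The lease-time condition in the reproducer summary comes down to simple arithmetic. A minimal sketch, assuming the lease is granted at PXE boot and not renewed during installation; the function name and the 30-minute installation time are hypothetical, while 43200 and 600 seconds are the values mentioned above (12 hours and 10 minutes).

```python
from datetime import datetime, timedelta


def lease_outlives_install(lease_start, lease_seconds, install_duration):
    """True if a lease granted at lease_start is still valid when the install ends."""
    lease_end = lease_start + timedelta(seconds=lease_seconds)
    install_end = lease_start + install_duration
    return lease_end > install_end


# Hypothetical timing: PXE boot at 13:00, installation takes 30 minutes.
boot = datetime(2019, 3, 8, 13, 0, 0)
install_time = timedelta(minutes=30)

long_lease = lease_outlives_install(boot, 43200, install_time)  # 12-hour lease
short_lease = lease_outlives_install(boot, 600, install_time)   # 10-minute lease
```

Under these assumptions the 12-hour lease is still valid when Build is clicked again, while the 10-minute lease has already expired, which would explain why the two setups behave differently.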