Bug 1177139

Summary: deleting a host doesn't remove its entry in dhcp.released which might cause problems with provisioning
Product: Red Hat Satellite
Reporter: sefi litmanovich <slitmano>
Component: Other
Assignee: Ohad Levy <ohadlevy>
Status: CLOSED WONTFIX
QA Contact: Katello QA List <katello-qa-list>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.0.6
CC: bkearney, cwelton, katello-qa-list, lzap, mmccune
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-02-17 14:17:01 UTC
Type: Bug
Attachments:
proxy.log + production.log + messages log (flags: none)
engine log (flags: none)

Description sefi litmanovich 2014-12-24 12:36:31 UTC
Created attachment 972736 [details]
proxy.log + production.log + messages log

Description of problem:

When a host is discovered by a Satellite that also acts as the DHCP server, an entry is written to /var/lib/dhcpd/dhcpd.leases to reserve an IP for the host.
I deleted the host in order to rediscover it and install it on rhevm (as part of the rhevm foreman integration plugin).
When I then tried to add the host on rhevm, I got an error:
Failed to add Host <UNKNOWN>.

Looking at the production log on Satellite, I can see:

Unprocessable entity Host::Discovered (id: 13):
  Name has already been taken
  IP address has already been taken

  Rendered api/v2/errors/unprocessable_entity.json.rabl within api/v2/layouts/error_layout (1.6ms)
Body: {
  "error": {"id":13,"errors":{"name":["has already been taken"],"ip":["has already been taken"]},"full_messages":["Name has already been taken","IP address has already been taken"]}
}
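
The error body logged above is plain JSON, so the conflicting attributes can be extracted programmatically when triaging. The following is a minimal sketch, assuming the body keeps the shape shown in the log; the function name `conflicting_fields` is made up for illustration and is not part of any Foreman or Satellite API.

```python
import json

# Error body as it appears in production.log (wrapped line rejoined).
BODY = """{
  "error": {"id":13,"errors":{"name":["has already been taken"],"ip":["has already been taken"]},"full_messages":["Name has already been taken","IP address has already been taken"]}
}"""


def conflicting_fields(body):
    """Return the sorted attribute names reported as 'has already been taken'."""
    errors = json.loads(body)["error"]["errors"]
    return sorted(
        field for field, messages in errors.items()
        if "has already been taken" in messages
    )


if __name__ == "__main__":
    print(conflicting_fields(BODY))
```

Here both `name` and `ip` are reported as taken, which points to an existing host record in Satellite still holding the same name and IP address.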

Deleting the host's entry from /var/lib/dhcpd/dhcpd.leases and restarting dhcpd worked as a workaround; after that, installation proceeded in rhevm.
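
The manual workaround (removing the host's entry from dhcpd.leases) could be scripted roughly as follows. This is a minimal sketch, not a supported tool. It assumes the usual ISC dhcpd layout of `lease ... { ... }` and `host ... { ... }` blocks that contain a `hardware ethernet` statement. The sample lease text and the function name `remove_entries_for_mac` are illustrative only, and the second MAC address is invented. The function works on a string; writing the result back to the leases file and restarting dhcpd is left to the operator.

```python
import re

# Illustrative lease data, not taken from the attached logs.
SAMPLE = """\
# sample dhcpd.leases content
lease 192.168.200.13 {
  starts 4 2015/01/08 13:13:17;
  hardware ethernet d4:ae:52:c6:1a:0e;
}
lease 192.168.200.20 {
  hardware ethernet 00:11:22:33:44:55;
}
host macd4ae52c61a0e.sefi.com {
  dynamic;
  hardware ethernet d4:ae:52:c6:1a:0e;
  fixed-address 192.168.200.13;
}
"""


def remove_entries_for_mac(leases_text, mac):
    """Return leases_text without top-level blocks that reference `mac`."""
    needle = re.compile(
        r"hardware\s+ethernet\s+" + re.escape(mac.lower()) + r"\s*;"
    )
    kept = []    # text that survives
    block = []   # lines of the block currently being read
    depth = 0    # current brace nesting depth
    for line in leases_text.splitlines(keepends=True):
        if depth == 0 and "{" not in line:
            kept.append(line)  # top-level line outside any block
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            text = "".join(block)
            if not needle.search(text.lower()):
                kept.append(text)  # block belongs to another host
            block = []
    kept.extend(block)  # unterminated trailing block: keep unchanged
    return "".join(kept)


if __name__ == "__main__":
    print(remove_entries_for_mac(SAMPLE, "d4:ae:52:c6:1a:0e"))
```

The brace counting is deliberately naive: it does not handle braces inside quoted strings, so the output should be reviewed before it replaces a real leases file.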

This seems to be related to a bug opened on RHOS side:

https://bugzilla.redhat.com/show_bug.cgi?id=1107681 

Version-Release number of selected component (if applicable):

satellite:

Satellite 6.0.6
foreman-1.6.0.49-1.el6sat.noarch
foreman-proxy-1.6.0.30-1.el6sat.noarch

rhevm:

rhevm-3.5.0-0.26.el6ev.noarch


How reproducible:

always

Steps to Reproduce:
1. Have Satellite act as DHCP/DNS/TFTP on a private network with some hosts.
2. Install the foreman-discovery plugin and the Ovirt_provisioning plugin.
3. Reboot a host in the network and choose foreman_discovery in the PXE boot menu.
4. Verify that Satellite discovered the host.
5. Add Satellite as an external provider to rhevm on the same network.
6. Add host: choose external providers -> discovered hosts, then choose the host and a working host group to configure for it.
7. Click OK.

Actual results:

ERROR add host <UNKNOWN> failed.

Expected results:

The host is provisioned on Satellite (according to ovirt_provisioning_plugin).
Upon successful provisioning, the host is installed in rhevm.

Additional info:

Comment 1 sefi litmanovich 2014-12-24 12:37:19 UTC
Created attachment 972737 [details]
engine log

Comment 2 RHEL Program Management 2014-12-24 12:54:05 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 4 sefi litmanovich 2015-01-08 13:37:55 UTC
After retrying this flow several times with the workaround, the workaround no longer works when I try to install rhev-h.
After I rediscover the host, I make sure it is deleted from Satellite's host list but appears in the discovered hosts list. I also delete the entry from dhcpd.leases, restart the dhcpd service, and then recheck that no entry was rewritten into the file. Even after all these steps, I get a different error:

Failed to reboot: ERF12-1772 [ProxyAPI::ProxyException]: Unable to perform power BMC operation ([Errno::ECONNREFUSED]: Connection refused - connect(2)) for proxy http://{host_ip}:8443/bmc
Rolling back due to a problem: [Rebooting {host_fqdn}       10000   failed  [#<Host::Managed id: 48, name: "macd4ae52c61a0e.sefi.com", ip: "192.168.200.13", last_compile: nil, last_freshcheck: nil, last_report: "2015-01-08 13:13:17", updated_at: "2015-01-08 13:16:22", source_file_id: nil, created_at: "2015-01-08 13:13:17", mac: "d4:ae:52:c6:1a:0e", root_pass: "$1$W3YjO2Ek$vGJO/FY7.lde0DE5hH.2N1", serial: nil, puppet_status: 0, domain_id: 2, architecture_id: 1, operatingsystem_id: 5, environment_id: 1, subnet_id: 1, ptable_id: 7, medium_id: 10, build: true, comment: nil, disk: nil, installed_at: nil, model_id: 1, hostgroup_id: 4, owner_id: 3, owner_type: "User", enabled: true, puppet_ca_proxy_id: 1, managed: true, use_image: nil, image_file: nil, uuid: nil, compute_resource_id: nil, puppet_proxy_id: 1, certname: nil, image_id: nil, organization_id: 1, location_id: 2, type: "Host::Managed", otp: nil, realm_id: nil, compute_profile_id: nil, provision_method: nil, content_source_id: 1>, :setReboot]]
ActiveRecord::Rollback
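
The `Connection refused - connect(2)` part of the error means nothing accepted a TCP connection on the proxy address at the time of the call. A quick way to confirm this from the Satellite machine is a plain TCP connect test. The sketch below is only an illustration: the helper name `port_is_open` is made up, and the host and port must be filled in by the operator (the log shows port 8443 with the host IP redacted).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with the address from the error message.
    print(port_is_open("127.0.0.1", 8443))
```

A `False` result only shows that the port is unreachable; it does not say why the BMC call was directed at that address in the first place.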

I checked it out and this seems similar to this upstream foreman issue:
http://projects.theforeman.org/issues/7539

Is there already a satellite bz for this?

Comment 5 Bryan Kearney 2015-01-09 16:45:20 UTC
There is no downstream bug for 7539. We will investigate this and link them if necessary.

Comment 6 Ohad Levy 2015-02-17 13:48:28 UTC
There is no connection between the initial errors you are getting (duplicate ip/mac) and the dhcp lease configuration.

The dhcp lease configs are flagged as disabled, so that should not make any difference.

Regarding the upstream bug: I assume this bug is invalid now that we moved away from ovirt-node?

Comment 7 Lukas Zapletal 2015-02-17 14:17:01 UTC
I am afraid ovirt-provisioning-plugin is not supported. Please reproduce with a clean Satellite 6 installation (or, better, with the upstream version) and report.