Bug 1417757

Summary: CF fails to provider discover RHV4.0
Product: Red Hat CloudForms Management Engine Reporter: Satoe Imaishi <simaishi>
Component: Providers Assignee: Juan Hernández <juan.hernandez>
Status: CLOSED ERRATA QA Contact: Ilanit Stein <istein>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.6.0 CC: cbudzilo, cpelland, istein, jfrey, jhardy, juan.hernandez, masayag, mbetak, mhild, obarenbo, oourfali, simaishi
Target Milestone: GA Keywords: ZStream
Target Release: 5.7.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: rhev:discovery
Fixed In Version: 5.7.2.0 Doc Type: Release Note
Doc Text:
This release corrects an issue with RHV server refusing to authenticate requests that use the IP address instead of the fully qualified host name. The RHV provider has been modified so that when it receives an IP address instead of a fully qualified host name, it will try to find the corresponding fully qualified host name, doing a reverse DNS lookup if required. If a user does not want to use DNS, the RHV server can be explicitly configured to accept IP addresses.
Story Points: ---
Clone Of: 1382732 Environment:
Last Closed: 2017-04-12 14:36:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: RHEVM Target Upstream Version:
Embargoed:
Bug Depends On: 1382732    
Bug Blocks:    

Comment 2 Ilanit Stein 2017-02-02 10:04:09 UTC
Tested on CFME-5.7.1.0 & RHV-4.0.5.

The RHV provider is indeed discovered now,
but after CFME discovers and adds the provider,
the provider refresh is not performed.

jhernand:
"This is because access with IP addresses doesn't
work in 4.0, it is a side effect of changes in the SSO service. 
This needs careful investigation, and we may need to do reverse lookups of
the addresses in order to find the host name. "

Oved,
How would you like to handle this please?

Comment 3 Oved Ourfali 2017-02-03 07:25:47 UTC
After discovery, is the provider added via IP?
It seems weird to me to have a provider defined via IP rather than via FQDN.

Comment 4 Ilanit Stein 2017-02-03 12:47:47 UTC
For the discovery, a range of IP addresses is provided.
After RHV is discovered, it is added with the name
"RHEV-M(<The ip address of the RHV>)",
and the hostname is <The ip address of the RHV>


Regardless of discovery, for RHV 3.6 or below it is possible in CFME to add an RHV provider using the IP address as the hostname, instead of the FQDN.

Comment 5 Oved Ourfali 2017-02-06 08:36:36 UTC
Juan - any thoughts on the complexity to get the FQDN?
Any issues there?
I guess there may be different DNS settings that might have issues with it.

Comment 6 Juan Hernández 2017-02-06 11:21:46 UTC
Getting the FQDN isn't complex, if we assume that the user has a working DNS setup. We know that this tends to be false. My suggestion is to try the reverse DNS lookup, but use the IP address anyhow if that fails. That is what the proposed patch does:

  Resolve oVirt IP addresses
  https://github.com/ManageIQ/manageiq/pull/13767
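
To make the suggestion above concrete, here is a minimal Ruby sketch of "try the reverse DNS lookup, but use the IP address anyhow if that fails". It is not the code from the pull request; the helper names `ip_address?` and `resolve_engine_host` are hypothetical, and only the standard library `IPAddr` and `Resolv` classes are used.

```ruby
require 'ipaddr'
require 'resolv'

# Returns true when the string parses as an IPv4 or IPv6 address.
def ip_address?(value)
  IPAddr.new(value)
  true
rescue ArgumentError # IPAddr::InvalidAddressError is a subclass of ArgumentError
  false
end

# Hypothetical helper: try a reverse DNS lookup and fall back to the input.
# `resolver` only needs to respond to getname(address), as Resolv does.
def resolve_engine_host(host, resolver: Resolv)
  return host unless ip_address?(host) # already a name, nothing to resolve
  resolver.getname(host)
rescue Resolv::ResolvError
  host # no reverse record or no working DNS: keep using the IP address
end
```

Passing the resolver as a parameter is only a convenience for trying the fallback path without a working DNS setup.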

As there may be cases where the user really wants to use the IP address, the proposed patch also adds a configuration parameter to disable this reverse resolving:

  :ems:
    :ems_redhat:
      :resolve_ip_addresses: true
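
As an illustration of how such a switch could be consulted, the following sketch reads the nested key from a plain Ruby hash. The `SETTINGS` constant and the `resolve_ip_addresses?` helper are hypothetical, and treating a missing key as "enabled" is an assumption, not something stated in the patch.

```ruby
# Hypothetical in-memory copy of the settings fragment shown above.
SETTINGS = { :ems => { :ems_redhat => { :resolve_ip_addresses => true } } }.freeze

# Returns whether reverse resolution is enabled.
# Assumption: a missing key is treated as enabled.
def resolve_ip_addresses?(settings)
  value = settings.dig(:ems, :ems_redhat, :resolve_ip_addresses)
  value.nil? ? true : value
end
```

Setting the value to false would then keep the IP address exactly as the user entered it.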

Comment 7 Oved Ourfali 2017-02-06 11:38:02 UTC
Makes sense.

Comment 10 Juan Hernández 2017-02-20 08:43:34 UTC
The pull request is this one:

  Resolve oVirt IP addresses
  https://github.com/ManageIQ/manageiq/pull/13767

It is merged upstream and marked to be backported with the 'euwe/yes' label.

Comment 14 Juan Hernández 2017-03-27 11:13:12 UTC
Ilanit, I believe that the message that you see in the log:

  [ActiveRecord::RecordInvalid]: Validation failed: Host Name has already been taken  Method:[rescue in block in refresh]

is caused by ManageIQ validating that the 'name' attribute of the 'Host' entity is unique:

  https://github.com/ManageIQ/manageiq/blob/euwe-2/app/models/host.rb#L42

If I understand that correctly then this may happen if you have the same oVirt environment added as provider twice, maybe once with the IP address of the engine and another time with the fully qualified host name of the engine. Do you have that?

It may also happen if you have different oVirt environments added as providers, and they happen to have different hosts that have the same name. For example, I can have a host named 'myhost' in one oVirt engine, and another host also named 'myhost' in a different oVirt engine. If you add those two oVirt environments as providers to ManageIQ, then you will see this problem when trying to save the inventory. Do you have such a configuration? If this is the root cause of the problem, then I'd say it is a different bug, one which will happen with or without discovery.
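
The second scenario can be made concrete with a small plain-Ruby sketch that does not use ActiveRecord. The provider names and the `conflicting_host_names` helper are hypothetical; it only shows how the same host name reported by two engines would collide under a global uniqueness rule.

```ruby
# Hypothetical inventories: provider name => host names reported by that engine.
EXAMPLE_INVENTORIES = {
  "engine-a" => ["myhost", "node1"],
  "engine-b" => ["myhost", "node2"]
}.freeze

# Returns the host names that appear under more than one provider,
# mapped to the providers that report them.
def conflicting_host_names(inventories)
  owners = Hash.new { |hash, name| hash[name] = [] }
  inventories.each do |provider, hosts|
    hosts.uniq.each { |name| owners[name] << provider }
  end
  owners.select { |_name, providers| providers.size > 1 }
end
```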

Marcel, can you confirm/reject the above hypothesis?

Comment 15 Ilanit Stein 2017-03-27 12:36:14 UTC
Thanks Juan for the explanation.

Adding the same RHV env a second time, using its IP address, to a CFME that is already connected to that RHV env fails the refresh in the same way,
and thus the problem mentioned in comment #13 is indeed unrelated to the Provider Discovery.

Thus moving bug to Verified.

Opened this bug, so that the case described in comment #13 produces no error:
bug 1436199

Comment 16 Marcel Hild 2017-03-27 15:27:02 UTC
Juan and Ilanit, that is right. A host.name has to be unique across the cfme db.

I'm investigating if this is still a valid assessment

Comment 17 Marcel Hild 2017-03-27 15:43:35 UTC
Juan, actually this exception with a hostname already taken should not be raised. 

https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_infra.rb#L179-L184

I think the reason it has to be unique across EMSs is that we use it to "steal" archived hosts from old EMSs

https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_infra.rb#L149

and

https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_infra.rb#L376-L384


Could you revisit the backtrace in that light?
Maybe it's still related to connecting to the same env twice

Comment 18 Juan Hernández 2017-03-28 09:01:51 UTC
Marcel, according to the backtrace the exception happens here, when calling ems.save!:

  https://github.com/ManageIQ/manageiq/blob/euwe-2/app/models/ems_refresh/save_inventory_infra.rb#L74

And that happens after calling the 'save_hosts_inventory' method, which is the place where the exception is caught and handled.

So, if I understand correctly, this happens when ActiveRecord automatically persists the EMS to host relationship:

  https://github.com/ManageIQ/manageiq/blob/euwe-2/app/models/ext_management_system.rb#L34

The exception isn't handled in this case.

Comment 19 Marcel Hild 2017-03-28 09:55:37 UTC
Yes, but it should not happen, because https://github.com/ManageIQ/manageiq/blob/master/app/models/ems_refresh/save_inventory_infra.rb#L149 assigns it to the previous ems. All in all the code tries very hard to find the host with the name in question. 
So there might be a hidden bug - but if it's not easy to reproduce (e.g. with a db dump that exhibits this) then I would not investigate further either.

Comment 20 Juan Hernández 2017-03-28 10:18:18 UTC
I just reproduced it with the latest master. Just added the same oVirt system twice, first with a host name and then with an IP address. This is what I see in 'evm.log':

---8<---
[----] E, [2017-03-28T12:00:39.875827 #12552:2acb8866311c] ERROR -- : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresh::Strategies::Api3#refresh) EMS: [192.168.122.18], id: [2] Refresh failed
[----] E, [2017-03-28T12:00:39.875994 #12552:2acb8866311c] ERROR -- : [ActiveRecord::RecordInvalid]: Validation failed: Host Name has to be unique per provider type  Method:[rescue in block in refresh]
[----] E, [2017-03-28T12:00:39.876077 #12552:2acb8866311c] ERROR -- : /files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/validations.rb:78:in `raise_validation_error'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/validations.rb:50:in `save!'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/attribute_methods/dirty.rb:30:in `save!'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/transactions.rb:324:in `block in save!'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/transactions.rb:395:in `block in with_transaction_returning_status'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/connection_adapters/abstract/database_statements.rb:232:in `block in transaction'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/connection_adapters/abstract/transaction.rb:189:in `within_new_transaction'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/connection_adapters/abstract/database_statements.rb:232:in `transaction'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/transactions.rb:211:in `transaction'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/transactions.rb:392:in `with_transaction_returning_status'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/transactions.rb:324:in `save!'
/files/rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/activerecord-5.0.2/lib/active_record/suppressor.rb:45:in `save!'
/files/projects/ManageIQ/manageiq/app/models/ems_refresh/save_inventory_infra.rb:74:in `save_ems_infra_inventory'
/files/projects/ManageIQ/manageiq/app/models/ems_refresh/save_inventory.rb:14:in `save_ems_inventory'
/files/projects/ManageIQ/manageiq/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:156:in `save_inventory'
/files/projects/ManageIQ/manageiq/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:91:in `block in refresh_targets_for_ems'
--->8---

The message is slightly different, it was changed in this commit:

  https://github.com/ManageIQ/manageiq/pull/12912

From that I understand that the message we saw before was not really about a host name already taken, but about the *endpoint* host name already taken. So this is happening just because there are two providers with the same host name. I guess there is a point where this is validated, before actually adding the provider. What is most likely happening is that the validation is performed *before* we do the reverse lookup to convert the IP address to a name, so it passes, because it compares the host name used by the previously existing provider with the IP address of the new provider. Later, when doing the refresh, it fails, because we try to update the database with the resolved name.
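
The suspected ordering problem can be sketched in plain Ruby. This is only an illustration of the hypothesis, not the actual ManageIQ validation code: `FAKE_RESOLVER` stands in for the reverse lookup, 'engine.example.com' is a made-up name, and both helper methods are hypothetical.

```ruby
# Hypothetical stand-in for the reverse lookup; a real one would query DNS.
FAKE_RESOLVER = lambda do |host|
  host == "192.168.122.18" ? "engine.example.com" : host
end

# Uniqueness check applied to the value exactly as the user entered it.
def accepted_before_resolving?(existing_hostnames, entered)
  !existing_hostnames.include?(entered)
end

# The same check applied after converting the IP address to a name.
def accepted_after_resolving?(existing_hostnames, entered, resolver)
  !existing_hostnames.include?(resolver.call(entered))
end
```

With an existing provider at 'engine.example.com' and a new one entered as '192.168.122.18', the first check accepts the new provider and the second rejects it, which matches a failure that only shows up later, during refresh.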

Marcel, if that is the case, we can either do the name resolving before that initial validation, or else avoid updating the database after resolving. What do you suggest?

Comment 21 Marcel Hild 2017-03-28 12:20:02 UTC
Good find - and luckily we changed the error message.

Where in the refresh are we changing the hostname of an endpoint? I would not expect this...

Comment 23 errata-xmlrpc 2017-04-12 14:36:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0898