Bug 1372912

Summary: Provide rake task to help identify mis-entitled guests and hypervisors
Product: Red Hat Satellite
Reporter: Kathryn Dixon <kdixon>
Component: Subscription Management
Assignee: Justin Sherrill <jsherril>
Status: CLOSED ERRATA
QA Contact: jcallaha
Severity: high
Docs Contact:
Priority: urgent
Version: 6.2.0
CC: aperotti, bbuckingham, bcourt, bkearney, byount, cdonnell, egolov, emarquez, hsun, jalviso, jcallaha, jsherril, ktordeur, mjahangi, nitthoma, oshtaier, pwaghmar, rjerrido, scott.higgins, sgao, shihliu, snemeth, tsorense, xdmoon, zhunting
Target Milestone: Unspecified
Keywords: PrioBumpGSS, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tfm-rubygem-katello-3.0.0.115-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1426392 (view as bug list) Environment:
Last Closed: 2017-05-01 13:54:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1399395, 1385841, 1426392    
Attachments:
Description | Flags
virt_who_report for 6.2 | none
virt_who_report for 6.2 | none
virt_who_report for 6.1 | none

Description Kathryn Dixon 2016-09-03 17:38:41 UTC
Description of problem:

Upgrading to 6.2.1 changes all host names from the uuid or hostname to "virt-who uuid" or "virt-who name".

This is causing issues with duplication, stuck hypervisor tasks, and host/guest mapping.

Also, in an odder situation, if you renamed your hosts in the webui from the uuid to some other name and then upgraded, you seem to end up with triple the number of hosts in the webui:

fake name
virt-who uuid
virt-who fake name

The derived SKUs get very confused. They are now named "derived sku of uuid". If you click the hyperlink to see which host that actually is, you get a blank error screen. Searching the db for that uuid name shows that it does exist.

Version-Release number of selected component (if applicable):


How reproducible: 100%


Steps to Reproduce:
1. Run virt-who on the Satellite or on a VM connected to Satellite 6.1.9
2. Upgrade to 6.2.1
3. Hosts are now named differently

Actual results:
Looking at the content hosts, all hosts have been forcibly renamed by the upgrade process to the virt-who name/uuid.

Expected results:
The upgrade should not force you to rename your hosts, especially for customers who have manually renamed them or used the hypervisor_id=hostname option in the virt-who.d config.

Additional info: This is causing issues with subscriptions.

It looks like this part of the upgrade performs the rename:

/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.73/db/migrate/20150930183738_migrate_content_hosts.rb

I'm guessing this was meant to help you search for and find your hosts in the webui. But why couldn't this have been done with a "type" search or a separate tab for actual content hosts and hosts?

Comment 7 Justin Sherrill 2016-10-13 19:46:00 UTC
*** Bug 1374788 has been marked as a duplicate of this bug. ***

Comment 16 Justin Sherrill 2016-10-31 19:57:44 UTC
Note that we received a reproducer where it was claimed that guests would not get the proper subscription when auto-attaching.  The issue was that the guests had a service level of 'premium' while the hypervisor had a subscription that only provided standard.  Keep that in mind when looking at auto-attach issues.

Comment 17 Justin Sherrill 2016-11-02 21:34:39 UTC
Related to this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1391200

Comment 18 Justin Sherrill 2016-11-03 16:39:14 UTC
A current theory is that the root cause of the issue is that upon upgrading to Satellite 6.2, customers are also updating virt-who to a newer version which starts reporting the fqdn instead of just the shortname.  This would explain why after upgrade we see this problem. 


In addition, one customer has seen duplicates reappearing after deleting them.  So far we have quite a bit of evidence from the customer that the information coming from virt-who will randomly flip between sending the shortname and the fqdn of the hypervisor.

It's unclear why this is happening, but the evidence is as follows:

/var/log/foreman/production.log shows virt-who checking in with:

"hostname.domain.com" => [{"guestId" => "somehash"}]

and then some time later:

"hostname" => [{"guestId" => "somehash"}]

Searching through the apache access logs these requests are all coming from the same IP address (hinting it was not coming from different virt-who versions).
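To make the flip easier to spot across many check-ins, here is a minimal Python sketch (not part of any shipped tooling). It assumes you have already extracted the hypervisor keys from production.log into a list; the function name and the sample values are hypothetical. It groups names by the text before the first dot, so it is only meaningful for hostname-style names, not for IP addresses.

```python
from collections import defaultdict


def find_name_flips(reported_names):
    """Group reported hypervisor names by short name (text before the first dot).

    Returns {short_name: sorted distinct forms} for every short name that
    was reported under more than one form, e.g. both shortname and fqdn.
    """
    seen = defaultdict(set)
    for name in reported_names:
        lowered = name.lower()
        short = lowered.split(".", 1)[0]
        seen[short].add(lowered)
    return {short: sorted(forms) for short, forms in seen.items() if len(forms) > 1}


# Hypothetical check-in keys, in the order they appeared in the log.
checkins = ["hostname.domain.com", "hostname", "other.domain.com", "hostname.domain.com"]
flips = find_name_flips(checkins)
print(flips)
```

Any short name listed in the result was reported in more than one form and is a candidate for the duplicate entries described here.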

Comment 23 Justin Sherrill 2016-12-02 16:11:43 UTC
After some discussion, given that virt-who is now sending fqdns instead of shortnames, the only thing we can do is write tooling to try to correct the problem.  Will start working on that.

Comment 26 Justin Sherrill 2016-12-07 16:19:47 UTC
I did some digging, and after talking with csnyder and toledo I think I
can fully explain all the issues:

Issue 1)  After upgrade, there are duplicate hypervisors with names such
as 'virt-who-UUID-X' and 'virt-who-NAME-X'.

The cause of this as far as I can tell is that the users have changed
their virt-who config to include:

hypervisor_id = hostname

This causes virt-who to stop reporting hypervisor names as uuids and
start reporting the "host name" (what this means I will get into next). 
This can easily be hit on 6.1 and after looking at MCOM's db they have
actually already hit this issue and apparently had not noticed it.  I'm
continuing to work with Tom on the appropriate resolution.

Keep in mind that the name reported by virt-who is ALL the information
that candlepin and Satellite 6 have to uniquely identify a hypervisor. 
If what is reported changes, then the user gets an entirely new set of
hypervisors (duplicates).


Issue 2) After upgrade, there are duplicate hypervisors with names such
as 'virt-who-hostname-X' and 'virt-who-hostname.domain.com-X'.

If the user had set 'hypervisor_id = hostname', virt-who would be
reporting the hypervisor 'host name'.  In the 6.1 version of virt-who
(0.14), this 'host name' was the name directly reported by ESX.  From
what I've been told this is simply the name that the user gave the
hypervisor when it was being installed and has nothing whatsoever to do
with a DNS hostname or FQDN.  It seems that many times a user will put
in the shortname or fqdn as this name as convention.

Starting with virt-who 0.16
(https://github.com/virt-who/virt-who/commit/924e7ae1),  virt-who
stopped taking this ESX reported name and started calculating the
reported 'host name' using DNS information that ESX sends.  So instead
of reporting "my-fake-hypervisorname", it starts reporting
"hostname.domain.com".  Again this ends up causing candlepin and
Satellite 6 to create a new set of duplicate entries. 
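Both issues come down to the same mechanism, which can be illustrated with a toy model in Python. This is only a sketch of the idea that the reported name is the sole key. It is not how candlepin or Satellite is implemented, and the names, guest ids, and function are hypothetical.

```python
def register_checkins(checkins):
    """Toy registry keyed only by the name that virt-who reports.

    A check-in under a known name updates that record; a check-in under
    any other name creates a new record, even for the same physical host.
    """
    registry = {}
    for reported_name, guest_ids in checkins:
        registry[reported_name] = list(guest_ids)
    return registry


# The same physical hypervisor reported three different ways over time.
reports = [
    ("0a95a254-8caa-4b05-8f71-2cd3e3a0200a", ["guest-a", "guest-b"]),  # uuid (default)
    ("my-fake-hypervisorname", ["guest-a", "guest-b"]),  # ESX-given name (virt-who 0.14)
    ("hostname.domain.com", ["guest-a", "guest-b"]),     # DNS-derived name (virt-who 0.16)
    ("hostname.domain.com", ["guest-a", "guest-b"]),     # repeat check-in, same name
]
registry = register_checkins(reports)
print(len(registry))  # one hypervisor, three records
```

Only the repeat check-in under an unchanged name maps back to an existing record; every change in the reported name adds another entry.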


I'm still working on tooling to help recover from these situations, but
given that we have no way of tying together old and new hypervisors (as
their names are different), I'm leaning towards a simple model of the
following steps:

* deleting all hypervisors
* re-running virt-who
* re-assigning VDCs subs to all hypervisors
* running autoheal on all guests

Some tooling should help with this, but I'd be interested in hearing thoughts
around this process.  Most of this can already be done manually in the Satellite 6.2 UI via bulk content host actions.

Comment 28 Justin Sherrill 2016-12-13 18:28:24 UTC
Connecting redmine issue http://projects.theforeman.org/issues/17663 from this bug

Comment 29 Justin Sherrill 2016-12-13 18:59:16 UTC
As part of these scripts to help users reconcile duplicate hypervisors, I've attached a virt-who report which reports two things:

1) Registered Hypervisors without any entitlements
2) Guests that are consuming physical subscriptions

You can download the file 'virt_who_report-62.rake' and place it in /usr/share/foreman/lib/tasks/.  I will upload a version that works with Satellite 6.1 shortly.

Then run:

# foreman-rake katello:virt_who_report

You should see output such as:

76 hypervisor issues found.
406 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20161213-13449-1j8r8wr/

The specified directory will contain three files: one for each of the two reports above, plus a list of errors encountered.

You can get the output in CSV by running:

# foreman-rake katello:virt_who_report CSV=true

An example of a guest issue looks like this:

hostname.example.com (162): Has physical entitlements: Red Hat Enterprise Linux Server with Smart Management, Standard (Physical or Virtual Nodes). Could not identify hypervisor.
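If you want to post-process the plain-text report, lines of this shape can be split into fields with a short Python sketch like the one below. The "name (number): message" layout is inferred only from the example lines in this bug, so treat the regular expression as an assumption. The meaning of the number in parentheses is not stated here, so the sketch just calls it `number`.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the sample output: "<name> (<number>): <message>"
LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<number>\d+)\): (?P<message>.+)$")


class Issue(NamedTuple):
    name: str
    number: int
    message: str


def parse_issue(line: str) -> Optional[Issue]:
    """Return the parsed fields, or None if the line does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Issue(match.group("name"), int(match.group("number")), match.group("message"))


sample = (
    "hostname.example.com (162): Has physical entitlements: Red Hat Enterprise "
    "Linux Server with Smart Management, Standard (Physical or Virtual Nodes). "
    "Could not identify hypervisor."
)
issue = parse_issue(sample)
print(issue.name, issue.number)
```

For anything beyond a quick look, the CSV=true output mentioned above is probably the better input, since it does not depend on this guessed layout.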

If you want to ignore a physical entitlement such as the one in the example above (for instance, because it is meant to be used for guests), run:

# foreman-rake katello:virt_who_report IGNORE="Red Hat Enterprise Linux Server with Smart Management, Standard (Physical or Virtual Nodes)"

making sure to provide the exact name.  Multiple entitlements can be specified using a '|' separator:

# foreman-rake katello:virt_who_report IGNORE="Subscription A|Subscription B"
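The matching rule described here (exact names, separated by '|') can be sketched in Python as follows. This only illustrates the rule. It is not the rake task's code, and whether the task trims whitespace around names is not stated, so this sketch does not trim.

```python
def parse_ignore(value):
    """Split an IGNORE-style string on '|' into a set of exact names."""
    return {name for name in value.split("|") if name}


def remaining_entitlements(entitlements, ignore_value):
    """Drop entitlements whose name exactly matches an ignored name."""
    ignored = parse_ignore(ignore_value)
    return [name for name in entitlements if name not in ignored]


# Hypothetical entitlement names attached to one guest.
attached = ["Subscription A", "Subscription C"]
left = remaining_entitlements(attached, "Subscription A|Subscription B")
print(left)
```

A '|' separator fits here because subscription names can themselves contain commas, as the "Smart Management, Standard" example shows.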

This script is fairly slow, taking about 8 minutes for ~2000 hosts.  The script will indicate progress as it goes along.  If you want to run it for a subset of hosts, you can run:

# foreman-rake katello:virt_who_report LIMIT=200

which will run it on the first 200 hosts.  This is mostly beneficial for playing around with different ignore lists.
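As a rough guide for choosing a LIMIT, the figure above works out to about a quarter of a second per host. The Python sketch below assumes the runtime scales linearly with the number of hosts, which has not been measured; the function name is hypothetical.

```python
# Figures quoted above: about 8 minutes for roughly 2000 hosts.
REFERENCE_SECONDS = 8 * 60
REFERENCE_HOSTS = 2000


def estimated_seconds(host_count):
    """Estimate the runtime in seconds, assuming linear scaling with host count."""
    return host_count * REFERENCE_SECONDS / REFERENCE_HOSTS


print(estimated_seconds(200))  # estimate for LIMIT=200
```

Under that assumption a LIMIT=200 run should finish in under a minute, which is short enough to iterate on ignore lists.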

Comment 30 Justin Sherrill 2016-12-13 19:00:20 UTC
Created attachment 1231314
virt_who_report for 6.2

Comment 31 Bryan Kearney 2016-12-13 19:17:30 UTC
Upstream bug component is Candlepin

Comment 32 Justin Sherrill 2016-12-13 21:22:01 UTC
Created attachment 1231334
virt_who_report for 6.2

Comment 33 Justin Sherrill 2016-12-13 21:29:52 UTC
Created attachment 1231336
virt_who_report for 6.1

Adding 6.1 version of virt_who_report.rake

Comment 34 Bryan Kearney 2016-12-16 23:17:25 UTC
Upstream bug assigned to jsherril

Comment 35 Bryan Kearney 2016-12-16 23:17:33 UTC
Upstream bug assigned to jsherril

Comment 41 Satellite Program 2017-01-17 01:17:01 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17663 has been resolved.

Comment 47 Satellite Program 2017-02-23 21:09:20 UTC
Please add verification steps for this bug to help QE verify

Comment 50 Justin Sherrill 2017-03-30 13:07:50 UTC
To clarify this for all followers of this bug:

* The original cause of this issue was twofold:
  1) Upgrading of virt-who between Satellite 6.1 and 6.2
  2) Switching from hypervisor_id=uuid (default) to hypervisor_id=hostname in your virt-who config

You can read https://bugzilla.redhat.com/show_bug.cgi?id=1372912#c26 for more information.  

There is nothing the Satellite tool can do to remedy these problems.  Trying to fully correct it after it has occurred would be extremely tricky and some manual intervention is needed:


* deleting all hypervisors (achievable via content host bulk actions)
* re-running virt-who 
* re-assigning VDCs subs to all hypervisors (achievable via content host bulk actions)
* running autoheal on all guests (achievable via content host bulk actions)

With this bug we are shipping a script to help identify mis-entitled guests and un-entitled hypervisors.  You can read more about it here: https://bugzilla.redhat.com/show_bug.cgi?id=1372912#c29

Comment 51 jcallaha 2017-03-30 18:09:44 UTC
Verified in Satellite 6.2.9 Snap 2 based on the steps identified in #49.

Running the tool with 45 fully entitled hypervisors returned no issues.

-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/45
...
45/45
No issues found.


At this point, I removed the subscriptions from 3 of my hypervisors.


-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/45
...
45/45
3 hypervisor issues found.
0 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-32679-ufyhei/.

-bash-4.1# ll /tmp/virt_who_report20170330-32679-ufyhei/
total 4.0K
-rw-rw-r--. 1 foreman foreman   0 Mar 30 11:07 errors.txt
-rw-rw-r--. 1 foreman foreman   0 Mar 30 11:07 guests.txt
-rw-rw-r--. 1 foreman foreman 181 Mar 30 11:07 hypervisors.txt

-bash-4.1# cat /tmp/virt_who_report20170330-32679-ufyhei/hypervisors.txt 
virt-who-0a95a254.8caa.4b05.8f71.2cd3e3a0200a-1 (3): Has no entitlements
virt-who-0a95a254-1 (31): Has no entitlements
virt-who-0a95a254.domain.test.com-1 (37): Has no entitlements


At this point, I deleted a number of hypervisors, then added a new esx configuration to bring in more hypervisors. I then registered a guest to one of the new un-entitled hypervisors and added a physical subscription to it. This new conflicting guest was identified in the report.


-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/21
...
21/21
8 hypervisor issues found.
1 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-9944-8ep765/.

-bash-4.1# cat /tmp/virt_who_report20170330-9944-8ep765/guests.txt 
dhcp-10-12-211-60.hq.gsslab.rdu.redhat.com (52): Has physical entitlements: Red Hat Enterprise Linux Server, Premium (8 sockets) (Unlimited guests). Hypervisor identified as virt-who-nightwing.hq.gsslab.rdu.redhat.com-1 (48).
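When guests.txt has many entries, it can help to pull out which hypervisor each flagged guest was matched to. The Python sketch below does that. The "Hypervisor identified as NAME (N)." and "Could not identify hypervisor." phrasings are taken from the sample lines in this bug, and the function name is hypothetical.

```python
import re

# Trailing phrase as seen in the sample guests.txt output above.
HYPERVISOR_RE = re.compile(r"Hypervisor identified as (\S+) \(\d+\)\.\s*$")


def guest_to_hypervisor(lines):
    """Map each guest name to its identified hypervisor name, or None if unidentified."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        guest = line.split(" (", 1)[0]
        match = HYPERVISOR_RE.search(line)
        mapping[guest] = match.group(1) if match else None
    return mapping


report_lines = [
    "dhcp-10-12-211-60.hq.gsslab.rdu.redhat.com (52): Has physical entitlements: "
    "Red Hat Enterprise Linux Server, Premium (8 sockets) (Unlimited guests). "
    "Hypervisor identified as virt-who-nightwing.hq.gsslab.rdu.redhat.com-1 (48).",
    "hostname.example.com (162): Has physical entitlements: Red Hat Enterprise "
    "Linux Server with Smart Management, Standard (Physical or Virtual Nodes). "
    "Could not identify hypervisor.",
]
mapping = guest_to_hypervisor(report_lines)
print(mapping)
```

Guests mapped to None are the ones where no hypervisor could be identified, so there is no obvious hypervisor to entitle for them yet.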


Finally, I entitled the guest's hypervisor and added the resulting guest subscription to the guest itself. As a result, both the hypervisor and guest were removed from the issues reported.

-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/21
...
21/21
7 hypervisor issues found.
0 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-11095-x8ywvn/.

Comment 53 errata-xmlrpc 2017-05-01 13:54:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191