Bug 1372912
Summary: | Provide rake task to help identify miss-entitled guests and hypervisors | |
---|---|---|---
Product: | Red Hat Satellite | Reporter: | Kathryn Dixon <kdixon>
Component: | Subscription Management | Assignee: | Justin Sherrill <jsherril>
Status: | CLOSED ERRATA | QA Contact: | jcallaha
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 6.2.0 | CC: | aperotti, bbuckingham, bcourt, bkearney, byount, cdonnell, egolov, emarquez, hsun, jalviso, jcallaha, jsherril, ktordeur, mjahangi, nitthoma, oshtaier, pwaghmar, rjerrido, scott.higgins, sgao, shihliu, snemeth, tsorense, xdmoon, zhunting
Target Milestone: | Unspecified | Keywords: | PrioBumpGSS, Triaged
Target Release: | Unused | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | tfm-rubygem-katello-3.0.0.115-1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1426392 | Environment: |
Last Closed: | 2017-05-01 13:54:12 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1399395, 1385841, 1426392 | |
Description
Kathryn Dixon
2016-09-03 17:38:41 UTC
*** Bug 1374788 has been marked as a duplicate of this bug. ***

Note that we received a reproducer where it was claimed that guests would not get the proper subscription when auto-attaching. The issue was that the guests had a service level of 'premium' while the hypervisor had a subscription that only provided 'standard'. Keep that in mind when looking at auto-attach issues.

Related to this bz: https://bugzilla.redhat.com/show_bug.cgi?id=1391200

A current theory is that the root cause is this: when upgrading to Satellite 6.2, customers also update virt-who to a newer version, which starts reporting the hypervisor's FQDN instead of just its short name. This would explain why the problem appears after the upgrade. In addition, one customer has seen duplicates reappear after deleting them.

So far we have quite a bit of evidence from the customer that virt-who randomly flips between sending the short name and the FQDN of the hypervisor. It's unclear why this is happening, but the evidence is as follows. /var/log/foreman/production.log shows virt-who checking in with:

  "hostname.domain.com" => [{"guestId" => "somehash"}]

and then, some time later:

  "hostname" => [{"guestId" => "somehash"}]

According to the Apache access logs, all of these requests come from the same IP address, which suggests they were not sent by different virt-who versions.

After some discussion: given that virt-who is now sending FQDNs instead of short names, the only thing we can do is write tooling to try to correct the problem. Will start working on that.

I did some digging, and after talking with csnyder and toledo I think I can fully explain all the issues:

Issue 1) After upgrade, there are duplicate hypervisors with names such as 'virt-who-UUID-X' and 'virt-who-NAME-X'.
The cause of this, as far as I can tell, is that users have changed their virt-who config to include:

  hypervisor_id = hostname

This causes virt-who to stop reporting hypervisor names as UUIDs and start reporting the "host name" (what this means I will get into next). This can easily be hit on 6.1; looking at MCOM's db, they have actually already hit this issue and apparently had not noticed it. I'm continuing to work with Tom on the appropriate resolution.

Keep in mind that the name reported by virt-who is ALL the information that candlepin and Satellite 6 have to uniquely identify a hypervisor. If what is reported changes, the user gets an entirely new set of hypervisors (duplicates).

Issue 2) After upgrade, there are duplicate hypervisors with names such as 'virt-who-hostname-X' and 'virt-who-hostname.domain.com-X'.

If the user had set 'hypervisor_id = hostname', virt-who would report the hypervisor's 'host name'. In the 6.1 version of virt-who (0.14), this 'host name' was the name reported directly by ESX. From what I've been told, this is simply the name the user gave the hypervisor when installing it, and it has nothing whatsoever to do with a DNS hostname or FQDN. By convention, though, users often enter the short name or the FQDN as this name.

Starting with virt-who 0.16 (https://github.com/virt-who/virt-who/commit/924e7ae1), virt-who stopped using this ESX-reported name and started calculating the reported 'host name' from DNS information that ESX sends. So instead of reporting "my-fake-hypervisorname", it starts reporting "hostname.domain.com". Again, this causes candlepin and Satellite 6 to create a new set of duplicate entries.
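Both issues come down to what virt-who reports as the hypervisor name. Below is a minimal bash sketch of two read-only checks for them. Several things here are assumptions, not part of Satellite or virt-who: the function names `hostname_id_configs` and `shortname_duplicates` are hypothetical, `/etc/virt-who.d` is assumed to be where the virt-who config files live, and the 'virt-who-<name>-<number>' naming is inferred from the names quoted in this bug. The second check can only pair short names with FQDNs (Issue 2). As this bug notes, nothing ties a UUID-named hypervisor to its hostname-named duplicate (Issue 1).

```shell
#!/usr/bin/env bash
# Sketch only: read-only checks for the duplicate-hypervisor causes above.
# Function names are hypothetical helpers, not Satellite commands.

# 1) Print "file:line" for every virt-who config line that sets
#    hypervisor_id to hostname. The default directory is an ASSUMPTION;
#    pass another directory as $1 to check elsewhere.
hostname_id_configs() {
  local dir="${1:-/etc/virt-who.d}"
  grep -HE '^[[:space:]]*hypervisor_id[[:space:]]*=[[:space:]]*hostname[[:space:]]*$' \
    "$dir"/*.conf 2>/dev/null
}

# 2) Read hypervisor content-host names (one per line) on stdin and print
#    "short: name1 name2 ..." for every short name seen more than once,
#    e.g. virt-who-esx01-1 and virt-who-esx01.example.com-1.
#    ASSUMES the "virt-who-<name>-<number>" naming seen in this bug.
shortname_duplicates() {
  awk '
    NF == 0 { next }              # skip blank lines
    {
      key = $1
      sub(/^virt-who-/, "", key)  # drop the "virt-who-" prefix
      sub(/-[0-9]+$/, "", key)    # drop the trailing "-<number>" suffix
      sub(/\..*$/, "", key)       # keep only the part before the first dot
      count[key]++
      members[key] = (count[key] == 1) ? $1 : members[key] " " $1
    }
    END {
      for (k in count)
        if (count[k] > 1) print k ": " members[k]
    }
  '
}
```

Any line printed by the first function points at a config that, per the explanation above, can produce hostname-based duplicates after a virt-who upgrade. The second function only reports candidates; deciding which entry to keep is still a manual step.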
I'm still working on tooling to help recover from these situations. Since we have no way of tying together old and new hypervisors (their names are different), I'm leaning towards a simple model with the following steps:

* deleting all hypervisors
* re-running virt-who
* re-assigning VDC subs to all hypervisors
* running autoheal on all guests

Some tooling should help with this, but I'd be interested in hearing thoughts on this process. Most of these steps can already be done manually in the Satellite 6.2 UI via bulk content host actions.

Connecting redmine issue http://projects.theforeman.org/issues/17663 from this bug

As part of these scripts to help users reconcile duplicate hypervisors, I've attached a virt-who report that reports two things:

1) Registered hypervisors without any entitlements
2) Guests that are consuming physical subscriptions

You can download the file 'virt_who_report-62.rake' and place it in /usr/share/foreman/lib/tasks/. I will upload a version that works with Satellite 6.1 shortly. Then run:

  # foreman-rake katello:virt_who_report

You should see output such as:

  76 hypervisor issues found.
  406 guest issues found.
  0 errors encountered.
  Saving results to /tmp/virt_who_report20161213-13449-1j8r8wr/

The specified directory will contain three files: the two reports above (1 and 2) and a list of errors encountered. You can get the output in CSV by running:

  # foreman-rake katello:virt_who_report CSV=true

An example of a guest issue would look like this:

  hostname.example.com (162): Has physical entitlements: Red Hat Enterprise Linux Server with Smart Management, Standard (Physical or Virtual Nodes). Could not identify hypervisor.
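When working through a long guest report, it can help to separate guests whose hypervisor was found from guests where it was not. The bash/awk sketch below assumes every line of the default (non-CSV) guests.txt follows one of the two formats quoted in this bug: '... Could not identify hypervisor.' or '... Hypervisor identified as <name> (<id>).'. `guest_hypervisors` is a hypothetical helper, not part of the rake task, and the output format (guest name, a tab, then the hypervisor name or UNKNOWN) is an illustrative choice.

```shell
#!/usr/bin/env bash
# Sketch only: turn guest-report lines into "guest<TAB>hypervisor" pairs.
# ASSUMES the plain-text line formats quoted in this bug (not CSV=true).
# Reads the files given as arguments, or stdin if none are given.
guest_hypervisors() {
  awk '
    NF == 0 { next }                  # skip blank lines
    {
      guest = $1                      # first field is the guest name
      hv = "UNKNOWN"                  # default: hypervisor not identified
      # "Hypervisor identified as " is 25 characters long; the name that
      # follows runs up to the next space, just before "(<id>)".
      if (match($0, /Hypervisor identified as [^ ]+/))
        hv = substr($0, RSTART + 25, RLENGTH - 25)
      print guest "\t" hv
    }
  ' "$@"
}
```

For example, `guest_hypervisors guests.txt | awk -F'\t' '$2 == "UNKNOWN"'` would list only the guests whose hypervisor could not be identified, which are the ones that virt-who has not mapped to any hypervisor.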
If you want to ignore a physical entitlement, such as the one in the example above, because that entitlement is meant to be used by guests, run:

  # foreman-rake katello:virt_who_report IGNORE="Red Hat Enterprise Linux Server with Smart Management, Standard (Physical or Virtual Nodes)"

making sure to provide the exact subscription name. Multiple subscriptions can be specified using a '|' separator:

  # foreman-rake katello:virt_who_report IGNORE="Subscription A|Subscription B"

This script is fairly slow, taking about 8 minutes for ~2000 hosts, and it indicates its progress as it runs. To run it for a subset of hosts, run:

  # foreman-rake katello:virt_who_report LIMIT=200

which runs it on the first 200 hosts. This is mostly useful for experimenting with different ignore lists.

Created attachment 1231314
virt_who_report for 6.2
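Subscription names such as 'Red Hat Enterprise Linux Server, Premium (8 sockets) (Unlimited guests)' contain commas, which is presumably why the IGNORE option described above takes a '|'-separated list. A small bash sketch for building that value from one exact name per argument, so the separator never has to be typed by hand. `join_ignore` and the file name ignore-list.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: join subscription names with "|" for the IGNORE= option.
# Each name is a separate argument because the names themselves contain
# commas and spaces. No whitespace is added around "|", since the task
# expects the exact subscription names.
join_ignore() {
  local IFS='|'          # "$*" joins arguments with the first char of IFS
  printf '%s\n' "$*"
}

# Usage on the Satellite server (ignore-list.txt is a hypothetical file
# holding one exact subscription name per line):
#   mapfile -t names < <(grep -v '^[[:space:]]*$' ignore-list.txt)
#   foreman-rake katello:virt_who_report IGNORE="$(join_ignore "${names[@]}")" LIMIT=200
```

Keeping the ignore list in a file makes it easier to iterate with LIMIT=200, as suggested above, before running the full report.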
Upstream bug component is Candlepin

Created attachment 1231334
virt_who_report for 6.2
Created attachment 1231336
virt_who_report for 6.1
Adding 6.1 version of virt_who_report.rake
Upstream bug assigned to jsherril

Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/17663 has been resolved.

Please add verification steps for this bug to help QE verify.

To clarify this for all followers of this bug:

* The original cause of this issue was twofold:
  1) Upgrading virt-who between Satellite 6.1 and 6.2
  2) Switching from hypervisor_id=uuid (the default) to hypervisor_id=hostname in your virt-who config

You can read https://bugzilla.redhat.com/show_bug.cgi?id=1372912#c26 for more information.

There is nothing the Satellite tool can do to remedy these problems. Fully correcting them after they have occurred is extremely tricky and requires some manual intervention:

* deleting all hypervisors (achievable via content host bulk actions)
* re-running virt-who
* re-assigning VDC subs to all hypervisors (achievable via content host bulk actions)
* running autoheal on all guests (achievable via content host bulk actions)

With this bug we are shipping a script to help identify mis-entitled guests and un-entitled hypervisors. You can read more about it here: https://bugzilla.redhat.com/show_bug.cgi?id=1372912#c29

Verified in Satellite 6.2.9 Snap 2 based on the steps identified in #49. Running the tool with 45 fully entitled hypervisors returned no issues.

-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/45
...
45/45
No issues found.

At this point, I removed the subscriptions from 3 of my hypervisors.
-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/45
...
45/45
3 hypervisor issues found.
0 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-32679-ufyhei/.
-bash-4.1# ll /tmp/virt_who_report20170330-32679-ufyhei/
total 4.0K
-rw-rw-r--. 1 foreman foreman 0 Mar 30 11:07 errors.txt
-rw-rw-r--. 1 foreman foreman 0 Mar 30 11:07 guests.txt
-rw-rw-r--. 1 foreman foreman 181 Mar 30 11:07 hypervisors.txt
-bash-4.1# cat /tmp/virt_who_report20170330-32679-ufyhei/hypervisors.txt
virt-who-0a95a254.8caa.4b05.8f71.2cd3e3a0200a-1 (3): Has no entitlements
virt-who-0a95a254-1 (31): Has no entitlements
virt-who-0a95a254.domain.test.com-1 (37): Has no entitlements

At this point, I deleted a number of hypervisors, then added a new ESX configuration to bring in more hypervisors. I then registered a guest to one of the new un-entitled hypervisors and attached a physical subscription to it. The report identified this new conflicting guest.

-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/21
...
21/21
8 hypervisor issues found.
1 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-9944-8ep765/.
-bash-4.1# cat /tmp/virt_who_report20170330-9944-8ep765/guests.txt
dhcp-10-12-211-60.hq.gsslab.rdu.redhat.com (52): Has physical entitlements: Red Hat Enterprise Linux Server, Premium (8 sockets) (Unlimited guests). Hypervisor identified as virt-who-nightwing.hq.gsslab.rdu.redhat.com-1 (48).

Finally, I entitled the guest's hypervisor and attached the resulting guest subscription to the guest itself. As a result, both the hypervisor and the guest disappeared from the reported issues.

-bash-4.1# foreman-rake katello:virt_who_report
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman_theme_satellite-0.1.42/app/models/concerns/satellite_packages.rb:4: warning: already initialized constant Katello::Ping::PACKAGES
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.117/app/models/katello/ping.rb:7: warning: previous definition of PACKAGES was here
1/21
...
21/21
7 hypervisor issues found.
0 guest issues found.
0 errors encountered.
Saving results to /tmp/virt_who_report20170330-11095-x8ywvn/.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191