Bug 1756955 - virt-who hypervisor update may cause rhsm certs check to stuck for several minutes which will lead to 503 or connection timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Subscription Management
Version: 6.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 6.7.0
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-30 09:41 UTC by Hao Chang Yu
Modified: 2023-10-06 18:37 UTC (History)

Fixed In Version: rubygem-katello-3.14.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1791492 (view as bug list)
Environment:
Last Closed: 2020-04-14 13:26:00 UTC
Target Upstream Version:
Embargoed:


Attachments
hotfix RPM for Satellite 6.6 (15.66 MB, application/x-rpm)
2020-01-14 17:52 UTC, wclark


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 27974 | 0 | High | Closed | virt-who hypervisor update may cause rhsm certs check to stuck for several minutes which will lead to 503 or connection ... | 2020-08-14 18:44:40 UTC
Github Katello katello pull | 8370 | 0 | 'None' | closed | Fixes #27974 - RHSM checkin requests stuck for long time | 2020-08-14 18:44:39 UTC
Red Hat Product Errata | RHSA-2020:1454 | 0 | None | None | None | 2020-04-14 13:26:11 UTC

Description Hao Chang Yu 2019-09-30 09:41:51 UTC
Description of problem:
I think there may be a regression in the following commit.

https://github.com/Katello/katello/commit/81530a06de177a78275b229d0ec491579ce016f4#diff-bf897becee6d218f2e9b589c5f66dcfdR21

The transaction can be huge and (I guess) slow to commit if there are many hosts with thousands of guests to update. It seems that during the commit, most of the rows in the katello_subscription_facet table are locked because of the following line. If I comment out this line in my reproducer, the "/rhsm/consumers/<uuid>/certificates/serials" requests are no longer blocked while the hypervisor update is running.

https://github.com/Katello/katello/blob/master/app/models/katello/host/subscription_facet.rb#L131


To minimize the performance impact, I think we may need to either move the transaction inside the loop, so that each host is updated in its own transaction, or remove the transaction completely.

For example:
@hosts.each do |uuid, host|
  ActiveRecord::Base.transaction do
    update_subscription_facet(uuid, host)
  end
end
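
The effect of scoping the transaction per host can be illustrated without Rails or a database. The following is only a minimal sketch, not Katello code: a plain Ruby Mutex stands in for the row locks that a transaction holds until it commits, sleep stands in for the per-host update work, and the method names and timing values are hypothetical. It measures how long a concurrent reader (standing in for the certs check request) waits in each case.

```ruby
# Illustration only: a Mutex stands in for the row locks held by a transaction.
LOCK = Mutex.new
WORK_PER_HOST = 0.02 # seconds of simulated update work per host (assumed value)

# One lock around the whole batch, like one transaction around all hosts.
def update_all_in_one_lock(hosts)
  LOCK.synchronize do
    hosts.each { |_uuid| sleep WORK_PER_HOST }
  end
end

# Lock taken and released per host, like one transaction per host.
def update_with_lock_per_host(hosts)
  hosts.each do |_uuid|
    LOCK.synchronize { sleep WORK_PER_HOST }
    # Brief pause outside the lock so a waiting reader gets a chance to run
    # (Ruby's Mutex does not guarantee fairness between threads).
    sleep 0.001
  end
end

# Start the writer, then measure how long a reader waits to get the lock.
def reader_wait(hosts, writer)
  thread = Thread.new { send(writer, hosts) }
  sleep 0.05 # give the writer a head start so it holds the lock first
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  LOCK.synchronize { } # the simulated certs check needs the same lock
  waited = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  thread.join
  waited
end

hosts = (1..25).map { |i| "host-#{i}" }
batch_wait = reader_wait(hosts, :update_all_in_one_lock)
per_host_wait = reader_wait(hosts, :update_with_lock_per_host)
puts format("one lock for all hosts: reader waited %.3fs", batch_wait)
puts format("one lock per host:      reader waited %.3fs", per_host_wait)
```

In this simulation the reader waits for the rest of the batch in the first case, but for at most one host's worth of work in the second. Real database locking is more involved, so treat this only as a picture of why a smaller transaction scope shortens the time other requests are blocked.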


How reproducible:
I used a crude way to reproduce the issue, so it may not accurately reflect a real environment.

1. I modified the code to run the update 100 times within the transaction.

  ActiveRecord::Base.transaction do
    100.times do
      @hosts.each do |uuid, host|
        update_subscription_facet(uuid, host)
      end
    end
  end

2. Then run "virt-who -do" to trigger the hypervisor update.

3. On the Satellite, run the following to capture the passenger requests

watch passenger-status --show=requests

4. On a content host, run the following request repeatedly until it blocks.

curl -k --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem https://my_satellite_fqdn/rhsm/consumers/<uuid>/certificates/serials


Actual results:
The RHSM certs check request is stuck.

passenger-status --show=requests
Version : 4.0.18
Date    : 2019-09-30 14:34:34 +1000
Instance: 30428
1 clients:
  Client 19:
    host                        = my_satellite.com
    uri                         = /rhsm/consumers/a40cc335-8ba9-481c-8d10-59bc5420601a/certificates/serials
    connected at                = 2019-09-30 14:33:50 (43 sec ago)
    state                       = FORWARDING_BODY_TO_APP
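
To spot such requests without watching the output by hand, the text can be filtered by age. This is a rough sketch under assumptions: the helper name stuck_requests and the 30 second threshold are made up for illustration, and it only understands the "(N sec ago)" form shown in the output above.

```ruby
# Sketch: flag requests in `passenger-status --show=requests` text that have
# been connected at least `threshold_seconds`. Only the "(N sec ago)" form
# seen in the sample output is handled; other age formats are ignored.
StuckRequest = Struct.new(:uri, :age_seconds)

def stuck_requests(status_text, threshold_seconds)
  stuck = []
  current_uri = nil
  status_text.each_line do |line|
    if (m = line.match(/^\s*uri\s*=\s*(\S+)/))
      current_uri = m[1] # remember the URI for the "connected at" line that follows
    elsif (m = line.match(/^\s*connected at\s*=.*\((\d+) sec ago\)/))
      age = m[1].to_i
      stuck << StuckRequest.new(current_uri, age) if age >= threshold_seconds
    end
  end
  stuck
end

# Sample taken from the output in this report.
sample = <<~TEXT
  Client 19:
    host                        = my_satellite.com
    uri                         = /rhsm/consumers/a40cc335-8ba9-481c-8d10-59bc5420601a/certificates/serials
    connected at                = 2019-09-30 14:33:50 (43 sec ago)
    state                       = FORWARDING_BODY_TO_APP
TEXT

stuck_requests(sample, 30).each do |req|
  puts "#{req.uri} has been open for #{req.age_seconds}s"
end
```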


Expected results:
The RHSM certs check request should be processed quickly, even while a hypervisor update is running.

Comment 9 Chris Snyder 2019-10-17 15:15:22 UTC

*** This bug has been marked as a duplicate of bug 1600201 ***

Comment 12 Barnaby Court 2019-10-28 19:12:50 UTC
Moving to the Subscription Management component: as per comment 10, this BZ is being used to track a Katello-side issue.

Comment 21 Bryan Kearney 2019-11-14 17:01:10 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/27974 has been resolved.

Comment 25 wclark 2020-01-14 17:52:21 UTC
Created attachment 1652270
hotfix RPM for Satellite 6.6

Comment 26 wclark 2020-01-14 17:56:15 UTC
Hotfix RPM is available for Satellite 6.6.1. To install it:

1. Take a snapshot or complete backup of Satellite server

2. Download the attached hotfix RPM and copy it to Satellite server

3. # satellite-maintain packages unlock

4. # yum install tfm-rubygem-katello-3.12.0.30-2.HOTFIXRHBZ1756955.el7sat.noarch.rpm

5. # satellite-maintain packages lock

6. # systemctl restart httpd

Comment 27 jcallaha 2020-02-05 21:49:27 UTC
Verified in Satellite 6.7 Snap 10

Approximately followed the reproducer steps found in the original bug.

After performing the setup modifications, I looped the cert check 1000 times.

Each request completed without any issues, with an average runtime of 0.35s.
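
A loop of this kind could be sketched as follows. This is an assumption about the approach, not the script actually used for verification: the helper names and the SATELLITE_FQDN and CONSUMER_UUID environment variables are hypothetical, while the certificate paths and the URL come from the curl command in the description.

```ruby
require "net/http"
require "openssl"

# Run the given block `count` times and return each run's duration in seconds.
def time_runs(count)
  Array.new(count) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Summarise durations as average and worst case.
def summarize(durations)
  { average: durations.sum / durations.size, max: durations.max }
end

# Build an HTTPS client that presents the consumer certificate, like the curl
# command in the reproducer (curl's -k is mirrored by VERIFY_NONE).
def consumer_client(fqdn)
  http = Net::HTTP.new(fqdn, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  http.cert = OpenSSL::X509::Certificate.new(File.read("/etc/pki/consumer/cert.pem"))
  http.key = OpenSSL::PKey.read(File.read("/etc/pki/consumer/key.pem"))
  http
end

# Only contacts a server when both (hypothetical) environment variables are set.
if ENV["SATELLITE_FQDN"] && ENV["CONSUMER_UUID"]
  http = consumer_client(ENV["SATELLITE_FQDN"])
  path = "/rhsm/consumers/#{ENV['CONSUMER_UUID']}/certificates/serials"
  stats = summarize(time_runs(1000) { http.get(path) })
  puts format("average %.2fs, max %.2fs", stats[:average], stats[:max])
end
```

Reporting the maximum alongside the average helps here, because a single request stuck for tens of seconds is easy to miss in an average over 1000 runs.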

Comment 30 errata-xmlrpc 2020-04-14 13:26:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454

