Bug 1633252 - Candlepin throws 500 Internal Server Error for more than 40+ guests
Summary: Candlepin throws 500 Internal Server Error for more than 40+ guests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: Unspecified
Assignee: Barnaby Court
QA Contact: jcallaha
URL:
Whiteboard:
Depends On: 1632764 1635807
Blocks:
 
Reported: 2018-09-26 14:05 UTC by Mike McCune
Modified: 2022-03-13 15:37 UTC (History)

Fixed In Version: tfm-rubygem-katello-3.4.5.85-1,tfm-rubygem-katello-3.4.5.86-1,candlepin-2.1.24-1
Doc Text:
Clone Of: 1631590
Environment:
Last Closed: 2018-10-11 15:18:06 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 25026 | 0 | Normal | Closed | Candlepin throws 500 Internal Server Error for more than 40+ guests | 2021-01-13 19:17:23 UTC
Foreman Issue Tracker | 25060 | 0 | Normal | Closed | do not send large bulk requests for consumers as part of virt-who check in | 2021-01-13 19:16:42 UTC
Github | candlepin candlepin pull 2120 | 0 | 'None' | closed | [M] 1631590: Added limits to the size of link headers (ENT-893) | 2021-01-13 19:16:42 UTC
Red Hat Product Errata | RHBA-2018:2915 | 0 | None | None | None | 2018-10-11 15:18:30 UTC

Description Mike McCune 2018-09-26 14:05:30 UTC
+++ This bug was initially created as a clone of Bug #1631590 +++

Copied from the issue reported by Jitendra Yejare <jyejare>:

Trying to post 40 virt-who guests to /rhsm/hypervisors, but Satellite threw an Internal Server Error (ISE) and the guests were not posted.

Please see production.log [1] and candlepin.log [2], or the attachment.

[1] : https://pastebin.com/VjeZgjG5
[2] : https://pastebin.com/ceyTb1i7

--- Additional comment from hsun on 20180921T02:05:42

Created attachment 1485370 [details]
virtwho_client_concurrency_erroed

--- Additional comment from hsun on 20180921T02:13:46

We haven't used a large number of guests for virt-who + Satellite 6.4 testing before. According to the log message, this issue seems to be caused by the buffer size or the maxHttpHeaderSize setting of Candlepin or Apache.

2018-09-20 11:03:32,875 [thread=http-bio-8443-exec-10] [req=c8d4b309-a10b-41cf-ada1-24c9d489fe66, org=, csid=] ERROR org.candlepin.common.exceptions.mappers.CandlepinExceptionMapper - Runtime Error An attempt was made to write more data to the response headers than there was room available in the buffer. Increase maxHttpHeaderSize on the connector or write less data into the response headers. at org.apache.coyote.http11.AbstractOutputBuffer.checkLengthBeforeWrite:547
org.apache.coyote.http11.HeadersTooLargeException: An attempt was made to write more data to the response headers than there was room available in the buffer. Increase maxHttpHeaderSize on the connector or write less data into the response headers.


@kevin, could you help check this, or do you have any suggestions? Thanks.

--- Additional comment from hsun on 20180921T02:17:46

Hi, Jitendra Yejare,

Could you provide the following details:

1. virt-who package version

2. /var/log/rhsm.log

--- Additional comment from jyejare on 20180921T08:00:39

Hi Eko,


The Package Versions:

virt-what-1.18-4.el7.x86_64
tfm-rubygem-hammer_cli_foreman_virt_who_configure-0.0.3-2.el7sat.noarch
tfm-rubygem-foreman_virt_who_configure-0.2.2-1.el7sat.noarch



Also, I don't see any logs in the Satellite's /var/log/rhsm/rhsm.log during posting.



Note:
------
I can easily post up to 30 virt guests, but the issue appears with 40 or more.

--- Additional comment from hsun on 20180921T08:18:08

Hi Jitendra,
It's not the virt-what package; it should be virt-who. Please check again.

Also, /var/log/rhsm/rhsm.log should be on the host where virt-who is installed.

--- Additional comment from hsun on 20180921T08:55:28

After discussing with Jitendra: the virt-who package was never installed or used for this issue, so I'm afraid it's not a virt-who bug.

According to the error log message, I will move it to the candlepin component to check again.

--- Additional comment from pm-rhel on 20180921T15:15:41

Since this issue was entered in Red Hat Bugzilla, the pm_ack has been
set to + automatically for the next planned release

--- Additional comment from mmccune on 20180921T20:58:34

This does appear to be in Candlepin itself; after posting directly to the virt-who API endpoint, the error occurs easily.

We can tune around this via configs in server.xml. I had to get up into the 100 KB range (the value is in bytes) before 100 hosts would work:

               maxHttpHeaderSize="100000"

That is far more than the default, which is only a few kilobytes, and it indicates to me that something isn't working correctly in formulating the response to the API call.

Customers with 1000 virtual machines in a virt-who transaction would overwhelm the server and require roughly 1 MB of response headers, which is a bit much for an HTTP response.

We need to examine how we are formulating the response headers in this API call.
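
To illustrate how quickly the header budget can be used up, here is a rough Python sketch that estimates the size of a pagination Link header when every requested UUID is echoed back in each link URL. This is only an illustration under stated assumptions, not Candlepin code: the base URL, the set of four link relations, and the helper names `build_lookup_url` and `estimate_link_header_bytes` are hypothetical.

```python
import uuid

# Assumed base URL for the consumer lookup (hypothetical, for illustration).
BASE_URL = "https://localhost:8443/candlepin/consumers/"


def build_lookup_url(uuids, page=1):
    """Build a lookup URL with one uuid= parameter per consumer."""
    query = "&".join(f"uuid={u}" for u in uuids)
    return f"{BASE_URL}?{query}&page={page}"


def estimate_link_header_bytes(uuids, relations=("first", "last", "prev", "next")):
    """Estimate the size of a Link header that repeats the URL once per relation."""
    links = [f'<{build_lookup_url(uuids)}>; rel="{rel}"' for rel in relations]
    return len(", ".join(links).encode("ascii"))


if __name__ == "__main__":
    # Print the estimated header size for a few consumer counts.
    for count in (30, 40, 100, 1000):
        ids = [str(uuid.uuid4()) for _ in range(count)]
        print(count, "consumers ->", estimate_link_header_bytes(ids), "bytes")
```

Because each UUID adds a fixed number of bytes to every link, the header grows linearly with the number of consumers, so raising maxHttpHeaderSize only moves the failure point rather than removing it.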

--- Additional comment from mmccune on 20180921T21:04:17

Reproducer info:

I used https://hub.docker.com/r/jacobcallahan/genvirt/ to generate the API call.

--- Additional comment from crog on 20180921T21:57:34

As an aside to this issue, the request which led to the error in question will not work as expected.

The intent of the request looks to be to fetch a collection of consumers by UUID. However, UUIDs are provided in a doubly-URI-encoded, JSON-serialized array, rather than the expected format of "?uuid=:uuid1&uuid=:uuid2&...&uuid=:uuidN".

I've posted a patch to address the HeadersTooLargeException, though if the request being issued to Candlepin is not fixed, the core of this bug (not getting consumer details back) will persist.
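
The difference between the two query formats can be sketched in Python as follows. This is a minimal illustration, assuming the parameter is named `uuid` in both cases and using two sample UUIDs; it is not the actual Katello client code, which is Ruby.

```python
import json
from urllib.parse import parse_qs, quote, urlencode

# Sample consumer UUIDs (illustrative values).
uuids = [
    "d3e657c4-df03-430f-97d2-e9dfede9ce22",
    "908b629e-8c68-44ab-a706-2813519fea23",
]

# Expected form: one repeated uuid= parameter per consumer.
expected_query = urlencode({"uuid": uuids}, doseq=True)

# Problematic form: a JSON-serialized array that is URI-encoded twice,
# so the server sees a single opaque value instead of a list of UUIDs.
broken_query = "uuid=" + quote(quote(json.dumps(uuids), safe=""), safe="")

print(expected_query)
print(parse_qs(expected_query)["uuid"])  # one entry per UUID
print(parse_qs(broken_query)["uuid"])    # a single, still-encoded string
```

Decoding the second form once, as a server normally would, yields one percent-encoded string rather than individual UUIDs, which is why no matching consumers would be found.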

--- Additional comment from bbuckingham on 20180924T13:08:25

It appears that the following commit may have introduced the behavior observed in the URL:

https://github.com/Katello/katello/commit/e0d2c8790718e5aaf366a76b33ee446c06999bde#diff-bf897becee6d218f2e9b589c5f66dcfdR37

--- Additional comment from bbuckingham on 20180924T13:56:18

Based on comment 10 and IRC discussion, we are going to reuse this same BZ to include a fix for the katello part. To support that, I am moving the bugzilla back to ASSIGNED and over to Justin, as he is investigating on the katello side.

--- Additional comment from jsherril on 20180924T16:33:29

Created redmine issue https://projects.theforeman.org/issues/25026 from this bug

--- Additional comment from bbuckingham on 20180924T20:33:47

The upstream katello PR has now merged as well.  Moving the BZ to POST.

--- Additional comment from sat6-jenkins on 20180925T18:56:31

build status: succeeded

brew:
 * rubygem-katello: closed - https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=18494435

Comment 1 Mike McCune 2018-09-26 14:06:53 UTC
We found this regression in 6.3.4 builds, so we are pulling this back from 6.4 into 6.3 as well.

Comment 4 Justin Sherrill 2018-10-03 18:26:08 UTC
Moved to POST, as an additional upstream issue was needed (and is now attached).

Comment 5 Patrick Creech 2018-10-03 19:17:38 UTC
Apologies, I should've caught that it was missing an issue.

Comment 7 Mike McCune 2018-10-04 21:29:47 UTC
NOTE: even with the build and fixes in tfm-rubygem-katello-3.4.5.86-1.el7sat, we are still seeing the same error.

This will likely move back to ASSIGNED.

Comment 8 Mike McCune 2018-10-07 14:57:59 UTC
Ignore comment #7 above; I was missing the candlepin build in my test.

After getting all the packages from the latest snap, this worked fine with 5k hosts in the test.

Comment 9 jcallaha 2018-10-08 15:45:19 UTC
Verified in Satellite 6.3.4 Snap 4

The endpoint can now accept at least 5000 hypervisors.

#  docker run --rm -e "SATHOST=my.sat.host" -e "COUNT=5000" jacobcallahan/genvirt
Adding satellite certificate http://my.sat.host/pub/katello-ca-consumer-latest.noarch.rpm
Retrieving http://my.sat.host/pub/katello-ca-consumer-latest.noarch.rpm
Preparing...                          ########################################
Updating / installing...
katello-ca-consumer-my.sat########################################
No registration details specified. Registering to Default_Organization and Library...
Registering to: my.sat.host:443/rhsm
The system has been registered with ID: 61d8d268-179c-40a0-8139-e2194514d225 
genvirt.py
ks-script-q6TWGF
startup.sh
yum.log
Generating data with 5000 hosts.
Submitting data to my.sat.host. This may take a while...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  771k    0     4  100  771k      0    285  0:46:11  0:46:06  0:00:05     0
nullUnregistering from Satellite
Unregistering from: my.sat.host:443/rhsm
System has been unregistered.
Done!

Comment 10 Pavel Moravec 2018-10-08 18:44:58 UTC
Reproducing this on customer data, I see warnings in candlepin's error log:

2018-10-08 20:38:09,840 [thread=http-bio-8443-exec-23] [req=64ba1470-a6f4-4bb9-84e1-c928e9101f45, org=, csid=] WARN  org.candlepin.common.resteasy.filter.LinkHeaderResponseFilter - Link length exceeded maximum length (1024). Link headers will be omitted from this response.
org.candlepin.common.resteasy.filter.LinkTooLongException: https://localhost:8443/candlepin/consumers/?uuid=d3e657c4-df03-430f-97d2-e9dfede9ce22&uuid=.....(long-list-here).....&uuid=908b629e-8c68-44ab-a706-2813519fea23&page=1

Meanwhile, candlepin responds with a 200 return code.

Is the response complete / as katello expects?

Comment 11 Pavel Moravec 2018-10-08 20:23:19 UTC
(In reply to Pavel Moravec from comment #10)
> Reproducing this on a customer data, I see warnings in candlepin's error log:
> 
> 2018-10-08 20:38:09,840 [thread=http-bio-8443-exec-23]
> [req=64ba1470-a6f4-4bb9-84e1-c928e9101f45, org=, csid=] WARN 
> org.candlepin.common.resteasy.filter.LinkHeaderResponseFilter - Link length
> exceeded maximum length (1024). Link headers will be omitted from this
> response.
> org.candlepin.common.resteasy.filter.LinkTooLongException:
> https://localhost:8443/candlepin/consumers/?uuid=d3e657c4-df03-430f-97d2-
> e9dfede9ce22&uuid=.....(long-list-here).....&uuid=908b629e-8c68-44ab-a706-
> 2813519fea23&page=1
> 
> While candlepin responds with 200 return code.
> 
> Is the response complete / as katello expects?

Per khowell++ and jsherrill++, these omitted headers are not of interest to katello.
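
The behavior described by the warning (omit the Link header when a link is too long, but still return the response) can be sketched in Python roughly as follows. This is a hedged illustration of the idea only: the real filter is Java code in Candlepin, and the names `build_link_header`, `add_pagination_headers`, and `LinkTooLongError` here are hypothetical.

```python
import logging

log = logging.getLogger("link_header_sketch")

MAX_LINK_LENGTH = 1024  # limit reported in the warning message


class LinkTooLongError(Exception):
    """Raised when a single pagination link exceeds the allowed length."""


def build_link_header(links, max_length=MAX_LINK_LENGTH):
    """Return a Link header value for {rel: url}; raise if any URL is too long."""
    parts = []
    for rel, url in links.items():
        if len(url) > max_length:
            raise LinkTooLongError(url)
        parts.append(f'<{url}>; rel="{rel}"')
    return ", ".join(parts)


def add_pagination_headers(headers, links):
    """Attach a Link header when it fits; otherwise omit it and keep the response."""
    try:
        headers["Link"] = build_link_header(links)
    except LinkTooLongError:
        log.warning(
            "Link length exceeded maximum length (%d). "
            "Link headers will be omitted from this response.",
            MAX_LINK_LENGTH,
        )
    return headers
```

With this approach the response body and status code are unaffected; only the optional pagination links are dropped, which matches the 200 response observed above.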

Comment 13 errata-xmlrpc 2018-10-11 15:18:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2915

