Bug 1633252 - Candlepin throws 500 Internal Server Error for more than 40+ guests
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Candlepin
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: 6.3.4
Target Release: Unused
Assigned To: Barnaby Court
QA Contact: jcallaha
Keywords: PrioBumpGSS, Triaged
Depends On: 1632764 1635807
Reported: 2018-09-26 10:05 EDT by Mike McCune
Modified: 2018-10-11 11:18 EDT
Fixed In Version: tfm-rubygem-katello-3.4.5.85-1,tfm-rubygem-katello-3.4.5.86-1,candlepin-2.1.24-1
Clone Of: 1631590
Last Closed: 2018-10-11 11:18:06 EDT

External Trackers
Tracker                 ID                             Priority  Status  Summary  Last Updated
Foreman Issue Tracker   25026                          None      None    None     2018-09-26 10:05 EDT
Foreman Issue Tracker   25060                          None      None    None     2018-10-03 14:07 EDT
Github                  candlepin/candlepin/pull/2120  None      None    None     2018-09-26 10:05 EDT
Red Hat Product Errata  RHBA-2018:2915                 None      None    None     2018-10-11 11:18 EDT

Description Mike McCune 2018-09-26 10:05:30 EDT
+++ This bug was initially created as a clone of Bug #1631590 +++

Issue copied from Jitendra Yejare <jyejare@redhat.com>:

Posting 40 virt-who guests to /rhsm/hypervisors causes Satellite to throw a 500 Internal Server Error (ISE), and the guests are not posted.

Please see production.logs [1] and candlepin.logs [2], or the attachment.

[1] : https://pastebin.com/VjeZgjG5
[2] : https://pastebin.com/ceyTb1i7

--- Additional comment from hsun@redhat.com on 20180921T02:05:42

Created attachment 1485370 [details]
virtwho_client_concurrency_erroed

--- Additional comment from hsun@redhat.com on 20180921T02:13:46

We have not used a large number of guests for virt-who + Satellite 6.4 testing before. According to the log message, this issue seems to be caused by the buffer size (maxHttpHeaderSize) in the Candlepin or Apache settings.

2018-09-20 11:03:32,875 [thread=http-bio-8443-exec-10] [req=c8d4b309-a10b-41cf-ada1-24c9d489fe66, org=, csid=] ERROR org.candlepin.common.exceptions.mappers.CandlepinExceptionMapper - Runtime Error An attempt was made to write more data to the response headers than there was room available in the buffer. Increase maxHttpHeaderSize on the connector or write less data into the response headers. at org.apache.coyote.http11.AbstractOutputBuffer.checkLengthBeforeWrite:547
org.apache.coyote.http11.HeadersTooLargeException: An attempt was made to write more data to the response headers than there was room available in the buffer. Increase maxHttpHeaderSize on the connector or write less data into the response headers.
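
The exception message describes a bounded buffer for response headers: once the serialized headers would exceed the connector's limit, the write is rejected. A minimal Python sketch of that kind of check follows. The class and function names are hypothetical illustrations, not Tomcat's actual implementation, and the 8192-byte limit is an assumed value chosen for the demonstration.

```python
class HeadersTooLargeError(Exception):
    """Raised when a header write would overflow the fixed-size buffer."""


class HeaderBuffer:
    """Hypothetical bounded buffer for response headers (illustration only)."""

    def __init__(self, max_size: int = 8192) -> None:
        self.max_size = max_size  # assumed limit, in bytes
        self.used = 0

    def add_header(self, name: str, value: str) -> None:
        # A header line is serialized as "Name: value\r\n".
        line = f"{name}: {value}\r\n".encode("latin-1")
        if self.used + len(line) > self.max_size:
            raise HeadersTooLargeError(
                f"header {name!r} needs {len(line)} bytes, "
                f"only {self.max_size - self.used} left"
            )
        self.used += len(line)


def try_headers(link_value_length: int, max_size: int = 8192) -> bool:
    """Return True if a Link header with a value of this length fits."""
    buf = HeaderBuffer(max_size)
    buf.add_header("Content-Type", "application/json")
    try:
        buf.add_header("Link", "x" * link_value_length)
    except HeadersTooLargeError:
        return False
    return True
```

In this sketch, a header value that grows with the number of guests in the request eventually fails the length check, which would match the symptom of small requests succeeding and large ones failing.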


@kevin, could you help check this, or do you have any suggestions? Thanks.

--- Additional comment from hsun@redhat.com on 20180921T02:17:46

Hi, Jitendra Yejare,

Could you provide the following details:

1. virt-who package version

2. /var/log/rhsm.log

--- Additional comment from jyejare@redhat.com on 20180921T08:00:39

Hi Eko,


The Package Versions:

virt-what-1.18-4.el7.x86_64
tfm-rubygem-hammer_cli_foreman_virt_who_configure-0.0.3-2.el7sat.noarch
tfm-rubygem-foreman_virt_who_configure-0.2.2-1.el7sat.noarch



Also, I don't see any entries in the Satellite's /var/log/rhsm/rhsm.log during posting.



Note:
------
I can easily post up to 30 virt guests, but the issue appears with 40 or more.

--- Additional comment from hsun@redhat.com on 20180921T08:18:08

Hi Jitendra,
That is the virt-what package; I need the virt-who package version. Please check again.

Also, /var/log/rhsm/rhsm.log should be on the host where virt-who is installed.

--- Additional comment from hsun@redhat.com on 20180921T08:55:28

After discussing with Jitendra: the virt-who package was never installed or used for this issue, so I'm afraid it's not a virt-who bug.

Based on the error log message, I am moving this to the Candlepin component for further investigation.

--- Additional comment from pm-rhel@redhat.com on 20180921T15:15:41

Since this issue was entered in Red Hat Bugzilla, the pm_ack has been
set to + automatically for the next planned release

--- Additional comment from mmccune@redhat.com on 20180921T20:58:34

This does appear to originate in Candlepin itself: posting directly to the virt-who API endpoint reproduces the error easily.

We can tune around this via configs in server.xml. The value is in bytes, and I had to get up into the 100 KB range before 100 hosts would work:

               maxHttpHeaderSize="100000"

That is far more than the default, which is only a few KB, and it indicates to me that something isn't working correctly in how we formulate the response to the API call.

At this rate, customers with 1000 virtual machines in a virt-who transaction would need roughly ten times that (about 1 MB) of response headers, which is a bit much for an HTTP response.

We need to examine how we are formulating the response headers in this API call.
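
Tomcat's maxHttpHeaderSize attribute is specified in bytes, so a quick conversion helps when reasoning about the value above. The sketch below converts the setting to KiB and extrapolates to a larger host count. Linear growth per host is an assumption made for illustration, not a measured result, and the function names are hypothetical.

```python
def to_kib(size_bytes: int) -> float:
    """Convert a size in bytes to KiB."""
    return size_bytes / 1024


def extrapolate(size_bytes: int, hosts: int, target_hosts: int) -> int:
    """Scale a header size linearly with the host count (assumed model)."""
    return size_bytes * target_hosts // hosts


setting = 100_000  # the value tried in server.xml, in bytes
print(f"{setting} bytes = {to_kib(setting):.1f} KiB")
print(f"1000 hosts (linear assumption): {extrapolate(setting, 100, 1000)} bytes")
```

Even under this rough model, the header budget scales with the number of hosts in the request, which is why raising the limit is only a workaround.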

--- Additional comment from mmccune@redhat.com on 20180921T21:04:17

Reproducer info:

I used https://hub.docker.com/r/jacobcallahan/genvirt/ to generate the API call.

--- Additional comment from crog@redhat.com on 20180921T21:57:34

As an aside to this issue, the request that led to the error in question will not work as expected.

The intent of the request looks to be to fetch a collection of consumers by UUID. However, UUIDs are provided in a doubly-URI-encoded, JSON-serialized array, rather than the expected format of "?uuid=:uuid1&uuid=:uuid2&...&uuid=:uuidN".

I've posted a patch to address the HeadersTooLargeException, though if the request being issued to Candlepin is not fixed, the core of this bug (not getting consumer details back) will persist.
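
To make the difference between the two formats concrete, here is a small Python sketch that builds both query strings from the same list of UUIDs. It assumes the malformed request carried a compact JSON array under a single uuid parameter; the exact serialization used by the client is not shown in this report, and the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def repeated_params(uuids):
    """Build the expected form: uuid=<u1>&uuid=<u2>&...&uuid=<uN>."""
    return urlencode([("uuid", u) for u in uuids])


def double_encoded_json(uuids):
    """Build the problematic form: a JSON array that is URI-encoded twice."""
    # Assumption: compact JSON serialization; the real client may differ.
    serialized = json.dumps(uuids, separators=(",", ":"))
    once = quote(serialized, safe="")
    twice = quote(once, safe="")  # every "%" becomes "%25"
    return "uuid=" + twice


uuids = [
    "d3e657c4-df03-430f-97d2-e9dfede9ce22",
    "908b629e-8c68-44ab-a706-2813519fea23",
]
print(repeated_params(uuids))
print(double_encoded_json(uuids))
```

The second form is longer per UUID, because each quote, bracket, and comma expands to five characters, and it is a single opaque value that a server expecting repeated uuid parameters would not parse as a list of UUIDs.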

--- Additional comment from bbuckingham@redhat.com on 20180924T13:08:25

It appears that the following commit may have introduced the behavior observed in the URL:

https://github.com/Katello/katello/commit/e0d2c8790718e5aaf366a76b33ee446c06999bde#diff-bf897becee6d218f2e9b589c5f66dcfdR37

--- Additional comment from bbuckingham@redhat.com on 20180924T13:56:18

Based on comment 10 and IRC discussion, we are going to reuse this same BZ to include a fix for the Katello part. To support that, I am moving the bug back to ASSIGNED and over to Justin, as he is investigating on the Katello side.

--- Additional comment from jsherril@redhat.com on 20180924T16:33:29

Created redmine issue https://projects.theforeman.org/issues/25026 from this bug

--- Additional comment from bbuckingham@redhat.com on 20180924T20:33:47

The upstream katello PR has now merged as well.  Moving the BZ to POST.

--- Additional comment from sat6-jenkins@redhat.com on 20180925T18:56:31

build status: succeeded

brew:
 * rubygem-katello: closed - https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=18494435
Comment 1 Mike McCune 2018-09-26 10:06:53 EDT
We found this regression in 6.3.4 builds, so we are pulling this fix back from 6.4 into 6.3 as well.
Comment 4 Justin Sherrill 2018-10-03 14:26:08 EDT
Moved to POST, as an additional upstream issue was needed (and is now attached).
Comment 5 Patrick Creech 2018-10-03 15:17:38 EDT
Apologies, I should've caught that it was missing an issue.
Comment 7 Mike McCune 2018-10-04 17:29:47 EDT
NOTE: even with the build and fixes in tfm-rubygem-katello-3.4.5.86-1.el7sat, we are still seeing the same error.

This will likely move back to ASSIGNED.
Comment 8 Mike McCune 2018-10-07 10:57:59 EDT
Ignore comment #7 above; I was missing the candlepin build in my test.

After getting all the packages from the latest snap, this worked fine with 5k hosts in the test.
Comment 9 jcallaha 2018-10-08 11:45:19 EDT
Verified in Satellite 6.3.4 Snap 4

The endpoint can now accept at least 5000 hypervisors.

#  docker run --rm -e "SATHOST=my.sat.host" -e "COUNT=5000" jacobcallahan/genvirt
Adding satellite certificate http://my.sat.host/pub/katello-ca-consumer-latest.noarch.rpm
Retrieving http://my.sat.host/pub/katello-ca-consumer-latest.noarch.rpm
Preparing...                          ########################################
Updating / installing...
katello-ca-consumer-my.sat########################################
No registration details specified. Registering to Default_Organization and Library...
Registering to: my.sat.host:443/rhsm
The system has been registered with ID: 61d8d268-179c-40a0-8139-e2194514d225 
genvirt.py
ks-script-q6TWGF
startup.sh
yum.log
Generating data with 5000 hosts.
Submitting data to my.sat.host. This may take a while...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  771k    0     4  100  771k      0    285  0:46:11  0:46:06  0:00:05     0
nullUnregistering from Satellite
Unregistering from: my.sat.host:443/rhsm
System has been unregistered.
Done!
Comment 10 Pavel Moravec 2018-10-08 14:44:58 EDT
Reproducing this with customer data, I see warnings in candlepin's error log:

2018-10-08 20:38:09,840 [thread=http-bio-8443-exec-23] [req=64ba1470-a6f4-4bb9-84e1-c928e9101f45, org=, csid=] WARN  org.candlepin.common.resteasy.filter.LinkHeaderResponseFilter - Link length exceeded maximum length (1024). Link headers will be omitted from this response.
org.candlepin.common.resteasy.filter.LinkTooLongException: https://localhost:8443/candlepin/consumers/?uuid=d3e657c4-df03-430f-97d2-e9dfede9ce22&uuid=.....(long-list-here).....&uuid=908b629e-8c68-44ab-a706-2813519fea23&page=1

Meanwhile, candlepin responds with a 200 return code.

Is the response complete / as katello expects?
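
The warning suggests that pagination Link headers are dropped when a link would exceed a length limit, while the response itself is still returned. A minimal Python sketch of that behavior follows, assuming a simple length check. The function name and structure are hypothetical and are not Candlepin's actual filter code; only the limit of 1024 comes from the log message.

```python
from typing import Dict, List
from urllib.parse import urlencode

MAX_LINK_LENGTH = 1024  # limit quoted in the warning message


def build_headers(base_url: str, uuids: List[str], page: int) -> Dict[str, str]:
    """Return response headers, omitting Link when the URL is too long."""
    headers = {"Content-Type": "application/json"}
    query = urlencode([("uuid", u) for u in uuids] + [("page", str(page))])
    link = f"{base_url}?{query}"
    if len(link) <= MAX_LINK_LENGTH:
        headers["Link"] = f'<{link}>; rel="first"'
    # Otherwise the Link header is left out; the body is unaffected.
    return headers


base = "https://localhost:8443/candlepin/consumers/"
one = ["d3e657c4-df03-430f-97d2-e9dfede9ce22"]
print("Link" in build_headers(base, one, 1))
print("Link" in build_headers(base, one * 40, 1))
```

Under this model, a client that reads only the response body and ignores pagination links would see no difference between the two cases.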
Comment 11 Pavel Moravec 2018-10-08 16:23:19 EDT
(In reply to Pavel Moravec from comment #10)
> Reproducing this on a customer data, I see warnings in candlepin's error log:
> 
> 2018-10-08 20:38:09,840 [thread=http-bio-8443-exec-23]
> [req=64ba1470-a6f4-4bb9-84e1-c928e9101f45, org=, csid=] WARN 
> org.candlepin.common.resteasy.filter.LinkHeaderResponseFilter - Link length
> exceeded maximum length (1024). Link headers will be omitted from this
> response.
> org.candlepin.common.resteasy.filter.LinkTooLongException:
> https://localhost:8443/candlepin/consumers/?uuid=d3e657c4-df03-430f-97d2-
> e9dfede9ce22&uuid=.....(long-list-here).....&uuid=908b629e-8c68-44ab-a706-
> 2813519fea23&page=1
> 
> While candlepin responds with 200 return code.
> 
> Is the response complete / as katello expects?

Per khowell++ and jsherrill++, these omitted headers are not of interest to katello.
Comment 13 errata-xmlrpc 2018-10-11 11:18:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2915
