Bug 1280346
| Summary: | "stack depth limit exceeded" when submitting 600ESXi hypervisors/6500VMs via virt-who | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Christian Horn <chorn> |
| Component: | Candlepin | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | jcallaha |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.1.1 | CC: | bbuckingham, bkearney, chorn, csnyder, cwelton, dmoessne, egolov, ehelms, hsun, jcallaha, jokot3, mmccune, sauchter, tspeetje, xdmoon |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | candlepin-0.9.54-5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1321630 1327220 1327224 | Environment: | |
| Last Closed: | 2016-09-27 09:01:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1327224 | | |
| Bug Blocks: | 1296845, 1321630, 1351644 | | |
| Attachments: | | | |
| Any comments? Will this be addressed in 6.2? Increasing max_stack_depth to 3MB might be a simple thing to do. Or are there other suggestions for workarounds? Alternatively, could the query be structured differently?
max_stack_depth (integer)
    Specifies the maximum safe depth of the server's execution stack. The ideal setting for this parameter is the actual stack size limit enforced by the kernel (as set by ulimit -s or local equivalent), less a safety margin of a megabyte or so. The safety margin is needed because the stack depth is not checked in every routine in the server, but only in key potentially-recursive routines such as expression evaluation. The default setting is two megabytes (2MB), which is conservatively small and unlikely to risk crashes. However, it might be too small to allow execution of complex functions. Only superusers can change this setting.
    Setting max_stack_depth higher than the actual kernel limit will mean that a runaway recursive function can crash an individual backend process. On platforms where PostgreSQL can determine the kernel limit, the server will not allow this variable to be set to an unsafe value. However, not all platforms provide the information, so caution is recommended in selecting a value.
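The sizing rule in the documentation quoted above (kernel stack limit minus a safety margin of about a megabyte) can be sketched in Python. This is only an illustration and not part of the bug report: the function names are hypothetical, the 1024 kB margin is one reading of "a megabyte or so", and the limit read here is that of the calling process, which may differ from the limit the PostgreSQL server actually runs under.

```python
import resource
from typing import Optional

# Assumption: "a megabyte or so" from the quoted documentation, taken as 1024 kB.
SAFETY_MARGIN_KB = 1024


def suggested_max_stack_depth_kb(kernel_limit_kb: int,
                                 margin_kb: int = SAFETY_MARGIN_KB) -> int:
    """Return the kernel stack limit minus the safety margin, in kB."""
    if kernel_limit_kb <= margin_kb:
        raise ValueError("kernel stack limit is not larger than the safety margin")
    return kernel_limit_kb - margin_kb


def current_stack_limit_kb() -> Optional[int]:
    """Soft stack limit of this process in kB (what `ulimit -s` reports).

    Returns None when the limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)  # values are in bytes
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


if __name__ == "__main__":
    limit = current_stack_limit_kb()
    if limit is None:
        print("stack limit is unlimited; pick max_stack_depth by hand")
    else:
        print(f"kernel limit: {limit} kB")
        print(f"suggested max_stack_depth: {suggested_max_stack_depth_kb(limit)} kB")
```

For a kernel limit of 8192 kB (a common Linux default, assumed here), the helper returns 7168 kB, so the 3MB value proposed above would fit below it.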
Hi, can you provide the Candlepin logs from the time of this error? It would be helpful in tracking down the exact code path if we had the stack trace on the Candlepin side. Thanks!
Created attachment 1164087 [details]
candlepin_stack
Verified in Satellite 6.1.10 Snap 3.
1. Downloaded the file located here: http://file.rdu.redhat.com/csnyder/test_vodaphone.json
2. Registered the Satellite to itself.
3. Ran this command:
curl -k -X POST --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem https://rhsm-qe-1.rhq.lab.eng.bos.redhat.com/rhsm/hypervisors -H "Content-Type: application/json" -d @"test_vodaphone.json"
The command completed successfully and all content hosts were added correctly (see attached). The output was captured and will be attached as well.
Created attachment 1204262 [details]
verification screenshot
Created attachment 1204263 [details]
output.txt
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1938 | 
Description of problem:
We are running Satellite 6.1.1+hotfix and have now configured virt-who (virt-who-0.14-1.el7sat.noarch). The initial test was with one cluster of a vCenter which contained 300 ESXi hosts and 3500 VMs. That task completed in ~1.5 min (virt-who logged a timeout while talking to subscription-manager, but the data was received correctly by the Satellite). We then wanted to submit the whole vCenter, which contains ~600 ESXi hosts and 6500 VMs. Doing so, we get the following error in virt-who:
~~~
Error in communication with subscription manager: Runtime Error ERROR: stack depth limit exceeded Hint: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,157
~~~
This is also seen on the Satellite in foreman/production.log and the PostgreSQL log. The PostgreSQL log also contains the query that was aborted:
~~~
ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
STATEMENT: select this_.consumer_id as y0_ from cp_consumer_guests this_ inner join cp_consumer gconsumer1_ on this_.consumer_id=gconsumer1_.id inner join cp_guest_ids_checkin checkins2_ on gconsumer1_.id=checkins2_.consumer_id where gconsumer1_.owner_id=$1 and (lower(this_.guest_id)=$2 or ... or lower(this_.guest_id)=$13169) order by checkins2_.updated desc
~~~
The ellipsis stands for the omitted comparisons: the query chains 13168 guest_id conditions with "OR", for a total of about 400KB in a single query, which exceeds the PostgreSQL stack depth check.

Version-Release number of selected component (if applicable):
satellite 6.1.1
virt-who-0.14-1.el7sat.noarch

How reproducible:
always

Steps to Reproduce:
1. Set up 600 ESXi hypervisors/6500 VMs
2. Run virt-who
3.

Actual results:
Runtime Error ERROR: stack depth limit exceeded

Expected results:
No error; the operation succeeds.

Additional info:
- After increasing PostgreSQL's max_stack_depth to 3MB, the above works.
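
One way the query could be structured differently is to split the guest IDs into fixed-size batches and use an `IN` list per batch instead of one long `OR` chain. The Python sketch below only illustrates that idea and is not the fix shipped in Candlepin: the function name, the batch size of 1000, and the `%s` placeholder style (as used by DB-API drivers such as psycopg2) are assumptions; the table and column names are taken from the logged statement.

```python
from typing import Iterator, List, Sequence, Tuple


def batched_guest_queries(
    owner_id: str,
    guest_ids: Sequence[str],
    batch_size: int = 1000,  # assumption: any size well below the failing 13168
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sql, params) pairs, each covering at most batch_size guest IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The logged query compares lower(guest_id), so lower-case the inputs too.
    lowered = [guest_id.lower() for guest_id in guest_ids]
    for start in range(0, len(lowered), batch_size):
        chunk = lowered[start:start + batch_size]
        placeholders = ", ".join(["%s"] * len(chunk))
        sql = (
            "select this_.consumer_id from cp_consumer_guests this_ "
            "inner join cp_consumer gconsumer1_ "
            "on this_.consumer_id = gconsumer1_.id "
            "inner join cp_guest_ids_checkin checkins2_ "
            "on gconsumer1_.id = checkins2_.consumer_id "
            "where gconsumer1_.owner_id = %s "
            f"and lower(this_.guest_id) in ({placeholders}) "
            "order by checkins2_.updated desc"
        )
        # First parameter is the owner, followed by this batch's guest IDs.
        yield sql, [owner_id, *chunk]


if __name__ == "__main__":
    ids = [f"GUEST-{i}" for i in range(13168)]
    for sql, params in batched_guest_queries("owner-1", ids):
        print(len(params) - 1, "guest IDs in this batch")
```

Each statement stays small regardless of how many guests are reported. Because the `order by` then applies per batch, a caller that needs a global ordering would have to merge the batch results itself.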