Bug 1364566
| Summary: | After adding a generic/orchestration catalog item, infinite spinner and 502 error (appliance crashed) | | | |
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Matouš Mojžíš <mmojzis> | |
| Component: | Automate | Assignee: | Martin Povolny <mpovolny> | |
| Status: | CLOSED NOTABUG | QA Contact: | Matouš Mojžíš <mmojzis> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.6.0 | CC: | abellott, cpelland, dajohnso, jhardy, jrafanie, mfalesni, mkanoor, mmojzis, obarenbo, tfitzger | |
| Target Milestone: | GA | Keywords: | ZStream | |
| Target Release: | 5.7.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | service:catalog:appliance:perf | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1371174 (view as bug list) | Environment: | ||
| Last Closed: | 2016-10-12 08:58:48 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1371174 | |||
Description
Matouš Mojžíš
2016-08-05 17:59:28 UTC
This sounds like multiple requests are being processed concurrently in puma threads and are possibly deadlocking. This sounds similar to what happened when automate's mutex and the Rails ActiveSupport dependency mutex deadlocked each other [1], which was fixed in [2]. Note: I don't know whether catalog items use automate in the same way as the custom dialog form pulldown items. Re-assigning to automate for now... Sorry, I'm not sure what category catalog items are. A quick hack to see if it's puma threads running concurrently is to change your config/puma.rb from 'threads 5, 5' to 'threads 1, 1' and restart your appliance processes. Note: the entire automate mutex goes away on master/5.7 via [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1354054
[2] https://github.com/ManageIQ/manageiq/pull/9903
[3] https://github.com/ManageIQ/manageiq/pull/10135

I just figured out that it recovers after some time, so it might be the deadlock thing.

Matous, were you able to test with the config/puma.rb changes suggested in comment #2? Please test with that configuration and provide feedback.

With 1, 1 I still get the infinite spinner, and in a second panel it doesn't let me connect immediately. After five minutes I can connect and create new catalog items without problems, and the first one is also created.

Matous, the deadlock in 5.6 was fixed via this PR: https://github.com/ManageIQ/manageiq/pull/10038. Can you make sure that this PR is in your environment? You should be seeing this code in lib/miq_automation_engine/engine/miq_ae_engine.rb:

```ruby
def self.deliver(*args)
  # This was added because in RAILS 5 PUMA sends multiple requests
  # to the same process and our DRb server implementation for Automate
  # Methods is not thread safe. The permit_concurrent_loads allows the
  # DRb requests which run in different threads to load constants.
  ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    @deliver_mutex.synchronize { deliver_block(*args) }
  end
end
```

Thanks, Madhu

Yes, I can see this in miq_ae_engine.rb.

Hi, I see the delay the first time I create a catalog item; after that it seems to create subsequent items quickly. Is this the same behavior, or is it slow for you every time? Also, if you can provide your appliance IP, I would like to test it there. Thanks, Madhu

Hi, same for me with threads 1, 1. The first one is created, then it crashes for some time, about five minutes, and then it's okay and catalog items are created immediately.

Hi, so this is happening only when the appliance uses catalog items for the first time; subsequent creations of catalog items work properly. So if I were to use your appliance it won't exhibit this issue, correct? Thanks, Madhu

This was never on 5.7, as the fix was in master before the fork (and then backported to Darga). So closing this.

Removing needinfo flag as we discussed everything in the BJ session with Madhu.
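
For reference, here is a minimal sketch of the config/puma.rb change suggested in comment #2, assuming Puma's standard configuration DSL. The excerpt is illustrative only; the rest of the appliance's puma.rb is omitted.

```ruby
# config/puma.rb (illustrative excerpt only, not the full appliance file)

# Setting under discussion: a pool of 5 min / 5 max threads per worker,
# which allows several requests to run concurrently in one process.
# threads 5, 5

# Temporary diagnostic change: force a single thread so requests are
# serialized, ruling out concurrent puma threads as the trigger.
threads 1, 1
```

Restoring `threads 5, 5` and restarting the appliance processes returns Puma to the original concurrency once the test is done.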
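
The deadlock described in the first comment (automate's delivery mutex and the Rails ActiveSupport dependency interlock blocking each other) follows the classic lock-ordering pattern. The sketch below is a generic plain-Ruby illustration of that pattern, not ManageIQ code; both lock names are hypothetical stand-ins.

```ruby
# Generic lock-ordering deadlock demo (illustrative only, not ManageIQ code).
automate_lock  = Mutex.new   # stand-in for the automate @deliver_mutex
interlock_lock = Mutex.new   # stand-in for the Rails dependency interlock

t1 = Thread.new do
  automate_lock.synchronize do
    sleep 0.1                      # give t2 time to grab the other lock
    interlock_lock.synchronize {}  # blocks: t2 already holds this lock
  end
end

t2 = Thread.new do
  interlock_lock.synchronize do
    sleep 0.1
    automate_lock.synchronize {}   # blocks: t1 already holds this lock
  end
end

# Join with a timeout so the demo reports the deadlock instead of hanging.
puts t1.join(2) ? "t1 finished" : "t1 appears deadlocked"
puts t2.join(2) ? "t2 finished" : "t2 appears deadlocked"
[t1, t2].each(&:kill)
```

The quoted `deliver` fix addresses the Rails side of this by wrapping the mutex in `permit_concurrent_loads`, which lets other threads load constants while the current thread is blocked inside the wrapped block, so a thread waiting on the automate mutex no longer prevents the thread holding it from autoloading.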