Bug 1364566 - After adding generic/orchestration catalog item infinispinner and 502 error(appliance crashed)
Summary: After adding generic/orchestration catalog item infinispinner and 502 error(a...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.6.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.7.0
Assignee: Martin Povolny
QA Contact: Matouš Mojžíš
URL:
Whiteboard: service:catalog:appliance:perf
Depends On:
Blocks: 1371174
TreeView+ depends on / blocked
 
Reported: 2016-08-05 17:59 UTC by Matouš Mojžíš
Modified: 2016-11-29 16:53 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1371174 (view as bug list)
Environment:
Last Closed: 2016-10-12 08:58:48 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Matouš Mojžíš 2016-08-05 17:59:28 UTC
Description of problem:
I tried to add generic catalog item according to one of my test case.
Then infinispinner appeared, so I tried to open cfme in another window. I was able to brwose through pages until I opened catalog items page again. Then it freezed on this window too. So I tried it on another appliance, where I added generic catalog item and then infinispinner appeared again, so I was able to browse other pages for like three minutes until whole appliance crashed and 502 proxy error appeared.

The worst thing is, that there are no errors in logs. It looks like appliance is still working.

When I added orchestration catalog item the only difference was, that infinispinner didn't appear and it saved that item.

Version-Release number of selected component (if applicable):
5.6.1.0

How reproducible:
Always

Steps to Reproduce:
1. Services -> Catalogs -> Catalog items
2. Add a generic/orchestration item
3.

Actual results:
Appliance crash and 502 proxy error

Expected results:


Additional info:

Comment 2 Joe Rafaniello 2016-08-05 18:30:07 UTC
This sounds like multiple requests are being processed concurrently in puma threads and are possibly deadlocking.  This sounds similar to what happened when automate's mutex and rails activesupport dependency mutex deadlocked each other [1] and was fixed in [2].

Note, I don't know if catalog items uses automate in the same way as the custom dialog form pulldown items.  Re-assigning to automate for now... Sorry, I'm not sure what category catalog items are.

A quick hack to see if it's puma threads running concurrently is to change your config/puma.rb from 'threads 5, 5' to 'threads 1, 1' and restart your appliance processes.

Note, The entire automate mutex goes away on master/5.7 via [3]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1354054
[2] https://github.com/ManageIQ/manageiq/pull/9903
[3] https://github.com/ManageIQ/manageiq/pull/10135

Comment 3 Matouš Mojžíš 2016-08-05 18:33:10 UTC
I just figured out that it recovers after some time, so it might be the deadlock thing.

Comment 4 Greg McCullough 2016-08-08 14:24:05 UTC
Matous - Were you able to test with the config/puma.rb changes suggested in comment #2?

Please test with that configuration and provide feedback.

Comment 5 Matouš Mojžíš 2016-08-08 16:24:00 UTC
With 1, 1 still with infinispinner and in second panel it doesn't let me to connect immediately. After five minutes I can connect and create new catalog items without problems and also the first one is created.

Comment 6 mkanoor 2016-08-08 18:13:06 UTC
Matous,
The deadlock in 5.6 was fixed via this PR https://github.com/ManageIQ/manageiq/pull/10038
Can you make sure that this PR is in your environment.

You should be seeing this line in lib/miq_automation_engine/engine/miq_ae_engine.rb

def self.deliver(*args)
    # This was added because in RAILS 5 PUMA sends multiple requests
    # to the same process and our DRb server implementation for Automate
    # Methods is not thread safe. The permit_concurrent_loads allows the
    # DRb requests which run in different threads to load constants.
    ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
      @deliver_mutex.synchronize { deliver_block(*args) }
    end
end

Thanks,
Madhu

Comment 7 Matouš Mojžíš 2016-08-09 09:20:14 UTC
Yes, I can see this in miq_ae_engine.rb.

Comment 8 mkanoor 2016-08-09 16:01:20 UTC
Hi,
I see the delay the first time I create a catalog item, after that it seems to create the subsequent items quickly. Is this the same behavior? Or is it slow for you every time? Also if you can provide your appliance IP I would like to test it there.
Thanks,
Madhu

Comment 9 Matouš Mojžíš 2016-08-10 10:16:12 UTC
Hi,
same for me with threads 1,1.
First is created and then it crashes for some time like five minutes and then it's okay and catalog items are created immediately.

Comment 10 mkanoor 2016-08-10 13:47:54 UTC
Hi,
So this is happening only when the Appliance uses Catalog Items for the first time, subsequent creations of Catalog Items work properly. So if I were to use your appliance  it won't exhibit this issue correct?

Thanks,
Madhu

Comment 12 Martin Povolny 2016-10-12 08:58:48 UTC
This was never on 5.7 as the fix was in master before the fork (and then backported to Darga). So closing this.

Comment 13 Matouš Mojžíš 2016-10-27 13:10:00 UTC
Removing needinfo flag as we discussed everything in BJ session with Madhu.


Note You need to log in before you can comment on or make changes to this bug.