Description of problem:
I tried to add a generic catalog item as part of one of my test cases. An infinispinner appeared, so I opened CFME in another window. I was able to browse through pages until I opened the catalog items page again; then that window froze too. I then tried another appliance: I added a generic catalog item there and the infinispinner appeared again. I was able to browse other pages for about three minutes until the whole appliance crashed and a 502 proxy error appeared. The worst thing is that there are no errors in the logs; it looks like the appliance is still working. When I added an orchestration catalog item, the only difference was that the infinispinner didn't appear and the item was saved.

Version-Release number of selected component (if applicable):
5.6.1.0

How reproducible:
Always

Steps to Reproduce:
1. Services -> Catalogs -> Catalog items
2. Add a generic/orchestration item
3.

Actual results:
Appliance crash and 502 proxy error

Expected results:

Additional info:
This sounds like multiple requests are being processed concurrently in puma threads and are possibly deadlocking. It is similar to what happened when automate's mutex and the rails activesupport dependency mutex deadlocked each other [1], which was fixed in [2]. Note, I don't know if catalog items use automate in the same way as the custom dialog form pulldown items.

Re-assigning to automate for now... Sorry, I'm not sure what category catalog items belong to.

A quick hack to see if it's puma threads running concurrently is to change your config/puma.rb from 'threads 5, 5' to 'threads 1, 1' and restart your appliance processes.

Note, the entire automate mutex goes away on master/5.7 via [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1354054
[2] https://github.com/ManageIQ/manageiq/pull/9903
[3] https://github.com/ManageIQ/manageiq/pull/10135
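For reference, the puma change suggested above would look roughly like the fragment below. This is only a sketch: it assumes the appliance's config/puma.rb contains a plain 'threads 5, 5' line as described, and the rest of that file is not shown here.

```ruby
# config/puma.rb (fragment only; the other settings in the file are left as they are)

# Before (as described in this comment):
# threads 5, 5

# After: minimum 1 and maximum 1 thread, so each puma process
# handles a single request at a time.
threads 1, 1
```

The appliance processes need to be restarted afterwards for the change to take effect.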
I just figured out that it recovers after some time, so it might be the deadlock thing.
Matous - Were you able to test with the config/puma.rb changes suggested in comment #2? Please test with that configuration and provide feedback.
With threads 1, 1 I still get the infinispinner, and a second window doesn't let me connect immediately. After about five minutes I can connect and create new catalog items without problems, and the first one has been created as well.
Matous,

The deadlock in 5.6 was fixed via this PR:
https://github.com/ManageIQ/manageiq/pull/10038

Can you make sure that this PR is in your environment? You should be seeing this in lib/miq_automation_engine/engine/miq_ae_engine.rb:

    def self.deliver(*args)
      # This was added because in RAILS 5 PUMA sends multiple requests
      # to the same process and our DRb server implementation for Automate
      # Methods is not thread safe. The permit_concurrent_loads allows the
      # DRb requests which run in different threads to load constants.
      ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
        @deliver_mutex.synchronize { deliver_block(*args) }
      end
    end

Thanks,
Madhu
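To illustrate what the mutex in the deliver method quoted above does, here is a minimal standalone sketch. It is not the ManageIQ code: SketchEngine, its counters and the body of deliver_block are hypothetical stand-ins, and the ActiveSupport interlock part is left out so that it runs with plain Ruby. It only shows that calls arriving from several threads are run one at a time, so a slow call makes the others wait.

```ruby
# Hypothetical stand-in for the engine class; only the mutex pattern
# mirrors the snippet quoted above.
class SketchEngine
  @deliver_mutex = Mutex.new   # serializes deliver calls
  @counter_lock  = Mutex.new   # protects the bookkeeping counters below
  @active        = 0           # calls currently inside deliver_block
  @max_active    = 0           # highest concurrency ever observed

  class << self
    attr_reader :max_active

    def deliver(*args)
      # Only one thread at a time may run deliver_block.
      @deliver_mutex.synchronize { deliver_block(*args) }
    end

    private

    # Placeholder for the real work: records concurrency, then sums its arguments.
    def deliver_block(*args)
      @counter_lock.synchronize do
        @active += 1
        @max_active = [@max_active, @active].max
      end
      sleep 0.01 # simulate slow work while holding the deliver mutex
      args.sum
    ensure
      @counter_lock.synchronize { @active -= 1 }
    end
  end
end

# Five "requests" arriving concurrently, as with several puma threads.
results = 5.times.map { |i| Thread.new { SketchEngine.deliver(i, 1) } }.map(&:value)
```

Because every call waits for the mutex, max_active never goes above 1 here. Whether a long first delivery is what causes the delay reported in this bug is an assumption, not something the sketch proves.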
Yes, I can see this in miq_ae_engine.rb.
Hi, I see the delay the first time I create a catalog item, after that it seems to create the subsequent items quickly. Is this the same behavior? Or is it slow for you every time? Also if you can provide your appliance IP I would like to test it there. Thanks, Madhu
Hi, it's the same for me with threads 1, 1. The first item is created, then the appliance is unresponsive for about five minutes, and after that it's okay and catalog items are created immediately.
Hi, So this is happening only when the appliance uses Catalog Items for the first time, and subsequent creations of Catalog Items work properly. So if I were to use your appliance now, it wouldn't exhibit this issue, correct? Thanks, Madhu
This was never on 5.7 as the fix was in master before the fork (and then backported to Darga). So closing this.
Removing needinfo flag as we discussed everything in BJ session with Madhu.