Description of problem:
Concurrent / parallel CV publishing failed with the following error:
```
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Error message: the server returns an error HTTP status code: 502
```

Version-Release number of selected component (if applicable):
Satellite 7.0 snap 3

How reproducible:

Steps to Reproduce:
1. Sync 5 big RHEL repos from the CDN using a subscription.
2. Create 6 CVs (the Satellite has 6 Pulp workers since the server has 6 cores), each containing all 5 repos from step 1.
3. Start publishing the 6 CVs in parallel.

Actual results:
All 6 CV publishes failed with the error shown in the description above.

Expected results:
All 6 CVs should publish successfully, using the 6 Pulp workers, without errors.
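Step 3 amounts to starting all six publishes at the same moment. A minimal Python sketch of that step, assuming the `hammer` CLI is installed and already authenticated on the Satellite, and that the CV IDs 1-6 are placeholders for the IDs of the CVs created in step 2. It launches one `hammer content-view publish --id N` per CV from a thread pool and collects the exit codes. The helper names and the `run` hook (which lets the launcher be exercised without a Satellite) are hypothetical, not part of Satellite.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical content view IDs; replace with the IDs of the 6 CVs from step 2.
CV_IDS = [1, 2, 3, 4, 5, 6]


def publish_command(cv_id: int) -> List[str]:
    # Assumes the hammer CLI is installed and already authenticated.
    return ["hammer", "content-view", "publish", "--id", str(cv_id)]


def run_command(cmd: Sequence[str]) -> int:
    # Run one publish synchronously and return its exit status.
    return subprocess.run(list(cmd), capture_output=True, text=True).returncode


def publish_in_parallel(
    cv_ids: Sequence[int],
    run: Callable[[Sequence[str]], int] = run_command,
) -> Dict[int, int]:
    """Start one publish per CV at the same time; map CV ID -> exit code."""
    # One thread per CV so every publish is submitted concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(cv_ids))) as pool:
        futures = {cv_id: pool.submit(run, publish_command(cv_id)) for cv_id in cv_ids}
        return {cv_id: fut.result() for cv_id, fut in futures.items()}


if __name__ == "__main__":
    results = publish_in_parallel(CV_IDS)
    failed = [cv_id for cv_id, rc in results.items() if rc != 0]
    print("failed CV IDs:", failed or "none")
```

Using one thread per CV keeps the client side trivial; the actual work still lands on the Pulp workers, which is the concurrency this bug is about.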
Additional info:
The error output from one CV publish task:
```
Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:26:35 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:43 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:42 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
```
This bug was reported against a Satellite installed on RHEL 7.
The OOM killer dropped by for a visit and killed gunicorn:
distribution_trees/?repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Frpm%2Frpm%2F80c56992-3363-4f42-9e6e-ef877138754e%2Fversions%2F1%2F HTTP/1.1" 200 52 "-" "OpenAPI-Generator/3.16.1/ruby"
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn cpuset=/ mems_allowed=0
Dec 27 10:36:51 dhcp-3-18 kernel: CPU: 2 PID: 52410 Comm: gunicorn Kdump: loaded Not tainted 3.10.0-1160.49.1.el7.x86_64 #1
Dec 27 10:36:51 dhcp-3-18 kernel: Hardware name: Red Hat RHEL/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
Dec 27 10:36:51 dhcp-3-18 kernel: Call Trace:
...
Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB

If you're going to do concurrent work, with large repos, on a memory-constrained system, WITH NO SWAP:
Dec 27 10:36:51 dhcp-3-18 kernel: Total swap = 0kB
you're going to have A Bad Time. (Note: "no swap" is NOT a supported Satellite configuration, and should never be used to open BZs...)
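Because the kernel log shows `Total swap = 0kB`, it is worth confirming that swap is configured before rerunning the reproducer. A minimal sketch, assuming a Linux host: it reads the `SwapTotal` field of `/proc/meminfo` (reported in kB) and flags a zero or missing value. The function names are hypothetical, chosen for illustration.

```python
import re
from pathlib import Path
from typing import Optional

MEMINFO = Path("/proc/meminfo")


def swap_total_kb(meminfo_text: str) -> Optional[int]:
    """Return SwapTotal in kB from /proc/meminfo-formatted text, or None if absent."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def has_swap(meminfo_text: str) -> bool:
    # Treat a missing SwapTotal line the same as zero swap.
    total = swap_total_kb(meminfo_text)
    return total is not None and total > 0


if __name__ == "__main__":
    text = MEMINFO.read_text()
    if has_swap(text):
        print(f"SwapTotal: {swap_total_kb(text)} kB")
    else:
        print("WARNING: no swap configured (SwapTotal = 0 kB)")
```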
Closing as a dup of the "sync takes more memory than it used to" BZ.

*** This bug has been marked as a duplicate of bug 1994397 ***
Reopening because there could be a separate issue here from bug 1994397. The OOM killer targeted gunicorn, which was using roughly 3.5-4 GB of memory; that seems excessive.
Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB
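The memory estimate comes straight from the kernel's `Killed process` line, where sizes are reported in KiB. A minimal sketch that extracts those figures and converts them to GiB, assuming the line follows the RHEL 7 format quoted above; the regex and helper names are hypothetical and may need adjusting for other kernel versions.

```python
import re
from typing import Dict

# Matches the RHEL 7 style oom-killer line, e.g.
# "Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, ..."
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line: str) -> Dict[str, object]:
    """Extract PID, command name and memory sizes (kB) from a 'Killed process' line."""
    match = KILLED_RE.search(line)
    if match is None:
        raise ValueError("not an oom-killer 'Killed process' line")
    return {
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "total_vm_kb": int(match["total_vm"]),
        "anon_rss_kb": int(match["anon_rss"]),
    }


def kb_to_gib(kb: int) -> float:
    # The kernel's "kB" here means KiB, so divide by 1024**2 to get GiB.
    return kb / 1024 ** 2


if __name__ == "__main__":
    line = ("Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, "
            "total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB")
    info = parse_oom_kill(line)
    print(f"{info['comm']} anon-rss: {kb_to_gib(info['anon_rss_kb']):.2f} GiB")
    print(f"{info['comm']} total-vm: {kb_to_gib(info['total_vm_kb']):.2f} GiB")
```

For the line above this gives about 3.39 GiB resident (anon-rss) and 3.76 GiB virtual (total-vm), consistent with the 3.5-4 GB estimate when expressed in decimal gigabytes.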
@bbuckingham This bug is still in NEW state and is blocking verification of an important ON_QA bug. Can we please prioritize it?
Hi @jyejare, have you retested this on a system with a swap file? Thank you.
@swadeley, when I raised this bug I used a SatLab system, and SatLab systems have a swap file by default. Still, I will retest and confirm!
Retested on a Satellite system with a swap file.
Status: Closed WorksForMe

Steps to Reproduce:
1. Synced 5 big RHEL repos from the CDN using a subscription.
2. Created 6 CVs (the Satellite has 6 Pulp workers since the server has 6 cores), each containing all 5 repos from step 1.
3. Published the 6 CVs in parallel.

Actual results:
All 6 CVs were published successfully using the 6 Pulp workers, without errors.
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.
The Pulp upstream bug status is at open. Updating the external tracker on this bug.