Description of problem:
Concurrent/parallel CV publishing failed with the error:
```
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error HTTP status code: 502
```

Version-Release number of selected component (if applicable):
Satellite 7.0 snap 3

How reproducible:

Steps to Reproduce:
1. Sync 5 big RHEL repos from the CDN using a subscription.
2. Create 6 CVs (the Satellite has 6 workers because the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Start publishing all 6 CVs in parallel.

Actual results:
Publishing failed for all 6 CVs with the error shown in the description above.

Expected results:
All 6 CVs should publish successfully using the 6 pulp workers, without error!

Additional info:
The stack trace from a CV:
```
Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:26:35 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:43 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Error message: the server returns an error HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:42 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET /pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
```
The bug was raised for the satellite installed on RHEL7!
OOMKiller dropped by for a visit and killed gunicorn:

distribution_trees/?repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Frpm%2Frpm%2F80c56992-3363-4f42-9e6e-ef877138754e%2Fversions%2F1%2F HTTP/1.1" 200 52 "-" "OpenAPI-Generator/3.16.1/ruby"
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn cpuset=/ mems_allowed=0
Dec 27 10:36:51 dhcp-3-18 kernel: CPU: 2 PID: 52410 Comm: gunicorn Kdump: loaded Not tainted 3.10.0-1160.49.1.el7.x86_64 #1
Dec 27 10:36:51 dhcp-3-18 kernel: Hardware name: Red Hat RHEL/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
Dec 27 10:36:51 dhcp-3-18 kernel: Call Trace:
...
Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB

If you're going to do concurrent work, with large repos, on a memory-constrained system, WITH NO SWAP:

Dec 27 10:36:51 dhcp-3-18 kernel: Total swap = 0kB

you're going to have A Bad Time.

(note: "no swap" is NOT a supported Satellite configuration, and should never be used to open BZs...)
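Since the missing swap is what the comment above hinges on, a quick pre-flight check before reproducing may help. The following is only a minimal sketch: it reads the standard `SwapTotal` field from `/proc/meminfo`, and the function names `swap_total_kb` and `read_swap_total_kb` are made up for illustration, not part of Satellite or pulp.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        # The field looks like: "SwapTotal:       8388604 kB"
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


def read_swap_total_kb(path="/proc/meminfo"):
    """Read the live value on a Linux host (hypothetical helper)."""
    with open(path) as fh:
        return swap_total_kb(fh.read())


if __name__ == "__main__":
    swap = read_swap_total_kb()
    if not swap:
        print("No swap configured; do not use this system for the reproducer")
    else:
        print(f"Swap configured: {swap} kB")
```

A result of `0` corresponds to the `Total swap = 0kB` line in the kernel log above.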
Closing as dup of "sync takes more memory than it used to" BZ

*** This bug has been marked as a duplicate of bug 1994397 ***
Reopening because there could be a separate issue here from #1994397.

The OOM killer targeted gunicorn, and it seems like gunicorn was using ~3.5-4 GB of memory, which seems excessive.

Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB
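To make the memory figures in that kernel line easier to read, here is a rough sketch that extracts them and converts to GiB. It assumes the `Killed process ...` format shown above (RHEL 7 kernel) and treats the kernel's `kB` as 1024-byte units; `parse_oom_kill` is a hypothetical helper name.

```python
import re

# Matches the "Killed process" line format quoted in this bug.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Return pid, command and memory sizes in GiB, or None if no match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "total_vm_gib": int(m["total_vm"]) / KIB_PER_GIB,
        "anon_rss_gib": int(m["anon_rss"]) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    log_line = (
        "Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), "
        "UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, "
        "shmem-rss:0kB"
    )
    info = parse_oom_kill(log_line)
    print(f"{info['comm']} (pid {info['pid']}): "
          f"rss {info['anon_rss_gib']:.2f} GiB, vm {info['total_vm_gib']:.2f} GiB")
```

For the line quoted here this works out to roughly 3.39 GiB resident and 3.76 GiB virtual for the single gunicorn process.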
@bbuckingham@redhat.com This bug is still in a new state and it's blocking an important ONQA from verification. Can we prioritize this bug please?
Hi @jyejare@redhat.com Have you retested this on a system with a swap file? Thank you
@swadeley@redhat.com, when I raised this bug I used a SatLab system, and SatLab systems have a swap file by default! Still, I will retest and confirm!
Retested on a Satellite system with a swap file.

Status: Closed WorksForMe

Steps to Reproduce:
1. Synced 5 big RHEL repos from the CDN using a subscription.
2. Created 6 CVs (the Satellite has 6 workers because the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Published all 6 CVs in parallel.

Actual results:
All 6 CVs were published successfully using 6 pulp workers without error!
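For anyone re-running the reproducer, step 3 could be scripted along these lines. This is a sketch under assumptions, not the method used in this bug: it assumes `hammer` is installed and already configured with credentials and a default organization, that the CVs are addressed by numeric ID, and the IDs 1 to 6 are placeholders. The helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def publish_command(cv_id):
    """Build the hammer invocation for one content view."""
    return ["hammer", "content-view", "publish", "--id", str(cv_id)]


def run_hammer(cmd):
    """Run hammer and return its exit code (0 means it reported success)."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def publish_in_parallel(cv_ids, runner=run_hammer):
    """Start one publish per CV at the same time; return {cv_id: exit_code}."""
    cv_ids = list(cv_ids)
    if not cv_ids:
        return {}
    # One thread per CV so that all publishes are in flight together.
    with ThreadPoolExecutor(max_workers=len(cv_ids)) as pool:
        codes = list(pool.map(lambda cv: runner(publish_command(cv)), cv_ids))
    return dict(zip(cv_ids, codes))


if __name__ == "__main__":
    # Placeholder IDs; substitute the IDs of the 6 CVs created in step 2.
    results = publish_in_parallel(range(1, 7))
    for cv_id, code in results.items():
        print(f"CV {cv_id}: {'ok' if code == 0 else f'failed ({code})'}")
```

The `runner` parameter exists only so the parallel logic can be exercised without a Satellite; a non-zero exit code from any CV would correspond to the failure originally reported.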