Bug 2035873 - Concurrent CV Publish fails with 502 error
Summary: Concurrent CV Publish fails with 502 error
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: 6.11.0
Assignee: satellite6-bugs
QA Contact: Lai
URL:
Whiteboard:
Depends On:
Blocks: 2000769
Reported: 2021-12-28 09:36 UTC by Jitendra Yejare
Modified: 2022-05-13 15:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-07 15:39:02 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | pulp pulpcore issues 2250 | 0 | None | open | Gunicorn consuming excessive amounts of memory | 2022-04-28 14:21:06 UTC

Description Jitendra Yejare 2021-12-28 09:36:13 UTC
Description of problem:
Publishing CVs concurrently (in parallel) failed with the following error:
```
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
```

Version-Release number of selected component (if applicable):
Satellite 7.0 snap 3

How reproducible:


Steps to Reproduce:
1. Sync 5 big RHEL repos from CDN using subscription.
2. Create 6 CVs (the Satellite has 6 Pulp workers because the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Start publishing 6 CVs in parallel.
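
Step 3 could be scripted roughly as follows. This is only a sketch, not what was run for this report: it assumes the `hammer` CLI is installed and configured on the host, and `publish_all` and its `runner` parameter are hypothetical names introduced here for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def publish_all(cv_ids, runner=subprocess.run):
    """Publish every content view in cv_ids at the same time.

    Returns a dict mapping each CV id to the exit code of its publish command.
    `runner` is a hypothetical hook so the command runner can be swapped out.
    """
    def publish(cv_id):
        # Assumed CLI call; adjust to however publishes are triggered locally.
        cmd = ["hammer", "content-view", "publish", "--id", str(cv_id)]
        result = runner(cmd, capture_output=True, text=True)
        return cv_id, result.returncode

    # One thread per CV so that all publishes are in flight together.
    with ThreadPoolExecutor(max_workers=max(1, len(cv_ids))) as pool:
        return dict(pool.map(publish, cv_ids))
```

A non-zero exit code for any CV id in the returned dict would correspond to a failed publish.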

Actual results:
All 6 CV publishes failed with the same error shown in the description of the bug.

Expected results:
All 6 CVs should publish successfully using 6 Pulp workers, without error.

Additional info:

The stack trace from a CV:
```
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:26:35 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:43 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:42 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
```

Comment 2 Jitendra Yejare 2022-01-17 11:42:19 UTC
This bug was raised against a Satellite installed on RHEL 7.

Comment 3 Grant Gainey 2022-02-21 14:52:04 UTC
OOMKiller dropped by for a visit and killed gunicorn:

distribution_trees/?repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Frpm%2Frpm%2F80c56992-3363-4f42-9e6e-ef877138754e%2Fversions%2F1%2F HTTP/1.1" 200 52 "-" "OpenAPI-Generator/3.16.1/ruby"
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn cpuset=/ mems_allowed=0
Dec 27 10:36:51 dhcp-3-18 kernel: CPU: 2 PID: 52410 Comm: gunicorn Kdump: loaded Not tainted 3.10.0-1160.49.1.el7.x86_64 #1
Dec 27 10:36:51 dhcp-3-18 kernel: Hardware name: Red Hat RHEL/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
Dec 27 10:36:51 dhcp-3-18 kernel: Call Trace:
...
Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB

If you're going to do concurrent work, with large repos, on a memory-constrained system, WITH NO SWAP:

Dec 27 10:36:51 dhcp-3-18 kernel: Total swap = 0kB

you're going to have A Bad Time.

(note: "no swap" is NOT a supported Satellite configuration, and should never be used to open BZs...)
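
A quick way to check for swap before filing is to read `SwapTotal` from `/proc/meminfo`. The helper below is a minimal sketch; `swap_total_kb` is a hypothetical name, and it takes the file contents as a string so it can be tried on any text.

```python
import re


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo-style text, or None if absent."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None
```

On the Satellite host this would be called as `swap_total_kb(open("/proc/meminfo").read())`; a result of 0 means no swap is configured.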

Comment 4 Grant Gainey 2022-02-22 17:07:21 UTC
Closing as dup of "sync takes more memory than it used to" BZ

*** This bug has been marked as a duplicate of bug 1994397 ***

Comment 5 Daniel Alley 2022-02-22 17:15:26 UTC
Reopening because there could be a separate issue here from #1994397

The OOM killer targeted gunicorn, and gunicorn appears to have been using ~3.5-4 GB of memory, which seems excessive.

    Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB
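
To pull those numbers out of a journal in bulk, the "Killed process" line can be parsed as sketched below. `parse_oom_kill` is a hypothetical helper, and the pattern assumes the line layout shown above (the kernel reports these sizes in KiB, labelled `kB`).

```python
import re

# Matches the kernel "Killed process" line format quoted above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name, and memory sizes in GiB, or None if no match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    kib_per_gib = 1024 ** 2
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_gib": int(match.group("total_vm")) / kib_per_gib,
        "anon_rss_gib": int(match.group("anon_rss")) / kib_per_gib,
    }
```

For the line above this gives roughly 3.4 GiB resident (anon-rss) and 3.8 GiB virtual (total-vm) for the single gunicorn process.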

Comment 7 Jitendra Yejare 2022-03-30 07:10:59 UTC
@bbuckingham@redhat.com This bug is still in a new state and it's blocking an important ONQA from verification. Can we prioritize this bug please?

Comment 8 Stephen Wadeley 2022-03-30 08:28:59 UTC
Hi @jyejare@redhat.com 

Have you retested this on a system with a swap file?


Thank you

Comment 9 Jitendra Yejare 2022-04-07 11:26:23 UTC
@swadeley@redhat.com, when I raised this bug I used a SatLab system, and by default SatLab systems have a swap file!


Still, I will retest and confirm!

Comment 10 Jitendra Yejare 2022-04-07 15:39:02 UTC
Retested on a Satellite system with a swap file.

Status: Closed WorksForMe

Steps to Reproduce:
1. Synced 5 big RHEL repos from CDN using subscription.
2. Created 6 CVs (the Satellite has 6 Pulp workers because the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Published 6 CVs in parallel.

Actual results:
All 6 CVs were published successfully using 6 pulp workers without error!

