Bug 2035873

Summary: Concurrent CV Publish fails with 502 error
Product: Red Hat Satellite
Reporter: Jitendra Yejare <jyejare>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WORKSFORME
QA Contact: Lai <ltran>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.11.0
CC: bbuckingham, dalley, dkliban, ggainey, rchan, swadeley
Target Milestone: 6.11.0
Keywords: Reopened, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Last Closed: 2022-04-07 15:39:02 UTC
Type: Bug
Bug Blocks: 2000769    

Description Jitendra Yejare 2021-12-28 09:36:13 UTC
Description of problem:
Concurrent/parallel CV publishing fails with the following error:
```
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
```

Version-Release number of selected component (if applicable):
Satellite 7.0 snap 3

How reproducible:


Steps to Reproduce:
1. Sync 5 big RHEL repos from CDN using subscription.
2. Create 6 CVs (Satellite has 6 Pulp workers, since the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Start publishing 6 CVs in parallel.
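The steps above can be sketched with hammer. This is a hypothetical dry run that only prints the publish commands; the organization name and the cv-N names are placeholders, not taken from this report. Removing the `echo` would actually fire all six publishes in parallel.

```shell
# Dry-run sketch of step 3: print the hammer commands that would publish
# six CVs concurrently. "MyOrg" and the cv-N names are placeholders.
ORG="MyOrg"
for i in 1 2 3 4 5 6; do
  # --async returns immediately, so all six publish tasks run in parallel
  echo hammer content-view publish --name "cv-${i}" --organization "${ORG}" --async
done
```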

Actual results:
Publishing all 6 CVs failed with the error shown in the description.

Expected results:
All 6 CVs should publish successfully across the 6 Pulp workers, without error.

Additional info:

The full error output from one CV publish task:
```
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:26:35 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:43 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
Error message: the server returns an error
HTTP status code: 502
Response headers: {"Date"=>"Mon, 27 Dec 2021 15:25:42 GMT", "Server"=>"Apache", "Content-Length"=>"445", "Content-Type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/pulp/api/v3/content/rpm/packages/">GET&nbsp;/pulp/api/v3/content/rpm/packages/</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
```

Comment 2 Jitendra Yejare 2022-01-17 11:42:19 UTC
The bug was raised for a Satellite installed on RHEL 7.

Comment 3 Grant Gainey 2022-02-21 14:52:04 UTC
OOMKiller dropped by for a visit and killed gunicorn:

distribution_trees/?repository_version=%2Fpulp%2Fapi%2Fv3%2Frepositories%2Frpm%2Frpm%2F80c56992-3363-4f42-9e6e-ef877138754e%2Fversions%2F1%2F HTTP/1.1" 200 52 "-" "OpenAPI-Generator/3.16.1/ruby"
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec 27 10:36:51 dhcp-3-18 kernel: gunicorn cpuset=/ mems_allowed=0
Dec 27 10:36:51 dhcp-3-18 kernel: CPU: 2 PID: 52410 Comm: gunicorn Kdump: loaded Not tainted 3.10.0-1160.49.1.el7.x86_64 #1
Dec 27 10:36:51 dhcp-3-18 kernel: Hardware name: Red Hat RHEL/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
Dec 27 10:36:51 dhcp-3-18 kernel: Call Trace:
...
Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB

If you're going to do concurrent work, with large repos, on a memory-constrained system, WITH NO SWAP:

Dec 27 10:36:51 dhcp-3-18 kernel: Total swap = 0kB

you're going to have A Bad Time.

(note: "no swap" is NOT a supported Satellite configuration, and should never be used to open BZs...)
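Before retesting, it is worth confirming swap exists and doing a rough back-of-the-envelope sizing. A minimal sketch, assuming an illustrative per-worker peak figure (not an official Red Hat sizing number); on the Satellite host, `free -h` and `swapon --show` report the actual memory and swap state:

```shell
# Rough sizing check for the scenario in this bug: 6 parallel publishes.
# per_worker_mb and gunicorn_headroom_mb are illustrative assumptions,
# not official sizing figures.
workers=6
per_worker_mb=1536          # assumed peak per Pulp worker during a publish
gunicorn_headroom_mb=4096   # headroom for a gunicorn spike like the one above
need_mb=$(( workers * per_worker_mb + gunicorn_headroom_mb ))
echo "rough peak estimate: ${need_mb} MB of RAM+swap"
```

If the host's RAM plus swap does not comfortably exceed that estimate, the OOM killer behavior seen in this comment is the expected outcome.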

Comment 4 Grant Gainey 2022-02-22 17:07:21 UTC
Closing as dup of "sync takes more memory than it used to" BZ

*** This bug has been marked as a duplicate of bug 1994397 ***

Comment 5 Daniel Alley 2022-02-22 17:15:26 UTC
Reopening because there could be a separate issue here from #1994397

The OOM killer targeted gunicorn, which appears to have been using roughly 3.5 to 4 GB of memory; that seems excessive.

    Dec 27 10:36:51 dhcp-3-18 kernel: Killed process 52410 (gunicorn), UID 993, total-vm:3940968kB, anon-rss:3555200kB, file-rss:0kB, shmem-rss:0kB

Comment 7 Jitendra Yejare 2022-03-30 07:10:59 UTC
@bbuckingham This bug is still in NEW state and it is blocking an important ON_QA bug from verification. Can we prioritize it, please?

Comment 8 Stephen Wadeley 2022-03-30 08:28:59 UTC
Hi @jyejare 

Have you retested this on a system with a swap file?


Thank you

Comment 9 Jitendra Yejare 2022-04-07 11:26:23 UTC
@swadeley, when I raised this bug I used a SatLab system, and SatLab systems have a swap file by default.

Still, I will retest and confirm.

Comment 10 Jitendra Yejare 2022-04-07 15:39:02 UTC
Retested on a Satellite system with a swap file.

Status: Closed WorksForMe

Steps to Reproduce:
1. Synced 5 big RHEL repos from CDN using subscription.
2. Created 6 CVs (Satellite has 6 Pulp workers, since the Satellite server has 6 cores), each containing all 5 repos from step 1.
3. Published 6 CVs in parallel.

Actual results:
All 6 CVs were published successfully using the 6 Pulp workers, without error.

Comment 11 pulp-infra@redhat.com 2022-11-16 14:08:15 UTC
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

Comment 12 Robin Chan 2023-03-02 15:06:45 UTC
The Pulp upstream bug status is at open. Updating the external tracker on this bug.

Comment 13 Robin Chan 2023-07-20 13:08:40 UTC
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.