Bug 1548940

Summary: pulp docker repo sync doesn't fix disk issues
Product: Red Hat Satellite
Reporter: Ahmed Nazmy <anazmy>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WONTFIX
QA Contact: Kersom <koliveir>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.2.13
CC: broose, dkliban, ipanova, patalber, rbertolj, ttereshc, vdeshpan
Target Milestone: Unspecified
Keywords: Reopened, Triaged
Target Release: Unused
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-10 13:03:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ahmed Nazmy 2018-02-26 04:49:55 UTC
Description of problem:

If Satellite hits a full disk or a similar condition that prevents it from writing docker blobs to disk, subsequent delete/create/sync operations on the docker repo do not fix the problem: the incomplete or missing docker_blob remains referenced in the database and is never downloaded again.

Version-Release number of selected component (if applicable):

pulp-client-1.0-1
satellite-6.2.13-4.0.el7sat
pulp-server-2.8.7.17-1.el7sat
pulp-docker-plugins-2.0.3.1-1.el7sat
python-crane-2.0.2.1-1.el7sat

How reproducible:
Always

Steps to Reproduce:
1. Set up a small /var/cache/pulp & /var/lib/pulp so that they hit their limits quickly
2. Sync some docker repos to fill up the space
3. An error similar to the one below should appear once the disk fills up (note that the cache is hit first):
~~~
Jan 18 10:54:22 satellite.example.com pulp[53455]: nectar.downloaders.threaded:ERROR: (53455-15136) [Errno 28] No space left on device: u'/var/cache/pulp/reserved_resource_worker-6.com/d11a02d8-0a35-40ae-b8f1-301aea436200/sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443'
~~~

4. Verify that the docker repo sync is marked as complete in the WebUI.

5. The mentioned docker_blob is a broken symlink to a missing file on disk:
~~~
# ls -al /var/lib/pulp/published/docker/v2/master/*openshift3_metrics-hawkular*/*/blobs/sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443

lrwxrwxrwx. 1 apache apache   177 Feb 21 11:42 sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443 -> /var/lib/pulp/content/units/docker_blob/06/2fa53cb04a019956f3d7e475a25bd0e8a0b86d21bb1582e50faa2c8f3676dc/sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443
~~~
6. Increase /var/cache/pulp & /var/lib/pulp disk space to accommodate more docker_blobs

7. Deleting, recreating, and re-syncing this repo doesn't fix the problem.
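
The broken symlink shown in the steps above can also be looked for across the whole published tree rather than one blob at a time. The following is a minimal Python sketch, not part of Pulp: the function name find_dangling_symlinks is hypothetical, and the root directory to scan (for example /var/lib/pulp/published/docker, taken from the listing above) is an assumption about where the published links live.

```python
import os


def find_dangling_symlinks(root):
    """Yield (link_path, target) for every symlink under root whose target is missing.

    Hypothetical helper for illustration; it only reads the filesystem.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories show up in dirnames, all other links in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself, exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                yield path, os.readlink(path)
```

Calling list(find_dangling_symlinks("/var/lib/pulp/published/docker")) would return each broken link together with the content path it points to, which is the _storage_path value of the affected unit in the case reported here.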

 
Actual results:

Satellite/Pulp wrongly reports that the docker repo sync is complete when it actually is not.

This causes 403 errors when running docker pull:

~~~
[root@client ~]# docker pull satellite.example.com:5000/acmecorp-ocp37-openshift3_metrics-hawkular-metrics:v3.7
Trying to pull repository satellite.example.com:5000/acmecorp-ocp37-openshift3_metrics-hawkular-metrics ...
v3.7: Pulling from satellite.example.com:5000/acmecorp-ocp37-openshift3_metrics-hawkular-metrics

381f45621c04: Downloading [==>                                                ] 3.785 MB/73.91 MB
f81b872e5378: Download complete
89bbbd535dc0: Downloading
3ece8908f8ed: Downloading [=>                                                 ] 1.622 MB/69.06 MB
2d36a0f107a4: Downloading
73827b6ae407: Waiting
8c2e7f2ae557: Waiting
error parsing HTTP 403 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don't have permission to access /pulp/docker/v2/acmecorp-ocp37-openshift3_metrics-hawkular-metrics/blobs/sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443\non this server.</p>\n</body></html>\n"
~~~

Expected results:

Satellite/Pulp should detect such issues and report a failed update.
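
One way such detection could work, sketched here in Python under stated assumptions: blob files are named sha256:<hex digest>, as in the paths in this report, and a Docker blob digest is the SHA-256 of the blob content, so a missing or truncated file can be recognised by recomputing the digest. The function name blob_is_intact is hypothetical and this is not how Pulp implements its checks.

```python
import hashlib
import os


def blob_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the file at path exists and matches the digest in its name.

    Assumes the file name has the form 'sha256:<hex digest>'.
    """
    name = os.path.basename(path)
    algorithm, _, expected = name.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError("unexpected blob file name: %r" % name)
    if not os.path.isfile(path):
        # Missing file, or a symlink whose target is gone.
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large layers do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A sync that ran this kind of check on every blob before marking the task complete would report the failure described above instead of publishing a dangling link.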

Additional info:

Deleting the docker_blob entry from MongoDB seems to be a viable workaround to force a re-download:

# mongo pulp_database --eval "db.units_docker_blob.remove({_storage_path: \"/var/lib/pulp/content/units/docker_blob/06/2fa53cb04a019956f3d7e475a25bd0e8a0b86d21bb1582e50faa2c8f3676dc/sha256:2d36a0f107a4e56b82b2052f02896e58ccabca0d539bfef3f059ebe776c43443\"})"

Then resync this docker repo.
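
When several blobs are affected, the command above has to be repeated per storage path, and quoting the path by hand is error-prone. The following Python sketch only builds the argument list for the same mongo invocation; it does not run it. The function name build_blob_removal_command is hypothetical, and the database, collection, and field names (pulp_database, units_docker_blob, _storage_path) are taken from the command above and assumed unchanged on the target system. As a precaution that is not part of the original workaround, it refuses paths that still exist on disk.

```python
import json
import os


def build_blob_removal_command(storage_path, database="pulp_database"):
    """Return the argv list for removing one docker_blob unit by its _storage_path."""
    if os.path.exists(storage_path):
        # The blob file is present, so this unit is not the broken case.
        raise ValueError("blob file exists on disk: %s" % storage_path)
    # json.dumps produces a quoted object literal that the mongo shell accepts.
    query = json.dumps({"_storage_path": storage_path})
    return ["mongo", database, "--eval",
            "db.units_docker_blob.remove(%s)" % query]
```

The returned list can be reviewed and then passed to subprocess.check_call on the Satellite server, followed by a resync of the affected repo.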

Comment 6 Tanya Tereshchenko 2018-05-02 18:31:04 UTC

*** This bug has been marked as a duplicate of bug 1548966 ***

Comment 8 Tanya Tereshchenko 2020-01-03 12:56:48 UTC
Ina, Dennis,

Any recommendations for this bug resolution?
A workaround is available in the provided KBS article.

The question is whether the problem still persists, whether we can/should do anything in pulp2 and what's the state for the similar situation in pulp3.

Comment 9 Ina Panova 2020-01-10 13:03:20 UTC
I suggest closing this as Won't Fix; with the closed resolution of https://bugzilla.redhat.com/show_bug.cgi?id=1548966, it would be hard to provide auto-healing or a recovery state.

Given that Pulp2 is in maintenance mode and the complexity of the fix is high, I suggest using the workaround provided in the KBS article.

This issue won't be a problem in Pulp3.