Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 2172182

Summary: [Pulp-3] Orphan cleanup does not remove the artifact_id association from individual content units
Product: Red Hat Satellite
Reporter: Sayan Das <saydas>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED MIGRATED
QA Contact: Satellite QE Team <sat-qe-bz-list>
Severity: high
Priority: medium
Docs Contact:
Version: 6.11.0
CC: dalley
Target Milestone: Unspecified
Keywords: MigratedToJIRA, Triaged
Target Release: Unused
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-06-06 16:08:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sayan Das 2023-02-21 15:55:40 UTC
Description of problem:

While it is possible to remove all repository information from a Capsule 6.13 server by disassociating all lifecycle environments from the capsule and running an orphan cleanup against it, the artifact_id remains associated with individual content units at the database level.

Moreover, at this point it is still possible to run the repair API against pulpcore on the capsule server; the task reports successful completion but actually performs no work at all.
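At the database level, the leftover association can be illustrated with a query of the same shape as the one used in the reproduction steps below (an illustrative sketch only; the exact count is environment-specific):

```sql
-- Illustrative sketch: count content units still carrying an artifact_id.
-- After a full orphan cleanup on a capsule with no lifecycle environments
-- attached, one would expect this to drop to zero, but it does not.
SELECT count(*) FROM core_contentartifact WHERE artifact_id IS NOT NULL;
```

(Run via `su - postgres -c "psql pulpcore"` on the capsule, as in the steps below.)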


Version-Release number of selected component (if applicable):

Satellite Capsule 6.13 [ or any version of Satellite and Satellite capsule running with Pulp 3 ]


How reproducible:

Always

Steps to Reproduce:
1. Install a Satellite 6.13 
2. Enable RHEL 8 BaseOS and Appstream and Sync them 
3. Configure required repos for Capsule installation and associate a Capsule 6.13 system with satellite.
4. Create PROD lifecycle on satellite and associate the same lifecycle with capsule for content syncing.
5. Use the repos from Step 2 to create a CV called "RHEL8-CV", publish it, and promote the new version to the PROD lifecycle.
6. Wait for the automated capsule sync task to finish.
7. On Satellite, configure pulp-cli with a profile named "proxy" to check pulp data from the external capsule server, and use the following command to list the synced repos on the capsule:
   # pulp --profile proxy repository list  --limit 9999 | jq .[].name
   
8. Register a RHEL 8.6 client system to the capsule server, with "RHEL8-CV" and the "PROD" lifecycle.
9. Execute "dnf clean all && dnf update --downloadonly -y" on the client system and it should download about 270+ packages. 
10. On the capsule, remove all RPM-type artifacts from the filesystem, i.e.:

    # cd /var/lib/pulp/media/artifact/
    # file */* | grep RPM -c            ## NOTE down the count 
    # file */* | grep RPM | awk -F':' '{print $1}' | xargs rm -f
    # cd ~

11. Repeat step 9 and notice the errors with RPM downloads. Also check the pulp logs from capsule server reflecting that expected artifacts are missing from filesystem.
12. Collect the following output from the capsule:

    # echo "select ca.pulp_id,cca.artifact_id,ca.file,cca.relative_path from core_artifact ca LEFT JOIN core_contentartifact cca on cca.artifact_id = ca.pulp_id where ca.file = 'artifact/fe/3c5fe47fcde23b567759bc05dd0e8f294d6cb8997cd7c7c18072bc30fc1896';"   | su - postgres -c "psql -x pulpcore"

13. Use the hammer command from https://access.redhat.com/solutions/6685201 on satellite, to reduce the orphan protection timeout to 3 minutes.
14. Disassociate the PROD lifecycle from Capsule and ensure that no LCEs are selected to sync with capsule.
15. Invoke a sync for the capsule server (which should finish in seconds).
16. Wait for 5 minutes (longer than the orphan protection timeout) and then execute this command on satellite to initiate orphan cleanup on the capsule:

    # SMART_PROXY_ID=2 foreman-rake katello:delete_orphaned_content RAILS_ENV=production --trace
    
17. Wait for the task to complete
18. Repeat the command from Step 7; the list of repos should now be empty, i.e. no repos are listed.
19. Repeat Step 12 on Capsule and check the output.
20. Execute the following on satellite to set the orphan protection timeout back to its default value:

    # hammer settings set --name orphan_protection_time --value 1440
    
21. Try running the repair API on the capsule from satellite while monitoring the pulp logs on capsule 

    # curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X POST https://capsule613.example.com/pulp/api/v3/repair/ | jq .
    # curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X GET https://capsule613.example.com/<task href> | jq .

22. Add the PROD lifecycle back to the capsule server for syncing, initiate a "Complete Sync", and wait for the sync to complete successfully.
23. Repeat Step 9 on client. 
24. Repeat Step 21 on the capsule server.
25. Now for the final time, repeat Step 9 on the client host
 
Actual results:

Step 7: More than 2 repo names/IDs are listed.

Step 11: 

On client:
[MIRROR] kernel-modules-4.18.0-425.10.1.el8_7.x86_64.rpm: Status code: 500 for https://capsule613.example.com/pulp/content/RedHat/PROD/RHEL8/content/dist/rhel8/8/x86_64/baseos/os/Packages/k/kernel-modules-4.18.0-425.10.1.el8_7.x86_64.rpm (IP: 192.168.124.3)

In pulp logs of capsule:

Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:     return await self._match_and_stream(path, request)
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 542, in _match_and_stream
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:     return await self._serve_content_artifact(ca, headers, request)
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 815, in _serve_content_artifact
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:     raise Exception(_("Expected path '{}' is not found").format(path))
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]: Exception: Expected path '/var/lib/pulp/media/artifact/fe/3c5fe47fcde23b567759bc05dd0e8f294d6cb8997cd7c7c18072bc30fc1896' is not found
Feb 21 19:38:19 capsule613.example.com pulpcore-content[22961]:  [21/Feb/2023:14:08:19 +0000] "GET /pulp/content/RedHat/PROD/RHEL8/content/dist/rhel8/8/x86_64/baseos/os/Packages/k/kernel-modules-4.18.0-425.10.1.el8_7.x86_64.rpm HTTP/1.1" 500 244 "-" "libdnf (Red Hat Enterprise Linux 8.6; generic; Linux.x86_64)"


Steps 12 and 19:

# echo "select ca.pulp_id,cca.artifact_id,ca.file,cca.relative_path from core_artifact ca LEFT JOIN core_contentartifact cca on cca.artifact_id = ca.pulp_id where ca.file = 'artifact/fe/3c5fe47fcde23b567759bc05dd0e8f294d6cb8997cd7c7c18072bc30fc1896';"   | su - postgres -c "psql -x pulpcore"
-[ RECORD 1 ]-+---------------------------------------------------------------------------
pulp_id       | 4d9ffd03-e0a1-4a1b-b019-84239a295e7f
artifact_id   | 4d9ffd03-e0a1-4a1b-b019-84239a295e7f
file          | artifact/fe/3c5fe47fcde23b567759bc05dd0e8f294d6cb8997cd7c7c18072bc30fc1896
relative_path | kernel-modules-4.18.0-425.10.1.el8_7.x86_64.rpm

Step 21:

# curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X GET https://capsule613.example.com/pulp/api/v3/tasks/a0a2dec4-1da1-4955-828c-8539d4977dd6/ | jq .progress_reports
[
    {
      "message": "Identify missing units",
      "code": "repair.missing",
      "state": "completed",
      "total": null,
      "done": 278,
      "suffix": null
    },
    {
      "message": "Identify corrupted units",
      "code": "repair.corrupted",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Repair corrupted units",
      "code": "repair.repaired",
      "state": "completed",
      "total": null,
      "done": 0,  --------> nothing was done but the task still shows completed
      "suffix": null
    }
]

# curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X GET https://capsule613.example.com/pulp/api/v3/tasks/a0a2dec4-1da1-4955-828c-8539d4977dd6/ | jq .state
"completed"
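The no-op can be spotted mechanically in the progress_reports payload. A minimal jq sketch (the JSON is a trimmed copy inlined from the task output above; jq availability on the host is assumed):

```shell
# Flag a repair task that identified missing units but repaired none of them.
# The inlined JSON is a trimmed copy of the progress_reports shown above.
reports='[{"code":"repair.missing","done":278},{"code":"repair.corrupted","done":0},{"code":"repair.repaired","done":0}]'
echo "$reports" | jq 'any(.code == "repair.missing" and .done > 0)
                      and ([.[] | select(.code == "repair.repaired")][0].done == 0)'
# Prints "true" for the task above: 278 missing units identified, 0 repaired.
```

For a live check, the same filter can be applied to the output of the GET call from Step 21 instead of the inlined sample.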


Step 24:

# curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X GET https://capsule613.example.com/pulp/api/v3/tasks/771c701d-6839-49d3-9ac4-56ee53ad76f4/ | jq .state
"completed"

# curl -s --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem  -H "Content-Type: application/json" -X GET https://capsule613.example.com/pulp/api/v3/tasks/771c701d-6839-49d3-9ac4-56ee53ad76f4/ | jq .progress_reports
[
  {
    "message": "Identify missing units",
    "code": "repair.missing",
    "state": "completed",
    "total": null,
    "done": 278,
    "suffix": null
  },
  {
    "message": "Identify corrupted units",
    "code": "repair.corrupted",
    "state": "completed",
    "total": null,
    "done": 0,
    "suffix": null
  },
  {
    "message": "Repair corrupted units",
    "code": "repair.repaired",
    "state": "completed",
    "total": null,
    "done": 278,
    "suffix": null
  }
]


Step 25: Successful yum/dnf transaction on the client host


Expected results:

* If no lifecycle environments are associated with a capsule server and all orphaned content has been deleted after the orphan protection timeout, then all content unit information should be deleted from the database, or at least disassociated from its artifact_ids.

* The repair API should not run at all when no remotes (i.e. no syncable repos) are present on that capsule.
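Concretely, after such a cleanup the query from Step 12 would be expected to come back empty (or with a NULL artifact_id) instead of the populated record shown in the actual results (an illustrative sketch of the expected end state, not a fix):

```sql
-- Expected after cleanup (illustrative): no matching row, or artifact_id NULL,
-- for the artifact checked in Step 12.
SELECT ca.pulp_id, cca.artifact_id
FROM core_artifact ca
LEFT JOIN core_contentartifact cca ON cca.artifact_id = ca.pulp_id
WHERE ca.file = 'artifact/fe/3c5fe47fcde23b567759bc05dd0e8f294d6cb8997cd7c7c18072bc30fc1896';
-- Expected: 0 rows (or artifact_id reported as NULL)
```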


Additional info:

On 6.10 or 6.11, this scenario would be completely unrecoverable if someone did exactly what I did, or simply removed the entire artifacts directory from the capsule, because missing modulemd metadata cannot be repaired by the repair API.

On 6.12+ it is still possible to get the artifacts back as expected, but users will additionally need to run the repair API on the capsule, a step which both I and the end users would very much like to skip.

Comment 2 Daniel Alley 2023-02-28 20:22:00 UTC
>>>>>
On 6.10 or 6.11, This scenario would be completely unrecoverable if someone does exactly the same as I did or simply removes the entire artifacts directory from capsule. Reason -> Missing Modulemd metadata cannot be repaired by repair API.

On 6.12+ it's still possible to get back the artifacts as expected but Users will additionally need to run the repair API on capsule which I would very much love to skip and so as the end-users.
>>>>>

Two sidenotes:

1) Keep in mind that published metadata is also stored in /var/lib/pulp/artifacts/ and can't be regenerated on an individual basis in an exact way - it can't be "repaired".

2) Possibly "orphan cleanup protection time" is no longer an ideal solution, given that the direction we would like to move in for other reasons (RBAC, etc.) is that uploaded content must be immediately added to a repo in order to set permissions on it appropriately. Uploading content independently of adding it to a repo conflicts with several features we want to adopt.

Comment 3 Sayan Das 2023-03-01 08:58:40 UTC
1) Keep in mind that published metadata is also stored in /var/lib/pulp/artifacts/ and can't be regenerated on an individual basis in an exact way - it can't be "repaired".

Correct, and for that the very first action we take is to force a full sync of the repos and republish the content-view version metadata. Once that is done, we go for "Validate Sync" or the repair API to get back the other content.

That is why I mentioned that on 6.10 or 6.11 it is impossible to fix the issue: modulemd metadata was also stored on the filesystem, and the repair API cannot recover it. That is no longer a blocker on 6.12+.

Comment 4 Sayan Das 2023-09-12 08:04:57 UTC
The only way I found to get rid of all the data is by running the following on the capsule:

# satellite-installer --reset-data 

This drops the entire pulpcore database on the capsule and reinitializes it with default content/settings/tables.

Then we can add the necessary lifecycles back to the capsule and run a "Complete Sync".

But this approach is only possible when the end user is prepared to discard all of the capsule's content and resync it. If that is not an option, then the orphan cleanup needs to be able to deal with those stale contents properly.

Comment 5 Eric Helms 2024-06-06 16:08:28 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "SAT-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.