Bug 2041508 - Publication creation (during migration to pulp3 as well) can fail if /var/lib/pulp is NFS share
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.9.8
Hardware: All
OS: All
Priority: unspecified
Severity: high (3 votes)
Target Milestone: 6.11.0
Assignee: satellite6-bugs
QA Contact: Lai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-17 14:37 UTC by Jan Jansky
Modified: 2022-07-05 14:32 UTC (History)

Fixed In Version: tfm-pulpcore-python-pulp-rpm-3.17.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2059373 2061715 (view as bug list)
Environment:
Last Closed: 2022-07-05 14:32:03 UTC
Target Upstream Version:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Github pulp pulp_rpm issues 2379 | 0 | None | closed | NFS issue during publish due to not all file descriptors being closed | 2022-02-24 19:31:51 UTC
Red Hat Product Errata RHSA-2022:5498 | 0 | None | None | None | 2022-07-05 14:32:15 UTC

Description Jan Jansky 2022-01-17 14:37:58 UTC
Description of problem: 
If Satellite has /var/lib/pulp on an NFS share, the migration has a high chance of failing.


Version-Release number of selected component (if applicable):


How reproducible:
Not reproduced yet; a backup has been requested.


Steps to Reproduce:
1. Have Satellite 6.9.z with /var/lib/pulp on NFS share
2. satellite-maintain content prepare

Actual results:

Jan 14 03:11:55 satellite pulpcore-worker-2: pulp: rq.worker:ERROR: Traceback (most recent call last):
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Jan 14 03:11:55 satellite pulpcore-worker-2: rv = job.perform()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Jan 14 03:11:55 satellite pulpcore-worker-2: self._result = self._execute()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Jan 14 03:11:55 satellite pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 231, in complex_repo_migration
Jan 14 03:11:55 satellite pulpcore-worker-2: migrated_repo.pulp3_repository_version
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 478, in migrate_repo_distributor
Jan 14 03:11:55 satellite pulpcore-worker-2: pulp2dist, repo_version)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/repository.py", line 91, in migrate_to_pulp3
Jan 14 03:11:55 satellite pulpcore-worker-2: publish(repo_version.pk, checksum_types=checksum_types, sqlite_metadata=sqlite)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 368, in publish
Jan 14 03:11:55 satellite pulpcore-worker-2: metadata_signing_service=metadata_signing_service,
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 178, in __exit__
Jan 14 03:11:55 satellite pulpcore-worker-2: self.delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 80, in delete
Jan 14 03:11:55 satellite pulpcore-worker-2: self._delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 90, in _delete
Jan 14 03:11:55 satellite pulpcore-worker-2: shutil.rmtree(self.path)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 490, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: onerror(os.rmdir, path, sys.exc_info())
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 488, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: os.rmdir(path)
Jan 14 03:11:55 satellite pulpcore-worker-2: OSError: [Errno 39] Directory not empty: '/var/lib/pulp/tmp/83062@satellite.example.com/02c3949f-a46f-4644-9c08-e04211f3a340'
Jan 14 03:11:55 satellite pulpcore-worker-2: Traceback (most recent call last):
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Jan 14 03:11:55 satellite pulpcore-worker-2: rv = job.perform()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Jan 14 03:11:55 satellite pulpcore-worker-2: self._result = self._execute()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Jan 14 03:11:55 satellite pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 231, in complex_repo_migration
Jan 14 03:11:55 satellite pulpcore-worker-2: migrated_repo.pulp3_repository_version
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 478, in migrate_repo_distributor
Jan 14 03:11:55 satellite pulpcore-worker-2: pulp2dist, repo_version)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/repository.py", line 91, in migrate_to_pulp3
Jan 14 03:11:55 satellite pulpcore-worker-2: publish(repo_version.pk, checksum_types=checksum_types, sqlite_metadata=sqlite)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 368, in publish
Jan 14 03:11:55 satellite pulpcore-worker-2: metadata_signing_service=metadata_signing_service,
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 178, in __exit__
Jan 14 03:11:55 satellite pulpcore-worker-2: self.delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 80, in delete
Jan 14 03:11:55 satellite pulpcore-worker-2: self._delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 90, in _delete
Jan 14 03:11:55 satellite pulpcore-worker-2: shutil.rmtree(self.path)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 490, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: onerror(os.rmdir, path, sys.exc_info())
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 488, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: os.rmdir(path)
Jan 14 03:11:55 satellite pulpcore-worker-2: OSError: [Errno 39] Directory not empty: '/var/lib/pulp/tmp/83062@satellite.example.com/02c3949f-a46f-4644-9c08-e04211f3a340'


Expected results:
Migration completes.

Additional info:
Most likely connected with https://pulp.plan.io/issues/7719
A different customer used the workaround noted in https://pulp.plan.io/issues/7719#note-7, but it has not been tested.
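
A minimal sketch of the suspected mechanism, not the actual pulp_rpm fix: on NFS, a file that is deleted while a process still holds it open is renamed by the client to a hidden .nfsXXXX placeholder, so the final os.rmdir() inside shutil.rmtree() fails with Errno 39. That would match the traceback above and the title of the linked GitHub issue. The helper names below (leftover_nfs_placeholders, remove_working_dir) are hypothetical and only illustrate closing every handle before the working directory is cleaned up.

```python
import os
import shutil
import tempfile


def leftover_nfs_placeholders(directory):
    """Return the names of '.nfs*' entries in `directory`.

    An NFS client renames a file that is deleted while still open to a
    hidden '.nfsXXXX' placeholder; os.rmdir() then sees a non-empty
    directory and raises OSError (Errno 39, ENOTEMPTY).
    """
    return sorted(
        name for name in os.listdir(directory) if name.startswith(".nfs")
    )


def remove_working_dir(path, open_files=()):
    """Close every handle in `open_files`, then remove `path` recursively.

    Assumption: the caller tracks the file objects it opened inside `path`.
    """
    for handle in open_files:
        if not handle.closed:
            handle.close()
    shutil.rmtree(path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    metadata = open(os.path.join(workdir, "repomd.xml"), "w")
    metadata.write("<repomd/>")
    # Closing the handle first means no placeholder is left behind on NFS.
    remove_working_dir(workdir, [metadata])
    print("removed:", not os.path.exists(workdir))
```

On a failed task, listing the task's directory under /var/lib/pulp/tmp with leftover_nfs_placeholders() would show whether such placeholders are what keeps the directory non-empty.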

Comment 11 Lai 2022-04-04 17:13:17 UTC
Steps to retest:

1. Spin up a Satellite to work on
2. Spin up a RHEL box to act as the NFS server
3. On both systems, run: "rpm -q nfs-utils"
	a. This checks if the system has nfs already installed (it should)
4. On the NFS server, run "systemctl status nfs-server" to check whether NFS is enabled and running
5. If not, enable and start it via "systemctl enable nfs-server --now" and recheck the status. It should be active.
6. Run "systemctl start nfs-server", then "systemctl status nfs-server" to confirm that nfs-server is running.
7. Ensure that /mnt is empty, because this location is used for the export. Any other location, or a separate directory created for this purpose, works as well.
8. Create an export point by adding "/mnt <sat-server>(rw)" to /etc/exports using vim
9. Stop firewall just in case: systemctl stop firewalld
10. Run: exportfs -r
11. set permission to write into /mnt with chmod 777 /mnt
12. On the Satellite, create a separate directory (pulp1)
13. Give the pulp user ownership of pulp1 and /var/lib/pulp/ by running: chown -R pulp:pulp /var/lib/pulp/ and chown -R pulp:pulp pulp1
14. run: mount -t nfs <nfs-share server name>:/mnt pulp1 -v
	a. It should output something like this for status:
		mount.nfs: timeout set for Fri Mar 18 14:25:52 2022
		mount.nfs: trying text-based options
15. copy everything in /var/lib/pulp to the share: cp -r /var/lib/pulp/* pulp1
16. On NFS, check /mnt to ensure that the folder structure of /var/lib/pulp is in there (it should have assets, import, exports, tmp, etc)
17. Check the uid and gid of the pulp user on the Satellite: grep pulp /etc/passwd
18. On the NFS server, create the group and the pulp user, and grant the necessary permissions: groupadd -g <gid> pulp, adduser -g pulp -u <uid> pulp, chown -R pulp:pulp /mnt
19. Ensure that the uid and gid match between the NFS server and the Satellite: grep pulp /etc/passwd
20. run: umount <NFS share hostname>:/mnt
21. Remount the NFS share to /var/lib/pulp/ with: mount -t nfs <nfs-share server name>:/mnt /var/lib/pulp/ -v
        a. Steps 14-18 are necessary so that the export in /mnt has the same file structure as /var/lib/pulp. We can then mount it on /var/lib/pulp/ so that anything Satellite writes to that directory
                goes to the NFS share. If we skip this and mount the empty export directly on /var/lib/pulp/, Pulp cannot work because the mount hides all of its existing content.
22. run: restorecon -Rv /var/lib/pulp
23. Restart pulp services or all services on satellite: foreman-maintain service restart
24. run: showmount -e <NFS share hostname> to ensure the mount is still active
25. Create a custom repo and sync
26. Create a cv, add the repos, and publish
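
The uid/gid comparison in steps 17-19 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the two inputs are assumed to be the text of /etc/passwd (or the output of "grep pulp /etc/passwd") collected from the Satellite and from the NFS server.

```python
def passwd_ids(passwd_text, user):
    """Return (uid, gid) for `user` from /etc/passwd-formatted text, or None."""
    for line in passwd_text.splitlines():
        # passwd format: name:password:uid:gid:gecos:home:shell
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user:
            return int(fields[2]), int(fields[3])
    return None


def ids_match(satellite_passwd, nfs_passwd, user="pulp"):
    """True only if `user` exists on both hosts with identical uid and gid."""
    satellite_ids = passwd_ids(satellite_passwd, user)
    return satellite_ids is not None and satellite_ids == passwd_ids(nfs_passwd, user)


if __name__ == "__main__":
    # Hypothetical sample lines; real ids will differ per system.
    sat = "pulp:x:987:981::/var/lib/pulp:/sbin/nologin"
    nfs = "pulp:x:987:981::/home/pulp:/bin/bash"
    print(ids_match(sat, nfs))
```

A mismatch here means files written by Pulp through the mount would be owned by a different user on the NFS server.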

Expected result:
Steps 25 and 26 should complete successfully

Actual:
Steps 25 and 26 complete successfully

To confirm that the NFS share is populated correctly, I checked /mnt/media/artifact on the NFS server. Before the sync, the artifact folder did not exist because no repos had been synced yet. After syncing, content should appear in "artifact", which it did in my case.

Verified on 6.11 snap 14 with tfm-pulpcore-python3-pulp-rpm-3.17.3-2.el7pc.noarch on rhel7.9 and rhel8.5
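
For anyone repeating this verification, a small sketch for confirming from the Satellite side that /var/lib/pulp is really served from NFS. It parses text in /proc/mounts format; the function name nfs_mount_for and the sample hostname are hypothetical, and mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```python
import os


def nfs_mount_for(path, mounts_text):
    """Return (source, mount_point, fstype) if `path` lives on an NFS mount.

    `mounts_text` is text in /proc/mounts format. The longest matching
    mount point wins; on a tie, the later entry wins because it was
    mounted on top of the earlier one.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mount_point, fstype = fields[0], os.path.normpath(fields[1]), fields[2]
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        if inside and (best is None or len(mount_point) >= len(best[1])):
            best = (source, mount_point, fstype)
    if best is not None and best[2].startswith("nfs"):
        return best
    return None


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        print(nfs_mount_for("/var/lib/pulp", mounts.read()))
```

If this prints None, the sync and publish would have run against local storage and would not exercise the NFS code path.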

Comment 14 errata-xmlrpc 2022-07-05 14:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498

