Bug 2041508

Summary: Publication creation (during migration to pulp3 as well) can fail if /var/lib/pulp is NFS share
Product: Red Hat Satellite Reporter: Jan Jansky <jjansky>
Component: Pulp Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Lai <ltran>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.9.8 CC: alsouza, dalley, dkliban, ggainey, jpasqual, keith.hammel, pcreech, pmendezh, pwaghmar, rchan, sadas, ttereshc
Target Milestone: 6.11.0 Keywords: Triaged, Upgrades
Target Release: Unused   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: tfm-pulpcore-python-pulp-rpm-3.17.3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2059373 2061715 (view as bug list) Environment:
Last Closed: 2022-07-05 14:32:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Jansky 2022-01-17 14:37:58 UTC
Description of problem: 
If Satellite has /var/lib/pulp on an NFS share, the migration has a high chance of failing.


Version-Release number of selected component (if applicable):


How reproducible:
Not reproduced yet; asked for a backup.


Steps to Reproduce:
1. Have Satellite 6.9.z with /var/lib/pulp on NFS share
2. satellite-maintain content prepare

Actual results:

Jan 14 03:11:55 satellite pulpcore-worker-2: pulp: rq.worker:ERROR: Traceback (most recent call last):
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Jan 14 03:11:55 satellite pulpcore-worker-2: rv = job.perform()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Jan 14 03:11:55 satellite pulpcore-worker-2: self._result = self._execute()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Jan 14 03:11:55 satellite pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 231, in complex_repo_migration
Jan 14 03:11:55 satellite pulpcore-worker-2: migrated_repo.pulp3_repository_version
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 478, in migrate_repo_distributor
Jan 14 03:11:55 satellite pulpcore-worker-2: pulp2dist, repo_version)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/repository.py", line 91, in migrate_to_pulp3
Jan 14 03:11:55 satellite pulpcore-worker-2: publish(repo_version.pk, checksum_types=checksum_types, sqlite_metadata=sqlite)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 368, in publish
Jan 14 03:11:55 satellite pulpcore-worker-2: metadata_signing_service=metadata_signing_service,
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 178, in __exit__
Jan 14 03:11:55 satellite pulpcore-worker-2: self.delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 80, in delete
Jan 14 03:11:55 satellite pulpcore-worker-2: self._delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 90, in _delete
Jan 14 03:11:55 satellite pulpcore-worker-2: shutil.rmtree(self.path)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 490, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: onerror(os.rmdir, path, sys.exc_info())
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 488, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: os.rmdir(path)
Jan 14 03:11:55 satellite pulpcore-worker-2: OSError: [Errno 39] Directory not empty: '/var/lib/pulp/tmp/83062.com/02c3949f-a46f-4644-9c08-e04211f3a340'
Jan 14 03:11:55 satellite pulpcore-worker-2: Traceback (most recent call last):
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Jan 14 03:11:55 satellite pulpcore-worker-2: rv = job.perform()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Jan 14 03:11:55 satellite pulpcore-worker-2: self._result = self._execute()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Jan 14 03:11:55 satellite pulpcore-worker-2: return self.func(*self.args, **self.kwargs)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 231, in complex_repo_migration
Jan 14 03:11:55 satellite pulpcore-worker-2: migrated_repo.pulp3_repository_version
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 478, in migrate_repo_distributor
Jan 14 03:11:55 satellite pulpcore-worker-2: pulp2dist, repo_version)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/rpm/repository.py", line 91, in migrate_to_pulp3
Jan 14 03:11:55 satellite pulpcore-worker-2: publish(repo_version.pk, checksum_types=checksum_types, sqlite_metadata=sqlite)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulp_rpm/app/tasks/publishing.py", line 368, in publish
Jan 14 03:11:55 satellite pulpcore-worker-2: metadata_signing_service=metadata_signing_service,
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 178, in __exit__
Jan 14 03:11:55 satellite pulpcore-worker-2: self.delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 80, in delete
Jan 14 03:11:55 satellite pulpcore-worker-2: self._delete()
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib/python3.6/site-packages/pulpcore/tasking/services/storage.py", line 90, in _delete
Jan 14 03:11:55 satellite pulpcore-worker-2: shutil.rmtree(self.path)
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 490, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: onerror(os.rmdir, path, sys.exc_info())
Jan 14 03:11:55 satellite pulpcore-worker-2: File "/usr/lib64/python3.6/shutil.py", line 488, in rmtree
Jan 14 03:11:55 satellite pulpcore-worker-2: os.rmdir(path)
Jan 14 03:11:55 satellite pulpcore-worker-2: OSError: [Errno 39] Directory not empty: '/var/lib/pulp/tmp/83062.com/02c3949f-a46f-4644-9c08-e04211f3a340'


Expected results:
Migration completes.

Additional info:
Most likely connected with https://pulp.plan.io/issues/7719
A different customer used the workaround noted in https://pulp.plan.io/issues/7719#note-7, but it was not tested.
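
For illustration only, the failing call in the traceback is shutil.rmtree() raising OSError with Errno 39 (ENOTEMPTY) while cleaning up the task's working directory. One possible cause on NFS is that a file deleted while it is still open gets renamed to a hidden .nfsXXXX file instead of being removed, so the directory can still look non-empty for a short time. The sketch below retries the removal a few times in that case. The helper name rmtree_with_retry and its attempts/delay parameters are hypothetical; this is not the pulpcore code, and it is not necessarily what the linked workaround or the eventual fix does.

```python
import errno
import shutil
import time


def rmtree_with_retry(path, attempts=5, delay=0.5):
    """Remove a directory tree, retrying while it is reported as non-empty.

    Hypothetical helper for illustration; not part of pulpcore.
    Returns True once the tree is gone, False if attempts is 0 or less.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Nothing left to remove (for example, an earlier attempt
            # already deleted the tree before failing on a leftover file).
            return True
        except OSError as exc:
            # Errno 39 (ENOTEMPTY) is the error shown in the traceback.
            # Any other error, or the last failed attempt, is re-raised.
            if exc.errno != errno.ENOTEMPTY or attempt == attempts:
                raise
            time.sleep(delay)
    return False
```

A retry only helps if the leftover file disappears once its last open handle is closed; if something keeps the file open for the whole task, the final attempt still raises the same OSError.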

Comment 11 Lai 2022-04-04 17:13:17 UTC
Steps to retest:

1. Spin up a Satellite to work on
2. Spin up a rhel box for the NFS share
3. On both systems, run: "rpm -q nfs-utils"
	a. This checks if the system has nfs already installed (it should)
4. On the NFS share, run "systemctl status nfs-server" to check whether NFS is running and enabled
5. If not, enable it via "systemctl enable nfs-server --now" and recheck the status.  It should be active.
6. Run: "systemctl start nfs-server", then "systemctl status nfs-server" to ensure that nfs-server is running.
7. Ensure that the dir /mnt is empty, because it is used as the export location here; you can use any other location or create a separate dir for this
8. Create an export point by adding "/mnt <sat-server>(rw)" to /etc/exports using vim
9. Stop firewall just in case: systemctl stop firewalld
10. Run: exportfs -r
11. Set write permission on /mnt with: chmod 777 /mnt
12. On the Satellite, create a separate folder (pulp1)
13. Give the pulp user ownership of pulp1 and /var/lib/pulp/ by running: chown -R pulp:pulp /var/lib/pulp/ and chown -R pulp:pulp pulp1
14. run: mount -t nfs <nfs-share server name>:/mnt pulp1 -v
	a. It should output something like this for status:
		mount.nfs: timeout set for Fri Mar 18 14:25:52 2022
		mount.nfs: trying text-based options
15. Copy everything in /var/lib/pulp to the share: cp -r /var/lib/pulp/* pulp1
16. On the NFS share, check /mnt to ensure that the folder structure of /var/lib/pulp is there (it should have assets, import, exports, tmp, etc.)
17. Check the uid and gid of the pulp user on the Satellite: grep pulp /etc/passwd
18. On the NFS share, create the pulp group and user with the same IDs and set ownership: groupadd -g <gid> pulp, adduser -g pulp -u <uid> pulp, chown -R pulp:pulp /mnt
19. Ensure that the uid and gid match between the NFS share and the Satellite: grep pulp /etc/passwd
20. run: umount <NFS share hostname>:/mnt
21. Remount the NFS share to /var/lib/pulp/ with: mount -t nfs <nfs-share server name>:/mnt /var/lib/pulp/ -v
	a. Steps 14-18 are necessary so that the export in /mnt on the NFS share has the same file structure as /var/lib/pulp.  Then we can mount it on /var/lib/pulp/ so that anything Satellite writes into that dir is directed to the NFS share.  If we skip this and mount an empty export directly on /var/lib/pulp/, Pulp cannot work, because the empty filesystem mounted over the dir hides all of its existing contents.
22. run: restorecon -Rv /var/lib/pulp
23. Restart pulp services or all services on satellite: foreman-maintain service restart
24. Run: showmount -e <NFS share hostname> to ensure the share is still exported
25. Create a custom repo and sync it
26. Create a content view (CV), add the repos, and publish
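
Two of the checks in the steps above (matching pulp uid/gid on both hosts, and /var/lib/pulp really being an NFS mount) can be scripted. The following is a minimal sketch and not part of the original test procedure: the function names passwd_ids and mount_fstype are hypothetical, and it assumes the standard /etc/passwd and /proc/mounts line formats.

```python
def passwd_ids(passwd_text, user="pulp"):
    """Return (uid, gid) for `user` from /etc/passwd-style text, or None."""
    for line in passwd_text.splitlines():
        # Format: name:password:uid:gid:gecos:home:shell
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user:
            return int(fields[2]), int(fields[3])
    return None


def mount_fstype(mounts_text, mount_point):
    """Return the filesystem type mounted at `mount_point`, or None.

    Expects /proc/mounts-style text: device mountpoint fstype options ...
    """
    target = mount_point.rstrip("/") or "/"
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == target:
            # Keep the last match: a later mount shadows an earlier one.
            fstype = fields[2]
    return fstype


if __name__ == "__main__":
    with open("/etc/passwd") as fh:
        print("pulp uid/gid:", passwd_ids(fh.read()))
    with open("/proc/mounts") as fh:
        print("/var/lib/pulp fstype:", mount_fstype(fh.read(), "/var/lib/pulp"))
```

Run it on both the Satellite and the NFS server and compare the pulp uid/gid pairs; on the Satellite, the reported filesystem type for /var/lib/pulp should be nfs or nfs4 once step 21 is done.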

Expected result:
Steps 25 and 26 should complete successfully.

Actual:
Steps 25 and 26 complete successfully.

To ensure that the NFS share is populated correctly, I checked /mnt/media/artifact and the folder was populated.  Before the sync, the artifact folder did not exist because I had not synced any repos yet.  After syncing, content should appear in "artifact", which it did in my case.

Verified on 6.11 snap 14 with tfm-pulpcore-python3-pulp-rpm-3.17.3-2.el7pc.noarch on rhel7.9 and rhel8.5

Comment 14 errata-xmlrpc 2022-07-05 14:32:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498