Bug 2026277
Summary: null value in column "manifest_id" violates not-null constraint error while syncing RHOSP container images

Product: Red Hat Satellite
Component: Pulp
Version: 6.10.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Rafal Szmigiel <rszmigie>
Assignee: satellite6-bugs <satellite6-bugs>
QA Contact: Lai <ltran>
CC: akapse, alex.bron, dkliban, ggainey, hakon.gislason, iballou, ipanova, jpasqual, ltran, mkalyat, mmccune, nube, pcreech, rchan, sadas, saydas, shtiwari, wclark
Target Milestone: 6.11.0
Target Release: Unused
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tfm-pulpcore-python-pulp-container-2.9.2
Doc Type: If docs needed, set a value
Clones: 2043697 (view as bug list)
Type: Bug
Last Closed: 2022-07-05 14:30:29 UTC
Description
Rafal Szmigiel
2021-11-24 08:58:00 UTC
I've configured the Sync Policy to re-run the sync every hour. After about 12 hours it finally synchronised all images, but this is still pretty disappointing and I'd like to know where the problem comes from.

> @Rafal can you share some logs with tracebacks? Thanks.

Hey, attaching tracebacks as requested (the repeated syslog prefix is trimmed after the first line for readability):

```
Nov 30 03:44:59 satellite.local pulpcore-worker-3[1231]: pulp [a2b82b5f-0427-4420-919e-b149d00c759e]: pulpcore.tasking.pulpcore_worker:INFO: Task fb0f3c7c-e537-4d15-9363-ea357f73d910 failed (null value in column "manifest_id" violates not-null constraint
DETAIL: Failing row contains (34796245, null, 30fd903d-0869-4787-9743-67aa57e068ca).
)
  File "/usr/lib/python3.6/site-packages/pulpcore/tasking/pulpcore_worker.py", line 317, in _perform_task
    result = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pulp_container/app/tasks/synchronize.py", line 44, in synchronize
    return dv.create()
  File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/declarative_version.py", line 151, in create
    loop.run_until_complete(pipeline)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    await asyncio.gather(*futures)
  File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    await self.run()
  File "/usr/lib/python3.6/site-packages/pulp_container/app/tasks/sync_stages.py", line 461, in run
    BlobManifest.objects.bulk_create(objs=blob_list, ignore_conflicts=True, batch_size=1000)
  File "/usr/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/db/models/query.py", line 474, in bulk_create
    ids = self._batched_insert(objs_without_pk, fields, batch_size, ignore_conflicts=ignore_conflicts)
  File "/usr/lib/python3.6/site-packages/django/db/models/query.py", line 1211, in _batched_insert
    self._insert(item, fields=fields, using=self.db, ignore_conflicts=ignore_conflicts)
  File "/usr/lib/python3.6/site-packages/django/db/models/query.py", line 1186, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1377, in execute_sql
    cursor.execute(sql, params)
  File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
```

An earlier occurrence of the same failure on another worker, with an identical traceback:

```
Nov 23 09:42:23 satellite.local pulpcore-worker-5: pulp [65351aa8-eb7b-4167-ab35-57e104b5a339]: pulpcore.tasking.pulpcore_worker:INFO: Task 7d38dc27-fb2e-489a-8895-bc250e59fe4b failed (null value in column "manifest_id" violates not-null constraint
DETAIL: Failing row contains (41938, null, 3cdba870-bc13-4637-9e30-ae4a8d93e135).
)
```

Thank you for your help!

Best Regards,
Rafal

@Rafal, if you inspect the repo (for example nova-api) through the Registry API, you will see that it has 66 tags, compared to the 33 tags shown in the Red Hat registry UI, since not all the information is displayed there: https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-nova-api/5de6bdfe5a13461646f8fe2a?tag=all&architecture=amd64

```
$ curl -H "Authorization: $ACCESS_TOKEN" 'https://registry.redhat.io/v2/rhosp-rhel8/openstack-nova-api/tags/list' -L | python -m json.tool | grep 16 | wc -l
66
```

The rest of the tags you see are tags of the source containers, which are fairly big in size. As far as I know, OSP workflows do not care about these source files. I took one tag that references an image containing source files and inspected it: it has 506 layers, and summing the size of each layer gives 1.2 GB, and that is just one image. In total this repo has 24 images that contain source files.
```
$ curl -H "Accept: application/vnd.docker.distribution.manifest.v2+json" -H "Authorization: $ACCESS_TOKEN" 'https://registry.redhat.io/v2/rhosp-rhel8/openstack-nova-api/manifests/16.1.5-2-source' -L | python -m json.tool | grep size | wc -l
506

$ curl -H "Authorization: $ACCESS_TOKEN" 'https://registry.redhat.io/v2/rhosp-rhel8/openstack-nova-api/tags/list' -L | python -m json.tool | grep source | wc -l
24
```

If you exclude all images that are tagged as source images and sync only the regular images, the repo size is 18 GB. It took me 16 minutes to sync these tags:

```
18G /var/lib/pulp/media/artifact
18G /var/lib/pulp/media/
```

Suggestions:

1. When creating the repo, tell it to exclude tags that have 'source' in their name.
2. If you do not need every tag in the repo, tell it to sync only a specific list of tags when creating it. If I am not mistaken, this option is exposed in Satellite as Limit Sync Tags: https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html/content_management_guide/managing_container_images#Importing_Container_Images
3. Satellite 6.10 ships Pulp 3, which offers the option to sync container images with the on_demand policy. If you don't need all the bits right away, they are downloaded only when a client requests them (i.e. during podman pull); this is also configurable on the repo. Such a sync takes only a few minutes to complete, since only the manifests are downloaded.
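Suggestion 1 above amounts to a simple name filter over the registry's `tags/list` response before the tag set is handed to the sync. A minimal sketch, assuming an already-fetched response; the tag values below are illustrative, not real registry data:

```python
# Sketch of suggestion 1: drop tags whose name marks them as source
# containers. The sample tags/list payload is illustrative only.

def filter_source_tags(tags):
    """Return only tags that do not look like source-container tags."""
    return [t for t in tags if "source" not in t]

tags_list_response = {
    "name": "rhosp-rhel8/openstack-nova-api",
    "tags": ["16.1.5-2", "16.1.5-2-source", "16.2", "16.2.0", "16.2.0-source"],
}

regular = filter_source_tags(tags_list_response["tags"])
print(regular)  # only the non-source tags remain
```

The same substring match is what a `grep -v source` over the tag list would give you on the command line.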
I am still working on reproducing this BZ to confirm a couple of theories about where we can improve the code, regardless of the suggestions I've listed above, but so far I have run out of disk space and I am being throttled by the registry server.

A couple of notes on what Ina said:

- Limit Sync Tags can indeed be used to sync only tags with specific names.
- While the installed version of Pulp 3 supports it, Satellite 6.10 does not yet allow on_demand syncing of container content.

I seem to be unable to reproduce this bug; I tried to sync nova-api. Based on the traceback provided, though, this upstream issue is very plausible: https://pulp.plan.io/issues/9424

Since we are getting more customer cases, can I ask for a reproducer? I am having trouble reproducing this in my dev environment. Maybe @rafal or @lai would be able to help with that?

Hey,

Just a small note: if I specify tag limits as suggested above, for instance:

```
while read IMAGE; do \
  IMAGE_NAME=$(echo $IMAGE | cut -d"/" -f3 | sed "s/openstack-//g") ; \
  IMAGE_NOURL=$(echo $IMAGE | sed "s/registry.redhat.io\///g") ; \
  hammer repository create \
    --organization "Default Organization" \
    --product "OSP Containers New" \
    --content-type docker \
    --url https://registry.redhat.io \
    --docker-upstream-name $IMAGE_NOURL \
    --upstream-username 'blebleble' \
    --upstream-password 'blablabla' \
    --docker-tags-whitelist 16.2,16.2.0 \
    --name $IMAGE_NAME ; \
done < satellite_images
```

the problem does not occur.

I believe the official documentation should suggest using tag limits, not only because of this issue but also for the sake of disk space usage. There is no need to mirror every single image, including the source images.

Best Regards,
Rafal

Here is the pulp upstream patch that should fix the original manifest_id null database error: https://github.com/pulp/pulp_container/pull/535

Moving to POST as the upstream patch is merged.

Due to an issue with build tooling, the previously supplied RPM did not contain the fix. A new hotfix RPM will be attached shortly.

Created attachment 1851425 [details]
python3-pulp-container-2.8.1-0.3.HOTFIXRHBZ2026277.el7pc.noarch.rpm
HOTFIX RPM is available for Satellite 6.10.1
To install the hotfix:
1. Take a complete backup or snapshot before installing the hotfix
2. Download the attached RPM and copy it to the affected Satellite/Capsule servers
3. # yum install ./python3-pulp-container-2.8.1-0.3.HOTFIXRHBZ2026277.el7pc.noarch.rpm --disableplugin=foreman-protector
4. # systemctl restart pulpcore-worker@*.service pulpcore-content.service
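For context, the traceback shows the failure happening when `sync_stages.py` bulk-inserts blob↔manifest through-table rows and one row's `manifest_id` never resolved, so PostgreSQL rejects the NULL. The sketch below uses plain-Python stand-ins, not the actual Pulp models or the exact upstream fix in PR 535; it only illustrates the failure mode and one defensive shape, skipping unresolved rows before the insert:

```python
# Plain-Python stand-in for the Django through-table model; the real
# code calls BlobManifest.objects.bulk_create(...) in pulp_container's
# sync_stages.py. All names here are illustrative, not the Pulp API.

class BlobManifest:
    """Row linking a blob to the manifest that references it."""
    def __init__(self, blob_id, manifest_id):
        self.blob_id = blob_id
        self.manifest_id = manifest_id

def rows_safe_for_insert(rows):
    # A row whose manifest never resolved would be inserted with
    # manifest_id = NULL and trip the NOT NULL constraint, so skip it.
    # (ignore_conflicts=True only ignores unique-constraint conflicts,
    # not NOT NULL violations.)
    return [r for r in rows if r.manifest_id is not None]

rows = [
    BlobManifest("blob-a", "manifest-1"),
    BlobManifest("blob-b", None),  # unresolved manifest -> would violate NOT NULL
    BlobManifest("blob-c", "manifest-1"),
]
print(len(rows_safe_for_insert(rows)))  # 2
```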
Steps to reproduce:

1. Create a custom docker repo with openstack nova-api, cinder-api, and nova-compute (ensure that the system has adequate space; I got mine from quay.io).
2. Sync the repos.

Expected: the repos should all sync successfully.

Actual: the repos do sync successfully.

Please note that the attached image doesn't include nova-compute; that is because of a limitation of my VM, which ran out of space after nova-api and cinder-api synced successfully. The same limitation applies to a fresh VM syncing just nova-compute by itself.

Verified in 7.0 snap 9 with python38-pulp-container-2.9.2-1.el8pc.noarch on both RHEL 8.5 and RHEL 7.9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498