Bug 1276911
Summary: Capsule fails to publish kickstart tree due to missing target of symlink

| Field | Value |
|---|---|
| Product | Red Hat Satellite |
| Component | Pulp |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 6.1.1 |
| Reporter | Alexander Braverman <abraverm> |
| Assignee | satellite6-bugs <satellite6-bugs> |
| QA Contact | Jitendra Yejare <jyejare> |
| CC | abraverm, arahaman, avaddara, bbuckingham, bkearney, bkorren, cwelton, ddevra, dzhukous, jnikolak, jortel, jyejare, mhrivnak, mmccune, ohadlevy, oramraz, pmoravec, sauchter, yjog, ykaul |
| Target Milestone | Unspecified |
| Target Release | Unused |
| Keywords | TestBlocker, Triaged |
| Hardware | Unspecified |
| OS | Linux |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-02-15 15:51:53 UTC |
| Bug Blocks | 1338516 |
In the logs I see:

Symbolic link not allowed or link target not accessible: /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/qpid-proton-c-0.9-3.el7.x86_64.rpm

What is your disk layout? (Consider using satellite/foreman-debug for full logs.) Is there any reason why that operation would fail? Thanks

All storage is managed by Pulp. Pulp content is stored on an NFS mount:

```
10.35.160.108:/RHEV/capsule6-tlv /var/lib/pulp/content nfs defaults 1 2
```

Regarding logs: https://access.redhat.com/solutions/1177833

Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not an environment path. Other actions, which also required sync, were done, such as creation of a CV, new/updated repos, and product sync. SELinux is disabled.

Note: NFS is currently down and we can't reproduce the problem until it is back online. Thanks

The above error might indicate a permissions issue. When NFS is up again, please double-check that the pulp user can actually write to /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/

We replaced the NFS storage with iSCSI. The permissions are set to apache:apache. Then I manually triggered a sync using hammer. The error was raised again with the same results, and there are also multiple errors of another sort [1]:

"RuntimeError: Will not create a symlink to a non-existent source"

There are reported bugs [2][3][4][5], but they seem to be related to pulp-server version 2.4. I didn't find any related support case.
Details about the environment:
- Satellite 6.1.1, RHEL 6.7
- Capsule: katello-service 2.2, pulp-server 2.6, RHEL 7.1
- Both Capsule and Satellite were upgraded and updated from 6.0

[1] http://pastebin.test.redhat.com/324618
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1098340
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1125388
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1093745
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1102745

Created attachment 1090029 [details]
foreman-debug
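The two errors reported so far ("Symbolic link not allowed or link target not accessible" from httpd, and "Will not create a symlink to a non-existent source" from Pulp) both come down to symlinks in the publish tree whose targets under /var/lib/pulp/content are missing, or a publish directory the web server user cannot write to. The sketch below shows one way to check for both conditions. The helper names (`find_broken_links`, `probe_dir`) are mine, not Satellite or Pulp tools; the example path is taken from the log line above.

```shell
#!/bin/sh
# find_broken_links DIR: print every symlink under DIR whose target does
# not resolve -- the condition behind both errors quoted above.
find_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# probe_dir DIR: try to create and remove a scratch file in DIR, to see
# whether the current user can write there. Run it as the publishing
# user (apache on a Satellite capsule) to mimic the distributor.
probe_dir() {
    f="$1/.pulp_write_probe.$$"
    if ( : > "$f" ) 2>/dev/null; then
        rm -f "$f" && echo "writable: $1"
    else
        echo "NOT writable: $1"
    fi
}

# Example usage (path from the log message above; run probe_dir as apache):
# find_broken_links /var/www/pub/yum/http/repos/RHEVM-DEV-SLA
# probe_dir /var/www/pub/yum/http/repos/RHEVM-DEV-SLA
```

Note that `test -e` follows the symlink, so a link whose target is gone fails the test and gets printed.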
It seems that Pulp was not able to handle the case where the content disappeared from the /var/lib/pulp/content directory. As a workaround, I did the following:

```
# before the procedure, unassociate all environments from the capsule
for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done
service mongod stop
mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb
service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db
for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule's Pulp database and resynchronized the content as on a fresh installation. The previously empty /var/lib/pulp/content started to fill with data again after that. The sync finished successfully and it looks like the kickstart was published. However, the error appeared again during the sync in /var/log/messages:

Nov 9 12:03:46 capsule-ops pulp: pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]

How does this error affect the sync process?

This is the first I've seen this bug report, so I don't have an update. But looking at it now, based on the pastebin in comment #5, it appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1285830

*** Bug 1285830 has been marked as a duplicate of this bug. ***

Lots of info in https://bugzilla.redhat.com/show_bug.cgi?id=1285830, please go check it out.

I don't have access to BZ 1285830.

The customer was able to resolve this issue on another case using the workaround.
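One way to confirm that the resync described in the workaround is actually repopulating /var/lib/pulp/content is to take a file count before and after the procedure. This is an illustrative one-liner of mine (the `content_count` name is made up), not part of the workaround:

```shell
#!/bin/sh
# content_count DIR: count regular files under DIR. Run against
# /var/lib/pulp/content before and after the resync; a growing number
# confirms the capsule is pulling content down again.
content_count() {
    find "$1" -type f | wc -l
}

# Example: content_count /var/lib/pulp/content
```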
######### Work Around 1 #################

```
# before the procedure, unassociate all environments from the capsule
for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done
service mongod stop
mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb
service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db
for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule's Pulp database and resynchronized the content as on a fresh installation. The previously empty /var/lib/pulp/content started to fill with data again.

######### Finish Work Around 1 ################

Do I still need to track this bug?

I think we should still track this bug to find out the cause of this issue; perhaps we need better error detection that can stop this issue from occurring.

I have reproduced this and determined the root cause. The node sync logic needs to be updated to properly handle changed content units with associated files (such as productid and prestodata).

Jeff, can you please let me know what the upstream pulp bug is, or link it to this ticket?

This is the same fix as https://bugzilla.redhat.com/show_bug.cgi?id=1288855. I am not sure if we should treat them as dupes or not, so I am going to keep this open. However, I am moving this to POST since https://bugzilla.redhat.com/show_bug.cgi?id=1288855 is already in POST.

The upstream Pulp bug 1463 is at MODIFIED.

The upstream/downstream issue association automation is failing because another bugzilla (#1288855) is already associated with issue 1463. Because of this, I need to remove 1463 from this BZ to get the upstream/downstream automation to start passing.

Currently trying to verify this bug with the following steps:

1. Setup: Satellite 6.0.8 and Capsule 6.0.8 on RHEL 6.7.
2. The Capsule is associated with the CV rhel67_cv and has the repos required to install the capsule: Capsule 6.0, RHEL base_os 6.7, and RH Common Server 6.
3. Before upgrade: the Capsule never synced with Satellite.
4. Before upgrade: I synced 'Red Hat Enterprise Linux 7 Server Kickstart x86_64 7Server' in Satellite and added this repo to a newly created CV 'rhel67_test'. Then I published and promoted this CV.
5. Then I upgraded Satellite and Capsule to the latest 6.1.7 #c1.
6. And I started a capsule sync using hammer.

What I observe:

The kickstart repo is synced without any RuntimeError in /var/log/messages. But while performing this capsule sync, an issue of multiple HTTP connections from capsule to satellite is observed, and I see that the sync is incomplete or not fully done. So I am not sure how it impacted the sync of the kickstart repo or the RuntimeError I was looking for. So please confirm: are the steps I performed to reproduce/verify this bug correct? And should the bug state be changed to 'verified'?

The original problem was that the repo wasn't available for usage, and it failed on the publish step. Regarding the steps: in step 3, the capsule was synced before the upgrade.

Alex, I fetched the steps from your comment #3: 'Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not an environment path. Other actions, which also required sync, were done, such as creation of a CV, new/updated repos, and product sync.' In my case, I published and promoted the kickstart repo before the upgrade only, and it published and promoted successfully. I don't see any failure in that. But the only thing is that I synced the capsule after the upgrade. If that is OK, I can mark it verified; otherwise, please let me know the correct steps to reproduce it and the expected behavior as well.
We can't reproduce the bug as the Capsule was removed, but the steps look correct.

As per the verification steps and behavior in comment #29 and the confirmation from the reporter in comment #32, changing the state to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0174
Created attachment 1088319 [details]
capsule logs

Description of problem:
Capsule fails to publish the kickstart tree (/var/www/pub/yum/https/repos/RHEVM/Production/Baremetal_Slave/content/dist/rhel/server/7/7Server/x86_64/) with the error:

pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]

Version-Release number of selected component (if applicable):

How reproducible:
Sync the capsule from the CLI: hammer capsule content synchronize --id=9

Actual results:
Error in the capsule's /var/log/messages

Expected results:
Published kickstart tree

Additional info:
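The "Actual results" above amount to scanning /var/log/messages for the publish-failure line after a capsule sync. A small wrapper can pull out the distinct repo IDs that failed to publish; `failed_publishes` is a hypothetical helper of mine, not part of Pulp or hammer, and it keys off the exact log line quoted in this report:

```shell
#!/bin/sh
# failed_publishes LOGFILE: print the distinct repo ids whose publish
# failed, based on the pulp.server.managers.repo.publish line quoted
# in this report (GNU grep assumed, for -o).
failed_publishes() {
    grep -o 'publish failed for repo \[[^]]*\]' "$1" \
        | sed 's/.*\[\(.*\)\]/\1/' \
        | sort -u
}

# Example: failed_publishes /var/log/messages
```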