Bug 1276911 - Capsule fails to publish kickstart tree due to missing target of symlink
Status: CLOSED ERRATA
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.1.1
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Jitendra Yejare
Duplicates: 1285830
Blocks: 1338516
 
Reported: 2015-11-01 08:39 UTC by Alexander Braverman
Modified: 2022-07-09 07:40 UTC

Doc Type: Bug Fix
Last Closed: 2016-02-15 15:51:53 UTC


Attachments
capsule logs (14.27 MB, application/x-tar), uploaded 2015-11-01 08:39 UTC by Alexander Braverman
foreman-debug (11.34 MB, application/x-xz), uploaded 2015-11-05 11:31 UTC by Alexander Braverman


Links
Red Hat Knowledge Base (Solution) 2069243, last updated 2017-07-26 19:45:58 UTC
Red Hat Product Errata RHSA-2016:0174 (SHIPPED_LIVE): Moderate: Satellite 6.1.7 security, bug and enhancement fix update, 2016-02-15 20:50:32 UTC

Description Alexander Braverman 2015-11-01 08:39:11 UTC
Created attachment 1088319: capsule logs

Description of problem:
Capsule fails to publish kickstart tree (/var/www/pub/yum/https/repos/RHEVM/Production/Baremetal_Slave/content/dist/rhel/server/7/7Server/x86_64/) with error:
pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]

Version-Release number of selected component (if applicable):


How reproducible:
Sync the Capsule from the CLI: hammer capsule content synchronize --id=9

Actual results:
Error in the Capsule's /var/log/messages

Expected results:
published kickstart tree

Additional info:

Comment 2 Ohad Levy 2015-11-01 09:11:29 UTC
In the logs I see:
 Symbolic link not allowed or link target not accessible: /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/qpid-proton-c-0.9-3.el7.x86_64.rpm
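A quick way to list dangling symlinks like this one (a sketch, assuming the published trees live under /var/www/pub and GNU find is available):

```
# -xtype l matches symlinks whose target cannot be resolved, i.e. broken links.
find /var/www/pub -xtype l
```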

What is your disk layout? (Consider using satellite/foreman-debug for full logs.)

Is there any reason why that operation would fail?

Thanks

Comment 3 Alexander Braverman 2015-11-01 09:44:59 UTC
All storage is managed by Pulp. Pulp content is stored on an NFS mount:
10.35.160.108:/RHEV/capsule6-tlv /var/lib/pulp/content nfs defaults 1 2

Regarding logs: https://access.redhat.com/solutions/1177833

Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not part of an environment path.

Other actions that also required a sync were done, such as creating a CV, adding/updating repos, and syncing a product.

SELinux is disabled.

Note: NFS is currently down and we can't reproduce the problem until it comes back online.

Thanks

Comment 4 Ohad Levy 2015-11-01 15:10:17 UTC
The above error might indicate a permissions issue. When NFS is up again, please double-check that the pulp user can actually write to /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/
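For example, a minimal check (a sketch; the directory is the one from the error above, and content is assumed to be served by the apache user):

```
# Verify the apache user can write to the publish directory.
sudo -u apache test -w /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/ \
  && echo "writable" || echo "not writable"
```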

Comment 5 Alexander Braverman 2015-11-03 16:18:52 UTC
We replaced the NFS storage with iSCSI. The permissions are set to apache:apache.
Then we manually triggered a sync using hammer.

The error was raised again with the same results, and there are also multiple errors of another sort [1]:
"RuntimeError: Will not create a symlink to a non-existent source"
There are reported bugs [2][3][4][5], but they seem to be related to pulp-server version 2.4. I didn't find any related support case. (A sketch for checking which side of a failing link is missing follows the references below.)

Details about the environment:
 - Satellite 6.1.1, RHEL 6.7
 - Capsule katello-service-2.2, pulp-server 2.6, RHEL 7.1
 - Both Capsule and Satellite were upgraded and updated from 6.0

[1] http://pastebin.test.redhat.com/324618
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1098340
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1125388
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1093745
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1102745
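To tell which side of a failing link is missing, a minimal sketch (using the RPM path from comment 2 as an example):

```
# Show where the published symlink points, and whether the target
# under /var/lib/pulp/content still exists.
link=/var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/qpid-proton-c-0.9-3.el7.x86_64.rpm
readlink "$link"
test -e "$link" && echo "target exists" || echo "target missing"
```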

Comment 6 Alexander Braverman 2015-11-05 11:31:54 UTC
Created attachment 1090029: foreman-debug

Comment 7 Ivan Necas 2015-11-09 10:21:38 UTC
It seems Pulp was not able to handle the case where the content disappeared from the /var/lib/pulp/content directory.

As a workaround, I did the following:

```
# Before the procedure, unassociate all environments from the capsule.

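# Stop the Pulp services and httpd so nothing writes to the database while it is replaced.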
for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done

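# Stop MongoDB, move the old database aside, and recreate the data directory with the correct SELinux contexts.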
service mongod stop

mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb

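# Start MongoDB again and reinitialize the Pulp schema as the apache user.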
service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db 

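# Restart the message broker, the Pulp services, and httpd.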
for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done

# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule pulp database and resynchronized the content as a fresh installation. The previously empty /var/lib/pulp/content started to get filled with the data again after that.
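Afterwards, a quick way to confirm the republished tree is healthy (a sketch, same assumptions as in comment 2):

```
# Content should be filling back in, and no dangling symlinks should remain.
du -sh /var/lib/pulp/content
find /var/www/pub -xtype l | wc -l   # expect 0 once the sync completes
```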

Comment 8 Alexander Braverman 2015-11-09 13:22:01 UTC
The sync finished successfully, and it looks like the kickstart was published.
However, the error appeared again during the sync in /var/log/messages:

Nov  9 12:03:46 capsule-ops pulp: pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]
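To pull the surrounding traceback for this failure out of the logs (a sketch; adjust the context line counts as needed):

```
# Show context around each publish failure logged by Pulp.
grep -B 5 -A 20 'publish failed for repo' /var/log/messages
```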

How does this error affect the sync process?

Comment 12 Michael Hrivnak 2015-12-11 16:49:05 UTC
This is the first I've seen this bug report, so I don't have an update.

But looking at it now, based on the pastebin in comment #5, it appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1285830

Comment 13 Bryan Kearney 2015-12-11 17:28:34 UTC
*** Bug 1285830 has been marked as a duplicate of this bug. ***

Comment 14 Bryan Kearney 2015-12-11 17:29:26 UTC
Lots of info in https://bugzilla.redhat.com/show_bug.cgi?id=1285830, please go check it out.

Comment 15 Alexander Braverman 2015-12-13 06:40:12 UTC
I don't have access to BZ 1285830

Comment 17 jnikolak 2015-12-29 23:10:44 UTC
The customer was able to resolve this issue on another case using the workaround below.

######### Work Around 1 #################
```
# Before the procedure, unassociate all environments from the capsule.

for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done

service mongod stop

mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb

service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db 

for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done

# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule pulp database and resynchronized the content as a fresh installation. The previously empty /var/lib/pulp/content started to get filled with the data again after that.
######### Finish Work Around 1 ################

Comment 18 Bryan Kearney 2016-01-04 18:25:09 UTC
Do I still need to track this bug?

Comment 20 jnikolak 2016-01-13 10:18:03 UTC
I think we should still track this bug to find out the cause of this issue. Perhaps we need better error detection that can stop this issue from occurring.

Comment 24 Jeff Ortel 2016-01-18 23:16:12 UTC
I have reproduced this and determined the root cause.  The node sync logic needs to be updated to properly handle changed content units with associated files (such as productid and prestodata).

Comment 25 Bryan Kearney 2016-01-21 13:44:51 UTC
Jeff... can you please let me know what the upstream pulp bug is, or link it to this ticket?

Comment 26 Bryan Kearney 2016-01-21 15:02:00 UTC
This is the same fix as https://bugzilla.redhat.com/show_bug.cgi?id=1288855. I am not sure if we should treat them as dupes or not, so I am going to keep this open. However, I am moving this to POST since https://bugzilla.redhat.com/show_bug.cgi?id=1288855 is already in POST.

Comment 28 Brian Bouterse 2016-02-01 16:31:04 UTC
The upstream Pulp bug 1463 is at MODIFIED. The upstream/downstream issue association automation is failing because another bugzilla (#1288855) is already associated with issue 1463. Because of this, I need to remove 1463 from this BZ to get the upstream/downstream automation to start passing.

Comment 29 Jitendra Yejare 2016-02-09 07:15:02 UTC
Currently trying to verify this bug with following steps:
___________________________________________________________

1. Setup: Sat 6.0.8 and Capsule 6.0.8 on RHEL 6.7.
2. Capsule is associated with the CV rhel67_cv, and the repos required to install the Capsule are Capsule 6.0, RHEL base_os 6.7, and RH Common Server 6.
3. Before Upgrade: Capsule never synced with Satellite.
4. Before Upgrade: I synced 'Red Hat Enterprise Linux 7 Server Kickstart x86_64 7Server' in Satellite and added this repo to the newly created CV 'rhel67_test'. Then I published and promoted this CV.
5. Then I upgraded Satellite and Capsule to the latest 6.1.7 #c1.
6. And I started the Capsule sync using hammer (a sketch of the commands follows this list).
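A sketch of the hammer commands behind steps 4 and 6 (IDs are placeholders; only hammer capsule content synchronize appears earlier in this report, the others are the standard hammer-cli-katello subcommands and their exact options vary by version):

```
# Step 4: sync the kickstart repo in Satellite, then publish the CV.
hammer repository synchronize --id <repo_id>
hammer content-view publish --id <cv_id>

# Step 6: sync the Capsule after the upgrade.
hammer capsule content synchronize --id <capsule_id>
```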


What I observe:
_______________________

I observe that the kickstart repo is synced without any RuntimeError in /var/log/messages. But while performing this Capsule sync, an issue of multiple HTTP connections from the Capsule to Satellite was observed, and I see that the sync is incomplete or not fully done. So I am not sure how this impacted the sync of the kickstart repo or the RuntimeError I was looking for.


So please confirm that the steps I performed to repro/verify this bug are correct,
and whether to change the bug state to 'verified'.

Comment 30 Alexander Braverman 2016-02-09 13:33:45 UTC
The original problem was that the repo wasn't available for use and it failed on the publish step.

Regarding the steps:
3. The Capsule was synced before the upgrade.

Comment 31 Jitendra Yejare 2016-02-09 14:45:16 UTC
Alex,

I fetched the steps from your comment #3:
'Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not part of an environment path.

Other actions that also required a sync were done, such as creating a CV, adding/updating repos, and syncing a product.'
 
In my case, I published and promoted the kickstart repo before the upgrade only, and it published and promoted successfully. I don't see any failure in that.

But the only thing is that I synced capsule after upgrade.

If that is OK, I can mark it verified; otherwise, please let me know the correct steps to repro it and the expected behavior as well.

Comment 32 Alexander Braverman 2016-02-09 16:09:07 UTC
We can't reproduce the bug as the Capsule was removed. But the steps look correct.

Comment 33 Jitendra Yejare 2016-02-10 06:05:37 UTC
As per the verification steps and behavior in comment #29 and the confirmation from the reporter in comment #32, changing the state to Verified.

Comment 35 errata-xmlrpc 2016-02-15 15:51:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0174

