Bug 1276911
Summary: Capsule fails to publish kickstart tree due to missing target of symlink

| Field | Value |
|---|---|
| Product | Red Hat Satellite |
| Component | Pulp |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Version | 6.1.1 |
| Reporter | Alexander Braverman <abraverm> |
| Assignee | satellite6-bugs <satellite6-bugs> |
| QA Contact | Jitendra Yejare <jyejare> |
| CC | abraverm, arahaman, avaddara, bbuckingham, bkearney, bkorren, cwelton, ddevra, dzhukous, jnikolak, jortel, jyejare, mhrivnak, mmccune, ohadlevy, oramraz, pmoravec, sauchter, yjog, ykaul |
| Target Milestone | Unspecified |
| Target Release | Unused |
| Keywords | TestBlocker, Triaged |
| Hardware | Unspecified |
| OS | Linux |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-02-15 15:51:53 UTC |
| Bug Blocks | 1338516 |
In the logs I see:

Symbolic link not allowed or link target not accessible: /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/qpid-proton-c-0.9-3.el7.x86_64.rpm

What is your disk layout? (Consider using satellite/foreman-debug for full logs.) Is there any reason why that operation would fail? Thanks

All storage is managed by Pulp. Pulp content is stored on an NFS mount:

```
10.35.160.108:/RHEV/capsule6-tlv /var/lib/pulp/content nfs defaults 1 2
```

Regarding logs: https://access.redhat.com/solutions/1177833

Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not an environment path. Other actions, which also required sync, were done, such as creation of a CV, new/updated repos, and product sync. SELinux is disabled.

Note: NFS is currently down and we can't reproduce the problem until it is back online. Thanks

The above error might indicate a permissions issue. When NFS is up again, please double-check that the pulp user can actually write to /var/www/pub/yum/http/repos/RHEVM-DEV-SLA/Library/VDSM_host/custom/EPEL_Extra_Packages_for_Enterprise_Linux/EPEL_7_x64/

We replaced the NFS storage with iSCSI. The permissions are set to apache:apache. Then I manually triggered a sync using hammer. The error was raised again with the same results, and there are also multiple errors of another sort [1]:

"RuntimeError: Will not create a symlink to a non-existent source"

There are reported bugs [2][3][4][5], but they seem to be related to pulp-server version 2.4. I didn't find any related support case.
Details about the environment:
- Satellite 6.1.1, RHEL 6.7
- Capsule: katello-service 2.2, pulp-server 2.6, RHEL 7.1
- Both Capsule and Satellite were upgraded and updated from 6.0

[1] http://pastebin.test.redhat.com/324618
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1098340
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1125388
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1093745
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1102745

Created attachment 1090029 [details]
foreman-debug
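The two errors reported so far ("Symbolic link not allowed or link target not accessible" from httpd, and "Will not create a symlink to a non-existent source" from Pulp) both come down to symlinks in the publish tree whose targets under /var/lib/pulp/content are missing, or a publish directory the web server user cannot write to. The sketch below shows one way to check for both conditions. The helper names (`find_broken_links`, `probe_dir`) are mine, not Satellite or Pulp tools; the example path is taken from the log line above.

```shell
#!/bin/sh
# find_broken_links DIR: print every symlink under DIR whose target does
# not resolve -- the condition behind both errors quoted above.
find_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# probe_dir DIR: try to create and remove a scratch file in DIR, to see
# whether the current user can write there. Run it as the publishing
# user (apache on a Satellite capsule) to mimic the distributor.
probe_dir() {
    f="$1/.pulp_write_probe.$$"
    if ( : > "$f" ) 2>/dev/null; then
        rm -f "$f" && echo "writable: $1"
    else
        echo "NOT writable: $1"
    fi
}

# Example usage (path from the log message above; run probe_dir as apache):
# find_broken_links /var/www/pub/yum/http/repos/RHEVM-DEV-SLA
# probe_dir /var/www/pub/yum/http/repos/RHEVM-DEV-SLA
```

Note that `test -e` follows the symlink, so a link whose target is gone fails the test and gets printed.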
It seems that Pulp was not able to handle the case where the content disappeared from the /var/lib/pulp/content directory. As a workaround, I did the following:

```
# before the procedure, unassociate all environments from the capsule
for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done
service mongod stop
mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb
service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db
for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule's Pulp database and resynchronized the content as on a fresh installation. The previously empty /var/lib/pulp/content started to fill with data again after that. The sync finished successfully and it looks like the kickstart was published. However, the error appeared again during the sync in /var/log/messages:

Nov 9 12:03:46 capsule-ops pulp: pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]

How does this error affect the sync process?

This is the first I've seen this bug report, so I don't have an update. But looking at it now, based on the pastebin in comment #5, it appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1285830

*** Bug 1285830 has been marked as a duplicate of this bug. ***

Lots of info in https://bugzilla.redhat.com/show_bug.cgi?id=1285830, please go check it out.

I don't have access to BZ 1285830.

The customer was able to resolve this issue on another case using the workaround.
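One way to confirm that the resync described in the workaround is actually repopulating /var/lib/pulp/content is to take a file count before and after the procedure. This is an illustrative one-liner of mine (the `content_count` name is made up), not part of the workaround:

```shell
#!/bin/sh
# content_count DIR: count regular files under DIR. Run against
# /var/lib/pulp/content before and after the resync; a growing number
# confirms the capsule is pulling content down again.
content_count() {
    find "$1" -type f | wc -l
}

# Example: content_count /var/lib/pulp/content
```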
######### Work Around 1 #################

```
# before the procedure, unassociate all environments from the capsule
for s in {pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s stop; done
service mongod stop
mv /var/lib/mongodb{,.bak}
mkdir /var/lib/mongodb
restorecon -RvvF /var/lib/mongodb
service mongod start
su - apache -s /bin/bash -c /usr/bin/pulp-manage-db
for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
# now attach the environments to the capsule and synchronize
```

This effectively cleaned the capsule's Pulp database and resynchronized the content as on a fresh installation. The previously empty /var/lib/pulp/content started to fill with data again.

######### Finish Work Around 1 ################

Do I still need to track this bug?

I think we should still track this bug to find out the cause of this issue; perhaps we need better error detection that can stop this issue from occurring.

I have reproduced this and determined the root cause. The node sync logic needs to be updated to properly handle changed content units with associated files (such as productid and prestodata).

Jeff, can you please let me know what the upstream pulp bug is, or link it to this ticket?

This is the same fix as https://bugzilla.redhat.com/show_bug.cgi?id=1288855. I am not sure if we should treat them as dupes or not, so I am going to keep this open. However, I am moving this to POST since https://bugzilla.redhat.com/show_bug.cgi?id=1288855 is already in POST.

The upstream Pulp bug 1463 is at MODIFIED.

The upstream/downstream issue association automation is failing because another bugzilla (#1288855) is already associated with issue 1463. Because of this, I need to remove 1463 from this BZ to get the upstream/downstream automation to start passing.

Currently trying to verify this bug with the following steps:

1. Setup: Satellite 6.0.8 and Capsule 6.0.8 on RHEL 6.7.
2. The Capsule is associated with the CV rhel67_cv and has the repos required to install the capsule: Capsule 6.0, RHEL base_os 6.7, and RH Common Server 6.
3. Before upgrade: the Capsule never synced with Satellite.
4. Before upgrade: I synced 'Red Hat Enterprise Linux 7 Server Kickstart x86_64 7Server' in Satellite and added this repo to a newly created CV 'rhel67_test'. Then I published and promoted this CV.
5. Then I upgraded Satellite and Capsule to the latest 6.1.7 #c1.
6. And I started a capsule sync using hammer.

What I observe:

The kickstart repo is synced without any RuntimeError in /var/log/messages. But while performing this capsule sync, an issue of multiple HTTP connections from capsule to satellite is observed, and I see that the sync is incomplete or not fully done. So I am not sure how it impacted the sync of the kickstart repo or the RuntimeError I was looking for. So please confirm: are the steps I performed to reproduce/verify this bug correct? And should the bug state be changed to 'verified'?

The original problem was that the repo wasn't available for usage, and it failed on the publish step. Regarding the steps: in step 3, the capsule was synced before the upgrade.

Alex, I fetched the steps from your comment #3: 'Right now the only clue I have for the failing operation is the fact that Satellite and Capsule were upgraded from 6.0 and the user created the lifecycle environment 'production'. The new lifecycle environment is not an environment path. Other actions, which also required sync, were done, such as creation of a CV, new/updated repos, and product sync.' In my case, I published and promoted the kickstart repo before the upgrade only, and it published and promoted successfully. I don't see any failure in that. But the only thing is that I synced the capsule after the upgrade. If that is OK, I can mark it verified; otherwise, please let me know the correct steps to reproduce it and the expected behavior as well.
We can't reproduce the bug as the Capsule was removed, but the steps look correct.

As per the verification steps and behavior in comment #29 and the confirmation from the reporter in comment #32, changing the state to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0174
Created attachment 1088319 [details]
capsule logs

Description of problem:
Capsule fails to publish the kickstart tree (/var/www/pub/yum/https/repos/RHEVM/Production/Baremetal_Slave/content/dist/rhel/server/7/7Server/x86_64/) with the error:

pulp.server.managers.repo.publish:INFO: publish failed for repo [RHEVM-Production-Baremetal_Slave] with distributor ID [RHEVM-Production-Baremetal_Slave]

Version-Release number of selected component (if applicable):

How reproducible:
Sync the capsule from the CLI: hammer capsule content synchronize --id=9

Actual results:
Error in the capsule's /var/log/messages

Expected results:
Published kickstart tree

Additional info:
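The "Actual results" above amount to scanning /var/log/messages for the publish-failure line after a capsule sync. A small wrapper can pull out the distinct repo IDs that failed to publish; `failed_publishes` is a hypothetical helper of mine, not part of Pulp or hammer, and it keys off the exact log line quoted in this report:

```shell
#!/bin/sh
# failed_publishes LOGFILE: print the distinct repo ids whose publish
# failed, based on the pulp.server.managers.repo.publish line quoted
# in this report (GNU grep assumed, for -o).
failed_publishes() {
    grep -o 'publish failed for repo \[[^]]*\]' "$1" \
        | sed 's/.*\[\(.*\)\]/\1/' \
        | sort -u
}

# Example: failed_publishes /var/log/messages
```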