Bug 1482539 - Upgrade of Satellite to 6.2.11 error on removal of qpid dat2 directory
Summary: Upgrade of Satellite to 6.2.11 error on removal of qpid dat2 directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Installation
Version: 6.2.11
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Chris Roberts
QA Contact: Sanket Jagtap
URL:
Whiteboard:
Duplicates: 1494798 (view as bug list)
Depends On:
Blocks:
 
Reported: 2017-08-17 13:45 UTC by Chris Roberts
Modified: 2022-07-09 09:22 UTC
CC List: 30 users

Fixed In Version: katello-installer-base-3.0.0.100-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1530694 (view as bug list)
Environment:
Last Closed: 2018-02-05 13:54:34 UTC
Target Upstream Version:
Embargoed:


Attachments
patch (894 bytes, patch) - 2017-11-01 19:41 UTC, Chris Roberts


Links
Red Hat Knowledge Base (Solution) 3157651 (last updated 2017-08-21 08:00:35 UTC)
Red Hat Product Errata RHSA-2018:0273 (SHIPPED_LIVE): Important: Red Hat Satellite 6 security, bug fix, and enhancement update (last updated 2018-02-08 00:35:29 UTC)

Description Chris Roberts 2017-08-17 13:45:40 UTC
Description of problem:

While upgrading to 6.2.11 I get the following error:

[ERROR 2017-08-11 20:40:22 main] rm: cannot remove ‘/var/lib/qpidd/.qpidd/qls/dat2’: Is a directory

Version-Release number of selected component (if applicable): 6.2.11


How reproducible:


Steps to Reproduce:
1. upgrade to 6.2.11
2. watch the installer log with -v

Actual results:
[ERROR 2017-08-11 20:40:22 main] rm: cannot remove ‘/var/lib/qpidd/.qpidd/qls/dat2’: Is a directory

Expected results:

Remove the directory
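
For reference, a minimal way to watch the installer output as described in step 2 of "Steps to Reproduce" is to run the installer verbosely and follow its log in a second terminal (a sketch; the log path is an assumption for the "satellite" scenario):

~~~
# satellite-installer --scenario satellite --upgrade --verbose
# tail -f /var/log/foreman-installer/satellite.log   # log path assumed for the "satellite" scenario
~~~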

Comment 5 Pavel Moravec 2017-08-19 16:14:45 UTC
Not a dupe (IMHO), but the error is just a symptom of a bigger problem.

What Satellite users usually do and how it goes wrong:

    1) Sat6 including qpidd is fully running

    2) "yum update" upgrades also qpid-cpp-server package - which has a tricky post-install script that restarts qpidd service. Since now, qpidd is already using the new directory structure. That is empty at this moment.

    3) satellite-installer --upgrade migrates data to the new structure (while qpidd is down, but too late). This is where the "rm: cannot remove .." error comes from - it is just a side-effect of this bigger problem.

    4) Depending on timing, we:
       - usually end up with 2 journal files instead of 1 for every durable queue (usually this is no problem, but I saw a customer unable to start qpidd because of that, since both journals had a unique journal sequence ID), and
       - sometimes end up with many queues missing (like 2/3 of the queues missing at one customer)


Note that we can NOT resolve this in a Satellite upgrade step (like in hooks/pre/30-upgrade.rb), since such a step requires that "yum update" has already been run, at which point qpidd is already running on the new directory structure.
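
One way to confirm the restart described in step 2 is to inspect the package scriptlets directly; a minimal sketch (nothing Satellite-specific is assumed, only that qpid-cpp-server is installed):

~~~
# rpm -q --scripts qpid-cpp-server | grep -i -B2 -A4 qpidd   # look for a service restart in the postinstall scriptlet
~~~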

I see three possible ways to resolve it:

A) somehow allow (yum) updating only with Satellite services down (or at least qpidd down), and running satellite-installer --upgrade just after it, without starting qpidd in between. Could we somehow enforce this?

B) have the qpid-cpp-server package without the post-install script / qpidd restart. Elegant, but:
- won't resolve the use case "yum update; reboot (or katello restart); satellite-installer --upgrade".
- will require the changed postinstall script in qpid-cpp-server practically forever (to allow upgrades from 6.2.10 to any release)

C) Forget about any data migration and rebuild the queues and bindings from scratch (possible steps are at the bottom of KCS 3148641). Rationale:
- no upgrade should be done while a (pulp) task is running, so all pulp queues (for workers, resource manager, celery and katello-agent) should have been empty; so almost every time *all* queues will be empty and no messages in queues will be lost; we can warn about this at the beginning of the upgrade script
- it bypasses any script issue
- this assumes "yum update" is followed by "satellite-installer --upgrade" with ideally as few operations on Satellite as possible, since until the upgrade step is run, queues can be missing, with the corresponding consequences (tasks can fail).

D) Some other solution?


I personally vote for C), where the particular procedure would be as follows (a consolidated shell sketch follows the list):

(*) running services (at least) qpidd, postgres, foreman-tasks (and httpd?)
(*) stop the tomcat service (to let the katello_event queue be emptied), qdrouterd (so it does not (re)try to create pulp.agent.* queues) and the pulp services (so they do not create pulp queues)
(*) after a while (a few seconds should be enough), stop foreman-tasks (just to be sure)
(*) stop qpidd
(*) rm -rf /var/lib/qpidd/.qpidd (or /var/lib/qpidd/* for RHEL6)
  - optionally take a backup prior to this step?
(*) start qpidd (no clients shall connect now, so the next step shall create complete and coherent queues/exchanges/bindings)
(*) follow bottom of KCS 3148641 to rebuild queues/exchanges/bindings
(*) "step migrate qpid directory is complete" \o/

(I owe a beer to anybody who finds a gotcha in the above procedure)
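
For clarity, a consolidated shell sketch of the above procedure (assuming RHEL 7 paths, systemd, and the service names used elsewhere in this bug; the remaining queue/exchange/binding rebuild, including pulp.agent.* queues, follows the bottom of KCS 3148641):

~~~
# systemctl stop tomcat qdrouterd pulp_workers pulp_resource_manager pulp_celerybeat
# sleep 10; systemctl stop foreman-tasks              # give the katello_event queue a moment to drain
# systemctl stop qpidd
# tar czf /root/qpidd-backup.tar.gz /var/lib/qpidd    # optional backup; archive path is just an example
# rm -rf /var/lib/qpidd/.qpidd                        # RHEL 7 path; on RHEL 6 use /var/lib/qpidd/* instead
# systemctl start qpidd
# qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add exchange topic event --durable
# qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue katello_event_queue --durable
~~~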

Comment 6 Ivan Necas 2017-08-21 06:54:56 UTC
Would it make sense to update the re-create queue procedure to restore all the queues, so that the installer would not need to be run?

Comment 7 Pavel Moravec 2017-08-21 08:07:29 UTC
(In reply to Ivan Necas from comment #6)
> Would it make sense to update the re-create queue procedure to restore all
> the queues, so that the installer would not need to be run?

It makes sense (specifically for the upgrade path from 6.2.10 or older).

The KCS 3148641 might create more queues than required, especially queues for goferd-less clients. So there are 2 options for creating just the required queues:

1) choose just the systems with the katello-agent package installed

2) identify what queues were there before (and after) the upgrade:

ls /var/lib/qpidd/.qpidd/qls/jrnl /var/lib/qpidd/.qpidd/qls/jrnl2 /var/lib/qpidd/qls/jrnl /var/lib/qpidd/qls/jrnl2 2> /dev/null | sort -u | grep pulp.agent

- jrnl is for the old dir before the upgrade, jrnl2 for after the upgrade (if e.g. a system registered after yum update but before satellite-installer)
- the other pair of dirs is due to the different RHEL6/RHEL7 paths
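
For option 2, a rough sketch that feeds the identified names back into qpid-config to recreate only those durable queues (assuming the Katello qpid client certificate path and broker URL used by the other qpid-config commands in this bug; the temp file path is just an example):

~~~
# ls /var/lib/qpidd/.qpidd/qls/jrnl /var/lib/qpidd/.qpidd/qls/jrnl2 /var/lib/qpidd/qls/jrnl /var/lib/qpidd/qls/jrnl2 2> /dev/null | sort -u | grep pulp.agent > /tmp/agent-queues.txt   # temp file path is just an example
# for q in $(cat /tmp/agent-queues.txt); do qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue "$q" --durable; done
~~~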


Moving the BZ to NEW since the required work (for the bigger problem) hasn't been implemented.

Comment 9 michiel.smit 2017-08-22 19:55:36 UTC
- I ran into this problem too. My sequence of events:
# yum update
(this upgraded the OS from 7.3 to 7.4, and updated Satellite packages from 6.2.10 to 6.2.11)
# satellite-installer --scenario satellite --upgrade
# systemctl reboot

- lots of errors; resolved by restarting goferd and puppet on all Content Hosts
- a few days later I decided to use the UI and upgrade a few Content Hosts: UI -> Content -> Errata -> apply RHBA-2017:2467 (Satellite Tools 6.2.11 Async Release)
this resulted in the erratum being applied successfully, but now UI -> Hosts -> Content Hosts -> host -> Errata shows no errata and the host as up to date, while it is actually at RHEL 6.8 and has a few hundred errata to be applied
- so I decided to rerun "satellite-installer --scenario satellite --upgrade" to see if this would fix the discrepancy between the UI and "yum check-update" on the Content Host, but hit this problem
- resolved by following https://access.redhat.com/solutions/3157651

Comment 14 Peter Vreman 2017-09-13 17:40:09 UTC
I have the same issue, but on a Capsule, so it is not limited to the Satellite Server.
In contrast, my Satellite Server upgrade went well.
- Satellite Server has only itself connected as client
- Satellite Capsule has 3 hosts connected (1 itself and 2 others) as client

Comment 15 Peter Vreman 2017-09-13 17:45:35 UTC
The KB https://access.redhat.com/solutions/3157651 is therefore incomplete, because some of the proposed fixes do not work on Capsules, since there is no postgres there.

Comment 16 Pavel Moravec 2017-09-13 17:53:55 UTC
(In reply to Peter Vreman from comment #15)
> The KB https://access.redhat.com/solutions/3157651 is therefor incomplete,
> because some of the proposed fixes do not work on Capsules because there is
> no postgres

As far as I am aware, the qpid-cpp packages of version 0.34 can / should be updated on the Satellite only, since only there can the memory leak they fix appear. So there shouldn't be a need to update the Capsule to 0.34 now (there will be in 6.3, I think).

qpidd on a Capsule has far fewer queues, basically only those that pulp requires, and pulp re-creates them after the relevant services restart. So even if you update the qpid-cpp packages on a Capsule to 0.34 and hit some error message, it can be ignored (unless it stops the installer), since any missing queue will be recreated automatically.

Technically, to clear some trash, it makes sense to "rm -rf /var/lib/qpidd/* /var/lib/qpidd/.*" before the upgrade (assuming no pending pulp tasks).

Comment 17 Peter Vreman 2017-09-13 18:45:44 UTC
Correct, the qpid-cpp packages are not installed on the Capsule:

[crash/LI] root@li-lc-1589:~# rpm -q qpid-cpp
package qpid-cpp is not installed

In the end I ended up executing the following, based on https://access.redhat.com/solutions/3157651:
=========
katello-service stop
rm -rf /var/lib/qpidd/.qpidd /var/lib/qpidd/*
service qpidd start

qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add exchange topic event --durable
qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue katello_event_queue --durable
for key in compliance.created entitlement.created entitlement.deleted pool.created pool.deleted; do
    qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 bind event katello_event_queue $key
done

for i in pulp_resource_manager pulp_workers pulp_celerybeat; do service $i restart; done

katello-service restart
=========

After this the Capsule was working again.
Before that, the Capsule was not syncing because pulp-manage-db was not run during the upgrade. This resulted for me in broken repos, which in turn made all yum commands on the Capsule fail.

So the issue is really tricky to repair once you have a self-registered Satellite or Capsules.

Comment 18 Satellite Program 2017-10-14 06:07:26 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/20594 has been resolved.

Comment 21 Chris Roberts 2017-11-01 19:41:34 UTC
Created attachment 1346710 [details]
patch

To apply the patch do the following:

Download the patch and move it to the /usr/share/katello-installer-base directory

~~~
# mv 121.patch /usr/share/katello-installer-base
# cd /usr/share/katello-installer-base
# patch -p1 < 121.patch
~~~

Now complete the upgrade to 6.2.12

~~~
# satellite-installer --scenario satellite --upgrade
~~~

Upgrade Step: upgrade_qpid_paths...
[ INFO 2017-11-01 15:40:34 verbose] Upgrade Step: upgrade_qpid_paths...
[ INFO 2017-11-01 15:40:34 verbose] Qpid directory upgrade is already complete, skipping
Upgrade Step: migrate_pulp...

Comment 27 Chris Roberts 2017-11-17 15:33:15 UTC
*** Bug 1494798 has been marked as a duplicate of this bug. ***

Comment 30 Ivan Necas 2018-01-11 11:25:42 UTC
Ideally both cases should be checked. As far as I remember, this was reproduced when actually stopping the services prior to the upgrade.

Comment 31 Sanket Jagtap 2018-01-15 17:06:49 UTC
Build : Satellite 6.2.14 snap 1


Upgraded 6.1.z to 6.2.14 
katello-service stop
yum update -y
satellite-installer --scenario satellite --upgrade

Upgraded 6.2.z to 6.2.14 
yum update -y 
satellite-installer --scenario satellite --upgrade

No issues were discovered on either upgrade path.
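
For completeness, a quick way to confirm the original error is absent from the installer log (a sketch; the log path is an assumption for the "satellite" scenario):

~~~
# grep -i 'cannot remove' /var/log/foreman-installer/satellite.log   # log path assumed for the "satellite" scenario
~~~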

After the upgrade I also checked the queues:

[root@hp-dl380pgen8-01 ~]# qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b "amqps://localhost:5671" -q
Queues
  queue                                                                                 dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =====================================================================================================================================================================
  00012012-7a98-4d7c-a222-48fe51b7703b:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  0515af28-08e9-4b9c-a9b8-1f35529d3d43:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  0e0304d4-1c72-495b-a68c-0a720f008425:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  14cbd114-af14-4b4e-b726-75cfd41f9140:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  14cbd114-af14-4b4e-b726-75cfd41f9140:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  18200480-6416-4ccc-bbcd-e66762e2e425:1.0                                                   Y        Y        0     8      8       0   4.93k    4.93k        1     2
  18200480-6416-4ccc-bbcd-e66762e2e425:2.0                                                   Y        Y        0     4      4       0   2.53k    2.53k        1     2
  18ec18c2-aca2-407c-8ca0-264c054fb558:0.0                                                   Y        Y        0     0      0       0      0        0         1     2
  303bd0e6-3419-4d79-927e-d82a2278ed19:1.0                                                   Y        Y        0     4      4       0   2.42k    2.42k        1     2
  3535f09a-d57e-43dc-b2e3-9fc97d776254:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  3865a63a-936c-4ce6-b02c-4647e71f29bc:1.0                                                   Y        Y        0     4      4       0   2.46k    2.46k        1     2
  3b8d1175-5fdb-4a5d-a2ca-93684c7179cc:1.0                                                   Y        Y        0     4      4       0   2.46k    2.46k        1     2
  3ec15536-255f-40c1-8aef-53d5b1ac1ea1:1.0                                                   Y        Y        0     4      4       0   2.46k    2.46k        1     2
  47c96a84-498d-45af-bb15-88be107a8d15:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  47c96a84-498d-45af-bb15-88be107a8d15:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  4fc14b36-e467-4d8d-b649-10dfc9fb705d:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  5b25d318-7130-4c53-beef-ed5d65324370:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  5b25d318-7130-4c53-beef-ed5d65324370:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  5dbd2a51-e600-41ec-9525-529e6f8002dd:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  621a5d09-8dff-42b5-8a4c-28c4079fb91f:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  621a5d09-8dff-42b5-8a4c-28c4079fb91f:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  633fe998-2a78-49e0-8851-425f44aa8ff8:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  633fe998-2a78-49e0-8851-425f44aa8ff8:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  7386dc0a-4a6e-4aa0-9451-4ac33e224fad:1.0                                                   Y        Y        0     4      4       0   2.42k    2.42k        1     2
  7e18abab-3cbb-43c8-a2bd-bc1afcd1e659:1.0                                                   Y        Y        0     0      0       0      0        0         1     2
  808c6a43-e747-43f7-b5a3-12c6ceb2117c:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  808c6a43-e747-43f7-b5a3-12c6ceb2117c:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  961ce4f5-925a-40b7-8ab8-499acbbc7c89:1.0                                                   Y        Y        0     4      4       0   2.42k    2.42k        1     2
  b09a58e5-2e3c-4880-a92d-b7864ec94e94:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  b09a58e5-2e3c-4880-a92d-b7864ec94e94:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  b2463185-1a5c-49cc-b751-471a60e41c98:1.0                                                   Y        Y        0     8      8       0   4.91k    4.91k        1     2
  b2463185-1a5c-49cc-b751-471a60e41c98:2.0                                                   Y        Y        0     4      4       0   2.55k    2.55k        1     2
  celery                                                                                Y                      0    20     20       0   16.6k    16.6k        8     2
  celeryev.69726f5d-9fe1-452d-9397-a8ce0556b8de                                              Y                 0  2.06k  2.06k      0   1.82m    1.82m        1     2
  d0588b3e-a0c5-4ad8-9975-e0ad96c0daff:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  db3265db-db03-43ce-bee8-6e4f745511bc:1.0                                                   Y        Y        0     5      5       0   2.67k    2.67k        1     2
  dd397b1b-08ca-458b-bc87-ab913b76ac8a:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  de2ed33e-8be3-4545-a732-d499d156b9eb:1.0                                                   Y        Y        0     4      4       0   2.42k    2.42k        1     2
  fce33adc-7ce3-45f9-8ddf-6af84b109d8b:1.0                                                   Y        Y        0     2      2       0    486      486         1     2
  katello_event_queue                                                                   Y                      0     0      0       0      0        0         1     6
  pulp.agent.78fc319c-8993-4c75-965e-e1d151b59287                                       Y                      0     1      1       0    661      661         1     1
  pulp.task                                                                             Y                      0     3      3       0   1.36k    1.36k        3     1
  reserved_resource_worker-0.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-0             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-1.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-1             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-2.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-2             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-3.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-3             Y    Y                 0     6      6       0   6.79k    6.79k        1     2
  reserved_resource_worker-4.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-4             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-5.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-5             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-6.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-6             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-7.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-7             Y    Y                 0     0      0       0      0        0         1     2
  resource_manager                                                                      Y                      0     3      3       0   4.07k    4.07k        1     2
  resource_manager.pidbox                 Y                 0     0      0       0      0        0         1     2
  resource_manager                       Y    Y                 0     0      0       0      0        0         1     2
[root@hp-dl380pgen8-01 ~]# qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b "amqps://localhost:5671" exchanges
Type      Exchange Name       Attributes
==================================================
direct                        --replicate=none
direct    C.dq                --durable
direct    amq.direct          --durable --replicate=none
fanout    amq.fanout          --durable --replicate=none
headers   amq.match           --durable --replicate=none
topic     amq.topic           --durable --replicate=none
direct    celery              --durable
fanout    celery.pidbox      
topic     celeryev            --durable
topic     event               --durable
direct    qmf.default.direct  --replicate=none
topic     qmf.default.topic   --replicate=none
topic     qpid.management     --replicate=none
direct    resource_manager    --durable

Is there anything else that needs to be verified on the box?

Comment 32 Chris Roberts 2018-01-15 17:13:11 UTC
Looks good to me. If you didn't see the "cannot remove directory" message on upgrade, then this bug is fixed.

Comment 33 Sanket Jagtap 2018-01-15 17:14:23 UTC
Marking as verified

Comment 36 errata-xmlrpc 2018-02-05 13:54:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0273

