Bug 1786975
| Summary: | pcs booth ticket remove doesn't clean up ticket in the CIB [RHEL 8] | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> | |
| Component: | pcs | Assignee: | Tomas Jelinek <tojeline> | |
| Status: | NEW --- | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.1 | CC: | cluster-maint, idevat, jobaker, jss, mlisik, mpospisi, nhostako, omular, sbradley, tojeline | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Enhancement | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1786976 | Environment: | ||
| Last Closed: | | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1786976 | |||
The customer who reported this issue stated that it causes two problems:
1) Resources that are constrained to run where the ticket is granted can still run after `pcs booth ticket remove` and a restart of the booth daemons. The CIB still shows the ticket as granted after it has been removed from booth, so the ticket constraint remains satisfied for those resources. In their words:
~~~
If i have a service with a constraint that depends on a ticket, and I destroy that ticket, I'd expect the service to stop.
And if I destroy a ticket, I'd expect it to be removed from CIB too.
But, nope. Service continues to run, ticket remains in CIB. It is merely removed from the booth.conf. That's all.
~~~
2) The customer is trying to use a "geoattribute" called ACTIVATE. When they create the ticket with ACTIVATE="1", remove it via booth, and then recreate it WITHOUT setting ACTIVATE="1", the ticket in the CIB still has ACTIVATE="1" because the old ticket state was never removed. In their words:
~~~
I am toying with the idea of using a geoattribute to also control whether the tickets are sufficient to activate a service. I have a before-acquire-handler that checks for the ACTIVATE attribute, and refuses to grant the ticket, if the attribute is unset.
So if I destroy a ticket that has ACTIVATE set, and then re-create it, WITHOUT setting ACTIVATE, then, ACTIVATE should not be set.
But, in fact, it is set, because it has been inherited from the old ticket I destroyed/removed, which of course wasn't properly removed from the CIB.
~~~
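The ACTIVATE problem can be illustrated with a small sketch of a handler along the lines the customer describes. It is hypothetical, not the customer's script, and rests on assumptions that this report does not state: that booth passes the ticket name to the handler in the BOOTH_TICKET environment variable, that a non-zero exit status makes booth refuse the grant, and that ACTIVATE is stored as an attribute of the ticket's `<ticket_state>` element in the CIB. Attributes are extracted with grep/sed text matching, which is only good enough for a sketch.

```shell
#!/bin/sh
# Hypothetical before-acquire-handler sketch (not the customer's actual script).
# Assumptions: booth exports the ticket name as BOOTH_TICKET, a non-zero exit
# status makes booth refuse the grant, and ACTIVATE is an attribute of the
# ticket's <ticket_state> element in the CIB.

# Print attribute $2 of <ticket_state id="$1" ...> from CIB XML on stdin.
# Crude text matching: expects id to be the first attribute, as in the
# `pcs cluster cib` output shown in this report.
ticket_attr() {
    grep -o "<ticket_state id=\"$1\"[^>]*>" |
        grep -o " $2=\"[^\"]*\"" |
        sed 's/^ [^=]*="\(.*\)"$/\1/'
}

# Succeed only if ACTIVATE="1" is present on the ticket's CIB state.
check_activate() {
    [ "$(ticket_attr "$1" ACTIVATE)" = "1" ]
}

# Hypothetical wiring as the handler body:
#   pcs cluster cib | check_activate "$BOOTH_TICKET" || exit 1
```

Fed the leftover state of a removed ticket that still carries ACTIVATE="1", `check_activate` succeeds even though the recreated ticket was never given ACTIVATE. Once that state has been removed (for example with `crm_ticket --cleanup`), the attribute is gone and the check fails, which is what the customer expects.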
Description of problem:
Running `pcs booth ticket remove <ticket>`, with or without revoking the ticket beforehand, does not remove the ticket from the CIB, even *after* the booth daemon has been restarted on all sites and on the arbitrator.

Booth environment:
- Cluster 1: fastvm-rhel-8-0-2{3,4}
- Cluster 2: fastvm-rhel-8-0-3{3,4}
- Arbitrator: fastvm-rhel-8-0-52

The following helper function, booth_sync(), is used below to propagate the booth configuration:
~~~
booth_sync() {
    SYNC="pcs booth sync"
    PULL="pcs booth pull"
    LHOST=fastvm-rhel-8-0-23

    # Push the booth config from this node to the other nodes of cluster 1
    $SYNC
    # The arbitrator pulls the config from this node
    ssh fastvm-rhel-8-0-52 "$PULL $LHOST"
    # Cluster 2 pulls from this node, then pushes to its own nodes
    ssh fastvm-rhel-8-0-33 "$PULL $LHOST && $SYNC"
}
~~~

~~~
[root@fastvm-rhel-8-0-23 ~]# pcs booth ticket add apacheticket
[root@fastvm-rhel-8-0-23 ~]# booth_sync
... Omitting output ...
[root@fastvm-rhel-8-0-23 ~]# pcs resource restart booth-booth-service
booth-booth-service successfully restarted
[root@fastvm-rhel-8-0-33 ~]# pcs resource restart booth-booth-service
booth-booth-service successfully restarted
[root@fastvm-rhel-8-0-52 ~]# systemctl restart booth
[root@fastvm-rhel-8-0-23 ~]# pcs cluster cib | grep ticket
<tickets>
<ticket_state id="apacheticket2" owner="0" expires="1577666394" term="0" granted="false"/>
</tickets>
[root@fastvm-rhel-8-0-23 ~]# pcs booth ticket grant apacheticket
[root@fastvm-rhel-8-0-23 ~]# pcs cluster cib | grep ticket
<tickets>
<ticket_state id="apacheticket2" owner="0" expires="1577666394" term="0" granted="false"/>
<ticket_state id="apacheticket" owner="1950506022" expires="1577677282" term="0" granted="true" last-granted="1577676682"/>
</tickets>
[root@fastvm-rhel-8-0-23 ~]# pcs booth ticket revoke apacheticket
[root@fastvm-rhel-8-0-23 ~]# pcs cluster cib | grep ticket
<tickets>
<ticket_state id="apacheticket2" owner="0" expires="1577666394" term="0" granted="false"/>
<ticket_state id="apacheticket" owner="-1" expires="1577676713" term="0" granted="false" last-granted="1577676682"/>
</tickets>
[root@fastvm-rhel-8-0-23 ~]# pcs booth ticket remove apacheticket
[root@fastvm-rhel-8-0-23 ~]# booth_sync
... Omitting output ...
[root@fastvm-rhel-8-0-23 ~]# pcs resource restart booth-booth-service
booth-booth-service successfully restarted
[root@fastvm-rhel-8-0-33 ~]# pcs resource restart booth-booth-service
booth-booth-service successfully restarted
[root@fastvm-rhel-8-0-52 ~]# systemctl restart booth
[root@fastvm-rhel-8-0-23 ~]# booth list
ticket: apacheticket2, leader: NONE
ticket: apacheticket3, leader: NONE
#
# Ticket is still in CIB after removal and restart of booth daemons
[root@fastvm-rhel-8-0-23 ~]# pcs cluster cib | grep ticket
<tickets>
<ticket_state id="apacheticket2" owner="0" expires="1577666395" term="0" granted="false"/>
<ticket_state id="apacheticket" owner="-1" expires="1577676713" term="0" granted="false" last-granted="1577676682"/>
</tickets>
#
# crm_ticket --cleanup command removes it from the CIB
[root@fastvm-rhel-8-0-23 ~]# crm_ticket --ticket apacheticket --cleanup
Cleaned up apacheticket
[root@fastvm-rhel-8-0-23 ~]# pcs cluster cib | grep ticket
<tickets>
<ticket_state id="apacheticket2" owner="0" expires="1577666395" term="0" granted="false"/>
</tickets>
~~~

-----

Version-Release number of selected component (if applicable):
- booth-core-1.0-5.f2d38ce.git.el8.x86_64
- booth-site-1.0-5.f2d38ce.git.el8.noarch
- pcs-0.10.1-4.el8_0.4.x86_64

-----

How reproducible:
Always

-----

Steps to Reproduce:
See description. Basically:
1. Optionally, run `pcs booth ticket revoke <ticket>`. (It seems the ticket has to be revoked before a later `crm_ticket --cleanup` will work.)
2. Run `pcs booth ticket remove <ticket>`.
3. Sync the booth configuration (for example with booth_sync) and restart the booth daemon on all sites and on the arbitrator.
4. Run `pcs cluster cib | grep ticket`.

-----

Actual results:
The removed ticket is still in the CIB.

-----

Expected results:
The removed ticket is no longer in the CIB.

-----

Additional info:
CC'd poki for input on the booth side of things. I filed the BZ against pcs because we may be able to fix this (if it's considered a bug) without having to make changes within booth.
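The comparison done by eye at the end of the reproduction (ticket names from `booth list` versus `<ticket_state>` elements in `pcs cluster cib`) can be scripted as a rough detector for leftover ticket state. This is a hypothetical helper, not part of pcs or booth. It assumes the output formats shown above (one `ticket: NAME, ...` line per ticket from `booth list`, and `id` as the first attribute of each `<ticket_state>`) and matches XML with grep/sed, so treat it as a sketch rather than a robust parser.

```shell
#!/bin/sh
# Hypothetical check (not part of pcs or booth): print tickets that still have
# a <ticket_state> in the CIB but no longer appear in `booth list` output.
#   $1  CIB XML, e.g. "$(pcs cluster cib)"
#   $2  booth ticket list, e.g. "$(booth list)"
# Ticket names are used inside a grep regex, so names containing regex
# metacharacters are not handled.
stale_cib_tickets() {
    local cib_xml="$1" booth_out="$2"
    printf '%s\n' "$cib_xml" |
        grep -o '<ticket_state id="[^"]*"' |
        sed 's/^<ticket_state id="//; s/"$//' |
        while IFS= read -r t; do
            # booth list prints lines such as "ticket: apacheticket2, leader: NONE"
            printf '%s\n' "$booth_out" | grep -q "^ticket: $t," ||
                printf '%s\n' "$t"
        done
}
```

For the outputs captured in the description, `stale_cib_tickets "$(pcs cluster cib)" "$(booth list)"` would print `apacheticket`: the ticket removed from booth.conf whose state is still in the CIB. A ticket that is configured in booth but has no CIB state (apacheticket3 here) is not reported, because the function only walks the CIB side.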
Adding two steps to the `pcs booth ticket remove` subcommand, (1) revoking the ticket and (2) running `crm_ticket --cleanup`, might take care of this, provided it does not have unintended consequences.

I'm not sure exactly what the ideal behavior for `pcs booth ticket remove` would be. Should it leave the ticket in the CIB? Should it revoke the ticket and remove the ticket state, while leaving any applicable constraints in place? Should it revoke the ticket and remove the ticket state after automatically removing any applicable constraints? Or something else?

How proactive or sweeping `pcs booth ticket remove` should be is up for discussion. I think all of the desired behavior can be obtained manually: remove the constraints and revoke the ticket beforehand, then remove the ticket and run `crm_ticket --cleanup`. However, it seems to me (and to the customer who reported this) that the most intuitive behavior is for `pcs booth ticket remove` to remove the ticket from the cluster completely.
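The manual sequence described above (reviewing constraints, revoking, removing, then cleaning up the CIB state) can be collected into one shell function, as a rough approximation of what a more thorough `pcs booth ticket remove` might do. This is a hypothetical helper, not pcs behavior: DRY_RUN and SYNC_CMD are invented for the sketch, and the ordering (propagating the removal and restarting booth before the cleanup) simply mirrors the reproduction in the description rather than a verified requirement. It only lists constraints instead of deleting them, since which constraints to drop is one of the open questions above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): chain the manual steps from this report.
#   $1        ticket name
#   DRY_RUN   if non-empty, print each command instead of running it
#   SYNC_CMD  optional, assumed command that propagates booth.conf and restarts
#             booth on all sites and the arbitrator (e.g. booth_sync + restarts)
purge_booth_ticket() {
    local ticket="$1"
    local run="${DRY_RUN:+echo}"

    # Show rsc_ticket constraints that still reference the ticket, so they
    # can be reviewed and, if desired, removed by hand.
    $run crm_ticket --ticket "$ticket" --constraints

    # Revoke first: the reporter found cleanup seems to need a revoked ticket.
    # A failure here (e.g. already revoked) is reported but not fatal.
    $run pcs booth ticket revoke "$ticket" ||
        echo "warning: revoke of $ticket failed, continuing" >&2
    $run pcs booth ticket remove "$ticket" || return 1

    # Optional propagation/restart step, split into words on purpose.
    if [ -n "${SYNC_CMD:-}" ]; then
        $run $SYNC_CMD || return 1
    fi

    # Drop the leftover <ticket_state> from the local cluster's CIB.
    $run crm_ticket --ticket "$ticket" --cleanup
}
```

With DRY_RUN set, `purge_booth_ticket apacheticket` only prints the four commands it would run. Because each cluster has its own CIB, the final cleanup presumably has to be run at every site, not just the one where the ticket was removed.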