Description of problem:
In a three-controller environment, when the cinder-backup service moves from one controller to another, a cinder-backup service entry is left behind for the old controller host and remains in a DOWN state. This stale entry can be misleading and may be flagged by service monitoring software. The stale entry can be seen by running:

  openstack volume service list --service cinder-backup

Version-Release number of selected component (if applicable):

How reproducible:
Easy to reproduce

Steps to Reproduce:
1. Move the cinder-backup service by rebooting the node hosting it (or by using pcs to move it)
2. The cinder-backup service will start running on another controller in the cluster
3. View the cinder-backup service list; it will show a stale entry for the previous controller

Actual results:
When listing the volume services, an entry for cinder-backup is listed for both the active controller and the previously active controller (the latter shows as down).

Expected results:
Viewing the volume services should show only the active cinder-backup service.
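For illustration only, the output looks roughly like the following (hostnames and timestamps are made up for this example); the row for the previously active controller is the stale one, reporting State "down":

  +---------------+------------------------------------+------+---------+-------+----------------------------+
  | Binary        | Host                               | Zone | Status  | State | Updated At                 |
  +---------------+------------------------------------+------+---------+-------+----------------------------+
  | cinder-backup | overcloud-controller-0.localdomain | nova | enabled | down  | 2019-01-10T10:00:00.000000 |
  | cinder-backup | overcloud-controller-1.localdomain | nova | enabled | up    | 2019-01-10T10:05:00.000000 |
  +---------------+------------------------------------+------+---------+-------+----------------------------+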
The stale entry has to be removed from the database manually; cinder does not do this automatically. I believe this is by design, so that DB entries remain consistent.
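For reference, a minimal sketch of the manual cleanup (run on a node where cinder-manage and the cinder configuration are available; the host name below is an example and must match the stale entry in your environment):

  # Identify the stale host (the row with State "down")
  openstack volume service list --service cinder-backup

  # Remove the stale cinder-backup record for the old host
  cinder-manage service remove cinder-backup overcloud-controller-0.localdomain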
Closing for now; there's no engineering solution to resolve this in the short term. Scripts could be provided, but nothing within cinder proper. Please reopen if you strongly disagree.
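As a rough example of the kind of script mentioned above (illustrative only, not something shipped with cinder; it assumes the openstack CLI column names "Host" and "State", and that cinder-manage is available with access to the cinder database):

  #!/bin/bash
  # Remove cinder-backup service records whose State is "down".
  # Review the service list before running this against a real deployment.
  openstack volume service list --service cinder-backup -f value -c Host -c State \
    | while read -r host state; do
        if [ "$state" = "down" ]; then
          echo "Removing stale cinder-backup entry for $host"
          cinder-manage service remove cinder-backup "$host"
        fi
      done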
Apologies, I should have put this against rhosp-director as I agree that it's not a cinder-specific issue, but rather a consequence of rhosp director's HA implementation of cinder-backup. Thanks.
Well, I don't think this will yield what you want. The director takes care of the deployment, but it isn't involved in what happens when pacemaker starts cinder-backup on another node.

In fact, while I understand it feels wrong to see the cinder-backup service "down" on the prior node, neither cinder nor pacemaker has any basis for treating this as a stale service entry. If pacemaker were to restart the service on the original node, the service would report itself "up" again (although the other one would then report "down"). cinder-backup under pacemaker has always behaved this way, and it's more a side effect of that model than a bug.

But again, I'm not trying to diminish the fact that the behaviour is less than ideal. I think what we really need is an RFE that improves the cinder-backup service's deployment model. One approach would have the service use a common identifier (like the cinder-volume service's use of "hostgroup") instead of each node's hostname. Another would be to run cinder-backup active/active (i.e. NOT under pacemaker). Note: cinder-backup supports a/a, whereas the cinder-volume service will only have limited a/a support in OSP-15.

To summarize, I think this BZ should be recast as an RFE.
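To make the first approach a bit more concrete, a very rough sketch of the idea follows. It assumes the generic "host" option that cinder services use to report their name, applied only to the configuration read by the cinder-backup service on each controller (the value "hostgroup" mirrors what the director already does for cinder-volume); the actual mechanism and option handling would be worked out in the RFE, so treat this purely as an illustration:

  # Illustrative only: give cinder-backup the same service name on every controller,
  # so a failover re-uses the existing DB row instead of creating a new one.
  # crudini is used here just as a convenient way to edit the ini file.
  crudini --set /etc/cinder/cinder.conf DEFAULT host hostgroup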
I'm closing this BZ out in favor of an RFE I created (bug #1666804). The titles (and topics) are different enough to warrant tracking them in separate BZs.
Close-loop-wise, there is nothing to test/automate since this BZ is WONTFIX. However, Alan's RFE will include QE coverage/testing once it lands. Eventually I'll use the RFE to track the close-loop process for this customer request.
*** Bug 1899160 has been marked as a duplicate of this bug. ***