Bug 535776 (RHQ-2437) - RFE: switchover thread should gravitate to a server higher in the failover list, not necessarily limited to just primary
Summary: RFE: switchover thread should gravitate to a server higher in the failover li...
Keywords:
Status: NEW
Alias: RHQ-2437
Product: RHQ Project
Classification: Other
Component: Agent, High Availability
Version: unspecified
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: John Mazzitelli
QA Contact:
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-19 04:33 UTC by John Mazzitelli
Modified: 2024-03-04 13:35 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description John Mazzitelli 2009-09-19 04:33:00 UTC
Consider this scenario:

There are 2 data centers - one in California, one in New York.
There are 4 Servers total - 2 in each data center (thus, 2 servers in CA, 2 servers in NY)
There are 2 Affinity Groups, one for each data center (2 servers per affinity group: AG-CA and AG-NY).

Servers in CA datacenter are called CA1 and CA2; servers in NY are called NY1 and NY2.

I have an agent running in the CA data center that is assigned to the AG-CA affinity group. This agent is assigned a primary server of CA1 (the full failover list is CA1, CA2, NY1, NY2).

Now suppose both CA servers go down. Agent switches over to NY1. Now, CA2 is brought on-line, but CA1 is still down.

Affinity tells us that we'd prefer that the agent talk to a server in the CA data center. We have a CA server up (CA2). Our agent therefore should switch over and talk to CA2. However, that will not occur. Our agent will only ever switch back to the CA datacenter when the primary CA1 server comes up.

Our primary switchover thread will only switch over to the PRIMARY server. If the primary is still down, the agent continues talking to its current server, EVEN IF there is another server that is up and in our affinity group.

It would be better if the agent could gradually gravitate toward its primary server, even if that means going to a non-primary server first. The agent needs some way to know which servers it has affinity with and, if any of them come up while the primary is still down, to switch to them.

Can an agent belong to more than one affinity group? If so, that might complicate matters. It may simply be that the rule to follow is - if a server that is higher up in the failover list is up, switch to that server... always move to a server that is higher up in the failover list.
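
To make the difference concrete, here is a minimal sketch that contrasts the primary-only check described above with the "always move higher up the failover list" rule. It is an illustration under assumptions, not the agent's actual switchover code: the class FailoverGravity, the methods primaryOnly and highestAvailable, and the isUp predicate are all hypothetical names, and server availability is modeled as a simple in-memory predicate rather than a real connection attempt.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from the RHQ agent code.
final class FailoverGravity {

    // Behavior described in this report: switch only when the primary
    // (the first entry in the failover list) is reachable.
    public static Optional<String> primaryOnly(List<String> failoverList, String current,
                                               Predicate<String> isUp) {
        if (failoverList.isEmpty()) {
            return Optional.empty();
        }
        String primary = failoverList.get(0);
        if (!primary.equals(current) && isUp.test(primary)) {
            return Optional.of(primary);
        }
        return Optional.empty(); // stay on the current server
    }

    // Proposed rule: pick the first reachable server that ranks above the
    // current one in the failover list, whether or not it is the primary.
    public static Optional<String> highestAvailable(List<String> failoverList, String current,
                                                    Predicate<String> isUp) {
        int currentIndex = failoverList.indexOf(current);
        // If the current server is not in the list, every entry is a candidate.
        int limit = (currentIndex < 0) ? failoverList.size() : currentIndex;
        for (int i = 0; i < limit; i++) {
            String candidate = failoverList.get(i);
            if (isUp.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // nothing better is up; stay put
    }

    public static void main(String[] args) {
        List<String> failoverList = List.of("CA1", "CA2", "NY1", "NY2");
        Set<String> up = Set.of("CA2", "NY1", "NY2"); // CA1 is still down

        // Agent is currently talking to NY1.
        System.out.println(primaryOnly(failoverList, "NY1", up::contains));      // Optional.empty
        System.out.println(highestAvailable(failoverList, "NY1", up::contains)); // Optional[CA2]
    }
}
```

With this rule the agent in the scenario moves from NY1 to CA2 as soon as CA2 is back, and a later pass moves it from CA2 to CA1 once the primary returns. Because the failover list already orders servers by affinity, the sketch does not need to know about affinity groups at all.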


Comment 1 Red Hat Bugzilla 2009-11-10 21:04:26 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2437


Comment 2 wes hayutin 2010-02-16 17:08:54 UTC
mass add of key word FutureFeature to help track

Comment 3 Joseph Marques 2010-10-05 16:51:14 UTC
"Our primary switchover thread will only switch over to the PRIMARY server. If
the primary is still down, the agent continues talking to its current server,
EVEN IF there is another server that is up and in our affinity group."

agreed, why not just go down the failover list and see if you can connect to ANY of the ones that are higher than where you currently are?

Comment 4 Jay Shaughnessy 2014-05-09 15:55:29 UTC
Mazz, is this still the case?

Comment 5 John Mazzitelli 2014-05-09 18:26:23 UTC
(In reply to Jay Shaughnessy from comment #4)
> Mazz, is this still the case?

This RFE is still valid. Today we only attempt to switch to the primary; we don't walk up the list. If the primary is down, the agent stays on its current server.

Comment 6 Jay Shaughnessy 2014-05-09 19:44:18 UTC
Any work here should also consider Bug 1033790

