Red Hat Bugzilla – Bug 535776
RFE: switchover thread should gravitate to a server higher in the failover list, not necessarily limited to just primary
Last modified: 2015-02-01 18:29:12 EST
Consider this scenario:
There are 2 data centers - one in California, one in New York.
There are 4 Servers total - 2 in each data center (thus, 2 servers in CA, 2 servers in NY)
There are 2 Affinity Groups, one for each data center (2 servers per affinity group: AG-CA and AG-NY).
Servers in CA datacenter are called CA1 and CA2; servers in NY are called NY1 and NY2.
I have an agent running in the CA data center that is assigned the AG-CA affinity group. This agent is assigned a primary server of CA1 (the full failover list is CA1, CA2, NY1, NY2).
Now suppose both CA servers go down. The agent switches over to NY1. Then CA2 is brought back on-line, but CA1 is still down.
Affinity tells us that we'd prefer that the agent talk to a server in the CA data center. We have a CA server up (CA2). Our agent therefore should switch over and talk to CA2. However, that will not occur. Our agent will only ever switch back to the CA datacenter when the primary CA1 server comes up.
Our primary switchover thread will only switch over to the PRIMARY server. If the primary is still down, the agent continues talking to its current server, EVEN IF there is another server that is up and in our affinity group.
It would be better if the agent could gradually gravitate toward its primary server, even if that means going to a non-primary server first. The agent needs some way to know which servers it has affinity with so that, if any of them come up while the primary is still down, it can switch to them.
Can an agent belong to more than one affinity group? If so, that might complicate matters. It may simply be that the rule to follow is - if a server that is higher up in the failover list is up, switch to that server... always move to a server that is higher up in the failover list.
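To make the difference concrete, here is a minimal sketch of the two selection rules applied to the scenario above. It is illustrative only: the class and method names (FailoverSketch, primaryOnlyTarget, higherRankedTarget) are hypothetical and are not taken from the RHQ code base, and the "reachable" set stands in for whatever connectivity check the agent would really perform.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these names come from the RHQ code base.
final class FailoverSketch {

    // Behavior described in this report: the only switchover candidate
    // is the primary, i.e. the first entry in the failover list.
    static Optional<String> primaryOnlyTarget(List<String> failoverList,
                                              String current,
                                              Set<String> reachable) {
        if (failoverList.isEmpty()) {
            return Optional.empty();
        }
        String primary = failoverList.get(0);
        if (!primary.equals(current) && reachable.contains(primary)) {
            return Optional.of(primary);
        }
        return Optional.empty();
    }

    // Proposed rule: pick the first reachable server that ranks above the
    // current one in the failover list, whether or not it is the primary.
    static Optional<String> higherRankedTarget(List<String> failoverList,
                                               String current,
                                               Set<String> reachable) {
        int currentIndex = failoverList.indexOf(current);
        // Assumption: if the current server is not in the list at all,
        // every entry in the list is treated as a candidate.
        int limit = currentIndex < 0 ? failoverList.size() : currentIndex;
        for (int i = 0; i < limit; i++) {
            String candidate = failoverList.get(i);
            if (reachable.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // nothing better is up; stay put
    }

    public static void main(String[] args) {
        List<String> failoverList = List.of("CA1", "CA2", "NY1", "NY2");
        // CA1 is still down, CA2 has come back, the agent is on NY1.
        Set<String> reachable = Set.of("CA2", "NY1", "NY2");

        System.out.println(primaryOnlyTarget(failoverList, "NY1", reachable));  // Optional.empty
        System.out.println(higherRankedTarget(failoverList, "NY1", reachable)); // Optional[CA2]
    }
}
```

Because the second rule depends only on the order of the failover list, it does not need to know about affinity groups directly, which sidesteps the question of an agent belonging to more than one group. That holds only under the assumption that the failover list is already ordered by affinity.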
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2437
Mass add of keyword FutureFeature to help track.
"Our primary switchover thread will only switch over to the PRIMARY server. If
the primary is still down, the agent continues talking to its current server,
EVEN IF there is another server that is up and in our affinity group."
Agreed. Why not just walk the failover list and see if you can connect to ANY of the servers that rank higher than the one you are currently on?
Mazz, is this still the case?
(In reply to Jay Shaughnessy from comment #4)
> Mazz, is this still the case?
This RFE is still valid. Today we only attempt to switch to the primary; we don't walk up the list. If the primary is down, the agent stays on its current server.
Any work here should also consider Bug 1033790