Red Hat Bugzilla – Bug 1479435
[RFE] KUBE_PING does not separate clusters during Rolling Upgrade
Last modified: 2017-08-23 02:23:07 EDT
> 3. What is the nature and description of the request?
We have discovered the following bug / misbehavior in the KUBE_PING protocol of the JBoss EAP docker image.
We are running the following docker image in production : https://access.redhat.com/containers/?tab=overview#/registry.access.redhat.com/jboss-eap-6/eap64-openshift
We recently tried to upgrade the deploymentconfig from version
We saw many errors like :
12474271 --> 2017/04/26 21:18:32.000686 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 48) JOIN(ahp-adminui-10-ddbe8/web) sent to ahp-adminui-9-fl7lm/web timed out (after 3000 ms), on try 202
During the rolling upgrade phase, the pod with prefix ahp-adminui-10 tried to join 2 pods with prefix ahp-adminui-9 from another cluster.
This RFE is to avoid this misbehavior of KUBE_PING during a Rolling upgrade
> 4. Why does the customer need this? (List the business requirements here)
Currently the customer recreates the deployment config or uses undeploy/deploy option to tackle this issue but both result in downtime (which is not acceptable for our SLA commitment in the long term).
Also another option, which is using a different template for each cluster, could be implemented but it has some efforts as the customer deploys preconstructed "json/yaml" objects (routes, svc, dc,...) in a static way
> 5. How would the customer like to achieve this? (List the functional requirements here)
We request, as solution to this issue, that the KUBE_PING protocol contains a variable like CLUSTER_CREATION_ONLY_FOR_POD_SIBLINGS=true, avoiding the above behavior (as during a rolling upgrade of Openshift we can not assure that no serialized objects in the cache have been changed in the newer version)
During a rolling upgrade we have - for the time of upgrade - a pod <dc-name>-1-XXXXX and a new pod <dc-name>-2-YYYYY that is started.
Both names will be in the list retrieved by the KUBE_PING implementation.
But when OPENSHIFT_KUBE_PING_ONLY_POD_SIBLINGS=true would only allow a new pod <dc-name>-2-ZZZZZ to retrieve all pod names with prefix <dc-name>-2-*
In other words, we could use the incrementing deployment config number as discriminant for joining the cluster.
> 6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
I think it will be easy to check that, during the rolling upgrade phase, a pod with prefix say ahp-adminui-10 will not join other pods with prefix ahp-adminui-9 from another cluster.
> 10. List any affected packages or components.
The KUBE_PING JGroups protocol