Description of problem:
We are still experiencing the problem described in issue #01666500 (OpenShift Origin issue #7983). Whenever a node goes down, we cannot mount Ceph PV images on other nodes until we remove the rbd locks manually. To work around this, we are wondering if it would be possible to use the 'exclusive-lock' feature of Ceph, i.e. to let Ceph handle the locks and to disable Kubernetes/OpenShift fencing.

Version-Release number of selected component (if applicable):
We are using Ceph 2.0 (Jewel) and OpenShift v3.2.1

How reproducible:
Whenever a node unexpectedly shuts down.

Steps to Reproduce:
1. Run a pod that mounts a Ceph RBD-backed PV on a node.
2. Shut that node down unexpectedly.
3. Let the pod be rescheduled onto another node.

Actual results:
The RBD image cannot be mounted on the new node until the stale rbd lock is removed manually.

Expected results:
The RBD image can be mounted on the new node without manual intervention.

Additional info:
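For reference, a minimal sketch of the manual workaround we currently apply, using the python-rbd bindings. The ceph.conf path, pool name, and image name are placeholders and not taken from this report; adjust to the actual PV.

import rados
import rbd

POOL = 'rbd'        # assumption: pool holding the PV images
IMAGE = 'pv-image'  # assumption: name of the stuck image

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        with rbd.Image(ioctx, IMAGE) as image:
            lockers = image.list_lockers()
            # list_lockers() returns an empty result when the image is
            # unlocked, otherwise a dict whose 'lockers' entry is a list
            # of (client, cookie, address) tuples.
            if lockers:
                for client, cookie, addr in lockers['lockers']:
                    print('breaking lock held by %s (%s)' % (client, addr))
                    image.break_lock(client, cookie)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()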
There is actually an upstream Kubernetes ticket [1] on this subject. The answer is: yes, you can use the RBD exclusive-lock feature, provided Kubernetes failover is guaranteed to STONITH the failed node before attempting to migrate the services. The rationale is that the exclusive-lock feature is currently cooperative, which means the lock can transparently transition back and forth between RBD clients. If the original lock owner is dead (via STONITH or another procedure), there is no risk of this lock ping-pong.

[1] https://github.com/kubernetes/kubernetes/issues/33013

We actually have an upstream PR to fix this issue, but it missed Kubernetes 1.5:
https://github.com/kubernetes/kubernetes/pull/33660
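For illustration only (not from the ticket), a small sketch of checking whether the exclusive-lock feature is enabled on an image, and enabling it if not, via the python-rbd bindings. Pool and image names are placeholders.

import rados
import rbd

POOL = 'rbd'        # assumption: pool holding the PV image
IMAGE = 'pv-image'  # assumption: image backing the PV

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        with rbd.Image(ioctx, IMAGE) as image:
            # features() returns a bitmask; exclusive-lock is part of the
            # default feature set for images created on Jewel.
            if image.features() & rbd.RBD_FEATURE_EXCLUSIVE_LOCK:
                print('exclusive-lock already enabled')
            else:
                # update_features() turns the given feature bits on (True)
                # or off (False) for an existing image.
                image.update_features(rbd.RBD_FEATURE_EXCLUSIVE_LOCK, True)
                print('exclusive-lock enabled')
    finally:
        ioctx.close()
finally:
    cluster.shutdown()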
*** This bug has been marked as a duplicate of bug 1365867 ***