| Summary: | RH HA + HA Scheduler - 2 schedulers in pool after node with scheduler died | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej> |
| Component: | condor | Assignee: | grid-maint-list <grid-maint-list> |
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 2.1 | CC: | matt, trusnak, tstclair |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-26 20:23:33 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.
Created attachment 545715 [details]
configuration of cluster nodes

Description of problem:
There are two schedulers (one live and one dead) in the pool after the node running the scheduler is killed.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.8.el6

How reproducible:
100%

Steps to Reproduce:
1. Set up the HA scheduler with RH HA so that the scheduler runs on one node of the cluster.
2. Do this for the node running the scheduler:
   virsh destroy
   (for a cluster whose nodes run under KVM virtualization), or
   iptables --flush
   iptables -P INPUT DROP
   iptables -P FORWARD DROP
   iptables -P OUTPUT DROP
   (for a cluster of physical machines).
   In this case this was run for the _202_ node.
3. The cluster moves the service to another node.
4. Check how many schedulers are in the pool:
   $ condor_status -schedd -l | grep -i machine
   Machine = "_202_" <- this machine has been destroyed and should not be there
   Machine = "_205_" <- this is the currently running scheduler

   $ condor_q -name ha_schedd@
   -- Schedd: ha_schedd@ : <_205_:52784>
   ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
   0 jobs; 0 idle, 0 running, 0 held
   <waiting for timeout of query on dead _202_ node>
   ...
   -- Failed to fetch ads from: <_202_:54520> : _202_
   CEDAR:6001:Failed to connect to <_202_:54520>

Actual results:
The dead scheduler is still listed in the pool in condor_status output, and condor_q also tries to query it.

Expected results:
There is just one live scheduler in the pool.

Additional info:
This test case works for an HA scheduler configuration that does not use RH HA.
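The check in step 4 can be scripted so that the failure is detected automatically. The following is only a minimal sketch: the function name count_schedd_machines is hypothetical, and it assumes that each scheduler ad printed by condor_status -schedd -l contains exactly one line of the form Machine = "...", as in the output quoted in this report.

```shell
# Hypothetical helper: count the distinct Machine values found in
# `condor_status -schedd -l` output supplied on stdin.
# Assumption: every schedd ad has one line starting with 'Machine = '.
count_schedd_machines() {
    grep -i '^Machine = ' | sort -u | wc -l | tr -d ' '
}
```

Usage after the failover would be along the lines of `condor_status -schedd -l | count_schedd_machines`; any result other than 1 means that more than one scheduler is still advertised in the pool.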