Red Hat Bugzilla – Attachment 144025 Details for Bug 220211: multiple qdisk master after network outage
Description: Email describing the problem in more detail
Filename: email.txt
MIME Type: text/plain
Creator: Frederik Ferner
Created: 2006-12-19 18:28:44 UTC
Size: 4.67 KB
Hi List,

I am currently testing Redhat Cluster Suite for a number of two node clusters accessing EMC storage systems. Everything seems to be running fine except for qdisk.

On Friday we had a network problem during which the nodes were still able to see each other but none of the addresses used in my heuristics for qdisk. The result was not what I expected: when the network came back, both nodes claimed to be master.

See below the quorumd part of my cluster.conf:

<snip>
  <quorumd interval="1" tko="10" votes="3" log_level="9" log_facility="local4" status_file="/qdisk_status" min_score="3" device="/dev/emcpowerk1">
    <heuristic program="ping 172.23.4.254 -c1 -t1" score="2" interval="2"/>
    <heuristic program="ping 130.246.8.13 -c1 -t3" score="1" interval="2"/>
    <heuristic program="ping 130.246.72.21 -c1 -t3" score="1" interval="2"/>
    <heuristic program="ping 172.23.5.120 -c1 -t1" score="2" interval="2"/>
  </quorumd>
</snip>

/qdisk_status on one node while everything seems to be running fine:

<snip>
Node ID: 2
Score (current / min req. / max allowed): 6 / 3 / 6
Current state: Running
Current disk state: None
Visible Set: { 1 2 }
Master Node ID: 1
Quorate Set: { 1 2 }
</snip>

After a "/etc/init.d/qdiskd restart" I find the following in the log files (looks fine to me...):
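For illustration only (this is a hypothetical sketch, not qdiskd's actual code): with the configuration above, the effective score at any moment is the sum of the scores of the heuristics whose ping currently succeeds, and the node remains eligible for master operation only while that sum meets min_score. A minimal model of that scoring:

```python
# Illustrative model of qdisk heuristic scoring as configured above.
# Hypothetical sketch, not qdiskd's real implementation.

HEURISTICS = [
    {"program": "ping 172.23.4.254 -c1 -t1", "score": 2},
    {"program": "ping 130.246.8.13 -c1 -t3", "score": 1},
    {"program": "ping 130.246.72.21 -c1 -t3", "score": 1},
    {"program": "ping 172.23.5.120 -c1 -t1", "score": 2},
]
MIN_SCORE = 3  # min_score from the quorumd element

def current_score(passing):
    """Sum the scores of the heuristics whose ping currently succeeds."""
    return sum(h["score"] for h, ok in zip(HEURISTICS, passing) if ok)

def eligible_for_master(passing):
    return current_score(passing) >= MIN_SCORE

# All four pings succeed: 6/3, matching "Score ... 6 / 3 / 6" above.
print(current_score([True, True, True, True]))          # 6
# During the outage no ping succeeds: 0/3, so the node downgrades.
print(eligible_for_master([False, False, False, False]))  # False
```

Under this model, losing both score-2 heuristics alone (4/3) would still leave the node eligible; only the full network outage drops it below min_score.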
Dec 18 10:50:40 duoserv2 qdiskd[24304]: <info> Quorum Daemon Initializing
Dec 18 10:50:40 duoserv2 qdiskd: Starting the Quorum Disk Daemon: succeeded
Dec 18 10:50:47 duoserv2 qdiskd[24304]: <info> Node 1 is the master
Dec 18 10:50:50 duoserv2 qdiskd[24304]: <info> Initial score 6/6
Dec 18 10:50:50 duoserv2 qdiskd[24304]: <info> Initialization complete

And finally, during the network issue last week I found the following log entries:

Dec 15 09:53:48 duoserv2 qdiskd[31393]: <info> Node 1 shutdown
Dec 15 09:53:48 duoserv2 qdiskd[31393]: <notice> Score insufficient for master operation (0/3; max=6); downgrading
Dec 15 09:53:48 duoserv2 clurgmgrd[7950]: <emerg> #1: Quorum Dissolved
Dec 15 09:53:48 duoserv2 kernel: CMAN: quorum lost, blocking activity
Dec 15 09:53:48 duoserv2 ccsd[5595]: Cluster is not quorate. Refusing connection.
Dec 15 09:53:48 duoserv2 ccsd[5595]: Error while processing connect: Connection refused
Dec 15 09:53:48 duoserv2 ccsd[5595]: Invalid descriptor specified (-111).
Dec 15 09:53:48 duoserv2 ccsd[5595]: Someone may be attempting something evil.
Dec 15 09:53:48 duoserv2 ccsd[5595]: Error while processing get: Invalid request descriptor

And later, when the network came back:

Dec 15 10:31:45 duoserv2 qdiskd[31393]: <notice> Score sufficient for master operation (6/3; max=6); upgrading
Dec 15 10:31:46 duoserv2 qdiskd[31393]: <info> Assuming master role
Dec 15 10:31:47 duoserv2 kernel: CMAN: quorum regained, resuming activity
Dec 15 10:31:47 duoserv2 clurgmgrd[7950]: <notice> Quorum Achieved
Dec 15 10:31:47 duoserv2 clurgmgrd[7950]: <info> Magma Event: Membership Change
Dec 15 10:31:47 duoserv2 clurgmgrd[7950]: <info> State change: Local UP
Dec 15 10:31:47 duoserv2 clurgmgrd[7950]: <info> State change: duoserv1 UP
Dec 15 10:31:47 duoserv2 clurgmgrd[7950]: <info> Loading Service Data
Dec 15 10:31:47 duoserv2 ccsd[5595]: Cluster is quorate. Allowing connections.
Dec 15 10:31:50 duoserv2 clurgmgrd: [7950]: <info> /dev/mapper/logs1-logs1 is not mounted
Dec 15 10:31:51 duoserv2 qdiskd[31393]: <crit> Critical Error: More than one master found!
Dec 15 10:31:51 duoserv2 qdiskd[31393]: <crit> A master exists, but it's not me?!
Dec 15 10:31:52 duoserv2 qdiskd[31393]: <info> Node 1 is the master
...

At the same time on the second node:

Dec 15 10:31:45 duoserv1 qdiskd[316]: <notice> Score sufficient for master operation (5/3; max=6); upgrading
Dec 15 10:31:46 duoserv1 qdiskd[316]: <info> Assuming master role
Dec 15 10:31:47 duoserv1 kernel: CMAN: quorum regained, resuming activity
Dec 15 10:31:47 duoserv1 ccsd[5624]: Cluster is quorate. Allowing connections.
Dec 15 10:31:47 duoserv1 clurgmgrd[3631]: <notice> Quorum Achieved
Dec 15 10:31:51 duoserv1 qdiskd[316]: <crit> Critical Error: More than one master found!
Dec 15 10:31:52 duoserv1 qdiskd[316]: <info> Node 2 is the master
Dec 15 10:31:52 duoserv1 qdiskd[316]: <crit> Critical Error: More than one master found!
...

This continues until I finally notice and restart qdiskd on both nodes, after which they agree on one master again.

I have the following packages installed on both nodes:

ccs-1.0.7-0
rgmanager-1.9.54-1
lvm2-cluster-2.02.01-1.2.RHEL4
cman-1.0.11-0
cman-kernel-smp-2.6.9-43.8.5
fence-1.32.25-1
cman-kernel-smp-2.6.9-45.8

The running kernel is: 2.6.9-42.0.3.ELsmp

Does anyone have any idea what I could do to avoid this situation in the future?

If I can provide any more information, please ask.
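The logs on both nodes are consistent with a check-then-act race: each node regains a sufficient score at 10:31:45, reads the quorum disk before the other has recorded a claim, sees no master, and assumes the role itself. A toy model of that race (hypothetical illustration, not qdiskd's real election protocol):

```python
# Toy model of the check-then-act race suggested by the logs above.
# Hypothetical illustration; this is not qdiskd's real election protocol.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.is_master = False
        self.saw_master = False

    def check(self, disk):
        # Step 1: read the shared quorum disk; remember whether any
        # master claim is visible at this moment.
        self.saw_master = bool(disk)

    def act(self, disk):
        # Step 2: claim mastership if no master was seen at check time.
        if not self.saw_master:
            self.is_master = True
            disk.append(self.node_id)

disk = []                 # master claims visible on the shared quorum disk
n1, n2 = Node(1), Node(2)

# Both nodes upgrade at 10:31:45 and check concurrently, before either
# has written its claim to the disk:
n1.check(disk)
n2.check(disk)
n1.act(disk)
n2.act(disk)

print(n1.is_master, n2.is_master)  # True True -> "More than one master found!"
```

If either node's check happened after the other's act, it would see the existing claim and defer; the narrow window between upgrade and write is what both nodes hit simultaneously here.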