| Summary: | cman fails to start (fine after system is up) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Shad L. Lords <slords> | ||||
| Component: | openais | Assignee: | Jan Friesse <jfriesse> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 5.7 | CC: | aravind.parchuri, cluster-maint, edamato, sdake | ||||
| Target Milestone: | rc | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-05-30 06:23:33 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 807971 | ||||||
| Attachments: |
Description
Shad L. Lords
2011-11-15 20:01:13 UTC
Few more notes on this. I first noticed an issue which might be the same thing when I added the first IMS blade to an existing 5-node cluster. When the IMS would start (automatically), the cluster would exhibit the same behavior, where all but one node would spout off "[TOTEM] Retransmit List: xxxx". If I did a manual fence_node of the node that wasn't doing the Retransmit List, then as soon as the node was successfully fenced the cluster would recover and start working fine. I was fine with this method as long as I had a non-IMS part of the cluster. (The slowest box always seemed to be the one that wasn't throwing the retransmit message.) I would just make it so no resources were on this node and I was able to fence it easily. However, once I had the cluster just on the IMS, this doesn't work. When one of the IMS nodes reboots and automatically starts the cman services, it throws things into a tizzy. The only way I've been able to get things working is to manually start them after the box is fully up.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

This was addressed in a fix to openais: http://rhn.redhat.com/errata/RHBA-2012-0180.html

*** This bug has been marked as a duplicate of bug 781773 ***

Does it matter that this is a RHEL5 bug and was duped to Fedora 16?

This has not been fixed by RHBA-2012-0180. I've installed all updates as of today and restarted the cluster with services enabled. 2 of the 3 nodes start spitting out "[TOTEM] Retransmit List: xxxx" with the same number over and over, and the cluster hangs.

Please attach sosreports or cluster.conf and logs from the nodes.

Together with the sosreports, please attach the iptables configuration (I believe it's in the sosreport, but ...), because I believe this really looks like a problem with the network configuration.

Created attachment 576508 [details]
Requested files
I've attached a file that includes the messages log for each member of the cluster during the attempts to bring the system up yesterday. The first boot is with all the services enabled. The second one is with the services disabled and, after all 3 nodes were up, running a script that contains the following:

    [root@xen2 ~]# cat ~/bin/start_cluster
    #!/bin/bash
    service cman start
    service cmirror start
    sleep 1
    service clvmd start
    sleep 1
    service gfs start
    sleep 5
    service rgmanager start

I've run this exact same configuration on a different set of hardware (slower machines) and everything comes up as it should (see the first few comments of this bug). It was only after adding the IMS nodes to the cluster that I started seeing this issue.

Shad,
Please submit your issue through your support representative. Surprisingly, they can troubleshoot these switch configuration issues better than engineering, since they see all customer issues.
Regards
-steve

Shad, please let me know if GSS was able to solve your issue so I can eventually close the bug.

I ended up fixing this by setting NETWORK_BRIDGE_SCRIPT in /etc/sysconfig/cman to the network script I was using. My custom script was setting up 4 bridges. Once I set NETWORK_BRIDGE_SCRIPT to match my custom script, everything works as expected. This can be closed.

Shad, thanks for the good news; I hope everything will work as expected. Closing as NOTABUG.
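
For anyone hitting the same symptom, the fix described above comes down to one line in /etc/sysconfig/cman. The fragment below is only a sketch: the script name my-network-bridge is a hypothetical stand-in for the reporter's custom bridge script, which is not named in this bug. It also assumes that the cman init script expects a script name resolved under /etc/xen/scripts rather than a full path, so check the init script on your release before relying on it.

```shell
# /etc/sysconfig/cman (fragment)
# Tell the cman init script which Xen bridge script this host uses, so that it
# does not fall back to its default bridge script at boot.
# "my-network-bridge" is a hypothetical name; substitute the custom script
# actually configured for Xen networking on the node.
NETWORK_BRIDGE_SCRIPT="my-network-bridge"
```

The value should presumably match the network script already configured for Xen on that node, which is what the reporter describes as making cman start cleanly at boot.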