Bug 159783
| Summary: | dlm module not getting loaded, causing clvmd to not start | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2-cluster | Assignee: | Abhijith Das <adas> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 4 | CC: | agk, ccaulfie, jbrassow, mbroz |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHBA-2006-0556 | Doc Type: | Enhancement |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2006-08-10 21:32:11 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 157094 | ||
| Bug Blocks: | |||
Description
Corey Marthaler
2005-06-07 22:12:58 UTC
What's in /proc/misc & /dev/misc? Either the dlm.ko module isn't loaded or the /dev/misc/dlm-control device hasn't been created. libdlm is supposed to create this on demand, but it may not have been able to for some reason I can't see at the moment.

---

I saw this again after hitting 159877. The dlm module did not get loaded like it should have. I loaded it by hand and then it worked just fine. There was nothing in /dev/misc, and /proc/misc contained:

    [root@link-01 ~]# cat /proc/misc
    183 hw_random
     63 device-mapper
    135 rtc
    227 mcelog

So this needs reassigning to whoever is responsible for the init scripts or the GUI (not sure which)?

---

clvmd will not load the dlm module; that is cman's job. Looking at the output you've provided in bug #159877, cman failed to start. That will prevent dlm.ko from getting loaded. Without cman properly working, how do you expect clvmd to work, even with dlm.ko loaded? Sounds like it's NOTABUG to me.

---

I'm not sure where in 159877 it mentions cman not starting. Cman was indeed started; I (by hand or with a script) would never try to start clvmd without first starting a lock manager, as it's required. In the original report, I posted the status of /proc/cluster/nodes, status, and services, which shows that cman was running and everyone was in the cluster. I didn't post statuses for the second time I saw this since they were the same as the first. Again, after modprobing dlm by hand on the node where I saw this issue, clvmd started just fine, since the cluster was already up.

---

(In reply to comment #5)
> I'm not sure where in 159877 it mentions cman not starting.

Oops, you're right. I made a typo entering the bug number; bug #157094 demonstrates that the cman script failed to start.

> Cman was indeed started

But according to bug #157094, it failed.

> I (by hand or with a script) would never try to
> start clvmd without first starting a lock manager as it's required.

Did you verify that it was running properly?
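The check being asked for above ("what's in /proc/misc?") can be sketched as a small shell snippet. This is a hypothetical helper, not part of libdlm or the cluster scripts; `has_misc_device` and the captured `proc_misc` sample are illustration only, using the exact /proc/misc contents reported in the thread (note the absence of a `dlm` entry):

```shell
# Hypothetical helper: succeed if the named misc device appears
# in /proc/misc-style input read from stdin.
has_misc_device() {
    grep -qw -- "$1"
}

# /proc/misc contents as captured in the report -- no "dlm" line,
# so dlm.ko was not loaded and /dev/misc/dlm-control could not exist.
proc_misc='183 hw_random
 63 device-mapper
135 rtc
227 mcelog'

if printf '%s\n' "$proc_misc" | has_misc_device dlm; then
    echo "dlm misc device registered"
else
    echo "dlm missing from /proc/misc: load it with 'modprobe dlm'"
fi
```

On a live node you would pipe the real file instead: `has_misc_device dlm < /proc/misc`.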
(In reply to comment #6)
> But according to bug #157094, it failed

Correct, it did fail, but at the end of that comment in that bug, I mention that I then tried it by hand and got it working. :)

> Did you verify that it was running properly?

Indeed.

---

I guess I just don't understand what your test case is then. Here is what I've been able to discern:

0. You have some nodes with some initial configuration.
1. You killed a node.
2. You changed something in cluster.conf.
3. You propagated changes, but the changes didn't make it to the failed node (bug #157094).
4. Your killed node came back online.
5. The killed node was unable to join the cman cluster because its cluster.conf was not at the same revision level. This caused the cman script to fail (which will prevent DLM from loading).
6. Because cman failed to successfully join, the dlm module was not loaded and clvmd failed to start, hence the reason for this bug report.
7. After the system came up, "everything" was done by hand and things worked.

Based on the above description, this would appear to depend on the REOPENED bug #157094. Once that is fixed, this should go away, since the cman init script will load the dlm module. As such, I'm marking this as DEPENDS ON bug #157094, rather than marking it as NOTABUG. In the meantime, I'm placing this in the NEEDINFO status until a reproducible test is written that demonstrates this bug ("I then tried everything on morph-01 by hand" just doesn't help me).

---

I'll try to gather more info for you. The simplest case where I've seen this is in comment #2. There, no config changes were made at all. I merely had a cluster up and running, all the nodes panicked at the same time (due to 159877), and I power-cycled them. When they were back up, I started ccsd, cman, fenced, and clvmd. On one node clvmd failed to start; I then modprobed dlm and then it did.
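The by-hand bring-up described in the comment above, plus the workaround this bug needs, can be sketched as a shell sequence. This is an illustrative sketch, not the actual cluster init scripts; `RUN` defaults to `echo` so the snippet only prints the commands (a dry run) unless you clear it on a real RHEL 4 cluster node:

```shell
# Dry-run sketch of the manual cluster bring-up from the report.
# Set RUN= (empty) on a real node to actually execute the commands.
RUN=${RUN:-echo}

bring_up_cluster() {
    $RUN service ccsd start
    $RUN service cman start
    # Workaround for this bug: the cman init script is what normally
    # loads dlm.ko, so after a by-hand start (or a failed init run)
    # load it explicitly before clvmd needs it.
    $RUN modprobe dlm
    $RUN service fenced start
    $RUN service clvmd start
}

bring_up_cluster
```

With the explicit `modprobe dlm` in place, clvmd no longer fails with the missing /dev/misc/dlm-control device described in comment #2.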
For some odd reason, I thought that one of the cluster startup commands also loaded the dlm module; that is not true. Only the init script does a dlm module load (and only if cman joins the cluster first). So as the init script currently stands, Adam is right: without cman starting through the init script, dlm will not get loaded and will have to be loaded by hand. Jon and I discussed moving the dlm module load up some lines, next to the cman module load; then it wouldn't be dependent on cman actually joining the cluster, and it would be less confusing for users when clvmd fails. Reassigning to Adam as he seems to know what's going on, and it doesn't look to be my problem ;-)

---

Fix verified.

---

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0556.html
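The init-script ordering change discussed above can be sketched as follows. This is not the actual RHCS cman init script, just a hedged illustration of the proposed ordering; `RUN` defaults to `echo` so it dry-runs anywhere, and the `start_cman` function name is an assumption for illustration:

```shell
# Sketch of the proposed init-script ordering: load dlm.ko immediately
# after cman.ko, so a failed cluster join no longer leaves dlm unloaded
# (which is what made clvmd fail in this bug).
RUN=${RUN:-echo}   # dry run by default; set RUN= on a real node

start_cman() {
    $RUN modprobe cman
    $RUN modprobe dlm   # moved up: no longer gated on the join succeeding
    if $RUN cman_tool join; then
        echo "cluster join succeeded"
    else
        echo "cluster join failed, but dlm is already loaded for clvmd" >&2
    fi
}

start_cman
```

The design point is that the module load and the cluster join are independent failures: even when `cman_tool join` fails (as in bug #157094), clvmd's later failure is then clearly about the cluster, not about a missing /dev/misc/dlm-control device.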