From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
On a cluster with an Oracle instance, the following operations leave the cluster inoperable:

    cluadmin ....
    service disable oracle    (oracle is the name of the service)

Clumanager unmounts the file systems and shuts down the Oracle instance as expected. The other node does not pick up the service as expected. Performing "service enable oracle" presents a menu with a choice of 0, 1, c. No matter which choice is selected, cluadmin always presents the same menu. The only way to recover from this situation is to go through cluconfig and accept all default values. Service relocate works OK, and so does relocation of the service when the active node is shut down. Following the manual, an fsck was run on the file systems mounted by the Cluster Manager, with no change in behavior.

Version-Release number of selected component (if applicable):
clumanager-1.0.19-2

How reproducible:
Always

Steps to Reproduce:
1. service disable name_of_the_service
2. service enable name_of_the_service

Actual Results:
The cluster manager comes up and down, but the service is always disabled.

Expected Results:
We expected to be able to enable the service and make it operational again.

Additional info:
I don't understand what you mean...

    cluadmin> service disable test
    Are you sure? (yes/no/?) yes
    Disabling test. Service test disabled.
    cluadmin> service enable test
      0) blue
      1) cyan
      c) cancel
    Choose member: 0
    Are you sure? (yes/no/?) yes
    Enabling test on member blue. Service test enabled.
    cluadmin>

When you try to enable a service which is already running, it is a no-op. Thus, enabling the service a second time on cyan will succeed, but the service remains on blue. While this may be convoluted, it is expected behavior. For instance:

    Tue Dec 9 17:48:18 EST 2003
    You can obtain help by entering help and one of the following commands:
    cluster service clear help apropos exit version quit
    cluadmin> service disable test
    Are you sure? (yes/no/?) yes
    Disabling test. Service test disabled.
    cluadmin> service enable test
      0) blue
      1) cyan
      c) cancel
    Choose member: 0
    Are you sure? (yes/no/?) yes
    Enabling test on member blue. Service test enabled.
    cluadmin> service show state test
    ========================= S e r v i c e  S t a t u s ========================
                                   Last              Monitor   Restart
    Service   Status     Owner     Transition        Interval  Count
    test      started    blue      17:48:36 Dec 09   0         0
    cluadmin> service enable test
      0) blue
      1) cyan
      c) cancel
    Choose member: 1
    Are you sure? (yes/no/?) yes
    Enabling test on member cyan. Service test enabled.
    cluadmin> service show state test
    ========================= S e r v i c e  S t a t u s ========================
                                   Last              Monitor   Restart
    Service   Status     Owner     Transition        Interval  Count
    test      started    blue      17:48:36 Dec 09   0         0
    cluadmin> service disable test
    Are you sure? (yes/no/?) yes
    Disabling test. Service test disabled.
    cluadmin> service show state test
    ========================= S e r v i c e  S t a t u s ========================
                                   Last              Monitor   Restart
    Service   Status     Owner     Transition        Interval  Count
    test      disabled   None      17:49:03 Dec 09   0         0
    cluadmin>

Is this what you meant, or did I miss something?
At the point where the menu below appears:

      0) blue
      1) cyan
      c) cancel

choosing 0 or 1 simply brings up the same menu again; the only value that exits it is "c".
That seems odd, and I can't reproduce it. Hmm... This may be a long-shot, but what are your locale & LANG settings?
To answer your last question, here are the locale and LANG settings on both nodes:

    [root@1885-db1 root]# locale
    LANG=en_US
    LC_CTYPE="en_US"
    LC_NUMERIC="en_US"
    LC_TIME="en_US"
    LC_COLLATE="en_US"
    LC_MONETARY="en_US"
    LC_MESSAGES="en_US"
    LC_PAPER="en_US"
    LC_NAME="en_US"
    LC_ADDRESS="en_US"
    LC_TELEPHONE="en_US"
    LC_MEASUREMENT="en_US"
    LC_IDENTIFICATION="en_US"
    LC_ALL=

    [root@1885-db2 root]# locale
    LANG=en_US
    LC_CTYPE="en_US"
    LC_NUMERIC="en_US"
    LC_TIME="en_US"
    LC_COLLATE="en_US"
    LC_MONETARY="en_US"
    LC_MESSAGES="en_US"
    LC_PAPER="en_US"
    LC_NAME="en_US"
    LC_ADDRESS="en_US"
    LC_TELEPHONE="en_US"
    LC_MEASUREMENT="en_US"
    LC_IDENTIFICATION="en_US"
    LC_ALL=
Any more suggestions? What's the next step? I have to provide the customer some answers on what we are planning to do to solve this problem. Thank you for your assistance.
Try the RHEL 2.1 Update 3 beta package (1.0.24-1).
This cluster manager installation is by IT Telecom and is a production system, so I cannot test the 1.0.24-1 package on that machine. I could propose that the customer wait until Update 3 of AS 2.1 is officially released and install this version then, provided 1.0.24-1 is backward compatible with e27 or the update includes the latest Open SW driver for the Emulex HBA. Do we know the official release date for Update 3?
Probably a good idea - contact PM for the official release date of U3.
After verifying with the customer, I realized I made a mistake in the description of the problem. At the point where we try to enable the previously disabled service, the system gives the following:

    cluadmin> service enable test
      0) blue DOWN
      1) cyan DOWN
      c) cancel

This is why the only possible choice is "c". The odd thing is that "clustat" reports both nodes as "GOOD" and the heartbeat as OK. Sorry for any inconvenience.
I haven't received any comments or suggestions on my last clarification.
The only thing I can think of is that somewhere below the cluster, the shared raw partitions are getting manipulated or data is being written to the wrong address for some reason. Can you think of any strange things about this particular configuration? Disabling a service doesn't touch the member's state on disk; but cluadmin thinks this is exactly what's happening. Quick question... if you exit cluadmin, can you run "cluadmin -- service enable oracle" ?
Changed to reflect actual problem: Cluadmin was checking for integers at the beginning of hostnames; they confused it. This problem was not visible to clustat because clustat did not call the problematic function.
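For illustration only: the cluadmin source is not shown in this report, so the following Python sketch is a guess at the general failure mode, not the actual code. It assumes a hypothetical lookup that treats any leading run of digits as a member number, and uses the hostnames visible in the shell prompts earlier in this report (1885-db1, 1885-db2). All function names are invented for the example.

```python
import re

def leading_int(text):
    """Return the integer formed by the leading ASCII digits of text, or None.

    This mimics an atoi-style parse that stops at the first non-digit.
    """
    match = re.match(r"[0-9]+", text)
    return int(match.group()) if match else None

def lookup_member_naive(token, members):
    """Hypothetical buggy lookup: a token that merely STARTS with digits
    is interpreted as a member index, so the name is never compared."""
    index = leading_int(token)
    if index is not None:
        return members[index] if 0 <= index < len(members) else None
    return token if token in members else None

def lookup_member_strict(token, members):
    """Treat the token as an index only if it consists entirely of digits;
    otherwise match it against the member names."""
    if re.fullmatch(r"[0-9]+", token):
        index = int(token)
        return members[index] if index < len(members) else None
    return token if token in members else None

members = ["1885-db1", "1885-db2"]
print(lookup_member_naive("1885-db1", members))   # None: 1885 is read as an index
print(lookup_member_strict("1885-db1", members))  # 1885-db1
```

Under these assumptions, a member whose name starts with digits can never be found by name through the naive path, which would be consistent with the members appearing as DOWN in cluadmin while clustat, which takes a different code path, reports them as GOOD.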
Created attachment 100271: Fixes cluadmin problem

Untested, but it is a one-line change and it compiles properly.
Derek Anderson pointed out this from hosts(5):

    Host names may contain only alphanumeric characters, minus signs ("-"), and periods ("."). They must begin with an alphabetic character and end with an alphanumeric character. ... The format of the host table is described in RFC 952.

[Thus, it's not improper for cluadmin to become confused by hostnames starting with non-alphabetic characters.]
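A quick way to check candidate member names against the rule quoted above is sketched below in Python. It encodes only the three constraints in the quoted hosts(5) text (allowed characters, first character, last character) and is not a full RFC 952 validator; the function name is made up for this example.

```python
import re

# From the quoted hosts(5) text: only letters, digits, "-" and "." are allowed;
# the name must begin with a letter and end with a letter or digit.
_HOSTNAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9.-]*[A-Za-z0-9])?")

def is_valid_hostname(name):
    """Return True if name satisfies the quoted hosts(5) constraints."""
    return _HOSTNAME_RE.fullmatch(name) is not None

for name in ("blue", "cyan", "1885-db1", "db1-1885", "db1-"):
    print(name, is_valid_hostname(name))
```

With this check, blue, cyan and db1-1885 pass, while 1885-db1 (leading digit) and db1- (trailing minus sign) are rejected.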
Note that the hosts(5) entry on RHEL 2.1 does not contain this tidbit of information.
Closing - hostnames need to start with alphabetic characters.