Bug 111743

Summary: Hostnames starting with numbers break cluadmin
Product: Red Hat Enterprise Linux 2.1
Reporter: Giovanni Capone <gcapone>
Component: clumanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED NOTABUG
Severity: medium
Priority: high
Version: 2.1
CC: tao
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-06-23 14:58:46 UTC
Attachments:
  Description              Flags
  Fixes cluadmin problem   none

Description Giovanni Capone 2003-12-09 17:00:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
On a cluster with an Oracle instance, the following operations leave
the cluster inoperable:
cluadmin ....
service disable oracle (name of the service)

Clumanager unmounts the file systems and shuts down the Oracle
instance as expected. The other node does not pick up the service,
also as expected.

Performing "service enable oracle" presents a menu with the choices
0, 1, and c. No matter which choice is selected, cluadmin always
presents the same menu.

The only way to recover from this situation is to go through cluconfig
and accept all default values.

Service relocate works OK, and so does relocation of the service
when the active node is shut down.

Following the manual, we ran fsck on the file systems that are mounted
by the Cluster Manager, with no change in behavior.

Version-Release number of selected component (if applicable):
clumanager-1.0.19-2

How reproducible:
Always

Steps to Reproduce:
1. service disable name_of_the_service
2. service enable name_of_the_service

Actual Results:  The cluster manager comes up and down but the service
is always disabled

Expected Results:  We expected to be able to enable the service and
bring it back into operation

Additional info:

Comment 1 Lon Hohberger 2003-12-09 17:51:49 UTC
I don't understand what you mean...

cluadmin> service disable test
Are you sure? (yes/no/?) yes
Disabling test. Service test disabled.
cluadmin> service enable test

  0) blue  
  1) cyan  
  c) cancel

Choose member: 0
Are you sure? (yes/no/?) yes
Enabling test on member blue. Service test enabled.
cluadmin>

When you try to enable a service which is already running, it is a
no-op.  Thus, enabling the service a second time on cyan will succeed,
but the service remains on blue.  While this may be convoluted, it is
expected behavior.

For instance:

Tue Dec  9 17:48:18 EST 2003

You can obtain help by entering help and one of the following commands:

cluster     service        clear
help        apropos        exit 
version     quit
cluadmin> service disable test
Are you sure? (yes/no/?) yes
Disabling test. Service test disabled.
cluadmin> service enable test

  0) blue  
  1) cyan  
  c) cancel

Choose member: 0
Are you sure? (yes/no/?) yes
Enabling test on member blue. Service test enabled.
cluadmin> service show state test
=========================  S e r v i c e   S t a t u s  ========================

                                         Last             Monitor  Restart
  Service        Status   Owner          Transition       Interval Count
  test           started  blue           17:48:36 Dec 09  0        0      
cluadmin> service enable test

  0) blue  
  1) cyan  
  c) cancel

Choose member: 1
Are you sure? (yes/no/?) yes
Enabling test on member cyan. Service test enabled.
cluadmin> service show state test
=========================  S e r v i c e   S t a t u s  ========================

                                         Last             Monitor  Restart
  Service        Status   Owner          Transition       Interval Count
  test           started  blue           17:48:36 Dec 09  0        0      
cluadmin> service disable test
Are you sure? (yes/no/?) yes
Disabling test. Service test disabled.
cluadmin> service show state test
=========================  S e r v i c e   S t a t u s  ========================

                                         Last             Monitor  Restart
  Service        Status   Owner          Transition       Interval Count
  test           disabled None           17:49:03 Dec 09  0        0      
cluadmin> 


Is this what you meant, or did I miss something?

Comment 2 Giovanni Capone 2003-12-09 18:25:56 UTC
At the point where the menu below appears:
  0) blue  
  1) cyan  
  c) cancel
If you chose 0 or 1, it simply presented the same menu again, and the
only value that got you out of this situation was "c".

Comment 3 Lon Hohberger 2003-12-09 18:29:12 UTC
That seems odd, and I can't reproduce it.  Hmm...  This may be a
long shot, but what are your locale & LANG settings?



Comment 4 Giovanni Capone 2003-12-10 10:01:05 UTC
In answer to your last question, here are the locale and LANG settings
on both nodes:
[root@1885-db1 root]# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

 
 
[root@1885-db2 root]# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

Comment 5 Giovanni Capone 2003-12-16 21:45:14 UTC
Any more suggestions? What's the next step?

I have to provide the customer some answers on what we are planning to
do to solve this problem.

Thank you for your assistance.

Comment 7 Lon Hohberger 2003-12-16 22:00:26 UTC
Try the RHEL 2.1 Update 3 beta package (1.0.24-1).

Comment 8 Giovanni Capone 2003-12-17 15:21:43 UTC
This cluster manager installation is by IT Telecom and is a production
system, therefore I cannot test the 1.0.24-1 package on that machine.

I could propose that the customer wait until Update 3 of AS 2.1 is
officially released before installing this version, provided 1.0.24-1
is backward compatible with e27 or this update includes the latest
Open SW driver for Emulex HBA. Do we know the official release date
for this Update 3?

Comment 9 Lon Hohberger 2003-12-18 17:05:26 UTC
Probably a good idea - contact PM for the official release date of U3.

Comment 10 Giovanni Capone 2003-12-22 10:06:05 UTC
After verifying with the customer I realized I made a mistake in the
description of the problem. 
When we try to enable the previously disabled service, the system
shows the following:
cluadmin> service enable test

  0) blue     DOWN
  1) cyan     DOWN
  c) cancel

This is why the only possible choice is "c". The odd thing is that
"clustat" reports both nodes as "GOOD" and the heartbeat as OK.
Sorry for any inconvenience.


Comment 11 Giovanni Capone 2004-01-15 08:48:04 UTC
I haven't received any comments or suggestions on my last clarification.

Comment 12 Lon Hohberger 2004-01-15 14:35:27 UTC
The only thing I can think of is that somewhere below the cluster, the
shared raw partitions are getting manipulated or data is being written
to the wrong address for some reason.

Can you think of any strange things about this particular configuration?

Disabling a service doesn't touch the member's state on disk; but
cluadmin thinks this is exactly what's happening.  Quick question...
if you exit cluadmin, can you run "cluadmin -- service enable oracle" ?





Comment 14 Lon Hohberger 2004-05-17 14:42:32 UTC
Changed to reflect actual problem:

Cluadmin was checking for integers at the beginning of hostnames; they
confused it.  This problem was not visible to clustat because clustat
did not call the problematic function.
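
A rough sketch of how this kind of confusion can arise. This is not the
actual cluadmin code, which is not shown in this report. It assumes a
simplified C-style parse that reads digits from the start of a string and
stops at the first non-digit; the helper names leading_int and
looks_like_index are hypothetical. The hostnames are the ones from the
shell prompts in comment 4.

```python
import re


def leading_int(text):
    """Return the integer at the start of text, or None if there is none.

    Simplified stand-in for a C-style parse that stops at the first
    non-digit character (assumption, not the real cluadmin routine).
    """
    match = re.match(r"\d+", text)
    return int(match.group()) if match else None


def looks_like_index(name):
    """Hypothetical check: a name 'looks numeric' if it starts with digits."""
    return leading_int(name) is not None


for name in ("blue", "cyan", "1885-db1", "1885-db2"):
    print(name, leading_int(name), looks_like_index(name))
```

Under that assumption, "blue" and "cyan" are never mistaken for numbers,
while "1885-db1" and "1885-db2" both parse as the same integer, 1885, so
the two members are no longer distinguishable by that check.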

Comment 15 Lon Hohberger 2004-05-17 14:44:25 UTC
Created attachment 100271
Fixes cluadmin problem

Untested, but it is a one-line change and it compiles properly.

Comment 17 Lon Hohberger 2004-05-17 14:51:15 UTC
Derek Anderson pointed out this from hosts(5):

Host names may contain only alphanumeric characters, minus signs ("-"),
and periods ("."). They must begin with an alphabetic character and
end with an alphanumeric character.  ...  The format of the host
table is described in RFC 952.

[Thus, it's not improper for cluadmin to become confused by hostnames
starting with non-alphabetic characters.]
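
For checking member names before configuring a cluster, the rule quoted
above can be sketched as a small validator. This only encodes the three
constraints in the quoted hosts(5) text (allowed characters, alphabetic
first character, alphanumeric last character); it does not check length
limits or empty labels, and the names HOSTNAME_RE and is_valid_hostname
are made up for this illustration.

```python
import re

# First character alphabetic; if there are more characters, the last one
# must be alphanumeric and the ones in between may be alphanumerics,
# "-" or ".".
HOSTNAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9.-]*[A-Za-z0-9])?")


def is_valid_hostname(name):
    """Return True if name satisfies the quoted hosts(5) rule."""
    return HOSTNAME_RE.fullmatch(name) is not None


for name in ("blue", "cyan", "1885-db1", "db1-1885", "db1-"):
    print(name, is_valid_hostname(name))
```

By this check "blue" and "cyan" pass, "1885-db1" fails because it starts
with a digit, and a renamed host such as "db1-1885" would pass.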


Comment 19 Lon Hohberger 2004-05-17 15:03:36 UTC
Note that the hosts(5) entry on RHEL 2.1 does not contain this tidbit
of information.

Comment 20 Lon Hohberger 2004-06-23 14:58:46 UTC
Closing - hostnames need to start with alphabetic characters.