From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
I created a new samba service, running cluadmin on clug. When it asked which
cluster member to start the service on, I chose the other cluster member,
cluh. The service start failed, but subsequent starts succeeded.

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try

Steps to Reproduce:
Added a new samba service:

Service name: smb1
Preferred member [None]: ?
Choose the preferred member for the service.
Enter clug (or 0), cluh (or 1), or None.
Preferred member [None]: 1
Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
User script (e.g., /usr/foo/script or None) [None]:
Status check interval [0]: 5
Do you want to add an IP address to the service (yes/no/?) [no]: yes

        IP Address Information

IP address: 172.16.65.161
Netmask (e.g. 255.255.255.0 or None) [None]:
Broadcast (e.g. X.Y.Z.255 or None) [None]:
Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or
are you (f)inished adding IP addresses [f]:
Do you want to add a disk device to the service (yes/no/?) [no]: yes

        Disk Device Information

Device special file (e.g., /dev/sdb4): /dev/sdb6
Filesystem type (e.g., ext2, or ext3): ext3
Mount point (e.g., /usr/mnt/service1) [None]: /mnt/smb1
Mount options (e.g., rw,nosuid,sync): rw,sync
Forced unmount support (yes/no/?) [yes]:
Would you like to allow NFS access to this filesystem (yes/no/?) [no]:
Would you like to share to Windows clients (yes/no/?) [no]: yes
You will now be prompted for the Samba configuration:
Samba share name: smb1
The samba config file /etc/samba/smb.conf.smb1 does not exist.
Would you like a default config file created (yes/no/?) [no]: yes
Successfully created daemon lock directory /var/cache/samba/smb1.
Please run `mkdir /var/cache/samba/smb1` on the other cluster member.
Successfully created /etc/samba/smb.conf.smb1.
Please remember to make necessary customizations and then copy the file
over to the other cluster member.

########################
At this point, on another window on this same system:

vi /etc/samba/smb.conf.smb1    (just to set the share writable)
scp !$ cluh:/etc/samba/
#########################

Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you
(f)inished adding DEVICES [f]:

name: smb1
preferred node: cluh
relocate: yes
user script: None
monitor interval: 5
IP address 0: 172.16.65.161
  netmask 0: None
  broadcast 0: None
device 0: /dev/sdb6
  mount point, device 0: /mnt/smb1
  mount fstype, device 0: ext3
  mount options, device 0: rw,sync
  force unmount, device 0: yes
  samba share, device 0: smb1
Add smb1 service as shown? (yes/no/?) yes
0) clug
1) cluh
c) cancel
Choose member to start service on: 1
Error: Failed to start service smb1
cluadmin>

Looking at cluh's /var/log/messages:

Apr 25 17:13:16 cluh syslogd 1.4.1: restart.
Apr 25 17:14:06 cluh kernel: kjournald starting.  Commit interval 5 seconds
Apr 25 17:14:06 cluh kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on sd(8,21), internal journal
Apr 25 17:14:06 cluh kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 25 17:15:29 cluh rpc.mountd: export request from 172.16.65.159
Apr 25 17:16:55 cluh sshd(pam_unix)[3333]: session opened for user root by (uid=0)
Apr 25 17:17:20 cluh sshd(pam_unix)[3333]: session closed for user root
Apr 25 17:18:14 cluh rpc.mountd: authenticated unmount request from tim.boston.redhat.com:1005 for /mnt/nfs1 (/mnt/nfs1)
Apr 25 17:33:45 cluh sshd(pam_unix)[18261]: session opened for user root by (uid=0)
Apr 25 17:33:45 cluh sshd(pam_unix)[18261]: session closed for user root
Apr 25 17:33:54 cluh clusvcmgrd[1071]: <warning> Cannot get service name for service #1
Apr 25 17:33:54 cluh clusvcmgrd[18342]: <warning> Cannot get service name for service #1
Apr 25 17:33:54 cluh clusvcmgrd[18342]: <warning> Cannot get service name for service #1
Apr 25 17:33:54 cluh clusvcmgrd:
[18343]: <err> service error: Cannot get service name for service entry 1, err=2
Apr 25 17:33:54 cluh clusvcmgrd[18342]: <warning> Cannot get service name for service #1
Apr 25 17:33:54 cluh clusvcmgrd[18342]: <warning> Cannot get service name for service #1
Apr 25 17:33:54 cluh clusvcmgrd: [18361]: <err> service error: Cannot get service name for service entry 1, err=2
Apr 25 17:33:54 cluh clusvcmgrd[18342]: <warning> Cannot get service name for service #1

##################################
These systems are running the new stuff:

[root@cluh nfs]# rpm -qa | grep clumanager
clumanager-1.0.11-1

I thought those "can't get service name" messages were cleaned up? Or is it
the case that they got cleaned up and submitted after version 11?
###################################

So now I try to start the service on clug instead (the node on which I am
running cluadmin); this succeeds:

cluadmin> service enable
0) smb1
c) cancel
Choose service to enable: 0
Are you sure? (yes/no/?) yes
0) clug
1) cluh
c) cancel
Choose member: 0
Enabling smb1 on member clug.
Service enabled.
cluadmin>

OK, that worked. Now try disabling the service and then starting it over on
the other member, which previously failed:

cluadmin> service disable smb1
Are you sure? (yes/no/?) yes
Disabling smb1.
Service smb1 disabled.
cluadmin> service enable smb1
0) clug
1) cluh
c) cancel
Choose member: 1
Are you sure? (yes/no/?) yes
Enabling smb1 on member cluh.
Service smb1 enabled.
cluadmin>

Now this one worked. I didn't do anything over on cluh, so why did it fail
the first time? (resarray krap?) Looking at /var/log/messages over on cluh,
there are no new entries complaining that it can't find the service name.

Actual Results:
Initially the service failed to start. (Although my prior creation of an NFS
service, started on the other cluster member, worked fine.)

Expected Results:
The service should have started fine initially.

Additional info:
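For reference, the manual sync steps the tool prompts for (creating the daemon lock directory on the other member and copying the customized per-share config over) can be collected into one sketch. The share name and peer hostname come from this report; the `run` parameter is a hypothetical hook so the commands can be previewed with `echo` instead of executed, and this is not a supported procedure:

```shell
# Sketch only: the manual per-share sync steps cluadmin asks for.
sync_share_config() {
    share="$1"   # e.g., smb1
    peer="$2"    # e.g., cluh
    run="$3"     # "" to execute, "echo" to preview the commands
    # Create the daemon lock directory on the other member, as prompted.
    $run ssh "$peer" mkdir -p "/var/cache/samba/$share"
    # After customizing the generated config, copy it to the other member.
    $run scp "/etc/samba/smb.conf.$share" "$peer:/etc/samba/"
}
```

Previewing with `sync_share_config smb1 cluh echo` prints the two commands without touching either node.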
This has nothing to do with resarray. Based on the log file, the service
manager tried to start the service but errored out because the service
script could not get the service name. Subsequently, the service manager
tried to stop and disable the service; these operations failed as well
because, once again, the service script could not get the service name.
This is likely because the getconfig program, as used by the service
scripts, reads service config info from /etc/cluster.conf, which had
likely not yet been rewritten to be consistent with the shared database.
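A minimal sketch of the suspected race, as a hypothetical staleness check (the file name /etc/cluster.conf is real, but this check is illustrative and not part of clumanager): a service script that reads the local config copy loses if that copy has not yet been rewritten from the shared database.

```shell
# Hypothetical check, not clumanager code: report whether a local config
# copy (e.g., /etc/cluster.conf) is missing or older than a reference
# file standing in for the freshly written cluster configuration.
conf_is_stale() {
    local_conf="$1"   # path the service scripts actually read
    reference="$2"    # path representing the newly written config
    if [ ! -f "$local_conf" ]; then
        echo "stale: $local_conf missing"
        return 0
    fi
    if [ "$reference" -nt "$local_conf" ]; then
        echo "stale: $reference is newer than $local_conf"
        return 0
    fi
    echo "fresh"
    return 1
}
```

If the first enable attempt raced the config rewrite, a check like this on cluh would have reported stale; by the retry the local file had caught up, which matches the behavior observed above.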
Patch in pool.