Bug 50949

Summary: LAM fails to boot on localhost
Product: [Retired] Red Hat Public Beta
Reporter: Joachim Frieben <jfrieben>
Component: lam
Assignee: Trond Eivind Glomsrød <teg>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium    
Version: roswell   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-08-05 13:35:20 UTC

Description Joachim Frieben 2001-08-05 13:35:16 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)

Description of problem:
After upgrading from Red Hat Linux 7.1 to the current 'Roswell' public
beta release, LAM fails to boot on 'localhost'. This used to work out of
the box with the stock 'lam-bhost.def' file (for 1 CPU using the original
file, and for 2 CPUs after appending 'cpu=2' to the 'localhost' entry in
'lam-bhost.def'). The LAM update itself does not seem to be the cause: I
had been using a self-updated lam-6.5.3-1 package without any problem
before the upgrade. That package no longer works, and installing the
original lam-6.5.3-1 RPM from Red Hat does not change the result. So this
is probably not a problem inherent to the LAM package, but it is unclear
which other component is responsible for this behaviour.
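
For reference, the hostfile entries mentioned above can be read as a host
name followed by optional key=value settings. The following is a minimal
Python sketch of that reading, assuming only the syntax visible in this
report ('#' comments, one host per line, an optional 'cpu=N'). The name
parse_bhost is a hypothetical helper, not part of LAM, and the default of
1 CPU is taken from the behaviour of the stock file described above.

```python
def parse_bhost(text):
    """Return a list of (hostname, cpu_count) tuples from boot schema text."""
    hosts = []
    for raw in text.splitlines():
        # Drop anything after '#' and skip lines that end up empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, *options = line.split()
        cpus = 1  # assumed default when no cpu= setting is given
        for opt in options:
            key, sep, value = opt.partition("=")
            if key == "cpu" and sep:
                cpus = int(value)
        hosts.append((name, cpus))
    return hosts


if __name__ == "__main__":
    # The 2-CPU variant of the stock entry described in this report.
    print(parse_bhost("#\n\nlocalhost cpu=2\n"))
```

Run against the 2-CPU entry, this should list a single host, 'localhost',
with 2 CPUs.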

How reproducible:
Always

Steps to Reproduce:
1. lamboot -d

Actual Results:  LAM reports that the boot process failed, claiming that
the local host is missing from lam-bhost.def.

Expected Results:  LAM reports a successful boot.

Additional info:

<33 localhost-cactus /home/cactus> lamboot -d

LAM 6.5.3/MPI 2 C++/ROMIO - University of Notre Dame

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot:   n0 localhost
-----------------------------------------------------------------------------
lamboot found that your local host is not in the hostfile "/etc/lam/lam-bhost.def".

The local host name *must* be in the list of hosts in the hostfile.
In other words, you must boot LAM from a node that will be part of the
multicomputer.  

	- If you simply forgot to put the localhost in the boot
	  schema file, add it and re-run lamboot
	- If you are trying to boot LAM from a node that will not be
	  part of the multicomputer, you must login to on of the nodes
	  that will be part of the multicomputer (i.e., one of the
	  nodes in the hostfiles), and re-run lamboot

Although the local host name is usually the first in the list to avoid
I/O ambiguities, it can actually appear anywhere in the list.
-----------------------------------------------------------------------------
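
How lamboot decides whether the local host is in the hostfile is not shown
in this report. One thing that may be worth ruling out, purely as an
assumption, is a difference in name resolution between the upgraded and a
freshly installed system. The Python sketch below only prints what each
name resolves to so the two systems can be compared by eye; it does not
reproduce lamboot's own check, and resolve and report are hypothetical
helper names.

```python
import socket


def resolve(name):
    """Return the set of IPv4 addresses a name resolves to (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        return set()


def report(names, resolver=resolve):
    """Map each name to the sorted list of addresses it resolves to."""
    return {name: sorted(resolver(name)) for name in names}


if __name__ == "__main__":
    # Compare the hostfile entry with the name the machine reports for itself.
    for name, addrs in report(["localhost", socket.gethostname()]).items():
        print(name, addrs)
```

If the two names resolve to different or empty address lists on the
upgraded system only, that would be a lead to follow, not a confirmed
cause.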

Comment 1 Trond Eivind Glomsrød 2001-08-06 20:41:13 UTC
It works just fine for me...


halden% tail -n 3 /etc/lam/lam-bhost.def
#

localhost cpu=2
halden% lamboot -d                      

LAM 6.5.3/MPI 2 C++/ROMIO - University of Notre Dame

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot:   n0 localhost
lamboot: found 1 host node(s)
lamboot: origin node is 0 (halden.devel.redhat.com)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -I " -H 172.16.46.54 -P 39502 -n 0 -o 0     ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill 
hboot: booting...
hboot: fork /usr/bin/lamd
[1]    353 lamd -H 172.16.46.54 -P 39502 -n 0 -o 0 -d
hboot: attempting to execute 
lamboot completed successfully
halden% lamnodes 
n0      halden.devel.redhat.com:2
halden% rpm -q lam
lam-6.5.3-1
halden%


Comment 2 Joachim Frieben 2001-08-17 20:02:25 UTC
The behaviour described above was observed after -upgrading- from a working
Red Hat Linux 7.1 installation. After a clean install of 'Roswell', LAM
worked as expected.