From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)

Description of problem:
After upgrading from Red Hat Linux 7.1 to the current 'Roswell' public beta release, LAM fails to boot on 'localhost'. This used to work out of the box with the stock 'lam-bhost.def' file (for 1 CPU using the original file, and for 2 CPUs after appending 'cpu=2' to the 'localhost' entry in 'lam-bhost.def').

The LAM update itself is not the cause: I had been using a self-updated lam-6.5.3-1 package without any problem before the upgrade. That package no longer works, and installing the original lam-6.5.3-1 RPM from Red Hat does not change this either. So the problem is probably not inherent to the LAM package, but it is unclear which other component is responsible for this behaviour.

How reproducible:
Always

Steps to Reproduce:
1. lamboot -d

Actual Results:
LAM reports that the boot process failed because it claims 'localhost' is missing from lam-bhost.def.

Expected Results:
LAM reports a successful boot.

Additional info:

<33 localhost-cactus /home/cactus> lamboot -d

LAM 6.5.3/MPI 2 C++/ROMIO - University of Notre Dame

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 localhost
-----------------------------------------------------------------------------
lamboot found that your local host is not in the hostfile
"/etc/lam/lam-bhost.def". The local host name *must* be in the list of
hosts in the hostfile. In other words, you must boot LAM from a node
that will be part of the multicomputer.

- If you simply forgot to put the localhost in the boot schema file,
  add it and re-run lamboot
- If you are trying to boot LAM from a node that will not be part of
  the multicomputer, you must login to on of the nodes that will be
  part of the multicomputer (i.e., one of the nodes in the hostfiles),
  and re-run lamboot

Although the local host name is usually the first in the list to avoid
I/O ambiguities, it can actually appear anywhere in the list.
-----------------------------------------------------------------------------
It works just fine for me...

halden% tail -n 3 /etc/lam/lam-bhost.def
#
localhost cpu=2
halden% lamboot -d

LAM 6.5.3/MPI 2 C++/ROMIO - University of Notre Dame

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 localhost
lamboot: found 1 host node(s)
lamboot: origin node is 0 (halden.devel.redhat.com)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -I " -H 172.16.46.54 -P 39502 -n 0 -o 0 ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1] 353 lamd -H 172.16.46.54 -P 39502 -n 0 -o 0 -d
hboot: attempting to execute
lamboot completed successfully
halden% lamnodes
n0 halden.devel.redhat.com:2
halden% rpm -q lam
lam-6.5.3-1
halden%
The described behaviour was observed after -upgrading- from a working Red Hat Linux 7.1 installation. After a clean install of 'Roswell', LAM worked as expected.
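
Since a clean install works and an upgraded system does not, a configuration difference left behind by the upgrade is one plausible suspect. The following is only a diagnostic sketch, not anything shipped with LAM. It assumes (this report does not confirm it) that the failure depends on how the local host name and the hostfile entries resolve to addresses. The helper names schema_hosts and resolve and the SCHEMA variable are made up for illustration; the hostfile path is the one shown in the lamboot output above.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper, not part of LAM): print how the
# local host name and each host in a boot schema resolve, so the results
# from an upgraded and a cleanly installed machine can be compared by eye.

# Print the first field of every non-blank, non-comment line of a hostfile.
schema_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Print what a name resolves to, or a marker line if it does not resolve.
resolve() {
    getent hosts "$1" || echo "(unresolved) $1"
}

# Hostfile path taken from the lamboot output; set SCHEMA to override it.
SCHEMA="${SCHEMA:-/etc/lam/lam-bhost.def}"

# Only produce a report when the hostfile is actually present and readable.
if [ -r "$SCHEMA" ]; then
    echo "local host name: $(uname -n)"
    resolve "$(uname -n)"
    for h in $(schema_hosts "$SCHEMA"); do
        echo "schema host: $h"
        resolve "$h"
    done
fi
```

Running this on both the upgraded machine and a cleanly installed one and comparing the two reports may show whether 'localhost' and the machine's own name map to different addresses on the broken system. If the reports are identical, name resolution is probably not the culprit and the assumption above does not hold.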