Bug 654943

Summary: Grid Engine configuration script inst_sge fails in Fedora 13, 14, and RHEL6
Product: [Fedora] Fedora
Reporter: Sidney Markowitz <sidney>
Component: gridengine
Assignee: Orion Poplawski <orion>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: brendan.jones.it, maurizio.antillon, orion
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: gridengine-6.2u5-6.fc14
Doc Type: Bug Fix
Last Closed: 2011-03-11 20:52:59 UTC
Attachments:
  Proposed patch that only rejects when the only host name is localhost* and the only IP address is 127.0* (flags: none)

Description Sidney Markowitz 2010-11-19 05:35:54 UTC
Description of problem:
Grid Engine configuration script inst_sge fails when run under a commonly found configuration in Fedora 13, 14 and RHEL6 because of new NetworkManager behavior.
I only tested this on 64-bit versions, though I don't think that matters.

Version-Release number of selected component (if applicable): 6.2u5


How reproducible: always


Steps to Reproduce:
1. Install Fedora 13, 14, or RHEL6 on a machine that gets its host name and IP address from DHCP. In the NetworkManager applet, IPv4 should be configured to obtain its settings automatically from DHCP, and IPv6 should be left at the default, "ignore".
2. Verify that /etc/hosts contains a line with the external IP address and the host name, with a comment saying it was added by NetworkManager, and that it also has a line beginning with ::1 that lists the host name in addition to the IPv6 localhost names. NetworkManager added the host name to the ::1 line even though there is no comment saying so. Verify that hostname -i outputs two IP addresses: the external IP address and either ::1 (in Fedora 13/14) or 127.0.0.1 (in RHEL6).
3. yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon
You probably don't need to install all of these, but that's what I installed when I tested.
4. Run the following command, using the utilbin subdirectory for your architecture:
 /usr/share/gridengine/utilbin/lx26-amd64/gethostbyname
Notice that, like hostname -i, it displays two IP addresses, the second one always being 127.0.0.1. This is what breaks the install script.
5. cd to /usr/share/gridengine, make a copy of the file my_configuration.conf, and edit it to set ADMIN_USER=sgeadmin, HOST_LIST to the short host name of the computer (the output of hostname -s), and ADD_TO_RC=true. At the end of the file, add the lines

  SGE_CLUSTER_NAME="none"
  CLUSTER_NAME="none"
  SGE_ENABLE_SMF="false"

Again, not all of this may be necessary to reproduce the bug but that is the configuration file that I used to make it happen.
6. Assuming you named the edited configuration file foo.conf, run the command

 ./inst_sge -m -x -auto ./foo.conf
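The /etc/hosts condition from step 2 can also be checked with a small script instead of by eye. This is a minimal sketch, not part of gridengine: the function name `loopback_lines_for_host` is hypothetical, and it only reads an /etc/hosts-style file (it does not consult DNS or nsswitch). It prints every non-comment line whose address is 127.* or ::1 and that also lists the given host name.

```shell
# Hypothetical helper (not part of gridengine): print the lines of an
# /etc/hosts-style file that map HOSTNAME to a loopback address
# (127.* or ::1).
# Usage: loopback_lines_for_host HOSTS_FILE HOSTNAME
loopback_lines_for_host() {
  hosts_file=$1
  host=$2
  awk -v h="$host" '
    /^[ \t]*#/ { next }                 # skip comment-only lines
    $1 ~ /^127\./ || $1 == "::1" {      # IPv4 or IPv6 loopback address
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 1) == "#") break   # stop at a trailing comment
        if ($i == h) { print; break }        # host name found on this line
      }
    }
  ' "$hosts_file"
}

# Example: check this machine's short host name.
# loopback_lines_for_host /etc/hosts "$(hostname -s)"
```

Any output means the host name resolves to a loopback address through /etc/hosts, which is the configuration that trips the install script.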

Actual results:
The configuration script inst_sge finishes very quickly after it clears the screen. When it is done, there is no /usr/share/gridengine/default directory, as there should be. An error log file named /tmp/install.nnnn, where nnnn is some number, shows the output you saw in step 4 and says "It is not supported for a Grid Engine installation that the local hostname contains the hostname "localhost" and/or the IP address "127.0.x.x" of the loopback interface. [...] Installation failed"

Expected results:
The configuration takes longer to complete after the step of clearing the screen and indicates successful completion when it is done, creating a proper /usr/share/gridengine/default directory and its contents.

Additional info:
This is caused by an upstream bug in NetworkManager, as I noted in https://bugzilla.gnome.org/show_bug.cgi?id=629021#c6. That bug causes a number of problems in Fedora 13 and 14, and in RHEL6, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=643443#c4

However, I think there is a simple workaround that would let gridengine install correctly before this is fixed upstream, and that is actually more correct behavior. The error is generated in the script /usr/share/gridengine/util/install_common.sh, around line 1584, in the sh function CheckForLocalHostResolving(), which contains this test:

   for cmp in $output; do
      case "$cmp" in
      localhost*|127.0*)
         notok=true
         ;;
      esac
   done

That should be changed to keep looking for localhost* in all the tokens, as it does now, but to match 127.0* only in a line that contains no other IP addresses. In other words, if the output contains multiple IP addresses, notok should not be set to true even if one of them is 127.0*. I think that is more correct anyway, because this test is meant to protect against a server that has no external IP address. As long as there is an IP address that is not 127.0*, that condition is satisfied.
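The rule proposed above can be sketched as a standalone shell function. This is an illustrative sketch, not the code from install_common.sh or the attached patch: the function name `local_host_check` is hypothetical, it takes the whitespace-separated tokens as an argument instead of running gethostname itself, and it treats anything shaped like a dotted-quad IPv4 address as an "other" address. It prints `true` when the host should be rejected, mirroring the `notok` variable.

```shell
# Hypothetical sketch of the proposed rule:
#  - any token beginning with "localhost" still causes rejection;
#  - a 127.0* address causes rejection only if no other IPv4-looking
#    address appears in the same token list.
# Prints "true" (reject) or "false" (accept), like the notok variable.
local_host_check() {
  notok=false
  saw_loopback=false
  saw_other_ip=false
  for cmp in $1; do            # unquoted on purpose: split into tokens
    case "$cmp" in
    localhost*)
      notok=true
      ;;
    127.0*)
      saw_loopback=true
      ;;
    [0-9]*.[0-9]*.[0-9]*.[0-9]*)
      saw_other_ip=true
      ;;
    esac
  done
  if [ "$saw_loopback" = true ] && [ "$saw_other_ip" = false ]; then
    notok=true
  fi
  echo "$notok"
}
```

For example, `local_host_check 'foo 192.168.3.4 127.0.0.1'` prints `false`, whereas `local_host_check 'foo 127.0.0.1'` prints `true`. The 127.0* pattern is listed before the generic address pattern because `case` uses the first pattern that matches.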

Comment 1 Brendan Jones 2010-11-19 09:22:21 UTC
Thanks for the report. Assigning.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 2 Sidney Markowitz 2010-11-25 18:38:56 UTC
Created attachment 462953
Proposed patch that only rejects when the only host name is localhost* and the only ip address is 127.0*

Here is a patch to the configuration script that is less picky about what it rejects. Instead of rejecting any host name or alias matching localhost* and any IP address matching 127.0*, as the current script does, it rejects only the case in which all the names and IP addresses are localhost ones. I tried this on various configurations in RHEL6, Fedora 13, and Fedora 14 and could not come up with one that broke it.
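The behavior described for the patch (reject only when every name is a localhost* name and every address is a 127.0* address) can be sketched as follows. This is not the attached patch, which is not reproduced here; the function name `only_loopback` and its two-argument interface (names first, then addresses) are hypothetical.

```shell
# Hypothetical sketch: return 0 ("reject the host") only when every name
# starts with "localhost" and every address starts with "127.0".
# Any other name or address means the host has a usable identity.
# Note: empty lists are treated as loopback-only and therefore rejected.
# Usage: only_loopback "NAMES..." "ADDRESSES..."
only_loopback() {
  for name in $1; do
    case "$name" in
    localhost*) ;;
    *) return 1 ;;
    esac
  done
  for addr in $2; do
    case "$addr" in
    127.0*) ;;
    *) return 1 ;;
    esac
  done
  return 0
}
```

For example, `only_loopback "localhost localhost.localdomain" "127.0.0.1"` succeeds (reject), while `only_loopback "foo localhost" "192.168.3.4 127.0.0.1"` fails (accept), which is the case that NetworkManager's /etc/hosts entries produce.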

I think this is simple enough not to require a CLA from me, and I state here that anyone is free to use this any way they want, but I am willing to submit a CLA if you want one.

Comment 3 Orion Poplawski 2010-12-03 17:48:44 UTC
Could you try:

gethostname | awk -F: '/Host/ { split($2, items," "); print items[1]; } '

as the filter on the gethostname output, leaving the rest of install_common.sh intact? I think that should be sufficient.

Comment 4 Sidney Markowitz 2010-12-03 21:40:38 UTC
That doesn't work with one Fedora 14 system I encountered that somehow ended up with the 127.0.0.1 line in /etc/hosts containing the host name of the machine, i.e., the equivalent of

 192.168.3.4 foo foo.example.com
 127.0.0.1 foo localhost.localdomain localhost localhost4
 ::1 foo localhost6.localdomain6 localhost6

I don't know how it ended up configured like that, but on that machine the command

 hostname -i

begins with "::1", and the gridengine gethostname program lists 127.0.0.1 before the nonlocal IP address. That example is why I submitted the more complex patch instead of just looking at the first name and IP address on each line, which is what you suggested and was also the first fix I tried.
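The failure described above can be reproduced without a misconfigured machine by feeding the awk filter from comment 3 a sample of gethostname output. The sample text is an assumption about gethostname's output layout (a `Hostname:` line, an `Aliases:` line and a `Host Address(es):` line), and the function name `first_name_and_address` is hypothetical. When 127.0.0.1 is listed first, the filter passes it through, so the existing localhost*/127.0* test still rejects the host.

```shell
# The filter suggested in comment 3, wrapped in a hypothetical function so
# it reads any text on standard input instead of running gethostname.
first_name_and_address() {
  awk -F: '/Host/ { split($2, items, " "); print items[1]; }'
}

# Hypothetical gethostname output for the /etc/hosts shown above,
# with the loopback address listed first.
sample='Hostname: foo
Aliases: foo.example.com
Host Address(es): 127.0.0.1 192.168.3.4'

printf '%s\n' "$sample" | first_name_and_address
# Prints "foo", then "127.0.0.1": the loopback address survives the filter.
```

With the addresses in the opposite order the same filter prints `foo` and `192.168.3.4`, so whether the install succeeds depends only on the order in which the resolver returns the addresses.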

By the way, this bug applies more widely than just to NetworkManager. While looking at a system that had NetworkManager disabled but still had the host name in the ::1 line of /etc/hosts, I realized that /etc/hosts could end up looking like that in any number of ways, and that is what breaks the gridengine install script.

Comment 5 Fedora Update System 2010-12-06 21:24:51 UTC
gridengine-6.2u5-6.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/gridengine-6.2u5-6.fc14

Comment 6 Fedora Update System 2010-12-07 20:09:29 UTC
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update gridengine'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/gridengine-6.2u5-6.fc14

Comment 7 Fedora Update System 2011-03-11 20:52:50 UTC
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.