Bug 654943 - Grid Engine configuration script inst_sge fails in Fedora 13, 14, and RHEL6
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: gridengine
Version: 14
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
Keywords: Triaged
Reported: 2010-11-19 00:35 EST by Sidney Markowitz
Modified: 2013-01-22 15:40 EST
Fixed In Version: gridengine-6.2u5-6.fc14
Doc Type: Bug Fix
Last Closed: 2011-03-11 15:52:59 EST


Attachments:
Proposed patch that only rejects when the only host name is localhost* and the only ip address is 127.0* (1.06 KB, patch)
2010-11-25 13:38 EST, Sidney Markowitz

Description Sidney Markowitz 2010-11-19 00:35:54 EST
Description of problem:
Grid Engine configuration script inst_sge fails when run under a commonly found configuration in Fedora 13, 14 and RHEL6 because of new NetworkManager behavior.
I only tested this on 64-bit versions, though I don't think that matters.

Version-Release number of selected component (if applicable): 6.2u5


How reproducible: always


Steps to Reproduce:
1. Install Fedora 13, Fedora 14, or RHEL6 on a machine that gets its host name and IP address from DHCP. In the NetworkManager applet, IPv4 should be configured to get its settings automatically from DHCP, and IPv6 should be left at the default, "ignore".
2. Verify that /etc/hosts contains a line with the external IP address and the host name, with a comment saying that it was added by NetworkManager, and that it also has a line beginning with ::1 that lists the host name in addition to the IPv6 localhost names. NetworkManager added the host name to the ::1 line even though no comment says so. Verify that hostname -i outputs two IP addresses: the external IP address and either ::1 (in Fedora 13/14) or 127.0.0.1 (in RHEL6).
3. yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon
You probably don't need to install all of these, but that is what I did when I tested.
4. Run the following command from the correct directory for your architecture:
 /usr/share/gridengine/utilbin/lx26-amd64/gethostbyname
Notice that it displays two IP addresses like hostname -i does, the second one always being 127.0.0.1. This is what will break the install script.
5. cd to /usr/share/gridengine, make a copy of the file my_configuration.conf, and edit the copy to set ADMIN_USER=sgeadmin, HOST_LIST to the short host name of the computer (i.e., the output of hostname -s), and ADD_TO_RC=true. At the end of the file, add the lines

  SGE_CLUSTER_NAME="none"
  CLUSTER_NAME="none"
  SGE_ENABLE_SMF="false"

Again, not all of this may be necessary to reproduce the bug but that is the configuration file that I used to make it happen.
6. Assuming you named the edited configuration file foo.conf, run the command

 ./inst_sge -m -x -auto ./foo.conf
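
Steps 5 and 6 can be scripted roughly as follows. This is a minimal sketch, not part of the original report: the function name make_sge_conf is hypothetical, and it assumes that each setting already appears in the template as a line beginning with KEY= (the template's exact layout is not shown in this report). Values are written quoted, like the three lines appended at the end.

```shell
#!/bin/sh
# Sketch: derive an auto-install configuration from the shipped template.
# Assumption: ADMIN_USER, HOST_LIST and ADD_TO_RC each appear in the template
# as a line that starts with KEY= (hypothetical layout, adjust as needed).

make_sge_conf() {
    template=$1     # e.g. /usr/share/gridengine/my_configuration.conf
    target=$2       # e.g. ./foo.conf
    short_host=$3   # e.g. "$(hostname -s)"

    # Rewrite the three settings named in step 5; the ^ anchor keeps
    # similarly named keys (for example EXEC_HOST_LIST) untouched.
    sed -e 's/^ADMIN_USER=.*/ADMIN_USER="sgeadmin"/' \
        -e "s/^HOST_LIST=.*/HOST_LIST=\"$short_host\"/" \
        -e 's/^ADD_TO_RC=.*/ADD_TO_RC="true"/' \
        "$template" > "$target" || return 1

    # Append the extra settings listed in step 5.
    cat >> "$target" <<'EOF'
SGE_CLUSTER_NAME="none"
CLUSTER_NAME="none"
SGE_ENABLE_SMF="false"
EOF
}
```

It could then be used as: make_sge_conf ./my_configuration.conf ./foo.conf "$(hostname -s)" followed by the inst_sge command from step 6.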

Actual results:
The configuration script inst_sge finishes very quickly after it reaches the point of clearing the screen. When it is done, the directory /usr/share/gridengine/default, which should have been created, does not exist. An error log is written to /tmp/install.nnnn, where nnnn is some number. It shows the output you saw in step 4 and says "It is not supported for a Grid Engine installation that the local hostname contains the hostname "localhost" and/or the IP address "127.0.x.x" of the loopback interface. [...] Installation failed"

Expected results:
The configuration takes longer to complete after the step of clearing the screen and indicates successful completion when it is done, creating a proper /usr/share/gridengine/default directory and its contents.

Additional info:
This is caused by an upstream bug in NetworkManager as I commented in https://bugzilla.gnome.org/show_bug.cgi?id=629021#c6 which causes a number of problems in Fedora 13 and 14, and in RHEL6 as reported in https://bugzilla.redhat.com/show_bug.cgi?id=643443#c4

However, I think there is a simple workaround that would allow gridengine to install correctly before this is fixed upstream, and that would actually be more correct behavior. The error is generated in the script /usr/share/gridengine/util/install_common.sh, around line 1584, in the sh function CheckForLocalHostResolving(), where there is a test:

   for cmp in $output; do
      case "$cmp" in
      localhost*|127.0*)
         notok=true

That should be changed to keep looking for localhost* in all the tokens, as it does now, but to match 127.0* only in a line that contains no other IP addresses. In other words, if the output contains multiple IP addresses, notok should not be set to true even if one of them is 127.0*. I think that is more correct anyway, because this test is supposed to protect against the situation in which a server does not have an external IP address. As long as there is an IP address that is not 127.0*, the condition is satisfied.
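
The rule described above might look like the following. This is only a sketch of the idea, not the code in the gridengine script: the function names should_reject and is_ipv4 are hypothetical, and a token counts as an IPv4 address here simply if it consists of digits and at least three dots.

```shell
#!/bin/sh
# Sketch of the proposed rule: any localhost* token still rejects, but a
# 127.0* address rejects only when no other IPv4 address is present.

# Hypothetical helper: true if $1 is made of digits and dots only and
# contains at least three dots. Deliberately loose, no range checks.
is_ipv4() {
    case "$1" in
        *[!0-9.]*|"") return 1 ;;
        *.*.*.*)      return 0 ;;
        *)            return 1 ;;
    esac
}

# $1: whitespace-separated tokens, e.g. the gethostname output.
# Returns 0 (true) when the installation should be rejected.
should_reject() {
    notok=false
    have_loopback=false
    have_other_ip=false
    for cmp in $1; do
        case "$cmp" in
            localhost*) notok=true ;;
            127.0*)     have_loopback=true ;;
            *)          if is_ipv4 "$cmp"; then have_other_ip=true; fi ;;
        esac
    done
    # A loopback address is a problem only if it is the sole address.
    if $have_loopback && ! $have_other_ip; then
        notok=true
    fi
    [ "$notok" = true ]
}
```

With this sketch, output listing 192.168.3.4 and 127.0.0.1 together is accepted, while output listing only 127.0.0.1, or containing any localhost* name, is still rejected.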
Comment 1 Brendan Jones 2010-11-19 04:22:21 EST
Thanks for the report. Assigning



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 2 Sidney Markowitz 2010-11-25 13:38:56 EST
Created attachment 462953 [details]
Proposed patch that only rejects when the only host name is localhost* and the only ip address is 127.0*

Here is a patch to the configuration script that is less picky about what it rejects. Instead of looking for any host name or alias named localhost* and any IP address that is 127.0*, as the current script does, it rejects only the case in which there are only localhost names and only 127.0* IP addresses. I tried this on various configurations in RHEL6, Fedora 13, and Fedora 14 and could not come up with one that broke it.
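
For illustration, a literal reading of that rule can be sketched as below. This is not the attached patch, whose contents are not reproduced on this page; the function name only_loopback_identity is hypothetical, and the names and addresses are assumed to have been separated out of the tool output already.

```shell
#!/bin/sh
# Sketch: reject only when every name is localhost* AND every address is 127.0*.
# $1: whitespace-separated host names and aliases
# $2: whitespace-separated IPv4 addresses
# Returns 0 (true) when the installation should be rejected.
only_loopback_identity() {
    names=$1
    addrs=$2
    for n in $names; do
        case "$n" in
            localhost*) ;;        # loopback-style name, keep looking
            *) return 1 ;;        # a real name exists: do not reject
        esac
    done
    for a in $addrs; do
        case "$a" in
            127.0*) ;;            # loopback address, keep looking
            *) return 1 ;;        # a non-loopback address exists
        esac
    done
    return 0                      # nothing but localhost*/127.0* was seen
}
```

Note that under this reading, empty input is also rejected, and a machine whose own name resolves only to 127.0.0.1 is accepted; whether the real patch behaves the same way in those corner cases is not stated here.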

I think this is simple enough not to require a CLA from me, and I state here that anyone is free to use this any way they want, but I am willing to submit a CLA if you want one.
Comment 3 Orion Poplawski 2010-12-03 12:48:44 EST
Could you try:

gethostname | awk -F: '/Host/ { split($2, items," "); print items[1]; } '

as the filter to gethostname with the rest of inst_common.sh intact?  I think that should be sufficient.
Comment 4 Sidney Markowitz 2010-12-03 16:40:38 EST
That doesn't work with one Fedora 14 system I encountered that somehow ended up with the 127.0.0.1 line in /etc/hosts containing the host name of the machine, i.e., the equivalent of

 192.168.3.4 foo foo.example.com
 127.0.0.1 foo localhost.localdomain localhost localhost4
 ::1 foo localhost6.localdomain6 localhost6

I don't know how it ended up configured like that, but on that machine the command

 hostname -i

begins with "::1", and the gridengine gethostname program lists 127.0.0.1 before the nonlocal IP address. That example is why I submitted the more complex patch instead of just looking at the first name and IP address on each line, as you suggested, which was also the first fix that I tried.
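
To make the difference concrete, the filter from comment 3 can be run against canned text. The two samples below are made up for illustration and assume that gethostname prints "Hostname:", "Aliases:" and "Host Address(es):" lines; the wrapper name first_name_and_address is hypothetical.

```shell
#!/bin/sh
# The awk filter from comment 3, wrapped so it can be fed sample text.
first_name_and_address() {
    awk -F: '/Host/ { split($2, items, " "); print items[1] }'
}

# Assumed output on a machine where the external address is listed first.
sample_ok='Hostname: foo
Aliases: foo.example.com
Host Address(es): 192.168.3.4 127.0.0.1'

# Assumed output on the machine described above, where 127.0.0.1 comes first.
sample_bad='Hostname: foo
Aliases: localhost.localdomain localhost localhost4
Host Address(es): 127.0.0.1 192.168.3.4'

printf '%s\n' "$sample_ok"  | first_name_and_address   # prints foo, then 192.168.3.4
printf '%s\n' "$sample_bad" | first_name_and_address   # prints foo, then 127.0.0.1
```

In the second case the filtered output still contains 127.0.0.1, so the unchanged localhost*|127.0* test would still reject it.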

By the way, this bug applies more widely than just to NetworkManager. While looking at a system that had NetworkManager disabled but still had the host name in the ::1 line of /etc/hosts, I realized that there could be any number of ways that /etc/hosts ends up looking like that, and that is what breaks the install script in gridengine.
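
A quick way to spot this condition, however it arose, is to look for the host name on a loopback line of the hosts file. The following is a minimal sketch; the function name host_on_loopback_line is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the given host name appears on a ::1 or 127.* line.
# $1: path to a hosts file, e.g. /etc/hosts
# $2: host name to look for, e.g. "$(hostname -s)"
host_on_loopback_line() {
    hosts_file=$1
    host=$2
    awk -v h="$host" '
        { sub(/#.*/, "") }                 # ignore comments
        $1 == "::1" || $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++)      # field 1 is the address itself
                if ($i == h) found = 1
        }
        END { exit !found }                # exit status 0 only when found
    ' "$hosts_file"
}
```

For example: host_on_loopback_line /etc/hosts "$(hostname -s)" && echo "host name is bound to a loopback address"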
Comment 5 Fedora Update System 2010-12-06 16:24:51 EST
gridengine-6.2u5-6.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/gridengine-6.2u5-6.fc14
Comment 6 Fedora Update System 2010-12-07 15:09:29 EST
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update gridengine'
You can provide feedback for this update here: https://admin.fedoraproject.org/updates/gridengine-6.2u5-6.fc14
Comment 7 Fedora Update System 2011-03-11 15:52:50 EST
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
