Bug 654943 - Grid Engine configuration script inst_sge fails in Fedora 13, 14, and RHEL6
Alias: None
Product: Fedora
Classification: Fedora
Component: gridengine
Version: 14
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Orion Poplawski
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2010-11-19 05:35 UTC by Sidney Markowitz
Modified: 2013-01-22 20:40 UTC (History)

Fixed In Version: gridengine-6.2u5-6.fc14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-03-11 20:52:59 UTC
Type: ---

Attachments
Proposed patch that only rejects when the only host name is localhost* and the only ip address is 127.0* (1.06 KB, patch)
2010-11-25 18:38 UTC, Sidney Markowitz

System ID Private Priority Status Summary Last Updated
GNOME Bugzilla 629021 0 None None None Never

Description Sidney Markowitz 2010-11-19 05:35:54 UTC
Description of problem:
Grid Engine configuration script inst_sge fails when run under a commonly found configuration in Fedora 13, 14 and RHEL6 because of new NetworkManager behavior.
I only tested this on 64-bit versions, though I don't think that matters.

Version-Release number of selected component (if applicable): 6.2u5

How reproducible: always

Steps to Reproduce:
1. Install Fedora 13, Fedora 14, or RHEL6 on a machine that gets its host name and ip address from DHCP. In the NetworkManager applet, ipv4 should be configured to automatically get settings from DHCP and ipv6 should be set to the default "ignore".
2. Verify that /etc/hosts contains a line with the external ip address and the host name, with a comment that it was added by NetworkManager, and that it also has a line beginning with ::1 that has the host name in addition to the ipv6 localhost names. The host name in the ::1 line was added by NetworkManager even though there is no comment to that effect. Verify that hostname -i outputs two ip addresses, the external ip address and either ::1 (in Fedora 13/14) or (in RHEL6).
3. yum install gridengine gridengine-qmaster gridengine-execd gridengine-qmon
You probably don't need to install all of these, but that's what I did when I tested.
4. run the command from the correct directory for your architecture
Notice that it displays two ip addresses like hostname -i does, the second one always being This is what will break the install script.
5. cd to /usr/share/gridengine, make a copy of the file my_configuration.conf and edit it to set ADMIN_USER=sgeadmin, HOST_LIST=the short host name of the computer, i.e., the output of hostname -s, ADD_TO_RC=true, and at the end of the file add the lines


Again, not all of this may be necessary to reproduce the bug but that is the configuration file that I used to make it happen.
6. Assuming you named the edited configuration file foo.conf, run the command

 ./inst_sge -m -x -auto ./foo.conf

Actual results:
The configuration script inst_sge completes very quickly after it reaches the point of clearing the screen. When it is done there is no /usr/share/gridengine/default directory as there should be. You can see an error log file named /tmp/install.nnnn where nnnn is some number. It shows the output you saw in step 4 and says "It is not supported for a Grid Engine installation that the local hostname contains the hostname "localhost" and/or the IP address "127.0.x.x" of the loopback interface. [...] Installation failed"

Expected results:
The configuration takes longer to complete after the step of clearing the screen and indicates successful completion when it is done, creating a proper /usr/share/gridengine/default directory and its contents.

Additional info:
This is caused by an upstream bug in NetworkManager as I commented in https://bugzilla.gnome.org/show_bug.cgi?id=629021#c6 which causes a number of problems in Fedora 13 and 14, and in RHEL6 as reported in https://bugzilla.redhat.com/show_bug.cgi?id=643443#c4

However, I think there is a simple workaround that would allow gridengine to install ok before this is fixed upstream, and which would actually be more correct behavior. The error is generated in the script /usr/share/gridengine/util/install_common.sh around line 1584 in the sh function CheckForLocalHostResolving() where there is a test

   for cmp in $output; do
      case "$cmp" in

That should be changed to look for localhost* in all the tokens like it does now, but to match 127.0* only in a line that contains no other ip addresses. In other words, if multiple ip addresses are in the output then notok should not be set to true even if one of the ip addresses is 127.0*. I think that is more correct anyway, because this test is supposed to protect against the situation in which a server does not have an external ip address. As long as there is an ip address that is not 127.0*, the condition is satisfied.
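
A rough sketch of the rule described above, written as a standalone sh function so it can be tried outside the install script. This is not the code in CheckForLocalHostResolving() and not the attached patch. The function name check_tokens and the flag variables are hypothetical, the caller is assumed to pass only host names and ip addresses as arguments, the address patterns are a simple heuristic, and treating ::1 as a loopback address is an extra assumption that the description does not state.

```shell
# Hypothetical helper, not the shipped CheckForLocalHostResolving().
# Arguments: host names and ip addresses, one token per argument.
# Prints "notok" when a localhost* name is present, or when a loopback
# address is present and no other ip address is. Prints "ok" otherwise.
check_tokens() {
   has_localhost_name=false
   has_loopback_ip=false
   has_other_ip=false
   for cmp in "$@"; do
      case "$cmp" in
         localhost*)
            has_localhost_name=true ;;
         127.0*|::1)
            # Assumption: ::1 is treated like 127.0* here.
            has_loopback_ip=true ;;
         *:*|[0-9]*.[0-9]*.[0-9]*.[0-9]*)
            # Heuristic: anything else that looks like an IPv6 or IPv4 address.
            has_other_ip=true ;;
      esac
   done
   if [ "$has_localhost_name" = true ]; then
      echo notok
   elif [ "$has_loopback_ip" = true ] && [ "$has_other_ip" = false ]; then
      echo notok
   else
      echo ok
   fi
}
```

With this sketch, check_tokens foo foo.example.com 10.1.2.3 127.0.0.1 prints ok, while check_tokens foo 127.0.0.1 prints notok.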

Comment 1 Brendan Jones 2010-11-19 09:22:21 UTC
Thanks for the report. Assigning

Fedora Bugzappers volunteer triage team

Comment 2 Sidney Markowitz 2010-11-25 18:38:56 UTC
Created attachment 462953 [details]
Proposed patch that only rejects when the only host name is localhost* and the only ip address is 127.0*

Here is a patch to the configuration script that is less picky about what it rejects. Instead of looking for any host name or alias named localhost* and any ip address that is 127.0* like the current script does, it only rejects the case in which the only names are localhost* and the only ip addresses are 127.0*. I tried this on various configurations in RHEL6, Fedora 13 and Fedora 14 and could not come up with one that broke it.

I think this is simple enough not to require a CLA from me, and I state here that anyone is free to use this any way they want, but I am willing to submit a CLA if you want one.

Comment 3 Orion Poplawski 2010-12-03 17:48:44 UTC
Could you try:

gethostname | awk -F: '/Host/ { split($2, items," "); print items[1]; } '

as the filter to gethostname with the rest of inst_common.sh intact?  I think that should be sufficient.
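
To make the effect of that filter concrete, here is a small sketch that runs it on canned input instead of the real gethostname utility. The wrapper name first_fields is hypothetical, and the sample text (three labelled lines with example names and addresses) is an assumed shape for the utility's output, not a captured log. The filter keeps only the first token after the colon on each line containing "Host", so the result depends on which address is listed first.

```shell
# Hypothetical wrapper around the awk filter quoted above.
first_fields() {
   awk -F: '/Host/ { split($2, items, " "); print items[1]; }'
}

# Assumed sample output with the external address listed first.
# Prints foo.example.com, then 10.1.2.3.
printf 'Hostname: foo.example.com\nAliases: foo\nHost Address(es): 10.1.2.3 127.0.0.1\n' |
   first_fields

# Same assumed sample with the loopback address listed first.
# Prints foo.example.com, then 127.0.0.1, which the existing check would reject.
printf 'Hostname: foo.example.com\nAliases: foo\nHost Address(es): 127.0.0.1 10.1.2.3\n' |
   first_fields
```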

Comment 4 Sidney Markowitz 2010-12-03 21:40:38 UTC
That doesn't work with one Fedora 14 system I encountered that somehow ended up with the line in /etc/hosts containing the host name of the machine, i.e., the equivalent of foo foo.example.com foo localhost.localdomain localhost localhost4
 ::1 foo localhost6.localdomain6 localhost6

I don't know how it ended up configured like that, but on that machine the command

 hostname -i

begins with "::1" and the gridengine gethostname program has coming before the nonlocal ip address. It was because of that example that I submitted the more complex patch instead of just looking at the first name and ip address on each line, which is what you suggested and was also the first fix that I tried.

By the way, this bug is more widely applicable than just with NetworkManager. I realized that when looking at a system that had NetworkManager disabled but still had the host name in the ::1 line of /etc/hosts. There could be any number of ways that /etc/hosts ends up looking like that, and that is what breaks the install script in gridengine.
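
For anyone who wants to check whether a machine is in this state regardless of how it got there, a sketch like the following may help. The function name loopback_lines_naming is hypothetical and this is not part of gridengine. It prints every line of a hosts file whose address is ::1 or starts with 127. and that also lists the given host name, ignoring comments.

```shell
# Hypothetical diagnostic helper, not part of gridengine.
# $1 = host name to look for, $2 = hosts file (defaults to /etc/hosts).
# Prints each loopback line (::1 or 127.*) that also names the host.
loopback_lines_naming() {
   awk -v h="$1" '
      /^[ \t]*#/ { next }                      # skip full-line comments
      $1 == "::1" || $1 ~ /^127\./ {
         for (i = 2; i <= NF; i++) {
            if (substr($i, 1, 1) == "#") break # stop at a trailing comment
            if ($i == h) { print; break }
         }
      }' "${2:-/etc/hosts}"
}
```

Running loopback_lines_naming "$(hostname -s)" on an affected machine should print the ::1 line described above. No output means the short host name does not appear on any loopback line.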

Comment 5 Fedora Update System 2010-12-06 21:24:51 UTC
gridengine-6.2u5-6.fc14 has been submitted as an update for Fedora 14.

Comment 6 Fedora Update System 2010-12-07 20:09:29 UTC
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update gridengine'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/gridengine-6.2u5-6.fc14

Comment 7 Fedora Update System 2011-03-11 20:52:50 UTC
gridengine-6.2u5-6.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
