Bug 1072967

Summary:

Server hangs on shutdown when using NIS and Oracle

Product:

Red Hat Enterprise Linux 6

Reporter:

Eduard Barrera <ebarrera>

Component:

initscripts

Assignee:

Lukáš Nykrýn <lnykryn>

Status:

CLOSED ERRATA

QA Contact:

Jan Ščotka <jscotka>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

6.5

CC:

ebarrera, jeremy.martin, jscotka, leon.kos, lnykryn, mkolbas, psklenar, rmainz

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

initscripts-9.03.47-1.el6

Doc Type:

Bug Fix

Doc Text:

Cause: ip addr flush was always called with scope global, which is wrong for loopback.
Consequence: Occasional hangs.
Fix: Use scope host for lo.
Result: The hangs no longer occur.

Last Closed:

2015-07-22 07:18:21 UTC

Type:

Bug

Bug Blocks:

1075802, 1159926

Attachments:

Description	Flags
ifdown-eth.redhat.out	none
ifdown.redhat.out	none
network.redhat.out	none
Proposed patch	none
var log messages*	none

Description Eduard Barrera 2014-03-05 14:02:13 UTC

Description of problem:

When NIS is used for username resolution, the server hangs after the network interface is shut down.



Version-Release number of selected component (if applicable):


How reproducible:

Use NIS to resolve the oracle UID.

Steps to Reproduce:
1.
2.
3.

Actual results:

The server hangs (shutdown takes 30 minutes to 1 hour to complete).

Expected results:

The server shuts down normally.

Additional info:

- The workaround is to not stop the network when shutting down the server:

mv K99network k99network

- Some scripts try to resolve usernames to UIDs.

The following error appears:

"su: user 238 does not exist"


238 is the oracle UID in NIS.

Because the network is stopped before this process, su tries to look up the UID of oracle but cannot get any response.

Comment 3 Honza Horak 2014-03-06 15:58:11 UTC

That is weird, because ypbind should be stopped before network and after ypbind is stopped, glibc shouldn't ask for user information any more. So, either I didn't understand all the consequences or there is something wrong with the ypserv shutting down. I'd definitely need some more information.

Does the link /etc/rc6.d/K73ypbind exist?
Is there any error in the syslog related to ypbind?

Comment 4 Eduard Barrera 2014-03-07 09:33:09 UTC

Yes, it does:

K60nfs
K69rpcsvcgssd
K72autofs
K73ypbind  <=====
K74acpid
K74haldaemon
K74ntpd
K75blk-availability
K75netfs
K75ntpdate
K75quota_nld
K75udev-post
K80kdump
K84bgpd
K84ospf6d
K84ospfd
K84ripd
K84ripngd
K85ebtables
K85mdmonitor
K85messagebus
K85rpcgssd
K85zebra
K86cgred
K86nfslock
K87irqbalance
K87multipathd
K87restorecond
K87rpcbind
K88auditd
K88iscsi
K88rsyslog
K89iscsid
K89portreserve
K89rdisc
K90network
K92ip6tables
K92iptables

Comment 10 Eduard Barrera 2014-04-07 10:10:40 UTC

Attaching files with the output of each command:

ifdown-eth.redhat.out
ifdown.redhat.out
network.redhat.out

Comment 11 Eduard Barrera 2014-04-07 10:11:51 UTC

Created attachment 883527 [details]
ifdown-eth.redhat.out

Comment 12 Eduard Barrera 2014-04-07 10:12:21 UTC

Created attachment 883528 [details]
ifdown.redhat.out

Comment 13 Eduard Barrera 2014-04-07 10:12:48 UTC

Created attachment 883529 [details]
network.redhat.out

Comment 18 Lukáš Nykrýn 2014-04-28 08:03:53 UTC

Unfortunately, this did not help.

Comment 22 Lukáš Nykrýn 2014-04-28 13:01:20 UTC

Also, can you post your /var/log/messages here?

Comment 26 Eduard Barrera 2014-04-30 13:23:03 UTC

The customer reports that downgrading initscripts fixes the issue:

""""

Eduard,

I've downgraded to initscripts-9.03.38-1.el6.x86_64 and it has fixed the
problem.
The server now reboots without hanging so it looks like the problem is
somewhere in initscripts.

""""

What else should I ask the customer?

Comment 28 Leon Kos 2014-05-14 16:08:54 UTC

I think the problem described is related to the "loopback scope global" problem described in https://www.centos.org/forums/viewtopic.php?f=16&t=45370

Can you check if the attached patch helps?

Comment 29 Leon Kos 2014-05-14 16:16:50 UTC

Created attachment 895547 [details]
Proposed patch

Comment 30 Leon Kos 2014-05-14 17:33:16 UTC

Comment on attachment 895547 [details]
Proposed patch

Typo on line
 id
should be
 if

Comment 31 Leon Kos 2014-05-14 18:33:57 UTC

Additionally, I would add that this flush problem is probably due to the fact that not all connections are closed at the time of network shutdown. I have observed portmapper and rpc.statd connections before entering K90network, even when the nfslock services were not started. Anyway, I think that global scope on the localhost flush is a bug that needs to be handled as proposed.
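The scope issue can be illustrated with a minimal sketch. This is not the attached patch, only an assumption-labelled outline of the idea in the Doc Text: addresses on lo have host scope, so flushing lo with scope global is the wrong selection. The function names flush_scope and flush_command are hypothetical. The command is printed rather than executed so that it can be inspected without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: choose the address scope to flush for a given device.
# Loopback addresses have host scope; other devices are flushed with
# global scope, as the scripts did before the fix.
flush_scope() {
    case "$1" in
        lo) echo host ;;
        *)  echo global ;;
    esac
}

# Hypothetical helper: print the flush command for a device instead of
# running it, so the result can be reviewed before anything is changed.
flush_command() {
    dev=$1
    echo "ip addr flush dev $dev scope $(flush_scope "$dev")"
}
```

Calling `flush_command lo` prints the host-scope variant, and `flush_command eth0` prints the global-scope variant. In the real scripts the decision is made inside the interface shutdown path rather than in a separate helper.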

Comment 32 Eduard Barrera 2014-05-15 13:01:59 UTC

Will the /var/log/messages from the customer portal case be enough?
Uploading it!


The customer is now running the previous version of initscripts...

Comment 33 Eduard Barrera 2014-05-15 13:05:38 UTC

Created attachment 895916 [details]
var log messages*

Comment 34 Jeremy Martin 2014-05-29 02:48:28 UTC

What is the status of this bug?  I too have this issue on 6.5 and found that downgrading to initscripts-9.03.38-1.el6.x86_64 resolved the issue.

Comment 35 Jeremy Martin 2014-05-29 03:08:30 UTC

FYI, the proposed patch in comment 29 resolved the problem with the initscripts-9.03.40-2.el6.centos.x86_64 package installed.

With your proposed patch I am able to shut down my network without it hanging and bring it back up.

Comment 36 Jeremy Martin 2014-05-29 03:13:24 UTC

Sorry, the correct installed package is initscripts-9.03.40-2.el6.x86_64. I was unable to copy/paste from the server to my laptop, so I googled the package name and it brought up the CentOS version of the package.

Using your proposed patch in comment 29, I was able to get the network to shut down properly without hanging. I do still get a lot of:
do_ypcall: clnt_call: RPC: unable to send; errno = Network is unreachable

errors until the network service starts up, but at least it's not hanging.

Comment 37 Jeremy Martin 2014-05-29 03:17:54 UTC

The do_ypcall errors are expected, of course, since ypbind is still running.

Just adding that with your proposed fix I no longer get the do_ypcall errors shown above on boot. Prior to the fix, I got them on boot and while the system was hanging on shutdown.

I think you resolved the issue with the few lines in comment 29.

Comment 38 Leon Kos 2014-05-29 05:31:13 UTC

Now that the patch has been verified, hopefully it will be accepted for an initscripts update. As mentioned, global scope on loopback is essentially wrong, and NIS just happened to be unclean at shutdown and got caught up in the flushing. It did not help me even when I killed the remaining processes before the interfaces were stopped.

Comment 43 Lukáš Nykrýn 2015-03-09 09:21:53 UTC

Sorry, wrong commit. The correct one is:

https://git.fedorahosted.org/cgit/initscripts.git/commit/?h=rhel6-branch&id=ebf9c167ecffc67617d6c89564e8394ccf985fbc

Comment 45 errata-xmlrpc 2015-07-22 07:18:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1380.html