477826 – Network Goes Down After 1 week continuous operation during night (NetworkManager)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	10
Hardware:	All
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Assignee:	Dan Williams
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-12-24 00:32 UTC by Bevis King
Modified:	2009-11-11 16:30 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-11-11 16:30:53 UTC
Type:	---
Embargoed:
Dependent Products:

Description Bevis King 2008-12-24 00:32:13 UTC

Description of problem:
The permanently connected, statically addressed Ethernet interface (eth0) loses its outbound internet connection after just over one week of continuous operation under NetworkManager control. It operates fine for one week, then fails.

This usually shows itself as a loss of internet connectivity while the system remains up; the first error recorded is usually a DNS lookup failure.

Possibly related messages are:
Dec 22 20:12:13 willow named[2528]: network unreachable resolving 'ns2.mailbox.co.uk/AAAA/IN': 2001:7fd::1#53
Dec 22 20:12:13 willow named[2528]: network unreachable resolving 'ns1.mailbox.co.uk/AAAA/IN': 2001:7fd::1#53
Dec 22 20:12:13 willow named[2528]: network unreachable resolving 'ns0.mailbox.co.uk/AAAA/IN': 2001:7fd::1#53
Dec 22 20:12:14 willow named[2528]: network unreachable resolving 'ns2.mailbox.co.uk/AAAA/IN': 2001:500:3::42#53
Dec 22 20:12:14 willow named[2528]: network unreachable resolving 'ns1.mailbox.co.uk/AAAA/IN': 2001:500:3::42#53
Dec 22 20:12:14 willow named[2528]: network unreachable resolving 'ns0.mailbox.co.uk/AAAA/IN': 2001:500:3::42#53
Dec 22 20:12:16 willow named[2528]: too many timeouts resolving 'smtp.mailbox.co.uk/A' (in 'mailbox.co.uk'?): disabling EDNS

Dec 22 20:12:27 willow named[2528]: too many timeouts resolving 'B.ROOT-SERVERS.NET/AAAA' (in '.'?): reducing the advertised EDNS UDP packet size to 512 octets
Dec 22 20:12:27 willow named[2528]: too many timeouts resolving 'C.ROOT-SERVERS.NET/AAAA' (in '.'?): reducing the advertised EDNS UDP packet size to 512 octets
Dec 22 20:12:27 willow named[2528]: too many timeouts resolving 'D.ROOT-SERVERS.NET/AAAA' (in '.'?): reducing the advertised EDNS UDP packet size to 512 octets
Dec 22 20:12:27 willow named[2528]: too many timeouts resolving 'E.ROOT-SERVERS.
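
In the excerpt above, every "network unreachable" entry names an IPv6 server address, while the timeout entries concern both A and AAAA lookups. A rough way to tally the two kinds of message over a longer log is sketched below. It assumes each log entry sits on a single line and follows the named message layout seen here; the function name summarize and the counter labels are made up for illustration.

```python
import ipaddress
import re
from collections import Counter

# Patterns based only on the named messages quoted in this report.
UNREACHABLE = re.compile(
    r"named\[\d+\]: network unreachable resolving '(?P<query>[^']+)': "
    r"(?P<addr>[0-9A-Fa-f:.]+)#(?P<port>\d+)"
)
TIMEOUT = re.compile(r"named\[\d+\]: too many timeouts resolving ")


def summarize(log_lines):
    """Count unreachable errors per address family, plus timeout errors."""
    counts = Counter()
    for line in log_lines:
        match = UNREACHABLE.search(line)
        if match:
            try:
                version = ipaddress.ip_address(match.group("addr")).version
            except ValueError:
                # Address did not parse; count it separately rather than guess.
                counts["unreachable-unknown"] += 1
                continue
            counts[f"unreachable-ipv{version}"] += 1
        elif TIMEOUT.search(line):
            counts["timeouts"] += 1
    return counts
```

Feeding it the lines of /var/log/messages (for example via open("/var/log/messages")) gives a Counter; a log with no "unreachable-ipv4" entries but many "timeouts" would suggest the IPv6 messages are a side effect rather than the failure itself.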

Dhcpd continues to record devices on the local network requesting addresses despite loss of off-network connectivity. External ADSL router remains operational.

Both incoming requests (http, sshd) port-forwarded by the ADSL router and outgoing traffic (locally initiated automated email dispatch, etc.) fail.

Unfortunately, since I'm not on site, I can't examine the state of the system after a failure, only the logs after I've asked someone to reset the server. I'm wondering if a routing table entry for the default route to the internet, or some controlling named pipe/FIFO, is being expired or deleted after one week.
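
One way to test the default-route theory without being on site would be to record periodically whether an IPv4 default route is still present. The sketch below parses text in the format of /proc/net/route on Linux. It assumes a little-endian host such as the x86_64 machine in this report (the kernel prints addresses there in host byte order); the function names are illustrative and not part of NetworkManager.

```python
import socket
import struct

RTF_UP = 0x0001  # route is usable (flag value from the Linux route table)


def default_routes(route_table_text):
    """Return (interface, gateway) pairs for IPv4 default routes.

    Expects the text of /proc/net/route: a header line followed by
    whitespace-separated columns (Iface, Destination, Gateway, Flags,
    RefCnt, Use, Metric, Mask, ...), with addresses as hex strings.
    """
    routes = []
    for line in route_table_text.strip().splitlines()[1:]:  # skip header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        flags = int(fields[3], 16)
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000" and flags & RTF_UP:
            # Assumption: little-endian host, so unpack the hex accordingly.
            packed = struct.pack("<L", int(gateway, 16))
            routes.append((iface, socket.inet_ntoa(packed)))
    return routes


def read_live_routes(path="/proc/net/route"):
    """Read the running kernel's table (Linux only)."""
    with open(path) as handle:
        return default_routes(handle.read())
```

Calling read_live_routes() from an hourly cron job and logging the result would show whether the default route disappears before connectivity is lost. An empty list at the time of failure would support the theory; an intact route would point elsewhere.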

Version-Release number of selected component (if applicable):
NetworkManager-0.7.0-0.12.svn4326.fc10.x86_64

How reproducible:
Three times in a row now - system goes down shortly after one week of operation since last reboot.

Steps to Reproduce:
1. Leave system up for over a week with a static address eth0 port controlled by NetworkManager.

Actual results:
System fails after just over one week.

Expected results:
System stays accessible from the internet and continues to access the internet to dispatch cron emails, etc as required.

Additional info:

Comment 1 Bevis King 2009-01-14 21:55:39 UTC

This problem has continued despite the updated git20090102 version of NetworkManager etc. issued as updates.

Comment 2 Dan Williams 2009-02-14 19:51:41 UTC

Is there a chance that power is lost around these times, or that the switch which eth0 is connected to somehow loses power?  Can you paste in the output of /var/log/messages from the time that the connection is lost?  NM may log some interesting information about it.  Thanks!
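
For anyone collecting the requested excerpt, a minimal sketch of pulling the lines within a few minutes of the failure out of /var/log/messages follows. It assumes the classic syslog timestamp prefix seen in this report ("Dec 22 20:12:13"), which carries no year, so the year is passed in, and it assumes English month abbreviations. The function name lines_near is illustrative.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, when, minutes=5, year=2008):
    """Return the log lines stamped within `minutes` of `when`."""
    window = timedelta(minutes=minutes)
    selected = []
    for line in log_lines:
        stamp = line[:15]  # e.g. "Dec 22 20:12:13"
        try:
            # Prepend the assumed year so the date parses unambiguously.
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        if abs(ts - when) <= window:
            selected.append(line)
    return selected
```

For the failure recorded above, lines_near(open("/var/log/messages"), datetime(2008, 12, 22, 20, 12), minutes=10) would keep everything logged between 20:02 and 20:22, including any NetworkManager messages from that period.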

Comment 3 Jessica Sterling 2009-03-06 21:54:46 UTC

This bug has been triaged

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 4 Dan Williams 2009-11-11 00:50:23 UTC

Can you also give me a bit of info about the network setup?  Is the computer hooked up via ethernet to the DSL router?  What kernel version and ethernet hardware is this?

Comment 5 Bevis King 2009-11-11 10:26:31 UTC

Dan

Thanks for your interest in this bug.  It was reported just after NetworkManager became the default for managing static interfaces and was proving unreliable.

Additionally, as it later turned out, it was also during a contract dispute between my ISP and their infrastructure provider and at least some of the outages were due to escalating tensions between the two companies.

As a result, the DSL line's access to some upstream services was being suspended and re-directed without there ever being a full loss of service.  The combination of this and NetworkManager's novelty was really causing things to fall over badly, particularly in the DNS area.

Since then, NetworkManager has definitely gained in stability, my ISP has moved infrastructure suppliers, and I have not seen any problems of this type in at least six months.

I'd suggest closing the call at this point.  It's not an issue with the current installed version of NetworkManager.

Thanks!

Regards, Bevis.

Comment 6 Dan Williams 2009-11-11 16:30:53 UTC

Ok, thanks for the report.
