Bug 142745 - Amanda hangs on backup (FC3)
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
Reported: 2004-12-13 15:04 EST by Matthew Saltzman
Modified: 2015-01-04 17:13 EST
Doc Type: Bug Fix
Last Closed: 2005-07-15 18:42:23 EDT


Description Matthew Saltzman 2004-12-13 15:04:07 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
amcheck works fine, but amdump hangs forever.  Disk archive files are
zero-length, and amanda-related processes have consumed zero time
after 15 hours.

I believe the problem started with a recent kernel upgrade in FC2, but
I have upgraded the server to FC3 and the problem persists.  The
amanda configuration has not changed.  There is plenty of space on my
holding disk filesystem.


Version-Release number of selected component (if applicable):
amanda-2.4.4p3-1

How reproducible:
Always

Steps to Reproduce:
1. Set up remote backup with amanda.  chkconfig amanda on.  Open
firewall ports for all amanda processes.  Include
"ip_conntrack_amanda" iptables module.
2. Run amcheck.
3. Run amdump.
    

Actual Results:  amcheck works perfectly.  amdump starts but hangs
forever.

Expected Results:  amcheck works perfectly.  amdump copies files
across the network to disk holding area (and eventually dumps to tape).

Additional info:

This message at the Amanda FAQ-O-Matic appears to address this issue:

  http://amanda.sourceforge.net/fom-serve/cache/426.html

It's possible that this is a kernel issue, but I thought I'd start
here, as amanda is the affected package.

I'm going to set the severity high, as I have no functioning backup
systems so loss of data is a serious risk.
Comment 1 Peter Bieringer 2004-12-16 05:08:30 EST
I ran into a similar problem after updating the kernel on a box to
kernel-2.6.9-1.6_FC2. "sendbackup" is running and sending UDP packets to
the amanda server, but it makes no progress.

The client log shows:

amandad: time 10.374: dgram_recv: timeout after 10 seconds
amandad: time 10.375: waiting for ack: timeout, retrying
amandad: time 20.374: dgram_recv: timeout after 10 seconds
amandad: time 20.374: waiting for ack: timeout, retrying
amandad: time 30.373: dgram_recv: timeout after 10 seconds
amandad: time 30.373: waiting for ack: timeout, retrying
amandad: time 40.372: dgram_recv: timeout after 10 seconds
amandad: time 40.373: waiting for ack: timeout, retrying
amandad: time 50.371: dgram_recv: timeout after 10 seconds
amandad: time 50.372: waiting for ack: timeout, giving up!
amandad: time 50.372: pid 11240 finish time Sun Dec 12 17:25:23 2004


sendbackup: debug 1 pid 11241 ruid 33 euid 33: start at Sun Dec 12 17:24:32 2004
/usr/lib/amanda/sendbackup: version 2.4.4p2
  parsed request as: program `GNUTAR'
                     disk `/var/log/ipacc'
                     device `/var/log/ipacc'
                     level 1
                     since 2004:12:2:23:45:44
                     options `|;auth=bsd;compress-fast;index;exclude-list=/etc/amanda-exclude.gtar;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.003: stream_server: waiting for connection: 0.0.0.0.4472
sendbackup: time 0.004: stream_server: waiting for connection: 0.0.0.0.4473
sendbackup: time 0.004: stream_server: waiting for connection: 0.0.0.0.4474
sendbackup: time 0.007: waiting for connect on 4472, then 4473, then 4474
sendbackup: time 30.001: stream_accept: timeout after 30 seconds
sendbackup: time 30.002: timeout on data port 4472
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 4473
sendbackup: time 89.991: stream_accept: timeout after 30 seconds
sendbackup: time 89.991: timeout on index port 4474
sendbackup: time 89.992: pid 11241 finish time Sun Dec 12 17:26:03 2004


After switching back to kernel-2.6.8-1.521, everything works fine again.

I use "ip_conntrack_amanda" on client and server sides.

Server runs kernel-2.4.20-37.9.legacy and amanda-2.4.3-4
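
The sendbackup debug log above shows all three streams (data, mesg, index) timing out in turn. As a minimal sketch for spotting this pattern in a longer log, assuming the line format is exactly as quoted above (the function name and the sample text are illustrative and not part of amanda):

```python
import re

# Matches lines such as:
#   sendbackup: time 30.002: timeout on data port 4472
# The format is assumed from the log excerpt quoted in this comment.
TIMEOUT_RE = re.compile(
    r"^sendbackup: time (?P<t>\d+\.\d+): "
    r"timeout on (?P<stream>\w+) port (?P<port>\d+)$"
)


def find_stream_timeouts(log_text):
    """Return (seconds, stream, port) for each stream that timed out."""
    hits = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line.strip())
        if m:
            hits.append(
                (float(m.group("t")), m.group("stream"), int(m.group("port")))
            )
    return hits


# Sample copied from the log excerpt above.
SAMPLE = """\
sendbackup: time 0.007: waiting for connect on 4472, then 4473, then 4474
sendbackup: time 30.001: stream_accept: timeout after 30 seconds
sendbackup: time 30.002: timeout on data port 4472
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 4473
sendbackup: time 89.991: stream_accept: timeout after 30 seconds
sendbackup: time 89.991: timeout on index port 4474
"""

if __name__ == "__main__":
    for seconds, stream, port in find_stream_timeouts(SAMPLE):
        print(f"{stream} stream on port {port} timed out at {seconds:.3f}s")
```

If all three streams show up in the result, the server never connected back to any of the ports the client opened, which is consistent with the connections being dropped somewhere between the two hosts.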

Comment 2 Peter Bieringer 2005-01-02 04:36:58 EST
I ran further tests with the firewall set to ACCEPT on INPUT and OUTPUT and found the following:

the mere presence of the "ip_conntrack_amanda" module (regardless of whether it is used in any chain) causes
the problem. So it really is a kernel issue and should be fixed quickly.
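
Since the module being loaded appears to be enough to trigger the hang, it can help to confirm whether it is loaded on a given host. A rough sketch, assuming a Linux host where /proc/modules lists one module per line with the module name as the first field (the helper names are made up for illustration):

```python
def module_loaded(proc_modules_text, name="ip_conntrack_amanda"):
    """Return True if `name` is the first field of any /proc/modules line.

    Only the first field is compared, so a module that merely appears in
    another module's "used by" column does not count as loaded.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the module list; return an empty string if it is unavailable."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if module_loaded(read_proc_modules()):
        print("ip_conntrack_amanda is loaded")
    else:
        print("ip_conntrack_amanda is not loaded")
```

Running this on both client and server before a test dump makes it easier to tell which combination of hosts had the module loaded when the hang occurred.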
Comment 3 Jay Fenlason 2005-01-04 16:32:48 EST
If the problem is the ip_conntrack_amanda netfilter module, this bug 
needs to be filed against the kernel package.  Reassigning. 
Comment 4 Matthew Saltzman 2005-01-04 18:45:11 EST
That makes it a dup of #142745 (which I filed when I saw that
ip_conntrack_amanda appeared to be the culprit), or vice versa.
Comment 5 Matthew Saltzman 2005-01-04 18:46:42 EST
Sorry, I meant #143803...
Comment 6 Peter Bieringer 2005-01-12 15:03:34 EST
I can report that this issue is fixed with kernel 2.6.10-1.737_FC3 on FC3 and
kernel-2.6.10-1.8_FC2 on FC2.
Comment 7 Matthew Saltzman 2005-02-02 08:10:22 EST
I have to report that the problem persists in the devel
kernel-2.6.10-1.1115_FC4.  The server does not hang now, but reports

clientmachine  /home lev 0 FAILED [Estimate timeout from clientmachine]
clientmachine  /etc lev 0 FAILED [Estimate timeout from clientmachine]

and no backup is done.
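
To pull such failures out of a longer report, something like the following could be used. This is only a sketch that assumes the report lines look like the two quoted above; the regular expression and function name are illustrative, not part of amanda:

```python
import re

# Matches report lines such as:
#   clientmachine  /home lev 0 FAILED [Estimate timeout from clientmachine]
# The layout is assumed from the two lines quoted in this comment.
FAILED_RE = re.compile(
    r"^(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)\s+"
    r"FAILED\s+\[(?P<reason>.*)\]$"
)


def failed_dumps(report_text):
    """Return (host, disk, level, reason) for each FAILED line."""
    out = []
    for line in report_text.splitlines():
        m = FAILED_RE.match(line.strip())
        if m:
            out.append(
                (
                    m.group("host"),
                    m.group("disk"),
                    int(m.group("level")),
                    m.group("reason"),
                )
            )
    return out


SAMPLE = """\
clientmachine  /home lev 0 FAILED [Estimate timeout from clientmachine]
clientmachine  /etc lev 0 FAILED [Estimate timeout from clientmachine]
"""

if __name__ == "__main__":
    for host, disk, level, reason in failed_dumps(SAMPLE):
        print(f"{host}:{disk} level {level}: {reason}")
```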
Comment 8 Dave Jones 2005-07-15 14:11:45 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 9 Matthew Saltzman 2005-07-15 15:51:57 EDT
Sorry, I should have followed up before.  I believe this has been working in FC3
at least since kernels went to 2.6.11.

I now have two FC4 machines in the mix and they have been working fine.

Thanks.
