Description of problem:
Intermittently our daily backups fail on the server side. The amdump log states:

find diskspace: not enough diskspace. Left with 201536 K
find diskspace: not enough diskspace. Left with 201536 K
driver: Don't know how to send ABORT command to chunker
taper: DONE [idle wait: 2730.174 secs]
chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
chunker: time 1375.165: error [bad command after RQ-MORE-DISK: "QUIT"]
chunker: time 1375.165: pid 9916 finish time Wed Apr 30 00:46:07 2008
taper: writing end marker. [tape13 OK kb 3445216 fm 6]
dumper: kill index command
amdump: end at Wed Apr 30 00:46:07 EDT 2008

It is my understanding that the find diskspace error should not actually be a fatal error, but it appears that sometimes it is treated as fatal and the server quits the backup. The client's sendbackup log reflects this:

sendbackup: time 1230.584: 52: normal(|): gtar: ./var/spool/postfix/public/flush: socket ignored
sendbackup: time 1230.584: 52: normal(|): gtar: ./var/spool/postfix/public/showq: socket ignored
sendbackup: time 1375.125: index tee cannot write [Broken pipe]
sendbackup: time 1375.139: pid 11355 finish time Wed Apr 30 00:46:07 2008
sendbackup: time 1375.125: 119: strange(?):
sendbackup: time 1375.139: 119: strange(?): gzip: stdout: Broken pipe
sendbackup: time 1375.139: 119: strange(?): sendbackup: index tee cannot write [Broken pipe]
sendbackup: time 1375.155: 46: size(|): Total bytes written: 14438103040 (14GiB, ?/s)
sendbackup: time 1375.155: 119: strange(?): gtar: -: Wrote only 8192 of 10240 bytes
sendbackup: time 1375.181: 119: strange(?): gtar: Error is not recoverable: exiting now
sendbackup: time 1375.181: 119: strange(?): sed: couldn't flush stdout: Broken pipe
sendbackup: time 1375.181: error [compress returned 1, /bin/tar returned 2]
sendbackup: time 1375.181: pid 11351 finish time Wed Apr 30 00:46:07 2008

as does the client's amandad log:

amandad: time 0.151: stream_accept: connection from 10.120.1.15.34223
amandad: time 0.151: stream_accept: connection from 10.120.1.15.42220
amandad: time 0.151: stream_accept: connection from 10.120.1.15.58129
amandad: time 1375.179: sending NAK pkt: <<<<< ERROR write error on stream 52037: write error on stream 52037: Connection reset by peer >>>>>

Version-Release number of selected component (if applicable):
2.5.0p4

Looking around in the amanda-users list indicates this may be a bug in the version of amanda that RHEL5 is using (see http://readlist.com/lists/amanda.org/amanda-users/2/10137.html).

How reproducible:
Set up a backup job over the network, then wait about a month for it to start sporadically failing for a couple weeks, then proceed to work great for a few months before it starts sporadically failing again.

Actual results:
Every few months, for a week or two, the dumps fail for one or two servers with the above errors. Then everything starts working again for a few more months, then the errors again.

Expected results:
Backup should succeed without a hitch.
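Since the failures are sporadic, it may help to check older amdump logs for the same signature. Below is a minimal sketch in Python, not part of amanda itself: the function name `scan_amdump` and the returned keys are hypothetical, and the two patterns are copied only from the amdump excerpt above, so other amanda versions may word these messages differently.

```python
import re

# Patterns copied from the amdump excerpt in this report (assumption:
# other amanda versions may format these messages differently).
DISKSPACE_RE = re.compile(
    r"find diskspace: not enough diskspace\. Left with (\d+) K"
)
CHUNKER_QUIT_RE = re.compile(
    r'chunker: .*error \[bad command after RQ-MORE-DISK: "QUIT"\]'
)


def scan_amdump(lines):
    """Summarise whether a log shows the failure pattern from this report.

    `lines` is any iterable of log lines (for example an open file).
    """
    left_kb = []            # free holding-disk space reported, in KB
    chunker_quit = False    # True once the chunker rejects "QUIT"

    for line in lines:
        match = DISKSPACE_RE.search(line)
        if match:
            left_kb.append(int(match.group(1)))
        if CHUNKER_QUIT_RE.search(line):
            chunker_quit = True

    return {
        "diskspace_warnings": len(left_kb),
        "min_left_kb": min(left_kb) if left_kb else None,
        "chunker_quit_error": chunker_quit,
        # Both symptoms together are what this report describes.
        "matches_report": bool(left_kb) and chunker_quit,
    }
```

It can be run over a saved log by passing an open file, e.g. `scan_amdump(open(path))`, where `path` is wherever the amdump log is kept on the server. A log with only the diskspace warning and no chunker error would not be flagged, which matches the expectation that the warning alone is not fatal.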
Hello Noah, did you contact http://www.redhat.com/support with this issue? It is unlikely we will rebase amanda in RHEL5, and maybe they can help you work around the problem.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
I am sorry, but it is too late in the RHEL-5 release cycle [1]. At the moment we are addressing only critical and security related issues in RHEL-5. This one is fixed in RHEL-6. I am closing the bug as WONTFIX. [1] https://access.redhat.com/support/policy/updates/errata/