Description of Problem:
When running dump repeatedly, the system hangs.

Version-Release number of selected component (if applicable):
dump-0.4b27-3

How Reproducible:
Very. I have not been able to dump all of my partitions without hanging the system, and I have tried at least 5 times today.

Steps to Reproduce:
1. e.g. /sbin/dump -4 -f /dev/null /dev/hdd1
2. may need to repeat step 1 a few times

Actual Results:
System hangs: console is dead; ssh/telnet does not work; ping does, however.

Additional Information:
From /var/log/messages at the same time:

Mar 28 18:06:56 oscar kernel: invalidate: busy buffer
Mar 28 18:06:56 oscar last message repeated 53 times
Mar 28 18:06:56 oscar kernel: invalidate: dirty buffer
Mar 28 18:06:56 oscar kernel: invalidate: busy buffer
Mar 28 18:06:56 oscar kernel: invalidate: dirty buffer
Mar 28 18:06:56 oscar kernel: invalidate: busy buffer
Mar 28 18:06:56 oscar kernel: invalidate: dirty buffer
Mar 28 18:06:56 oscar kernel: invalidate: busy buffer
Mar 28 18:06:56 oscar kernel: invalidate: dirty buffer
Mar 28 18:06:56 oscar kernel: invalidate: busy buffer
Mar 28 18:08:03 oscar last message repeated 988 times
Mar 28 18:10:05 oscar last message repeated 1688 times
Mar 28 18:10:32 oscar last message repeated 2522 times
Mar 28 18:10:32 oscar kernel: invalidate: dirty buffer
Mar 28 18:10:32 oscar kernel: invalidate: busy buffer
Mar 28 18:10:32 oscar kernel: invalidate: dirty buffer
Mar 28 18:10:32 oscar kernel: invalidate: busy buffer
Mar 28 18:10:32 oscar kernel: invalidate: dirty buffer
Mar 28 18:10:32 oscar kernel: invalidate: busy buffer
Mar 28 18:10:32 oscar kernel: invalidate: dirty buffer
Mar 28 18:10:32 oscar kernel: invalidate: busy buffer
Mar 28 18:11:25 oscar last message repeated 1915 times
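The reproduction steps above can be sketched as a small loop (device name and pass count are taken from this report; the script is guarded so it is a harmless no-op on machines where dump or the device is absent):

```shell
#!/bin/sh
# Hedged sketch of the reproduction loop from the report above.
# /dev/hdd1 is the partition named in the report; substitute your own.
# Requires dump (here dump-0.4b27-3) and root privileges to actually run.
DEVICE=/dev/hdd1
if command -v /sbin/dump >/dev/null 2>&1 && [ -b "$DEVICE" ]; then
    # Repeat the level-4 dump to /dev/null; per the report, the hang
    # may only appear after a few passes.
    for pass in 1 2 3 4 5; do
        echo "dump pass $pass"
        /sbin/dump -4 -f /dev/null "$DEVICE"
    done
else
    echo "dump or $DEVICE not present; nothing to do"
fi
```

Watching /var/log/messages from a second terminal (e.g. with `tail -f`) while the loop runs is how the "invalidate" messages above were caught.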
This is a kernel issue, not a dump issue.
I hope those partitions aren't actually mounted and in use ?
My understanding is that dump can handle mounted filesystems. Quoting from the dump(8) man page: "files-to-dump is either a mountpoint of a filesystem or a list of files and directories to be backed up as a subset of a filesystem. In the former case, either the path to a mounted filesystem or the device of an unmounted filesystem can be used." In my "Steps to Reproduce" example, I was using the device of a mounted filesystem, which the man page does not claim to support. So I changed "/dev/hdd1" to "/usr2", but I was still able to hang my system on the second try. As a reference point, this works fine under RH7.2 (kernel 2.4.9-31) with dump 0.4b25.
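For reference, the two invocation forms the man page distinguishes look like this ("/usr2" and "/dev/hdd1" are the names used in this report; the helper function is an assumption added so the sketch degrades to echoing the command where dump is not installed):

```shell
#!/bin/sh
# The two dump(8) invocation forms discussed above.
# "maybe" is a hypothetical guard: it runs the command only if
# /sbin/dump is actually present, otherwise it just prints it.
DUMP=/sbin/dump
maybe() {
    if [ -x "$DUMP" ]; then "$@"; else echo "would run: $*"; fi
}

# Mounted filesystem, addressed by its mountpoint (the form the
# man page documents for mounted filesystems):
maybe "$DUMP" -4 -f /dev/null /usr2

# Unmounted filesystem, addressed by its device node:
maybe "$DUMP" -4 -f /dev/null /dev/hdd1
```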
This should be fixed by the patch that went into the tree to fix the "invalidate only on last unuse" rule.
Can you try the 2.4.18-0.12 or later kernels in rawhide to see if those fix this issue ?
I upgraded to kernel-smp-2.4.18-0.12, but no change.
On a whim, I tried the non-SMP 2.4.18-0.12, but it hung as well.
Do you still get the "invalidate: busy buffer" messages in your kernel logs with the updated kernel?
Unfortunately, yes. I am now on kernel-smp-2.4.18-0.13. Maybe a clue: I made a typo and tried to dump a nonexistent mount point (/usr1), and I still got one "invalidate: busy buffer" message. But I guess this could be from reading /.
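A quick way to count the warnings logged so far (the log path is the one from this report; the guard is an assumption so the snippet is safe on systems where the file is absent or unreadable):

```shell
#!/bin/sh
# Count the "invalidate: busy buffer" lines in the syslog.
# /var/log/messages is the path from the report; the -r guard keeps
# this a no-op where the file is missing or not readable.
LOG=/var/log/messages
if [ -r "$LOG" ]; then
    BUSY=$(grep -c 'invalidate: busy buffer' "$LOG")
else
    BUSY=0
fi
echo "busy-buffer messages so far: $BUSY"
```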
note that amanda depends on dump working.
Umpty-thousand amanda installs all over the world require dump to work on mounted filesystems. Ugly as that is, breaking this behaviour would be Bad.
Hmm... I started getting this `busy buffer' message on my laptop too, with kernel 2.4.18-0.13, after I switched to a RAID1 filesystem for root. The caveat is that the RAID1 array is currently in degraded mode, with a single replica; my removable disk died the other day :-(
I failed to mention: I get these messages on shutdown, right before it marks the md devices clean.
Yeah! I just ran up2date and got kernel-smp-2.4.18-0.20. When I run dump, /var/log/messages is still heavily polluted with thousands of busy- and dirty-buffer messages, but my system no longer hangs ;) I ran my "Steps to Reproduce" a dozen times and completed a zero-level dump of all 9 partitions (18 GB used), and yet my uptime keeps increasing! Progress is always a good thing, wouldn't you say?
This should be fixed in our kernels now. Look for it in rawhide in a subsequent kernel build.