Red Hat Bugzilla – Bug 431997
dd reads random sizes of data
Last modified: 2011-01-09 07:21:00 EST
Description of problem:
When dd reads from a pipe it produces random results
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Find a large file, say 10MB
2. cat largefile | dd ibs=100000 count=2 >output1
3. repeat 2) several times recording size of output1
4. dd if=largefile | dd ibs=100000 count=2 >output2
5. repeat 4) several times recording size of output2
The size of output will vary.
Step 2: various sizes such as 65536, 139264, 143360.
Step 5: various sizes such as 6144, 7168, 6656.
The size should always be 200000, and never anything else.
The apparent block size of a pipe "device" should be transparent.
But in any case, when using ibs we are instructing dd what the input block size
is! So dd should obey our instructions, and not use any other block size.
Interesting to note that a conceptually simple program like dd has been so
hacked around since v5.2 that it is a wonder it still works at all.
I think this is a similar report to:
Jim Meyering (upstream) wrote there:
"Thanks for the report, but that behavior is required by POSIX.
dd must handle SIGINT the way you want, but dd may not handle
SIGPIPE that way:
For SIGINT, the dd utility shall interrupt its current processing,
write status information to standard error, and exit as though
terminated by SIGINT. It shall take the standard action for all
other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
(on page 280)."
The recommendation was not to use unnecessary pipes with the dd command (both 2
and 4 could be written as one command without a pipe). Closing as NOTABUG (see
this mailing list thread for a better explanation).
A bit more explanation is also there:
Feel free to add some comments if you are not satisfied with that explanation.
You could also discuss that behaviour on the upstream mailing list.
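As a rough sketch of the pipe-free form recommended above: the scratch file below is a hypothetical stand-in for "largefile", and the rest is plain dd. When dd opens a regular file on a local file system itself, each read normally returns the full 100000 bytes, so this should print 200000.

```shell
# Hypothetical scratch file standing in for "largefile" from the report.
largefile=$(mktemp)
head -c 10000000 /dev/zero > "$largefile"

# Same request as step 2, but dd opens the file itself instead of reading a pipe.
direct_size=$(dd if="$largefile" ibs=100000 count=2 2>/dev/null | wc -c | tr -d ' ')
echo "$direct_size"

rm -f "$largefile"
```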
It should be handled properly because any SIGPIPE is synchronous with actual
read/write and is totally predictable. I have been patching dd for the last 5 or
more years and using stat/S_ISFIFO() to modify dd's behavior.
But I am getting sick of reworking the patch every time the hackers totally
It is still a bug because dd used to run properly.
The Single UNIX Specification v3 says at
The processing order shall be as follows:
1. An input block is read.
2. If the input block is shorter than the specified input block size and the
sync conversion is specified, null bytes shall be appended to the input data up
to the specified size. (If either block or unblock is also specified, <space>s
shall be appended instead of null bytes.) The remaining conversions and output
shall include the pad characters as if they had been read from the input.
Paragraph (2) clearly acknowledges that a read can be short, and it says nowhere
that additional reads should accumulate the full input block. Such additional
reads might block for an indeterminate amount of time, which may break scripts
that do not expect this.
Your assumption stated here: "But in any case, when using ibs we are instructing
dd what the input block size is! So dd should obey our instructions" does not
seem to match what the standard says. The standard only says that reads shall be
_attempted_ with that size.
As I see it, changing dd behavior might make you happy, but at the same time
_other_ users will experience breakage ("hung" dd waiting for additional input
which will never come).
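A small sketch of the short read being discussed, assuming a producer that pauses between writes (slow_writer is a made-up helper, not part of dd). dd asks for 8 bytes, only 4 are in the pipe at that moment, and the single read returns with what is there, so this should print 4.

```shell
# Hypothetical slow producer: 4 bytes now, 4 more a second later.
slow_writer() { printf 'aaaa'; sleep 1; printf 'bbbb'; }

# dd requests 8 bytes but the pipe holds only 4, so the one read() comes back
# short and count=1 is used up by that partial record.
short_size=$(slow_writer 2>/dev/null | dd ibs=8 count=1 2>/dev/null | wc -c | tr -d ' ')
echo "$short_size"
```

Waiting for the second burst would give 8 bytes here, but it would also mean blocking for however long the producer takes.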
> It is still a bug because dd used to run properly.
Which version of dd worked for you?
Sorry, wrong link. The correct one is:
If dd is waiting for "additional input that will never come" then it is either
at the end of file (which would be treated normally and cause termination) or
there is more data to come (eventually). There is no other possibility.
In any case I am not concerned with what sync behavior does (pad), I am
concerned when sync is NOT specified. What does your standard say about that?
paragraph (2) is clearly talking about a condition where sync is specified -
The version that used to work for me was the version that existed 5 years ago.
As a solution to this problem (and the similar #449263), I proposed a patch to
the upstream developers. You can follow the discussion in the thread http://
I think this is not a bug. The count= option specifies the sum of full and
partial records (full = 100000B; partial = less than 100000B in this case).
This is documented dd behavior (standardized by POSIX) and cannot be changed.
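To see the full and partial record accounting, here is a sketch that reads dd's own status line. The bursts helper is hypothetical, and LC_ALL=C is only there to keep the message untranslated. Each 4-byte burst arrives as one partial record, so count=2 is reached after 8 bytes and the line should read "0+2 records in" (full+partial).

```shell
# Hypothetical producer: three 4-byte bursts, one second apart.
bursts() { printf 'aaaa'; sleep 1; printf 'bbbb'; sleep 1; printf 'cccc'; }

# Send dd's status (stderr) down the pipe and discard the copied data.
records_in=$(bursts 2>/dev/null | LC_ALL=C dd ibs=8 count=2 2>&1 >/dev/null | grep 'records in')
echo "$records_in"
```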
But there is already a solution in Fedora 9 and rawhide: the new dd option
iflag=fullblock, which turns on reading full blocks where possible.
The option iflag=fullblock is available since coreutils-6.10-28.fc9 and
coreutils-6.9-18.fc8 has been submitted as an update for Fedora 8
coreutils-6.9-18.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
Closing CURRENTRELEASE ... it looks like the automatic closing bot is too lazy to do that. The new option iflag=fullblock was added to address the issue in coreutils-6.9-18.fc8.
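For reference, a sketch of step 2 from the original report with the new flag, assuming a dd build that supports iflag=fullblock and using /dev/zero as a stand-in for "largefile". With the flag, dd keeps reading until each 100000-byte input block is complete (or end of input), so this should print 200000 on every run.

```shell
# Same shape as step 2 of the report; /dev/zero replaces the hypothetical largefile.
full_size=$(head -c 10000000 /dev/zero | cat \
  | dd ibs=100000 count=2 iflag=fullblock 2>/dev/null | wc -c | tr -d ' ')
echo "$full_size"
```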