431997 – dd reads random sizes of data

Summary: dd reads random sizes of data

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	coreutils
Sub Component:
Version:	8
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Ondrej Vasik
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	668247

Reported:	2008-02-08 10:29 UTC by JW
Modified:	2011-01-09 12:21 UTC
CC List:	2 users
Fixed In Version:	coreutils-6.9-18.fc8
Clone Of:
Clones:	668247
Environment:
Last Closed:	2008-08-19 11:54:02 UTC
Type:	---
Embargoed:
Dependent Products:

Description JW 2008-02-08 10:29:20 UTC

Description of problem:
When dd reads from a pipe it produces random results

Version-Release number of selected component (if applicable):
coreutils-6.9-9

How reproducible:
Always

Steps to Reproduce:
1. Find a large file, say 10MB
2. cat largefile | dd ibs=100000 count=2 >output1
3. repeat 2) several times recording size of output1
4. dd if=largefile | dd ibs=100000 count=2 >output2
5. repeat 4) several times recording size of output2
  
Actual results:
The size of output will vary.
Step 2: various sizes such as 65536, 139264, 143360.
Step 5: various sizes such as 6144, 7168, 6656.

Expected results:
size should always be 200000, and never anything else

Additional info:
The apparent block size of a pipe "device" should be transparent.
But in any case, when using ibs we are instructing dd what the input block size
is!  So dd should obey our instructions, and not use other block size.
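
For illustration only (not part of the original report): a minimal Python sketch of the underlying effect, assuming a pipe that holds less data than the reader asks for. A single read on a pipe returns whatever is available at that moment, so a request for 100000 bytes can come back short. The 1000-byte payload is an arbitrary choice for the example.

```python
import os

# Create a pipe and write fewer bytes than the reader will ask for.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"x" * 1000)

# Ask for 100000 bytes, the same request size as ibs=100000.
# The read returns as soon as some data is available: a short read.
chunk = os.read(read_fd, 100000)
short_read_size = len(chunk)
print(short_read_size)  # 1000, not 100000

os.close(read_fd)
os.close(write_fd)
```

In the reproduction steps the amount available per read depends on how fast the writer fills the pipe, which would explain why the output size differs from run to run.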
Interesting to note that a conceptually simple program like dd has been so
hacked around since v5.2 that it is a wonder it still works at all.

Comment 1 Ondrej Vasik 2008-02-08 23:31:28 UTC

I think this is similar to the report at:
http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg447197.html

Jim Meyering(upstream) wrote there:
"Thanks for the report, but that behavior is required by POSIX.
dd must handle SIGINT the way you want, but dd may not handle
SIGPIPE that way:

    ASYNCHRONOUS EVENTS

        For SIGINT, the dd utility shall interrupt its current processing,
        write status information to standard error, and exit as though
        terminated by SIGINT. It shall take the standard action for all
        other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
        (on page 280)."

The recommendation there was not to use unnecessary pipes with the dd command
(both step 2 and step 4 could be written as a single command without a pipe).
Closing as NOTABUG; see that mailing list thread for a better explanation.

There is a bit more explanation here:
http://lists.gnu.org/archive/html/bug-coreutils/2008-02/msg00000.html

Feel free to add comments if you are not satisfied with that explanation.
You could also discuss this behaviour on the upstream mailing list,
bug-coreutils.

Comment 2 JW 2008-02-08 23:42:55 UTC

It should be handled properly because any SIGPIPE is synchronous with actual
read/write and is totally predictable. I have been patching dd for the last 5 or
more years and using stat/S_ISFIFO() to modify dd's behavior.
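
For illustration only: the patch mentioned above is not attached to this report, so the following is just a sketch of how a program can detect that its input is a pipe with fstat and S_ISFIFO. The helper name is_fifo is hypothetical, and the example is written in Python rather than in dd's C.

```python
import os
import stat
import tempfile


def is_fifo(fd):
    """Return True if the file descriptor refers to a pipe or FIFO."""
    return stat.S_ISFIFO(os.fstat(fd).st_mode)


# A pipe is reported as a FIFO...
read_fd, write_fd = os.pipe()
pipe_is_fifo = is_fifo(read_fd)

# ...while a regular temporary file is not.
with tempfile.TemporaryFile() as f:
    file_is_fifo = is_fifo(f.fileno())

os.close(read_fd)
os.close(write_fd)
print(pipe_is_fifo, file_is_fifo)
```

A program could use such a check to choose a different read strategy for pipes; whether the patch did exactly that is not stated here.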

But I am getting sick of reworking the patch every time the hackers totally
reinvent dd.

It is still a bug because dd used to run properly.

Comment 3 Denys Vlasenko 2008-07-14 12:32:06 UTC

Single UNIX Specification v3 says at
http://www.opengroup.org/onlinepubs/009695399/ :

The processing order shall be as follows:

1. An input block is read.
2. If the input block is shorter than the specified input block size and the
sync conversion is specified, null bytes shall be appended to the input data up
to the specified size. (If either block or unblock is also specified, <space>s
shall be appended instead of null bytes.) The remaining conversions and output
shall include the pad characters as if they had been read from the input.
...

Paragraph (2) clearly acknowledges that a read can be short, and nowhere does it
say that additional reads should accumulate a full input block. Such additional
reads might block for an indeterminate amount of time, which may break scripts
that do not expect this.
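
For illustration only: a small Python model of the padding rule quoted in paragraph (2), assuming the sync conversion is in effect. The function name pad_block and its use_spaces parameter are hypothetical; use_spaces stands for the case where block or unblock is also specified.

```python
def pad_block(data: bytes, ibs: int, use_spaces: bool = False) -> bytes:
    """Pad a short input block up to ibs bytes, as described for conv=sync."""
    if len(data) >= ibs:
        return data  # a full block needs no padding
    pad = b" " if use_spaces else b"\0"
    return data + pad * (ibs - len(data))


padded_nul = pad_block(b"abc", 8)
padded_space = pad_block(b"abc", 8, use_spaces=True)
print(padded_nul, padded_space)
```

The point of the model is that padding only makes sense if a block can arrive shorter than the requested size in the first place.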

Your assumption stated here: "But in any case, when using ibs we are instructing
dd what the input block size is! So dd should obey our instructions" does not
seem to match what the standard says. The standard only says that reads shall be
_attempted_ with that size.

As I see it, changing dd behavior might make you happy, but at the same time
_other_ users will experience breakage ("hung" dd waiting for additional input
which will never come).

> It is still a bug because dd used to run properly.

Which version of dd worked for you?

Comment 4 Denys Vlasenko 2008-07-14 12:35:13 UTC

Sorry, wrong link. The correct one is:

http://www.opengroup.org/onlinepubs/009695399/utilities/dd.html

Comment 5 JW 2008-07-14 13:37:31 UTC

If dd is waiting for "additional input that will never come" then it is either
at the end of file (which would be treated normally and cause termination) or
there is more data to come (eventually).  There is no other possibility.
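
For illustration only: a Python sketch of the read loop this argument implies, assuming the reader keeps asking until it has a full block or sees end of file. The name read_full is hypothetical. In this example the writer closes the pipe, so the loop ends on EOF; if a writer kept the pipe open without sending data, the same loop would wait, which is the blocking concern raised in Comment 3.

```python
import os


def read_full(fd, size):
    """Keep reading until size bytes are collected or EOF is reached."""
    parts = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # an empty result means end of file
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


read_fd, write_fd = os.pipe()
os.write(write_fd, b"a" * 300)
os.write(write_fd, b"b" * 200)
os.close(write_fd)  # the writer is done, so the reader will see EOF

block = read_full(read_fd, 1000)
os.close(read_fd)
print(len(block))  # 500: everything that was written, then EOF
```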

In any case I am not concerned with what sync behavior does (pad), I am
concerned when sync is NOT specified.  What does your standard say about that? 

paragraph (2) is clearly talking about a condition where sync is specified -
nothing else.

The version that used to work for me was the version that existed 5 years ago.

Comment 6 Kamil Dudka 2008-07-18 07:10:55 UTC

As a solution to this problem (and the similar bug #449263) I proposed a patch
to the upstream developers. You can follow the discussion in the thread
http://lists.gnu.org/archive/html/bug-coreutils/2008-07/msg00118.html

Comment 7 Kamil Dudka 2008-07-28 10:52:03 UTC

I think this is not a bug. The count= option specifies the sum of full and
partial records (full = 100000 B; partial = less than 100000 B in this case).

This is documented dd behavior (standardized by POSIX) and cannot be changed.
But there is already a solution in Fedora 9 and rawhide: the new dd option
iflag=fullblock, which turns on reading full blocks where possible.

The option iflag=fullblock is available since coreutils-6.10-28.fc9 and 
coreutils-6.12-7.fc10.
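
For illustration only: a Python model of the count semantics described above. It is not dd's source code. FakePipe and dd_copy are hypothetical names, and the 64 KiB chunk size is an assumed delivery pattern chosen for the example. Each read counts as one record whether it is full or partial, unless the fullblock behavior is switched on.

```python
class FakePipe:
    """Stand-in for a pipe: each read returns at most one queued chunk."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def read(self, size):
        if not self.chunks:
            return b""  # EOF
        head = self.chunks[0]
        data, rest = head[:size], head[size:]
        if rest:
            self.chunks[0] = rest
        else:
            self.chunks.pop(0)
        return data


def dd_copy(src, ibs, count, fullblock=False):
    """Copy up to `count` input records of at most `ibs` bytes each."""
    out = bytearray()
    for _ in range(count):
        block = src.read(ibs)
        # With fullblock, keep reading until the block is full or EOF.
        while fullblock and block and len(block) < ibs:
            more = src.read(ibs - len(block))
            if not more:
                break
            block += more
        if not block:
            break  # EOF before `count` records were read
        out += block
    return bytes(out)


chunks = [b"x" * 65536] * 4  # 262144 bytes arriving in 64 KiB pieces
plain = len(dd_copy(FakePipe(chunks), 100000, 2))
full = len(dd_copy(FakePipe(chunks), 100000, 2, fullblock=True))
print(plain, full)  # 131072 200000
```

Without fullblock, two partial records of 65536 bytes satisfy count=2; with it, each record is filled to 100000 bytes first, which gives the 200000 bytes the reporter expected.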

Comment 8 Fedora Update System 2008-08-07 14:09:32 UTC

coreutils-6.9-18.fc8 has been submitted as an update for Fedora 8

Comment 9 Fedora Update System 2008-08-12 18:23:27 UTC

coreutils-6.9-18.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Ondrej Vasik 2008-08-19 11:54:02 UTC

Closing CURRENTRELEASE ... it looks like the automatic closing bot is too lazy to do that. The new option iflag=fullblock was added in coreutils-6.9-18.fc8 to address the issue.
