Bug 556338 - Allow hex input
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 12
Hardware: All Linux
Priority: low Severity: medium
Assigned To: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Reported: 2010-01-17 19:30 EST by JW
Modified: 2010-01-20 06:12 EST
Doc Type: Bug Fix
Last Closed: 2010-01-20 06:12:34 EST
Description JW 2010-01-17 19:30:03 EST
Description of problem:
By default, dd accepts integer input only as decimal values.  Allowing hex input, while still defaulting to decimal, would make dd much easier to use in many circumstances.

Version-Release number of selected component (if applicable):
coreutils-7.6-8

How reproducible:
Always

Steps to Reproduce:
1. dd ibs=0x100

Actual results:
1. dd: invalid number `0x100'

Expected results:
1. no error

Additional info:
There is a single line in function parse_integer() that needs to change:
-  enum strtol_error e = xstrtoumax (str, &suffix, 10, &n, "bcEGkKMPTwYZ0");
+  enum strtol_error e = xstrtoumax (str, &suffix, 0, &n, "bcEGkKMPTwYZ0");

It is really that simple.
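
What a base argument of 0 means for parsing can be approximated in the shell, because printf's %d conversion reads its argument as a C-style integer constant, much as a base-0 strtoumax-style call does. The following is only a sketch of that parsing rule, not of dd itself, and the helper name parse_base0 is made up for illustration. Note that under this rule a leading zero selects octal, so operands written with leading zeros would also change value.

```shell
# Hypothetical helper: parse a number the way a base-0, strtoumax-style
# call would (0x prefix = hex, leading 0 = octal, otherwise decimal).
parse_base0() {
    printf '%d\n' "$1"
}

parse_base0 0x100   # hexadecimal: prints 256
parse_base0 100     # decimal: prints 100
parse_base0 0100    # octal, because of the leading zero: prints 64
```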

This change is backward-compatible because entry of a hex number currently causes a syntax error and therefore no legacy scripts will break.
Comment 1 Pádraig Brady 2010-01-18 04:47:48 EST
Well it's a simple code change, but the implications aren't so benign.
If scripts become dependent on this, then they'll not be portable to older and/or different systems. Probably the correct procedure for this would be to get POSIX to specify it, after which we could change. Anyway, it's trivial to get this behaviour in a POSIX-compliant manner with slightly more syntax:

dd ibs=$((0x100))
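
A minimal sketch of that workaround, assuming a shell whose arithmetic expansion accepts C-style hexadecimal constants. The helper name hex_to_dec is hypothetical, and the final line only prints the dd command it would build rather than running it.

```shell
# Hypothetical helper: let the shell turn a hex constant into decimal
# before the value ever reaches dd.
hex_to_dec() {
    # $1 is substituted first, so $(( 0x100 )) is what gets evaluated.
    printf '%s\n' "$(( $1 ))"
}

ibs=$(hex_to_dec 0x100)
echo "dd ibs=$ibs"      # prints: dd ibs=256
```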
Comment 2 JW 2010-01-18 05:28:36 EST
(In reply to comment #1)

> dd ibs=$((0x100))    

I get syntax errors with that.

There obviously isn't a quick and simple way to achieve hex input because it depends on your shell.

Perhaps, since you mention the issue of being "portable to older systems", you should have suggested:

    dd ibs=`(echo ibase=16; echo 100) | bc`

However, not all bc's use ibase, some use ib. So I guess backwards compatibility never stopped bc from changing its syntax.

Syntax is even worse using dc.

Anyhow I always patch my version of bc and I have been doing so for about 7 years now.  If you want progress you have to change.  If you want total backward compatibility then we should all have stayed living in caves.
Comment 3 JW 2010-01-18 05:29:57 EST
(In reply to comment #2)

> Anyhow I always patch my version of bc ...

Anyhow I always patch my version of 'dd' ...
Comment 4 Pádraig Brady 2010-01-18 05:42:45 EST
I would move 0x internal to dd rather than require bc.
But what /bin/sh variant are you using that doesn't support $((0x100)) ?
/bin/sh on solaris does not support it I know, but that's not saying much.
Comment 5 JW 2010-01-18 05:52:13 EST
(In reply to comment #4)

> But what /bin/sh variant are you using that doesn't support $((0x100)) ?

Doesn't matter what I use ... you have to think what everyone else might be using.
Comment 6 JW 2010-01-19 21:18:53 EST
(In reply to comment #1)
> Well it's a simple code change, but the implications aren't so benign.
> If scripts become dependent on this, then they'll not be portable to older
> and/or different systems. Probably the correct procedure for this would be to
> get POSIX to specify it after which then we could change.

Your argument is a furphy.

Didn't somebody once extend the input syntax for dd so that multiplier suffixes were permitted (eg. K, M, G)?

So wouldn't using 'dd bs=1M ...' break portability of scripts to older systems too?

Perhaps it should really be left entirely to users of a utility to decide whether their use of a new feature might break portability of their script to an older system.  But in any case if they did encounter such a situation they are probably more likely to install a more up-to-date version of 'dd' than adjust their script.

Let's move forward and make things better, rather than doggedly try to keep utilities in the dark ages for some fictitious reverse 'compatibility' argument.

dd will still be backward compatible, but the new feature won't be.  But that happens all of the time.  This patch would not break backward compatibility.  The only break would occur if a new script uses the new feature and the author of the new script still wanted backward compatibility.  In which case he could simply avoid using the new feature.
Comment 7 Ondrej Vasik 2010-01-20 01:42:13 EST
I'll add Jim (other upstream maintainer of coreutils) to CC as well. Let's see his opinion.

JW: It would be better to discuss such feature requests on the upstream list - bug-coreutils@gnu.org - it has a bigger audience. Of course - only if the feature is not related to Red Hat / Fedora patches.
Comment 8 Jim Meyering 2010-01-20 02:27:41 EST
Thanks for the suggestion, but that would be an incompatible change,
since "x" already has a documented meaning in some of those contexts.
Note the "xM" example in the documentation:

       The numeric-valued strings above (BYTES and BLOCKS) can be followed
    by a multiplier: `b'=512, `c'=1, `w'=2, `xM'=M, or any of the standard
    block size suffixes like `k'=1024 (*note Block size::).

Here's an example to illustrate that you can use "MxN" for any
numbers M and N:

    $ printf 123456789 | dd count=2x2 bs=1 2>/dev/null; echo
    1234
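
The arithmetic behind that example (count=2x2 with bs=1 copies 2 * 2 = 4 bytes) can be sketched in the shell. The helper dd_product is hypothetical and only mimics the documented "MxN" reading for plain decimal factors. It does not handle suffixes such as k or b, or factors with leading zeros.

```shell
# Hypothetical helper: evaluate an "MxN" operand as the product of its
# decimal factors, as the documentation quoted above describes.
dd_product() {
    product=1
    old_ifs=$IFS
    IFS=x
    set -- $1               # split the operand on every 'x'
    IFS=$old_ifs
    for factor in "$@"; do
        product=$(( product * factor ))
    done
    printf '%s\n' "$product"
}

dd_product 2x2      # prints 4, matching the four bytes copied above
dd_product 0x10     # prints 0, i.e. zero times ten
```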

Here's one of the previous threads where this was proposed:

    http://lists.gnu.org/archive/html/bug-coreutils/2006-03/msg00107.html
Comment 9 JW 2010-01-20 02:42:42 EST
(In reply to comment #8)
> Thanks for the suggestion, but that would be an incompatible change,
> since "x" already has a documented meaning in some of those contexts.

If you know anything about context and parsing you would understand that:
1) The proposal is for a "0x" prefix to a number, and not "x".
2) That nobody will ever use 0xN meaning to multiply N by 0.
3) That 1000xN is unambiguous and does not imply any hex.

> Note the "xM" example in the documentation:
> 
>        The numeric-valued strings above (BYTES and BLOCKS) can be followed
>     by a multiplier: `b'=512, `c'=1, `w'=2, `xM'=M, or any of the standard
>     block size suffixes like `k'=1024 (*note Block size::).
> 
> Here's an example to illustrate that you can use "MxN" for any
> numbers M and N:
> 
>     $ printf 123456789 | dd count=2x2 bs=1 2>/dev/null; echo
>     1234
> 

There is no compatibility problem!

Show me an example with ambiguous interpretation, where the use of "x" as a multiplier and "0x" as a hexadecimal prefix cannot be discerned.

I know that there is no problem because I have been using the patch for 7 years now and it simply does not conflict with anything.

Try it yourself!

And let me know when you find some real (and not imagined) problem.
Comment 10 Jim Meyering 2010-01-20 03:54:13 EST
(In reply to comment #9)
...
> If you know anything about context and parsing you would understand that:
> 1) The proposal is for a "0x" prefix to a number, and not "x".
> 2) That nobody will ever use 0xN meaning to multiple N by 0.

I suspect that 0xN would be very unusual with literal command-line use,
but consider a script with code like this:

    n=$(some_function)
    dd count=${n}x$some_factor ...

Currently, the script handles the n=0 case as documented.
With the proposed incompatible change, it could destroy someone's data.

That is why I will not make such a change.
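
The two readings of such an operand can be compared with a short sketch. The values n=0 and some_factor=10 are assumptions chosen for illustration, and the shell's own arithmetic stands in for dd's parser here, so this shows the numeric ambiguity only, not dd's actual behaviour.

```shell
# Assumed example values: the function returned 0, the factor is 10.
n=0
some_factor=10

operand="${n}x${some_factor}"        # the string the script passes: 0x10

as_product=$(( n * some_factor ))    # documented "MxN" reading: 0 times 10
as_hex=$(( $operand ))               # hex-constant reading of the same text

echo "operand=$operand product=$as_product hex=$as_hex"
# prints: operand=0x10 product=0 hex=16
```

Under the documented reading the count is 0 blocks; under a hex reading the same text would mean 16 blocks.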
Comment 11 JW 2010-01-20 04:20:44 EST
(In reply to comment #10)

> I suspect that 0xN would be very unusual with literal command-line use,
> but consider a script with code like this:
> 
>     n=$(some_function)
>     dd count=${n}x$some_factor ...
> 

That is very perverse.  It really is scraping the bottom of the probability barrel!

Anyone who:

a) cannot use $(($(some_function) * $some_factor))
b) cannot use [ "$n" -gt 0 ]
c) uses the output of some arbitrary function in a calculation to dd without any constraints or checks
d) cannot use dd count=0${n}x$some_factor instead
e) cannot use dd count='$n x $some_factor' instead

shouldn't be allowed to write scripts.

Of course, any problem largely originates with whoever decided that 'x' should be a multiplier symbol rather than the very universal '*'.  Every calculator I can think of uses '*', and none of them uses 'x'.  In fact the use of 'x' should be considered a very serious bug.

But more importantly, who on earth thought that it was necessary to add arithmetic expressions to the internal processing of args when clearly most shells have better ways of doing it, and for any arbitrary program too?

What next?  Will we see the addition of internally evaluated arithmetic expressions to cut, date, gzip, nice, sleep, etc?

It is one thing to allow a range of constant bases but quite another to go against the *nix concept of simple and specific programs and start turning every program into a glorified calculator.

I seriously doubt anyhow that many people would be so stupid as to use dd's xM feature to any extent.  The shell does it better.  Nobody is really going to suddenly break years of habit in using a shell's expression syntax and decide that just for this program (dd) they will use dd's inbuilt (and very very lame) expression evaluation.

I look forward to your decision to break from your extremely defensive attitude and seriously consider what really is the right answer in this situation.  Maintain a stupid multiplier syntax which nobody would really use, or move forward with a simple adjustment to the constant parsing.
Comment 12 Jim Meyering 2010-01-20 04:44:58 EST
> That is very perverse.  It really is scraping the bottom of the probability
> barrel!

Anyone whose data is destroyed as a result of this low-probability failure would not be consoled.

> Anyone who:
>
> a) cannot use $(($(some_function) * $some_factor))

You said that $((...)) evokes a syntax error from your shell.

> b) cannot use [ "$n" -gt 0 ]

No need.  count=0x$anything is currently well defined: count=0
I have scripts that use dd with "count=0".

> c) uses the output of some arbitrary function in a calculation to dd without
> any constraints or checks

No need to assume it's arbitrary.
Let's assume the function returns 0 or 1.

> d) cannot use dd count=0${n}x$some_factor instead

Adding the leading zero changes nothing.

> e) cannot use dd count='$n x $some_factor' instead

That is invalid syntax.  dd would reject it as an invalid number.

> shouldn't be allowed to write scripts.

The real risk is more with legacy scripts than with new ones,
but considering it is a documented feature, it seems unfair
to deride a developer for using it -- assuming they are assured
to be using GNU dd.

GNU dd was written about 20 years ago, and has had support for that MxN
syntax from the beginning.  I suspect it was considered useful because
shells with $((...)) support had not yet been invented.  Thus, any script
written back then, or, more recently, to be portable to older shells
could have used that documented syntax without appearing to be clueless.
Comment 13 Ondrej Vasik 2010-01-20 04:52:43 EST
Based on the upstream reaction, closing as WONTFIX.
Comment 14 JW 2010-01-20 04:58:29 EST
(In reply to comment #12)
> 
> No need.  count=0x$anything is currently well defined: count=0
> I have scripts that use dd with "count=0".

Why?

> > d) cannot use dd count=0${n}x$some_factor instead
> 
> Adding the leading zero changes nothing.

Well, that just goes to show that you are really just guessing.

The parsing of '00x$some_factor' reduces to '00' followed by 'x$some_factor'
whereas a single '0' as in '0x$some_factor' could be interpreted as a single
hex constant token.
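
The distinction drawn here can be sketched as a pattern match on the start of the operand. This is a hypothetical illustration of how a base-0 number parser would be expected to treat the leading characters, not code taken from dd, and the helper name leading_token is made up.

```shell
# Hypothetical sketch: classify a dd operand by how a base-0 number
# parser would consume its leading characters.
leading_token() {
    case $1 in
        # "0x" followed by a hex digit is taken as one hex constant.
        0[xX][0-9a-fA-F]*) echo "hex constant" ;;
        # Otherwise the number ends before the 'x', which is left over
        # as the multiplier separator (e.g. 00x10 or 2x2).
        *x*) echo "number followed by x multiplier" ;;
        *)   echo "plain number" ;;
    esac
}

leading_token 0x10      # prints: hex constant
leading_token 00x10     # prints: number followed by x multiplier
leading_token 2x2       # prints: number followed by x multiplier
```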

You really are taking an overly defensive attitude when you allow it
to cloud your thinking to such a degree!

> 
> > e) cannot use dd count='$n x $some_factor' instead
> 
> That is invalid syntax.  dd would reject it as an invalid number.
>

That just goes to show how stupid the 'xM' addition was. It doesn't
even handle white space properly in its expression evaluation!
 
> GNU dd was written about 20 years ago, and has had support for that MxN
> syntax from the beginning.  I suspect it was considered useful because
> shells with $((...)) support had not yet been invented.  Thus, any script
> written back then, or, more recently, to be portable to older shells
> could have used that documented syntax without appearing to be clueless.    

What a load of bollocks.  You obviously have heard of 'expr'.  The authors wouldn't have added 'xM' for lack of $((...)) support, because 'expr' has been around for an eternity.  Any older scripts would most likely have used 'expr' and not that silly 'xM' feature.
Comment 15 Jim Meyering 2010-01-20 05:45:15 EST
(In reply to comment #14)
> (In reply to comment #12)
> > 
> > No need.  count=0x$anything is currently well defined: count=0
> > I have scripts that use dd with "count=0".
> 
> Why?

Take a look at the 20+ uses in coreutils' test suite.

You're welcome to re-open this bug if you find a way to ensure that the proposed change can result in no ill effect for existing scripts.
Comment 16 JW 2010-01-20 06:09:51 EST
(In reply to comment #15)
>
> Take a look at the 20+ uses in coreutils' test suite.
>

There is not one instance of 'xM'.

So this marvellous 'xM' feature is not even tested by the test suite.  Which means that it could be omitted and nobody would care. If it was important then surely the test scripts would test that it still works.

> You're welcome to re-open this bug if you find a way to ensure that the
> proposed change can result in no ill effect for existing scripts.    

Yes, drop the xM (or change the 'x' to '*').  That will have no ill effect on existing scripts.

Alternatively, insist upon a '0X' prefix.  Yes, that would involve a tedious shift key press, and would require remembering that, unlike everywhere else, an uppercase 'X' is required, but it would work.  But you would have to rewrite the strtoul() too or do a late conversion of 'X' to 'x'.
