Bug 1048324 - Reference to perl (5.18.1-288.fc20.x86_64) utf8 string is Invalid Argument to open
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: perl
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jitka Plesnikova
QA Contact: Fedora Extras Quality Assurance
URL: https://rt.perl.org//Public/Bug/Displ...
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-03 17:58 UTC by Ross Tyler
Modified: 2014-01-09 05:09 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-01-07 14:33:51 UTC
Type: Bug
Embargoed:


Attachments
Script that demonstrates the bug. (93 bytes, application/x-perl)
2014-01-03 17:58 UTC, Ross Tyler

Description Ross Tyler 2014-01-03 17:58:22 UTC
Created attachment 845034
Script that demonstrates the bug.

Description of problem:
#!/usr/bin/perl
my $string = qq{\x{2019}};
open(STRING , '<', \$string) or die "$!: string";

Version-Release number of selected component (if applicable):
perl-5.18.1-288.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run the above perl script (say, bug.pl)

Actual results:
Invalid argument: string at ./bug.pl line 3.

Expected results:
No output (the open succeeds).

Additional info:
http://perldoc.perl.org/perlfaq5.html#How-can-I-open-a-filehandle-to-a-string?
This used to work.
Native 8-bit encodings continue to work.
The older IO::Scalar approach can be used as a workaround.

Comment 1 Petr Pisar 2014-01-07 14:33:51 UTC
The cause becomes clear when the code is run with warnings enabled:

# perl -we 'my $s = qq{\x{2019}}; open(my $f, q{<}, \$s) or die $!' 2>&1 | splain
Strings with code points over 0xFF may not be mapped into in-memory file
        handles (#1)
    (W utf8) You tried to open a reference to a scalar for read or append
    where the scalar contained code points over 0xFF.  In-memory files
    model on-disk files and can only contain bytes.

This is the result of

commit b38d579d7e4fdb6e4abade72630ea777d8c509d9
Author: Tony Cook <tony>
Date:   Fri Jan 25 09:56:01 2013 +1100

    handle reading from a SVf_UTF8 scalar
    
    if the scalar can be downgradable, it is downgraded and the read succeeds.
    
    Otherwise the read fails, producing a warning if enabled and setting
    errno/$! to EINVAL.

which comes from perl bug report <https://rt.perl.org//Public/Bug/Display.html?id=109828>.

The overall conclusion is that a file always consists of bytes.

If you don't agree, please open a request with upstream <https://rt.perl.org/Public/>.
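
The two cases in the commit message quoted above can be checked with a small script. This is only a sketch, assuming perl 5.18 or later; the helper name try_open is made up for illustration, and the utf8 warning is switched off so that only the return status is shown. A string that has the internal UTF8 flag set but holds only code points up to 0xFF should still open, while one holding U+2019 should fail with EINVAL.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether an in-memory open for reading succeeds.
sub try_open {
    my ($string) = @_;
    no warnings 'utf8';    # suppress the "code points over 0xFF" warning here
    if (open(my $fh, '<', \$string)) {
        close $fh;
        return 'ok';
    }
    # %! maps errno names to true/false for the current value of $!
    return $!{EINVAL} ? 'EINVAL' : "other: $!";
}

my $latin1 = "\x{e9}";
utf8::upgrade($latin1);    # sets the UTF8 flag, but the string stays downgradable
my $wide = "\x{2019}";     # code point above 0xFF, cannot be downgraded

print "downgradable: ", try_open($latin1), "\n";
print "wide:         ", try_open($wide),   "\n";
```

On older perls, such as the one the reporter remembers working, the second case may behave differently.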

Comment 2 Ross Tyler 2014-01-09 05:09:34 UTC
Thanks for the explanation and link.

For the benefit of others, what is wrong with:

perl -we 'my $s = qq{\x{2019}}; open(my $f, q{<}, \$s) or die $!'

is that it makes assumptions about Perl's internal string representation that the language does not guarantee.

Instead, an explicit encoding/decoding must be done:

perl -we 'use Encode; my $s = encode(q{utf8}, qq{\x{2019}}); open(my $f, q{<:utf8}, \$s) or die $!'
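
A slightly longer sketch of the same round trip, written as a script so the result can be inspected. It swaps in the stricter 'UTF-8' encoding name and the :encoding(UTF-8) layer in place of the utf8 spellings used in the one-liner; that is a choice made for this example, not something the report requires. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{2019}\n";              # one wide character plus a newline
my $bytes = encode('UTF-8', $text);    # explicit encoding: characters -> bytes

# The in-memory file now holds only bytes; the layer decodes them on read.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "$!: string";
my $line = <$fh>;
close $fh;

printf "bytes=%d chars=%d round-trip=%s\n",
    length($bytes), length($line), ($line eq $text ? 'yes' : 'no');
```

The scalar handed to open contains four bytes (three for U+2019, one for the newline), and the line read back is the original two-character string.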

