Bug 1048324

Summary: Reference to perl (5.18.1-288.fc20.x86_64) utf8 string is Invalid Argument to open
Product: [Fedora] Fedora
Reporter: Ross Tyler <rossetyler>
Component: perl
Assignee: Jitka Plesnikova <jplesnik>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 20
CC: cweyl, iarnell, jplesnik, kasal, perl-devel, ppisar, psabata, rc040203, tcallawa
Hardware: x86_64   
OS: Linux   
URL: https://rt.perl.org//Public/Bug/Display.html?id=109828
Doc Type: Bug Fix
Last Closed: 2014-01-07 14:33:51 UTC
Type: Bug
Attachments:
Script that demonstrates the bug. (Flags: none)

Description Ross Tyler 2014-01-03 17:58:22 UTC
Created attachment 845034
Script that demonstrates the bug.

Description of problem:
#!/usr/bin/perl
my $string = qq{\x{2019}};
open(STRING , '<', \$string) or die "$!: string";

Version-Release number of selected component (if applicable):
perl-5.18.1-288.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Save the Perl script above as bug.pl and run it as ./bug.pl.

Actual results:
Invalid argument: string at ./bug.pl line 3.

Expected results:
No output; the open succeeds.

Additional info:
http://perldoc.perl.org/perlfaq5.html#How-can-I-open-a-filehandle-to-a-string?
This used to work!
Strings in native 8-bit encodings still work.
Workaround: use the older IO::Scalar module instead.

Comment 1 Petr Pisar 2014-01-07 14:33:51 UTC
The cause becomes clear when the code is run with warnings enabled:

# perl -we 'my $s = qq{\x{2019}}; open(my $f, q{<}, \$s) or die $!' 2>&1 | splain
Strings with code points over 0xFF may not be mapped into in-memory file
        handles (#1)
    (W utf8) You tried to open a reference to a scalar for read or append
    where the scalar contained code points over 0xFF.  In-memory files
    model on-disk files and can only contain bytes.

This is the result of

commit b38d579d7e4fdb6e4abade72630ea777d8c509d9
Author: Tony Cook <tony>
Date:   Fri Jan 25 09:56:01 2013 +1100

    handle reading from a SVf_UTF8 scalar
    
    if the scalar can be downgradable, it is downgraded and the read succeeds.
    
    Otherwise the read fails, producing a warning if enabled and setting
    errno/$! to EINVAL.

which originated from Perl bug report <https://rt.perl.org//Public/Bug/Display.html?id=109828>.

The overall conclusion is that a file, including an in-memory file, always consists of bytes.
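The behaviour described in the commit message can be illustrated with a small script. This is a sketch, not part of the original report, and the variable names are made up for the example. It assumes perl 5.18 or later, where, as in the original report, the failure surfaces at open time: a string holding only code points up to 0xFF still opens even when utf8::upgrade has switched on its internal UTF-8 flag, while a string containing U+2019 cannot be downgraded and the open fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EINVAL);

# Code points <= 0xFF only, but stored with the internal UTF-8 flag on.
my $latin1 = "caf\x{e9}";
utf8::upgrade($latin1);

# Contains U+2019 (RIGHT SINGLE QUOTATION MARK), which has no single-byte form.
my $wide = "\x{2019}";

# Downgradable: per the commit message the scalar is downgraded and read as bytes.
my $latin1_ok   = open(my $fh1, '<', \$latin1);
my $latin1_read = $latin1_ok ? do { local $/; <$fh1> } : '';

# Not downgradable: open fails, warns under "use warnings" and sets $! to EINVAL.
my $wide_ok = open(my $fh2, '<', \$wide);
my ($wide_errno, $wide_error) = $wide_ok ? (0, '') : ($! + 0, "$!");

print "downgradable string: read ", length($latin1_read), " bytes\n";
print "wide string: open failed: $wide_error\n" unless $wide_ok;
```

Run with warnings enabled, the second open should also print the "Strings with code points over 0xFF may not be mapped into in-memory file handles" warning quoted above.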

If you disagree, please open a request upstream <https://rt.perl.org/Public/>.

Comment 2 Ross Tyler 2014-01-09 05:09:34 UTC
Thanks for the explanation and link.

For the benefit of others, what is wrong with:

perl -we 'my $s = qq{\x{2019}}; open(my $f, q{<}, \$s) or die $!'

is that it makes assumptions about Perl's internal string representation that the language does not guarantee.

Instead, the string must be explicitly encoded to bytes, and those bytes decoded again when they are read:

perl -we 'use Encode; my $s = encode(q{utf8}, qq{\x{2019}}); open(my $f, q{<:utf8}, \$s) or die $!'
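The same workaround written out as a standalone script might look like the sketch below. It swaps in the strict UTF-8 encoding name and the :encoding(UTF-8) layer for the lax utf8 / :utf8 pair used in the one-liner; both handle this input. The sample text is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# The character string we actually want to read from (contains U+2019).
my $text = "\x{2019} quoted\n";

# In-memory files hold bytes, so encode the characters to UTF-8 first.
my $bytes = encode('UTF-8', $text);

# Open the byte string and let a PerlIO layer decode it back into characters.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
my $line = <$fh>;
close($fh);

# Print lengths only, to avoid a "Wide character" warning on STDOUT.
print length($bytes), " bytes decoded to ", length($line), " characters\n";
```

Here the in-memory file holds the 11 bytes produced by encode (U+2019 takes three bytes in UTF-8), and the layer turns them back into the original 9 characters.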