Bug 1180035

Summary: regex: sub_match does not form half open interval
Product: Red Hat Enterprise Linux 7
Reporter: Miroslav Franc <mfranc>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: qe-baseos-tools-bugs
Severity: medium
Priority: unspecified
Version: 7.1
CC: fche, fweimer, jwakely, mfranc, mpolacek, ohudlick
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-01-08 08:07:49 UTC
Type: Bug

Description Miroslav Franc 2015-01-08 07:21:59 UTC
Description of problem:

When extracting regex submatches with the regex_match function, I would expect the first iterator of every sub_match to point to the start of the matched sequence and the second one to point one element past its end.  This does not seem to be the case: the first iterator points one element before the start, so the iterators form an open interval instead of a half-open one.


--- regex.cc ---
#include <cstddef>
#include <regex>
#include <string>
#include <iostream>

using namespace std;

int main ()
{ 
  const regex r ("xxx(y*)xxx(y*)xxx(y*)xxx");
  string s;
  smatch m;

  getline (cin, s);

  if (regex_match (s, m, r))
    { 
      for (unsigned int i = 1; i < m.size (); ++i)
        { 
          const size_t b = m[i].first - s.begin ();
          const size_t e = m[i].second - s.begin ();
          string markers (s.size (), ' ');
          markers[b] = '^'; markers[e] = '^';

          cout << "submatch: " << m[i] << '\n';
          cout << "length: " << m[i].length () << '\n';
          cout << "interval: [ " << b << ", " << e << " )\n";
          cout << "first iterator points to: " << *m[i].first << '\n';
          cout << s << '\n';
          cout << markers << endl;
        }
    }
}
--- ---  --- ---


Version-Release number of selected component (if applicable):
libstdc++-4.8.3-9.el7


Steps to Reproduce:
1. g++ -O2 -std=c++11 -g regex.cc -o regex
2. ./regex <<<xxxyyyxxxyxxxxxx


Actual results:
submatch: xyyy
length: 4
interval: [ 2, 6 )
first iterator points to: x
xxxyyyxxxyxxxxxx
  ^   ^
submatch: xy
length: 2
interval: [ 8, 10 )
first iterator points to: x
xxxyyyxxxyxxxxxx
        ^ ^
submatch: x
length: 1
interval: [ 12, 13 )
first iterator points to: x
xxxyyyxxxyxxxxxx
            ^^


Expected results:
submatch: yyy
length: 3
interval: [ 3, 6 )
first iterator points to: y
xxxyyyxxxyxxxxxx
   ^  ^         
submatch: y
length: 1
interval: [ 9, 10 )
first iterator points to: y
xxxyyyxxxyxxxxxx
         ^^     
submatch: 
length: 0
interval: [ 13, 13 )
first iterator points to: x
xxxyyyxxxyxxxxxx
             ^  
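
The expected results above can also be expressed as offsets that a test can compare directly.  The following is only a minimal sketch, not part of the original reproducer: the helper name submatch_intervals is hypothetical, and it assumes a standard library with a working <regex>.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the report): returns the [first, second)
// offsets of every capture group, or an empty vector if s does not match r.
std::vector<std::pair<std::size_t, std::size_t>>
submatch_intervals (const std::string &s, const std::regex &r)
{
  std::vector<std::pair<std::size_t, std::size_t>> out;
  std::smatch m;

  if (std::regex_match (s, m, r))
    {
      // m[0] is the whole match; capture groups start at index 1.
      for (std::size_t i = 1; i < m.size (); ++i)
        {
          const std::size_t b = static_cast<std::size_t> (m[i].first - s.begin ());
          const std::size_t e = static_cast<std::size_t> (m[i].second - s.begin ());
          out.emplace_back (b, e);
        }
    }
  return out;
}
```

For the input from the steps to reproduce, a conforming implementation should yield the pairs (3, 6), (9, 10) and (13, 13), where the last pair is the empty submatch.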


Additional info:

Comment 1 Jakub Jelinek 2015-01-08 07:50:32 UTC
Are you really talking about the system libstdc++ here?  In GCC 4.8-RH, <regex> is not implemented; there are some headers, and perhaps some things compile, but that is about it (at least that is my understanding):
https://gcc.gnu.org/onlinedocs/gcc-4.8.4/libstdc++/manual/manual/status.html
says:
Section  Description                                     Status   Comments
28       Regular expressions
28.1     General                                         N
28.2     Definitions                                     N
28.3     Requirements                                    N
28.4     Header <regex> synopsis                         N
28.5     Namespace std::regex_constants                  Y
28.6     Class regex_error                               Y
28.7     Class template regex_traits                     Partial
28.8     Class template basic_regex                      Partial
28.9     Class template sub_match                        Partial
28.10    Class template match_results                    Partial
28.11    Regular expression algorithms                   N
28.12    Regular expression Iterators                    N
28.13    Modified ECMAScript regular expression grammar  N

For a usable <regex> you need GCC 4.9 or later, which has:
Section  Description                                     Status   Comments
28       Regular expressions
28.1     General                                         Y
28.2     Definitions                                     Y
28.3     Requirements                                    Y
28.4     Header <regex> synopsis                         Y
28.5     Namespace std::regex_constants                  Y
28.6     Class regex_error                               Y
28.7     Class template regex_traits                     Partial  transform_primary is not correctly implemented
28.8     Class template basic_regex                      Y
28.9     Class template sub_match                        Y
28.10    Class template match_results                    Y
28.11    Regular expression algorithms                   Y
28.12    Regular expression Iterators                    Y
28.13    Modified ECMAScript regular expression grammar  Y
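
Since the GCC 4.8 headers are present and some code compiles against them, a program that may be built with an unknown libstdc++ could probe <regex> at run time before relying on it.  The following is only a rough sketch: the function name regex_looks_usable is hypothetical, and it rests on the assumption that an incomplete implementation either throws std::regex_error or reports wrong submatch offsets for this small case, which has not been verified against every affected version.

```cpp
#include <regex>
#include <string>

// Hypothetical probe (not from the report): returns true if a small,
// known pattern matches and its capture group is reported as the
// half-open interval [3, 4).
bool regex_looks_usable ()
{
  try
    {
      const std::string s = "xxxyxxx";
      const std::regex r ("xxx(y*)xxx");
      std::smatch m;

      if (!std::regex_match (s, m, r) || m.size () != 2)
        return false;

      return m[1].first - s.begin () == 3
             && m[1].second - s.begin () == 4
             && m[1].str () == "y";
    }
  catch (const std::regex_error &)
    {
      // Assumption: an unimplemented <regex> may reject even this pattern.
      return false;
    }
}
```

A caller could check regex_looks_usable () once at startup and fall back to another matching strategy if it returns false.  Passing the probe only shows that this one case works, not that the implementation is complete.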

Comment 3 Florian Weimer 2019-10-23 13:31:20 UTC
*** Bug 1764617 has been marked as a duplicate of this bug. ***