Bug 573060

Summary: DN normalizer: ESC HEX HEX is not normalized
Product: [Retired] 389
Reporter: Noriko Hosoi <nhosoi>
Component: Directory Server
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED CURRENTRELEASE
QA Contact: Viktor Ashirov <vashirov>
Severity: medium
Priority: low
Version: 1.2.6
CC: andrey.ivanov, nkinder, rmeggins
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-12-07 16:49:45 UTC
Bug Blocks: 434914    
Attachments:
Description                     Flags
git diff                        none
git patch file                  nhosoi: review?, rmeggins: review+
MODRDN ldif to pure ASCII RDN   none
MODRDN ldif to non-ASCII RDN    none

Description Noriko Hosoi 2010-03-12 18:30:08 UTC
Description of problem:
Reported by Andrey Ivanov following the bug 570107:
I've compiled the sources from a git checkout that I've just made.
Up to test 4 everything is just as you describe (the base64 import is
fixed). From test 5 on you use the mozldap ldapmodify; I use the OpenLDAP
ldapmodify, and the representation of the DN is not the same. One can escape
UTF-8 characters (I think it's part of RFC 4514), so for example ldapvi (based
on ldapsearch) generates the following for the DN:

newrdn: cn=ATELIER M\C3\89CANIQUE

In other words, if I use the OpenLDAP ldapmodify starting from test 5:
[root@ldap-model Admin]# /usr/bin/ldapmodify -x -D 'cn=Directory Manager' -w
'<mdp>'
dn: cn=ATELIER MECANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu
changetype: modrdn
newrdn: cn=ATELIER M\C3\89CANIQUE
deleteoldrdn: 1
newsuperior: ou=Objets,dc=id,dc=polytechnique,dc=edu

modifying rdn of entry "cn=ATELIER
MECANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu"
rename completed

The resulting entry is exactly as I stated above. There are actually two
problems:
* incorrect resulting entry:
/usr/bin/ldapsearch -x -D 'cn=Directory manager' -w '<mdp>' -b
"dc=id,dc=polytechnique,dc=edu" cn=* 

# ATELIER M\C3\89CANIQUE, Objets, id.polytechnique.edu
dn: cn=ATELIER M\C3\89CANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
uid: toto
sn: toto
cn: ATELIER MC389CANIQUE

* absence of the MODRDN line in the access log for this modification (although
the RESULT line for the operation is present; cf. op=1, where tag=109 is the
result of the moddn operation):
==> /Logs/Ldap/access <==
[12/Mar/2010:16:55:29 +0100] conn=4 fd=128 slot=128 connection from 127.0.0.1
to 127.0.0.1
[12/Mar/2010:16:55:29 +0100] conn=4 op=0 BIND dn="cn=Directory Manager"
method=128 version=3
[12/Mar/2010:16:55:29 +0100] conn=4 op=0 RESULT err=0 tag=97 nentries=0
etime=0.010000 dn="cn=Directory Manager"
[12/Mar/2010:16:55:34 +0100] conn=4 op=1 RESULT err=0 tag=109 nentries=0
etime=0.001000
[12/Mar/2010:16:55:36 +0100] conn=4 op=2 UNBIND
[12/Mar/2010:16:55:36 +0100] conn=4 op=2 fd=128 closed - U1
==> /Logs/Ldap/audit <==
time: 20100312165534
dn: cn=atelier de mecanique,ou=objets,dc=id,dc=polytechnique,dc=edu
changetype: modrdn
newrdn: cn=ATELIER DE M\C3\89CANIQUE
deleteoldrdn: 1
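
The second symptom (a RESULT line with no preceding request line) can be checked mechanically. The following is a minimal sketch, assuming each access log record sits on one physical line (the excerpt above is hard-wrapped) and that a request line always precedes its RESULT. The function name results_without_request is hypothetical and not part of 389.

```python
import re

# Matches the "conn=N op=M KEYWORD" part of an access log record.
_OP_LINE = re.compile(r"conn=(\d+) op=(\d+) (\w+)")

def results_without_request(lines):
    """Return (conn, op) pairs that have a RESULT line but no earlier request line."""
    requests = set()
    orphans = []
    for line in lines:
        m = _OP_LINE.search(line)
        if not m:
            continue  # e.g. the "connection from" line, which has no op= field
        key = (int(m.group(1)), int(m.group(2)))
        if m.group(3) == "RESULT":
            if key not in requests:
                orphans.append(key)
        else:
            requests.add(key)
    return orphans

# Records taken from the excerpt above, re-joined onto single lines.
sample = [
    '[12/Mar/2010:16:55:29 +0100] conn=4 fd=128 slot=128 connection from 127.0.0.1 to 127.0.0.1',
    '[12/Mar/2010:16:55:29 +0100] conn=4 op=0 BIND dn="cn=Directory Manager" method=128 version=3',
    '[12/Mar/2010:16:55:29 +0100] conn=4 op=0 RESULT err=0 tag=97 nentries=0 etime=0.010000 dn="cn=Directory Manager"',
    '[12/Mar/2010:16:55:34 +0100] conn=4 op=1 RESULT err=0 tag=109 nentries=0 etime=0.001000',
    '[12/Mar/2010:16:55:36 +0100] conn=4 op=2 UNBIND',
    '[12/Mar/2010:16:55:36 +0100] conn=4 op=2 fd=128 closed - U1',
]
print(results_without_request(sample))
```

For the sample records, only conn=4 op=1 is reported, which is the rename whose MODRDN line is missing.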

Comment 1 Noriko Hosoi 2010-03-13 00:36:51 UTC
Created attachment 399793 [details]
git diff

This fix is still a preliminary one.

I've found one bug in substr_dn_normalize, which was failing to convert the ESC HEX HEX format into a UTF-8 character.  In addition, if we store the normalized DN as the DN in the entry, this problem is solved.  Is that acceptable?

The normalization removes unescaped spaces around the separators, removes unnecessary escapes, and converts the ESC HEX HEX format into the corresponding character (or one octet of a multi-byte character).
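
To illustrate the ESC HEX HEX step only, here is a minimal Python sketch. It is not the code in substr_dn_normalize; the function names are hypothetical, and it assumes that hex pairs encoding RFC 4514 special characters should stay escaped. The second function mimics the faulty result from the description, where the backslashes vanished but the hex digits remained.

```python
import re

# A backslash followed by either two hex digits or any single character.
_ESCAPE = re.compile(rb"\\(?:([0-9A-Fa-f]{2})|(.))", re.DOTALL)

# Octets that must stay escaped (assumption: RFC 4514 specials plus NUL).
_KEEP_ESCAPED = frozenset(b' "#+,;<=>\\\x00')

def _replace(match):
    if match.group(1) is None:
        return match.group(0)        # single-character escape: leave untouched
    octet = int(match.group(1), 16)
    if octet in _KEEP_ESCAPED:
        return match.group(0)        # e.g. \2C must not become a bare comma
    return bytes([octet])            # one octet of a (possibly multi-byte) character

def unescape_hex_pairs(value: str) -> str:
    """Convert ESC HEX HEX sequences to the octets they encode, then decode as UTF-8."""
    return _ESCAPE.sub(_replace, value.encode("utf-8")).decode("utf-8")

def strip_backslashes(value: str) -> str:
    """Mimic the faulty behaviour: drop the escape character, keep the hex digits."""
    return value.replace("\\", "")

escaped = r"ATELIER M\C3\89CANIQUE"
print(unescape_hex_pairs(escaped))
print(strip_backslashes(escaped))
```

The two octets C3 and 89 only form a character once both have been converted, which is why each pair is turned into a single byte before decoding.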

Comment 2 Noriko Hosoi 2010-03-13 01:33:39 UTC
Created attachment 399795 [details]
git patch file

Description: there were 2 bugs in the handling of the ESC HEX HEX format.
Non-ASCII characters were being ignored.  Now they are covered.

Comment 3 Andrey Ivanov 2010-03-14 12:55:24 UTC
Hi Noriko,
I am testing your patch. I don't know whether it's a good idea to change the RDN format in entryrdn.db4. We have to make sure that all operations on the database read and write the same DN format. The simplest way to do that is to stick to the initial format. The thing is that if I import the ldif from bug 570107 and then do two renames (forth and back), I do not return to the initial database state:

Just after importing the .ldif (results of the mozldap and OpenLDAP ldapsearch, and the dump of entryrdn.db4):

[root@ldap-model ~]# /usr/bin/ldapsearch -x -D 'cn=Directory Manager' -w '<mdp>' -b "dc=id,dc=polytechnique,dc=edu" cn=*             
# extended LDIF
#
# LDAPv3
# base <dc=id,dc=polytechnique,dc=edu> with scope subtree
# filter: cn=*
# requesting: ALL
#

# ATELIER DE M\C3\89CANIQUE, Objets, id.polytechnique.edu
dn:: Y249QVRFTElFUiBERSBNw4lDQU5JUVVFLG91PU9iamV0cyxkYz1pZCxkYz1wb2x5dGVjaG5pc
 XVlLGRjPWVkdQ==
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
cn:: QVRFTElFUiBERSBNw4lDQU5JUVVF
uid: toto
sn: toto

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

[root@ldap-model ~]# /usr/lib64/mozldap/ldapsearch -e  -D 'cn=Directory Manager' -w '<mdp>' -b "dc=id,dc=polytechnique,dc=edu" cn=*
version: 1
dn: cn=ATELIER DE MÉCANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
cn: ATELIER DE MÉCANIQUE
uid: toto
sn: toto


[root@ldap-model ~]# dbscan -f entryrdn.db4
2:ou=objets
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
3:cn=atelier de mécanique
  ID: 3; RDN: "cn=ATELIER DE MÉCANIQUE"; NRDN: "cn=atelier de mécanique"
C1:dc=id,dc=polytechnique,dc=edu
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
C2:ou=objets
  ID: 3; RDN: "cn=ATELIER DE MÉCANIQUE"; NRDN: "cn=atelier de mécanique"
P2:ou=objets
  ID: 1; RDN: "dc=id,dc=polytechnique,dc=edu"; NRDN: "dc=id,dc=polytechnique,dc=edu"
P3:cn=atelier de mécanique
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
dc=id,dc=polytechnique,dc=edu
  ID: 1; RDN: "dc=id,dc=polytechnique,dc=edu"; NRDN: "dc=id,dc=polytechnique,dc=edu"

We see the UTF-8 characters "normally". Both ldapsearch utilities show the correct DNs.
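
The OpenLDAP output above prints non-ASCII values in base64 (the "::" form), while mozldap prints them as raw UTF-8. A small sketch, using only the Python standard library, shows that both are the same bytes. The helper name decode_ldif_value is hypothetical, and it handles a single unfolded LDIF line only.

```python
import base64
from typing import Tuple

def decode_ldif_value(line: str) -> Tuple[str, str]:
    """Split 'attr: value' or 'attr:: base64value' and return (attr, decoded text)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        # Double colon: the value is base64-encoded UTF-8.
        return attr, base64.b64decode(rest[1:].strip()).decode("utf-8")
    return attr, rest.strip()

print(decode_ldif_value("cn:: QVRFTElFUiBERSBNw4lDQU5JUVVF"))
print(decode_ldif_value("uid: toto"))
```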

Rename the entry twice (first with forth.ldif, then with back.ldif, both attached to this bug):

[root@ldap-model ~]# /usr/bin/ldapmodify -x -D 'cn=Directory Manager' -w '<mdp>' -f /tmp/forth.ldif                                    
modifying rdn of entry "cn=ATELIER DE MÉCANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu"
rename completed

[root@ldap-model ~]# /usr/bin/ldapmodify -x -D 'cn=Directory Manager' -w '<mdp>' -f /tmp/back.ldif 
modifying rdn of entry "cn=ATELIER DE MECANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu"
rename completed


After these renames :
[root@ldap-model ~]# /usr/bin/ldapsearch -x -D 'cn=Directory Manager' -w '<mdp>' -b "dc=id,dc=polytechnique,dc=edu" cn=*
# extended LDIF
#
# LDAPv3
# base <dc=id,dc=polytechnique,dc=edu> with scope subtree
# filter: cn=*
# requesting: ALL
#

# ATELIER DE M\C3\89CANIQUE, Objets, id.polytechnique.edu
dn: cn=ATELIER DE M\C3\89CANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
uid: toto
sn: toto
cn:: QVRFTElFUiBERSBNw4lDQU5JUVVF

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

[root@ldap-model ~]# /usr/lib64/mozldap/ldapsearch -e  -D 'cn=Directory Manager' -w '<mdp>' -b "dc=id,dc=polytechnique,dc=edu" cn=*
version: 1
dn: cn=ATELIER DE M\C3\89CANIQUE,ou=Objets,dc=id,dc=polytechnique,dc=edu
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
uid: toto
sn: toto
cn: ATELIER DE MÉCANIQUE



[root@ldap-model ~]# dbscan -f entryrdn.db4
2:ou=objets
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
3:cn=atelier de mécanique
  ID: 3; RDN: "cn=ATELIER DE M\C3\89CANIQUE"; NRDN: "cn=atelier de mécanique"
C1:dc=id,dc=polytechnique,dc=edu
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
C2:ou=objets
  ID: 3; RDN: "cn=ATELIER DE M\C3\89CANIQUE"; NRDN: "cn=atelier de mécanique"
P2:ou=objets
  ID: 1; RDN: "dc=id,dc=polytechnique,dc=edu"; NRDN: "dc=id,dc=polytechnique,dc=edu"
P3:cn=atelier de mécanique
  ID: 2; RDN: "ou=Objets"; NRDN: "ou=objets"
dc=id,dc=polytechnique,dc=edu
  ID: 1; RDN: "dc=id,dc=polytechnique,dc=edu"; NRDN: "dc=id,dc=polytechnique,dc=edu"

We see that the same RDN is now represented in entryrdn.db4 (and shown by both ldapsearch utilities) differently from the initial database state. So I think we should stick to the initial entryrdn format in order to be consistent with the rest of the 389 engine and with LDAP clients.

As for the second problem (absence of logs), I will open a new bug.

Comment 4 Andrey Ivanov 2010-03-14 12:59:19 UTC
Created attachment 399976 [details]
MODRDN ldif to pure ASCII RDN

Comment 5 Andrey Ivanov 2010-03-14 13:00:11 UTC
Created attachment 399977 [details]
MODRDN ldif to non-ASCII RDN

Comment 6 Noriko Hosoi 2010-03-15 19:28:50 UTC
Hi Andrey.  Thank you for your testing and comments (always)!!

> We see that the same RDN is represented in entryrdn.db4 (and seen by both
> ldapsearch utils) differently from the initial database state. So i think we
> should stick to the initial entryrdn format in order to be consistent with the
> rest of 389 engine and ldap clients.

I wonder whether your concern is the difference in the entryrdn contents before and after the modrdn, like this?

[before]
3:cn=atelier de mécanique
  ID: 3; RDN: "cn=ATELIER DE MÉCANIQUE"; NRDN: "cn=atelier de mécanique"

[after]
3:cn=atelier de mécanique
  ID: 3; RDN: "cn=ATELIER DE M\C3\89CANIQUE"; NRDN: "cn=atelier de mécanique"

We basically respect the user input when presenting the entry, while the real operation is done against the normalized string (the NRDN; in this case, normalized and lowercased).

So, even if the current entry looks like this:
dn: cn=ATELIER DE M\C3\89CANIQUE,dc=example,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
uid: toto
sn: toto
cn: ATELIER DE MÉCANIQUE

This operation works fine against cn=ATELIER DE M\C3\89CANIQUE,dc=example,dc=com.
$ ldapmodify -D 'cn=directory manager' -w <pw>
dn: cn=ATELIER DE MÉCANIQUE,dc=example,dc=com
changetype: modrdn
newrdn: cn=ATELIER DE MECANIQUE
deleteoldrdn: 1

Then, if you give cn=ATELIER DE MÉCANIQUE as the newrdn, a search returns this DN: cn=ATELIER DE MÉCANIQUE,dc=example,dc=com

and entryrdn stores the given string:
#:cn=atelier de mécanique
    ID: #; RDN: "cn=ATELIER DE MÉCANIQUE"; NRDN: "cn=atelier de mécanique"
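
The behaviour described in this comment (the stored RDN keeps the client's spelling, while lookups go through the NRDN) can be pictured with a small Python sketch. This is an illustration under simplifying assumptions, not the entryrdn index code: the names nrdn_key, store_rdn and the dict standing in for the index are hypothetical, and the key function only handles hex pairs and case.

```python
import re

_HEX_PAIR = re.compile(rb"\\([0-9A-Fa-f]{2})")

def nrdn_key(rdn: str) -> str:
    """Simplified NRDN: turn ESC HEX HEX into octets, decode UTF-8, lowercase."""
    raw = _HEX_PAIR.sub(lambda m: bytes([int(m.group(1), 16)]), rdn.encode("utf-8"))
    return raw.decode("utf-8").lower()

# Hypothetical stand-in for the index: NRDN -> RDN as last supplied by a client.
entryrdn = {}

def store_rdn(rdn: str) -> None:
    entryrdn[nrdn_key(rdn)] = rdn

store_rdn("cn=ATELIER DE MÉCANIQUE")          # state right after the import
store_rdn(r"cn=ATELIER DE M\C3\89CANIQUE")    # state after the rename with escapes
print(entryrdn)
```

Both spellings land on the same key, so operations find the entry either way; only the displayed RDN differs, which is the inconsistency discussed in this thread.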

Please let us know if you see any problems with this usage.  We respect and rely on your depthful insight.

Comment 7 Andrey Ivanov 2010-03-15 20:16:09 UTC
Hi Noriko,

Thank you for your explanations! I see your point better now (respecting the user's preferred representation while operating on normalized DNs).

The thing that bothers me is that for exactly the same CN ("ATELIER DE MÉCANIQUE") the server code can return two different DN representations even though the entries are exactly the same.

In other words, the essential question is: should the DN returned by the server's search operations be normalized or not? Should it be returned in some "canonical" form, or should it keep its "user-preferred" look? I have not read all the LDAP RFCs, so I don't have an "obvious" answer. Common sense (mine, at least! :)) tells me that the returned DNs should be normalized (like the NRDN). But if the RFCs do not state a clear policy on this, both points of view are acceptable and reasonable.

And I do not have any more "depthful insight" than the others :) The problem is that I use 389 in a French-language environment, so handling accents is a real day-to-day pain :))

Comment 8 Andrey Ivanov 2010-03-15 20:23:41 UTC
It seems that there are only some recommendations in section 2.4 of RFC 4514... Both the UTF-8 and the escaped representations seem to be acceptable in a DN:

2.4.  Converting an AttributeValue from ASN.1 to a String
...
   Otherwise, if the AttributeValue is of a syntax that has a LDAP-
   specific string encoding, the value is converted first to a UTF-8-
   encoded Unicode string according to its syntax specification (see
   [RFC4517], Section 3.3, for examples).  If that UTF-8-encoded Unicode
   string does not have any of the following characters that need
   escaping, then that string can be used as the string representation
   of the value.

      - a space (' ' U+0020) or number sign ('#' U+0023) occurring at
        the beginning of the string;

      - a space (' ' U+0020) character occurring at the end of the
        string;

      - one of the characters '"', '+', ',', ';', '<', '>',  or '\'
        (U+0022, U+002B, U+002C, U+003B, U+003C, U+003E, or U+005C,
        respectively);

      - the null (U+0000) character.

   Other characters may be escaped.

   Each octet of the character to be escaped is replaced by a backslash
   and two hex digits, which form a single octet in the code of the
   character.  Alternatively, if and only if the character to be escaped
   is one of

      ' ', '"', '#', '+', ',', ';', '<', '=', '>', or '\'
      (U+0020, U+0022, U+0023, U+002B, U+002C, U+003B,
       U+003C, U+003D, U+003E, U+005C, respectively)

   it can be prefixed by a backslash ('\' U+005C).

   Examples of the escaping mechanism are shown in Section 4.
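
The rules quoted above can be sketched in a few lines of Python. This is an illustrative reading of section 2.4, not code from 389 or OpenLDAP; the function name escape_rdn_value and the hex_escape_non_ascii switch are hypothetical. The switch models the "Other characters may be escaped" allowance, which is what makes both spellings of the same value legal.

```python
def escape_rdn_value(value: str, hex_escape_non_ascii: bool = False) -> str:
    """Escape an attribute value for use in a DN string, following RFC 4514 section 2.4."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch == "\x00":
            out.append("\\00")                      # null must be hex-escaped
        elif ch in '"+,;<>\\':
            out.append("\\" + ch)                   # always-special characters
        elif ch == "#" and i == 0:
            out.append("\\#")                       # number sign only at the start
        elif ch == " " and i in (0, last):
            out.append("\\ ")                       # leading or trailing space
        elif hex_escape_non_ascii and ord(ch) > 0x7F:
            # Optional: one backslash plus two hex digits per UTF-8 octet.
            out.append("".join("\\%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

print(escape_rdn_value("ATELIER DE MÉCANIQUE"))
print(escape_rdn_value("ATELIER DE MÉCANIQUE", hex_escape_non_ascii=True))
```

The first call leaves the value as UTF-8 (the mozldap-style form) and the second yields the ESC HEX HEX form that ldapvi generated in the description.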

Comment 9 Noriko Hosoi 2010-03-15 22:01:32 UTC
I ran the same test against OpenLDAP and found that even if we give an RDN that includes "ESC HEX HEX" sequences, the converted/normalized DN is returned.  That is, the search result is consistent.  Weighing consistency against keeping the raw user input, we can conclude that it is more important to keep the entry consistent.

$ ldapmodify -D <mgr> -w <pw>
dn: cn=ATELIER DE MÉCANIQUE,dc=example,dc=com
changetype: modrdn
newrdn: cn=ATELIER DE M\C3\89CANIQUE
deleteoldrdn: 1

$ ldapsearch -D <mgr> -w <pw> -b "dc=example,dc=com" "(cn=*)" -e
dn: cn=ATELIER DE MÉCANIQUE,dc=example,dc=com
objectClass: top
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
uid: toto
sn: toto
cn: ATELIER DE MÉCANIQUE

Comment 10 Noriko Hosoi 2010-03-16 20:34:17 UTC
Hi Andrey,

I've summarized the tasks to support the DN consistency here:

http://directory.fedoraproject.org/wiki/Upgrade_to_New_DN_Format

If you could give me your comments, I'd greatly appreciate it.

Since the task has become larger than this bug, let me commit this attachment and mark the bug "MODIFIED".
Created an attachment (id=399795) [details]
git patch file

Thanks!
--noriko

Comment 11 Noriko Hosoi 2010-03-16 20:39:04 UTC
Reviewed by Rich (Thanks!!!)

Pushed to master.

$ git merge escape
Updating 1ef0ec9..23bf606
Fast forward
 ldap/servers/slapd/dn.c   |    4 +---
 ldap/servers/slapd/util.c |    4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

$ git push
Counting objects: 13, done.
Delta compression using 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 775 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   1ef0ec9..23bf606  master -> master

Pushed to Directory_Server_8_2_Branch.

$ git push origin ds82-local:Directory_Server_8_2_Branch
Counting objects: 13, done.
Delta compression using 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 772 bytes, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   87d2477..81de991  ds82-local -> Directory_Server_8_2_Branch

Comment 12 Noriko Hosoi 2010-06-25 23:37:48 UTC
This bug is covered by the acceptance test Bob -> add -> bug199923 - 10 (rfc4514-10.ldif).

Test Name 	PASS 	FAIL
Bob run 	100% (138/138)