Bug 191295 - Unison fails to synchronise certain directories between Fedora and OS X
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: unison
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Gérard Milmeister
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-05-10 16:23 UTC by Edward Grace
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-16 20:05:27 UTC
Type: ---
Embargoed:


Attachments
Log and configuration files (7.23 KB, application/octet-stream)
2006-05-10 16:23 UTC, Edward Grace
Perl script to generate Unicode file names: o with umlaut and its equivalent. (604 bytes, text/plain)
2006-06-01 13:30 UTC, Edward Grace

Description Edward Grace 2006-05-10 16:23:57 UTC
I am encountering a frustrating failure to synchronise between OS X
and Linux.

Version 1.13.16

The end result is always that certain directories simply fail.  It is  
not obvious to me why they do so.  The permissions and ownerships are  
all the same as for paths that do work, the structure of the file  
names is also fine.

Interestingly, I can sync the problem subdirectories fine, provided I
synchronise them on their own.  When I attempt to synchronise the
parent directory it fails.

I have tried:

* Only a single thread for copying
* Playing with the HFS resource fork option
* Command line invocation
* Invocation with a preference file
* Making the other side initiate the transfer

Attached are logs of the transfer process, the head and the tail of  
the transfer, and the prf files.

I'm stumped,

Help would be appreciated!

Comment 1 Edward Grace 2006-05-10 16:23:58 UTC
Created attachment 128855
Log and configuration files

Comment 2 Gérard Milmeister 2006-05-10 17:32:42 UTC
Can you try to create a test situation with a minimum of files
and directories?

Comment 3 Edward Grace 2006-05-16 08:24:57 UTC
I have been battling with this for some time.  Getting something to fail
consistently in a straightforward manner has been difficult.  I am beginning to
think it's something specific to Darwin (OS X).

There is a new version of unison, which is available for OS X.  I am curious to
see if that fixes the problem.  However, I cannot build the rpm on Fedora, due to
problems in OCaml.  See bug 191296.

Comment 4 Edward Grace 2006-05-25 14:42:11 UTC
I'm stumped.  It seems to be a bug with Mac OS X -> FC.  Synchronising between
two Fedora Core boxes is fine.  Unfortunately I don't have another Mac to test
alternatives.

I will see if this can be resolved via the unison people.  In the mean time
could someone try doing a unison sync of their /Users/<blah>/Documents directory
on a Mac with $HOME/Documents under Linux?

Comment 5 Gérard Milmeister 2006-05-26 17:01:07 UTC
(In reply to comment #4)
> I'm stumped.  It seems to be a bug with Mac OS X -> FC.  Synchronising between
> two Fedora Core boxes is fine.
> I will see if this can be resolved via the unison people.  In the mean time
> could someone try doing a unison sync of their /Users/<blah>/Documents directory
> on a Mac with $HOME/Documents under Linux?
Maybe you should ask this on the fedora or fedora-extras list.

Comment 6 Edward Grace 2006-05-31 14:33:20 UTC
Solved!

It appears that the root cause of the problem was two (hardlinked) files whose
names resolved to the same canonical name:

One file had the UTF-8 encoded name 'Török - Ray Traced System.png' (precomposed ö).
The other had 'Török - Ray Traced System.png' (o followed by a combining diaeresis).

Note: The second name was not To"r"ok (with quote marks).  It appears that there
is a *single* character, which can display as o followed by a small speech mark,
that is distinct from ö (o with an umlaut) except when the canonical name is
formed, hence the collision.


This collision then seems to cause an error which cascades to all files that
have not yet been considered.


I'm guessing that on Mac OS X the names are identical but under Linux they are
distinct.  This would explain why the problem does not occur under Linux.

Solution -> Assuming the above assertion is correct, the "ignorecase" option
should be complemented by an option to ignore such encoding collisions.

Workaround -> Make sure these names don't occur!

(I appreciate this is easier said than done)

-ed

[Peter - If you end up reading this I'm sure you will find it amusing that your
name can cause so much trouble!]

Comment 7 Gérard Milmeister 2006-05-31 15:44:58 UTC
But this is almost as bad as on Windows!
I would say that this is a bug in MacOSX...

Comment 8 Edward Grace 2006-06-01 13:30:09 UTC
Created attachment 130341
Perl script to generate Unicode file names: o with umlaut and its equivalent.

Perl script to generate two files with equivalent Unicode names.  On Mac OS X
only one file is created, as the two encodings are treated as the same name.  On
Linux two files are created.

Run with perl <filename>
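
The attached Perl script is not reproduced in this report.  As a rough Python
equivalent (a sketch only, not the attachment itself; the function name and the
file names o-with-diaeresis ".txt" are hypothetical), the following tries to
create two files whose names are canonically equivalent and then lists the
directory.  Byte paths are used so that the result does not depend on the
locale.

```python
import os
import tempfile


def create_equivalent_names(directory):
    """Create two files with canonically equivalent names; return the listing as bytes."""
    # Hypothetical names: decomposed (o + U+0308) and precomposed (U+00F6).
    names = ["o\u0308.txt", "\u00f6.txt"]
    for name in names:
        path = os.path.join(os.fsencode(directory), name.encode("utf-8"))
        with open(path, "wb") as handle:
            handle.write(b"test\n")
    return sorted(os.listdir(os.fsencode(directory)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        entries = create_equivalent_names(scratch)
        # Expect two entries on a typical Linux file system and one on Mac OS X.
        print(len(entries), entries)
```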

Comment 9 Edward Grace 2006-06-01 13:37:44 UTC
I think that the correct behaviour is genuinely uncertain.  In the Gnome
Character Map, U+00F6 LATIN SMALL LETTER O WITH DIAERESIS is marked as
equivalent to U+006F LATIN SMALL LETTER O followed by U+0308 COMBINING
DIAERESIS.  On Mac OS X this equivalence is honoured.  On Linux it is not; the
two are treated as separate names.

I suggest that the best behaviour for Unison is to default to what happens with
files that differ only by case.  If copying a file would map it onto an existing
file name, the collision could be caused either by Unicode weirdness or by case
insensitivity.  Both should be dealt with in the same way, by issuing a
warning.
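
To make the suggestion concrete, here is a minimal sketch in Python (not OCaml,
and not Unison's code) of how both kinds of collision could be caught by one
check.  The function names and the sample listing are hypothetical, and NFC is
an arbitrary choice of normal form for the comparison key.

```python
import unicodedata
from collections import defaultdict


def collision_key(name, ignore_case=False):
    """Map a file name to a key that is equal for canonically equivalent names."""
    key = unicodedata.normalize("NFC", name)
    if ignore_case:
        # Normalise again after case folding, since folding can denormalise.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


def find_collisions(names, ignore_case=False):
    """Return the groups of names that share a collision key."""
    groups = defaultdict(list)
    for name in names:
        groups[collision_key(name, ignore_case)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical directory listing with one Unicode clash and one case clash.
listing = ["T\u00f6r\u00f6k.png", "To\u0308ro\u0308k.png", "README", "readme"]
for group in find_collisions(listing, ignore_case=True):
    print("warning: names collide:", [name.encode("utf-8") for name in group])
```

With ignore_case left off, only the two Unicode spellings are reported, which
matches a case-sensitive but normalising target.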

I don't understand any OCaml, so cannot offer a solution.  What do you think?

Comment 10 Gérard Milmeister 2006-06-01 17:02:41 UTC
So, maybe it is a bug in Linux after all :-) The question is which function
is used to compare UTF-8 strings, and whether this function should honour the
equivalence or whether this is left undefined. If strncmp is used, then the
comparison is probably char based (byte by byte). There is a wcscmp that
compares wide strings, but UTF-8 is not really a wide string... I don't really
know about the internal handling of UTF-8, locales, etc.
On the other hand, I find it strange and not very useful that ö is created using
a combining character on MacOSX.
In any case it would be best to contact upstream for a solution to this problem.
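
The difference between the two kinds of comparison can be sketched in Python
(an illustration only; the helper names are hypothetical, and this says nothing
about which function Unison or the kernel actually uses).  A byte-wise
comparison, which is what strcmp or strncmp amounts to on UTF-8 data, sees two
different strings, while a comparison after Unicode normalisation sees one.

```python
import unicodedata

precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # U+006F followed by U+0308 COMBINING DIAERESIS


def bytewise_equal(a, b):
    """Compare the UTF-8 encodings byte for byte, as strcmp would."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a, b):
    """Compare after normalising both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())  # c3b6 6fcc88
print(bytewise_equal(precomposed, decomposed))     # False
print(canonically_equal(precomposed, decomposed))  # True
```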

Comment 11 Edward Grace 2006-06-01 17:23:54 UTC
I can see good arguments for the way OS X treats this situation.  After all, to
a human we could say: ö is simply "The letter o with an umlaut" or an individual
entity.  It probably depends on your language as to which is the "natural" or
"correct" choice.  This is probably the reason for the ambiguity!

The fact that they are equivalent as far as we are concerned visually implies
that the representation U+006F LATIN SMALL LETTER O U+0308 COMBINING
DIAERESIS should fold into the single entity.  After all, we never see o" written
down.  It's meaningless.

> On the other hand, I find it strange and not very useful, that ö is created using
> a combining character on MacOSX.

It's not.  I think the problem is on Linux.  I don't know how I ended up with
these files, I can input the character a number of ways, the most obvious being
a "compose character":


<Compose> o "

Where my "compose" key is the right Windows key, assigned by Xmodmap below

keycode 117 = Multi_key



> I don't really know about the internal handling
> of UTF-8 and locales etc...

I suspect that it's a nightmare of devilish proportions! ;-)



Comment 12 Gérard Milmeister 2006-06-01 17:58:59 UTC
(In reply to comment #11)
> I can see good arguments for the way OS X treats this situation.  After all, to
> a human we could say: ö is simply "The letter o with an umlaut" or an individual
> entity.  It probably depends on your language as to which is the "natural" or
> "correct" choice.  This is probably the reason for the ambiguity!
Well, I think the ambiguity in this case is really unnecessary, but it's
too late to change it now :-)
Of course there are cases where combining characters are useful,
for example Sanskrit, but then there should not be an alternative entry.

BTW I have found the following:
http://mail.gnome.org/archives/gtk-i18n-list/2001-June/msg00050.html
which deals exactly with this so-called "canonical decomposition".

Comment 13 Gérard Milmeister 2006-10-16 20:05:27 UTC
I am closing this bug as WONTFIX, since I don't think this can be fixed in unison
at all.

