Red Hat Bugzilla – Bug 191295
Unison fails to synchronise certain directories between Fedora and OS X
Last modified: 2007-11-30 17:11:32 EST
I am encountering a frustrating failure to synchronise between OS X and Fedora.
The end result is always that certain directories simply fail. It is
not obvious to me why they do so. The permissions and ownerships are
all the same as for paths that do work, the structure of the file
names is also fine.
Interestingly I can sync the problem subdirectories fine, providing I
synchronise them on their own. When I attempt to synchronise the
parent directory it fails.
I have tried:
* Only a single thread for copying
* Playing with the HFS resource fork option
* Command line invocation
* Invocation with a preference file
* Making the other side initiate the transfer
Attached are logs of the transfer process, the head and the tail of
the transfer, and the prf files.
Help would be appreciated!
Created attachment 128855 [details]
Log and configuration files
Can you try to create a test situation with a minimum of files
I have been battling with this for some time. Getting something to fail
consistently in a straightforward manner has been difficult. I am beginning to
think it's something specific to Darwin (OS X).
There is a new version of unison, which is available for OS X. I am curious to
see if that fixes the problem. I cannot however build the rpm on Fedora, due to
problems in Ocaml. See bug 191296.
I'm stumped. It seems to be a bug with Mac OS X -> FC. Synchronising between
two Fedora Core boxes is fine. Unfortunately I don't have another Mac to test
I will see if this can be resolved via the unison people. In the mean time
could someone try doing a unison sync of their /Users/<blah>/Documents directory
on a Mac with $HOME/Documents under Linux?
(In reply to comment #4)
> I'm stumped. It seems to be a bug with Mac OS X -> FC. Synchronising between
> two Fedora Core boxes is fine.
> I will see if this can be resolved via the unison people. In the mean time
> could someone try doing a unison sync of their /Users/<blah>/Documents directory
> on a Mac with $HOME/Documents under Linux?
Maybe you should ask this on the fedora or fedora-extras list.
It appears that the root cause of the problem was two (hardlinked) files which
resolved to the same canonical name:
One file had the UTF-8 encoded name 'Török - Ray Traced System.png'
The other 'Török - Ray Traced System.png'.
Note: The second name was not To"r"ok (with quote marks); it appears that there
is a *single* character ö (looks like o followed by a small speech mark) which
is distinct from ö (o with an umlaut) except during the canonical name
formulation, hence the collision.
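
To make the difference between the two names visible, here is a minimal Python sketch. It assumes the two spellings are the ones identified later in this report: a precomposed U+00F6 in one name, and a plain o followed by U+0308 in the other. The helper name describe is hypothetical, and only the first word of the file name is used.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Assumption: these are the two spellings of the first word of the file name.
precomposed = "T\u00f6r\u00f6k"    # o with diaeresis as one code point
decomposed = "To\u0308ro\u0308k"   # plain o followed by a combining diaeresis

for label, name in (("precomposed", precomposed), ("decomposed", decomposed)):
    # The two strings render alike but have different lengths and code points.
    print(label, len(name), describe(name))
```

The two strings look the same in most fonts, which is why the difference only shows up when the code points are listed.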
This collision then seems to cause an error which cascades to all files that
have not been considered:
I'm guessing that on Mac OS X the names are identical but under Linux they are
distinct. This would explain why the problem does not occur under Linux.
Solution -> Assuming the above assertion is correct, the "ignorecase" option
should be complemented by an option to ignore such encoding collisions.
Work around -> Make sure these names don't occur!
(I appreciate this is easier said than done)
[Peter - If you end up reading this I'm sure you will find it amusing that your
name can cause so much trouble!]
But this is almost as bad as on Windows!
I would say that this is a bug in MacOSX...
Created attachment 130341 [details]
Perl script to generate Unicode file names: o with umlaut and equivalent.
Perl script to generate two files with equivalent Unicode names. On Mac OS X
only one file is created as the two encodings are the same. On Linux two files
are created.
Run with perl <filename>
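
For readers without Perl, the same experiment can be sketched in Python. This is not the attached script, only an approximation of what it is described as doing. The function name create_equivalent_names and the file names are made up for illustration, and the files go into a temporary directory that is removed afterwards.

```python
import os
import tempfile


def create_equivalent_names(directory):
    """Create two files whose names differ only in how o-umlaut is encoded.

    Returns the number of directory entries that exist afterwards.
    """
    names = [
        "o-umlaut-\u00f6.txt",    # precomposed U+00F6
        "o-umlaut-o\u0308.txt",   # o followed by U+0308
    ]
    for name in names:
        # Use explicit UTF-8 bytes so the result does not depend on the locale.
        path = os.path.join(os.fsencode(directory), name.encode("utf-8"))
        with open(path, "w") as handle:
            handle.write("test\n")
    return len(os.listdir(directory))


with tempfile.TemporaryDirectory() as scratch:
    count = create_equivalent_names(scratch)
    print(f"{count} file(s) created")
```

Going by the behaviour described in this report, the count should be 2 on a typical Linux file system and 1 on Mac OS X, where the second open reuses the first file.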
I think that the correct behaviour is genuinely uncertain. When one looks at
Gnome Character Map, U+00F6 LATIN SMALL LETTER O WITH DIAERESIS is marked as
equivalent to U+006F LATIN SMALL LETTER O U+0308 COMBINING DIAERESIS. On Mac OS
X this equivalence is honoured. On Linux it is not; they are treated as separate.
I suggest that the best behaviour for Unison is to default to what happens with
files that differ only by case. If a file being copied maps
to an existing file name, the collision could be caused by Unicode weirdness or
case insensitivity. They should be dealt with in the same way, by issuing a
I don't understand any OCaml, so I cannot offer a solution. What do you think?
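
The suggestion above can be sketched roughly as follows. This is Python rather than OCaml and is not Unison code; canonical_key and find_collisions are hypothetical names. It assumes NFC normalisation plus case folding is a fair stand-in for what the target file system does, which may not hold exactly for any real file system.

```python
import unicodedata
from collections import defaultdict


def canonical_key(name, ignore_case=True):
    """Map a file name to the key under which the target would store it.

    Assumption: the target folds canonically equivalent Unicode sequences
    together and, optionally, ignores case.
    """
    key = unicodedata.normalize("NFC", name)
    if ignore_case:
        # Re-normalise because case folding can change the composition.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


def find_collisions(names, ignore_case=True):
    """Return the groups of names that would map to one name on the target."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name, ignore_case)].append(name)
    return [group for group in groups.values() if len(group) > 1]


listing = ["T\u00f6r\u00f6k.png", "To\u0308ro\u0308k.png", "README", "readme", "notes.txt"]
collisions = find_collisions(listing)
for group in collisions:
    print("warning: names collide on the target:", group)
```

Treating both kinds of collision through one key function means a single warning path covers case differences and Unicode differences alike.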
So, maybe it is a bug in Linux after all :-) The question is, what function
is used to compare UTF-8 strings and should this function honour the
equivalence, or is this left undefined. If strncmp is used, then the comparison
is probably char based. There is a wcscmp that compares wide strings but UTF-8
is not really wide string... I don't really know about the internal handling
of UTF-8 and locales etc...
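
To illustrate the two kinds of comparison being discussed, here is a small Python sketch. It does not show what Unison or the Linux kernel actually call. bytewise_equal only approximates what a strcmp-style comparison of the UTF-8 bytes would see, and canonically_equal is one possible way to honour the equivalence. Both function names are made up.

```python
import unicodedata


def bytewise_equal(a, b):
    """Compare the UTF-8 encodings byte by byte, roughly what strcmp would see."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a, b):
    """Compare after canonical decomposition (NFD), honouring the equivalence."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00f6"    # U+00F6, two bytes in UTF-8
decomposed = "o\u0308"    # U+006F U+0308, three bytes in UTF-8

print("bytewise:", bytewise_equal(precomposed, decomposed))
print("canonical:", canonically_equal(precomposed, decomposed))
```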
On the other hand, I find it strange and not very useful that ö is created using
a combining character on MacOSX.
In any case it would be best to contact upstream for a solution to this problem.
I can see good arguments for the way OS X treats this situation. After all, to
a human we could say: ö is simply "The letter o with an umlaut" or an individual
entity. It probably depends on your language as to which is the "natural" or
"correct" choice. This is probably the reason for the ambiguity!
The fact that they are equivalent as far as we are concerned visually implies
that the representation -> U+006F LATIN SMALL LETTER O U+0308 COMBINING
DIAERESIS should fold into the single entity. After all we never see o" written
down. It's meaningless..
> On the other hand, I find it strange and not very useful that ö is created using
> a combining character on MacOSX.
It's not. I think the problem is on Linux. I don't know how I ended up with
these files, I can input the character a number of ways, the most obvious being
a "compose character":
<Compose> o "
Where my "compose" key is the right Windows key, assigned by Xmodmap below
keycode 117 = Multi_key
> I don't really know about the internal handling
> of UTF-8 and locales etc...
I suspect that it's a nightmare of devilish proportions! ;-)
(In reply to comment #11)
> I can see good arguments for the way OS X treats this situation. After all, to
> a human we could say: ö is simply "The letter o with an umlaut" or an individual
> entity. It probably depends on your language as to which is the "natural" or
> "correct" choice. This is probably the reason for the ambiguity!
Well, I think the ambiguity in this case is really unnecessary, but it's
too late to change it now :-)
Of course there are cases where combining characters are useful,
for example Sanskrit, but then there should not be an alternative entry.
BTW I have found the following:
which deals exactly with this so-called "canonical decomposition".
I close this bug now as WONTFIX, since I don't think this can be fixed in unison