I am encountering a frustrating failure to synchronise between OS X and Linux (version 1.13.16). The end result is always that certain directories simply fail. It is not obvious to me why they do so. The permissions and ownerships are all the same as for paths that do work, and the structure of the file names is also fine. Interestingly, I can sync the problem subdirectories fine, provided I synchronise them on their own. When I attempt to synchronise the parent directory it fails. I have tried:
* Only a single thread for copying
* Playing with the HFS resource fork option
* Command line invocation
* Invocation with a preference file
* Making the other side initiate the transfer
Attached are logs of the transfer process, the head and the tail of the transfer, and the prf files. I'm stumped. Help would be appreciated!
Created attachment 128855 [details] Log and configuration files
Can you try to create a test situation with a minimum of files and directories?
I have been battling with this for some time. Getting something to fail consistently in a straightforward manner has been difficult. I am beginning to think it's something specific to Darwin (OS X). There is a new version of unison, which is available for OS X. I am curious to see if that fixes the problem. I cannot, however, build the rpm on Fedora, due to problems in OCaml. See bug 191296.
I'm stumped. It seems to be a bug with Mac OS X -> FC. Synchronising between two Fedora Core boxes is fine. Unfortunately I don't have another Mac to test alternatives. I will see if this can be resolved via the unison people. In the meantime, could someone try doing a unison sync of their /Users/<blah>/Documents directory on a Mac with $HOME/Documents under Linux?
(In reply to comment #4)
> I'm stumped. It seems to be a bug with Mac OS X -> FC. Synchronising between
> two Fedora Core boxes is fine.
> I will see if this can be resolved via the unison people. In the meantime
> could someone try doing a unison sync of their /Users/<blah>/Documents directory
> on a Mac with $HOME/Documents under Linux?
Maybe you should ask this on the fedora or fedora-extras list.
Solved! It appears that the root cause of the problem was two (hardlinked) files which resolved to the same canonical name:
One file had the UTF-8 encoded name 'Török - Ray Traced System.png'
The other 'ToÌroÌk - Ray Traced System.png'.
Note: The second name was not To"r"ok (with quote marks); it appears that there is a *single* character oÌ (looks like o followed by a small speech mark) which is distinct from ö (o with an umlaut) except during the canonical name formulation, hence the collision. This collision then seems to cause an error which cascades to all files that have not yet been considered. I'm guessing that on Mac OS X the names are identical but under Linux they are distinct. This would explain why the problem does not occur when both sides run Linux.
Solution -> Assuming the above assertion is correct, the "ignorecase" option should be supplemented with an option to ignore such encoding collisions.
Workaround -> Make sure these names don't occur! (I appreciate this is easier said than done)
-ed
[Peter - If you end up reading this I'm sure you will find it amusing that your name can cause so much trouble!]
But this is almost as bad as on Windows! I would say that this is a bug in MacOSX...
Created attachment 130341 [details] Perl script to generate Unicode file names: o with umlaut and equivalent.
Perl script to generate two files with equivalent Unicode names. On Mac OS X only one file is created, as the two encodings are treated as the same name. On Linux two files are created. Run with perl <filename>
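
For readers without the attachment, the same experiment can be sketched in Python. This is not the attached Perl script, only a rough equivalent under assumptions: the file name `Török.txt` and the helper `create_equivalent_names` are made up for illustration, and the result depends on the filesystem the scratch directory lives on.

```python
import os
import tempfile
import unicodedata


def create_equivalent_names(directory):
    """Write two files whose names are canonically equivalent and
    return the sorted directory listing afterwards."""
    # Hypothetical test name: each o-umlaut is the single code point U+00F6.
    composed = "T\u00f6r\u00f6k.txt"
    # Same name with each o-umlaut as 'o' followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", composed)
    for name in (composed, decomposed):
        with open(os.path.join(directory, name), "w", encoding="utf-8") as handle:
            handle.write("test\n")
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        entries = create_equivalent_names(scratch)
        # ascii() makes the otherwise invisible difference between the names visible.
        print(len(entries), [ascii(entry) for entry in entries])
```

On a filesystem that treats the two spellings as one name (as reported above for Mac OS X) the listing should contain a single entry; on a typical Linux filesystem it should contain two.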
I think that the correct behaviour is genuinely uncertain. When one looks at Gnome Character Map, U+00F6 LATIN SMALL LETTER O WITH DIAERESIS is marked as equivalent to U+006F LATIN SMALL LETTER O U+0308 COMBINING DIAERESIS. On Mac OS X this equivalence is honoured. On Linux it is not; they are treated as separate. I suggest that the best behaviour for Unison is to default to what happens with files that differ only by case. If an attempt is made to copy a file whose name maps to an existing file name, this collision could be caused by Unicode weirdness or case insensitivity. Both should be dealt with in the same way, by issuing a warning. I don't understand any OCaml, so cannot offer a solution. What do you think?
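
The suggestion above, treating Unicode-equivalent names like names that differ only by case, could look roughly like this. It is a sketch in Python rather than OCaml and says nothing about how Unison works internally; `collision_key` and `find_collisions` are hypothetical helpers, and NFC is just one reasonable choice of normal form.

```python
import unicodedata
from collections import defaultdict


def collision_key(name, ignore_case=False):
    """Map a file name to the key under which it would collide on the
    other side (hypothetical helper, not part of Unison)."""
    key = unicodedata.normalize("NFC", name)
    if ignore_case:
        # Case folding can denormalise a string, so normalise again.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


def find_collisions(names, ignore_case=False):
    """Return groups of names that map to the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[collision_key(name, ignore_case)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["T\u00f6r\u00f6k.png", "To\u0308ro\u0308k.png", "README", "readme"]
    for group in find_collisions(names, ignore_case=True):
        print("warning: names collide:", [ascii(name) for name in group])
```

A synchroniser could run such a check on each directory before copying and issue a warning for every group, instead of failing on the first clash.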
So, maybe it is a bug in Linux after all :-) The question is, what function is used to compare UTF-8 strings, and should this function honour the equivalence, or is this left undefined? If strncmp is used, then the comparison is probably char based. There is a wcscmp that compares wide strings, but UTF-8 is not really a wide string... I don't really know about the internal handling of UTF-8 and locales etc... On the other hand, I find it strange and not very useful that ö is created using a combining character on MacOSX. In any case it would be best to contact upstream for a solution to this problem.
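
To make the comparison question concrete, here is a small Python sketch contrasting a byte-wise comparison (which is what a strcmp-style comparison of the UTF-8 bytes amounts to) with one that honours canonical equivalence. The function names are invented for illustration, and nothing here is taken from the Linux or Unison sources.

```python
import unicodedata

composed = "\u00f6"      # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # U+006F followed by U+0308 COMBINING DIAERESIS


def bytewise_equal(a, b):
    """Compare the raw UTF-8 bytes, as a strcmp-style comparison would."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a, b):
    """Compare after normalising both strings to the same normal form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(composed.encode("utf-8"))                  # b'\xc3\xb6'
    print(decomposed.encode("utf-8"))                # b'o\xcc\x88'
    print(bytewise_equal(composed, decomposed))      # False
    print(canonically_equal(composed, decomposed))   # True
```

The two spellings are different byte sequences, so any byte-based comparison sees two distinct names; only a comparison that normalises first treats them as the same.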
I can see good arguments for the way OS X treats this situation. After all, to a human we could say: ö is simply "The letter o with an umlaut" or an individual entity. It probably depends on your language as to which is the "natural" or "correct" choice. This is probably the reason for the ambiguity! The fact that they are equivalent as far as we are concerned visually implies that the representation U+006F LATIN SMALL LETTER O U+0308 COMBINING DIAERESIS should fold into the single entity. After all, we never see o" written down. It's meaningless.
> On the other hand, I find it strange and not very useful, that ö is created using
> a combining character on MacOSX.
It's not. I think the problem is on Linux. I don't know how I ended up with these files. I can input the character a number of ways, the most obvious being a "compose character":
<Compose> o "
where my "compose" key is the right Windows key, assigned by the Xmodmap entry below:
keycode 117 = Multi_key
> I don't really know about the internal handling
> of UTF-8 and locales etc...
I suspect that it's a nightmare of devilish proportions! ;-)
(In reply to comment #11)
> I can see good arguments for the way OS X treats this situation. After all, to
> a human we could say: ö is simply "The letter o with an umlaut" or an individual
> entity. It probably depends on your language as to which is the "natural" or
> "correct" choice. This is probably the reason for the ambiguity!
Well, I think the ambiguity in this case is really unnecessary, but it's too late to change it now :-) Of course there are cases where combining characters are useful, for example Sanskrit, but then there should not be an alternative entry. BTW I have found the following:
http://mail.gnome.org/archives/gtk-i18n-list/2001-June/msg00050.html
which deals exactly with this so-called "canonical decomposition".
I close this bug now as WONTFIX, since I don't think this can be fixed in unison at all.