Description of problem:
According to the Unicode 4.0 Standard, UCS-4 is just an alias for UTF-32, and UCS-2 is just an alias for UTF-16. Therefore, their byte order (endianness) should be the same. Indeed, UCS-2 and UTF-16 are both big-endian. However, iconv() conversion into UCS-4 behaves differently from conversion into UTF-32: the result of conversion into UTF-32 is big-endian, while the result of conversion into UCS-4 is little-endian. I believe this is wrong. In particular, there is no justification at all for this inconsistency between the UTF-16 and UTF-32 cases. Please note that I mean the default names such as "UCS-4", without suffixes such as "BE".

Version-Release number of selected component (if applicable):
Found on RHEL WS v4, gcc v3.4.3.

How reproducible:
I attached a program to reproduce the problem. It is not exactly a commercial product, though :).

Steps to Reproduce:
1. As written, the source gets the result for UTF-32.
2. To see the result for UCS-4, replace the encoding name and recompile.
3. g++ iconvtest.C; ./a.out

Actual results:
The result of conversion into UTF-32 is big-endian, and the result of conversion into UCS-4 is little-endian.

Expected results:
Both should be big-endian, as UTF-16 and UCS-2 are.

Additional info:
I was not sure which component this should be assigned to; sorry about that. If you know a more appropriate person, please forward the defect to him/her.
Created attachment 129908: C++ source to reproduce the problem
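The attachment itself is not reproduced in this thread. As a hypothetical stand-in, the sketch below converts the single character "A" (U+0041) from UTF-8 with POSIX iconv() into each of the four unsuffixed encodings and classifies the byte order of the last code unit, which skips over any BOM an implementation may prepend. The helper names (convert_from_utf8, last_unit_order, report_byte_orders) are illustrative and not taken from the attachment; the printed results depend on the iconv implementation and version, so no particular outcome is assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <iconv.h>
#include <string>
#include <vector>

enum class ByteOrder { Big, Little, Unknown };

// Hypothetical helper: convert UTF-8 `input` to `tocode` with POSIX iconv().
// Returns false if the encoding is unsupported or the conversion fails.
bool convert_from_utf8(const char* tocode, const std::string& input,
                       std::vector<unsigned char>& out) {
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == reinterpret_cast<iconv_t>(-1)) return false;  // unknown encoding
    // Each input byte yields at most one 4-byte unit; 16 spare bytes cover a BOM.
    std::vector<char> buf(input.size() * 4 + 16);
    char* in = const_cast<char*>(input.data());  // glibc's iconv() takes char**
    std::size_t in_left = input.size();
    char* outp = buf.data();
    std::size_t out_left = buf.size();
    std::size_t rc = iconv(cd, &in, &in_left, &outp, &out_left);
    if (rc != static_cast<std::size_t>(-1))
        rc = iconv(cd, nullptr, nullptr, &outp, &out_left);  // flush pending state
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) return false;
    out.assign(buf.data(), outp);
    return true;
}

// Classify the last `unit`-byte code unit, assuming it encodes a character
// below U+0100, so exactly one of its bytes (first or last) is non-zero.
ByteOrder last_unit_order(const std::vector<unsigned char>& bytes, std::size_t unit) {
    assert(unit >= 2 && bytes.size() >= unit);
    const unsigned char* u = bytes.data() + (bytes.size() - unit);
    for (std::size_t i = 1; i + 1 < unit; ++i)
        if (u[i] != 0) return ByteOrder::Unknown;  // middle bytes must be zero
    const bool big = u[unit - 1] != 0;  // value byte last  -> big-endian
    const bool little = u[0] != 0;      // value byte first -> little-endian
    if (big && !little) return ByteOrder::Big;
    if (little && !big) return ByteOrder::Little;
    return ByteOrder::Unknown;
}

// Print the byte order iconv() produces for "A" in each unsuffixed encoding.
void report_byte_orders() {
    const char* const names[] = {"UTF-16", "UCS-2", "UTF-32", "UCS-4"};
    const std::size_t units[] = {2, 2, 4, 4};  // code-unit size in bytes
    for (std::size_t i = 0; i < 4; ++i) {
        std::vector<unsigned char> out;
        if (!convert_from_utf8(names[i], "A", out) || out.size() < units[i]) {
            std::printf("%-7s conversion failed\n", names[i]);
            continue;
        }
        const ByteOrder o = last_unit_order(out, units[i]);
        std::printf("%-7s %zu bytes, last unit %s\n", names[i], out.size(),
                    o == ByteOrder::Big      ? "big-endian"
                    : o == ByteOrder::Little ? "little-endian"
                                             : "unclassified");
    }
}
```

Calling report_byte_orders() from main() prints one line per encoding, so the UCS-4 line can be compared directly with the UTF-32 line (and UCS-2 with UTF-16) on a given system.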
Can you cite why you think, say, UCS-2 is an alias for UTF-16? Certainly http://www.unicode.org/reports/tr17/index.html doesn't suggest anything like that; it has always been a different encoding.
That's nonsense. UCS-2 and UCS-4 are standalone encodings.
(In reply to comment #2)
> Can you cite why you think, say, UCS-2 is an alias for UTF-16?
> Certainly http://www.unicode.org/reports/tr17/index.html
> doesn't suggest anything like that; it has always been a different encoding.

You are correct; my opinion is based on a different source. Unicode v4.0 Standard book, page 1350: "As a consequence, UCS-4 can now be taken effectively as an alias for the Unicode encoding form UTF-32...". On page 1352, in the list of encodings: "UTF-8, UTF-16 or UCS-4 (=UTF-32)". There is a similar statement about UTF-16 vs. UCS-2, but I could not find the exact quotation.
(In reply to comment #3)
> That's nonsense. UCS-2 and UCS-4 are standalone encodings.

Well, but according to the Unicode v4.0 Standard book, page 32, the byte order of both of them is platform-dependent. Please note that, unlike UTF-16 and UTF-32 output, UCS-2 and UCS-4 converted data is generated without a BOM, so the customer has no way to anticipate its byte order other than by the platform. That's why I don't understand how the byte order of UCS-2 and UCS-4 can differ on the same system.
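To illustrate the point about the missing BOM: a reader of 32-bit data can take the byte order from a leading BOM (00 00 FE FF or FF FE 00 00) when one is present, but for BOM-less data such as UCS-4 output the only fallback is an assumption. The minimal sketch below falls back to the host byte order, which is the assumption described in this comment, not a rule taken from the standard; the function names (host_order, utf32_order, decode_utf32) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class Order { Big, Little };

// Determine the host byte order by inspecting the first byte of a 32-bit integer.
Order host_order() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? Order::Little : Order::Big;
}

// Pick the byte order for a buffer of 32-bit code units. A leading BOM decides
// it and its length is reported in `bom_len`; without a BOM the host order is
// assumed (the assumption discussed above).
Order utf32_order(const std::vector<unsigned char>& b, std::size_t& bom_len) {
    bom_len = 0;
    if (b.size() >= 4) {
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) {
            bom_len = 4;
            return Order::Big;
        }
        if (b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) {
            bom_len = 4;
            return Order::Little;
        }
    }
    return host_order();
}

// Decode the 32-bit code units that follow the BOM (if any); trailing bytes
// that do not form a complete unit are ignored.
std::vector<std::uint32_t> decode_utf32(const std::vector<unsigned char>& b) {
    std::size_t start = 0;
    const Order order = utf32_order(b, start);
    std::vector<std::uint32_t> cps;
    for (std::size_t i = start; i + 4 <= b.size(); i += 4) {
        std::uint32_t v = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            // Big-endian stores the most significant byte first, little-endian last.
            const unsigned char byte = order == Order::Big ? b[i + k] : b[i + 3 - k];
            v = (v << 8) | byte;
        }
        cps.push_back(v);
    }
    return cps;
}
```

With a BOM, FF FE 00 00 followed by 41 00 00 00 decodes to U+0041 on any host. The same bytes 41 00 00 00 without a BOM decode to U+0041 only on a little-endian host (and to 0x41000000 on a big-endian one), which is the ambiguity this comment describes.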