Bug 186382
Summary: | system-config-users fails to start when usernames contain iso-8859-1 characters
---|---
Product: | [Fedora] Fedora
Component: | system-config-users
Version: | 4
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | medium
Hardware: | All
OS: | Linux
Reporter: | Håkon Løvdal <hlovdal>
Assignee: | Nils Philippsen <nphilipp>
CC: | mitr, nphilipp
Fixed In Version: | libuser-0.53.7-1.fc4.1
Doc Type: | Bug Fix
Last Closed: | 2008-03-10 04:20:57 UTC
Description
Håkon Løvdal
2006-03-23 09:30:40 UTC
Mirek, can you look at this one? It stems from the following piece of code:

    # try to get a name to associate with the user's primary gid,
    # and attempt to minimize lookups by caching answers
    try:
        gidNumber = user.get(libuser.GIDNUMBER)[0]
    except:
        messageDialog.show_message_dialog(_("The user database cannot be read. This problem is most likely caused by a mismatch in /etc/passwd and /etc/shadow. The program will exit now."))
        os._exit(0)

Putting a "raise" in between shows that it raises an IndexError. Is there a way for s-c-user to detect via libuser whether there have been errors while reading the user data, where they are and what the problem is?

I can't reproduce this: with /etc/passwd containing this single line, gidNumber is extracted correctly and s-c-users blows up a bit later with:

    File "/usr/share/system-config-users/mainWindow.py", line 485, in populate_user_list
      gecos = unicode (gecos, 'utf-8')
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Håkon, can you please provide an example set of {passwd,shadow,group,gshadow} demonstrating the problem? Please edit out any passwords or other confidential information. You can copy the files from /etc to a different directory and use the "directory = " variables in the [files] and [shadow] sections of /etc/libuser.conf to point to the directory with modified files if you want to edit them without the danger of damaging your main configuration.

I'm sorry, I see it now - I was not using the fc4 version of libuser.

Created attachment 128709 [details]
Fix double free
Created attachment 128710 [details]
Fix user name matching in users_enumerate_full
These two backported patches fix the exception above, but s-c-u then crashes with

    Traceback (most recent call last):
      File "scu/system-config-users.py", line 45, in ?
        mainWindow.mainWindow()
      File "/home/mitr/t/scu/mainWindow.py", line 280, in __init__
        self.refresh()
      File "/home/mitr/t/scu/mainWindow.py", line 459, in refresh
        self.populate_lists()
      File "/home/mitr/t/scu/mainWindow.py", line 550, in populate_lists
        self.populate_user_list()
      File "/home/mitr/t/scu/mainWindow.py", line 490, in populate_user_list
        gecos = unicode (gecos, 'utf-8')
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

In #74058c7 I argued for interpreting pw_gecos using the LC_CTYPE of the current locale. I still think it is the right thing to do; OTOH I wouldn't mind much if s-c-u just required UTF-8. Nils, what do you think?

Hmm. I think it would be correct to have system data (e.g. what's in /etc/passwd) in UTF-8, and tools should convert that to other character sets (e.g. ISO-8859-X) if needed. If s-c-users encounters non-UTF-8 data there, it could either:

a) silently assume that the data is stored in the character set given by LC_CTYPE
b) notify the user about the problem, perhaps offering to convert the entry to the new character set (as it will be stored in UTF-8 anyway upon changes).

What do you think?

That would be a reasonable new design, but AFAIK we are constrained by keeping get{pw,gr}{ent,by*} compatible: These functions do no charset conversion now, and from the API point of view it "makes sense" that their output should be in the charset specified by LC_CTYPE, so /etc/* should be really stored in the "system LC_CTYPE" charset.

I agree that get{pw,gr}{ent,by}* don't do charset conversion, but no documentation says anything about the charset used therein. For all I know you can only expect a stream of data coming out there, because how is "system LC_CTYPE" defined? AFAIK, /etc/sysconfig/i18n isn't standard anyway (and parsing it whenever trying to read such a file is gross IMHO).

Storing stuff in UTF-8 (the agreed upon superset of the charsets in question) seems the most sensible thing to do, because apps or tools can easily convert that over to other charsets if needed -- they have all the required information.

"system LC_CTYPE" is not explicit, it is just "what the default configuration is". In practice all programs running on a system have to use the same LC_CTYPE value anyway, because LC_CTYPE also specifies the encoding of file names. While apps or tools can convert pw_gecos as needed, they don't currently do so and IMHO they aren't likely to start; { printf ("%s", pw->pw_gecos); } should Just Work.

Created attachment 129356 [details]
Use the LC_CTYPE-specified encoding for pw_gecos
Nils, is this OK with you?
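For readers following the discussion, option (a) from the comments above can be sketched roughly as follows. This is only an illustration in Python 3, not the contents of attachment 129356 and not the code shipped in system-config-users (which used Python 2's unicode()). The helper name decode_gecos and its fallback_encoding parameter are hypothetical.

```python
import locale
from typing import Optional


def decode_gecos(raw: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode a raw GECOS field (hypothetical helper, not part of s-c-users).

    Tries UTF-8 first. If the bytes are not valid UTF-8, assumes they are
    stored in the charset of the current locale (LC_CTYPE), or in
    fallback_encoding when the caller supplies one.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 data was written under the system locale.
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        # errors="replace" keeps the UI alive even if the guess is wrong.
        return raw.decode(encoding, errors="replace")


# ISO-8859-1 bytes for a name with non-ASCII characters are not valid UTF-8,
# so the fallback path is taken here.
name = decode_gecos(b"H\xe5kon L\xf8vdal", "iso-8859-1")
```

The UTF-8-first order is a design choice of this sketch: valid multi-byte UTF-8 is unlikely to occur by accident in ISO-8859-x text, so trying it first rarely misfires, but that remains a heuristic and not a guarantee.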
Mirek, thanks for the patch. But I'm still not quite sure that it's the correct thing to do for the future, as that way we might end up with different encodings for different entries of /etc/passwd. Consider this scenario:

To make it complicated enough, let's assume e.g. a student-managed server for an international college with, for example, a French, a Russian and a Japanese speaking administrator. The French one uses an ISO8859 locale, the Russian UTF-8 and the Japanese an SJIS locale. Students approach them with the usual requests for new and changed accounts, passwords to be reset and so on. If passwd et al. contain different encodings for different entries, this is just begging for disaster IMO.

My approach would be to use an encoding that is able to represent all alternatives (that pretty much leaves us with Unicode charsets, and UTF-8 is a good candidate because it is semi-compatible with many other encodings) and let apps convert at runtime if needed. I think that applications depending on data shared between apps to be in a certain encoding (current LC_CTYPE in our case) can be considered flawed these days.

Perhaps we should bring up this discussion with a larger audience, say fedora-devel-list? In that case I'd write a summary. Let me know what you think.

Let me rephrase that last sentence of the second to last paragraph: I think that applications depending on data shared between apps to be in a certain encoding (current LC_CTYPE in our case) that can't represent the other encodings as well can be considered flawed these days.

DB lost this:

------- Additional Comments From Miroslav Trmac 2006-06-10 16:28 EST -------

Comment #13 has not really clarified the above - it really seems to advocate a restriction to ASCII. The complicated scenario above really indicates that the "system locale" should be UTF-8 in such multilingual environments; it doesn't provide evidence for the necessity to always use UTF-8 in /etc/passwd.

In a multilingual environment (global multi-user systems, internet mail, ...) UTF-8 is a huge win; nevertheless there are a lot of deployments standardized on ISO-8859-x which are restricted to one country, or even a single user; such people may be perfectly happy with ISO-8859-x and I can see no reason to break their environment. Compare with http://blog.gmane.org/gmane.comp.internationalization.linux/month=20011101 suggesting to use MIME; using UTF-8 in non-UTF-8 locales is comparably impractical. Feel free to bring this up to fedora-devel if you like.

libuser-0.53.7-1.fc4.1 has been pushed for fc4, which should resolve this issue. If these problems are still present in this version, then please make note of it in this bug report.

The libuser patches from comments #3 and #4 are released as an erratum, leaving s-c-u behavior in non-UTF8 locales.

This report targets the FC3 or FC4 products, which have now been EOL'd. Could you please check that it still applies to a current Fedora release, and either update the target product or close it? Thanks.
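The mixed-encoding scenario discussed in this thread (and option (b), notifying the administrator) could be checked with something like the sketch below. It is an assumption-laden illustration in Python 3: the function name find_non_utf8_gecos and the sample passwd content are invented for the example, and it reads raw bytes directly instead of going through libuser.

```python
from typing import List


def find_non_utf8_gecos(passwd_bytes: bytes) -> List[str]:
    """Return login names whose GECOS field is not valid UTF-8.

    Hypothetical helper: expects the raw contents of a passwd-format file,
    where GECOS is the fifth colon-separated field.
    """
    offenders = []
    for line in passwd_bytes.splitlines():
        fields = line.split(b":")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        try:
            fields[4].decode("utf-8")
        except UnicodeDecodeError:
            offenders.append(fields[0].decode("ascii", errors="replace"))
    return offenders


# Invented sample data: the second entry stores its GECOS in ISO-8859-1.
sample = (
    b"root:x:0:0:root:/root:/bin/bash\n"
    b"hlovdal:x:500:500:H\xe5kon L\xf8vdal:/home/hlovdal:/bin/bash\n"
)
non_utf8_users = find_non_utf8_gecos(sample)
```

A tool could present the returned names to the administrator and offer to re-encode those entries, which is one way to read option (b); whether that conversion is desirable is exactly what the comments in this report disagree on.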