Bug 186382

Summary: system-config-users fails to start when usernames contain iso-8859-1 characters
Product: [Fedora] Fedora
Component: system-config-users
Version: 4
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Håkon Løvdal <hlovdal>
Assignee: Nils Philippsen <nphilipp>
CC: mitr, nphilipp
Fixed In Version: libuser-0.53.7-1.fc4.1
Doc Type: Bug Fix
Last Closed: 2008-03-10 04:20:57 UTC
Attachments (description, flags):
  Fix double free (none)
  Fix user name matching in users_enumerate_full (none)
  Use the LC_CTYPE-specified encoding for pw_gecos (none)

Description Håkon Løvdal 2006-03-23 09:30:40 UTC
Description of problem:

system-config-users fails to start when usernames contain iso-8859-1 characters

Version-Release number of selected component (if applicable):

system-config-users-1.2.41-0.fc4.1

How reproducible:

Always

Steps to Reproduce:

If /etc/passwd contains the following entry, using 'a' and 'o',
system-config-users starts and everything works ok.

        hlovdal:x:500:500:Hakon Lovdal:/home/hlovdal:/bin/bash

However, if the name is written properly (with the html characters "&aring;" and
"&oslash;") encoded as iso-8859-1,

        hlovdal:x:500:500:H\xe5kon L\xf8vdal:/home/hlovdal:/bin/bash

        or output from "od -t c"
        0000000   h   l   o   v   d   a   l   :   x   :   5   0   0   :   5   0
        0000020   0   :   H 345   k   o   n       L 370   v   d   a   l   :   /
        0000040   h   o   m   e   /   h   l   o   v   d   a   l   :   /   b   i
        0000060   n   /   b   a   s   h  \n
        0000067

system-config-users refuses to start up and gives the following error message:

"The user database cannot be read.  This problem is most likely caused by a
mismatch in /etc/passwd and /etc/shadow.  The program will exit now."

When recoding the name from iso-8859-1 to utf8 system-config-users starts up and
works OK.
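
For illustration, the recoding described above can be sketched in a few lines of Python 3. This is only a minimal sketch: the byte string is the GECOS field from the report, and the variable names are made up for the example.

```python
# GECOS field from the report, as raw ISO-8859-1 bytes.
raw = b"H\xe5kon L\xf8vdal"

# The bytes are not valid UTF-8, which is what trips up the decoder.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Recode: interpret the bytes as ISO-8859-1, then write them out as UTF-8.
name = raw.decode("iso-8859-1")
recoded = name.encode("utf-8")

print(valid_utf8)
print(recoded)
```

After recoding, each of the two non-ASCII characters occupies two bytes instead of one.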

My locale settings are the following:

LANG=en_US.UTF-8
LC_CTYPE=no_NO.utf8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY=no_NO.utf8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=no_NO.utf8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE=no_NO.utf8
LC_MEASUREMENT=no_NO.utf8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

  
Actual results:


Expected results:

This problem may be somewhat related to bug 177025, although in this case the
error message is at best massively misleading. Please AT LEAST fix this with a
better error message, although I see no reason for not handling the characters
properly. Proper handling might for instance be the following:

"Hi Mr Root. Your locale is currently set up to use UTF8 however /etc/passwd
contains <N> users whose name is not encoded according to UTF8. Would you like to:
        1) Continue, leaving the name(s) unchanged
        2) Select name(s) to convert to UTF8
        3) Abort"
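
The check behind such a dialog could look roughly like the following Python 3 sketch. It assumes the "name" in question is the GECOS field (the fifth colon-separated field) and that the file is read as raw bytes; the function name non_utf8_users is hypothetical and not part of system-config-users or libuser.

```python
def non_utf8_users(passwd_bytes):
    """Return login names whose GECOS field is not valid UTF-8."""
    offenders = []
    for line in passwd_bytes.splitlines():
        fields = line.split(b":")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        login, gecos = fields[0], fields[4]
        try:
            gecos.decode("utf-8")
        except UnicodeDecodeError:
            offenders.append(login.decode("ascii", "replace"))
    return offenders


# Two sample entries: the one from this report and a plain ASCII one.
sample = (
    b"hlovdal:x:500:500:H\xe5kon L\xf8vdal:/home/hlovdal:/bin/bash\n"
    b"root:x:0:0:root:/root:/bin/bash\n"
)
print(non_utf8_users(sample))
```

The length of the returned list would be the <N> in the proposed message.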

Comment 1 Nils Philippsen 2006-04-24 08:05:37 UTC
Mirek, can you look at this one? It stems from the following piece of code:

            # try to get a name to associate with the user's primary gid,
            # and attempt to minimize lookups by caching answers
            try:
                gidNumber = user.get(libuser.GIDNUMBER)[0]
            except:
                messageDialog.show_message_dialog(_(
                    "The user database cannot be read.  This problem is most "
                    "likely caused by a mismatch in /etc/passwd and "
                    "/etc/shadow.  The program will exit now."))
                os._exit(0)

Putting a "raise" in between shows that it raises an IndexError.

Is there a way for s-c-user to detect via libuser whether there have been errors
while reading the user data, where they are and what the problem is?
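
Independent of what libuser can report, the bare "except:" above hides which user triggered the failure. A minimal Python 3 sketch of a narrower check follows. FakeEntity, GIDNUMBER and USERNAME are hypothetical stand-ins so the example runs without libuser; the only behaviour assumed from the snippet above is that get() returns a list.

```python
class FakeEntity:
    """Hypothetical stand-in for a libuser entity; get() returns a list."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get(self, name):
        return self._attrs.get(name, [])


GIDNUMBER = "gidNumber"  # placeholder for libuser.GIDNUMBER
USERNAME = "username"    # placeholder attribute name, an assumption


def primary_gid(user):
    """Return the primary GID, or raise a descriptive LookupError."""
    values = user.get(GIDNUMBER)
    if not values:
        names = user.get(USERNAME)
        who = names[0] if names else "<unknown>"
        raise LookupError("no primary GID returned for user %r" % who)
    return values[0]


ok = FakeEntity({USERNAME: ["root"], GIDNUMBER: [0]})
broken = FakeEntity({USERNAME: ["hlovdal"]})

print(primary_gid(ok))
try:
    primary_gid(broken)
except LookupError as exc:
    print(exc)
```

An empty list is exactly the case that produces the IndexError mentioned above, so checking for it explicitly lets the dialog name the affected entry.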


Comment 2 Miloslav Trmač 2006-04-25 15:52:07 UTC
I can't reproduce this: with /etc/passwd containing this single line,
gidNumber is extracted correctly and s-c-users blows up a bit later with:

  File "/usr/share/system-config-users/mainWindow.py", line 485, in populate_user_list
    gecos = unicode (gecos, 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Håkon, can you please provide an example set of {passwd,shadow,group,gshadow}
demonstrating the problem?  Please edit out any passwords or other confidential
information.

You can copy the files from /etc to a different directory and use the
"directory = " variables in the [files] and [shadow] sections of
/etc/libuser.conf to point to the directory with modified files if you want
to edit them without the danger of damaging your main configuration.

Comment 3 Miloslav Trmač 2006-04-25 16:08:35 UTC
I'm sorry, I see it now - I was not using the fc4 version of libuser.

Comment 4 Miloslav Trmač 2006-05-07 14:28:23 UTC
Created attachment 128709 [details]
Fix double free

Comment 5 Miloslav Trmač 2006-05-07 14:29:34 UTC
Created attachment 128710 [details]
Fix user name matching in users_enumerate_full

Comment 6 Miloslav Trmač 2006-05-07 14:41:32 UTC
These two backported patches fix the exception above, but s-c-u then crashes
with

Traceback (most recent call last):
  File "scu/system-config-users.py", line 45, in ?
    mainWindow.mainWindow()
  File "/home/mitr/t/scu/mainWindow.py", line 280, in __init__
    self.refresh()
  File "/home/mitr/t/scu/mainWindow.py", line 459, in refresh
    self.populate_lists()
  File "/home/mitr/t/scu/mainWindow.py", line 550, in populate_lists
    self.populate_user_list()
  File "/home/mitr/t/scu/mainWindow.py", line 490, in populate_user_list
    gecos = unicode (gecos, 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data


In #74058c7 I argued for interpreting pw_gecos using the LC_CTYPE of the
current locale.
I still think it is the right thing to do; OTOH I wouldn't mind much
if s-c-u just required UTF-8.

Nils, what do you think?

Comment 7 Nils Philippsen 2006-05-12 10:50:59 UTC
Hmm. I think it would be correct to have system-data (e.g. what's in
/etc/passwd) in UTF-8 and tools should convert that to other character sets
(e.g. ISO-8859-X) if needed. If s-c-users encounters non-UTF-8 data there, it
could either:

a) silently assume that the data is stored in the character set as in LC_CTYPE
b) notify the user about the problem, perhaps offering to convert the entry to
the new character set (as it will be stored in UTF-8 anyway upon changes).

What do you think?
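
Option a) could be sketched as follows in Python 3. This is only an illustration of the idea, not the attached patch: decode_gecos and its fallback parameter are hypothetical, and locale.getpreferredencoding(False) is used here as an approximation of "the character set as in LC_CTYPE".

```python
import locale


def decode_gecos(raw, fallback=None):
    """Decode a GECOS byte string: UTF-8 first, then a fallback charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if fallback is None:
            # Encoding of the current locale, without changing the locale.
            fallback = locale.getpreferredencoding(False)
        # "replace" keeps the UI alive even if the fallback does not fit.
        return raw.decode(fallback, "replace")


# Valid UTF-8 is taken as-is; anything else goes through the fallback.
print(ascii(decode_gecos(b"H\xc3\xa5kon")))
print(ascii(decode_gecos(b"H\xe5kon", fallback="iso-8859-1")))
```

Trying UTF-8 first is reasonably safe because text in a legacy 8-bit charset rarely happens to form valid UTF-8 sequences. Option b) would hook in at the except branch instead of decoding silently.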

Comment 8 Miloslav Trmač 2006-05-12 11:39:52 UTC
That would be a reasonable new design, but AFAIK we are constrained by keeping
get{pw,gr}{ent,by*} compatible:

These functions do no charset conversion now, and from the API point of view it
"makes sense" that their output should be in the charset specified by LC_CTYPE,
so /etc/* should be really stored in the "system LC_CTYPE" charset.



Comment 9 Nils Philippsen 2006-05-12 13:03:56 UTC
I agree that get{pw,gr}{ent,by}* don't do charset conversion, but no
documentation says anything about the charset used therein. For all I know you
can only expect a stream of data coming out there, because how is "system
LC_CTYPE" defined? AFAIK, /etc/sysconfig/i18n isn't standard anyway (and parsing
it whenever trying to read such a file is gross IMHO). Storing stuff in UTF-8
(the agreed upon superset of the charsets in question) seems the most sensible
thing to do, because apps or tools can easily convert that over to other
charsets if needed -- they have all the required information.

Comment 10 Miloslav Trmač 2006-05-12 13:42:02 UTC
"system LC_CTYPE" is not explicit, it is just "what the default configuration is".
In practice all programs running on a system have to use the same LC_CTYPE value
anyway, because LC_CTYPE also specifies the encoding of file names.

While apps or tools can convert pw_gecos as needed, they don't currently do so
and IMHO they aren't likely to start; { printf ("%s", pw->pw_gecos); } should
Just Work.

Comment 11 Miloslav Trmač 2006-05-17 19:41:42 UTC
Created attachment 129356 [details]
Use the LC_CTYPE-specified encoding for pw_gecos

Nils, is this OK with you?

Comment 12 Nils Philippsen 2006-05-18 08:17:38 UTC
Mirek,

thanks for the patch. But I'm still not quite sure that it's the correct thing
to do for the future as that way we might end up with different encodings for
different entries of /etc/passwd.

Consider this scenario:

To make it complicated enough, let's assume e.g. a student-managed server for an
international college with for example a French, a Russian and a Japanese
speaking administrator. The French uses an ISO8859 locale, the Russian UTF-8 and
the Japanese an SJIS locale. Students approach them with the usual requests for
new and changed accounts, passwords to be reset and so on. If passwd et al.
contain different encodings for different entries, this is just begging for
disaster IMO.
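
The scenario can be made concrete with a small Python 3 sketch. The three names are invented sample data, each encoded the way the respective administrator's locale would write it.

```python
# One invented name per administrator, encoded in that admin's charset.
entries = {
    "french": "H\u00e9l\u00e8ne".encode("iso-8859-1"),
    "russian": "\u0418\u0432\u0430\u043d".encode("utf-8"),
    "japanese": "\u5c71\u7530".encode("shift_jis"),
}


def decodable(raw, encoding):
    """Return True if raw can be decoded with the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A reader that assumes UTF-8 can only handle one of the three entries.
readable_as_utf8 = {who: decodable(raw, "utf-8") for who, raw in entries.items()}
print(readable_as_utf8)
```

Reading everything as ISO-8859-1 instead would never raise an error, because every byte value maps to some character, but the Russian and Japanese names would come out as mojibake.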

My approach would be to use an encoding that is able to represent all
alternatives (that pretty much leaves us with Unicode charsets, and UTF-8 is a
good candidate because it is semi-compatible with many other encodings) and let
apps convert at runtime if needed. I think that applications depending on data
shared between apps to be in a certain encoding (current LC_CTYPE in our case)
can be considered flawed these days.

Perhaps we should bring up this discussion with a larger audience, say
fedora-devel-list? In that case I'd write a summary. Let me know what you think.

Comment 13 Nils Philippsen 2006-05-18 08:26:58 UTC
Let me rephrase that last sentence of the second to last paragraph:

I think that applications depending on data shared between apps to be in a
certain encoding (current LC_CTYPE in our case) that can't represent the other
encodings as well can be considered flawed these days.

Comment 14 Nils Philippsen 2006-06-14 06:45:22 UTC
DB lost this:

------- Additional Comments From Miroslav Trmac  2006-06-10 16:28 EST -------
Comment #13 has not really clarified the above - it really seems to advocate
a restriction to ASCII.

The complicated scenario above really indicates that the "system locale" should
be UTF-8 in such multilingual environments; it doesn't provide evidence for the
necessity to always use UTF-8 in /etc/passwd.

In a multilingual environment (global multi-user systems, internet mail, ...)
UTF-8 is a huge win; nevertheless there are a lot of deployments standardized
on ISO-8859-x which are restricted to one country, or even a single user;
such people may be perfectly happy with ISO-8859-x and I can see no reason
to break their environment.

Compare with
http://blog.gmane.org/gmane.comp.internationalization.linux/month=20011101
suggesting to use MIME; using UTF-8 in non-UTF-8 locales is comparably
impractical.

Feel free to bring this up to fedora-devel if you like.

Comment 15 Fedora Update System 2006-07-17 19:47:03 UTC
libuser-0.53.7-1.fc4.1 has been pushed for fc4, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 16 Fedora Update System 2006-07-26 18:24:04 UTC
libuser-0.53.7-1.fc4.1 has been pushed for fc4, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 17 Fedora Update System 2006-07-26 18:54:05 UTC
libuser-0.53.7-1.fc4.1 has been pushed for fc4, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 18 Miloslav Trmač 2006-07-26 23:33:09 UTC
The libuser patches from comments #4 and #5 are released as an erratum,
leaving only the s-c-u behavior in non-UTF8 locales open.

Comment 19 Christian Iseli 2007-01-22 10:45:41 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?

Thanks.