See: https://sourceforge.net/tracker/index.php?func=detail&aid=1298813&group_id=5470&atid=305470
Created attachment 119143 [details] Patch to make python use canonicalize_file_name if available
Patch applied in 2.4.1-7 Leaving the bug around to wait for the decision from upstream.
Created attachment 119174 [details] updated patch There's a small bug, as this function is (oddly) called on options which aren't filenames, so must pass them through transparently. New patch attached.
Created attachment 119259 [details] a bit more working patch there is a bug in the newest version as well, it doesn't set the n variable (length of buffer to be copied). i've also changed the function so that it tests the canonical_file_name(), then realpath(), and finally readlink() (since it's the 'weakest' of the three), and uses the first one available patch attached
Should be fixed in 2.4.1-13. Leaving the bug open until it gets into upstream.
*** Bug 187901 has been marked as a duplicate of this bug. ***
Fixed in FC5 by FEDORA-2006-689
I know this is ancient, but can anyone still remember further details about this? I'd really love to understand why such a complex patch was created to address this, when one-liner as: --- Python/sysmodule.c.canonicalize +++ Python/sysmodule.c.fixed @@ -1178,7 +1178,7 @@ void PySys_SetArgv(int argc, char **argv) { #if defined(HAVE_REALPATH) - char fullpath[MAXPATHLEN]; + char fullpath[PATH_MAX]; #elif defined(MS_WINDOWS) char fullpath[MAX_PATH]; #endif should be good enough? The problem here was that python was calling realpath with buffer of insufficient size. The reason for that was that python defines MAXPATHLEN and does not use one defined on Linux in sys/param.h. Python used to defined it unconditionally to 1024, regardless of the PATH_MAX value, which is the size expected by realpath. The canonicalize.patch we use never made it upstream, but other changes resolved the issue there. First, following change was done: http://svn.python.org/view?view=rev&revision=45715 MAXPATHLEN is no longer unconditionally set to 1024, but rather to PATH_MAX, if it's defined and is more than 1024. Additionally, in python3, PATH_MAX is used as size of the buffer passed to realpath (see _wrealpath() in Python/sysmodule.c). Or am I missing something non-obvious? Thanks!
There is this section in the realpath man page: BUGS Avoid using this function. It is broken by design since (unless using the non-standard resolved_path == NULL feature) it is impossible to determine a suitable size for the output buffer, resolved_path. According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX need not be a defined constant, and may have to be obtained using path- conf(3). And asking pathconf(3) does not really help, since on the one hand POSIX warns that the result of pathconf(3) may be huge and unsuit- able for mallocing memory. And on the other hand pathconf(3) may return -1 to signify that PATH_MAX is not bounded. This makes me think your proposed fix is not enough. I don't remember the details, but I tend to trust the glibc developers much better than my own judgment on this. I believe the canonicalize_file_name patch was suggested by one of them at the time. You may want to run this fix through one of them (and get ridiculed in the process :-)
Hey Mihai, thanks for further insight on this, it'll hopefully help me understand why it was done this way. Maybe there is further into in SF.net bug in comment #0 to shed more light, but it's restricted, so I can only try to guess intentions from the fix. (In reply to comment #9) > According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX > need not be a defined constant, and may have to be obtained using path- > conf(3). I'll need to dig deeper into this, but it reads to me as: On some platforms/systems/OSes, PATH_MAX may not be defined as a constant, but has to be determined some other way. From what I can see, on Linux, PATH_MAX is defined constant in linux/limits.h (which gets in when you include limits.h). So it sounds like this should not worry us on Linux. Looking into realpath implementation in glibc, when PATH_MAX is defined, it is used as the size limit. pathconf call is only compiled in when PATH_MAX is not defined. http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/canonicalize.c;h=67e4d05535;hb=HEAD#l68 Anyway, as canonicalize_file_name is GNU extension that is nothing but a call to realpath(name, NULL) http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/canonicalize.c;h=67e4d05535;hb=HEAD#l242 the patch may have slightly better chance to get accepted upstream without the non-portable wrapper. Also looking at the realpath implementation, when you pass NULL as resolved, realpath will malloc a buffer of path_max (i.e. PATH_MAX) http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/canonicalize.c;h=67e4d05535;hb=HEAD#l77 So if realpath, for some input, overflows static buffer with PATH_MAX size, it should just overflow the dynamically allocate buffer of the same size just the same way, it just moves overflow from stack to heap. > (and get ridiculed in the process :-) Ah, I'm getting used to that. I get called names when trying to help with / understand some issue, that's life... Few more references for the sake of completeness: Upstream bug: http://bugs.python.org/issue1298813 "Exploit" demonstrating this "security" issue: http://www.milw0rm.com/exploits/1591
One of the reasons for using canonicalize_file_name is because you can check for its presence using autoconf. The non-standard GNU extension of passing NULL as the second argument cannot be probed for. The patch was more complicated than it should be since it was trying to be general enough and be accepted upstream. If you _know_ you are running against glibc, you can definitely shortcut lots of things.