Bug 451494
Summary: | Listdir function fails randomly due to mislocated errno assignment [patch included] | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Philip Spencer <pspencer> | ||||
Component: | python | Assignee: | James Antill <james.antill> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 8 | CC: | dcantrell, james.antill, jonathansteffan, katzj, matt_domsch | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2008-08-12 14:20:29 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Philip Spencer
2008-06-14 22:28:28 UTC
Created attachment 309374 [details]
Patch to fix the problem of listdir mistakenly thinking error occurred
Can you log this bug upstream ... as it isn't "fixed" in 2.6.x. I'm also unsure about the patch: 1. Just before the loop the code is: if ((d = PyList_New(0)) == NULL) { closedir(dirp); PyMem_Free(name); return NULL; } ...and as far as I can see d is _only_ set to NULL again, when one of the Py_* functions fails (from a memory related problem). 2. Given #1 the end test: if (errno != 0 && d != NULL) { ...will not trigger, due to a spurious errno ... because the check on d won't trigger. The problem is that I don't see how Python does the right thing if readdir() _does_ return an error (I'm guessing it doesn't, but that's hard to test so noone knew?) ... which makes me worry about NFS. In createrepo. Can you try changing __init__.py:_os_path_walk(): try: names = os.listdir(top) except os.error: return ...into: try: try: names = os.listdir(top) except os.error: names = os.listdir(top) except os.error: return Also, I just tried: % for i in $(seq 1 128); repomanage -c --old /tmp/f8updates > /dev/null % du -sh /tmp/f8updates/packages 404M /tmp/f8updates/packages ...and got nothing. I have logged it upstream as issue #3115 at bugs.python.org. Re "will not trigger ... because the check on d won't trigger ... I don't see how Python does the right thing if readdir() _does_ return an error" The rest of the code is correct apart from the misplaced errno initialization. Before the loop, d is set to PyList_New(0), a new python list. If d is null (list init failed), the routine returns immediately. So, by the time the loop is entered, d is known to be non-null. The loop is exited in one of two ways: (1) readdir returns null, in which case d is still non-null upon loop exit, (2) a failure occurs during attempts to copy or convert a file name or to append it to the list, in which case d is explicitly set to null before exiting the loop. At the end of the loop, a test is made for errno != 0 && d != NULL The "d != NULL" means that the loop exited for reason #1 (by readdir returning null instead of by a failure in another part of the code) and the "errno != 0" means readdir encountered an error not just the normal end of directory. Therefore, the test "if (errno != 0 && d != NULL)" is a correct test for "readdir encountered an error" -- as long as errno was set to 0 before readdir. Re: "In createrepo, can you try..." I can't, because I've already patched python. But I do know that if I take out the "try:" altogether it will fail with "Errno=12 (unable to allocate memory)". Re "I just tried ... and got nothing" --- were you on an x86_64 machine? It is only on x86_64 that I've seen glibc's malloc set errno=12 despite succeeding. And it only happens sporadically. Memory allocation may be set slightly differently on your machine. In any event, given the possibility that malloc (or other system calls) can set errno even when they succeed, the python code as it stands is obviously incorrect because if errno is NOT initialized to 0 right before calling readdir, then the test for errno != 0 upon loop exit for reason #1 is no longer an accurate test for readdir returning an error. """ At the end of the loop, a test is made for errno != 0 && d != NULL The "d != NULL" means that the loop exited for reason #1 (by readdir returning null instead of by a failure in another part of the code) """ ...you're right, for some reason I read that as "d == NULL" ... so I can see how it could happen, and I don't see any possible problems with the fix. I'm building the fix for Fed-9 now: http://koji.fedoraproject.org/koji/taskinfo?taskID=663001 """ Re "I just tried ... and got nothing" --- were you on an x86_64 machine? """ ...yes, I'm on x86_64. python-2.5.1-26.fc9 has been submitted as an update for Fedora 9 python-2.5.1-26.fc8 has been submitted as an update for Fedora 8 python-2.5.1-26.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update python'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-5366 python-2.5.1-26.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report. python-2.5.1-26.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report. my FTBFS rebuilds in mock suffered a few failures with -25; will try now with -26. python-2.5.1-26.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report. F-8 and F-9 updates pushed to stable, closing. |