Bug 441685
Summary: | Fatal error : Uncaught exception Out of memory | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jiri Cerny <ji.cerny> |
Component: | unison227 | Assignee: | Stephen Warren <swarren> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | alex, blomgren.peter, cweyl, mmorales, rjones, susi.lehtola |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 2.13.16-10.fc9.1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-06-03 07:33:52 UTC | Type: | --- |
Bug Depends On: | 445545, 454384 | ||
Bug Blocks: |
Description
Jiri Cerny
2008-04-09 14:17:55 UTC
Stephen, do you know why the unison??? bug reports are not assigned to you?

Gerard, for some reason, there's no bugzilla component for unison213/unison227 yet (I think because the new packages haven't been pushed to stable yet; I just requested that yesterday). So, the bugs are simply filed against unison right now, which you own. Perhaps if you "release ownership" of the old unison package, I can own it too, which should fix the issue (assuming I can remember how to "take ownership"...)

Jiri, a few questions:

* Are both the F8 and devel machines x86_64, or just one?
* How big is the file tree being synchronized? i.e. How many megabytes, how many files.

There's a Unison FAQ entry that might be relevant too (see below). Can you try adjusting the stack size limit to see if it fixes the issue?

Finally, can you tell me the exact unison command you're running?

==========
Unison crashes with an "out of memory" error when used to synchronize really huge directories (e.g., with hundreds of thousands of files).

You may need to increase your maximum stack size. On Linux and Solaris systems, for example, you can do this using the ulimit command (see the bash documentation for details).
==========

Thanks. You are right. I should read the FAQ first. The error goes away when I do ulimit -s unlimited. The trees I synchronize are quite large, around 100k files. I must be somewhere on the border, because in F8 both trees synchronize without problem and in rawhide they do not. Thanks for your help and feel free to close the bug.

Excellent. Thanks very much for checking this.

*** Bug 446316 has been marked as a duplicate of this bug. ***

*** Bug 443304 has been marked as a duplicate of this bug. ***

(Directed here from Bug 446316...)

I'm synchronizing a serious amount of data:

$ du -hs SYNC/
28G SYNC/
$ find SYNC/ -type f -o -type l | wc -l
95167

unison227-2.27.57-8.fc9.x86_64 gives up, even with ulimit as discussed above; however unison227-2.27.57-8.fc9.i386 works.

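The stack-limit workaround from the Unison FAQ, discussed above, can be scoped to a single run instead of the whole login session. The following is a minimal sketch, assuming a POSIX-style shell such as bash; the wrapper name `with_big_stack` and the profile name `myprofile` are hypothetical and not part of Unison.

```shell
# Hypothetical helper: run one command with the largest stack limit we are
# allowed to set, without changing the limit of the calling shell.
with_big_stack() {
    (
        # Try "unlimited" first; if the hard limit forbids that, fall back to
        # raising the soft limit as far as the hard limit allows.
        ulimit -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
        "$@"
    )
}

# Example usage (the profile name is an assumption):
#   with_big_stack unison myprofile
```

Because the limit is changed inside a subshell, other programs started from the same terminal keep their normal stack size. As later comments in this bug explain, this only helps with genuinely large trees; it does not address the x86_64 runtime problem.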
This isn't just "it uses lots of memory when you sync lots of files". I'm getting the error even on a very small sync. From googling around a little, it looks like it's caused by a bug in the ocaml runtime on x86_64 (http://caml.inria.fr/mantis/view.php?id=4448) which causes it to occasionally try to allocate ridiculous amounts of memory that it doesn't actually need. A workaround for that bug has been committed (http://camlcvs.inria.fr/cgi-bin/cvsweb/ocaml/byterun/unix.c.diff?r1=1.28;r2=1.28.4.1) and is apparently in ocaml 3.10.2. So we probably want to upgrade ocaml and rebuild any packages that use it on x86_64.

OK. In that case, you should file a bug against OCAML itself, re-open your original unison bug report, and mark the unison bug report as depending on the OCAML bug report. I'll rebuild unison once OCAML is fixed.

Adding dependency on bug 445545, since that appears to be what is described in comment #9.

Dan, can you take a look at the patch in bug 445545, since it seems pretty different from the patch you linked to (although they do both affect the same function). Do the patches simply fix the same bug in different ways, or are there 2 different bugs? Thanks.

I've only just seen this (I'm the Fedora OCaml maintainer!)

Do you think someone could 'strace' the process when it fails? It's very easy to tell if the failure is related to bug 445545 from the strace. Also do the OCAMLRUNPARAM=v=0x1ff thing as described here:

http://caml.inria.fr/pub/ml-archives/caml-list/2008/05/9c24581520a98afa2e11185845b5458a.en.html

The good news is that if it is this bug, a simple rebuild with the latest OCaml compiler package will fix it.

Looking back a little I see that people have mentioned the upstream bug 4448 (http://caml.inria.fr/mantis/view.php?id=4448). This bug is related to the problem, but the solutions mentioned there & the patch that went into the compiler _do_not_ fix the problem on Fedora. So any suggestions you read in the 4448 thread (e.g. turning off VA randomization, etc.) _will_not_ help. You need the patch which has gone into the OCaml compiler, see bug 445545.

Based on information supplied to me separately, this is an instance of bug 445545. I'll rebuild unison against the fixed OCaml module which should resolve this issue.

Koji is down at the moment, but I've just checked in an updated unison227 which should fix the issue.

https://www.redhat.com/archives/fedora-devel-announce/2008-May/msg00012.html

I'll rebuild it when the above outage is over.

I see you bumped the devel version of unison227. I assume the F9 branch also needs a rebuild? Does unison213 also need a rebuild?

> I see you bumped the devel version of unison227.

Yes, I bumped it but didn't rebuild the devel package til just now. Koji was down all of yesterday.

http://koji.fedoraproject.org/koji/taskinfo?taskID=627546

> I assume the F9 branch also
> needs a rebuild? Does unison213 also need a rebuild?

Yes too. If you want the fix in F-9, you need to rebuild the program with the fixed compiler. (The reason is that the runtime containing the buggy GC is statically linked into programs. A rebuild of the program statically links the new fixed runtime into the program.)

The minimum compiler versions which have the fix are:

F-9: ocaml >= 3.10.1-3
devel: ocaml >= 3.10.2-2

There is no fix in F-8's compiler. If you want it, please file a bug, but I have never seen the problem occur in F-8 myself, and the problem may be caused by some change in the mmap(2) call in the kernel between F-8 and F-9.

I'll leave any decision to rebuild F-9 and other versions of unison up to you because this bug is marked against unison227 in Rawhide only.

(In reply to comment #17)
> I'll leave any decision to rebuild F-9 and other versions of unison up to you
> because this bug is marked against unison227 in Rawhide only.

F-9 rebuild makes sense, since bug 446316 is for the F-9 version of this problem. Thanks.

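Since the OCaml runtime is statically linked, what matters is the compiler package a binary was built with, not what is installed where it runs. On a build machine, a check against the minimum versions quoted above might look like the sketch below. The helper name `version_ge` is hypothetical, and GNU `sort -V` only approximates RPM's own version comparison, so treat the result as a hint rather than an authoritative answer.

```shell
# Hypothetical helper: succeed if version-release $1 is >= $2.
# Relies on GNU sort -V, which approximates (but is not) rpm's comparison.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum fixed compiler versions quoted in this bug.
min_f9="3.10.1-3"
min_devel="3.10.2-2"

# Example usage on a Fedora 9 build host (assumes the ocaml package is
# installed; the dist tag such as ".fc9" is stripped before comparing):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' ocaml)
#   installed=${installed%.fc[0-9]*}
#   version_ge "$installed" "$min_f9" && echo "compiler has the fix"
```

A passing check only says the compiler is new enough; the program still has to be rebuilt with it before the fix takes effect.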
Installing the package unison227-2.27.57-9.fc10.x86_64.rpm fixes the problem in Fedora 9.

unison227-2.27.57-8.fc9.1,unison213-2.13.16-10.fc9.1 has been submitted as an update for Fedora 9

I've rebuilt unison227 for F-9, and unison213 for F-9 & devel too, just in case.

Richard, are EL-4/EL-5 affected by the OCaml issue? Thanks.

Actually I don't know. I've not seen this on Fedora < 9. In theory it _could_ happen -- it's a stupid assumption made by the OCaml runtime about how mmap allocations happen. (Fixed permanently and properly in OCaml >= 3.11). But for some reason it doesn't seem to happen in Fedora < 9 (or in any Debian). Is that because the kernel mmap(2) implementation is different, or is it just luck? I really have no idea.

unison213-2.13.16-10.fc9.1, unison227-2.27.57-8.fc9.1 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update unison213 unison227'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-4618

unison227-2.27.57-8.fc9.1 resolves the "out of memory" issue for me.

unison213-2.13.16-10.fc9.1, unison227-2.27.57-8.fc9.1 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.

I still encounter the problem when synchronizing between a 32bit fedora-9 client (my laptop) and an x86_64 fedora-8 server, using unison227-2.27.57-8.fc9.1 on the client and unison227-2.27.57-7.fc8.2 on the server. This is the same problem as described above, except that the out of memory happens on the fedora-8 side. Presumably we need an update for fedora-8 as well.

You need to rebuild on every branch where this happens.
The patch which fixes bug 445545 hasn't been backported as far as F-8, although it ought to apply fairly straightforwardly because the two versions of OCaml are very similar to each other.

Adding dependency on bug 454384; a request for a backport of the fix for 445545 to F-8. Once OCaml is fixed, I'll rebuild unison for F-8.

If this bug really is triggered by kernel mmap differences as previously suggested, this might have only recently been exposed on F-8, due to the kernel version having been recently bumped. It's odd that an "old" release like F-8 tracks the latest kernel so closely...

OK. I've modified the spec files and tagged them. Just have to wait until the new ocaml build is tagged and available for Koji to build against, then I'll rebuild the F-8 unisons too.

unison213-2.13.16-9.fc8.3,unison227-2.27.57-7.fc8.3 has been submitted as an update for Fedora 8

unison213-2.13.16-9.fc8.3, unison227-2.27.57-7.fc8.3 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
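As the later comments show, the out-of-memory error can come from whichever endpoint still runs a build linked against the old runtime. A rough way to check both sides is sketched below. It recognizes only the builds that this bug names as rebuilt; the function name `is_rebuilt_build` and the host name `server` are hypothetical.

```shell
# Succeed if the given package name-version-release is one of the builds this
# bug reports as rebuilt against the fixed OCaml compiler. Anything else is
# "unknown" rather than "broken": later builds are presumably fine as well,
# but they are not listed here.
is_rebuilt_build() {
    case "$1" in
        unison227-2.27.57-8.fc9.1*)  return 0 ;;
        unison227-2.27.57-7.fc8.3*)  return 0 ;;
        unison227-2.27.57-9.fc10*)   return 0 ;;
        unison213-2.13.16-10.fc9.1*) return 0 ;;
        unison213-2.13.16-9.fc8.3*)  return 0 ;;
        *)                           return 1 ;;
    esac
}

# Example usage (host name is an assumption; needs rpm on both machines):
#   for nvr in "$(rpm -q unison227)" "$(ssh server rpm -q unison227)"; do
#       if is_rebuilt_build "$nvr"; then echo "$nvr: rebuilt"
#       else echo "$nvr: unknown"; fi
#   done
```

The trailing `*` in each pattern lets the architecture suffix that `rpm -q` prints (for example `.x86_64`) match as well.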