Bug 441685

Summary: Fatal error : Uncaught exception Out of memory
Product: [Fedora] Fedora
Reporter: Jiri Cerny <ji.cerny>
Component: unison227
Assignee: Stephen Warren <swarren>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: low
Version: rawhide
CC: alex, blomgren.peter, cweyl, mmorales, rjones, susi.lehtola
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 2.13.16-10.fc9.1
Doc Type: Bug Fix
Last Closed: 2008-06-03 07:33:52 UTC
Type: ---
Bug Depends On: 445545, 454384    
Bug Blocks:    

Description Jiri Cerny 2008-04-09 14:17:55 UTC
Description of problem:
When I try to synchronise two machines (one running rawhide, the other F8, both
with unison 2.27.57), I get the error message "Fatal error Uncaught exception
Out of memory" shortly after typing my password. The main window is already open
at this point. It works when the rawhide machine is rebooted back into F8.

Version-Release number of selected component (if applicable):
unison227-2.27.57-8.fc9.x86_64 in rawhide

How reproducible:
always

Additional info: this may be useful: if I mount the F8 partition at /mnt/tmp and
run /mnt/tmp/usr/bin/unison in rawhide, I get the same error, even though this
binary works without problems in FC8. So maybe there is some bad interaction
between several components.

Comment 1 Gérard Milmeister 2008-04-09 16:06:06 UTC
Stephen, do you know why the unison??? bug reports are not assigned to you?

Comment 2 Stephen Warren 2008-04-09 16:19:31 UTC
Gerard, for some reason, there's no bugzilla component for unison213/unison227
yet (I think because the new packages haven't been pushed to stable yet; I just
requested that yesterday)

So, the bugs are simply filed against unison right now, which you own. Perhaps
if you "release ownership" of the old unison package, I can own it too, which
should fix the issue (assuming I can remember how to "take ownership"...)


Comment 3 Stephen Warren 2008-04-10 06:06:35 UTC
Jiri, a few questions:

* Are both the F8 and devel machines x86_64, or just one?
* How big is the file tree being synchronized, i.e. how many megabytes and how
many files?

There's a Unison FAQ entry that might be relevant too (see below). Can you try
adjusting the stack size limit to see if it fixes the issue?

Finally, can you tell me the exact unison command you're running?

==========
Unison crashes with an "out of memory" error when used to synchronize really
huge directories (e.g., with hundreds of thousands of files).

You may need to increase your maximum stack size. On Linux and Solaris systems,
for example, you can do this using the ulimit command (see the bash
documentation for details).
==========

Thanks.


Comment 4 Jiri Cerny 2008-04-10 15:03:33 UTC
You are right. I should have read the FAQ first. The error goes away when I run
ulimit -s unlimited. The trees I synchronize are quite large, around 100k files.
I must be right at the limit, because in F8 both trees synchronize without
problems and in rawhide they do not. Thanks for your help and feel free to close
the bug.
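
A minimal sketch of the workaround above, assuming a bash-like shell in which
ulimit -s accepts a size in KiB or the word unlimited. The helper name
run_with_stack_limit is hypothetical. It changes only the soft limit, and only
inside a subshell, so the calling shell keeps its original setting.

```shell
# Hypothetical helper: run a command with a given soft stack limit.
# The function body is a subshell, so the new limit does not leak
# into the shell that calls it.
run_with_stack_limit() (
    limit=$1
    shift
    # Set the soft stack limit; give up if the hard limit forbids it.
    ulimit -S -s "$limit" || exit 1
    exec "$@"
)

# Show the current soft stack limit (a size in KiB, or "unlimited").
ulimit -S -s

# The workaround from this report would then look like this (not run here):
# run_with_stack_limit unlimited unison
```

If the hard limit is lower than the requested value, the helper exits without
running the command. As later comments show, raising the limit did not help in
every case reported here.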

Comment 5 Stephen Warren 2008-04-10 15:22:54 UTC
Excellent. Thanks very much for checking this.


Comment 6 Stephen Warren 2008-05-14 11:33:54 UTC
*** Bug 446316 has been marked as a duplicate of this bug. ***

Comment 7 Stephen Warren 2008-05-14 11:34:05 UTC
*** Bug 443304 has been marked as a duplicate of this bug. ***

Comment 8 Peter Blomgren 2008-05-14 16:41:07 UTC
(Directed here from Bug 446316...)

I'm synchronizing a serious amount of data:

   $ du -hs SYNC/
   28G     SYNC/

   $ find SYNC/ -type f -o -type l | wc -l 
   95167

unison227-2.27.57-8.fc9.x86_64 gives up, even with ulimit as discussed above;
however unison227-2.27.57-8.fc9.i386 works.


Comment 9 Dan Winship 2008-05-14 19:09:37 UTC
This isn't just "it uses lots of memory when you sync lots of files". I'm
getting the error even on a very small sync.

From googling around a little, it looks like it's caused by a bug in the ocaml
runtime on x86_64 (http://caml.inria.fr/mantis/view.php?id=4448) which causes it
to occasionally try to allocate ridiculous amounts of memory that it doesn't
actually need. A workaround for that bug has been committed
(http://camlcvs.inria.fr/cgi-bin/cvsweb/ocaml/byterun/unix.c.diff?r1=1.28;r2=1.28.4.1)
and is apparently in ocaml 3.10.2. So we probably want to upgrade ocaml and
rebuild any packages that use it on x86_64.


Comment 10 Stephen Warren 2008-05-19 11:00:54 UTC
OK. In that case, you should file a bug against OCaml itself, re-open your
original unison bug report, and mark the unison bug report as depending on the
OCaml bug report. I'll rebuild unison once OCaml is fixed.


Comment 11 Stephen Warren 2008-05-21 21:02:29 UTC
Adding dependency on bug 445545, since that appears to be what is described in
comment #9.

Dan, can you take a look at the patch in bug 445545? It seems pretty different
from the patch you linked to (although they do both affect the same function).
Do the patches simply fix the same bug in different ways, or are there 2
different bugs?

Thanks.


Comment 12 Richard W.M. Jones 2008-05-21 21:13:36 UTC
I've only just seen this (I'm the Fedora OCaml maintainer!)

Do you think someone could 'strace' the process when it fails?  It's very easy
to tell from the strace whether the failure is related to bug 445545.  Also run
it with OCAMLRUNPARAM=v=0x1ff set, as described here:
http://caml.inria.fr/pub/ml-archives/caml-list/2008/05/9c24581520a98afa2e11185845b5458a.en.html

The good news is that if it is this bug, a simple rebuild with the latest OCaml
compiler package will fix it.
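
The two diagnostics requested above can be sketched as follows, assuming strace
is installed and the shell is bash-like. The helper names trace_syscalls and
with_gc_verbose and the output file names are hypothetical. The strace options
used are -f (follow child processes) and -o (write the trace to a file); the
OCAMLRUNPARAM value is the one given in this comment.

```shell
# Hypothetical helper: record the system calls of a command and its
# children in a file, so that failing memory allocations can be inspected.
trace_syscalls() (
    out=$1
    shift
    strace -f -o "$out" "$@"
)

# Hypothetical helper: run a command with the OCaml runtime's verbose
# flags enabled through the OCAMLRUNPARAM environment variable.
with_gc_verbose() {
    env OCAMLRUNPARAM=v=0x1ff "$@"
}

# Possible invocations (not run here):
# trace_syscalls unison.strace unison
# with_gc_verbose unison 2> unison-gc.log
```

The second invocation assumes that the runtime writes its verbose messages to
standard error; drop the redirection if that does not hold on your system.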

Comment 13 Richard W.M. Jones 2008-05-21 21:22:36 UTC
Looking back a little I see that people have mentioned the upstream bug 4448
(http://caml.inria.fr/mantis/view.php?id=4448).  This bug is related to the
problem, but the solutions mentioned there & the patch that went into the
compiler _do_not_ fix the problem on Fedora.  So any suggestions you read in
the 4448 thread (eg. turning off VA randomization, etc.) _will_not_ help.

You need the patch which has gone into the OCaml compiler, see bug 445545.

Comment 14 Richard W.M. Jones 2008-05-24 08:10:26 UTC
Based on information supplied to me separately, this is an instance of bug 445545.

I'll rebuild unison against the fixed OCaml module which should resolve this issue.

Comment 15 Richard W.M. Jones 2008-05-24 08:58:45 UTC
Koji is down at the moment, but I've just checked in an updated unison227 which
should fix the issue.

https://www.redhat.com/archives/fedora-devel-announce/2008-May/msg00012.html

I'll rebuild it when the above outage is over.

Comment 16 Stephen Warren 2008-05-25 05:04:01 UTC
I see you bumped the devel version of unison227. I assume the F9 branch also
needs a rebuild? Does unison213 also need a rebuild?


Comment 17 Richard W.M. Jones 2008-05-25 08:31:29 UTC
> I see you bumped the devel version of unison227.

Yes, I bumped it but didn't rebuild the devel package until just now.  Koji was
down all of yesterday.
http://koji.fedoraproject.org/koji/taskinfo?taskID=627546

> I assume the F9 branch also
> needs a rebuild? Does unison213 also need a rebuild?

Yes to both.  If you want the fix in F-9, you need to rebuild the program with
the fixed compiler.  (The reason is that the runtime containing the buggy GC
is statically linked into programs.  Rebuilding the program statically links the
new, fixed runtime into it.)

The minimum compiler versions which have the fix are:

F-9: ocaml >= 3.10.1-3
devel: ocaml >= 3.10.2-2

There is no fix in F-8's compiler.  If you want it, please file a bug, but I have
never seen the problem occur in F-8 myself, and the problem may be caused
by some change in the mmap(2) call in the kernel between F-8 and F-9.

I'll leave any decision to rebuild F-9 and other versions of unison up to you
because this bug is marked against unison227 in Rawhide only.

Comment 18 Peter Blomgren 2008-05-25 14:26:20 UTC
(In reply to comment #17)
> I'll leave any decision to rebuild F-9 and other versions of unison up to you
> because this bug is marked against unison227 in Rawhide only.

F-9 rebuild makes sense, since bug 446316 is for the F-9 version of this
problem.  Thanks.

Comment 19 Susi Lehtola 2008-05-26 06:02:29 UTC
Installing the package unison227-2.27.57-9.fc10.x86_64.rpm fixes the problem in
Fedora 9. 

Comment 20 Fedora Update System 2008-05-26 18:25:34 UTC
unison227-2.27.57-8.fc9.1,unison213-2.13.16-10.fc9.1 has been submitted as an update for Fedora 9

Comment 21 Stephen Warren 2008-05-26 18:28:17 UTC
I've rebuilt unison227 for F-9, and unison213 for F-9 & devel too, just in case.

Richard, are EL-4/EL-5 affected by the OCaml issue? Thanks.


Comment 22 Richard W.M. Jones 2008-05-26 19:43:30 UTC
Actually I don't know.  I've not seen this on Fedora < 9.

In theory it _could_ happen -- it's a stupid assumption made by the OCaml runtime
about how mmap allocations happen.  (Fixed permanently and properly in OCaml >= 3.11).

But for some reason it doesn't seem to happen in Fedora < 9 (or in any Debian).
Is that because the kernel mmap(2) implementation is different, or is it just
luck?  I really have no idea.

Comment 23 Fedora Update System 2008-05-29 02:50:13 UTC
unison213-2.13.16-10.fc9.1, unison227-2.27.57-8.fc9.1 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update unison213 unison227'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-4618

Comment 24 Chris Weyl 2008-06-01 17:27:28 UTC
unison227-2.27.57-8.fc9.1 resolves the "out of memory" issue for me.

Comment 25 Fedora Update System 2008-06-03 07:33:50 UTC
unison213-2.13.16-10.fc9.1, unison227-2.27.57-8.fc9.1 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 26 Ivo 2008-07-07 22:04:41 UTC
I still encounter the problem when synchronizing between a 32bit fedora-9 client
(my laptop) and an x86_64 fedora-8 server, using unison227-2.27.57-8.fc9.1 on
the client and unison227-2.27.57-7.fc8.2 on the server. 

This is the same problem as described above, except that the out-of-memory error
happens on the fedora-8 side. 

Presumably we need an update for fedora-8 as well. 


Comment 27 Richard W.M. Jones 2008-07-07 22:13:17 UTC
You need to rebuild on every branch where this happens.

The patch which fixes bug 445545 hasn't been backported as far as F-8,
although it ought to apply fairly straightforwardly because the two
versions of OCaml are very similar to each other.

Comment 28 Stephen Warren 2008-07-08 05:49:53 UTC
Adding dependency on bug 454384; a request for a backport of the fix for 445545
to F-8. Once OCaml is fixed, I'll rebuild unison for F-8.

If this bug really is triggered by kernel mmap differences as previously
suggested, this might have only recently been exposed on F-8, due to the kernel
version having been recently bumped. It's odd that an "old" release like F-8
tracks the latest kernel so closely...


Comment 29 Stephen Warren 2008-07-13 22:49:16 UTC
OK. I've modified the spec files and tagged them. I just have to wait until the
new ocaml build is tagged and available for Koji to build against, then I'll
rebuild the F-8 unisons too.


Comment 30 Fedora Update System 2008-07-26 01:17:57 UTC
unison213-2.13.16-9.fc8.3,unison227-2.27.57-7.fc8.3 has been submitted as an update for Fedora 8

Comment 31 Fedora Update System 2008-08-12 18:23:46 UTC
unison213-2.13.16-9.fc8.3, unison227-2.27.57-7.fc8.3 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.