Bug 1900335 - Memory leak in git-fast-import
Summary: Memory leak in git-fast-import
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: git
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Todd Zullinger
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-11-22 11:13 UTC by Villy Kruse
Modified: 2020-12-08 01:47 UTC (History)
9 users (show)

Fixed In Version: git-2.29.2-3.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-08 01:47:34 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Villy Kruse 2020-11-22 11:13:51 UTC
Description of problem:

When cloning or otherwise access a mercurial repository the process is killed by OOM-killer.

Version-Release number of selected component (if applicable):

git-2.28.0-1.fc33.x86_64

How reproducible:

Always

Steps to Reproduce:
1.  git clone hg::https://hg.mozilla.org/projects/nss

Actual results:

The process git-fast-import is killed by OOM-killer.  Using top I see that first the available core memory is used up and then the available swap memory is also used up, so no more memory is available.

Expected results:

The repository is cloned successfully.

Additional info:

It seem upstream has a fix:


commit 3f018ec716292b4d757385686f42f57af3bca685
Author: Jeff King <peff>
Date:   Thu Oct 15 11:38:49 2020 -0400

    fast-import: fix over-allocation of marks storage
    
    Fast-import stores its marks in a trie-like structure made of mark_set
    structs. Each struct has a fixed size (1024). If our id number is too
    large to fit in the struct, then we allocate a new struct which shifts
    the id number by 10 bits. Our original struct becomes a child node
    of this new layer, and the new struct becomes the top level of the trie.
    
    This scheme was broken by ddddf8d7e2 (fast-import: permit reading
    multiple marks files, 2020-02-22). Before then, we had a top-level
    "marks" pointer, and the push-down worked by assigning the new top-level
    struct to "marks". But after that commit, insert_mark() takes a pointer
    to the mark_set, rather than using the global "marks". It continued to
    assign to the global "marks" variable during the push down, which was
    wrong for two reasons:
    
      - we added a call in option_rewrite_submodules() which uses a separate
        mark set; pushing down on "marks" is outright wrong here. We'd
        corrupt the "marks" set, and we'd fail to correctly store any
        submodule mappings with an id over 1024.
    
      - the other callers passed "marks", but the push-down was still wrong.
        In read_mark_file(), we take the pointer to the mark_set as a
        parameter. So even though insert_mark() was updating the global
        "marks", the local pointer we had in read_mark_file() was not
        updated. As a result, we'd add a new level when needed, but then the
        next call to insert_mark() wouldn't see it! It would then allocate a
        new layer, which would also not be seen, and so on. Lookups for the
        lost layers obviously wouldn't work, but before we even hit any
        lookup stage, we'd generally run out of memory and die.
    
    Our tests didn't notice either of these cases because they didn't have
    enough marks to trigger the push-down behavior. The new tests in t9304
    cover both cases (and fail without this patch).
    
    We can solve the problem by having insert_mark() take a pointer-to-pointer
    of the top-level of the set. Then our push down can assign to it in a
    way that the caller actually sees. Note the subtle reordering in
    option_rewrite_submodules(). Our call to read_mark_file() may modify our
    top-level set pointer, so we have to wait until after it returns to
    assign its value into the string_list.
    
    Reported-by: Sergey Brester <serg.brester>
    Signed-off-by: Jeff King <peff>
    Signed-off-by: Junio C Hamano <gitster>

Comment 1 Todd Zullinger 2020-11-26 04:51:06 UTC
Hi Villy,

Thanks for the report and the link to the upstream patch.  I'll get that into the git-2.29.2 update I plan to submit to f33 soon.

If you're interested in testing that sooner, I've got a build containing the patch in my COPR:

https://copr.fedorainfracloud.org/coprs/tmz/git/

I tested it and the clone of nss succeeded without fast-import using much memory at all.

I also noticed that git-remote-hg pulls in git rather than git-core, which seems unnecessary.  I may submit a PR to change that, but if you or anyone else who uses git-remote-hg would like to beat me to it, that would be great.  It looks like the git-remote-hg package is a few revisions behind upstream, but I didn't want to dig too deeply.

Comment 2 Villy Kruse 2020-11-26 06:16:07 UTC
(In reply to Todd Zullinger from comment #1)
> Hi Villy,
> 
> Thanks for the report and the link to the upstream patch.  I'll get that
> into the git-2.29.2 update I plan to submit to f33 soon.

And to rawhide as well, perhaps.

> 
> If you're interested in testing that sooner, I've got a build containing the
> patch in my COPR:
> 
> https://copr.fedorainfracloud.org/coprs/tmz/git/
> 
> I tested it and the clone of nss succeeded without fast-import using much
> memory at all.

I already did a build of git-2.29 with the patch and tested it.

> 
> I also noticed that git-remote-hg pulls in git rather than git-core, which
> seems unnecessary.  I may submit a PR to change that, but if you or anyone
> else who uses git-remote-hg would like to beat me to it, that would be
> great.  It looks like the git-remote-hg package is a few revisions behind
> upstream, but I didn't want to dig too deeply.

Supposedly, git-remote-hg from https://github.com/mnauw/git-remote-hg.git is python3 ready.

Comment 3 Todd Zullinger 2020-11-26 07:27:25 UTC
(In reply to Villy Kruse from comment #2)
> (In reply to Todd Zullinger from comment #1)
> > Hi Villy,
> > 
> > Thanks for the report and the link to the upstream patch.  I'll get that
> > into the git-2.29.2 update I plan to submit to f33 soon.
> 
> And to rawhide as well, perhaps.

Of course.  It'll always go to rawhide first. :)

> Supposedly, git-remote-hg from https://github.com/mnauw/git-remote-hg.git is
> python3 ready.

That's good to know.  As you probably know (but for others who might not), that is the upstream source of the Fedora packages these days.  It'd be great if someone who uses that package got it switched over to python3.

Comment 4 Villy Kruse 2020-11-26 11:03:24 UTC
(In reply to Todd Zullinger from comment #3)

> That's good to know.  As you probably know (but for others who might not),
> that is the upstream source of the Fedora packages these days.  It'd be
> great if someone who uses that package got it switched over to python3.

It should be there anytime now:

https://src.fedoraproject.org/rpms/git-remote-hg/pull-request/3

Comment 5 Todd Zullinger 2020-11-26 18:33:33 UTC
Excellent.  I rebased my tiny `s/git/git-core/` change on top of PR#3 and submitted it as https://src.fedoraproject.org/rpms/git-remote-hg/pull-request/4 for comment.

Comment 6 Fedora Update System 2020-11-30 15:31:56 UTC
FEDORA-2020-5476485ad2 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5476485ad2

Comment 7 Fedora Update System 2020-12-01 02:14:00 UTC
FEDORA-2020-5476485ad2 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5476485ad2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5476485ad2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2020-12-08 01:47:34 UTC
FEDORA-2020-5476485ad2 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.