Bug 1759140 - Tar extraction consumes several GB of memory as if a memory leak was occurring
Summary: Tar extraction consumes several GB of memory as if a memory leak was occurring
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tar
Version: 7.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ondrej Dubaj
QA Contact: RHEL CS Apps Subsystem QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-07 13:16 UTC by Renaud Métrich
Modified: 2019-12-04 12:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-04 06:41:11 UTC
Target Upstream Version:


Attachments
Symlink generator (901 bytes, text/x-csrc)
2019-10-07 13:23 UTC, Renaud Métrich


Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4633901 None None None 2019-12-04 12:37:23 UTC

Description Renaud Métrich 2019-10-07 13:16:19 UTC
Description of problem:

When extracting a Tar archive containing millions of symbolic links, Tar consumes more and more memory until (usually) the system runs out of memory.
Even when the system has enough memory, processing of the archive slows down progressively, mostly because of swapping.


Version-Release number of selected component (if applicable):

tar all versions


How reproducible:

Always


Steps to Reproduce:

1. Compile the attached symlink generator (it creates 10 million links to 1000 files in the "resolved/" directory)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # yum -y install gcc
  # gcc -o /root/create /root/create.c
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Create the directory structure where you have space for it (lots of free inodes, here "/backup", a dedicated filesystem)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # mkfs.xfs /dev/sdb
  # mount /dev/sdb /backup
  # cd /backup && /root/create
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Create an archive, making sure symlinks are tar'ed first (may actually not be needed)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # tar cf archive.tar links* resolved
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Extract the archive where you have space for it (lots of free inodes, here "/restore", a dedicated file system)

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  # mkfs /dev/sdc
  # mkdir /restore && mount /dev/sdc /restore
  # tar xf archive.tar -C /restore
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Actual results:

  On my test system with 2 GB of memory (deliberately no swap), tar grows until it gets killed by the OOM killer.

Expected results:

  Tar does not consume much memory.

Additional info:

  A workaround is to extract the archive using the "-P" flag ("--absolute-names", don't strip leading `/'s from file names).
  In that case, as little as 1176 kB of memory is used to extract a huge archive.

Comment 2 Renaud Métrich 2019-10-07 13:17:28 UTC
Digging further, it appears that the issue can be worked around by extracting the archive using the "-P" flag ("--absolute-names", don't strip leading `/'s from file names).
In that case, no placeholder is created and the symlink is created immediately. The symlink is "dangling" until its target is extracted, but this is not an issue.

The code creating the placeholder has always been there (at least since a refactoring in 2005).

Apparently, when the "-P" flag is not used, there are cases where symlinks are "potentially dangerous", but I don't understand exactly why:

Related commit doing the refactoring (the code already existed before; the commit only moved it into a new function named "create_placeholder_file"):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit af5d05729ade7f6d6f1552df31bce4c3dbc37247
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Mon Sep 12 18:45:59 2005 +0000

    Treat fishy-looking hard links like fishy-looking symlinks.
    (struct delayed_set_stat): Rename after_symlinks
    member to after_links.  All uses changed.
    (struct delayed_link): Renamed from struct delayed_symlink.
    All uses changed.  New member is_symlink.
    (delayed_link_head): Renamed from delayed_symlink_head.  All uses
    changed.
    (create_placeholder_file): New function, taken from extract_symlink.
    (extract_link): Create placeholders for fishy-looking hard links.
    (extract_symlink): Move code into create_placeholder_file.
    (apply_delayed_links): Renamed from apply_delayed_symlinks.
    All uses changed.  Create both hard links and symlinks.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Related new code (refactoring):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
static int
extract_symlink (char *file_name, int typeflag)
{
#ifdef HAVE_SYMLINK
  int status;
  int interdir_made = 0; 

  if (! absolute_names_option
      && (IS_ABSOLUTE_FILE_NAME (current_stat_info.link_name)
          || contains_dot_dot (current_stat_info.link_name)))
    return create_placeholder_file (file_name, true, &interdir_made);

  --> WHEN "-P" IS NOT GIVEN AND THE LINK TARGET CONTAINS ".." OR IS ABSOLUTE, DELAY CREATION
...
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Related old code showing comment about "potentially dangerous symlinks":
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
static int
extract_symlink (char *file_name, int typeflag)
{
#ifdef HAVE_SYMLINK
  int status, fd;
  int interdir_made = 0; 

  if (absolute_names_option
      || ! (IS_ABSOLUTE_FILE_NAME (current_stat_info.link_name)
            || contains_dot_dot (current_stat_info.link_name)))
    {    
  --> WHEN "-P" IS GIVEN, OR THE LINK TARGET IS NOT ABSOLUTE AND CONTAINS NO "..", CREATE IMMEDIATELY
...
    }    
  else 
    {    
      /* This symbolic link is potentially dangerous.  Don't
         create it now; instead, create a placeholder file, which
         will be replaced after other extraction is done.  */
      struct stat st;

  --> WHEN "-P" IS NOT GIVEN AND THE LINK TARGET CONTAINS ".." OR IS ABSOLUTE, DELAY CREATION
...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 3 Renaud Métrich 2019-10-07 13:20:13 UTC
Some statistics:

- Creating the archive (see reproducer) took up to 674696 kB of memory

- Extracting the archive without -P took more than 2 GB of memory (OOM killed tar)
- At 50%, it was taking already 1192072 kB
- There were ~150K inodes created per minute

- Extracting the archive with -P took 1172 kB of memory ;-)
- There were ~1.5M inodes created per minute

Comment 4 Renaud Métrich 2019-10-07 13:23:51 UTC
Created attachment 1623129
Symlink generator

This symlink generator creates:
- 1000 plain files "X" (X=[0-999]) in the "./resolved" directory
- 1000 directories "X" (X=[0-999]) under each of the "./linksY" directories (Y=[0-9])
- 1000 symlinks "X" (X=[0-999]) in each of those directories, pointing to the corresponding "../../resolved/X" files

This builds a directory tree with 10M symlinks (10 x 1000 x 1000).

Comment 6 Ondrej Dubaj 2019-12-04 06:41:11 UTC
After discussion with the upstream maintainers of tar, we came to the conclusion
that this is expected behaviour.

When extracting a symlink to an absolute file name or to a file name in a
parent directory, tar first creates a placeholder (a regular file of
zero length) in its place and records the fact in a list of such
"delayed links".  The placeholder is replaced with the actual link once
it becomes certain that the link cannot be used to place another file at
an absolute location unknown to the user.  Quite often this becomes
certain only at the end of extraction.  The list of delayed links is kept
in memory, and that is the reason for the excessive memory usage.

As you already mentioned, delayed link creation can be disabled by
using the -P (--absolute-names) option.

For the reasons mentioned above, I am closing this bug as
CLOSED:NOTABUG

Comment 7 Renaud Métrich 2019-12-04 12:37:23 UTC
OK, created a KCS.

