Bug 1680469

Summary:

tar: does not check for NULL error return from xgetcwd

Product:

Red Hat Enterprise Linux 7

Reporter:

Nag Pavan Chilakam <nchilaka>

Component:

tar

Assignee:

Petr Kubat <pkubat>

Status:

CLOSED WONTFIX

QA Contact:

RHEL CS Apps Subsystem QE <rhel-cs-apps-subsystem-qe>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

7.6

CC:

ashankar, codonell, databases-maint, dj, fweimer, hhorak, mnewsome, nchilaka, odubaj, panovotn, pfrankli, praiskup, storage-qa-internal

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1837871 (view as bug list)

Environment:

Last Closed:

2020-05-20 08:08:25 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
coredump	none

Description Nag Pavan Chilakam 2019-02-25 06:36:33 UTC

Description of problem:
====================
I have been running some system (non-functional) testing for gluster, which involved multiple IO patterns from different clients.
One of the IO patterns was a Linux kernel untar.
However, the untar failed after some iterations, as shown below:

[Sat Feb 23 05:11:20 2019] tar[5113]: segfault at 0 ip 00007f8c7985a901 sp 00007ffee8982248 error 4 in libc-2.17.so[7f8c796ec000+1c2000]
[Sat Feb 23 05:11:44 2019] tar[5242]: segfault at 0 ip 00007f2f894bd901 sp 00007ffc237f0078 error 4 in libc-2.17.so[7f2f8934f000+1c2000]
[Sat Feb 23 05:12:12 2019] tar[5589]: segfault at 0 ip 00007f88e6297901 sp 00007ffeba853af8 error 4 in libc-2.17.so[7f88e6129000+1c2000]
[Sat Feb 23 05:12:35 2019] tar[5704]: segfault at 0 ip 00007f3f04bd1901 sp 00007ffc7771f2f8 error 4 in libc-2.17.so[7f3f04a63000+1c2000]
[Sat Feb 23 05:12:55 2019] tar[6023]: segfault at 0 ip 00007ff15e314901 sp 00007ffd9129b1e8 error 4 in libc-2.17.so[7ff15e1a6000+1c2000]
[Sat Feb 23 05:13:20 2019] tar[6368]: segfault at 0 ip 00007f2de0e1a901 sp 00007ffe6371d848 error 4 in libc-2.17.so[7f2de0cac000+1c2000]
[Sat Feb 23 05:13:37 2019] tar[6683]: segfault at 0 ip 00007f5034ce0901 sp 00007ffd22577b18 error 4 in libc-2.17.so[7f5034b72000+1c2000]
[Sat Feb 23 05:14:03 2019] tar[7103]: segfault at 0 ip 00007fa991672901 sp 00007fff4eb384a8 error 4 in libc-2.17.so[7fa991504000+1c2000]
[Sat Feb 23 05:14:22 2019] tar[7491]: segfault at 0 ip 00007f7cadb00901 sp 00007ffe7d970468 error 4 in libc-2.17.so[7f7cad992000+1c2000]
[Sat Feb 23 05:14:45 2019] tar[7946]: segfault at 0 ip 00007f0ffcf29901 sp 00007ffd8b8f4878 error 4 in libc-2.17.so[7f0ffcdbb000+1c2000]
[Sat Feb 23 05:15:01 2019] tar[8158]: segfault at 0 ip 00007fcc4e40e901 sp 00007ffd400c1c88 error 4 in libc-2.17.so[7fcc4e2a0000+1c2000]
[Sat Feb 23 05:15:17 2019] tar[8321]: segfault at 0 ip 00007f2975c8a901 sp 00007ffed73a7c18 error 4 in libc-2.17.so[7f2975b1c000+1c2000]
[Sat Feb 23 05:15:42 2019] tar[8803]: segfault at 0 ip 00007fce66dc5901 sp 00007ffd42f0c328 error 4 in libc-2.17.so[7fce66c57000+1c2000]
[Sat Feb 23 05:16:00 2019] tar[8893]: segfault at 0 ip 00007f6613683901 sp 00007ffe4a8c69f8 error 4 in libc-2.17.so[7f6613515000+1c2000]
[Sat Feb 23 05:16:26 2019] tar[9118]: segfault at 0 ip 00007f7258a6d901 sp 00007ffdcd2ace48 error 4 in libc-2.17.so[7f72588ff000+1c2000]
[Sat Feb 23 05:16:55 2019] tar[9213]: segfault at 0 ip 00007f86e7b8b901 sp 00007ffe441bf958 error 4 in libc-2.17.so[7f86e7a1d000+1c2000]
[Sat Feb 23 05:17:14 2019] tar[9292]: segfault at 0 ip 00007f9c77011901 sp 00007ffdf84fab78 error 4 in libc-2.17.so[7f9c76ea3000+1c2000]
[Sat Feb 23 05:17:38 2019] tar[9500]: segfault at 0 ip 00007f7add332901 sp 00007ffc3e20d938 error 4 in libc-2.17.so[7f7add1c4000+1c2000]
[Sat Feb 23 05:17:56 2019] tar[9700]: segfault at 0 ip 00007f5ab3d3f901 sp 00007ffea2e95d08 error 4 in libc-2.17.so[7f5ab3bd1000+1c2000]
[Sat Feb 23 05:18:23 2019] tar[10280]: segfault at 0 ip 00007f97ee1ad901 sp 00007ffe7703d0d8 error 4 in libc-2.17.so[7f97ee03f000+1c2000]
[Sat Feb 23 05:18:41 2019] tar[10627]: segfault at 0 ip 00007fd47798e901 sp 00007ffca7469a88 error 4 in libc-2.17.so[7fd477820000+1c2000]
[Sat Feb 23 05:19:07 2019] tar[11040]: segfault at 0 ip 00007f2a7e9d7901 sp 00007ffd9b1e2108 error 4 in libc-2.17.so[7f2a7e869000+1c2000]
[Sat Feb 23 05:19:38 2019] tar[11294]: segfault at 0 ip 00007f9bc58e5901 sp 00007fff518c1968 error 4 in libc-2.17.so[7f9bc5777000+1c2000]
[Sat Feb 23 05:19:57 2019] tar[11428]: segfault at 0 ip 00007f4eb6b3a901 sp 00007ffe875f35f8 error 4 in libc-2.17.so[7f4eb69cc000+1c2000]
[Sat Feb 23 05:20:13 2019] tar[11758]: segfault at 0 ip 00007f002fe49901 sp 00007fff477e48a8 error 4 in libc-2.17.so[7f002fcdb000+1c2000]
[Sat Feb 23 05:20:31 2019] tar[12027]: segfault at 0 ip 00007f2b630f7901 sp 00007ffe3d1f85c8 error 4 in libc-2.17.so[7f2b62f89000+1c2000]
[Sat Feb 23 05:20:50 2019] tar[12117]: segfault at 0 ip 00007f39fb3d5901 sp 00007ffec9ec06f8 error 4 in libc-2.17.so[7f39fb267000+1c2000]
[Sat Feb 23 05:21:09 2019] tar[12325]: segfault at 0 ip 00007f267efef901 sp 00007fff5b0b3b28 error 4 in libc-2.17.so[7f267ee81000+1c2000]
[Sat Feb 23 05:21:29 2019] tar[12565]: segfault at 0 ip 00007f5feaf08901 sp 00007ffde8471b68 error 4 in libc-2.17.so[7f5fead9a000+1c2000]
[Sat Feb 23 05:21:49 2019] tar[12741]: segfault at 0 ip 00007f9f27ebb901 sp 00007ffeeaa95fd8 error 4 in libc-2.17.so[7f9f27d4d000+1c2000]
[Sat Feb 23 05:22:05 2019] tar[12882]: segfault at 0 ip 00007f08ecd4a901 sp 00007ffc0fcd8e78 error 4 in libc-2.17.so[7f08ecbdc000+1c2000]
[Sat Feb 23 05:22:27 2019] tar[13112]: segfault at 0 ip 00007fe924977901 sp 00007ffd5c5eb7e8 error 4 in libc-2.17.so[7fe924809000+1c2000]
[Sat Feb 23 05:22:49 2019] tar[13200]: segfault at 0 ip 00007f1f35a51901 sp 00007ffcdec60f18 error 4 in libc-2.17.so[7f1f358e3000+1c2000]
[Mon Feb 25 08:04:26 2019] sched: RT throttling activated
[root@dhcp35-64 IOs]# 
[root@dhcp35-64 IOs]# 
[root@dhcp35-64 IOs]# ls /


dir.17/linux-4.20.8/virt/kvm/coalesced_mmio.c
dir.17/linux-4.20.8/virt/lib/
dir.17/linux-4.20.8/virt/lib/Makefile
dir.17/linux-4.20.8/virt/lib/Kconfig
dir.17/linux-4.20.8/virt/lib/irqbypass.c
dir.17/linux-4.20.8/virt/Makefile
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault                                                    




Version-Release number of selected component (if applicable):
=================
[root@dhcp35-64 glusterfs]# rpm -qa|egrep "libc|kernel|gluster"
libcap-2.22-9.el7.x86_64
glusterfs-3.12.2-43.el7.x86_64
glibc-common-2.17-260.el7_6.3.x86_64
kernel-tools-libs-3.10.0-957.5.1.el7.x86_64
kernel-3.10.0-957.5.1.el7.x86_64
glusterfs-libs-3.12.2-43.el7.x86_64
glusterfs-client-xlators-3.12.2-43.el7.x86_64
kernel-3.10.0-957.el7.x86_64
libcroco-0.6.12-4.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
libcap-ng-0.7.5-4.el7.x86_64
libcurl-7.29.0-51.el7.x86_64
glibc-2.17-260.el7_6.3.x86_64
kernel-tools-3.10.0-957.5.1.el7.x86_64
glusterfs-fuse-3.12.2-43.el7.x86_64
libcom_err-1.42.9-13.el7.x86_64
[root@dhcp35-64 glusterfs]# cat /etc/red*
Red Hat Enterprise Linux Server release 7.6 (Maipo)


How reproducible:
================
hit it once

Steps to Reproduce:
=====================
1. Created a single 3x3 volume on a 4-node setup (brickmux disabled, and all settings are default).
2. Mounted the volume on 8 clients and triggered the IOs below.
IOs:
1) Linux untar from all mounts ---> note that on 4 clients this was done as a non-root user, after enabling access through ACLs
2) Collected resource consumption and appended it to individual files on the mount point
3) Continuous lookups on all clients
4) Creation of the same deep directory path from all 8 clients in parallel
--kept the IOs going for about 2 days, and then, as a random health check:--
5) After 2 days, enabled quota and uss
6) Rebooted one node
7) Triggered ls and ls -l on one of the directory paths, and saw the above problem




Actual results:
==============
On client 10.70.35.64, the Linux untar failed with a segfault.

Comment 7 Florian Weimer 2019-02-25 16:11:00 UTC

Without a coredump, there doesn't seem to be enough data to investigate this.

The crash at this address:

[Sat Feb 23 05:22:49 2019] tar[13200]: segfault at 0 ip 00007f1f35a51901 sp 00007ffcdec60f18 error 4 in libc-2.17.so[7f1f358e3000+1c2000]

is at the movdqu instruction here:

Dump of assembler code for function __strlen_sse2_pminub:
   0x000000000016e8f0 <+0>:	xor    %rax,%rax
   0x000000000016e8f3 <+3>:	mov    %edi,%ecx
   0x000000000016e8f5 <+5>:	and    $0x3f,%ecx
   0x000000000016e8f8 <+8>:	pxor   %xmm0,%xmm0
   0x000000000016e8fc <+12>:	cmp    $0x30,%ecx
   0x000000000016e8ff <+15>:	ja     0x16e91e <__strlen_sse2_pminub+46>
   0x000000000016e901 <+17>:	movdqu (%rdi),%xmm1

This means that something called strlen with a NULL argument.  That is still not enough to debug this further.  We do not even know the responsible component yet.
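
The link between the kernel log line and the disassembly can be checked by hand: subtracting the mapping base printed in brackets from the logged ip should give the offset of the faulting instruction inside libc-2.17.so. The following is only a sketch of that arithmetic; `segfault_offset` is a hypothetical helper written for this illustration, and it assumes the line has the same shape as the ones quoted in this bug.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any tool): read the instruction
   pointer and the mapping base from a kernel "segfault at" line and
   store ip - base in *offset.  Returns 0 on success, -1 if the line
   does not have the expected shape.  */
static int
segfault_offset (char const *line, unsigned long long *offset)
{
  unsigned long long ip, base;

  /* The faulting instruction pointer follows the " ip " keyword.  */
  char const *p = strstr (line, " ip ");
  if (!p || sscanf (p, " ip %llx", &ip) != 1)
    return -1;

  /* The mapping is printed last, as name[base+size], so the last '['
     on the line starts the base address.  */
  char const *q = strrchr (line, '[');
  if (!q || sscanf (q, "[%llx+", &base) != 1)
    return -1;

  if (ip < base)
    return -1;
  *offset = ip - base;
  return 0;
}
```

For the line quoted above, ip 00007f1f35a51901 minus base 7f1f358e3000 is 0x16e901, which is the address of the movdqu instruction in the listing. The "segfault at 0" part of the same line is the faulting address, which is consistent with a NULL pointer being read.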

Maybe Carlos can have a look, but I really don't see how we can move this forward.

Comment 8 Carlos O'Donell 2019-02-25 21:09:29 UTC

(In reply to Florian Weimer from comment #7)
> This means that something called strlen with a NULL argument.  That is still
> not enough to debug this further.  We do not even know the responsible
> component yet.
> 
> Maybe Carlos can have a look, but I really don't see how we can move this
> forward.

We absolutely need a coredump.

It's vital to determining if this is a tar issue with invalid input, or a glibc issue with strlen.

Comment 11 Nag Pavan Chilakam 2019-02-28 13:52:33 UTC

Created attachment 1539495 [details]
coredump

Attached is the coredump for the segfault

Comment 12 Florian Weimer 2019-02-28 16:43:10 UTC

Thanks a lot for providing the coredump.  I think we are finally getting somewhere:

704	  if (dir[0])
705	    {
706	      while (dir[0] == '.' && ISSLASH (dir[1]))
707		for (dir += 2;  ISSLASH (*dir);  dir++)
708		  continue;
709	      if (! dir[dir[0] == '.'])
710		return wd_count - 1;
711	    }
712	
713	  wd[wd_count].name = dir;
714	  /* if the given name is an absolute path, then use that path
715	     to represent this working directory; otherwise, construct
716	     a path based on the previous -C option's absolute path */
717	  if (IS_ABSOLUTE_FILE_NAME (wd[wd_count].name))
718	    wd[wd_count].abspath = xstrdup (wd[wd_count].name);
719	  else
720	    {
721	      namebuf_t nbuf = namebuf_create (wd[wd_count - 1].abspath);
722	      namebuf_add_dir (nbuf, wd[wd_count].name);
723	      wd[wd_count].abspath = namebuf_finish (nbuf);
724	    }

abspath is NULL on line 721.

(gdb) print wd[wd_count]
$5 = {name = 0x1153130 "dir.101", abspath = 0x0, fd = 2}
(gdb) print wd[wd_count - 1]
$6 = {name = 0x43da38 ".", abspath = 0x0, fd = -100}
(gdb) print wd_count
$9 = 1

abspath is assigned here in src/misc.c:

int
chdir_arg (char const *dir)
{
  if (wd_count == wd_alloc)
    {
      if (wd_alloc == 0)
	{
	  wd_alloc = 2;
	  wd = xmalloc (sizeof *wd * wd_alloc);
	}
      else
	wd = x2nrealloc (wd, &wd_alloc, sizeof *wd);

      if (! wd_count)
	{
	  wd[wd_count].name = ".";
	  wd[wd_count].abspath = xgetcwd ();
	  wd[wd_count].fd = AT_FDCWD;
	  wd_count++;
	}
    }
…

However, gnu/xgetcwd.c actually does this:

/* Return the current directory, newly allocated.
   Upon an out-of-memory error, call xalloc_die.
   Upon any other type of error, return NULL.  */

char *
xgetcwd (void)
{
  char *cwd = getcwd (NULL, 0);
  if (! cwd && errno == ENOMEM)
    xalloc_die ();
  return cwd;
}

So even though this is an x* style interface, the caller has to check for a NULL result, and chdir_arg does not do this.

The observed NULL value may have been the result of the glibc change for additional getcwd error reporting in bug 1534635.

In any case, this is a bug in tar and needs to be fixed there.
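
As an illustration of the kind of check that is missing, here is a minimal sketch. It is not the upstream fix and does not use tar's own helpers: `cwd_checked` and `join_dir` are hypothetical names, and `join_dir` is only a stand-in for the namebuf_* calls so that the example is self-contained. The idea is that a failed getcwd is reported to the caller, and that a NULL base path is rejected before any string function sees it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of tar: fetch the current directory and
   report failure through the return value instead of a bare NULL.
   Returns 0 and stores a malloc'd path in *out, or returns an errno
   value and stores NULL.  */
static int
cwd_checked (char **out)
{
  /* getcwd (NULL, 0) allocates the result; this is the same call the
     quoted xgetcwd makes (an extension to POSIX supported by glibc).  */
  char *cwd = getcwd (NULL, 0);
  if (!cwd)
    {
      *out = NULL;
      return errno ? errno : EINVAL;
    }
  *out = cwd;
  return 0;
}

/* Hypothetical stand-in for the namebuf_* calls: join BASE and NAME
   with a slash.  Returns NULL when either argument is NULL, rather
   than passing a NULL pointer to strlen.  */
static char *
join_dir (char const *base, char const *name)
{
  if (!base || !name)
    return NULL;

  size_t blen = strlen (base);
  size_t nlen = strlen (name);
  char *res = malloc (blen + 1 + nlen + 1);
  if (!res)
    return NULL;

  memcpy (res, base, blen);
  res[blen] = '/';
  memcpy (res + blen + 1, name, nlen + 1);  /* copies the trailing NUL */
  return res;
}
```

With a guard like this, the state shown in the gdb output (a relative name "dir.101" and a NULL abspath in the previous entry) would lead to an error the caller can report, instead of a crash inside strlen. How tar should react to that error (diagnose and exit, or fall back to relative paths) is a design decision for the maintainers and is not settled by this sketch.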

Comment 13 Pavel Raiskup 2019-03-01 14:15:53 UTC

Forwarded upstream:
https://www.mail-archive.com/bug-tar@gnu.org/msg05768.html

Comment 14 Pavel Raiskup 2019-03-04 10:19:24 UTC

http://git.savannah.gnu.org/cgit/tar.git/commit/?id=66162927ebdfe9dd4ef570a132663fd76217952f

Comment 15 Honza Horak 2020-05-18 13:40:02 UTC

This bug does not seem important enough to be fixed in RHEL-7 any more. I'd propose checking whether it is fixed in RHEL-8, and then either moving it there or closing it as WONTFIX.

Comment 16 Ondrej Dubaj 2020-05-20 07:57:59 UTC

This issue is not fixed in rhel-8. I will create a tracker for rhel-8 and close this issue for rhel-7.

Comment 17 Ondrej Dubaj 2020-05-20 08:08:25 UTC

Cloned bug for rhel-8 #1837871 

Closing this issue as WONTFIX

Comment 20 Nag Pavan Chilakam 2020-05-27 11:18:46 UTC

Download the kernel tarball from kernel.org to a glusterfs mount and run the loop below:

for i in {1..100}; do
  mkdir dir.$i
  cp ../linux-5.3.2.tar.xz dir.$i/linux-5.3.2.tar.xz
  echo "############ this is loop $i" >> untar.$i.log
  echo "############ this is loop $i" >> tarball.$i.log
  date >> untar.$i.log
  tar -xvf dir.$i/linux-5.3.2.tar.xz -C dir.$i/ 2>> untar.$i.log
  date >> untar.$i.log
  date >> tarball.$i.log
  tar -cvf dir.$i/lin.my.tar dir.$i/linux-5.3.2 2>> tarball.$i.log
  date >> tarball.$i.log
done

Comment 21 Ondrej Dubaj 2020-06-01 07:58:05 UTC

Thank you. Can you please provide more details about your glusterFS setup? How many nodes are necessary to reproduce the issue, and how much disk memory do they need?

Comment 22 Nag Pavan Chilakam 2020-06-15 05:54:06 UTC

(In reply to Ondrej Dubaj from comment #21)
> Thank you. Can you please provide more details about your glusterFS ? How
> many nodes is necessary to reproduce the issue, how much disk memory do they
> need...

Glusterfs server nodes: 3 should be enough, each with at least 16GB. Will need 3x3 disks (LVs for use as glusterfs bricks) of 100GB in each node, apart from the OS disk.
Client nodes: 4 clients, each with at least 4GB. About 40GB of disk on each is sufficient.