Bug 444958 - /sbin/restore SEGV (in readxattr at tape.c:1234 'if (curfile.dip->di_size > XATTR_MAXSIZE)'
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: dump
Version: 5.1
Hardware: All Linux
Priority: high Severity: high
Target Milestone: rc
Assigned To: Adam Tkac
Reported: 2008-05-02 07:06 EDT by Issue Tracker
Modified: 2013-04-30 19:39 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-01-21 07:35:32 EST
Attachments
core of crash (63.32 KB, application/x-gzip)
2008-05-27 08:46 EDT, Martin Poole
Description Issue Tracker 2008-05-02 07:06:53 EDT
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2008-05-02 07:06:54 EDT
Hello,

the customer experienced a segmentation fault while using
"restore" on their RHEL 5.1 system (2.6.18-53.1.14.el5xen,
i686, dump-0.4b41-2.fc6.i386).

They use an HP ProLiant DL380 G5 server with an HP StorageWorks
DAT72 USB tape drive. The "dump" command that they use is:

# dump -f /dev/nst0 -0a -A /tmp/file.toc -L 2008-03-01 /backup/2008-03-01

When the end of the tape is reached, "dump" asks for the next
tape, so this backup spans the end of the 1st tape and
the beginning of the 2nd.

Restoring what is at the beginning of the 1st tape works,
but when they reach what was written by the "dump" command
above:

# restore -f /dev/nst0 -xaov
[...]
extract file ./backup/2008-03-04/some.file.tar.bz
End-of-tape encountered
Mount tape volume 2 
Enter ``none'' if there are no more tapes
otherwise enter tape name (default: /dev/nst0)
Tape block size is 10
Missing blocks at the end of
./backup/2008-03-04/some.file.tar.bz, assuming hole
resync restore, skipped 30 blocks
Segmentation fault
#

and so nothing is restored.

Since this seems to be reproducible, they were able to
collect the "core" of "restore" and I'm attaching it here:
- core.26626.gz
Please see also the sosreport of their system:
- sosreport-LukaszLesniak.01-167163-6e433a.tar.bz2

If I open a "gdb" on this core I see:

$ gdb /sbin/restore core.26626
[...]
warning: shared library handler failed to enable breakpoint
Core was generated by `restore -f /dev/nst0 -b10 -xaov'.
Program terminated with signal 11, Segmentation fault.
#0  readxattr (buffer=0xbff5b34c "") at tape.c:1234
1234            if (curfile.dip->di_size > XATTR_MAXSIZE) {
(gdb) bt
#0  readxattr (buffer=0xbff5b34c "") at tape.c:1234
#1  0x08052758 in extractattr (path=0x8147bb6 "./2008-03-26/2008-03-26.10.2.2.19.tar.bz") at tape.c:1042
#2  0x08052931 in extractfile (ep=0x9798d70, doremove=0) at tape.c:1010
#3  0x0804dd86 in createfiles () at restore.c:1063
#4  0x0804d19a in main (argc=Cannot access memory at address 0x8180
) at main.c:603
#5  0x08090807 in __libc_start_main ()
#6  0x08048131 in _start ()
(gdb) print curfile
$1 = {name = 0x810687f "EA block", ino = 0, dip = 0x0, action = 3 '\003'}
(gdb) print curfile.dip
$2 = (struct new_bsd_inode *) 0x0

So basically it appears to be a NULL pointer dereference.

This is the section of the code ("restore/tape.c"):

--- CUT HERE ---
int
readxattr(char *buffer)
{
        if (dflag)
                msg("reading EA data for inode %lu\n", curfile.ino);

        curfile.name = "EA block";
        if (curfile.dip->di_size > XATTR_MAXSIZE) {
                fprintf(stderr, "EA size too big (%ld)", (long)curfile.dip->di_size);
                skipfile();
                return (FAIL);
        }
--- CUT HERE ---

The definition of "curfile" is in "restore/restore.h":

--- CUT HERE ---
/*
 * The entry describes the next file available on the tape
 */
struct context {
        char    *name;          /* name of file */
        dump_ino_t ino;         /* inumber of file */
#if defined(__linux__) || defined(sunos)
        struct  new_bsd_inode *dip;     /* pointer to inode */
#else
        struct  dinode *dip;    /* pointer to inode */
#endif
        char    action;         /* action being taken on this file */
} curfile;
--- CUT HERE ---

while the definition of "new_bsd_inode" is in "compat/include/bsdcompat.h":

--- CUT HERE ---
/*
 * This is the new (4.4) BSD inode structure
 * copied from the FreeBSD 2.0 <ufs/ufs/dinode.h> include file
 */
struct new_bsd_inode {
        __u16           di_mode;
        __s16           di_nlink;
        union {
                __u16           oldids[2];
                __u32           inumber;
        }               di_u;
        u_quad_t        di_size;
        struct bsdtimeval       di_atime;
        struct bsdtimeval       di_mtime;
        struct bsdtimeval       di_ctime;
        __u32           di_db[NDADDR];
        __u32           di_ib[NIADDR];
        __u32           di_flags;
        __s32           di_blocks;
        __s32           di_gen;
        __u32           di_uid;
        __u32           di_gid;
        __s32           di_spare[2];
};
--- CUT HERE ---

I can see a similar report in BZ# 232415 (for Fedora 6), but
nothing relevant about RHEL 5.

Can you find a reason for this? Maybe the NULL pointer is an
expected behaviour (no information available here) and the
code should check it before accessing the content?
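
To make the question concrete, here is a minimal sketch in C of what such a check could look like. It is not taken from the dump sources and is not a proposed patch: the structures are reduced stand-ins for "struct context" and "struct new_bsd_inode", check_ea_size() is a hypothetical helper, and the values of GOOD, FAIL and XATTR_MAXSIZE are assumptions made only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values, for illustration only; the real definitions live in
 * the dump/restore headers and may differ. */
#define GOOD 1
#define FAIL 0
#define XATTR_MAXSIZE 65536ULL

/* Reduced stand-in for struct new_bsd_inode: only the field that the
 * crashing line reads. */
struct sketch_inode {
        unsigned long long di_size;
};

/* Reduced stand-in for struct context (curfile). */
struct sketch_context {
        const char *name;
        unsigned long ino;
        struct sketch_inode *dip;
};

/*
 * Hypothetical guarded variant of the size check in readxattr():
 * refuse to look at di_size when no inode is attached to the context.
 */
static int
check_ea_size(const struct sketch_context *cf)
{
        if (cf->dip == NULL) {
                fprintf(stderr, "no inode information for %s (inode %lu)\n",
                        cf->name, cf->ino);
                return (FAIL);
        }
        if (cf->dip->di_size > XATTR_MAXSIZE) {
                fprintf(stderr, "EA size too big (%ld)\n",
                        (long)cf->dip->di_size);
                return (FAIL);
        }
        return (GOOD);
}
```

With a context like the one shown by gdb above (ino = 0, dip = 0x0), check_ea_size() would return FAIL instead of dereferencing the pointer. Whether failing at this point is the right recovery for restore, or whether the pointer should never be NULL here in the first place, is exactly the open question.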

The severity has been set to "2-High" by the customer
because this prevents them from restoring, and therefore
from relying on, their regular backups.

Thanks, Leonardo.
This event sent from IssueTracker by mpoole  [SEG - Base OS]
 issue 177727
Comment 2 Martin Poole 2008-05-02 07:25:29 EDT
It looks like this is caused when a tape change occurs on the boundary between
the file contents and the extended attributes block.

The tape change itself will cause a resync in findinode() in tape.c and if it
does not meet an INODE block then the .dip pointer that triggered the SEGV is
not set.
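
The theory can be illustrated with a toy model in C. This is only a sketch under assumptions, not the real findinode(): the record kinds, the structures and toy_resync() are all hypothetical, and the model assumes the context starts out cleared before a resync. It shows just the one property described above, namely that the inode pointer is only set when the resync lands on an inode record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds for the toy tape; not the real tape format. */
enum rec_type { REC_INODE, REC_DATA, REC_EA };

struct toy_inode {
        unsigned long long di_size;
};

struct toy_record {
        enum rec_type type;
        unsigned long ino;
        struct toy_inode inode;         /* meaningful only for REC_INODE */
};

/* Toy counterpart of curfile: just the inode number and inode pointer. */
struct toy_context {
        unsigned long ino;
        struct toy_inode *dip;
};

/*
 * Toy resync: position the context on the record at index pos.
 * The context is cleared first (an assumption of this model) and is
 * only filled in when that record is an inode record.
 */
static void
toy_resync(struct toy_context *cf, struct toy_record *tape, size_t pos)
{
        cf->ino = 0;
        cf->dip = NULL;
        if (tape[pos].type == REC_INODE) {
                cf->ino = tape[pos].ino;
                cf->dip = &tape[pos].inode;
        }
}
```

If the second volume starts with an EA record, as in this report, calling toy_resync() on that first record leaves dip at NULL and ino at 0, which matches the "print curfile" output in Comment 1. Any later code that reads dip->di_size without a check would then fault.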
Comment 3 Adam Tkac 2008-05-20 11:27:58 EDT
Let me check this theory.
Comment 4 Martin Poole 2008-05-27 08:46:46 EDT
Created attachment 306764 [details]
core of crash
Comment 5 Adam Tkac 2008-05-28 06:32:27 EDT
It really does seem that the problem occurs when the file content block is at the
end of the first tape and the EA block is on the second tape.
Would it be possible to re-run restore with the -d parameter and attach the output, please?
Thanks
Comment 11 RHEL Product and Program Management 2009-01-21 07:35:32 EST
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
