Escalated to Bugzilla from IssueTracker
Hello,

the customer experienced a segmentation fault while using "restore" on their RHEL 5.1 system (2.6.18-53.1.14.el5xen, i686, dump-0.4b41-2.fc6.i386). They use an HP ProLiant DL380 G5 server with an HP Storage Works DAT72 USB tape drive. The "dump" command that they use is:

# dump -f /dev/nst0 -0a -A /tmp/file.toc -L 2008-03-01 /backup/2008-03-01

When the end of the tape is reached, "dump" asks for the next tape, so this backup ends up on the end of the 1st tape and on the beginning of the 2nd. Restoring the content at the beginning of the 1st tape works, but when they reach what was written by the previous "dump":

# restore -f /dev/nst0 -xaov
[...]
extract file ./backup/2008-03-04/some.file.tar.bz
End-of-tape encountered
Mount tape volume 2
Enter ``none'' if there are no more tapes
otherwise enter tape name (default: /dev/nst0)
Tape block size is 10
Missing blocks at the end of ./backup/2008-03-04/some.file.tar.bz, assuming hole
resync restore, skipped 30 blocks
Segmentation fault
#

and so nothing is restored. Since this seems to be reproducible, they were able to collect the "core" of "restore" and I'm attaching it here:

- core.26626.gz

Please see also the sosreport of their system:

- sosreport-LukaszLesniak.01-167163-6e433a.tar.bz2

If I open the core in "gdb" I see:

$ gdb /sbin/restore core.26626
[...]
warning: shared library handler failed to enable breakpoint
Core was generated by `restore -f /dev/nst0 -b10 -xaov'.
Program terminated with signal 11, Segmentation fault.
#0  readxattr (buffer=0xbff5b34c "") at tape.c:1234
1234            if (curfile.dip->di_size > XATTR_MAXSIZE) {
(gdb) bt
#0  readxattr (buffer=0xbff5b34c "") at tape.c:1234
#1  0x08052758 in extractattr (path=0x8147bb6 "./2008-03-26/2008-03-26.10.2.2.19.tar.bz") at tape.c:1042
#2  0x08052931 in extractfile (ep=0x9798d70, doremove=0) at tape.c:1010
#3  0x0804dd86 in createfiles () at restore.c:1063
#4  0x0804d19a in main (argc=Cannot access memory at address 0x8180 ) at main.c:603
#5  0x08090807 in __libc_start_main ()
#6  0x08048131 in _start ()
(gdb) print curfile
$1 = {name = 0x810687f "EA block", ino = 0, dip = 0x0, action = 3 '\003'}
(gdb) print curfile.dip
$2 = (struct new_bsd_inode *) 0x0

So basically it appears to be a NULL pointer dereference. This is the relevant section of the code ("restore/tape.c"):

--- CUT HERE ---
int
readxattr(char *buffer)
{
        if (dflag)
                msg("reading EA data for inode %lu\n", curfile.ino);

        curfile.name = "EA block";
        if (curfile.dip->di_size > XATTR_MAXSIZE) {
                fprintf(stderr, "EA size too big (%ld)", (long)curfile.dip->di_size);
                skipfile();
                return (FAIL);
        }
--- CUT HERE ---

The definition of "curfile" is in "restore/restore.h":

--- CUT HERE ---
/*
 * The entry describes the next file available on the tape
 */
struct context {
        char    *name;          /* name of file */
        dump_ino_t ino;         /* inumber of file */
#if defined(__linux__) || defined(sunos)
        struct  new_bsd_inode *dip;     /* pointer to inode */
#else
        struct  dinode *dip;    /* pointer to inode */
#endif
        char    action;         /* action being taken on this file */
} curfile;
--- CUT HERE ---

while the definition of "new_bsd_inode" is in "compat/include/bsdcompat.h":

--- CUT HERE ---
/*
 * This is the new (4.4) BSD inode structure
 * copied from the FreeBSD 2.0 <ufs/ufs/dinode.h> include file
 */
struct new_bsd_inode {
        __u16           di_mode;
        __s16           di_nlink;
        union {
                __u16           oldids[2];
                __u32           inumber;
        }               di_u;
        u_quad_t        di_size;
        struct bsdtimeval di_atime;
        struct bsdtimeval di_mtime;
        struct bsdtimeval di_ctime;
        __u32           di_db[NDADDR];
        __u32           di_ib[NIADDR];
        __u32           di_flags;
        __s32           di_blocks;
        __s32           di_gen;
        __u32           di_uid;
        __u32           di_gid;
        __s32           di_spare[2];
};
--- CUT HERE ---

I can see a similar report in BZ# 232415 (for Fedora 6), but nothing relevant about RHEL 5.

Can you find a reason for this? Maybe the NULL pointer is expected behaviour (no information is available here) and the code should check it before accessing the content?

The severity has been set to "2-High" by the customer because this prevents them from performing regular backups (or rather, from using them).

Thanks,
Leonardo.

This event sent from IssueTracker by mpoole [SEG - Base OS] issue 177727
It looks like this is caused when a tape change occurs on the boundary between the file contents and the extended attributes block. The tape change itself will cause a resync in findinode() in tape.c, and if that resync does not land on an INODE block, the .dip pointer that triggered the SEGV is never set.
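To illustrate the kind of check the reporter suggested, here is a minimal standalone sketch, not a patch against dump. It assumes that a NULL .dip can simply be treated as a failure. The names toy_inode, toy_context and readxattr_guarded are hypothetical stand-ins for the real structures and function, and the values of GOOD, FAIL and XATTR_MAXSIZE are assumed for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values for this sketch only; the real ones live in the dump sources. */
#define GOOD 1
#define FAIL 0
#define XATTR_MAXSIZE 65536

/* Hypothetical, heavily reduced stand-in for struct new_bsd_inode. */
struct toy_inode {
        unsigned long long di_size;
};

/* Hypothetical, reduced stand-in for struct context. */
struct toy_context {
        const char *name;
        unsigned long ino;
        struct toy_inode *dip;  /* may be NULL after a resync */
};

struct toy_context curfile;

/*
 * Guarded variant of the size check: refuse to touch curfile.dip
 * when no inode was seen for the current "EA block".
 */
int
readxattr_guarded(char *buffer)
{
        assert(buffer != NULL);

        curfile.name = "EA block";
        if (curfile.dip == NULL) {
                fprintf(stderr, "EA block without inode (ino %lu), skipping\n",
                        curfile.ino);
                return (FAIL);
        }
        if (curfile.dip->di_size > XATTR_MAXSIZE) {
                fprintf(stderr, "EA size too big (%ld)\n",
                        (long)curfile.dip->di_size);
                return (FAIL);
        }
        buffer[0] = '\0';       /* the sketch reads no real EA data */
        return (GOOD);
}
```

With curfile.dip set to NULL, as in the core above, readxattr_guarded() returns FAIL instead of crashing. Whether the real code should skip the attribute, skip the file, or abort at that point is a separate decision that this sketch does not try to answer.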
Let me check this theory.
Created attachment 306764 [details] core of crash
It really does seem that the problem occurs when the file content block is at the end of the first tape and the EA block is on the second tape. Would it be possible to re-run restore with the -d parameter and attach the output, please? Thanks
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.