Bug 637431

Summary:

star: Implementation botch: with FIFO_MEOF

Product:

[Fedora] Fedora

Reporter:

wolfgang pichler <wolfgang.pichler>

Component:

star

Assignee:

Ondrej Vasik <ovasik>

Status:

CLOSED WONTFIX

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

medium

Docs Contact:

Priority:

low

Version:

CC:

kdudka, ovasik

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2011-06-28 12:20:52 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
bt full as requested	none

Description wolfgang pichler 2010-09-25 19:23:07 UTC

Description of problem:

star -diff -v -time -fifostats diffopts=!atime,lmtime,ctime -multivol new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/tmp/s2t.Btuhkqp2bA -C /srv/save samba grass streeruwitz IMG2

outputs on stderr ---

Label       2010_09_23__19_46 DATA
Release     star 1.5.1 (i686-redhat-linux-gnu)
Archtype    exustar
Dumpdate    1285264114.917621 (Thu Sep 23 19:48:34 2010)
Volno       1
Blocksize   20 records

[...] (other normal messages)

outputs on stdout ---

diffopts=perm,symperm,type,nlink,uid,gid,uname,gname,size,data,rdev,hardlink,symlink,sympath,sparse,mtime,dir,acl,xattr,fflags

[...] (other normal messages)

star: Implementation botch: with FIFO_MEOF
star: Did not wake up from fifo_chitape() - got 'S'.

returns --- ERRNO 141

Version-Release number of selected component (if applicable):

star: star 1.5.1 (i686-redhat-linux-gnu)
Options: acl find fflags remote Linux-xattr

... but i observe this error for
1) different versions since 1.4
2) different tapes

How reproducible:

bet on it
i do a workaround NOT to insist on correct -diff but to trust -c ;-))


---

i wrote some email to j.schilling about this years ago
but it was actually not answered ...

any clues ?
it would be amazing to have a nice -diff in star ...

greez w.

Comment 1 wolfgang pichler 2010-10-18 21:00:41 UTC

ANYONE TAKING THIS SERIOUS ???

Comment 2 Kamil Dudka 2010-10-18 21:36:21 UTC

Please provide a self contained test case.

Comment 3 Ondrej Vasik 2010-10-19 08:10:38 UTC

Just to summarize things:
1) Upstream is informed and ignores it for years (ok, I know Joerg is hard to talk)
2) You provided some reproducer, but it requires some things/paths which are specific only to your system. Please provide a clear step by step reproducer or at least more information how to reproduce it on common system (I don't have tape device at the moment). It's not clear from your report (no, I don't use tape device on my laptop at the moment).

Without additional information it is not really possible to work on the issue (actually when you reported the issue, I took a look into the sources and I thought I sent a comment requesting more information into bugzilla - but apparently I was disturbed by something before doing that, sorry for delay).

Comment 4 wolfgang pichler 2010-10-21 07:43:25 UTC

thank you for your attention

i also observe this error since years and tried to contact y.s. - outcome was not very happy for me too

1) in my experience this error 

- occurs on quantum dlt-v4 drives
- occurs only on large data operations

maybe it has to do with eof-detection on dlt ? no idea ...

2) i have no problem e.g. /w quantum dlt-vs160 drive with star-1.5a84-3.fc7

3) in other words : SAME backup-SCRIPT which issues

star -diff -v -time -fifostats diffopts=!atime,lmtime,ctime -multivol new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=128m -acl diffopts=!acl errctl=/tmp/s2t.xdDFU29637 -C /srv/save .

succeeds with dlt-vs160 star-1.5a84-3.fc7

BUT FAILS with dlt-v4 star-1.5.1-4.fc13

4) i would suggest some star-test-rpm which contains proper code to hunt down the error (note : we know the exit-message very well, so any interesting things can be triggered here ...)

5) 

- the rationale to use star was the xattr & acl feature
- since tar does this recently it is maynbe a good idea to switch to it
- on the other hand this would mean star to be superfluous and deprecate it

any thoughts on this altogether are welcome ...

thanks
wp

Comment 5 Kamil Dudka 2010-10-21 14:11:36 UTC

(In reply to comment #4)
> 4) i would suggest some star-test-rpm which contains proper code to hunt down
> the error (note : we know the exit-message very well, so any interesting things
> can be triggered here ...)

Could you please try the following debug build?  I've put there a breakpoint right before the exit-message:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2546721

Comment 6 wolfgang pichler 2010-10-21 17:02:49 UTC

ok, installed
star-1.5.1-5.2.fc13.i686.rpm
star-debuginfo-1.5.1-5.2.fc13.i686.rpm

/rbin/sys2tapeR.star: line 195:  5060 Segmentation fault      (core dumped) nice -n 10 star -c -v -time -fifostats -multivol VOLHDR="2010_10_21__18_51 DATA" new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/tmp/s2t.nTvbo2LVRG -C /srv/save samba grass streeruwitz IMG2 vco
ERRNO 139

Oct 21 18:53:50 tbs abrt[5062]: saved core dump of pid 5060 (/usr/bin/star) to /var/spool/abrt/ccpp-1287680025-5060.new/coredump (1208590336 bytes)
Oct 21 18:53:50 tbs abrtd: Directory 'ccpp-1287680025-5060' creation detected
Oct 21 18:53:51 tbs abrtd: Package 'star' isn't signed with proper key
Oct 21 18:53:51 tbs abrtd: Corrupted or bad crash /var/spool/abrt/ccpp-1287680025-5060 (res:5), deleting

questions :

- is sigsegv right after start intentional ?
- how to get a proper core unless via abrt ?

thx & greetings
wp

Comment 7 Kamil Dudka 2010-10-21 20:18:20 UTC

> - is sigsegv right after start intentional ?

Nope, I am not able to repeat the crash myself.

> - how to get a proper core unless via abrt ?

Please run it through gdb:

$ gdb -q --args star -diff -v -time -fifostats diffopts=!atime,lmtime,ctime -multivol new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/tmp/s2t.Btuhkqp2bA -C /srv/save samba grass streeruwitz IMG2

Comment 8 wolfgang pichler 2010-10-22 07:17:19 UTC

NOTE : i have to run a backup to tape first before i can compare it ... ;-))

gdb -q --args nice -n 10 star -c -v -time -fifostats -multivol VOLHDR="2010_10_22__09_00 DATA" new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /bin/nice...Reading symbols from /usr/lib/debug/bin/nice.debug...done.
done.
(gdb) run
Starting program: /bin/nice -n 10 star -c -v -time -fifostats -multivol VOLHDR=2010_10_22__09_00\ DATA new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
process 13256 is executing new program: /usr/bin/star
Missing separate debuginfos, use: debuginfo-install coreutils-8.4-8.fc13.i686
Detaching after fork from child process 13259.

Program received signal SIGSEGV, Segmentation fault.
0x0806b0b4 in marktcb (addr=0x9ffeb000 "././@PaxHeader") at buffer.c:863
863		if (bit_test(mp->bmap, bit))	/* Remove this paranoia test in future. */
Missing separate debuginfos, use: debuginfo-install glibc-2.12-3.i686
(gdb) bt
#0  0x0806b0b4 in marktcb (addr=0x9ffeb000 "././@PaxHeader") at buffer.c:863
#1  0x0805095a in write_tcb (ptb=0x9ffeb000, info=0xbfffd964) at header.c:1127
#2  0x08054c1f in write_xhdr (type=103) at xheader.c:346
#3  0x08054cd0 in info_to_xhdr (info=0xbfffef64, ptb=0xbffff02c)
    at xheader.c:377
#4  0x0805086b in put_tcb (ptb=0xbffff02c, info=0xbfffef64) at header.c:1101
#5  0x08073de1 in put_svolhdr (name=0xbffff5ce "2010_10_22__09_00 DATA")
    at volhdr.c:667
#6  0x08073c93 in put_volhdr (name=0xbffff5ce "2010_10_22__09_00 DATA", putv=1)
    at volhdr.c:634
#7  0x0804a9fd in star_create (ac=21, av=0xbffff3b8) at star.c:629
#8  0x0804a7f9 in main (ac=22, av=0xbffff3b4) at star.c:546
(gdb) 

rpm -qa | grep coreutils

coreutils-debuginfo-8.4-9.fc13.i686
coreutils-8.4-8.fc13.i686

i have not updated them and cannot disturb the running machine
hope this does not affect anything important ;-))

greez & thx
wp

Comment 9 Kamil Dudka 2010-10-26 11:36:45 UTC

The crash looks like some side effect of the star debugging facility.  Please try this one:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2554092

It's compiled without optimizations and with the mentioned breakpoint.  But there will be no additional debugging output, so you will probably need to dig the requested info yourself from gdb, as soon as it stops.

Comment 10 wolfgang pichler 2010-10-27 19:23:49 UTC

dig what ??? how ?
sorry, but it is over my possibilities to debug star ...
if you like i can run some code tomorrow 18:00 met ...
if not - i am currently investigating the tar/mbuffer thing heavily - life goes on also with buggy starlets ;-))

greez w

Comment 11 Kamil Dudka 2010-10-27 19:37:46 UTC

Full backtrace might be a viable start.  I suspect a synchronization issue here, which may be hard to track down.  Even worse when we have no reliable reproducer.  Any chance to repeat the behavior with a file only?

I could probably setup a machine with tape device, but it would take some time...

Comment 12 wolfgang pichler 2010-10-27 21:24:37 UTC

star-1.5.1-5.3.fc13.i686.rpm
star-debuginfo-1.5.1-5.3.fc13.i686.rpm
installed

here again the faulting /w tape use

gdb -q --args star -c -v -time -fifostats f=/dev/nst0 -multivol VOLHDR='2010_10_27__20_00 DATA' new-volume-script=/rbin/mtchgR.pl -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/dev/nst0 -multivol VOLHDR=2010_10_27__20_00\ DATA new-volume-script=/rbin/mtchgR.pl -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10260.

Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
863		if (bit_test(mp->bmap, bit))	/* Remove this paranoia test in future. */
(gdb) bt
#0  0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
#1  0x080508e2 in write_tcb (ptb=0x97feb000, info=0xbfffdc44) at header.c:1127
#2  0x08054ba7 in write_xhdr (type=103) at xheader.c:346
#3  0x08054c58 in info_to_xhdr (info=0xbffff244, ptb=0xbffff30c)
    at xheader.c:377
#4  0x080507f3 in put_tcb (ptb=0xbffff30c, info=0xbffff244) at header.c:1101
#5  0x08073ab5 in put_svolhdr (name=0xbffff858 "2010_10_27__20_00 DATA")
    at volhdr.c:667
#6  0x08073967 in put_volhdr (name=0xbffff858 "2010_10_27__20_00 DATA", putv=1)
    at volhdr.c:634
#7  0x0804a9cd in star_create (ac=23, av=0xbffff698) at star.c:629
#8  0x0804a7c9 in main (ac=24, av=0xbffff694) at star.c:546
(gdb) 

and the faulting /w FILE USE

gdb -q --args star -c -v -time -fifostats f=/srv/save/test.exustar  -multivol VOLHDR='2010_10_27__20_00 DATA' -L 10g -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/srv/save/test.exustar -multivol VOLHDR=2010_10_27__20_00\ DATA -L 10g -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10359.

Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
863        if (bit_test(mp->bmap, bit))    /* Remove this paranoia test in future. */
(gdb) bt
#0  0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
#1  0x080508e2 in write_tcb (ptb=0x97feb000, info=0xbfffdc54) at header.c:1127
#2  0x08054ba7 in write_xhdr (type=103) at xheader.c:346
#3  0x08054c58 in info_to_xhdr (info=0xbffff254, ptb=0xbffff31c)
    at xheader.c:377
#4  0x080507f3 in put_tcb (ptb=0xbffff31c, info=0xbffff254) at header.c:1101
#5  0x08073ab5 in put_svolhdr (name=0xbffff873 "2010_10_27__20_00 DATA")
    at volhdr.c:667
#6  0x08073967 in put_volhdr (name=0xbffff873 "2010_10_27__20_00 DATA", putv=1)
    at volhdr.c:634
#7  0x0804a9cd in star_create (ac=24, av=0xbffff6a8) at star.c:629
#8  0x0804a7c9 in main (ac=25, av=0xbffff6a4) at star.c:546
(gdb) 

file test.exustar is empty - no volheader was written

you need no tape-drive ... ;-))

what to do next ?
greez

Comment 13 wolfgang pichler 2010-10-27 22:05:20 UTC

same thing here ...

gdb -q --args star -c -v -time -fifostats f=/srv/save/test.exustar  -multivol -L 10g fs=512m  errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/srv/save/test.exustar -multivol -L 10g fs=512m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10710.

Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97fea000 "<none>") at buffer.c:863
863		if (bit_test(mp->bmap, bit))	/* Remove this paranoia test in future. */
(gdb) bt
#0  0x0806afa0 in marktcb (addr=0x97fea000 "<none>") at buffer.c:863
#1  0x080508e2 in write_tcb (ptb=0xbffff36c, info=0xbffff2a4) at header.c:1127
#2  0x0805080f in put_tcb (ptb=0xbffff36c, info=0xbffff2a4) at header.c:1105
#3  0x08073ab5 in put_svolhdr (name=0x808f9f3 "<none>") at volhdr.c:667
#4  0x08073967 in put_volhdr (name=0x808f9f3 "<none>", putv=1) at volhdr.c:634
#5  0x0804a9cd in star_create (ac=17, av=0xbffff6f8) at star.c:629
#6  0x0804a7c9 in main (ac=18, av=0xbffff6f4) at star.c:546
(gdb) 

BUT RUNS NORMALLY IF -multivol and -L is NOT USED

so it seems to me to be in handling of MULTIVOL

greez

Comment 14 Kamil Dudka 2010-10-27 22:43:39 UTC

Wait, your original report is about error "star: Implementation botch: with FIFO_MEOF", right?

What did introduce the SIGSEGV?

I didn't change anything like that in the last scratch build, only recompiled it with -O0 and put there a breakpoint.  Does the SIGSEGV happen if you downgrade to the stable star package?

Comment 15 wolfgang pichler 2010-10-28 05:43:49 UTC

ah you got it ;-)))
sigsegv IS INDEED a "new" problem which keeps me from hunting the fifo_meof
and no - stable current star-1.5.1-4.fc13 does not segfault in this way

Comment 16 Kamil Dudka 2010-10-28 07:38:29 UTC

Well, that's something hard to understand for me.  Please try to get the backtrace from stable star-1.5.1-4.fc13 then:

1) downgrade star and install the corresponding debuginfo for it
2) run it through gdb and put there a break point manually:

$ gdb -q --args star ...

(gdb) list fifo.c:551
(gdb) break fifo.c:551
(gdb) run
(gdb) bt full

3) submit the gdb output here as attachment

Comment 17 wolfgang pichler 2010-11-08 10:41:49 UTC

Created attachment 458701 [details]
bt full as requested

Comment 18 wolfgang pichler 2010-11-08 10:48:35 UTC

fyi :

backup consists of 2 or more tapes

error occurs always when comparing FIRST tape (at end of tape ?)
in other words : remaining tapes are NEVER requested ...

greez w

Comment 19 wolfgang pichler 2010-11-08 10:51:42 UTC

(sorry for the multiple comments - would be better one ... ;-)

mt -f /dev/nst0 status

SCSI 2 tape drive:
File number=1, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x51 (IBM 3592 J1A).
Soft error count since last status=0
General status bits on (81010000):
 EOF ONLINE IM_REP_EN

Comment 20 Kamil Dudka 2010-11-08 20:17:47 UTC

Oops, I didn't realize the error could have occurred in a child process.  So the debugger was not prepared for that -- the following command was missing:

(gdb) set follow-fork-mode child

The backtrace therefore describes a SIGPIPE, which is probably only a consequence of the crash.  I admit I don't understand much the code, but the backtrace goes through nextitape():

/*
 * High level input medium change routine.
 * Currently called from buf_rwait().
 * Volume verification in the fifo case is done in the fifo process.
 * For this reason, we only verify the new volume in the non fifo case.
 */

... so that you might be right that the crash is related to end of tape.

Comment 21 Bug Zapper 2011-05-31 12:36:11 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 22 Bug Zapper 2011-06-28 12:20:52 UTC

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.