Bug 637431
| Summary: | star: Implementation botch: with FIFO_MEOF | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | wolfgang pichler <wolfgang.pichler> |
| Component: | star | Assignee: | Ondrej Vasik <ovasik> |
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 13 | CC: | kdudka, ovasik |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-06-28 12:20:52 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
wolfgang pichler
2010-09-25 19:23:07 UTC
ANYONE TAKING THIS SERIOUS ???

Please provide a self-contained test case. Just to summarize things:
1) Upstream is informed and ignores it for years (ok, I know Joerg is hard to talk to)
2) You provided some reproducer, but it requires some things/paths which are specific only to your system.
Please provide a clear step-by-step reproducer or at least more information on how to reproduce it on a common system (I don't have a tape device at the moment). It's not clear from your report (no, I don't use a tape device on my laptop at the moment). Without additional information it is not really possible to work on the issue (actually when you reported the issue, I took a look into the sources and I thought I sent a comment requesting more information into bugzilla - but apparently I was disturbed by something before doing that, sorry for the delay).

thank you for your attention
i also observe this error since years and tried to contact y.s. - outcome was not very happy for me too
1) in my experience this error
- occurs on quantum dlt-v4 drives
- occurs only on large data operations
maybe it has to do with eof-detection on dlt ? no idea ...
2) i have no problem e.g. /w quantum dlt-vs160 drive with star-1.5a84-3.fc7
3) in other words : SAME backup-SCRIPT which issues
star -diff -v -time -fifostats diffopts=!atime,lmtime,ctime -multivol new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=128m -acl diffopts=!acl errctl=/tmp/s2t.xdDFU29637 -C /srv/save .
succeeds with dlt-vs160 star-1.5a84-3.fc7 BUT FAILS with dlt-v4 star-1.5.1-4.fc13
4) i would suggest some star-test-rpm which contains proper code to hunt down the error (note : we know the exit-message very well, so any interesting things can be triggered here ...)
5) - the rationale to use star was the xattr & acl feature
- since tar does this recently it is maybe a good idea to switch to it
- on the other hand this would mean star to be superfluous and deprecate it
any thoughts on this altogether are welcome ...
thanks
wp

(In reply to comment #4)
> 4) i would suggest some star-test-rpm which contains proper code to hunt down
> the error (note : we know the exit-message very well, so any interesting things
> can be triggered here ...)
Could you please try the following debug build? I've put there a breakpoint right before the exit-message:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2546721

ok, installed
star-1.5.1-5.2.fc13.i686.rpm
star-debuginfo-1.5.1-5.2.fc13.i686.rpm
/rbin/sys2tapeR.star: line 195: 5060 Segmentation fault (core dumped) nice -n 10 star -c -v -time -fifostats -multivol VOLHDR="2010_10_21__18_51 DATA" new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/tmp/s2t.nTvbo2LVRG -C /srv/save samba grass streeruwitz IMG2 vco
ERRNO 139
Oct 21 18:53:50 tbs abrt[5062]: saved core dump of pid 5060 (/usr/bin/star) to /var/spool/abrt/ccpp-1287680025-5060.new/coredump (1208590336 bytes)
Oct 21 18:53:50 tbs abrtd: Directory 'ccpp-1287680025-5060' creation detected
Oct 21 18:53:51 tbs abrtd: Package 'star' isn't signed with proper key
Oct 21 18:53:51 tbs abrtd: Corrupted or bad crash /var/spool/abrt/ccpp-1287680025-5060 (res:5), deleting
questions :
- is sigsegv right after start intentional ?
- how to get a proper core unless via abrt ?
thx & greetings
wp

> - is sigsegv right after start intentional ?
Nope, I am not able to repeat the crash myself.
> - how to get a proper core unless via abrt ?
Please run it through gdb:
$ gdb -q --args star -diff -v -time -fifostats diffopts=!atime,lmtime,ctime -multivol new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/tmp/s2t.Btuhkqp2bA -C /srv/save samba grass streeruwitz IMG2

NOTE : i have to run a backup to tape first before i can compare it ... ;-))
gdb -q --args nice -n 10 star -c -v -time -fifostats -multivol VOLHDR="2010_10_22__09_00 DATA" new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /bin/nice...Reading symbols from /usr/lib/debug/bin/nice.debug...done.
done.
(gdb) run
Starting program: /bin/nice -n 10 star -c -v -time -fifostats -multivol VOLHDR=2010_10_22__09_00\ DATA new-volume-script=/rbin/mtchgR.pl f=/dev/nst0 H=exustar -xfflags -xattr -sparse fs=384m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
process 13256 is executing new program: /usr/bin/star
Missing separate debuginfos, use: debuginfo-install coreutils-8.4-8.fc13.i686
Detaching after fork from child process 13259.
Program received signal SIGSEGV, Segmentation fault.
0x0806b0b4 in marktcb (addr=0x9ffeb000 "././@PaxHeader") at buffer.c:863
863 if (bit_test(mp->bmap, bit)) /* Remove this paranoia test in future. */
Missing separate debuginfos, use: debuginfo-install glibc-2.12-3.i686
(gdb) bt
#0 0x0806b0b4 in marktcb (addr=0x9ffeb000 "././@PaxHeader") at buffer.c:863
#1 0x0805095a in write_tcb (ptb=0x9ffeb000, info=0xbfffd964) at header.c:1127
#2 0x08054c1f in write_xhdr (type=103) at xheader.c:346
#3 0x08054cd0 in info_to_xhdr (info=0xbfffef64, ptb=0xbffff02c)
at xheader.c:377
#4 0x0805086b in put_tcb (ptb=0xbffff02c, info=0xbfffef64) at header.c:1101
#5 0x08073de1 in put_svolhdr (name=0xbffff5ce "2010_10_22__09_00 DATA")
at volhdr.c:667
#6 0x08073c93 in put_volhdr (name=0xbffff5ce "2010_10_22__09_00 DATA", putv=1)
at volhdr.c:634
#7 0x0804a9fd in star_create (ac=21, av=0xbffff3b8) at star.c:629
#8 0x0804a7f9 in main (ac=22, av=0xbffff3b4) at star.c:546
(gdb)
rpm -qa | grep coreutils
coreutils-debuginfo-8.4-9.fc13.i686
coreutils-8.4-8.fc13.i686
i have not updated them and cannot disturb the running machine
hope this does not affect anything important ;-))
greez & thx
wp
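
The "ERRNO 139" line reported next to "Segmentation fault" earlier in this thread is consistent with the usual shell convention that an exit status above 128 means "terminated by signal (status - 128)", and 139 - 128 = 11 is SIGSEGV on Linux. The following is only a minimal sketch of that arithmetic, assuming the reporter's wrapper script printed the shell's `$?`; the function name `decode_status` is made up for this illustration.

```shell
#!/bin/sh
# Decode a shell exit status. By convention, a value above 128 means the
# process was terminated by signal number (status - 128).
# decode_status is a hypothetical helper, not part of star or the thread.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # "kill -l <number>" prints the signal name without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

decode_status 139
```

Under that assumption, the status matches the SIGSEGV that gdb reports in the session above.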
The crash looks like some side effect of the star debugging facility. Please try this one:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2554092
It's compiled without optimizations and with the mentioned breakpoint. But there will be no additional debugging output, so you will probably need to dig the requested info yourself from gdb, as soon as it stops.

dig what ??? how ?
sorry, but it is over my possibilities to debug star ...
if you like i can run some code tomorrow 18:00 met ...
if not - i am currently investigating the tar/mbuffer thing heavily - life goes on also with buggy starlets ;-))
greez
w

Full backtrace might be a viable start. I suspect a synchronization issue here, which may be hard to track down. Even worse when we have no reliable reproducer. Any chance to repeat the behavior with a file only? I could probably setup a machine with tape device, but it would take some time...

star-1.5.1-5.3.fc13.i686.rpm
star-debuginfo-1.5.1-5.3.fc13.i686.rpm
installed
here again the faulting /w tape use
gdb -q --args star -c -v -time -fifostats f=/dev/nst0 -multivol VOLHDR='2010_10_27__20_00 DATA' new-volume-script=/rbin/mtchgR.pl -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/dev/nst0 -multivol VOLHDR=2010_10_27__20_00\ DATA new-volume-script=/rbin/mtchgR.pl -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10260.
Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
863 if (bit_test(mp->bmap, bit)) /* Remove this paranoia test in future. */
(gdb) bt
#0 0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
#1 0x080508e2 in write_tcb (ptb=0x97feb000, info=0xbfffdc44) at header.c:1127
#2 0x08054ba7 in write_xhdr (type=103) at xheader.c:346
#3 0x08054c58 in info_to_xhdr (info=0xbffff244, ptb=0xbffff30c)
at xheader.c:377
#4 0x080507f3 in put_tcb (ptb=0xbffff30c, info=0xbffff244) at header.c:1101
#5 0x08073ab5 in put_svolhdr (name=0xbffff858 "2010_10_27__20_00 DATA")
at volhdr.c:667
#6 0x08073967 in put_volhdr (name=0xbffff858 "2010_10_27__20_00 DATA", putv=1)
at volhdr.c:634
#7 0x0804a9cd in star_create (ac=23, av=0xbffff698) at star.c:629
#8 0x0804a7c9 in main (ac=24, av=0xbffff694) at star.c:546
(gdb)
and the faulting /w FILE USE
gdb -q --args star -c -v -time -fifostats f=/srv/save/test.exustar -multivol VOLHDR='2010_10_27__20_00 DATA' -L 10g -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/srv/save/test.exustar -multivol VOLHDR=2010_10_27__20_00\ DATA -L 10g -b 1024 fs=512m H=exustar -xfflags -xattr -sparse errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10359.
Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
863 if (bit_test(mp->bmap, bit)) /* Remove this paranoia test in future. */
(gdb) bt
#0 0x0806afa0 in marktcb (addr=0x97feb000 "././@PaxHeader") at buffer.c:863
#1 0x080508e2 in write_tcb (ptb=0x97feb000, info=0xbfffdc54) at header.c:1127
#2 0x08054ba7 in write_xhdr (type=103) at xheader.c:346
#3 0x08054c58 in info_to_xhdr (info=0xbffff254, ptb=0xbffff31c)
at xheader.c:377
#4 0x080507f3 in put_tcb (ptb=0xbffff31c, info=0xbffff254) at header.c:1101
#5 0x08073ab5 in put_svolhdr (name=0xbffff873 "2010_10_27__20_00 DATA")
at volhdr.c:667
#6 0x08073967 in put_volhdr (name=0xbffff873 "2010_10_27__20_00 DATA", putv=1)
at volhdr.c:634
#7 0x0804a9cd in star_create (ac=24, av=0xbffff6a8) at star.c:629
#8 0x0804a7c9 in main (ac=25, av=0xbffff6a4) at star.c:546
(gdb)
file test.exustar is empty - no volheader was written
you need no tape-drive ... ;-))
what to do next ?
greez
same thing here ...
gdb -q --args star -c -v -time -fifostats f=/srv/save/test.exustar -multivol -L 10g fs=512m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Reading symbols from /usr/bin/star...Reading symbols from /usr/lib/debug/usr/bin/star.debug...done.
done.
(gdb) run
Starting program: /usr/bin/star -c -v -time -fifostats f=/srv/save/test.exustar -multivol -L 10g fs=512m errctl=/srv/save/errctl -C /srv/save samba grass streeruwitz IMG2 vco
Detaching after fork from child process 10710.
Program received signal SIGSEGV, Segmentation fault.
0x0806afa0 in marktcb (addr=0x97fea000 "<none>") at buffer.c:863
863 if (bit_test(mp->bmap, bit)) /* Remove this paranoia test in future. */
(gdb) bt
#0 0x0806afa0 in marktcb (addr=0x97fea000 "<none>") at buffer.c:863
#1 0x080508e2 in write_tcb (ptb=0xbffff36c, info=0xbffff2a4) at header.c:1127
#2 0x0805080f in put_tcb (ptb=0xbffff36c, info=0xbffff2a4) at header.c:1105
#3 0x08073ab5 in put_svolhdr (name=0x808f9f3 "<none>") at volhdr.c:667
#4 0x08073967 in put_volhdr (name=0x808f9f3 "<none>", putv=1) at volhdr.c:634
#5 0x0804a9cd in star_create (ac=17, av=0xbffff6f8) at star.c:629
#6 0x0804a7c9 in main (ac=18, av=0xbffff6f4) at star.c:546
(gdb)
BUT RUNS NORMALLY IF -multivol and -L is NOT USED
so it seems to me to be in handling of MULTIVOL
greez

Wait, your original report is about error "star: Implementation botch: with FIFO_MEOF", right? What did introduce the SIGSEGV? I didn't change anything like that in the last scratch build, only recompiled it with -O0 and put there a breakpoint. Does the SIGSEGV happen if you downgrade to the stable star package?

ah you got it ;-)))
sigsegv IS INDEED a "new" problem which keeps me from hunting the fifo_meof
and no - stable current star-1.5.1-4.fc13 does not segfault in this way

Well, that's something hard to understand for me. Please try to get the backtrace from stable star-1.5.1-4.fc13 then:
1) downgrade star and install the corresponding debuginfo for it
2) run it through gdb and put there a break point manually:
$ gdb -q --args star ...
(gdb) list fifo.c:551
(gdb) break fifo.c:551
(gdb) run
(gdb) bt full
3) submit the gdb output here as attachment

Created attachment 458701 [details]
bt full as requested
fyi :
backup consists of 2 or more tapes
error occurs always when comparing FIRST tape (at end of tape ?)
in other words : remaining tapes are NEVER requested ...
greez
w

(sorry for the multiple comments - would be better one ... ;-)
mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=1, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x51 (IBM 3592 J1A).
Soft error count since last status=0
General status bits on (81010000):
 EOF ONLINE IM_REP_EN

Oops, I didn't realize the error could have occurred in a child process. So the debugger was not prepared for that -- the following command was missing:
(gdb) set follow-fork-mode child
The backtrace therefore describes a SIGPIPE, which is probably only a consequence of the crash. I admit I don't understand much of the code, but the backtrace goes through nextitape():
/*
 * High level input medium change routine.
 * Currently called from buf_rwait().
 * Volume verification in the fifo case is done in the fifo process.
 * For this reason, we only verify the new volume in the non fifo case.
 */
... so that you might be right that the crash is related to end of tape.

This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
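
For anyone picking this up on a maintained release, the gdb steps requested in the comments above (breakpoint at fifo.c:551, `bt full`) can be combined with the `set follow-fork-mode child` command that was later noted as missing. The sketch below only writes a gdb command file and prints the invocation; it does not run star. The file name and the helper `write_gdb_cmds` are hypothetical, the star arguments stay elided as in the thread, and the line number fifo.c:551 applies to the star-1.5.1-4.fc13 sources discussed here and may differ elsewhere.

```shell
#!/bin/sh
# Write a gdb command file for the session described in this thread.
# write_gdb_cmds and the file name below are assumptions for this sketch.
write_gdb_cmds() {
    cat > "$1" <<'EOF'
set follow-fork-mode child
break fifo.c:551
run
bt full
EOF
}

cmds=${TMPDIR:-/tmp}/star-fifo.gdb
write_gdb_cmds "$cmds"

# Print (rather than execute) the invocation; "..." stands for the
# star arguments, which are left elided exactly as in the thread.
echo "gdb -q -x $cmds --args star ..."
```

Setting the fork mode before `run` matters because, as the comment above points out, the failure may occur in the forked FIFO child rather than in the parent process that gdb follows by default.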