Bug 182968

Summary:	Seek meaning of recently appearing error message "Assertion `off' failed."
Product:	[Fedora] Fedora	Reporter:	Richard Bonomo <bonomo>
Component:	xfsdump	Assignee:	Russell Cattelan <cattelan>
Status:	CLOSED UPSTREAM	QA Contact:
Severity:	high	Docs Contact:
Priority:	medium
Version:	4	CC:	cattelan
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2007-08-22 19:15:08 UTC	Type:	---

Description Richard Bonomo 2006-02-24 20:07:56 UTC

Description of problem: Dumps have been failing frequently with this output to terminal:

***** begin quote
/usr/sbin/xfsdump: using scsi tape (drive_scsitape) strategy
/usr/sbin/xfsdump: version 2.2.25 (dump format 3.0) - Running single-threaded

 ============================= dump label dialog ==============================

please enter label for this dump session (timeout in 300 sec)
 -> Label call found.

session label entered: ""

 --------------------------------- end dialog ---------------------------------

/usr/sbin/xfsdump: WARNING: no session label specified
xfsdump: inv_stobj.c:1260: stobj_copy_invsess: Assertion `off' failed.
could not confirm session label
***** end quote

The "no session label" warning is normal (the label is null).

Version-Release number of selected component (if applicable):
xfsdump version 2.2.25

How reproducible:
The problem arose spontaneously a couple of weeks ago, after several months
of flawless operation.  It seems to occur most commonly with the "second" or later
epoch dump on the medium or with any incremental dump.

Steps to Reproduce:
1. Run an incremental xfsdump, for example.
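
For anyone trying to reproduce this, the step above might correspond to a command along the following lines. This is only a sketch: the report does not give the command that was actually run, so the dump level, the tape device /dev/nst0, the mount point /home, and the helper name incremental_dump_argv are all assumptions. The -l (dump level) and -f (destination) options are standard xfsdump options. The code only builds and prints the command; it does not run it.

```python
import shlex


def incremental_dump_argv(level, tape_device, mount_point):
    """Build an argv list for an incremental xfsdump (hypothetical helper).

    xfsdump dump levels run from 0 to 9; level 0 is a full dump, so an
    incremental dump uses a level from 1 to 9.
    """
    if not 1 <= level <= 9:
        raise ValueError("an incremental dump needs a level from 1 to 9")
    return ["xfsdump", "-l", str(level), "-f", tape_device, mount_point]


# Device and mount point are placeholders, not values from the report.
argv = incremental_dump_argv(1, "/dev/nst0", "/home")
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) would start the dump; that call is left out here because it needs a real tape device and root privileges.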
  
Actual results:
As noted above

Expected results:
A successfully started and completed backup

Additional info:
I am not aware of any changes to the system that might have caused this, unless defective
code came in via the nightly yum update.  I have also posted a query on an SGI open source
list.

Comment 1 Richard Bonomo 2006-05-12 18:32:18 UTC

I also noticed this behavior starting a couple of months ago. It got worse: I discovered that I could not
read the dumpset with xfsrestore until I removed the drive from the system, attached it to a second
system, and then did the restore *remotely* from the first. Things then deteriorated further. I swapped tape
drives. I found that I could save and read with "tar", but not with xfsdump. Whenever I tried to do a dump,
I would get a failure similar to this:
<joining dump in progress>
/usr/sbin/xfsdump: NOTE: pruned 21224 files: skip attribute set
/usr/sbin/xfsdump: ino map phase 3: skipping (no pruning necessary)
/usr/sbin/xfsdump: ino map phase 4: skipping (size estimated in phase 2)
/usr/sbin/xfsdump: ino map phase 5: skipping (only one dump stream)
/usr/sbin/xfsdump: ino map construction complete
/usr/sbin/xfsdump: estimated dump size: 171930662400 bytes
/usr/sbin/xfsdump: preparing drive
/usr/sbin/xfsdump: ERROR: unexpected tape error: errno 16 nread -1 blksz 1048576 recsz 1048576
isvar 1 wasatbot 1
eod 0 fmk 0 eot 0 onl 1 wprot 0 ew 0
/usr/sbin/xfsdump: ERROR: unexpected error from do_begin_read: 10
/usr/sbin/xfsdump: dump size (non-dir files) : 0 bytes
/usr/sbin/xfsdump: NOTE: dump interrupted: 1 seconds elapsed: may resume later using -R option
/usr/sbin/xfsdump: Dump Status: INTERRUPT
<end of quote>
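
For readers decoding the "unexpected tape error" line above, here is a minimal sketch that splits it into named fields and looks up errno 16 in the platform's errno table. The helper name parse_tape_error is made up for this illustration and is not part of xfsdump. The field names (isvar, wasatbot, eod, and so on) are taken verbatim from the message and are not interpreted here. On Linux, errno 16 is EBUSY.

```python
import errno
import os
import re

# The error message quoted above, with its wrapped lines joined into one string.
LINE = (
    "/usr/sbin/xfsdump: ERROR: unexpected tape error: errno 16 nread -1 "
    "blksz 1048576 recsz 1048576 isvar 1 wasatbot 1 "
    "eod 0 fmk 0 eot 0 onl 1 wprot 0 ew 0"
)


def parse_tape_error(line):
    """Return the 'name value' pairs after 'unexpected tape error:' as a dict of ints."""
    _, _, tail = line.partition("unexpected tape error:")
    return {name: int(value) for name, value in re.findall(r"([a-z]+) (-?\d+)", tail)}


fields = parse_tape_error(LINE)
code = fields["errno"]

# Symbolic name and human-readable text for the errno value on this platform.
print(errno.errorcode.get(code, "unknown"), "-", os.strerror(code))

# Block and record size are both 1048576 bytes, i.e. 1 MiB.
print("block size in MiB:", fields["blksz"] // (1024 * 1024))
```

EBUSY from a read on the tape device suggests the failure is on the kernel or driver side of the read call, which would fit the observation that booting an older kernel made the dumps work again. That reading is an inference, not something the log confirms.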

Just last night I rebooted the system into an earlier kernel (2.6.14-1.1656_FC4) instead of
the later ones (most recent 2.6.16-1.2108_FC4). The earlier kernel is the one that was running the last
time dumps went well. There is a dump in progress now, for the first time in weeks. I hope to test
the dump with xfsrestore over the weekend. It looks like someone clobbered something in the update
from .14 to .15. bonomo.YYY.eduYYY (drop the Y's).

Comment 2 Richard Bonomo 2006-05-12 18:39:59 UTC

Goodness!  I forgot that I made the original report in February!

Comment 3 Eric Sandeen 2006-11-16 04:04:25 UTC

xfsdump only recently made it into Fedora Extras... do you still see this problem?

Comment 4 Richard Bonomo 2006-11-17 20:14:25 UTC

I have been keeping the kernel at 2.6.14-1.1656_FC4, and have had no problems with the dumps
with the kernel at this level.  I am sure that if I went back to 2.6.16-1.2108_FC4, the problem
would recur.  I have not tried updating to the more recent kernels, as I have seen no reports that the
faults evidently introduced in going from .14 to .15 have been found and fixed.

Comment 5 Christian Iseli 2007-01-22 11:14:04 UTC

This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?

Thanks.

Comment 6 Richard Bonomo 2007-01-23 05:16:51 UTC

Since filing the original report in February 2006, my status has changed:
I have been laid off from my position, as of 11/30/2006, as
a consequence of a budget shortfall.  I am no longer in a position to supply
additional information.  I suggest checking for kernel changes 
between kernel versions referenced in comment # 4, and looking for 
tape-handling code which might have caused this.  Then it is a question
of whether the error has been propagated to the current release.
I have removed my address from the CC list.

Comment 7 Eric Sandeen 2007-08-22 19:15:08 UTC

This looks to me like an issue that needs to be addressed upstream with the
xfsdump maintainers, if it hasn't already been addressed.

http://www.oss.sgi.com/archives/xfs/2006-02/msg00100.html