Description of problem:
A customer hit I/O errors while archiving a very large file (11 TB). Using strace, they observed that after tar hit a read I/O error from the XFS filesystem, it continued writing zeros to the archive. tar gave no indication that it was writing zeros after the error occurred.
Later it was found that writing zeros is expected behavior: the file header, which records the file size, has already been written, so the remaining data must be padded with zeros.
This behavior can be seen using the reproducing steps provided by the customer.
Padding with zeros is expected, but tar does it silently in the "Read error at byte ..." case. It should say that it is padding with zeros, similar to how it reports "File shrank by ... bytes; padding with zeros".
Version-Release number of selected component (if applicable):
# rpm -q tar libtar
tar-1.30-5.el8.x86_64
libtar-1.2.20-15.el8.x86_64
How reproducible:
100%
The script below may not produce a read error on every run; however, when a read error does occur, strace shows that tar writes zeros.
Steps to Reproduce:
==================
I can reproduce the issue with the following steps provided by the customer. They set up a device with dmsetup using the error target.
#!/bin/bash
# Reproducer "tardust"
#
# When "tar create" reads a file there are several shortcomings when it hits read error
#
# 1) When read() fails with EIO (no bytes returned) due to a read error, then this happens
# read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# read(4, 0x563adef7b000, 3584) = -1 EIO (Input/output error)
# write(2, "tar: ", 5tar: ) = 5
# write(2, "/mntx/testfile: Read error at by"..., 70/mntx/testfile: Read error at byte 260653056, while reading 3584 bytes) = 70
# write(2, ": Input/output error", 20: Input/output error) = 20
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# Actual behaviour: it prints a "Read error" message, but does not say that it will pad the output with zeros
# Expected behaviour: it should also print the information "padding with zeros"
# 2) A second shortcoming is that tar does not differentiate between "read error" and "file shrinkage"
# That means when it sees a short read due to a read error, it does not report a read error.
# It looks like this:
# read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 2560
# write(2, "tar: ", 5tar: ) = 5
# write(2, "/mntx/testfile: File shrank by 5"..., 65/mntx/testfile: File shrank by 53927936 bytes; padding with zeros) = 65
# write(2, "\n", 1
# ) = 1
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# Summary: A read error is not reported here. At least it now says "padding with zeros"
# Expected behaviour: it should report a read error, so the user knows what is going on.
#
# 3) Side-Note:
# The blocking factor is applied to the output. When reading a file, all reads are misaligned by 512 bytes.
# This is because it writes a 512-byte header for every archived file.
# That means the first read from the file is 512 bytes too short:
# Running with tar-blocking-factor=7
# fstat(1, {st_mode=S_IFREG|0644, st_size=17827, ...}) = 0
# write(1, "/mntx/testfile\n", 15/mntx/testfile
# ) = 15
# read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3072) = 3072 #1st read 512bytes too short
# write(3, "mntx/testfile\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
# write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
#
# 4) Reproducer overview:
# - Create a 500MB testimage, then create a testfile in the image
# - Use losetup/dmsetup with the "error" target type (the mapped device is named "tardust")
# - you can inject IO errors at a specified block number with the "error" target
# - You must hit a 4K boundary to see EIO, so use tar-blocking-factor=7 and
# - vary the bad blocknumber to find the case (1)
echo Step 1 Create disk image
dd if=/dev/zero of=/tmp/testimage bs=1M count=500 || exit
echo Step 2 Create XFS in image
mkfs.xfs /tmp/testimage || exit
echo Step 3 Use losetup so the file can be used as a block device
losetup /dev/loop1 /tmp/testimage || exit
losetup
echo Step 4 Now create the testfile, this will have read error injected later
mkdir -p /mntx
mount /dev/loop1 /mntx || exit
dd if=/dev/zero of=/mntx/testfile bs=1M count=300 || exit
umount /mntx
echo Step 5 Now iterating through bad blocks
echo "As a result, there are strace output files /tmp/a1000 ... /tmp/a1040"
for i in $(seq 1000 1 1040)
do
echo
echo Badblock $i
let ERR=i
let ERR1=i+1
let NUMSECTOR2=1024000-ERR1
#echo ERR1 is $ERR1
#echo NUMSECTOR2 is $NUMSECTOR2
dmsetup create tardust <<EOF
0 $ERR linear /dev/loop1 0
$ERR 1 error
$ERR1 $NUMSECTOR2 linear /dev/loop1 $ERR1
EOF
#dmsetup ls
#dmsetup status
#dmsetup table
mount /dev/mapper/tardust /mntx || exit
strace tar cvbf 7 /tmp/tardust.tar /mntx/testfile >&/tmp/a$i
umount /mntx
dmsetup remove tardust
grep -e error -e shrank /tmp/a$i
done
echo "Done: inspect the strace output file for error behaviour (grep error ; Look at last read()-call )"
losetup -d /dev/loop1
==================
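The sector arithmetic inside the loop above can be sketched as a small helper. This is only an illustration, not part of the customer's script: the function name dm_error_table and its argument order are made up here. It prints the same three-line table that the here-document feeds to "dmsetup create", and derives the total sector count (1024000) from the 500 MiB image size.

```shell
#!/bin/bash
# Hypothetical helper (not in the original reproducer): print a device-mapper
# table that maps exactly one 512-byte sector to the "error" target and
# everything else linearly to the backing device.
# Arguments: backing device, bad sector number (must be > 0), total sectors.
dm_error_table() {
    local dev=$1 bad=$2 total=$3
    local next=$((bad + 1))          # first sector after the bad one
    local rest=$((total - next))     # number of sectors after the bad one
    printf '0 %d linear %s 0\n' "$bad" "$dev"
    printf '%d 1 error\n' "$bad"
    printf '%d %d linear %s %d\n' "$next" "$rest" "$dev" "$next"
}

# A 500 MiB image holds 500 * 1024 * 1024 / 512 = 1024000 sectors.
total=$((500 * 1024 * 1024 / 512))
dm_error_table /dev/loop1 1000 "$total"
```

The output could be piped into "dmsetup create tardust" in place of the here-document; that wiring is left out here because it needs root and a configured loop device.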
Actual results:
- tar reports a disk read error as "Read error at byte ..."; that is correct when it happens
- it then writes zeros (padding) up to the initial file size, but does not print a message saying so
- the padding itself is correct
Expected results:
- In addition it SHALL print "Padding with zeros". This is missing currently.
By correctly recognizing what the root cause is, the admin can take the right actions immediately.
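To spot the case described above in the collected logs, a check along the following lines may help. It is a rough sketch under assumptions: the function name silent_padding is made up, and the pattern "padding with zero" is borrowed from the existing "File shrank" message, since this report does not give the exact wording a fixed tar would print.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when a log contains a tar
# "Read error at byte" message but no padding notice at all, which is the
# silent-padding behaviour reported here. Returns non-zero otherwise.
# The "padding with zero" pattern is an assumption (see the lead-in).
silent_padding() {
    local log=$1
    grep -q 'Read error at byte' "$log" 2>/dev/null || return 1
    ! grep -qi 'padding with zero' "$log"
}

# Scan the strace output files written by the reproducer, if any exist.
for f in /tmp/a10??; do
    [ -f "$f" ] || continue
    if silent_padding "$f"; then
        echo "$f: read error reported without a padding notice"
    fi
done
```

Because the reproducer redirects both strace output and tar's stderr into the same file, the full tar message text appears in the log even though strace truncates the string arguments.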
Additional info:
Note on reproducibility:
My investigation found the root cause for non-reproducibility (i.e. you have to try several times):
- XFS does delayed allocation, so the disk layout can be different each run as shown with "xfs_bmap":
e.g.
"xfs_bmap /mntx/testfile"
/mntx/testfile:
0: [0..254119]: 256080..510199
1: [254120..508239]: 768080..1022199
2: [508240..614399]: 192..106351
- When using the "oflag=direct" for creating the testfile the layout can be made the same every time:
/mntx/testfile:
0: [0..253951]: 192..254143
1: [253952..507903]: 256080..510031
2: [507904..614399]: 523024..629519
- This explains why your experiments showed differing results and why you had to repeat them
Summary: When creating the testfile with "dd" use "oflag=direct" to make it reproducible on every run.
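As a sketch of that change, the "dd" call that creates the testfile could be wrapped as follows. The function name create_testfile is made up for illustration; "oflag=direct" is a standard GNU dd option, and the layout is presumably stable because direct I/O bypasses delayed allocation, which is an inference from the xfs_bmap output above rather than something verified here.

```shell
#!/bin/bash
# Hypothetical helper: create a zero-filled test file using direct I/O and
# then print its extent layout so that runs can be compared.
# Arguments: target path, size in MiB.
create_testfile() {
    local path=$1 size_mb=$2
    dd if=/dev/zero of="$path" bs=1M count="$size_mb" oflag=direct || return
    # Show the extent map; with oflag=direct it should be the same every run.
    xfs_bmap "$path"
}
```

In the reproducer this would replace the plain "dd" line that writes /mntx/testfile, for example: create_testfile /mntx/testfile 300 || exit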
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (tar bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:7770