Bug 1433460 - Extra line of ^@^@^@ at the end of the file
Summary: Extra line of ^@^@^@ at the end of the file
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-17 16:57 UTC by patrice.linel
Modified: 2018-06-20 18:29 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description patrice.linel 2017-03-17 16:57:28 UTC
Description of problem:

When running a program on a distribute volume with sharding, files
sometimes get created with an extra line of special characters ^@^@^@ at
the end. This happens under load, with multiple nodes accessing the same
folders: jobs create files which are then read very shortly afterwards
(>0.5 sec later) on another node.
I ran strace, and the major difference is an "Inappropriate ioctl" error
followed by an lseek, as shown below.


When running the same workload on a PanFS filesystem, the strace output is
the same, but no extra line gets created at the end.



When run manually:

ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
{B38400 opost isig icanon echo ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 32), ...}) = 0
readlink("/proc/self/fd/1", "/dev/pts/32", 4095) = 11
stat("/dev/pts/32", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 32),
...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
{B38400 opost isig icanon echo ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 32), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
{B38400 opost isig icanon echo ...}) = 0
write(1, " Done reading parameter file for"..., 57) = 57


vs.

in the job:

ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
0x7fffffff6410) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
0x7fffffffa630) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(1, {st_mode=S_IFREG|0664, st_size=22753, ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
0x7fffffffa630) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " Done reading parameter file for"..., 57 Done reading
parameter file for chromosome            7
) = 57
lseek(1, 0, SEEK_CUR^@^@^@^@^@^@^@^@^@^@^@^@^@) 


Version-Release number of selected component (if applicable): 3.10


How reproducible: All the time


Steps to Reproduce:

A heavy workload running a compiled program that creates and reads files.


Actual results: files are created with an extra line of ^@^@^@ characters at the end.


Expected results: the file just ends at EOF, with no extra characters.


Additional info:

mailing list thread: http://lists.gluster.org/pipermail/gluster-users/2017-March/030314.html

Comment 1 Niels de Vos 2017-03-28 12:33:09 UTC
The ioctl() calls mentioning TCGETS are unlikely to be related to the problem. The strace shows the first parameter of these calls as "1"; this is the file descriptor the call operates on. By default, "1" refers to standard output (/dev/stdout), which can be a terminal, a console, or a redirection to a file. It is strange to see lseek() being done on stdout.
Also, the strace output shows the ^@ symbols inside part of a function call; this should not happen when the strace log is written to a file with "strace -o strace.log ...".
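For example, a capture along these lines (the program name and log path are just placeholders) keeps the traced program's own output out of the trace:

# -f follows child processes, -o writes the trace to a file instead of stderr.
strace -f -o /tmp/strace.log ./your_program args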

Could you explain a little bit more about the application?
- How is it writing files?
- Can you reproduce this on other volume types? (Try a single-brick volume too)
- How are you capturing the strace?
- Can you provide a shell script or simple application that reproduces this?
- What is the exact glusterfs version?
- Does this only happen over NFS, FUSE or other types of mounting?
- What options are set for the volume (output of 'gluster volume info <VOL>')?

Comment 2 patrice.linel 2017-04-03 22:30:15 UTC
OK, I did some more digging; it took a while to isolate the bug.
> 
> Could you explain a little bit more about the application?
> - How is it writing files?
> - Can you reproduce this on other volume types? (Try a single-brick volume
> too)
> - How are you capturing the strace?
With the simplified program below and strace -o, the trace no longer shows anything in particular. Let me know if you still want the logs.

> - Can you provide a shell script or simple application that reproduces this?

Yes, the application is a Fortran program that writes matrices to files.
The following code produces the bug:

program foo
  implicit none
  double precision, dimension(:,:), allocatable :: r
  character(30) :: ifout
  integer :: i, j

  ! Fill a 1000x10000 matrix with random values in [0,10).
  allocate(r(1000,10000))
  call random_number(r)
  r = r*10

  ! The output file name comes from the first command-line argument.
  call get_command_argument(1, ifout)

  ! Write 10000 lines of 1000 single-digit integers each.
  open(10, file=ifout)
  do j = 1, 10000
    write(10, '(1000(i1,1X))') int(r(:,j))
  end do
  close(10)

  deallocate(r)
end program foo

Compiled with gfortran (6.1.0) as "gfortran f.f90" and executed like "./a.out t.dat".
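For reference, a local approximation of the workload could look like the sketch below (the mount point, job count, and output file names are assumptions; the real runs are separate cluster jobs on multiple nodes):

# Build the reproducer and launch several concurrent writers on the mounted
# volume (mount point, job count and file names are assumptions).
gfortran -o writer f.f90

MNT=/mnt/wau_inbox        # assumed FUSE or NFS-Ganesha mount of the volume
for i in $(seq 1 20); do
    ./writer "$MNT/repro_$i.dat" &
done
wait

# Correct runs all produce files of the same size; padded files are larger
# and end in NUL (^@) bytes.
ls -l "$MNT"/repro_*.dat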



> - What is the exact glusterfs version?
3.10

> - Does this only happen over NFS, FUSE or other types of mounting?
FUSE and Ganesha; we had some instability with the FUSE mount, so we are now using NFSv4.0 via NFS-Ganesha.
> - What options are set for the volume (output of 'gluster volume info
> <VOL>')?
gluster volume info wau_inbox
 
Volume Name: wau_inbox
Type: Distribute
Volume ID: 460eb44f-aecb-426b-bd53-16e72522a422
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfs1:/glusterfs/vol1/brick1
Brick2: gfs2:/glusterfs/vol1/brick1
Options Reconfigured:
features.scrub-freq: weekly
features.scrub-throttle: normal
features.scrub: Active
features.bitrot: on
performance.cache-size: 1073741824
performance.io-thread-count: 4
user.smb: disable
user.cifs: disable
features.shard-block-size: 128MB
features.shard: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
features.cache-invalidation: on
performance.flush-behind: off
nfs-ganesha: enable


We have another similar volume without sharding, and I cannot reproduce the bug on it.

Below you can see the problem; the correct files are sized 20000000 bytes.


-rw-rw-r-- 1 plinel plinel  20971520 Apr  3 17:12 132461.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132462.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132463.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132464.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132465.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132466.txt
-rw-rw-r-- 1 plinel plinel  22097152 Apr  3 17:12 132467.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132468.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132469.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132470.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132471.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132472.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132473.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132474.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132475.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132476.txt
-rw-rw-r-- 1 plinel plinel  21048576 Apr  3 17:12 132477.txt
-rw-rw-r-- 1 plinel plinel  20000000 Apr  3 17:12 132478.txt
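
For what it's worth, the extra bytes on an oversized file can be confirmed to be NULs with something like the following (the mount path is an assumption; 132462.txt is one of the files above, 1048576 bytes larger than the expected 20000000):

# Illustrative check: measure the padding on a suspect file and dump it.
EXPECTED=20000000
f=/mnt/wau_inbox/132462.txt                         # assumed mount path; 21048576 bytes in the listing

actual=$(stat -c %s "$f")
echo "extra bytes: $((actual - EXPECTED))"          # 1048576 for this file
tail -c $((actual - EXPECTED)) "$f" | od -c | head  # the padding shows up as \0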

Comment 3 Niels de Vos 2017-04-04 12:11:16 UTC
Thanks for the details! We should be able to reproduce this now with the little application you provided.

Does running the sample program reproduce the problem every time for you, or only every N out of M runs?
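
If it helps narrow things down, a brick-side look at the size metadata the shard translator stores on the base file of one of the oversized files could be interesting. This is only a sketch; the brick path and file location are assumptions based on the volume info in comment 2, and the xattrs of interest should be trusted.glusterfs.shard.block-size and trusted.glusterfs.shard.file-size:

# Hypothetical brick-side check, run on gfs1 or gfs2 (the path of the file
# under the brick is a placeholder):
getfattr -d -m . -e hex /glusterfs/vol1/brick1/<path-to>/132462.txt
stat -c %s /glusterfs/vol1/brick1/<path-to>/132462.txt
# Compare the brick-side size and the shard file-size xattr with the size
# reported on the mount (21048576 in the listing above).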

Comment 4 patrice.linel 2017-04-04 14:00:03 UTC
(In reply to Niels de Vos from comment #3)
>
> 
> Does running the sample program reproduce the problem every time for you, or
> only every N out of M runs?

I don't think it is consistent. My previous answer shows the files with their numbers and sizes; each number is a job id, and you can see that it is not consistent, as sometimes a file looks fine and sometimes it does not. So if you are lucky, the first run will show you the bug; it happens in at least 1 run out of every 10. All of those jobs run concurrently, and a quick calculation shows it happens about 30% of the time.

Thanks
Patrice.

Comment 5 Shyamsundar 2018-06-20 18:29:07 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline
Gluster repository, please reopen it and set the Version field appropriately.

