Bug 1276334 - Data Tiering: tiering daemon crashes when trying to heat the file
Summary: Data Tiering: tiering daemon crashes when trying to heat the file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Nithya Balachandran
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1260783 1260923 1276562 1277587
 
Reported: 2015-10-29 13:35 UTC by Nag Pavan Chilakam
Modified: 2019-04-03 09:15 UTC (History)

Fixed In Version: glusterfs-3.7.5-6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1276562
Environment:
Last Closed: 2016-03-01 05:48:48 UTC
Embargoed:


Attachments
Server and client logs (32.15 KB, application/vnd.oasis.opendocument.text)
2015-12-04 05:23 UTC, Sweta Anandpara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0193 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 update 2 2016-03-01 10:20:36 UTC

Description Nag Pavan Chilakam 2015-10-29 13:35:34 UTC
Description of problem:
=====================
I created about 80 mp3 files on an EC volume, attached a 2x2 tier volume, and then enabled CTR.
I then tried to heat the files by issuing touch * twice.
The tier status still shows the tier as running, but the tier daemon had crashed, as shown below:

[2015-10-29 10:38:08.044649] E [MSGID: 109037] [tier.c:316:tier_migrate_using_query_file] 0-athens-tier-dht: failed parsing Akon

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 
2015-10-29 10:38:08
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f03c2c43002]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f03c2c5f48d]
/lib64/libc.so.6(+0x35650)[0x7f03c1331650]
/lib64/libc.so.6(gsignal+0x37)[0x7f03c13315d7]
/lib64/libc.so.6(abort+0x148)[0x7f03c1332cc8]
/lib64/libc.so.6(+0x75e07)[0x7f03c1371e07]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f03c1409a57]
/lib64/libc.so.6(+0x10bc10)[0x7f03c1407c10]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x586ac)[0x7f03b47e76ac]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x59835)[0x7f03b47e8835]
/lib64/libpthread.so.0(+0x7df5)[0x7f03c1aabdf5]
/lib64/libc.so.6(clone+0x6d)[0x7f03c13f21ad]
---------



NOTE: These were mp3 files whose names contained spaces.







Backtrace:
=========
[New LWP 15642]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/athens --xlator-option'.
Program terminated with signal 6, Aborted.
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.7.5-0.3.el7rhgs.x86_64
(gdb) bt
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
#1  0x00007f03c1332cc8 in abort () from /lib64/libc.so.6
#2  0x00007f03c1371e07 in __libc_message () from /lib64/libc.so.6
#3  0x00007f03c1409a57 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007f03c1407c10 in __chk_fail () from /lib64/libc.so.6
#5  0x00007f03b47e76ac in tier_migrate_files_using_qfile.isra.4 ()
   from /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so
#6  0x00007f03b47e8835 in tier_promote ()
   from /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so
#7  0x00007f03c1aabdf5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f03c13f21ad in clone () from /lib64/libc.so.6
(gdb) quit



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64

[root@zod ~]# gluster v tier athens status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            0                    0                    in progress         
yarrow               0                    0                    in progress         
volume rebalance: athens: success: 
[root@zod ~]# gluster v rebal athens status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress             582.00
                                  yarrow                0        0Bytes             0             0             0          in progress             582.00
volume rebalance: athens: success: 
[root@zod ~]# gluster v athens status
unrecognized word: athens (position 1)
[root@zod ~]# gluster v status athens
Status of volume: athens
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick yarrow:/rhs/brick6/athens_hot         49212     0          Y       30290
Brick zod:/rhs/brick6/athens_hot            49212     0          Y       14783
Brick yarrow:/rhs/brick7/athens_hot         49211     0          Y       30253
Brick zod:/rhs/brick7/athens_hot            49211     0          Y       14758
Cold Bricks:
Brick zod:/rhs/brick1/athens                49208     0          Y       2610 
Brick yarrow:/rhs/brick1/athens             49208     0          Y       26737
Brick zod:/rhs/brick2/athens                49209     0          Y       2628 
Brick yarrow:/rhs/brick2/athens             49209     0          Y       26761
Brick zod:/rhs/brick3/athens                49210     0          Y       2646 
Brick yarrow:/rhs/brick3/athens             49210     0          Y       26779
NFS Server on localhost                     2049      0          Y       14814
Self-heal Daemon on localhost               N/A       N/A        Y       14822
NFS Server on yarrow                        2049      0          Y       30403
Self-heal Daemon on yarrow                  N/A       N/A        Y       30411
 
Task Status of Volume athens
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 1a884838-3269-4216-a673-a4599df2241f
Status               : in progress         
 
[root@zod ~]# ps -ef|grep athens
root      2610     1  0 Oct28 ?        00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick1-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick1-athens.pid -S /var/run/gluster/2ad8876dd94aaa5b5423e0e651f23109.socket --brick-name /rhs/brick1/athens -l /var/log/glusterfs/bricks/rhs-brick1-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49208 --xlator-option athens-server.listen-port=49208
root      2628     1  0 Oct28 ?        00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick2-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick2-athens.pid -S /var/run/gluster/1c56888663074eddd6c8a7538d14e146.socket --brick-name /rhs/brick2/athens -l /var/log/glusterfs/bricks/rhs-brick2-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49209 --xlator-option athens-server.listen-port=49209
root      2646     1  0 Oct28 ?        00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick3-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick3-athens.pid -S /var/run/gluster/74296bde4d59aed9f6fc5288e71a13e3.socket --brick-name /rhs/brick3/athens -l /var/log/glusterfs/bricks/rhs-brick3-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49210 --xlator-option athens-server.listen-port=49210
root     14758     1  0 15:57 ?        00:00:01 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick7-athens_hot -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick7-athens_hot.pid -S /var/run/gluster/ea58fdac54b1ee89d32e4f1becde8201.socket --brick-name /rhs/brick7/athens_hot -l /var/log/glusterfs/bricks/rhs-brick7-athens_hot.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49211 --xlator-option athens-server.listen-port=49211
root     14783     1  0 15:57 ?        00:00:02 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick6-athens_hot -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick6-athens_hot.pid -S /var/run/gluster/76e90e8a10761302750a787d5c509fba.socket --brick-name /rhs/brick6/athens_hot -l /var/log/glusterfs/bricks/rhs-brick6-athens_hot.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49212 --xlator-option athens-server.listen-port=49212
root     16101 27536  0 16:04 pts/1    00:00:00 tail -f athens-tier.log
root     29884 27664  0 19:03 pts/2    00:00:00 grep --color=auto athens
[root@zod ~]# rpm -qa|grep gluster
glusterfs-3.7.5-0.3.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-0.3.el7rhgs.x86_64
glusterfs-cli-3.7.5-0.3.el7rhgs.x86_64
glusterfs-libs-3.7.5-0.3.el7rhgs.x86_64
glusterfs-fuse-3.7.5-0.3.el7rhgs.x86_64
glusterfs-api-3.7.5-0.3.el7rhgs.x86_64
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64
[root@zod ~]#

Comment 1 Nag Pavan Chilakam 2015-10-29 13:48:29 UTC
sosreports:
[nchilaka@rhsqe-repo bug.1276334]$ pwd
/home/repo/sosreports/nchilaka/bug.1276334
[nchilaka@rhsqe-repo bug.1276334]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com

Comment 3 Nithya Balachandran 2015-10-30 05:59:36 UTC
From the core:

(gdb) bt
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
#1  0x00007f03c1332cc8 in abort () from /lib64/libc.so.6
#2  0x00007f03c1371e07 in __libc_message () from /lib64/libc.so.6
#3  0x00007f03c1409a57 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007f03c1407c10 in __chk_fail () from /lib64/libc.so.6
#5  0x00007f03b47e76ac in strcpy (__src=<optimized out>, __dest=0x7f038affad20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/") at /usr/include/bits/string3.h:104
#6  tier_parse_query_str (link_size=0x7f0380000998, link_buffer=<optimized out>, gfid=0x7f038affad20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/", 
    query_record_str=0x7f038affbe20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/Just") at tier.c:55
#7  tier_migrate_using_query_file (_args=0x7f0380007580) at tier.c:311
#8  tier_migrate_files_using_qfile (query_cbk_args=query_cbk_args@entry=0x7f038affce90, qfile=<optimized out>, comp=0x7f03ace1dc60) at tier.c:1059
#9  0x00007f03b47e8835 in tier_promote (args=0x7f03ace1dc60) at tier.c:1136
#10 0x00007f03c1aabdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f03c13f21ad in clone () from /lib64/libc.so.6


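The names of the files being promoted contained spaces. The tier_migrate_using_query_file function uses fscanf to read the query file. fscanf treats whitespace as a delimiter, so each record was split into fragments and parsed incorrectly. The crash occurred when a file-name fragment was copied with strcpy into the gfid buffer and overflowed it; the fortified strcpy detected the overflow (__chk_fail) and aborted the process with signal 6.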


Solution: Use fgets() instead of fscanf().
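
A minimal sketch of the fgets()-based approach suggested above. This is not the actual tier.c patch. It assumes one record per line and that the GFID is the text before the first '|', which is what the query-file records in this report look like. The helper names read_record and extract_gfid and the 37-byte GFID_STR_LEN are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a GFID is printed as a 36-character UUID string, plus NUL. */
#define GFID_STR_LEN 37

/* Hypothetical helper: read one whole line (one record) into line.
 * Unlike fscanf("%s"), fgets() does not stop at spaces, so file names
 * containing spaces stay inside a single record.
 * Returns 0 on success, -1 on EOF/error or if the line did not fit. */
int read_record(FILE *fp, char *line, size_t size)
{
    assert(fp != NULL && line != NULL && size > 1);
    if (fgets(line, (int)size, fp) == NULL)
        return -1;
    size_t len = strcspn(line, "\n");
    if (line[len] != '\n' && !feof(fp))
        return -1;          /* no newline seen: treat as an oversized record */
    line[len] = '\0';       /* strip the trailing newline, if any */
    return 0;
}

/* Hypothetical helper: copy the text before the first '|' into gfid.
 * The length is checked first, so a malformed record is rejected
 * instead of overflowing the destination the way an unchecked strcpy would. */
int extract_gfid(const char *record, char *gfid, size_t gfid_size)
{
    const char *sep = strchr(record, '|');
    if (sep == NULL)
        return -1;          /* not a well-formed record */
    size_t len = (size_t)(sep - record);
    if (len >= gfid_size)
        return -1;          /* would not fit in the destination buffer */
    memcpy(gfid, record, len);
    gfid[len] = '\0';
    return 0;
}
```

With these checks, a fragment such as the one seen in the backtrace is rejected with an error return rather than being copied into the gfid buffer.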

Comment 4 Nithya Balachandran 2015-10-30 06:11:01 UTC
I wrote a test program that reads the promote query file using fscanf and fgets. Each line below shows the contents of the buffer after a single fscanf/fgets call, for the first two records:


Output with fscanf:
-------------------
69d34f4c-377e-47af-85b2-a56d4f21a44f|00000000-0000-0000-0000-000000000001,01
Adele
-
Rolling
in
the
Deep.mp3,/01
Adele
-
Rolling
in
the
Deep.mp3,0,0|111
8aed7fc4-62db-47dc-b208-d39c3a7e4ebf|00000000-0000-0000-0000-000000000001,01
Adele
-
Set
Fire
to
the
Rain
Lyrics&rlm;.mp3,/01
Adele
-
Set
Fire
to
the
Rain






Output with fgets:
-------------------
69d34f4c-377e-47af-85b2-a56d4f21a44f|00000000-0000-0000-0000-000000000001,01 Adele - Rolling in the Deep.mp3,/01 Adele - Rolling in the Deep.mp3,0,0|111

8aed7fc4-62db-47dc-b208-d39c3a7e4ebf|00000000-0000-0000-0000-000000000001,01 Adele - Set Fire to the Rain Lyrics&rlm;.mp3,/01 Adele - Set Fire to the Rain Lyrics&rlm;.mp3,0,0|137
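
The original test program is not attached to this bug. The following is a rough sketch of an equivalent comparison, under the assumption that the query file holds one record per line. The function names count_fscanf_reads and count_fgets_reads are hypothetical. Each function counts how many calls are needed to consume the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Count the reads needed when using fscanf("%s"), which stops at every
 * whitespace character and therefore splits names containing spaces. */
size_t count_fscanf_reads(FILE *fp)
{
    char buf[4096];
    size_t n = 0;
    assert(fp != NULL);
    while (fscanf(fp, "%4095s", buf) == 1)
        n++;
    return n;
}

/* Count the reads needed when using fgets(), which reads up to and
 * including the newline. This equals the number of records as long as
 * every line is shorter than the buffer. */
size_t count_fgets_reads(FILE *fp)
{
    char buf[4096];
    size_t n = 0;
    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;
    return n;
}
```

For the first record shown above, the fscanf variant needs 13 reads (one per whitespace-separated fragment), while the fgets variant needs one.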

Comment 6 Sweta Anandpara 2015-12-04 05:23:01 UTC
Tested and verified the above bug on the build glusterfs-3.7.5-7.el7rhgs.x86_64.

Created 50 files with names that included spaces on a 2x2 regular volume. Attached a tier and accessed about 20 files, which were moved to the hot tier. Created new files, and they were created in the hot tier.

Changed the mode to test and set the write counter to 5. Waited for the session time, and all the files were moved back to the cold tier. Heated up 20 different files again (with writes); they were moved to the hot tier and were shifted back to the cold tier once they were no longer accessed.

No crashes were observed, and the output was as expected.

Moving this bug to verified in 3.1.2. Detailed logs are attached.

Comment 7 Sweta Anandpara 2015-12-04 05:23:40 UTC
Created attachment 1102132
Server and client logs

Comment 9 errata-xmlrpc 2016-03-01 05:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

