Description of problem:
=====================
I created about 80 mp3 files on an EC volume, then attached a 2x2 tier volume and enabled CTR. I then tried to heat the files by issuing touch * twice. The tier status still shows the tier as running, but the tier daemon crashed as shown below:

[2015-10-29 10:38:08.044649] E [MSGID: 109037] [tier.c:316:tier_migrate_using_query_file] 0-athens-tier-dht: failed parsing Akon
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2015-10-29 10:38:08
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f03c2c43002]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f03c2c5f48d]
/lib64/libc.so.6(+0x35650)[0x7f03c1331650]
/lib64/libc.so.6(gsignal+0x37)[0x7f03c13315d7]
/lib64/libc.so.6(abort+0x148)[0x7f03c1332cc8]
/lib64/libc.so.6(+0x75e07)[0x7f03c1371e07]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f03c1409a57]
/lib64/libc.so.6(+0x10bc10)[0x7f03c1407c10]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x586ac)[0x7f03b47e76ac]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x59835)[0x7f03b47e8835]
/lib64/libpthread.so.0(+0x7df5)[0x7f03c1aabdf5]
/lib64/libc.so.6(clone+0x6d)[0x7f03c13f21ad]
---------

NOTE: these were mp3 files whose names contain spaces.

Backtrace:
=========
[New LWP 15642]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/athens --xlator-option'.
Program terminated with signal 6, Aborted.
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.7.5-0.3.el7rhgs.x86_64
(gdb) bt
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
#1  0x00007f03c1332cc8 in abort () from /lib64/libc.so.6
#2  0x00007f03c1371e07 in __libc_message () from /lib64/libc.so.6
#3  0x00007f03c1409a57 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007f03c1407c10 in __chk_fail () from /lib64/libc.so.6
#5  0x00007f03b47e76ac in tier_migrate_files_using_qfile.isra.4 () from /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so
#6  0x00007f03b47e8835 in tier_promote () from /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so
#7  0x00007f03c1aabdf5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f03c13f21ad in clone () from /lib64/libc.so.6
(gdb) quit

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64

[root@zod ~]# gluster v tier athens status
Node          Promoted files    Demoted files     Status
---------     ---------         ---------         ---------
localhost     0                 0                 in progress
yarrow        0                 0                 in progress
volume rebalance: athens: success:
[root@zod ~]# gluster v rebal athens status
Node          Rebalanced-files  size          scanned       failures      skipped       status        run time in secs
---------     -----------       -----------   -----------   -----------   -----------   ------------  --------------
localhost     0                 0Bytes        0             0             0             in progress   582.00
yarrow        0                 0Bytes        0             0             0             in progress   582.00
volume rebalance: athens: success:
[root@zod ~]# gluster v athens status
unrecognized word: athens (position 1)
[root@zod ~]# gluster v status athens
Status of volume: athens
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick yarrow:/rhs/brick6/athens_hot         49212     0          Y       30290
Brick zod:/rhs/brick6/athens_hot            49212     0          Y       14783
Brick yarrow:/rhs/brick7/athens_hot         49211     0          Y       30253
Brick zod:/rhs/brick7/athens_hot            49211     0          Y       14758
Cold Bricks:
Brick zod:/rhs/brick1/athens                49208     0          Y       2610
Brick yarrow:/rhs/brick1/athens             49208     0          Y       26737
Brick zod:/rhs/brick2/athens                49209     0          Y       2628
Brick yarrow:/rhs/brick2/athens             49209     0          Y       26761
Brick zod:/rhs/brick3/athens                49210     0          Y       2646
Brick yarrow:/rhs/brick3/athens             49210     0          Y       26779
NFS Server on localhost                     2049      0          Y       14814
Self-heal Daemon on localhost               N/A       N/A        Y       14822
NFS Server on yarrow                        2049      0          Y       30403
Self-heal Daemon on yarrow                  N/A       N/A        Y       30411

Task Status of Volume athens
------------------------------------------------------------------------------
Task                 : Tier migration
ID                   : 1a884838-3269-4216-a673-a4599df2241f
Status               : in progress

[root@zod ~]# ps -ef|grep athens
root 2610 1 0 Oct28 ? 00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick1-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick1-athens.pid -S /var/run/gluster/2ad8876dd94aaa5b5423e0e651f23109.socket --brick-name /rhs/brick1/athens -l /var/log/glusterfs/bricks/rhs-brick1-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49208 --xlator-option athens-server.listen-port=49208
root 2628 1 0 Oct28 ? 00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick2-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick2-athens.pid -S /var/run/gluster/1c56888663074eddd6c8a7538d14e146.socket --brick-name /rhs/brick2/athens -l /var/log/glusterfs/bricks/rhs-brick2-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49209 --xlator-option athens-server.listen-port=49209
root 2646 1 0 Oct28 ? 00:00:12 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick3-athens -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick3-athens.pid -S /var/run/gluster/74296bde4d59aed9f6fc5288e71a13e3.socket --brick-name /rhs/brick3/athens -l /var/log/glusterfs/bricks/rhs-brick3-athens.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49210 --xlator-option athens-server.listen-port=49210
root 14758 1 0 15:57 ? 00:00:01 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick7-athens_hot -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick7-athens_hot.pid -S /var/run/gluster/ea58fdac54b1ee89d32e4f1becde8201.socket --brick-name /rhs/brick7/athens_hot -l /var/log/glusterfs/bricks/rhs-brick7-athens_hot.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49211 --xlator-option athens-server.listen-port=49211
root 14783 1 0 15:57 ? 00:00:02 /usr/sbin/glusterfsd -s zod --volfile-id athens.zod.rhs-brick6-athens_hot -p /var/lib/glusterd/vols/athens/run/zod-rhs-brick6-athens_hot.pid -S /var/run/gluster/76e90e8a10761302750a787d5c509fba.socket --brick-name /rhs/brick6/athens_hot -l /var/log/glusterfs/bricks/rhs-brick6-athens_hot.log --xlator-option *-posix.glusterd-uuid=b0fb1eba-04be-46a1-bf3a-3e2de81f307d --brick-port 49212 --xlator-option athens-server.listen-port=49212
root 16101 27536 0 16:04 pts/1 00:00:00 tail -f athens-tier.log
root 29884 27664 0 19:03 pts/2 00:00:00 grep --color=auto athens
[root@zod ~]# rpm -qa|grep gluster
glusterfs-3.7.5-0.3.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-0.3.el7rhgs.x86_64
glusterfs-cli-3.7.5-0.3.el7rhgs.x86_64
glusterfs-libs-3.7.5-0.3.el7rhgs.x86_64
glusterfs-fuse-3.7.5-0.3.el7rhgs.x86_64
glusterfs-api-3.7.5-0.3.el7rhgs.x86_64
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64
[root@zod ~]#
sosreports:
[nchilaka@rhsqe-repo bug.1276334]$ pwd
/home/repo/sosreports/nchilaka/bug.1276334
[nchilaka@rhsqe-repo bug.1276334]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
From the core:

(gdb) bt
#0  0x00007f03c13315d7 in raise () from /lib64/libc.so.6
#1  0x00007f03c1332cc8 in abort () from /lib64/libc.so.6
#2  0x00007f03c1371e07 in __libc_message () from /lib64/libc.so.6
#3  0x00007f03c1409a57 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007f03c1407c10 in __chk_fail () from /lib64/libc.so.6
#5  0x00007f03b47e76ac in strcpy (__src=<optimized out>, __dest=0x7f038affad20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/") at /usr/include/bits/string3.h:104
#6  tier_parse_query_str (link_size=0x7f0380000998, link_buffer=<optimized out>, gfid=0x7f038affad20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/", query_record_str=0x7f038affbe20 "(Www.Black-E-Clipz.Skyrock.Com).mp3,/Just") at tier.c:55
#7  tier_migrate_using_query_file (_args=0x7f0380007580) at tier.c:311
#8  tier_migrate_files_using_qfile (query_cbk_args=query_cbk_args@entry=0x7f038affce90, qfile=<optimized out>, comp=0x7f03ace1dc60) at tier.c:1059
#9  0x00007f03b47e8835 in tier_promote (args=0x7f03ace1dc60) at tier.c:1136
#10 0x00007f03c1aabdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f03c13f21ad in clone () from /lib64/libc.so.6

The names of the files being promoted contained spaces. The tier_migrate_using_query_file function uses fscanf to read the query file. fscanf treats whitespace as a delimiter, so each call returned only a fragment of a record, split at the spaces in the file name, instead of the whole record. The parser was then handed these fragments, and the crash occurred in tier_parse_query_str when strcpy copied part of a file name into the gfid buffer and overflowed it (frames #4 to #6 above).

Solution: Use fgets() instead of fscanf().
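The overflow in frame #5 comes from an unchecked strcpy into the gfid buffer. Independently of how the query file is read, a bounded copy would turn a malformed record into a parse error rather than an abort. The following is only a sketch of that idea, not the GlusterFS code and not the fix adopted for this bug: the helper name extract_gfid is hypothetical, and it assumes (from the record layout in the query file) that the gfid is the text before the first '|' and that the caller's buffer holds a 36-character UUID string plus the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GlusterFS: copy the text before the
 * first '|' of a query record into gfid, but only if it fits.
 * Returns 0 on success and -1 if the record is malformed. */
static int extract_gfid(const char *record, char *gfid, size_t gfid_size)
{
    const char *sep = strchr(record, '|');
    if (sep == NULL)
        return -1;                 /* no field separator at all */

    size_t len = (size_t)(sep - record);
    if (len == 0 || len >= gfid_size)
        return -1;                 /* empty, or too long for the buffer */

    memcpy(gfid, record, len);
    gfid[len] = '\0';
    return 0;
}
```

With a 37-byte buffer, a fragment such as "(Www.Black-E-Clipz.Skyrock.Com).mp3,/Just" from the backtrace is rejected because it contains no '|', so nothing is copied.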
Wrote a test program to read the promote query file using fscanf and fgets. Each line represents the contents of the buffer after a single fscanf/fgets call for the first two records:

Output with fscanf:
-------------------
69d34f4c-377e-47af-85b2-a56d4f21a44f|00000000-0000-0000-0000-000000000001,01
Adele
-
Rolling
in
the
Deep.mp3,/01
Adele
-
Rolling
in
the
Deep.mp3,0,0|111
8aed7fc4-62db-47dc-b208-d39c3a7e4ebf|00000000-0000-0000-0000-000000000001,01
Adele
-
Set
Fire
to
the
Rain
Lyrics‏.mp3,/01
Adele
-
Set
Fire
to
the
Rain

Output with fgets:
-------------------
69d34f4c-377e-47af-85b2-a56d4f21a44f|00000000-0000-0000-0000-000000000001,01 Adele - Rolling in the Deep.mp3,/01 Adele - Rolling in the Deep.mp3,0,0|111
8aed7fc4-62db-47dc-b208-d39c3a7e4ebf|00000000-0000-0000-0000-000000000001,01 Adele - Set Fire to the Rain Lyrics‏.mp3,/01 Adele - Set Fire to the Rain Lyrics‏.mp3,0,0|137
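The difference between the two calls can be reproduced with a pair of small readers. This is a sketch for illustration, not the test program mentioned above; the helper names next_token and next_line and the 4096-byte buffer size are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Read the next whitespace-delimited token, the way fscanf("%s") does.
 * The field width keeps the read inside a 4096-byte buffer.
 * Returns 1 if a token was read, 0 otherwise. */
static int next_token(FILE *fp, char buf[4096])
{
    return fscanf(fp, "%4095s", buf) == 1;
}

/* Read the next full line with fgets and drop the trailing newline.
 * Spaces inside the line are preserved.
 * Returns 1 if a line was read, 0 at end of file or on error. */
static int next_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Calling next_token repeatedly on a query file yields one fragment per space-separated piece of a record, while next_line yields one complete record per call, which is what a '|'-based record parser needs.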
Tested and verified this bug on the build glusterfs-3.7.5-7.el7rhgs.x86_64.

Created 50 files with names containing spaces on a 2*2 regular volume. Attached a tier and accessed about 20 files, which were moved to the hot tier. New files created after that were placed in the hot tier. Changed the mode to test and set the write counter to 5. Waited for the session time, after which all the files were moved back to the cold tier. Heated up 20 different files again (with writes); they were moved to the hot tier and shifted back to the cold tier once they were no longer accessed.

No crashes were observed, and the output was as expected. Moving this bug to verified in 3.1.2. Detailed logs are attached.
Created attachment 1102132
Server and client logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html