Description of problem:
======================
Brick process crashed after upgrade from RHGS-3.3.1 async (RHEL 7.4) to RHGS-3.4 (RHEL 7.5).

Version-Release number of selected component (if applicable):
------------------------------------------------------------
RHGS version:
------------
from glusterfs-3.8.4-54.el7 to glusterfs-3.12.2-4.el7

OS version:
----------
from RHEL 7.4 to RHEL 7.5

How reproducible:
----------------
Tried once. Only one of the 5 nodes upgraded so far (in a 6-node cluster) hit this issue.

Steps to Reproduce:
------------------
1. Create 6 RHEL-7.4 machines.
2. Install the RHGS-3.3.1 async build on the RHEL-7.4 machines.
3. Add the firewall services (glusterfs, nfs, rpc-bind) on all cluster servers.
4. Peer probe the remaining 5 servers from one node.
5. Confirm that all servers show peer status as connected.
6. Create around 50 volumes of different topologies, including two-way distributed-replicate, three-way distributed-replicate, arbitrated-replicate, and distributed-dispersed volumes.
7. Mount 5 volumes on a RHEL-7.4 client and 5 volumes on a RHEL-7.5 client.
8. Keep 5 volumes offline.
9. Copy the RHEL 7.5 and RHGS-3.4 repo files into /etc/yum.repos.d.
10. Stop the glusterd, glusterfs, and glusterfsd services on the node being upgraded.
11. Run yum update on that node.
12. After the upgrade, all bricks on the upgraded node went offline.
13. A core file named 'core.6282' was generated in the '/' directory.
14. Core details:
*************************************************************************
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/66/a1ad12474aef1b8a3aac8363ef99e4c06ca5ab
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.70.37.208 --volfile-id arbtr_10.10.70.37.208.bricks-'.
Program terminated with signal 11, Segmentation fault.
#0  server_inode_new (itable=0x0, gfid=gfid@entry=0x7f1824022070 "") at server-helpers.c:1314
1314	        return itable->root;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-18.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-11.el7.x86_64 libgcc-4.8.5-28.el7.x86_64 libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 sssd-client-1.16.0-16.el7.x86_64 zlib-1.2.7-17.el7.x86_64
********************************************************************************
15. Backtrace (bt):
********************************************************************************
#0  server_inode_new (itable=0x0, gfid=gfid@entry=0x7f1824022070 "") at server-helpers.c:1314
#1  0x00007f184cd1c13d in resolve_gfid (frame=frame@entry=0x7f182401fa30) at server-resolve.c:205
#2  0x00007f184cd1d038 in server_resolve_inode (frame=frame@entry=0x7f182401fa30) at server-resolve.c:418
#3  0x00007f184cd1d2b0 in server_resolve (frame=0x7f182401fa30) at server-resolve.c:559
#4  0x00007f184cd1c88e in server_resolve_all (frame=frame@entry=0x7f182401fa30) at server-resolve.c:611
#5  0x00007f184cd1d344 in resolve_and_resume (frame=frame@entry=0x7f182401fa30, fn=fn@entry=0x7f184cd2a910 <server_getxattr_resume>) at server-resolve.c:642
#6  0x00007f184cd3f638 in server3_3_getxattr (req=0x7f181c0132b0) at server-rpc-fops.c:5121
#7  0x00007f1861c9a246 in rpcsvc_request_handler (arg=0x7f1850040c90) at rpcsvc.c:1899
#8  0x00007f1860d37dd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1860600b3d in clone () from /lib64/libc.so.6
********************************************************************************

Note: Only one of the 5 nodes upgraded so far hit this issue. The first 4 node upgrades went fine; the issue appeared during the 5th node's upgrade. One node in the 6-node cluster is yet to be upgraded.

Actual results:
All bricks on the upgraded node went offline, and a core was found.

Expected results:
All bricks should be online, and no cores should be generated.

Additional info:
Copied the brick logs and the sosreport of the upgraded node to the following path: qe.eng.blr.redhat.com:/var/www/html/sosreports/rajesh/1545277
I followed the steps in the description above on the same setup (6-node cluster). No brick crashes occurred, all bricks came online after the upgrade, and no cores were found on any cluster node. Verified in version: glusterfs-server-3.12.2-8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607
*** Bug 1763865 has been marked as a duplicate of this bug. ***