| Summary: | [Ganesha + Multi-Volume] : Ganesha crashes on nfs-ganesha restarts | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
| Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Manisha Saini <msaini> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, bturner, dang, ffilz, jthottan, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | Flags: | skoduri: needinfo? (asoman) |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-19 08:21:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
This looks like a failure to start up, causing a shutdown without being fully initialized. Are there Ganesha logs from this run?

(In reply to Daniel Gryniewicz from comment #4)
> This looks like a failure to start up, causing a shutdown without being
> fully initialized. Are there Ganesha logs from this run?

The log location is mentioned in comment #2; the issue happened on node gqas014.

Reason for the crash: there are three export shares here: testvol1, testvol2, and testvol3. Of the three, testvol1 and testvol3 ended up with the same export ID, so exporting testvol3 failed. During unexport there appears to be a stale export entry for testvol3, and that triggers the crash. Ideally, Ganesha should clean up the export entry when an export fails (IMO that cleanup is not happening properly).

Another bug, BZ#1398257, is open for the export-ID issue, and a fix for conflicting export IDs has been posted there, so that we won't end up in this kind of situation.

So I was right, and this is a minor bug. Ganesha will exit anyway, so the crash won't cause more problems than the shutdown would. I'll work on a fix.

How are the exports added and removed here? My testing is unable to reproduce a crash.

(In reply to Daniel Gryniewicz from comment #7)
> How are the exports added and removed here? My testing is unable to
> reproduce a crash.

I haven't tried to reproduce it on my setup either, but IMO the following should result in a crash on nfs-ganesha-2.4.1: create three export blocks in which two of them have the same Export_Id (in the case above they were 4, 3, 4), start the ganesha process, and then stop it. During stop it should dump a core. (A sample configuration along these lines is sketched below.)

This is exactly what I did, and I saw no crash, with several attempts.

(In reply to Daniel Gryniewicz from comment #9)
> This is exactly what I did, and I saw no crash, with several attempts.

I will try to reproduce it on my environment and let you know.
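For illustration, a ganesha export configuration matching the repro described in the comments might look like the following. This is a hypothetical sketch in the usual nfs-ganesha FSAL_GLUSTER `EXPORT`-block syntax; the `Path`, `Pseudo`, and `Hostname` values are assumptions for illustration, not copied from the reporter's cluster. Only the Export_Id pattern (4, 3, 4) is taken from the comments above.

```
# testvol1 -- gets Export_Id 4
EXPORT {
    Export_Id = 4;
    Path = "/testvol1";
    Pseudo = "/testvol1";
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";
        Volume = "testvol1";
    }
}

# testvol2 -- gets Export_Id 3
EXPORT {
    Export_Id = 3;
    Path = "/testvol2";
    Pseudo = "/testvol2";
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";
        Volume = "testvol2";
    }
}

# testvol3 -- duplicate Export_Id 4: per the analysis above, this export
# fails to load, and the stale half-initialized entry is what the
# shutdown path later trips over
EXPORT {
    Export_Id = 4;
    Path = "/testvol3";
    Pseudo = "/testvol3";
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";
        Volume = "testvol3";
    }
}
```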
Description of problem:
------------------------
4-node cluster with 3 volumes: 2 single-brick and 1 replicate. I restarted the Ganesha service and it crashed and dumped core on all nodes. Two of the cores had distinct backtraces:

******* Core 1 *******

```
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f197ff643b1 in shutdown_export (export=0x7f1981b10a70) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:152
#2  destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:194
#3  0x00007f197ff8c27f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:446
#4  admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:466
#5  0x00007f197e4eadc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f197dbb973d in clone () from /lib64/libc.so.6
(gdb) q
```

****** Core 2 ******

```
(gdb) bt
#0  0x00007f9385a75e61 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f92fa58e48f in export_release (exp_hdl=0x7f92f40d1e20) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/FSAL_GLUSTER/export.c:74
#2  0x00007f93874ee3b1 in shutdown_export (export=0x7f92f40d1e20) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:152
#3  destroy_fsals () at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_destroyer.c:194
#4  0x00007f938751627f in do_shutdown () at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:446
#5  admin_thread (UnusedArg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_admin_thread.c:466
#6  0x00007f9385a74dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f938514373d in clone () from /lib64/libc.so.6
(gdb) q
```

This really looks specific to a multi-volume scenario. I restarted the Ganesha service a couple of times with a single volume in the cluster, and I hit the crash reported in https://bugzilla.redhat.com/show_bug.cgi?id=1393526. The backtraces above look new/unreported.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

How reproducible:
-----------------
2/2

Steps to Reproduce:
-------------------
1. Create a 4-node cluster with > 1 volume.
2. Restart the Ganesha service.

Actual results:
---------------
Ganesha crashes and dumps core.

Expected results:
-----------------
Ganesha shouldn't crash.
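Reading the two backtraces together with the analysis in the comments: core 1 dies calling through a NULL function pointer from shutdown_export(), and core 2 dies in pthread_join() called from the Gluster FSAL's export_release(), presumably joining an up-call thread that was never started for the failed export. Below is a minimal C sketch of the defensive pattern that would avoid both crashes. The struct fields and function names are simplified stand-ins, not the actual nfs-ganesha source.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the real nfs-ganesha structures. */
struct fsal_export {
	/* Release callback; left NULL if the export never finished init. */
	void (*release)(struct fsal_export *exp);
};

struct glusterfs_export {
	struct fsal_export export;
	pthread_t up_thread;
	bool up_thread_started; /* set true only after pthread_create() succeeds */
};

/* Core 1 pattern: a partially initialized export can reach shutdown with a
 * NULL ops pointer; guard before calling through it instead of jumping to
 * address 0. */
void shutdown_export_sketch(struct fsal_export *exp)
{
	if (exp != NULL && exp->release != NULL)
		exp->release(exp);
}

/* Core 2 pattern: pthread_join() on a pthread_t that was never created is
 * undefined behavior; only join the up-call thread if it actually started. */
void glusterfs_export_release_sketch(struct glusterfs_export *glexp)
{
	if (glexp->up_thread_started) {
		pthread_join(glexp->up_thread, NULL);
		glexp->up_thread_started = false;
	}
}
```

Either guard alone would only move the crash; the underlying fix is for the export-failure path to fully tear down the stale entry (and for BZ#1398257 to prevent the Export_Id conflict in the first place).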
Additional info:
-----------------
OS: RHEL 7.3

*Vol Config*:

```
Volume Name: testvol1
Type: Distribute
Volume ID: 547da16a-b56e-46d4-ab69-98b56b6aa07e
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol1_brick0
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: testvol2
Type: Distribute
Volume ID: d73c4b60-b3f6-492f-9825-02021f625cee
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol2_brick1
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: testvol3
Type: Replicate
Volume ID: e7e6890e-87aa-486a-b164-732973b4b179
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gqas009.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick2
Brick2: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol3_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
```
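As a quick sanity check when debugging a setup like this, one can inspect the Export_Id that the glusterfs-ganesha integration assigned to each volume. The path below assumes the conventional RHGS shared-storage layout for per-volume export files and may differ on other setups:

```
# Hypothetical path per the usual RHGS 3.x shared-storage layout
grep -i export_id /var/run/gluster/shared_storage/nfs-ganesha/exports/export.testvol*.conf
```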