| Summary: | cluster/replicate segfaults on armv5tel. | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Hraban Luyat <bubblboy> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | anush, gluster-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | RTNR | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Hraban Luyat
2009-12-21 03:25:37 UTC
*ouf*, mea culpa, this was totally my fault. I assumed the return value of dict_get_ptr was never checked, so I did not check it either, but I overlooked the if (ret != 0) that follows it pretty much everywhere. That mistake was in my second patch for bug #762225. All my apologies. This bug is invalid, and so is that patch; I will commit an updated patch asap! Also, I mixed up step and next in gdb... that gave it away, eventually. Blegh.

Hello all,
Yet another error on the armv5tel arch, ye good ole Segfault this time. It has something to do with afr, but it all looks pretty confusing to me. A simple configuration (two local dirs replicated) already yields the error, consistently at the same place in the code. I ran it in GDB and printed some local variables that looked interesting as the code was approaching certain doom; hopefully they are of some use to somebody.
The error always occurs during the STACK_WIND call on line 2995 of fuse-bridge.c:
2989
2990 dict = dict_new ();
2991 frame = create_frame (this, this->ctx->pool);
2992 frame->root->type = GF_OP_TYPE_FOP_REQUEST;
2993 xl = this->children->xlator;
2994
2995 STACK_WIND (frame, fuse_first_lookup_cbk, xl, xl->fops->lookup,
2996 &loc, dict);
2997 dict_unref (dict);
2998
Another interesting note: in a more complex setup (nufa over two replicated bricks, each made up of two bricks imported over the net, i.e. 4 transport/tcp imports) the error occurs almost in the same place, but not quite. Here is where it goes awry in that case:
2998
2999 pthread_mutex_lock (&priv->first_call_mutex);
3000 {
3001 while (priv->first_call) {
3002 pthread_cond_wait (&priv->first_call_cond,
3003 &priv->first_call_mutex);
3004 }
3005 }
3006 pthread_mutex_unlock (&priv->first_call_mutex);
3007
The program will consistently segfault during the call to pthread_cond_wait on line 3002, but it passes line 2995 without problems.
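
For anyone who wants to poke at the second case outside of glusterfs, the wait loop can be reproduced in isolation. This is only a minimal sketch, assuming that the first-lookup callback is what clears first_call and signals the condition; struct demo_priv and all demo_* functions are made-up names for illustration, not glusterfs code:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the fuse private state; only the three
 * fields that appear in the listing above are modelled. */
struct demo_priv {
    pthread_mutex_t first_call_mutex;
    pthread_cond_t  first_call_cond;
    int             first_call;   /* non-zero until the first lookup is done */
};

/* Assumed role of the first-lookup callback: clear the flag under the
 * mutex, then wake every thread blocked in pthread_cond_wait. */
static void *demo_first_lookup_done(void *arg)
{
    struct demo_priv *priv = arg;

    pthread_mutex_lock(&priv->first_call_mutex);
    priv->first_call = 0;
    pthread_cond_broadcast(&priv->first_call_cond);
    pthread_mutex_unlock(&priv->first_call_mutex);
    return NULL;
}

/* Same shape as lines 2999-3006: re-check the flag in a loop, because
 * pthread_cond_wait may wake up spuriously. */
static void demo_wait_for_first_call(struct demo_priv *priv)
{
    pthread_mutex_lock(&priv->first_call_mutex);
    while (priv->first_call) {
        pthread_cond_wait(&priv->first_call_cond, &priv->first_call_mutex);
    }
    pthread_mutex_unlock(&priv->first_call_mutex);
}

/* Runs waiter and signaller once; returns the final value of first_call
 * (0 means the waiter was released as intended). */
static int demo_run(void)
{
    struct demo_priv priv;
    pthread_t worker;
    int rc;

    pthread_mutex_init(&priv.first_call_mutex, NULL);
    pthread_cond_init(&priv.first_call_cond, NULL);
    priv.first_call = 1;

    rc = pthread_create(&worker, NULL, demo_first_lookup_done, &priv);
    assert(rc == 0);

    demo_wait_for_first_call(&priv);
    pthread_join(worker, NULL);

    rc = priv.first_call;
    pthread_cond_destroy(&priv.first_call_cond);
    pthread_mutex_destroy(&priv.first_call_mutex);
    return rc;
}
```

Calling demo_run() from a main function and building with -pthread is enough to check whether the bare pattern behaves on the target machine. If it does, the problem is more likely in whatever runs concurrently with the wait than in pthread_cond_wait itself.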
Any suggestions / pointers for debugging this further are greatly appreciated; I am pretty much clueless here.
Greetings,
Hraban Luyat