Bug 828058
| Field | Value |
|---|---|
| Summary | cli crash with mixture of 32, 64bit machines |
| Product | [Community] GlusterFS |
| Component | cli |
| Status | CLOSED CURRENTRELEASE |
| Severity | unspecified |
| Priority | unspecified |
| Version | mainline |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | glusterfs-3.4.0 |
| Doc Type | Bug Fix |
| Reporter | Pranith Kumar K <pkarampu> |
| Assignee | Pranith Kumar K <pkarampu> |
| QA Contact | Anush Shetty <ashetty> |
| CC | blacktear23, gluster-bugs, joe, mailbox, ndevos, nragusa, ujjwala |
| Story Points | --- |
| Last Closed | 2013-07-24 17:24:24 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Bug Blocks | 817967 |
Description (Pranith Kumar K, 2012-06-04 07:17:02 UTC)

Review available at http://review.gluster.com/3534 (release-3.3) and http://review.gluster.com/3535 (master).

Gowri Shankar provided the machine where the crash was seen. I installed RPMs with just my patch; for the same volume/data, the crash does not appear anymore. :-)

Yes, this does solve the problem I was seeing.

Created attachment 591413 [details]: core of the crashed process
Some details from the core:
Core was generated by `gluster'.
Program terminated with signal 11, Segmentation fault.
#0 __strftime_internal (s=0x7fffdcb56040 "\220`\265\334\377\177",
maxsize=256, format=0x431f1c "%Y-%m-%d %H:%M:%S", tp=0x0,
tzset_called=0x7fffdcb55fdf, loc=0x334f98a580) at strftime_l.c:508
508 int hour12 = tp->tm_hour;
(gdb) bt
#0 __strftime_internal (s=0x7fffdcb56040 "\220`\265\334\377\177",
maxsize=256, format=0x431f1c "%Y-%m-%d %H:%M:%S", tp=0x0,
tzset_called=0x7fffdcb55fdf, loc=0x334f98a580) at strftime_l.c:508
#1 0x000000334f6a4026 in __strftime_l (s=<value optimized out>,
maxsize=<value optimized out>, format=<value optimized out>,
tp=<value optimized out>, loc=<value optimized out>) at strftime_l.c:486
#2 0x00000000004169c1 in cmd_heal_volume_brick_out (dict=0x22e25b4, brick=0)
at cli-rpc-ops.c:5836
#3 0x0000000000416d49 in gf_cli3_1_heal_volume_cbk (
req=<value optimized out>, iov=<value optimized out>,
count=<value optimized out>, myframe=<value optimized out>)
at cli-rpc-ops.c:5944
#4 0x0000003350e0f095 in rpc_clnt_handle_reply (clnt=0x22f23e0,
pollin=0x22fddb0) at rpc-clnt.c:788
...
The cmd_heal_volume_brick_out() in frame 2 passes the NULL return value of localtime() as the tm argument to strftime(), which crashes.
(gdb) f 2
#2 0x00000000004169c1 in cmd_heal_volume_brick_out (dict=0x22e25b4, brick=0)
at cli-rpc-ops.c:5836
5836 strftime (timestr, sizeof (timestr),
(gdb) l
5831 ret = dict_get_uint32 (dict, key, &time);
5832 if (!time) {
5833 cli_out ("%s", path);
5834 } else {
5835 tm = localtime ((time_t*)(&time));
5836 strftime (timestr, sizeof (timestr),
5837 "%Y-%m-%d %H:%M:%S", tm);
5838 if (i ==0) {
5839 cli_out ("at path on brick");
5840 cli_out ("-----------------------------------");
Converting the time variable (seconds since the epoch) should work, unless the type-cast interferes. We can check whether the value of the time variable is sane:
(gdb) print time
$1 = 1339053312
$ date -d '1 jan 1970 1339053312 sec'
Thu Jun 7 07:15:12 UTC 2012
localtime() returned NULL, but the "time" variable contains a valid date. The patch corrects the type-cast, so it should not be possible to hit this problem anymore.
The patch also changes localtime() to localtime_r(): localtime() returns a pointer to static storage, so a race condition is possible when multiple threads call it, and localtime_r() prevents this. I think it is unlikely that this race caused the problem here, but using localtime_r() is surely a good thing.
Verified with RC2. Created 1000 files while one of the bricks in a 2-replica volume was down. Then brought that server back up and ran 'gluster volume heal volname full' and 'gluster volume heal volname info healed'. In 3.3.0qa45, glusterd crashed when 'gluster volume heal volname info healed' was run.

*** Bug 836421 has been marked as a duplicate of this bug. ***