Bug 132108
| Field | Value |
|---|---|
| Summary | non-i386 canna LE causes htt_server SIGABRT, heavy CPU usage |
| Product | [Fedora] Fedora |
| Reporter | Zack Cerza <zcerza> |
| Component | im-sdk |
| Assignee | Akira TAGOH <tagoh> |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | medium |
| Version | rawhide |
| CC | byte, eng-i18n-bugs, ndbecker2, tagoh, wtogami |
| Keywords | i18n |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | 12.0.1-7.svn1891 |
| Doc Type | Bug Fix |
| Last Closed | 2004-10-04 17:30:20 UTC |
| Bug Blocks | 123268, 125997 |

Description
Zack Cerza
2004-09-08 21:19:12 UTC
Created attachment 103611
snipped 'top' output
Pay close attention to the uptime, and htt's 'TIME+' column.
Did you see any messages related to htt in /var/log/messages?

This is all I find:

    [root log]# date
    Thu Sep 9 11:07:47 EDT 2004
    [root log]# egrep \(iii\|htt[.\ ]\) messages
    Sep 7 12:46:54 tallest iiim: htt shutdown succeeded
    Sep 7 12:46:54 tallest su(pam_unix)[32593]: session opened for user htt by (uid=0)tallest
    Sep 7 12:46:55 tallest iiim: htt startup succeeded
    Sep 7 17:28:39 tallest su(pam_unix)[2946]: session opened for user htt by (uid=0)
    Sep 7 17:28:39 tallest iiim: htt startup succeeded

This does seem to be a 64-bit specific problem. htt_server fails to start when loading the canna LE. Here is the "htt_server -d" output and the backtrace at the point of failure.

    LE(CannaLE) is loading.
    Path=/usr/lib64/im/leif/
    version=1.2
    locale=
    need_thread_lock=true
    langs=ja,
    object for CannaLE
    object_type = 131
    object id = 32770
    object size = 0
    rev. domain name = com.OpenI18N.leif
    path = ./locale/ja/CannaLE/aux.so
    scope = CannaLE
    signature =
    basepath =
    encoding =
    Internal error "Invalid Object Type.": IMBasicObject.cpp (255)
    Program received signal SIGABRT, Aborted.
    [Switching to Thread 182895402720 (LWP 6268)]
    0x0000003acd12dda1 in raise () from /lib64/tls/libc.so.6
    (gdb) bt
    #0 0x0000003acd12dda1 in raise () from /lib64/tls/libc.so.6
    #1 0x0000003acd12f5be in abort () from /lib64/tls/libc.so.6
    #2 0x0000000000443ad6 in convert_od_type (ot=1920169263) at IMBasicObject.cpp:255
    #3 0x0000000000444005 in IMObjectWithDesc (this=0x58ee40, desc=@0x583428) at IMBasicObject.cpp:263
    #4 0x000000000041f6d9 in LEBase::add_imobjectdesc (this=0x582c80, pol=0x583428) at LE.cpp:77
    #5 0x000000000041fe62 in LEBase::loadif (this=0x582c80) at LE.cpp:145
    #6 0x00000000004208ab in LEBase (this=0x582c80, x_dirname=@0x5826f8, x_filename=@0x5826f0) at LE.cpp:280
    #7 0x0000000000418936 in LEMgr::listup_LEs (this=0x5836d0) at LEMgr.cpp:20
    #8 0x000000000041a3f5 in LEMgr (this=0x5836d0, x_lepath=0x5804f8 "/usr/lib64/im/leif", xml=@0x5825d0) at LEMgr.cpp:330
    #9 0x0000000000405e76 in IMSvr::config_le (this=0x7fbffff720, lepath=0x5804f8 "/usr/lib64/im/leif", xml=@0x5825d0) at IMSvr.cpp:28
    #10 0x000000000040951b in IMSvrCfg::config_le (this=0x7fbffff820, pimsvr=0x7fbffff720, lepath=0x5804f8 "/usr/lib64/im/leif", xml=@0x5825d0) at IMSvrCfg.hh:115
    #11 0x000000000040874e in IMSvrArg::configure (this=0x7fbffff820, pimsvr=0x7fbffff720) at IMSvrArg.cpp:189
    #12 0x00000000004060ec in IMSvr::start (this=0x7fbffff720) at IMSvr.cpp:78
    #13 0x0000000000405a99 in main (argc=2, argv=0x7fbffff998) at main.cpp:44
    #14 0x0000003acd11befa in __libc_start_main () from /lib64/tls/libc.so.6
    #15 0x00000000004058ba in _start ()
    #16 0x0000007fbffff988 in ?? ()
    #17 0x000000000000001c in ?? ()
    Previous frame inner to this frame (corrupt stack?)

The excessive CPU usage that Zack describes occurs because htt loops infinitely, as the strace output below shows. htt should GIVE UP after a set number of tries and log the failure rather than loop.
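The give-up behaviour suggested above could look roughly like the following. This is a minimal sketch in C, not htt's actual code: `supervise`, `child_ok`, and `child_abort` are hypothetical names, and the retry limit is an arbitrary parameter chosen by the caller. It forks a child, waits for it, and stops respawning after a fixed number of failed attempts.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child bodies used only to exercise the sketch. */
static int child_ok(void) { return 0; }     /* exits cleanly */
static int child_abort(void) { abort(); }   /* dies with SIGABRT, like the failing server */

/*
 * Run child_fn in a forked child and respawn it on failure,
 * but at most max_tries times in total.
 * Returns 0 once a child exits with status 0, -1 after giving up.
 */
static int supervise(int (*child_fn)(void), int max_tries)
{
    assert(child_fn != NULL && max_tries > 0);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0)
            _exit(child_fn());          /* child: never returns to the loop */

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return 0;                   /* child finished normally */

        /* Log each failure instead of silently retrying. */
        fprintf(stderr, "child failed (attempt %d of %d)\n", attempt, max_tries);
    }

    fprintf(stderr, "giving up after %d attempts\n", max_tries);
    return -1;
}
```

With this shape, `supervise(child_abort, 3)` logs three failures and returns -1 instead of spinning. By contrast, the strace output below shows the same fork-and-wait cycle repeating with no limit.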
    --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
    rt_sigaction(SIGUSR1, {SIG_DFL}, {SIG_IGN}, 8) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2a9556b810) = 8420
    wait4(8420, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGABRT}], 0, NULL) = 8420
    --- SIGCHLD (Child exited) @ 0 (0) ---
    rt_sigaction(SIGUSR1, {SIG_IGN}, {SIG_DFL}, 8) = 0
    kill(0, SIGUSR1)

I spent a few hours investigating possible causes for this 64-bit problem and could find nothing readily obvious. I compared the canna LE to /leif/sun_le_asia/th_TH/leif/le.c to see what canna is doing differently, because iiimf-le-sun-thai similarly adds an object. The only real difference was in the memory allocation for the IMObject. If I am understanding the code correctly, it should only need "1" sizeof allocated, but Sun arbitrarily allocates "2" here:

    objects = (IMObjectDescriptorStruct *) calloc(2, sizeof(IMObjectDescriptorStruct));

The patch below prevents the SIGABRT and the infinite looping. I think this is NOT A FIX, but rather an ugly workaround copied from Sun's Thai LE. It may be worth finding the real cause of this problem. Note that I have not yet done any runtime testing of actually using IIIMF on x86_64. I suspect there is other 64-bit breakage elsewhere.
    --- im-sdk-r12_0_1-svn1891/leif/canna/CannaLE.c.orig 2004-09-11 00:56:27.859055785 -1000
    +++ im-sdk-r12_0_1-svn1891/leif/canna/CannaLE.c 2004-09-11 00:56:40.633274812 -1000
    @@ -300,7 +300,7 @@
     init_objects()
     {
         IMObjectDescriptorStruct *l;
    -    objects = (IMObjectDescriptorStruct *) calloc(1, sizeof (IMObjectDescriptorStruct));
    +    objects = (IMObjectDescriptorStruct *) calloc(2, sizeof (IMObjectDescriptorStruct));
         l = objects;

    ==17527== Address 0x1B9A0810 is 0 bytes after a block of size 56 alloc'd
    ==17527== at 0x1B90340D: calloc (vg_replace_malloc.c:176)
    ==17527== by 0x1B933B77: init_objects (CannaLE.c:303)
    ==17527== by 0x1B93657B: if_GetIfInfo (CannaLE.c:1746)
    ==17527== by 0x8081F67: (within /usr/sbin/htt_server)

valgrind on i386 shows this problem with the original line of code. Thus this is somehow related to the memory allocation problem that causes trouble on x86_64.

CannaLE.c (line 303):

    objects = (IMObjectDescriptorStruct *) calloc(1, sizeof (IMObjectDescriptorStruct));

    MALLOC_CHECK_=3 htt_server -d

If you run this in rawhide, it dies in the same place on i386. Please try this on 32-bit systems too.

After my fresh rawhide install (9/11) on ppc, htt_server ate up lots of CPU time. "/etc/init.d/iiim stop" solves this for me (because I don't need to use it).

It seems this happens on non-i386 archs, and it is also exposed by MALLOC_CHECK_=3 on i386.

Similar finding in Bug #132396.

o_O... A ppc64 kernel with ppc userspace iiimf-* behaves identically to i386, contrary to Colin's report in comment #7. Very odd.

OK, well, Warren, your patch is correct. The IMObjectDescriptorStruct array must be terminated by NULL, and calloc(2, ...) does that. I'll apply your patch for the next build. Thanks.

*** Bug 133765 has been marked as a duplicate of this bug. ***

This problem should be fixed in 12.0.1-7.svn1891. However, there are other bugs to fix before it works on x86-64. Please check Bug#132940, Bug#132941, Bug#132950.

Well, in 12.0.1-10.svn1943, htt_server itself should work even on 64-bit architectures.
Please let me know if you still find a problem.

It looks fixed, thanks :)
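The NULL-termination requirement confirmed earlier in this thread can be illustrated with a small C sketch. Everything here is an assumption made for illustration: `ObjDesc` is a hypothetical stand-in for IMObjectDescriptorStruct (whose real layout and terminator field are not shown in this report), and `alloc_descs` and `count_descs` are made-up helper names. The point is only that a consumer walking the array until it meets a zeroed entry needs one more element than the number of real objects, which is why calloc(1, ...) let the reader run "0 bytes after" the block in the valgrind output, while calloc(2, ...) did not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for IMObjectDescriptorStruct; the real struct differs. */
typedef struct {
    const char *name;   /* assumed terminator field: NULL marks the end of the list */
    int type;
} ObjDesc;

/* Allocate room for n real entries plus one terminating entry. */
static ObjDesc *alloc_descs(size_t n)
{
    ObjDesc *list = calloc(n + 1, sizeof(ObjDesc));
    if (list != NULL)
        list[n].name = NULL;    /* make the terminator explicit rather than rely on zero bytes */
    return list;
}

/* Count entries the way a consumer would: walk until the terminator. */
static size_t count_descs(const ObjDesc *list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n].name != NULL)
        n++;
    return n;
}
```

With one real object, `alloc_descs(1)` reserves two elements, matching the calloc(2, ...) in the applied patch. Sizing the allocation as `n + 1` keeps the terminator from being forgotten if more objects are added later.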