Bug 55005

Summary: strdup could be faster
Product: [Retired] Red Hat Linux
Reporter: David Baron <dbaron>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: Aaron Brown <abrown>
Severity: medium
Priority: medium
Version: 7.1
CC: fweimer
Keywords: FutureFeature
Hardware: i386
OS: Linux
Doc Type: Enhancement
Last Closed: 2001-10-24 08:51:26 UTC
Attachments: strdup_perf.c (flags: none)

Description David Baron 2001-10-24 08:50:33 UTC
strdup could be made faster.  When comparing some Mozilla string code
(which uses memcpy), I found that calling strlen, malloc, and memcpy is
faster than calling strdup.  Probably there's some optimization in memcpy
(moving more bits per instruction?) that strdup isn't using.  If it were,
strdup would be faster.

I'll attach a testcase with a my_strdup that beats the libc strdup on my
1GHz Pentium III processor, with the results below.  The test times 3
million calls to each function on a string of a little over 400 characters;
my_strdup also beats strdup on a string of 31 characters (see the testcase
for the strings).

strdup:
10630 ms
my_strdup:
6218 ms
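
For readers without the attachment, the strlen/malloc/memcpy approach
described above can be sketched roughly as follows.  This is a hypothetical
reconstruction, not the contents of strdup_perf.c: only the name my_strdup
comes from the report, and the body and the NULL check are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction of the reporter's my_strdup: find the length
 * once with strlen, allocate, then copy the whole string with memcpy. */
static char *my_strdup(const char *s)
{
    /* Like strdup, this has no defined behavior for NULL input;
     * the assert only catches that mistake in debug builds. */
    assert(s != NULL);

    size_t len = strlen(s) + 1;   /* +1 for the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;              /* same failure result as strdup */

    memcpy(copy, s, len);         /* length is known, so no per-byte NUL test */
    return copy;
}
```

As with strdup, the caller owns the returned buffer and releases it with
free.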

Comment 1 David Baron 2001-10-24 08:51:22 UTC
Created attachment 34862: strdup_perf.c

Comment 2 Jakub Jelinek 2001-10-24 09:54:50 UTC
You know how imprecise that measurement is, especially when the loop runs
long enough to get rescheduled, right?
I've modified the test to measure the minimum number of ticks for a single
strdup or my_strdup call across those 3 million invocations, and the real
timing depends heavily on the flags used to compile strdup_perf.c
(especially whether -O2 or -O3 is used, and whether -fno-builtin is used).
When optimizing without -fno-builtin, gcc itself optimizes the strdup call
away (but of course not my_strdup): since it knows the length of the string,
it just calls malloc and memcpy (or, if memcpy is worth expanding inline,
only malloc).
So I usually get numbers like ~550 ticks for the builtin strdup and ~2600
ticks for either the non-builtin strdup or my_strdup.
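
The minimum-per-call measurement described above might look something like
the sketch below.  It is not the modified test from this comment: the
helper names now_ns and min_call_ns are hypothetical, and it uses POSIX
clock_gettime nanoseconds as a stand-in for CPU ticks, so its numbers are
not comparable to the ones quoted here.  Taking the minimum over many calls
discards iterations that were inflated by a reschedule.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef char *(*dup_fn)(const char *);

/* Hypothetical stand-in for the reporter's my_strdup. */
static char *my_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Monotonic clock in nanoseconds (assumption: used instead of CPU ticks). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Smallest time observed for one fn(s) call over `iterations` calls.
 * Returns UINT64_MAX if every allocation failed. */
static uint64_t min_call_ns(dup_fn fn, const char *s, long iterations)
{
    assert(iterations > 0);
    uint64_t best = UINT64_MAX;
    for (long i = 0; i < iterations; i++) {
        uint64_t start = now_ns();
        char *copy = fn(s);
        uint64_t elapsed = now_ns() - start;
        int ok = (copy != NULL);
        free(copy);               /* free outside the timed region */
        if (ok && elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Calling min_call_ns(strdup, text, 3000000) and then
min_call_ns(my_strdup, text, 3000000) gives one figure per function.
Building with -fno-builtin, as discussed above, is the simplest way to be
sure the libc strdup is what actually gets timed.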