Bug 60910 - Scalability: 4-CPU performance degradation
Product: Red Hat Linux
Classification: Retired
Component: kernel
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Depends On:
Reported: 2002-03-08 16:51 EST by S Glukhov
Modified: 2008-08-01 12:22 EDT

Doc Type: Bug Fix
Last Closed: 2004-09-30 11:39:25 EDT

Attachments
simple code that demonstrates the problem (1.47 KB, application/octet-stream)
2002-03-08 16:55 EST, S Glukhov
Description S Glukhov 2002-03-08 16:51:44 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)

Description of problem:
A simple 4-thread program runs significantly slower on a 4-CPU computer than on a
1-CPU computer; apparently a scalability problem. While the code runs, the CPUs are
mostly idle (about 60%), with the remainder split between user and system time.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Take the code 
2. Compile 
3. Run on a 4-CPU computer with 1 thread and 4 threads.
4. Watch CPU states 
5. Compare the results (seconds per iteration)

Actual Results:  STL C++ code with standard alloc

Launching 1 threads
6.1344e-06 sec per iteration

With 4 threads 
9.64e-5 sec per iteration

Expected Results:  at most a 10-20% degradation due to lock contention, not an
order of magnitude.

Additional info:

////  test with: c++ -pthread PTest.C -lpthread
////  actual code does nothing but fill a hash map. Memory
////  consumption is 80-100 M per thread for NUM_ITERATIONS=500000

#include <pthread.h>
#include <cstdlib>
#include <unistd.h>
#include <sys/time.h>

#include <map>
#include <string>
#include <utility>
#include <iostream>
#include <iomanip>
using namespace std;

const int NUM_ITERATIONS = 500000;

void *f(void *arg)
{
  (void)arg;  // thread index, unused
  struct timeval tv_start, tv_end;

  // A commented-out variant using the SGI per-thread allocator from
  // <bits/stl_pthread_alloc.h> was garbled in the original report
  // (per comment 3, it behaves the same as the standard allocator).
  typedef std::map<int, string, std::less<int> > map_str_int;

  map_str_int msi;
  gettimeofday(&tv_start, NULL);
  for (int i = 0; i < NUM_ITERATIONS; i++)
    msi.insert(std::pair<int, string>(i, "value"));
  gettimeofday(&tv_end, NULL);
  cout << ((tv_end.tv_sec - tv_start.tv_sec)
           + (tv_end.tv_usec - tv_start.tv_usec) / 1.e6) / NUM_ITERATIONS
       << " sec per iteration" << endl;
  map_str_int::const_iterator fi;
  if ((fi = msi.find(1491)) != msi.end())
    cout << "found 1491 " << fi->first << ';' << fi->second << endl;
  return 0;
}

int main(int argc, char **argv)
{
  int NUM_THREADS = argc > 1 ? atoi(argv[1]) : 1;
  cerr << "Launching " << NUM_THREADS << " threads" << endl;
  void *ret = 0;
  pthread_t thread[NUM_THREADS];
  for (int i = 0; i < NUM_THREADS; ++i)
    pthread_create(&thread[i], NULL, f, (void *)(long)i);
  for (int i = 0; i < NUM_THREADS; ++i)
    pthread_join(thread[i], &ret);
  return 0;
}
Comment 1 S Glukhov 2002-03-08 16:55:17 EST
Created attachment 47942 [details]
simple code that demonstrates the problem
Comment 2 Arjan van de Ven 2002-03-19 12:26:38 EST
Welcome to the term "cache line bounces".
At first sight, your program has a scalability problem in itself, not the kernel.
If you write to the same memory from two separate threads, you get the "cache
line bounce" effect: essentially every access to it becomes a cache miss, which
makes things very slow.
Comment 3 S Glukhov 2002-03-19 12:37:58 EST
The program runs faster on a 2-CPU machine than on a 4-CPU machine. The program
allocates memory for different objects; it does not write to "the same memory",
whatever that is supposed to mean. The behaviour is basically the same with the
per-thread allocator as with the standard allocator.
Comment 4 Bugzilla owner 2004-09-30 11:39:25 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
