Suppose we have an algorithm that uses N parallel threads, and a system with M CPU cores, where N > M. Running the algorithm on such a system leads to a performance loss, because the threads contend for the available CPU cores and cause time-consuming thread context switching. A better approach is to execute the threads sequentially on the available CPU cores and so avoid the unnecessary context switching. There is a simple technique called semaphore throttle which limits the number of active, contending threads.

It is interesting to estimate how efficient the semaphore throttle is on a real system or, in other words, how large the performance loss caused by thread context switching is.

As a test problem I have chosen the sum of an arithmetic progression: S = 1 + 2 + 3 + … + Count * 64. To find the sum I run 64 threads; each thread calculates a partial sum from 1 + I * Count to (I + 1) * Count, I = 0, 1, …, 63.

To obtain valid timings it is important that each thread runs long enough to be preempted many times by other contending threads (when the scheduler's time quantum ends). To increase the thread execution time I turned optimization off and chose the Count value as large as possible for the resulting sum to fit into the Int64 range, but that turned out to be insufficient. Finally I decided to insert an additional loop into the thread function so that the thread execution time exceeded 1 second on my system:

    constructor Create(ABase, ACount: Integer; ASemaphore: THandle);
    …
    constructor TSumThread.Create(ABase, ACount: Integer; ASemaphore: THandle);
    …
    WaitForSingleObject(FSemaphore, INFINITE);
    // to increase execution time the calculation is repeated
    …

To run the test I have written a simple console application that receives the number of concurrent threads as a command line parameter, and outputs the number of concurrent threads and the total execution time in seconds:

    raise Exception.Create('Number of concurrent threads not defined');
    …
    raise Exception.Create('Invalid number of concurrent threads');
    …
    Semaphore:= CreateSemaphore(nil, Count, Count, nil);
    …
    Threads[I]:= TSumThread.Create(1 + I * CHUNK, CHUNK, Semaphore);
    …
    if WaitForMultipleObjects(64, …, True, INFINITE) = WAIT_FAILED
      then raise Exception.Create('WaitForMultipleObjects Failed');
    …
    Writeln('Number of concurrent threads: ', Count);
    Writeln('Time elapsed (seconds): ', … :3:2);
    …
    Sum:= CHUNK * 64;   // number of summands

I ran the tests on my notebook (64-bit Windows 7 SP1, Intel Core i3 M380 CPU) using a simple batch file.
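The semaphore throttle described in this post can be condensed into one self-contained console program. What follows is a minimal sketch of the technique, not the original test code. The `CHUNK` value, the `Handles` array, the `Sum` property and the final check against the closed-form sum N(N + 1)/2 are assumptions added for illustration, and the timing code and the extra loop that lengthens each thread's run time are left out.

```pascal
program SemaphoreThrottleSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

const
  THREAD_COUNT = 64;             // 64 is also MAXIMUM_WAIT_OBJECTS
  CHUNK = 1000000;               // hypothetical number of summands per thread
  N = THREAD_COUNT * CHUNK;      // total number of summands

type
  TSumThread = class(TThread)
  private
    FBase, FCount: Integer;
    FSemaphore: THandle;
    FSum: Int64;
  protected
    procedure Execute; override;
  public
    constructor Create(ABase, ACount: Integer; ASemaphore: THandle);
    property Sum: Int64 read FSum;
  end;

constructor TSumThread.Create(ABase, ACount: Integer; ASemaphore: THandle);
begin
  // Fields are set before the thread can start running.
  FBase := ABase;
  FCount := ACount;
  FSemaphore := ASemaphore;
  inherited Create(False);
end;

procedure TSumThread.Execute;
var
  I: Integer;
  S: Int64;
begin
  // The throttle: block until one of the semaphore slots is free.
  WaitForSingleObject(FSemaphore, INFINITE);
  try
    S := 0;
    for I := FBase to FBase + FCount - 1 do
      Inc(S, I);
    FSum := S;
  finally
    // Give the slot back so the next waiting thread can proceed.
    ReleaseSemaphore(FSemaphore, 1, nil);
  end;
end;

var
  Threads: array[0..THREAD_COUNT - 1] of TSumThread;
  Handles: array[0..THREAD_COUNT - 1] of THandle;
  Semaphore: THandle;
  I, Count: Integer;
  Total, Expected: Int64;
begin
  try
    if ParamCount < 1 then
      raise Exception.Create('Number of concurrent threads not defined');
    Count := StrToIntDef(ParamStr(1), 0);
    if (Count < 1) or (Count > THREAD_COUNT) then
      raise Exception.Create('Invalid number of concurrent threads');

    // At most Count threads can hold the semaphore at the same time.
    Semaphore := CreateSemaphore(nil, Count, Count, nil);
    if Semaphore = 0 then
      RaiseLastOSError;
    try
      for I := 0 to THREAD_COUNT - 1 do
      begin
        Threads[I] := TSumThread.Create(1 + I * CHUNK, CHUNK, Semaphore);
        Handles[I] := Threads[I].Handle;
      end;

      // Wait until all worker threads have terminated.
      if WaitForMultipleObjects(THREAD_COUNT, @Handles, True, INFINITE) = WAIT_FAILED then
        raise Exception.Create('WaitForMultipleObjects failed');

      Total := 0;
      for I := 0 to THREAD_COUNT - 1 do
      begin
        Inc(Total, Threads[I].Sum);
        Threads[I].Free;
      end;
    finally
      CloseHandle(Semaphore);
    end;

    Expected := Int64(N) * (N + 1) div 2;   // closed form of 1 + 2 + ... + N
    Writeln('Number of concurrent threads: ', Count);
    Writeln('Sum: ', Total, ' (matches closed form: ', Total = Expected, ')');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

Passing `64` as the parameter makes the semaphore never block, so all threads contend freely. Passing the number of logical cores is the throttled case the measurements are meant to compare against.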