1. Definition

Here is the definition for multithreading (From Wikipedia):

In computer science, a **thread of execution** is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks.

On a single processor, **multithreading** generally occurs by time-division multiplexing (as in multitasking): the processor switches between different threads. On a multiprocessor or multi-core system, the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.

The distinction between multithreading and parallel computing is vague, and different people understand it differently.

According to this post (http://www.danielmoth.com/Blog/threadingconcurrency-vs-parallelism.aspx), parallel computing applies only to multi-core or multi-CPU systems.

This Stack Overflow answer (http://stackoverflow.com/questions/1740304/difference-between-multithreaded-and-parallel-programming) says “multithreading is a *type* of parallelism, so parallel programming would include multithreading. Other types of parallelism include multiprocessing (where you use multiple processes that communicate with each other) and distributed computing (where tasks run on different computers).”

Recently, when people talk about parallel computing, they usually mean using a multi-core CPU, multiple computers, or a GPU. Generally speaking, though, I prefer the second, broader definition, since people have used the term “parallel” for a long time in many contexts, such as parallel port vs. serial port. Time division is also common in optical communication, and I think it can be considered parallelism as well.

This article (http://zone.ni.com/devzone/cda/tut/p/id/6424) explains the difference between multithreading and multitasking.

2. Multithreading in Matlab

http://www.mathworks.com/support/solutions/en/data/1-WNXI8/index.html?solution=1-WNXI8

Prior to MATLAB 7.4 (R2007a) MATLAB was a single threaded application. So on a hyperthreaded dual processor machine which runs 4 threads, two per CPU, MATLAB will take only 25% of CPU time corresponding to 1 thread.

As of MATLAB 7.4 (R2007a), MATLAB can be multi-threaded for certain applications.

http://www.mathworks.com/support/solutions/en/data/1-46OY0H/index.html?solution=1-46OY0H

Multithreaded computations, introduced in MATLAB R2007a, are turned on by default as of MATLAB R2008a release.

In other words, prior to Matlab 7.4 there was no multithreading, so even a multi-core computer did not help. In later versions multithreading is built in, so we can benefit from a multi-core computer.
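To see what the built-in multithreading buys on a given machine, one option is to time the same operation with a single computational thread and with the default thread count. The sketch below is only an illustration: the matrix size of 1500 is an arbitrary choice, and it assumes a Matlab release that provides `maxNumCompThreads` and in which matrix multiplication is one of the multithreaded operations.

```matlab
% Time a large matrix multiply with one computational thread and with the
% default number of threads. The size 1500 is an arbitrary assumption.
A = rand(1500);
B = rand(1500);

nDefault = maxNumCompThreads;     % remember the current thread setting

maxNumCompThreads(1);             % restrict Matlab to a single thread
tic; C1 = A * B; tSingle = toc;

maxNumCompThreads(nDefault);      % restore the original setting
tic; C2 = A * B; tMulti = toc;

fprintf('1 thread: %.3f s, %d thread(s): %.3f s\n', tSingle, nDefault, tMulti);
```

On a single-core machine the two timings should be about the same, since there is nothing for extra threads to run on.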

I also found that some people write mex files in another language, such as C++ or Java, to achieve multithreading in Matlab. For example: http://www.instructables.com/id/Matlab-Multithreading-EASY/.

I tried to compile it, on Ubuntu with Matlab 2010a and on Windows with Matlab 2010b. Both attempts failed with the same errors that other people reported:

mex mythreadprog.cpp

Error mythreadprog.cpp: 11 syntax error; found `MyThread' expecting `;'

Error mythreadprog.cpp: 19 redeclaration of `mexFunction' previously declared at .\mex.h 135

Warning mythreadprog.cpp: 23 assignment of pointer to void function(pointer to void) to pointer to int function(pointer to void)

Error mythreadprog.cpp: 33 syntax error; found `MyThread' expecting `;'

Warning mythreadprog.cpp: 58 missing return value

Error mythreadprog.cpp: 33 undefined size for `void cdecl'

4 errors, 2 warnings

C:\PROGRA~1\MATLAB\R2007B\BIN\MEX.PL: Error: Compile of 'mythreadprog.cpp' failed.

Another multithreading example, compiled from C, is here:

http://robertoostenveld.ruhosting.nl/index.php/multithreading-matlab-mex/

3. Vectorization

Matlab is a matrix language. Vectors and matrices are used in Matlab to make computing faster and improve performance. You can find more details about techniques for improving performance in Matlab at http://www.mathworks.com/help/techdoc/matlab_prog/f8-784135.html and http://www.mathworks.com/support/solutions/en/data/1-15NM7/?solution=1-15NM7.

As I understand it, vectorization is not parallel computing. Vectorization improves performance because whole-array operations avoid per-iteration loop overhead and use memory more efficiently. Of course, vector and matrix computations can also be parallelized.
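As a small illustration of what vectorization means in practice, the sketch below computes the same values once with an explicit loop and once with a single whole-array expression. The formula and the array length are arbitrary choices made for this example.

```matlab
% Loop version vs. vectorized version of the same computation.
n = 100000;                 % arbitrary length chosen for illustration
x = linspace(0, 1, n);

% Explicit loop over every element (with pre-allocation)
y1 = zeros(1, n);
tic
for k = 1:n
    y1(k) = x(k)^2 + 3*x(k);
end
tLoop = toc;

% Vectorized: one expression operating on the whole array
tic
y2 = x.^2 + 3*x;
tVec = toc;

fprintf('loop: %.6f s, vectorized: %.6f s\n', tLoop, tVec);
```

Both versions produce the same result; only the way the work is expressed differs.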

4. Parallel computing in Matlab

A great article can be found here: https://docs.google.com/viewer?url=http%3A%2F%2Fweb.eecs.utk.edu%2F~luszczek%2Fpubs%2Fparallelmatlab.pdf

The parallel computing toolbox has been available in Matlab since R2006b, while multithreading is built into recent versions. Multithreading seems to be aimed at multi-core computers; we won’t benefit from it on a single-core computer. For more information about parallel computing in Matlab, see http://www.mathworks.com/products/parallel-computing/ (“*Parallel Computing Toolbox™ lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters*.”), http://www.mathworks.com/help/toolbox/distcomp/f3-6010.html and http://www.mathworks.com/help/toolbox/optim/ug/briutum.html.

More links on parallel computing in Matlab:

http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html

http://www.mathworks.com/products/distriben/

http://www.mathworks.com/support/product/DM/installation/ver_current/

Even though R2006b was single-threaded, people could still benefit from the parallel computing toolbox by using networked cluster computers and distributed arrays, which use memory more efficiently: “each lab has to store and process means a more efficient use of memory and faster processing”.

“Parallel for-loops let you run a for-loop across your labs simultaneously.” One thing I don’t understand: when I tried parallel for-loops on R2006b, I could only have one lab, yet parfor still ran faster than a for loop.

5. Some test results

Note: Factors That Affect Speed (http://www.mathworks.com/help/toolbox/optim/ug/briutum.html)

a. Parallel overhead. There is overhead in calling `parfor` instead of `for`. If function evaluations are fast, this overhead could become appreciable. In particular, solving a problem in parallel can be slower than solving the problem serially.

b. No nested `parfor` loops. When executing serially, `parfor` loops run slower than `for` loops. Therefore, for best performance, ensure that only your outermost parallel loop calls `parfor`.
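Point b can be sketched as follows: only the outermost loop is a `parfor`, and the inner loop stays an ordinary `for`. The sizes and the multiplication-table computation are made up for this illustration.

```matlab
% Only the outermost loop is a parfor; the inner loop is a plain for.
n = 200;                     % number of rows (arbitrary)
m = 50;                      % number of columns (arbitrary)
R = zeros(n, m);

parfor i = 1:n
    row = zeros(1, m);       % temporary variable, local to each iteration
    for j = 1:m              % inner loop runs serially inside one iteration
        row(j) = i * j;
    end
    R(i, :) = row;           % sliced output: each iteration writes one row
end
```

Building each row in a temporary and assigning it once keeps `R` a sliced variable, which is what lets `parfor` split the rows across workers.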

Here is some test code, with results, using Matlab R2010b on Windows XP.

**Code:**

%%%%%%% 1 %%%%%%%

clear all

M=zeros(1,100000);

tic

for i=1:100000

M(i)=i;

end

toc

%%%%%%%% 2 %%%%%%%%%%%%

clear all

tic

for i=1:100000

M(i)=i;

end

toc

%%%%%%%%%3 %%%%%%%%%%%

clear all

M=zeros(1,100000);

tic

parfor i=1:100000

M(i)=i;

end

toc

%%%%%%%% 4 %%%%%%%%%%%%

clear all

tic

parfor i=1:100000

M(i)=i;

end

toc

%%%%%%%% 5 %%%%%%%%%%%%

clear all

tic

M = 1:100000;

toc

%%%%%%%%% 6 %%%%%%%%%%%

clear all

matlabpool open 2

tic

parfor i=1:100000

M(i)=i;

end

toc

matlabpool close

%%%%%%%%% 7 %%%%%%%%%%%

clear all

matlabpool open 2

tic

parfor i=1:100

fprintf('labindex = %d\n', labindex);

M(i)=i;

end

toc

matlabpool close

%%%%%%%%% 8 %%%%%%%%%%%

clear all

matlabpool open 2

tic

spmd

labindex

end

toc

matlabpool close

%%%%%%%%%%%%%%%%%%%%

**Results:**

1) regular for loop using pre-allocation:

Elapsed time is 0.000813 seconds.

2) regular for loop without pre-allocation:

Elapsed time is 20.986812 seconds.

3) parfor-loop with pre-allocation

Elapsed time is 0.157174 seconds.

4) parfor-loop without pre-allocation

Elapsed time is 0.157019 seconds.

5) using vector

Elapsed time is 0.000505 seconds.

6) parfor-loop with a worker pool of size 2 (the most I can create on my computer)

Elapsed time is 0.261995 seconds.

7) labindex = 1 (printed for every iteration)

8) Lab 1: 1, Lab 2: 2

In R2006b, 4) is also faster than 2), and as I said, I don’t know exactly why. Here, in Matlab 2010b, 4) is again faster than 2), without opening multiple worker labs. Basically, I don’t know how the parfor-loop works. It is described as running in parallel, simultaneously: different iterations can run at the same time, and their order is not important. But if there is only a single thread (R2006b), is it using multiple CPU cores? How does it do that if I don’t open multiple worker labs using matlabpool? And if it is using a single core and a single thread, how is the “parallelism” done? It seems the answer has to be “a CPU core can be used without creating a worker lab.” But when I checked the processes for 1) through 4), only one Matlab process was running, and it took 50% of the CPU.

3) and 4) take about the same time, but both are roughly 190 times slower than 1) (about 0.157 s versus 0.0008 s). This may be because parfor handles the output array about as efficiently as the pre-allocated loop in 1), but then adds its own overhead, which dominates when each iteration does so little work.
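Since `M(i)=i` does almost no work per iteration, these tests mostly measure parfor overhead. A fairer comparison gives each iteration something substantial to do. The sketch below is one way to set that up; the iteration count, the 150-by-150 matrix and the use of `svd` are arbitrary assumptions, and the timings will depend on the machine and on whether a worker pool is open.

```matlab
% Compare for and parfor when each iteration does noticeable work.
nIter = 32;                  % arbitrary number of iterations
a = zeros(1, nIter);
b = zeros(1, nIter);

tic
for i = 1:nIter
    a(i) = max(svd(magic(150) + i));   % deliberately expensive iteration
end
tSerial = toc;

tic
parfor i = 1:nIter
    b(i) = max(svd(magic(150) + i));   % same work, iterations independent
end
tParfor = toc;

fprintf('for: %.3f s, parfor: %.3f s\n', tSerial, tParfor);
```

With heavier iterations like these, the fixed cost of `parfor` is spread over more useful work, so any gain from extra workers has a chance to show up.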

In 6), when matlabpool open is used, the parfor-loop takes even longer, and the times with and without pre-allocation are about the same. 7) shows every parfor iteration reporting lab 1, so lab 2 appears not to be used, and I don’t understand why. I also noticed that in 6) and 7) there are three Matlab processes in total: one at about 0% CPU and the other two at about 50% each. In other words, when two worker labs are created, two CPU cores are used, for 100% CPU in total. Yet the labindex printed in each parfor iteration is always 1, and using two cores is not faster than using one. It seems one core is working for nothing. I tried changing 6) from “parfor i=1:100000” to “parfor (i=1:100000,1)”, and it takes about the same time.
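One possible explanation, offered tentatively: as far as I know, the documentation says `numlabs` always returns 1 inside a parfor-loop, and `labindex` may behave the same way, so printing it would not reveal which worker ran an iteration. A different way to check is to ask for the current task inside the loop. The sketch below assumes the Parallel Computing Toolbox is installed and that a pool has already been opened (for example with `matlabpool open 2` as in test 6); the value 0 for "ran on the client" is a placeholder chosen for this example.

```matlab
% Record which worker task executed each parfor iteration.
% Assumes a worker pool is already open; without one, the loop may run on
% the client and every entry stays at the placeholder value 0.
n = 8;
workerID = zeros(1, n);

parfor i = 1:n
    t = getCurrentTask();      % empty when not running on a worker
    if isempty(t)
        workerID(i) = 0;       % placeholder meaning "ran on the client"
    else
        workerID(i) = t.ID;    % ID of the task running on this worker
    end
end

disp(workerID)
```

If both workers take part, more than one distinct ID should appear in the output.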

It seems I’m missing something here.

Using spmd in 8) definitely has two workers running.
