New techniques for accelerating the fourth generation of MATLAB
The fourth generation programming languages are much easier to use than third generation languages. But, however, as a result of how the code is interpreted, execution is often slower. Here, Leslie Mehrez looks at new technology designed for accelerating the performance of the MATLAB language.
Third-generation languages are sometimes referred to as high-level languages because they add a layer of abstraction to hard-to-use lower-level languages, such as assembly and machine code. 3GLs are translated into assembly or machine language to execute very quickly. To use 3GLs effectively requires a good deal of programming experience and knowledge.
Fourth-generation languages are much less procedural in nature than 3GLs and consist of statements similar to those in human language. For this reason, 4GLs are typically much easier to use than 3GLs. However, due to the way that 4GL code is interpreted, execution time is often slower than with 3GLs. MATLAB is a 4GL that was developed for engineers and scientists.
Before MATLAB6.5, the MATLAB language was processed in two steps. First, the MATLAB code was converted into a linear stream of p-code, the instruction set that is executed by the MATLAB interpreter. Second, the interpreter executed each p-code instruction in sequence. The execution of each p-code instruction incurred a small amount of overhead. Some p-code instructions were high-level and took much longer to execute than the overhead, so the execution overhead was insignificant. In some cases however, the p-code operation ran very quickly and the interpreter overhead was the majority of the total execution time. The most common example is code that deals with scalar values and for loops.
MATLAB users do not have to declare variables to be of certain data types, as is required with 3GLs. Any variable can be assigned a value of any type, and that type can be changed implicitly at will because of an assignment to a new value of a different type. As a result, the MATLAB interpreter is prepared to deal with the most complicated data types (such as an n-dimensional array of complex doubles) and is capable of performing operations no matter what the actual data types turn out to be at run-time. In previous versions, the p-code specified the most complicated case. As a result, code that operated on scalar values incurred additional overhead in execution time and storage.
The MATLAB JIT-Accelerator, introduced in MATLAB6.5, is a built-in feature that lets users automatically take advantage of increased code execution speed using two methods: Just-In-Time Code Generation and Run-time Type Analysis.
The JIT-Accelerator converts many p-code instructions into native machine instructions. These instructions suffer no interpreter overhead, and therefore run very quickly. In some cases, code generated by the JIT-Accelerator can run several thousand times faster than was possible in prior versions.
There are also some additional optimisations for Intel X86-based Linux and Windows systems. In previous versions, some additional overhead was generated during p-code execution due to the way that MATLAB handled the typing of variables. Run-time type analysis eliminates this overhead, significantly speeding up execution of many p-code operations.
Run-time type analysis is based on the premise that if a line of M-code has been processed before, it is very likely that the variables have the same types and shapes that they had the last time the system saw this line of code. The first time that a line of code is executed, the system examines the variables and generates specific code for the data types and shapes that were found.
Subsequent executions of the line can reuse this code as long as the system verifies that the variable types and sizes have not changed. Since the types rarely change, subsequent executions run as quickly as possible.
The JIT-Accelerator gives the flexibility to run code faster without having to perform vectorisation. This is the process of structuring MATLAB code to work on matrices, serves two purposes: It enables algorithms to be expressed more succinctly and provides a mechanism for improving MATLAB execution speed. Today with the JIT-Accelerator, it is no longer necessary to vectorise code to speed up the execution of many applications. MATLAB users write the MATLAB code that is the most understandable or that best fits their application. The JIT-Accelerator then ensures optimal performance.
Vectorisation can be used if it results in code that is clearer and more concise. Loops processed by the JIT-Accelerator often execute at least as fast as vectorised loops. Programs that are already vectorised may experience minor improvements with the JIT-Accelerator. However, since the internal vectorised functions have already been optimised, acceleration of this code may be modest. The JIT-Accelerator supports data types and array shapes, for loops, conditional statements and array size.
MATLAB accelerates code that uses the data types that are shaded in the class hierarchy diagram below. Both real and complex doubles are accelerated. All arrays are accelerated except sparse arrays and those array shapes with more than three dimensions.
Loops controlled by a for statement execute faster in MATLAB as long as they meet the following conditions; indices of the for loop are set to a range of scalar values.; code in the for loop uses only the supported data types and array shapes; any functions called within the loop are MATLAB built-in functions.
Loop performance is optimal when every line of code in the loop can take advantage of the JIT-Accelerator. When this is the case, MATLAB speeds up execution of the entire loop, including the for and end statements. If this is not the case, then acceleration of the loop is temporarily interrupted on each iteration of the loop.
The conditional statements if, elseif, while, and switch statements execute faster as long as the expression in the statement evaluates to a scalar value.
The execution overhead represents a small per centage of the total execution time when handling large arrays. For small arrays, however, the reverse is true.
The code that benefits most from the JIT-Accelerator Is programs consisting of large, contiguous portions of code that contain only those elements of MATLAB that take advantage of the JIT-Accelerator will show the greatest speed improvement. This is because whenever the interpreter encounters an element not supported by the JIT-Accelerator, it handles the instruction through the non-accelerated interpreter. The more often this happens, the less opportunity there is to speed up the code. For this reason, the most significant performance improvements are achieved in functions and scripts that primarily consist of self-contained loops, particularly loops that make no calls to M-file functions and operate on scalar data.
MATLAB is optimised for vector and matrix math so most of the built-in functions in MATLAB (such as fft, eig, and matrix multiply functions) are already running as fast as possible. As a result, this type of code will not be affected by the JIT-Accelerator.
A Simple example
The following code example demonstrates the optimisations provided by the JIT-Accelerator:
a = 2.5;
b = 3.33;
c = 6.23;
for i = 1:10000000
a = b + c;
b = a - c;
c = b * a;
if(a > b)
a = b - c;
else
b = b + 12.3;
end
end
This piece of code cycles through a loop ten million times. The loop performs some scalar math and a comparison. Tests show this code running approximately 550 times faster in MATLAB 6.5 with the JIT-Accelerator than with MATLAB 6.1. This code was developed with code acceleration in mind, and represents one of the larger improvements that one is likely to experience. However, other examples executed as much as 3400 times faster than with MATLAB 6.1.
Summary
The MATLAB 6.5 JIT-Accelerator is the first step towards the MathWorks ultimate goal of eliminating the performance difference between MATLAB and 3GLs. Without any difference in performance, users can skip the additional work of recoding their programs in C, for example. With MATLAB 6.5, programs that include a lot of scalar math and loops for iterative numeric algorithms will experience the greatest performance benefits. While it is still the case that MATLAB functions operate on matrices quickly, it is no longer necessary to vectorise code to achieve optimum speed.
The JIT-Accelerator enables MATLAB users to write programs in a style that is both comfortable and fits the application at hand. With its flexible development environment, intuitive language, and the performance optimisation of the JIT-Accelerator, MATLAB 6.5 is a compelling alternative to 3GLs for technical computing applications.
This program scans two sorted input vectors and finds the elements that are common to both. It returns the indices of these common elements. There are two versions of this program: Program 1 processes the vectors using a while loop. Program 2 replaces the loop with vectorised code. When run on MATLAB without acceleration, there is a big difference in performance between the two. When run with acceleration, there is no significant difference.
function [aIndex, bIndex] = vfind_scalar(avec, bvec)
avecLen = length(avec);
bvecLen = length(bvec);
% Size aIndex and bIndex to be large enough
outlen = min(avecLen, bvecLen);
aIndex = zeros(outlen,1);
bIndex = zeros(outlen,1);
n = 0;
ai = 1;
bi = 1;
while (ai <= avecLen || bi <= bvecLen)
% Get vector elements at indices ai and bi
A = avec(ai);
B = bvec(bi);
per cent If equal, record indices where elements match
if A == B
n = n + 1;
aIndex(n) = ai;
bIndex(n) = bi;
end
% Advance index of avec, when appropriate
if A <= B
if ai < avecLen
ai = ai + 1;
else
break;
end
end
% Advance index of bvec, when appropriate
if A >= B
if bi < bvecLen
bi = bi + 1;
else
break;
end
end
end
% Snip aIndex and bIndex to correct size
aIndex = aIndex(1:n);
bIndex = bIndex(1:n);
In preparation for running the program, you must
create two sorted vectors that have some common
elements. The following statements create vectors a and
b, append the elements from a third vector, c, to both,
and then sort. This gives vectors a and b at least 20 000
common elements.
a = rand(200000,1);
b = rand(260000,1);
c = rand(20000,1);
a = sort([a;c]);
b = sort([b;c]);
Now pass a and b into the function shown above. Use the tic and toc functions to track how much time it takes to execute.
tic; [ia, ib] = vfind_scalar(a, b); toc
What makes it faster? MATLAB accelerates every line in this program. The code in the while loop matters most, since this takes up nearly all the program execution time. Consider the following factors:
* Supported Data Types and Array Shapes. All the code in the program operates on vectors of type double. This is one of the data types and array shapes that supports acceleration.
* Conditional Expression Evaluates to Scalar. The expressions in the while and if statements all evaluate to scalar values. For example, while (ai <= avecLen || bi <= bvecLen)
* No Disqualifying Statements in Loop. Every line of code in the while loop qualifies for JIT-acceleration. This means that every iteration of the loop can execute at a higher speed without being interrupted to separately process any disqualifying lines.
* Function Calls and Overloading. The only functions called are MATLAB built-ins. No M-file or MEX-file functions are called, and no operations are overloaded for the data types being used.
Leslie Mehrez is MATLAB applications manager at Mathworks.