Measure speedup (Benchmarking)

Authors: Robert Haase, Daniela Vorkel, April 2020

This macro shows how to measure the performance of image processing done by ImageJ in the CPU and by CLIJ2 in the GPU.

First, let’s get test data:

//open("c:/structure/data/t1-head.tif")
run("T1 Head (2.4M, 16-bits)");
input = getTitle();

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");

[Images: t1-head.tif (stack) and t1-head-1.tif (its center plane)]

Measure processing time in the CPU

We start by measuring the processing time of a standard ImageJ operation in the central processing unit (CPU). We execute the operation repeatedly to see how the processing time varies between successive calls of the same operation. The first execution in particular may be slower because of warm-up effects. Before each run, we store the current time in the variable time; afterwards, we print the elapsed time (getTime() - time). As an example of image processing, we use the mean filter:


// Local mean filter in the CPU
for (i = 1; i <= 10; i++) {
	// duplicate the original image to avoid blurring the same image again and again
	selectWindow(input);
	run("Duplicate...", "duplicate range=1-129");
	
	// actual blur operation
	time = getTime();
	run("Mean 3D...", "x=3 y=3 z=3");
	print("CPU mean filter no " + i + " took " + (getTime() - time) + " msec");
	
	// keep the first blurred image and close the duplicates
	if (i == 1) {
		blurred_image = getTitle();
	} else {
		close();
	}
}
selectWindow(blurred_image);

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");
> CPU mean filter no 1 took 2381 msec
> CPU mean filter no 2 took 2294 msec
> CPU mean filter no 3 took 2407 msec
> CPU mean filter no 4 took 2525 msec
> CPU mean filter no 5 took 3851 msec
> CPU mean filter no 6 took 3915 msec
> CPU mean filter no 7 took 3744 msec
> CPU mean filter no 8 took 3794 msec
> CPU mean filter no 9 took 3800 msec
> CPU mean filter no 10 took 3637 msec

[Images: t1-head-2.tif (mean-filtered stack) and t1-head-3.tif (its center plane)]
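The timings above are only printed to the log. As a minimal sketch, the durations could also be stored in an array so that summary statistics are available once the loop has finished. The names `n`, `durations` and the `t...` variables are illustrative assumptions, not part of the original macro, and `wait(5)` merely stands in for the operation to benchmark so that the snippet runs on its own.

```ijm
// Sketch: collect the durations in an array instead of only printing them
n = 10;
durations = newArray(n);
for (i = 0; i < n; i++) {
	time = getTime();
	wait(5); // placeholder (assumption): replace with the operation to benchmark
	durations[i] = getTime() - time;
}

// summarize all runs at once
Array.getStatistics(durations, tMin, tMax, tMean, tStdDev);
print("min " + tMin + " / max " + tMax + " / mean " + d2s(tMean, 1) + " / std dev " + d2s(tStdDev, 1) + " msec");
```

Reporting the spread next to the mean makes effects such as the jump between runs 4 and 5 in the CPU log visible in a single line.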

Measure processing time in the GPU

We follow the same strategy as for the CPU to measure the processing time in the GPU. Because the performance of GPU-accelerated processing also depends on the time needed to transfer data between CPU and GPU memory, we also measure the time taken by the push() and pull() commands.

Let’s start with the initialization of the GPU:

run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();

Push images to GPU

time = getTime();
Ext.CLIJ2_push(input);
print("Pushing one image to the GPU took " + (getTime() - time) + " msec");

// clean up ImageJ
run("Close All");

> Pushing one image to the GPU took 28 msec

Process images in the GPU using CLIJ2

Again, we use the mean filter, this time the one provided by CLIJ2:

// Local mean filter in the GPU
for (i = 1; i <= 10; i++) {
	time = getTime();
	Ext.CLIJ2_mean3DBox(input, blurred, 3, 3, 3);
	print("CLIJ2 GPU mean filter no " + i + " took " + (getTime() - time) + " msec");
}

> CLIJ2 GPU mean filter no 1 took 62 msec
> CLIJ2 GPU mean filter no 2 took 12 msec
> CLIJ2 GPU mean filter no 3 took 13 msec
> CLIJ2 GPU mean filter no 4 took 15 msec
> CLIJ2 GPU mean filter no 5 took 16 msec
> CLIJ2 GPU mean filter no 6 took 15 msec
> CLIJ2 GPU mean filter no 7 took 14 msec
> CLIJ2 GPU mean filter no 8 took 13 msec
> CLIJ2 GPU mean filter no 9 took 10 msec
> CLIJ2 GPU mean filter no 10 took 10 msec
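To turn the two logs into a speedup, the mean durations can be divided. The sketch below hard-codes the values printed in the CPU and CLIJ2 logs. Dropping the first run of each series is an assumption made here to exclude warm-up effects, and the variable names are illustrative.

```ijm
// Durations in msec, copied from the CPU and CLIJ2 logs
cpu_times = newArray(2381, 2294, 2407, 2525, 3851, 3915, 3744, 3794, 3800, 3637);
gpu_times = newArray(62, 12, 13, 15, 16, 15, 14, 13, 10, 10);

// drop the first run of each series (assumption: it contains warm-up overhead)
cpu_steady = Array.slice(cpu_times, 1, cpu_times.length);
gpu_steady = Array.slice(gpu_times, 1, gpu_times.length);

Array.getStatistics(cpu_steady, cpuMin, cpuMax, cpuMean, cpuStdDev);
Array.getStatistics(gpu_steady, gpuMin, gpuMax, gpuMean, gpuStdDev);

print("CPU mean: " + d2s(cpuMean, 1) + " msec");
print("GPU mean: " + d2s(gpuMean, 1) + " msec");
print("Speedup of the filter alone: " + d2s(cpuMean / gpuMean, 1));
```

This ratio covers the filter call only; the time spent in push() and pull() is not included.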

Apply the mean filter as a convolution using CLIJ2

time = getTime();
// create a 7 x 7 x 7 kernel image (32-bit) with the same weight in every voxel
Ext.CLIJ2_create3D(structuring_element, 7, 7, 7, 32);
Ext.CLIJ2_set(structuring_element, 1. / 7 / 7 / 7);
print("Preparing the convolution kernel in GPU memory took " + (getTime() - time) + " msec");

for (i = 1; i <= 10; i++) {
	time = getTime();
	Ext.CLIJ2_convolve(input, structuring_element, blurred);
	print("CLIJ2 GPU mean filter using convolution no " + i + " took " + (getTime() - time) + " msec");
}

> Preparing the convolution kernel in GPU memory took 5 msec
> CLIJ2 GPU mean filter using convolution no 1 took 33 msec
> CLIJ2 GPU mean filter using convolution no 2 took 30 msec
> CLIJ2 GPU mean filter using convolution no 3 took 29 msec
> CLIJ2 GPU mean filter using convolution no 4 took 30 msec
> CLIJ2 GPU mean filter using convolution no 5 took 29 msec
> CLIJ2 GPU mean filter using convolution no 6 took 30 msec
> CLIJ2 GPU mean filter using convolution no 7 took 28 msec
> CLIJ2 GPU mean filter using convolution no 8 took 23 msec
> CLIJ2 GPU mean filter using convolution no 9 took 23 msec
> CLIJ2 GPU mean filter using convolution no 10 took 23 msec
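The kernel size of 7 matches the radius of 3 passed to mean3DBox, assuming the radius counts the voxels on each side of the center voxel, so that the box is 2 * 3 + 1 = 7 voxels wide. A small sketch of that arithmetic (the variable names are illustrative):

```ijm
radius = 3;
size = 2 * radius + 1;        // 7 voxels per side
count = size * size * size;   // 343 voxels in the box
weight = 1 / count;           // value written into every kernel voxel

print("Kernel size: " + size + " x " + size + " x " + size);
print("Number of voxels: " + count);
print("Weight per voxel: " + d2s(weight, 6));
```

Because all 343 weights are equal and sum to 1, convolving with this kernel computes the local mean of the box.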

Compare CLIJ2 with its predecessor: CLIJ

Once more, we use the mean filter, this time the one provided by CLIJ:

// Local mean filter in the GPU
for (i = 1; i <= 10; i++) {
	time = getTime();
	Ext.CLIJ_mean3DBox(input, blurred, 3, 3, 3);
	print("CLIJ GPU mean filter no " + i + " took " + (getTime() - time) + " msec");
}
> CLIJ GPU mean filter no 1 took 38 msec
> CLIJ GPU mean filter no 2 took 9 msec
> CLIJ GPU mean filter no 3 took 9 msec
> CLIJ GPU mean filter no 4 took 9 msec
> CLIJ GPU mean filter no 5 took 8 msec
> CLIJ GPU mean filter no 6 took 8 msec
> CLIJ GPU mean filter no 7 took 7 msec
> CLIJ GPU mean filter no 8 took 8 msec
> CLIJ GPU mean filter no 9 took 8 msec
> CLIJ GPU mean filter no 10 took 8 msec

Pull a result image from the GPU


time = getTime();
Ext.CLIJ2_pull(blurred);

print("Pulling one image from the GPU took " + (getTime() - time) + " msec");

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");

> Pulling one image from the GPU took 62 msec

[Images: CLIJ2_mean3DBox_result1 (pulled result stack) and CLIJ2_mean3DBox_result1-1 (its center plane)]
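Putting the measurements together gives an end-to-end estimate. The sketch below assumes a workflow in which the image is pushed once, filtered n times in the GPU and pulled once. The per-run values are the rounded means of all ten logged runs, including the slower first GPU run, and all variable names are illustrative.

```ijm
// Values in msec, taken from the logs on this page
push_time = 28;
pull_time = 62;
cpu_per_run = 3235; // mean of the ten CPU runs (3234.8), rounded
gpu_per_run = 18;   // mean of the ten CLIJ2 mean3DBox runs

// number of filter operations executed between push and pull (assumption)
counts = newArray(1, 10, 100);
for (k = 0; k < counts.length; k++) {
	n = counts[k];
	cpu_total = n * cpu_per_run;
	gpu_total = push_time + n * gpu_per_run + pull_time;
	print(n + " filter(s): CPU " + cpu_total + " msec, GPU incl. transfer " + gpu_total + " msec, speedup " + d2s(cpu_total / gpu_total, 1));
}
```

With these numbers, a single filter call including transfer is roughly 30 times faster than the CPU, and the ratio grows towards the filter-only speedup as more operations are executed between push() and pull().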

For documentation purposes, we should also report which GPU was used:

Ext.CLIJ2_getGPUProperties(gpu, memory, opencl_version);
print("GPU: " + gpu);
print("Memory in GB: " + (memory / 1024 / 1024 / 1024) );
print("OpenCL version: " + opencl_version);

> GPU: GeForce RTX 2070
> Memory in GB: 8
> OpenCL version: 1.2

At the end of the macro, clean up GPU memory:

Ext.CLIJ2_clear();