GPU accelerated image processing for everyone
Authors: Robert Haase, Daniela Vorkel, April 2020
This macro shows how to measure the performance of image processing done by ImageJ on the CPU and by CLIJ2 on the GPU.
First, let’s get test data:
//open("c:/structure/data/t1-head.tif")
run("T1 Head (2.4M, 16-bits)");
input = getTitle();

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");
We start by measuring the processing time of standard ImageJ operations on the central processing unit (CPU).
We execute the operation repeatedly to see how the processing time varies between consecutive calls of
the same operation. In particular, the first execution may be slower because of the warm-up effect.
We store the current time in a variable named time before the operation, and afterwards print the elapsed time, (getTime() - time).
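The same measure-and-repeat pattern can be sketched outside of ImageJ. The following is a minimal illustration in Python, not part of the original macro: `time_repeated` and `example_workload` are hypothetical names, and the workload is only a stand-in for an image filter.

```python
import time


def time_repeated(operation, repeats=10):
    """Run `operation` several times and return each run's duration in msec."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()             # plays the role of: time = getTime();
        operation()
        elapsed = time.perf_counter() - start   # plays the role of: getTime() - time
        durations_ms.append(elapsed * 1000.0)
    return durations_ms


def example_workload():
    # Hypothetical stand-in for an image filter: a sum of squares.
    return sum(i * i for i in range(100_000))


durations = time_repeated(example_workload, repeats=10)
for run, ms in enumerate(durations, start=1):
    print(f"run {run} took {ms:.2f} msec")
```

Keeping every individual duration, rather than only an average, is what makes a slower first run visible.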
As an example of image processing, we use the mean filter:
// Local mean filter in the CPU
for (i = 1; i <= 10; i++) {
    // duplicate the original image to avoid blurring the same image again and again
    selectWindow(input);
    run("Duplicate...", "duplicate range=1-129");

    // actual blur operation
    time = getTime();
    run("Mean 3D...", "x=3 y=3 z=3");
    print("CPU mean filter no " + i + " took " + (getTime() - time) + " msec");

    // keep the first blurred image and close the duplicates
    if (i == 1) {
        blurred_image = getTitle();
    } else {
        close();
    }
}
selectWindow(blurred_image);

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");
> CPU mean filter no 1 took 2381 msec
> CPU mean filter no 2 took 2294 msec
> CPU mean filter no 3 took 2407 msec
> CPU mean filter no 4 took 2525 msec
> CPU mean filter no 5 took 3851 msec
> CPU mean filter no 6 took 3915 msec
> CPU mean filter no 7 took 3744 msec
> CPU mean filter no 8 took 3794 msec
> CPU mean filter no 9 took 3800 msec
> CPU mean filter no 10 took 3637 msec
As for the CPU, we follow the same strategy to measure the processing time on the GPU. Because the performance of
GPU-accelerated processing also depends on the data transfer time between CPU and GPU memory,
we also measure the time taken by the push() and pull() commands.
Let’s start with the initialization of the GPU:
run("CLIJ2 Macro Extensions", "cl_device=");
Ext.CLIJ2_clear();
time = getTime();
Ext.CLIJ2_push(input);
print("Pushing one image to the GPU took " + (getTime() - time) + " msec");

// clean up ImageJ
run("Close All");
> Pushing one image to the GPU took 28 msec
Again, we use the mean filter of CLIJ2:
// Local mean filter in the GPU
for (i = 1; i <= 10; i++) {
    time = getTime();
    Ext.CLIJ2_mean3DBox(input, blurred, 3, 3, 3);
    print("CLIJ2 GPU mean filter no " + i + " took " + (getTime() - time) + " msec");
}
> CLIJ2 GPU mean filter no 1 took 62 msec
> CLIJ2 GPU mean filter no 2 took 12 msec
> CLIJ2 GPU mean filter no 3 took 13 msec
> CLIJ2 GPU mean filter no 4 took 15 msec
> CLIJ2 GPU mean filter no 5 took 16 msec
> CLIJ2 GPU mean filter no 6 took 15 msec
> CLIJ2 GPU mean filter no 7 took 14 msec
> CLIJ2 GPU mean filter no 8 took 13 msec
> CLIJ2 GPU mean filter no 9 took 10 msec
> CLIJ2 GPU mean filter no 10 took 10 msec
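To put the logged numbers into perspective, the timings can be summarized with a few lines of arithmetic. The sketch below is written in Python rather than the macro language and only reuses the values printed above. It assumes the median is a reasonable summary, and that the ImageJ and CLIJ2 filters are comparable enough for a rough ratio. Transfer times are not included here.

```python
import statistics

# Timings copied from the log output of this macro (msec).
cpu_ms = [2381, 2294, 2407, 2525, 3851, 3915, 3744, 3794, 3800, 3637]
gpu_ms = [62, 12, 13, 15, 16, 15, 14, 13, 10, 10]

cpu_median = statistics.median(cpu_ms)
gpu_median = statistics.median(gpu_ms)

# Separate the first GPU run (warm-up) from the remaining runs.
gpu_first = gpu_ms[0]
gpu_steady_median = statistics.median(gpu_ms[1:])

warm_up_factor = gpu_first / gpu_steady_median
rough_speedup = cpu_median / gpu_median

print(f"CPU median: {cpu_median} msec, GPU median: {gpu_median} msec")
print(f"first GPU run is {warm_up_factor:.1f}x slower than the later runs")
print(f"rough filter-only speedup: {rough_speedup:.0f}x")
```

The ratio depends on the hardware listed at the end of the macro and should not be read as a general figure.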
// Mean filter expressed as a convolution with a uniform 7x7x7 kernel
time = getTime();
Ext.CLIJ2_create3D(structuring_element, 7, 7, 7, 32);
Ext.CLIJ2_set(structuring_element, 1. / 7 / 7 / 7);
print("Preparing the convolution kernel in GPU memory took " + (getTime() - time) + " msec");

for (i = 1; i <= 10; i++) {
    time = getTime();
    Ext.CLIJ2_convolve(input, structuring_element, blurred);
    print("CLIJ2 GPU mean filter using convolution no " + i + " took " + (getTime() - time) + " msec");
}
> Preparing the convolution kernel in GPU memory took 5 msec
> CLIJ2 GPU mean filter using convolution no 1 took 33 msec
> CLIJ2 GPU mean filter using convolution no 2 took 30 msec
> CLIJ2 GPU mean filter using convolution no 3 took 29 msec
> CLIJ2 GPU mean filter using convolution no 4 took 30 msec
> CLIJ2 GPU mean filter using convolution no 5 took 29 msec
> CLIJ2 GPU mean filter using convolution no 6 took 30 msec
> CLIJ2 GPU mean filter using convolution no 7 took 28 msec
> CLIJ2 GPU mean filter using convolution no 8 took 23 msec
> CLIJ2 GPU mean filter using convolution no 9 took 23 msec
> CLIJ2 GPU mean filter using convolution no 10 took 23 msec
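The convolution variant works as a mean filter because every kernel entry holds the same weight, 1 / (7 * 7 * 7), so each output value is the average of its neighborhood. A one-dimensional Python sketch illustrates the idea. It assumes that a box radius of 3 corresponds to a window of 2 * 3 + 1 = 7 pixels per axis, which is how the 7x7x7 kernel appears to relate to the radius used before. The helper names are hypothetical.

```python
def box_mean_1d(values, radius):
    """Mean over a window of width 2*radius+1, only where the window fits."""
    width = 2 * radius + 1
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def convolve_valid_1d(values, kernel):
    """Plain 1D convolution, restricted to positions fully inside the signal."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [sum(v * w for v, w in zip(values[i:i + k], flipped))
            for i in range(len(values) - k + 1)]


signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0]
radius = 3
kernel = [1.0 / 7] * 7          # uniform weights that sum to 1

mean_result = box_mean_1d(signal, radius)
conv_result = convolve_valid_1d(signal, kernel)

# Both approaches agree up to floating-point rounding.
largest_difference = max(abs(a - b) for a, b in zip(mean_result, conv_result))
print(largest_difference < 1e-9)
```

In the timings above, the generic convolution is slower than the dedicated box filter, although both are described as the same mean filter.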
Once more, we use the mean filter, this time the one from CLIJ:
// Local mean filter in the GPU
for (i = 1; i <= 10; i++) {
    time = getTime();
    Ext.CLIJ_mean3DBox(input, blurred, 3, 3, 3);
    print("CLIJ GPU mean filter no " + i + " took " + (getTime() - time) + " msec");
}
> CLIJ GPU mean filter no 1 took 38 msec
> CLIJ GPU mean filter no 2 took 9 msec
> CLIJ GPU mean filter no 3 took 9 msec
> CLIJ GPU mean filter no 4 took 9 msec
> CLIJ GPU mean filter no 5 took 8 msec
> CLIJ GPU mean filter no 6 took 8 msec
> CLIJ GPU mean filter no 7 took 7 msec
> CLIJ GPU mean filter no 8 took 8 msec
> CLIJ GPU mean filter no 9 took 8 msec
> CLIJ GPU mean filter no 10 took 8 msec
time = getTime();
Ext.CLIJ2_pull(blurred);
print("Pulling one image from the GPU took " + (getTime() - time) + " msec");

// visualize the center plane
run("Duplicate...", "duplicate range=64-64");
> Pulling one image from the GPU took 62 msec
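Since push() and pull() are part of any real workflow, it can help to add their cost to the filter time. The Python sketch below does this with the values logged in this macro. It assumes a workflow of exactly one push, one CLIJ2 mean filter and one pull, and `round_trip_ms` is a hypothetical helper name.

```python
# Timings copied from the log output of this macro (msec).
push_ms = 28
pull_ms = 62
gpu_filter_ms = [62, 12, 13, 15, 16, 15, 14, 13, 10, 10]   # CLIJ2 mean3DBox
cpu_filter_ms = [2381, 2294, 2407, 2525, 3851, 3915, 3744, 3794, 3800, 3637]


def round_trip_ms(filter_ms):
    """Total time for push + one filter call + pull."""
    return push_ms + filter_ms + pull_ms


slowest_round_trip = round_trip_ms(max(gpu_filter_ms))   # includes the warm-up run
fastest_round_trip = round_trip_ms(min(gpu_filter_ms))

# Share of the fastest round trip that is spent on data transfer alone.
transfer_share = (push_ms + pull_ms) / fastest_round_trip

fastest_cpu = min(cpu_filter_ms)

print(f"GPU round trip: {fastest_round_trip} to {slowest_round_trip} msec")
print(f"transfer share of the fastest round trip: {transfer_share:.0%}")
print(f"fastest CPU run: {fastest_cpu} msec")
```

With a single filter call, most of the GPU time goes into transfer, so the benefit grows when several operations run on the GPU between one push and one pull.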
For documentation purposes, we should also report which GPU was used:
Ext.CLIJ2_getGPUProperties(gpu, memory, opencl_version);
print("GPU: " + gpu);
print("Memory in GB: " + (memory / 1024 / 1024 / 1024) );
print("OpenCL version: " + opencl_version);
> GPU: GeForce RTX 2070
> Memory in GB: 8
> OpenCL version: 1.2
At the end of the macro, clean up:
Ext.CLIJ2_clear();