5.1 - Stupid-fast performances with GPU operations
Since version 0.6.0, running operations on the GPU (Graphics Processing
Unit) of a computer is possible and - we want to believe - very easy to
implement. GPUs are designed for parallel pixel processing. This can
result in significant performance gains for operations that benefit from
massive parallelization, making them in some cases several orders of
magnitude faster than the same operation on the CPU (Central Processing
Unit) of the computer.
OpenCV - the computer vision library on top of which Rvision is built -
can handle operations on the GPU using two popular frameworks: OpenCL
and Nvidia’s CUDA. Because OpenCL is open, royalty-free, and
cross-platform, it is available on more users’ machines (CUDA is
proprietary and restricted to Nvidia graphics cards). We have,
therefore, chosen to use that framework in Rvision. This is not to say
that we will never consider adding CUDA support to Rvision; we probably
will - time permitting - because it is reportedly faster than OpenCL in
many instances. We just decided to start with the framework that is
available to more users immediately.
5.2 - Enabling GPU operations in Rvision
By default (and for good reasons too), Rvision loads the images it
receives into the memory of the CPU. Before using GPU-accelerated
functions, the image must first be copied to the GPU memory. This can be
done very easily as follows:
# Find the path to the balloon1.png image provided with Rvision
path_to_image <- system.file("sample_img", "balloon1.png", package = "Rvision")
# Load the image in memory
my_image <- image(filename = path_to_image)
# Copy the image to GPU memory
my_image$toGPU()
Once this is done, Rvision (OpenCV really) will automatically use the
GPU-accelerated version of each operation on the image if it is
available. Otherwise, it will default to using the CPU version. And
that’s it, there is nothing more to do to take advantage of the
processing speed of the GPU.
When a function accepts multiple images as arguments (e.g. when using a target image to store the result of an operation on the original image), it does not matter in most cases whether the images are on the CPU or on the GPU (in the rare cases when it does matter, an error will be thrown and you can adjust your pipeline accordingly). However, better performance will probably be achieved if all the images use the memory of the same processor.
Finally, if you need to copy the image back to the CPU memory (e.g. if
you then want to transfer the image to a base::array), you can simply
apply the reverse operation as follows:
# Copy the image back to CPU memory
my_image$fromGPU()
5.3 - Caveats to GPU operations
Nothing comes for free and there are a few caveats to using GPU-accelerated operations.
First, it will only work if your computer has a GPU (even if only a
basic one integrated with the CPU) and if the OpenCL drivers for the GPU
are installed. If OpenCL is not available on your system, Rvision will
throw an error when you use the $toGPU function. OpenCL should be
available on most traditional computers (including laptops) with most
operating systems. Check the documentation of your computer and
operating system to figure out whether OpenCL is available/installed on
your system and how to install it if necessary (in which case, you will
probably need to recompile OpenCV using the
ROpenCVLite::installOpenCV function).
OpenCL will probably not be available on shared servers that do not
provide GPU access to their users. Check with your server administrator
if that is the case.
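Since $toGPU throws an error when OpenCL is unavailable, a script meant to run on several machines may want to catch that error and carry on with the CPU. The following is a minimal sketch of one way to do so. Note that `try_to_gpu` is a hypothetical helper name (it is not part of Rvision); it relies only on the $toGPU method shown earlier and on base R's `tryCatch`.

```r
# Hypothetical helper (not part of Rvision): try to copy an image to GPU
# memory and report whether it worked, instead of stopping the script.
try_to_gpu <- function(img) {
  tryCatch({
    img$toGPU()
    TRUE
  }, error = function(e) {
    # Assumption: any error raised here means the GPU cannot be used.
    message("GPU not available, staying on the CPU: ", conditionMessage(e))
    FALSE
  })
}
```

For example, `on_gpu <- try_to_gpu(my_image)` returns `FALSE` on a machine where $toGPU fails, and the rest of the pipeline can then proceed with the image as it was.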
While GPU-accelerated operations can be much faster than their
CPU-based equivalents, not all operations can be efficiently run on a
GPU. Some operations provided by Rvision/OpenCV will actually be slower
on the GPU than on the CPU. Moreover, performance will depend heavily on
the capabilities of your GPU. A basic, integrated GPU will perform a lot
slower than a dedicated graphics card for instance, and in many cases
not faster than the CPU. Test your pipeline carefully to decide whether
it is worth running all or parts of it on the GPU, or whether to keep
everything on the CPU instead.
Copying an image to/from the GPU comes with a time penalty. It is best if your pipeline avoids performing too many copying operations. Ideally, the GPU/CPU copies should all be created/pre-allocated before starting the pipeline for better performance.
Finally, GPU operations will be slow the first time they are run during
a session. This is because OpenCV needs to compile the corresponding
functions for the specific graphics device it will use to run the
operations. Therefore, GPU operations should be avoided if they will
only be used a handful of times during a session. They are better suited
for heavily repeated operations on large volumes of images, where their
speed gains can quickly compensate for the time penalty they incur at
the start of the pipeline. Note that a possible way to mitigate this
problem is to do a “warm-up” run of all the functions at the start of a
session, before the pipeline starts.
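One way to act on the advice above is to time the same operation on both processors, after a warm-up call on the GPU. The sketch below is an illustration under stated assumptions rather than a rigorous benchmark: `time_op` and `compare_cpu_gpu` are hypothetical helpers (they are not part of Rvision), `op` stands for whichever function of your pipeline you want to test, and only the $toGPU and $fromGPU methods shown earlier are taken from Rvision.

```r
# Hypothetical helper: average wall-clock time (in seconds) of one call
# to `op`, measured over `n` consecutive calls.
time_op <- function(op, n = 100) {
  elapsed <- system.time(for (i in seq_len(n)) op())[["elapsed"]]
  elapsed / n
}

# Hypothetical helper: time `op(img)` on the CPU, then on the GPU after
# one warm-up call. Assumes `img` starts in CPU memory; the image is
# copied back to the CPU when the function exits.
compare_cpu_gpu <- function(img, op, n = 100) {
  cpu <- time_op(function() op(img), n)

  img$toGPU()
  on.exit(img$fromGPU(), add = TRUE)

  op(img)  # warm-up call, run once before the timed GPU loop
  gpu <- time_op(function() op(img), n)

  c(cpu = cpu, gpu = gpu)
}
```

Calling `compare_cpu_gpu(my_image, op)` returns a named vector of average times per call. If the `gpu` value is not clearly smaller than the `cpu` one, that step is probably better left on the CPU. Treat the numbers as rough, since they will vary with the image size and the hardware.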