Parallel Computing With CUDA Extensions (Part 2)

A “kernel” in CUDA terms can be thought of as a series of instructions to be carried out
by a compute unit on a GPU. Basically, it looks like a regular program.

1. You write kernels / code as if it were a regular serial (top to bottom) program, like
one designed to run on a single thread.

2. You tell the GPU to launch this code, and it will do so on multiple threads (you can define how many).

But won’t this lead to the same code being executed multiple times?
Yes, it will, but that will be explained later.
If you write code to output “Hello World” and ask the GPU to run this on 50 threads,
you’ll be greeted back 50 times, and it will all happen in parallel.
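A minimal sketch of that “Hello World on 50 threads” scenario (assuming a CUDA-capable GPU and NVIDIA’s nvcc compiler) could look like this:

```cuda
#include <cstdio>

// Kernel: the exact same code is executed by every thread we launch.
__global__ void helloKernel()
{
    // threadIdx.x tells each thread which of the 50 it is (0..49).
    printf("Hello World from thread %d\n", threadIdx.x);
}

int main()
{
    // Launch one block of 50 threads; all of them run helloKernel in parallel.
    helloKernel<<<1, 50>>>();
    cudaDeviceSynchronize(); // wait for the GPU to finish before exiting
    return 0;
}
```

The 50 greetings are printed in no guaranteed order, which is your first hint that the threads really do run in parallel.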

But let’s say that your program code contains a vector of 50 floats.
For each of those floats you want something done, the same way, as fast as possible.
You tell the GPU to spawn 50 threads to work on this kernel (program code).

Inside the kernel, when run on the GPU, each thread can be tied to one vector member, and you have full control over which thread works with which member of the vector.

Each thread starts doing the work as instructed in the code received from the CPU.
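Sketched as a CUDA kernel (the doubling below is just a stand-in for whatever per-element work you actually need), the one-thread-per-element mapping could look like this:

```cuda
// Each of the 50 threads picks "its own" element via threadIdx.x
// and does the same work on it, all in parallel.
__global__ void processKernel(float *v)
{
    int i = threadIdx.x;   // thread 0 handles v[0], thread 1 handles v[1], ...
    v[i] = v[i] * 2.0f;    // stand-in for the real per-element work
}

// Launched from the HOST with one block of 50 threads
// (d_v is a DEVICE pointer to the 50 floats):
//
//     processKernel<<<1, 50>>>(d_v);
```

Note how the kernel itself contains no loop over the vector: the loop is replaced by 50 threads each doing one iteration’s worth of work.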

As an example, let’s say that when running the code on a CPU only, the CPU would have to traverse the vector members one by one, do the job needed, and continue on to the next member to do the same.
The total execution time for the task would vary based on how busy the CPU was and other factors, but let’s assume that we have 50 members that each need 10 ms to be processed.

This would easily take 50 x 10 ms (500 ms) to complete, as we work in a non-parallel way.
If we compare this to how the execution would be done in a GPU-assisted way,
the time needed to process each element might be a bit higher, because a single GPU thread is generally not as fast as a regular CPU thread. So let’s say 20 ms per member.

The difference is that because these tasks are all started in parallel, the GPU would finish processing the whole vector of 50 members in just 20 ms, compared to the CPU, which would need 10 ms x 50 members, giving us 500 ms!

To not lose focus, it might help
to visualize situations in programming that could benefit from being able to do several equal tasks at the same time.

One thing that comes to my mind is image editing applications. When you have an image consisting of millions of pixels, there will be several thousand pixels that share the same characteristics / properties, like color and brightness.
If you were to write a function to lighten or change the color of all those equal pixels, you’d basically have a job that could benefit from being executed simultaneously, rather than doing the same thing to each pixel in a linear fashion.
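As a sketch, such a per-pixel job could look like the CUDA kernel below (it brightens a grayscale image; the block size and the amount parameter are illustrative assumptions, not anything from a real image library):

```cuda
// Each thread brightens exactly one pixel. With millions of pixels we
// launch many blocks of threads, so each thread computes a global index.
__global__ void brightenKernel(unsigned char *pixels, int numPixels, int amount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global pixel index
    if (i < numPixels)                             // the last block may have spare threads
    {
        int v = pixels[i] + amount;
        pixels[i] = (v > 255) ? 255 : (unsigned char)v; // clamp to the valid 0..255 range
    }
}

// HOST-side launch for a 1920x1080 grayscale image, 256 threads per block:
//
//     int n = 1920 * 1080;
//     brightenKernel<<<(n + 255) / 256, 256>>>(d_pixels, n, 40);
```

The bounds check is needed because the thread count is rounded up to a whole number of blocks, so a few threads at the end have no pixel to work on.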

Usually, when programming using only the CPU, launching and running threads in parallel is considered an expensive and cumbersome activity.
The whole point of using the GPU as a processing unit for “regular” tasks is that it’s very good at certain things, like these two:

1. Launching a lot of threads (and “a lot” means MANY, think thousands)
2. Actually running these threads in parallel

So GPUs make perfect candidates for doing the kind of processing that regular CPUs lack.

For those learning about programming, maybe as a student or on their own: I seriously believe that there will soon be heavy demand for competent C/C++ programmers who know how to program using GPU assistance, and that this demand will last well into the foreseeable future.

C and C++ might be lower-level than the languages you find most comfortable to use, but the truth is that even though these statically typed compiled languages have experienced a drop in general interest over the last ten years, they’re now on the rise again, thanks to technologies like this and because of the importance of power consumption / watts per cycle on modern handheld devices.

Done right, C++ is the most efficient high-level language in existence today for low power consumption devices, and many large companies now invest huge sums of money in the driving forces behind these languages.

The future is mobile and the future is (hopefully) green.
To achieve this, we also need to start making software that’s green and environmentally friendly.

I hope this article has made you more interested in learning about GPU assisted processing using tools such as CUDA or OpenCL.

There’s more in the world than an Apple.


Parallel Computing With CUDA Extensions (Part 1)

First, let’s see how to rate a CPU in a parallel way of thinking.

Let’s say we have an eight-core Intel CPU.

Each core can execute 8 operations at a time (8-wide AVX vector operations),
and each core has support for running two threads in parallel via Intel “HyperThreading” technology, so you get:

8 cores * 8 operations/core * 2 threads/core = 128, what’s called
“128-Way Parallelism”

For more about Advanced Vector eXtensions (AVX) in CPUs, check this page.

Programming without taking advantage of ANY multithreading / parallel processing
techniques means that for each program you run, you use

2/128 = 1/64 of your CPU’s total resources (counting the automatic “HyperThreading”).

In an ordinary C/C++ program you can only run code that uses the CPU as
the computing resource.
If people really took advantage of their cores and threading capabilities, this would
probably be enough for most regular applications. But for applications that do a lot of
heavy calculations, like video / image processing or 3D graphics, it’s way better if you can
offload some of these tasks to the simpler (in terms of instructions), but very capable GPU(s) in your machine.

One way to do this is through the use of CUDA extensions.

In this model, the CPU is considered the “HOST” and each GPU is a “DEVICE”
in your system that can be used for doing calculations.
When such a program is compiled, instructions for both the HOST and any DEVICE
are created.
In CUDA the GPU/DEVICE is seen as a “CO-PROCESSOR” to the CPU/HOST.
The programming model also assumes that the HOST and DEVICE have access to separate physical
memory where they can store data.
The DEVICE memory is typically a very high-speed block of memory, faster than the one
on the HOST.

The HOST is “in charge” in CUDA and sends messages to the DEVICE telling it what to do.
The HOST keeps track of:

1. Moving data from CPU memory -> GPU memory
2. Moving data from GPU memory -> CPU memory
   (CUDA’s version of C’s memcpy() is cudaMemcpy())
3. Allocating GPU memory
   (again, CUDA uses cudaMalloc() instead of malloc())
4. Launching “kernels” on the GPU (in CUDA, the HOST launches “kernels” on the DEVICE)

A Typical flow in a CUDA Application would be something like:

1. CPU runs cudaMalloc on GPU
2. CPU copies input data from CPU->GPU with cudaMemcpy
3. CPU launches the “kernel” on the GPU (kernel launch)
4. CPU copies results back with cudaMemcpy
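Put together, the four steps above could be sketched like this (assuming a CUDA-capable GPU and nvcc; the kernel just doubles each element as a placeholder for real work):

```cuda
#include <cstdio>

// The kernel for step 3: one thread per element, each doubles "its" float.
__global__ void doubleKernel(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        v[i] *= 2.0f;
}

int main()
{
    const int n = 50;
    float h_v[n];                        // HOST (CPU) buffer
    for (int i = 0; i < n; ++i)
        h_v[i] = (float)i;

    float *d_v = NULL;                   // DEVICE (GPU) buffer
    cudaMalloc((void **)&d_v, n * sizeof(float));                    // 1. allocate GPU memory
    cudaMemcpy(d_v, h_v, n * sizeof(float), cudaMemcpyHostToDevice); // 2. copy input CPU -> GPU
    doubleKernel<<<1, n>>>(d_v, n);                                  // 3. kernel launch
    cudaMemcpy(h_v, d_v, n * sizeof(float), cudaMemcpyDeviceToHost); // 4. copy results GPU -> CPU
    cudaFree(d_v);

    printf("h_v[10] = %g\n", h_v[10]);
    return 0;
}
```

Note that the second cudaMemcpy also acts as a synchronization point: it won’t return the results until the kernel has finished.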

So, what is this “Kernel” stuff all about?

Guess we’ll find out in part 2 of this series…

Using the Digia Qt Framework (LGPL) for “commercial” applications

This post is targeted at any developer that needs a good framework to develop true cross platform applications.

By “true cross platform” I mean that you can build for Windows, Apple OS X, Linux, Android and other mobile platforms from the same code sources.

The licensing terms of Qt are the same whether you’re an “indie developer” or a team of developers.

There are many people out there who are unsure whether they should learn / use the Qt Framework, because of the relatively unclear definitions of the different licensing options.

Many feel that it would be a waste of time if they can’t sell the application in the end without paying for a rather expensive Qt developer license, or that they’ll get legal problems if they include a certain part of their code.

The aim of this article is to clear up some of that uncertainty.

Qt isn’t just a GUI Framework.
You can use it to create just about any modern application with networking capabilities, 2D / 3D graphics, sound, you name it.

It’s a really nice way to get applications working on multiple devices and platforms in the shortest time possible, without resorting to some JavaScript “miracle application” that turns your source code into “working applications” for both desktop and mobile units, and costs a fortune.

The user interface part (GUI) of a Qt application looks really good and integrates with the platform it’s running on, so you won’t notice any difference from using an application created to run natively on that device, like an OS X application written in Objective-C using Cocoa.

Qt isn’t only for C++ developers!
You can (and should) also use a declarative language called QML to create the user interface part of the application.
It’s even possible to create a complete, simple application using nothing but QML if you wish.

This opens up closer collaboration between designers and back-end coders.
It must be said that Qt is mainly a C++ framework, and to get access to the full functionality you need to use C++,
but for the user interface part, QML is the way to go for most projects. It saves you many hours of C++ coding.

So, what about the licensing question, man?

Oh, of course…

If the scenario goes something like:

“You’re a single developer or a team of developers that needs to use some functionality of Qt (like QtWidgets) in your application…”

The short answer is:

Relax, learn and use Qt, you can make money on the applications even if you use the LGPL version of the framework.
You need only be aware of four very important “limitations”.

1. Your application must be dynamically linked to the Qt components that came with your downloaded LGPL Qt distribution. No static linking allowed.

2. You can’t make changes to the Qt source code itself and sell your application based on the changed version of Qt.

3. You must inform users of your application that Qt is used, in some licence text or in a readme somewhere within your distributed application files.

4. You must provide a copy of the Qt LGPL licence file together with your distributed application files.

Other than that, you’re free to distribute your application as a commercial closed-source application, using any licensing you may want.

There are also corner cases, such as if you’ve made changes to Qt and those changes have been officially accepted as part of Qt, but this is not the case for most developers.

Hope this cleared things up a bit.

So start learning Qt and happy coding!