Larrabee - Not a GPU?

CUDA, Programming

Larrabee, the future computing platform for hardware-accelerated programming, was planned to be released as a kind of graphics card. However, due to delays in the development of the product, Intel decided to change its plans and not to release a Larrabee graphics card.

NVIDIA Goes OpenCL - First Driver Available in Early-Access Program

CUDA, Programming

Yesterday, NVIDIA announced the release of an OpenCL (Open Computing Language) driver and software development kit (SDK) to developers participating in its OpenCL Early Access Program. NVIDIA is providing this release to solicit early feedback in advance of a beta release, which will be made available to all GPU Computing Registered Developers in the coming months.

OpenCL is being created by the Khronos Group with the participation of many industry-leading companies and institutions, including AMD, Apple, Broadcom, IBM, Intel, NVIDIA and many more. OpenCL aims to be the first royalty-free standard for general-purpose parallel programming of heterogeneous systems. It provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.

Developers can apply to become a GPU Computing Registered Developer at www.nvidia.com/opencl. I have registered, so look forward to some interesting OpenCL code in a future post.

A first look at Larrabee

Programming

The advent of CUDA offered new flexibility for scientific programming on the graphics card without the need to learn special graphics languages. However, getting your code to work with CUDA is not always easy because of compiler limitations. When porting old code, you will most likely have to rewrite complete parts of it.

Some time ago, Intel announced a new GPU code-named Larrabee. Intel is likely also targeting the scientific programming community, as the platform has some nice benefits:

  1. It is x86-compatible, so existing code should run after a simple recompilation.
  2. Full support for OpenMP and Intel TBB.
  3. Enhanced 512-bit vector processing units.
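To illustrate what "runs after a simple recompilation" means for OpenMP code, here is a plain SAXPY loop with a standard OpenMP pragma. This is a generic sketch, nothing Larrabee-specific; the function name saxpy is just an illustrative choice.

```cpp
#include <vector>

// y = a * x + y (SAXPY), parallelised with a standard OpenMP pragma.
// Compiled with OpenMP enabled (e.g. -fopenmp) the loop iterations are
// split across threads; without OpenMP the pragma is ignored and the
// loop runs sequentially with the same result.
// Assumes x.size() >= y.size().
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y)
{
    const long n = static_cast<long>(y.size());  // signed index for older OpenMP versions
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The point is that the same source compiles for any x86 target with an OpenMP-capable compiler; no graphics-specific API is involved.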

However, no hardware prototypes are available yet. Writing code is already possible, though, as Intel has provided a C++ implementation of the new Larrabee instruction set extension, called LRBni. The first prototypes are expected by the end of 2009.
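A 512-bit vector register holds 512 / 32 = 16 single-precision floats, so one vector instruction works on 16 lanes at once. The following sketch only emulates that idea in plain C++; it is not Intel's LRBni C++ library, and the names kLanes, vec16f and madd are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration (not the LRBni API): the "register" is a plain
// array of 16 floats and the "instruction" is a loop over its lanes.
constexpr std::size_t kLanes = 512 / (8 * sizeof(float));  // 16 with 32-bit floats

using vec16f = std::array<float, kLanes>;

// Lane-wise multiply-add: r[k] = a[k] * b[k] + c[k] for all 16 lanes.
inline vec16f madd(const vec16f& a, const vec16f& b, const vec16f& c)
{
    vec16f r{};
    for (std::size_t k = 0; k < kLanes; ++k)
        r[k] = a[k] * b[k] + c[k];
    return r;
}
```

On real vector hardware the loop body would be a single instruction; the emulation just makes the lane semantics explicit.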

First Medical Experiences at Fully3D?

At this year's Fully3D conference, a high-performance workshop will take place. According to the topic list, Larrabee seems to be of some importance there. Maybe we will see the first prototype applications in Beijing in 2009.

For those who don’t want to wait until then, check out these links for more information:

Discrete random element selection based on a user-specified probability

Matlab, Programming, Statistics

Yesterday, I had an algorithmic idea that required a specific component:

  • Given: a set of discrete elements E(i) and a likelihood P(i) for the occurrence of each element E(i).
  • Goal: a function that returns a random element from E according to the distribution described by P.

What is this useful for? Well, I wanted to implement some kind of stochastic gradient descent algorithm for image registration that works without gradients, using instead a given likelihood for each point to increase the cost function value. But I think this kind of function comes in handy in many situations.

Matlab Code & Example

I wrote a Matlab function randelement() that solves exactly this problem. It is based on inverse transform sampling. The download link is given below. The function is well documented, so I will only provide a short usage example:

```matlab
% select some arbitrary discrete points
E = [-2 0 2 4 6];

% select likelihoods for each point
P = [1 0.5 2 0.1 0.5];

% get a long vector with elements from E,
% distributed according to P
R = randelement(E, [100000 1], P);

% verify by looking at the histogram
hist(R, E);
```

The histogram will then look similar to this:

[Figure: histogram of the sampled vector R]

So this function seems to do what we wanted. I like it!
The latest version of randelement.m can be downloaded here.
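For readers outside Matlab, the inverse transform sampling idea can be sketched in C++. This is not the code of randelement.m, just a minimal sketch assuming non-negative weights P that need not be normalized; the name rand_element is hypothetical. It builds the cumulative sum of P, draws u uniformly from [0, sum(P)), and picks the first element whose cumulative weight exceeds u via binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper: draws n elements from E, where E[i] is chosen with
// probability P[i] / sum(P). Assumes all weights in P are non-negative.
template <class T, class URNG>
std::vector<T> rand_element(const std::vector<T>& E,
                            const std::vector<double>& P,
                            std::size_t n, URNG& gen)
{
    if (E.empty() || E.size() != P.size())
        throw std::invalid_argument("E and P must be non-empty and of equal length");

    // Unnormalized cumulative distribution function (CDF) of the weights.
    std::vector<double> cdf(P.size());
    std::partial_sum(P.begin(), P.end(), cdf.begin());
    const double total = cdf.back();
    if (!(total > 0.0))
        throw std::invalid_argument("weights must sum to a positive value");

    std::uniform_real_distribution<double> uni(0.0, total);  // u in [0, total)
    std::vector<T> out;
    out.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        // Inverse transform: first index whose CDF value exceeds u.
        // Zero-weight elements are skipped automatically.
        const double u = uni(gen);
        std::size_t idx = static_cast<std::size_t>(
            std::upper_bound(cdf.begin(), cdf.end(), u) - cdf.begin());
        if (idx >= E.size()) idx = E.size() - 1;  // guard against rounding at the top end
        out.push_back(E[idx]);
    }
    return out;
}
```

With the example values above, E = 2 carries weight 2 out of a total of 4.1, so roughly 49% of the samples should be 2. If only indices are needed, std::discrete_distribution from <random> covers the same job.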

C++ 1-Liner: round() your numbers

C++, Programming

You know what really hurts? A missing round() function when you really need it. Here is a code snippet for everyone with the same problem.

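```cpp
#include <type_traits>

// Rounds half away from zero and converts the result to T,
// e.g. round<int>(2.5f) == 3 and round<int>(-2.5f) == -3.
// T must be an integral type: the cast to T is what discards the fraction.
template <class T>
inline T round(float num)
{
    static_assert(std::is_integral<T>::value, "T must be an integral type");
    return static_cast<T>((num > 0.0f) ? num + 0.5f : num - 0.5f);
}
```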
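Since C++11, <cmath> also provides std::round, std::lround and std::llround, which round halfway cases away from zero as well. One reason to prefer them: the num + 0.5f trick can misround values just below 0.5, because the sum is itself rounded to the nearest float. A minimal sketch that delegates to std::llround (the name round_to is hypothetical):

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical helper: rounds half away from zero via std::llround and
// converts to the integral type T. The caller must make sure the result
// fits into T.
template <class T>
inline T round_to(float num)
{
    static_assert(std::is_integral<T>::value, "T must be an integral type");
    return static_cast<T>(std::llround(num));
}
```

Usage is the same as above, e.g. round_to<int>(-2.5f) yields -3.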

CUDA 2.1: NVIDIA Releases Notebook Beta Driver

CUDA

Getting CUDA to run on your notebook is not always easy, even if you have a CUDA-enabled graphics card. My laptop has an NVIDIA Quadro FX 570M (256 MB). However, laptops require a vendor-specific driver; otherwise you get the nice error message “no supported hardware detected” when trying to install the latest CUDA driver.

NVIDIA seems to have realized that this situation is not amusing for laptop programmers (like me). Yesterday I found the following item on the download site:

Beta Notebook Driver for Developers (181.22)

So I downloaded and installed it, and got the same error as with the other drivers. Well, at least it is a step in the right direction. Maybe it works for others.

Standard alternative

However, there is still the option of installing a modified driver for your notebook. This worked for me, but it is not a clean solution. Such drivers can be found at http://www.laptopvideo2go.com/

Fast Uniform Cubic B-Spline Evaluation

CUDA, Programming

My current work includes the evaluation of a parametric motion field based on cubic B-splines. Some time ago, I ported the straightforward B-spline evaluation to the graphics card using CUDA. However, it required several nested for loops and took some time even on the graphics card.

So today's quest was: find a fast way to evaluate cubic B-splines using CUDA!

Of course, Google helped me out and pointed me to the following papers:

Daniel Ruijters, Bart M. ter Haar Romeny, and Paul Suetens, “Accuracy of GPU-based B-Spline Evaluation,” In Proc. Tenth IASTED International Conference on Computer Graphics and Imaging (CGIM), Innsbruck, Austria, pp. 117-122, February 13-15, 2008.

Christian Sigg and Markus Hadwiger, “Fast Third-Order Texture Filtering,” In GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr (ed.), Addison-Wesley, chapter 20, pp. 313-329, 2005.

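They showed that fast and accurate B-spline evaluation is possible by replacing the nearest-neighbor lookups in the B-spline evaluation with linear interpolation, which is hard-wired on the GPU. Because a single linearly interpolated fetch can combine two neighboring coefficients with the right weights, a cubic B-spline needs only two fetches per dimension instead of four.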
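The trick can be sketched on the CPU in one dimension. With the cubic B-spline weights w0..w3, g0 = w0 + w1 and g1 = w2 + w3, one has w0*c[i-1] + w1*c[i] = g0 * lerp(i - 1 + w1/g0), and likewise w2*c[i+1] + w3*c[i+2] = g1 * lerp(i + 1 + w3/g1). This is not Ruijters' CUDA code: the helper lerp_fetch is a hypothetical stand-in for the hardware linear texture fetch, integer positions are assumed to coincide with coefficient indices (real CUDA textures put texel centers at i + 0.5), and clamp-to-edge addressing is assumed.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Clamp-to-edge access to coefficient i.
inline float fetch(const std::vector<float>& c, int i)
{
    const int n = static_cast<int>(c.size());
    return c[std::min(std::max(i, 0), n - 1)];
}

// Hypothetical CPU stand-in for a hardware linear fetch at real position x.
inline float lerp_fetch(const std::vector<float>& c, float x)
{
    const float fl = std::floor(x);
    const float a = x - fl;
    const int i = static_cast<int>(fl);
    return (1.0f - a) * fetch(c, i) + a * fetch(c, i + 1);
}

// Uniform cubic B-spline weights for the fractional offset t in [0, 1).
inline void bspline_weights(float t, float w[4])
{
    const float s = 1.0f - t;
    w[0] = s * s * s / 6.0f;
    w[1] = (3.0f * t * t * t - 6.0f * t * t + 4.0f) / 6.0f;
    w[2] = (-3.0f * t * t * t + 3.0f * t * t + 3.0f * t + 1.0f) / 6.0f;
    w[3] = t * t * t / 6.0f;
}

// Reference evaluation: four nearest-neighbor fetches.
float bspline_direct(const std::vector<float>& c, float x)
{
    const float fl = std::floor(x);
    const int i = static_cast<int>(fl);
    float w[4];
    bspline_weights(x - fl, w);
    return w[0] * fetch(c, i - 1) + w[1] * fetch(c, i)
         + w[2] * fetch(c, i + 1) + w[3] * fetch(c, i + 2);
}

// Same value from two linear fetches: the weights are folded into the
// fetch positions. g0 and g1 are always at least 1/6, so no division by 0.
float bspline_two_fetch(const std::vector<float>& c, float x)
{
    const float fl = std::floor(x);
    float w[4];
    bspline_weights(x - fl, w);
    const float g0 = w[0] + w[1];
    const float g1 = w[2] + w[3];
    const float h0 = fl - 1.0f + w[1] / g0;  // lies between i-1 and i
    const float h1 = fl + 1.0f + w[3] / g1;  // lies between i+1 and i+2
    return g0 * lerp_fetch(c, h0) + g1 * lerp_fetch(c, h1);
}
```

In 3D the same folding reduces 64 fetches to 8. On the GPU the hardware interpolation weights have reduced precision, so the two variants agree less exactly there than in this float emulation.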

Best of all, Daniel Ruijters provides clean CUDA code for download on his website. It helped me a lot: overall, I gained a speed-up factor of 7 compared to my first naive implementation!

Multiline Macro Compilation Errors

C++, Programming

Sometimes you just need to write macros in your C/C++ code. And sometimes they get long, and you want to split them across multiple lines for readability.

I admit that I only rarely use multiline macros, but today I learned something new. I created a simple multiline macro, something similar to this:

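```cpp
// Every line except the last ends with a backslash, which must be the
// very last character on that line (no trailing spaces after it).
#define MYMACRO(a, b) {           \
                     (a) = (a) + (b); \
                     (b) = (b) + (a); }
```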

Nothing difficult or wrong at first sight, right? Okay, but I got hundreds of compilation errors!
After trying around for some time, I noticed that no space is allowed after the backslash: a backslash directly followed by the line break is okay, while a backslash followed by a space is not.

So if you encounter the same error, welcome to the club... tststs, one never stops learning :-)
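A related pitfall with brace-style multiline macros: writing `MYMACRO(x, y);` inside an unbraced `if` followed by `else` does not compile, because the trailing semicolon ends the `if` statement. A common idiom (not from the original post) wraps the body in `do { ... } while (0)` so the macro behaves like a single statement. The name MYMACRO_SAFE is hypothetical; note that whitespace before each backslash is fine, only whitespace after it breaks the continuation.

```cpp
// Multi-statement macro that acts like one statement, so it can be used
// as "MYMACRO_SAFE(x, y);" even in an unbraced if/else.
#define MYMACRO_SAFE(a, b) \
    do {                   \
        (a) = (a) + (b);   \
        (b) = (b) + (a);   \
    } while (0)
```

For example, `if (ok) MYMACRO_SAFE(x, y); else x = 0;` compiles as intended.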
