Larrabee - Not a GPU?

CUDA, Programming

Larrabee, the future computing platform for hardware-accelerated programming, was planned for release as a kind of graphics card. However, due to delays in the product's development, Intel decided to change plans and not to release a Larrabee graphics card.

NVIDIA Goes OpenCL - First Driver Available in Early-Access Program

CUDA, Programming

Yesterday, NVIDIA announced the release of an OpenCL (Open Computing Language) driver and software development kit (SDK) to developers participating in its OpenCL Early Access Program. NVIDIA is providing this release to solicit early feedback in advance of a beta release, which will be made available to all GPU Computing Registered Developers in the coming months.

OpenCL is being created by the Khronos Group with the participation of many industry-leading companies and institutions, including AMD, Apple, Broadcom, IBM, Intel, NVIDIA and many more. OpenCL aims to be the first royalty-free standard for general-purpose parallel programming of heterogeneous systems. It provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.

Developers can apply to become a GPU Computing Registered Developer at www.nvidia.com/opencl. I have registered, so look forward to some interesting OpenCL code in a future post.

A first look at Larrabee

Programming

The advent of CUDA brought new flexibility to scientific programming on the graphics card, without the need to learn specialized graphics languages. However, getting your code to work with CUDA is not always easy because of compiler limitations. When porting old code, you will most likely have to rewrite entire parts from scratch.

Some time ago, Intel announced a new GPU code-named Larrabee. Intel is likely targeting the scientific programming community as well, since the platform has some nice benefits:

  1. It is x86 compatible, so existing code should run after a simple recompilation.
  2. Full support for OpenMP or Intel TBB.
  3. Enhanced 512-bit vector processing units.

No hardware prototypes are available yet, but writing code is already possible because Intel has provided a C++ implementation of the new Larrabee instruction set extension, called LRBni. The first prototypes are expected at the end of 2009.
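To give an idea of what the OpenMP item in the list above means in practice, here is a minimal sketch of an OpenMP loop in plain C++. It is ordinary multi-core code and not Larrabee-specific: it does not use LRBni, and the function name `saxpy` is only an illustrative choice. Whether such a loop would really run unchanged on Larrabee is the claim made above, not something this sketch can show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale-and-add over two arrays: y[i] = a * x[i] + y[i].
// The pragma asks an OpenMP-enabled compiler to split the loop across threads.
// A compiler without OpenMP support ignores the pragma and runs the loop serially.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Each iteration touches only its own element, so the loop can be split across threads without any synchronization. With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag.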

First Medical Experiences at Fully3D?

At this year's Fully3D conference, a high-performance workshop will take place. According to the topic list, Larrabee seems to be of some importance. Maybe we will see the first prototype applications in Beijing in 2009.

For those who don’t want to wait until then, check out these links for more information:

Fast Uniform Cubic B-Spline Evaluation

CUDA, Programming

My current work includes the evaluation of a parametric motion field based on cubic B-splines. Some time ago I ported the straightforward B-spline evaluation to the graphics card using CUDA. However, it required evaluating multiple nested for loops and took some time even on the graphics card.
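For illustration, the nested-loop approach looks roughly like the following CPU sketch in C++ for the 2D case. This is not my original CUDA kernel; the names `ControlGrid`, `bsplineWeights` and `evalNaive` are made up for this example, and border handling by clamping is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Uniform cubic B-spline basis weights for a fractional offset t in [0, 1).
std::array<double, 4> bsplineWeights(double t) {
    const double t2 = t * t, t3 = t2 * t;
    const double u = 1.0 - t;
    return {{u * u * u / 6.0,
             (3.0 * t3 - 6.0 * t2 + 4.0) / 6.0,
             (-3.0 * t3 + 3.0 * t2 + 3.0 * t + 1.0) / 6.0,
             t3 / 6.0}};
}

// Hypothetical container for the control-point coefficients (row-major).
struct ControlGrid {
    int width, height;
    std::vector<double> coeffs;
    // Nearest-neighbor lookup; indices outside the grid are clamped to the border.
    double at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return coeffs[static_cast<std::size_t>(y) * width + x];
    }
};

// Straightforward tensor-product evaluation: 4 x 4 = 16 lookups per sample in 2D.
double evalNaive(const ControlGrid& g, double x, double y) {
    assert(g.coeffs.size() == static_cast<std::size_t>(g.width) * g.height);
    const int ix = static_cast<int>(std::floor(x));
    const int iy = static_cast<int>(std::floor(y));
    const std::array<double, 4> wx = bsplineWeights(x - ix);
    const std::array<double, 4> wy = bsplineWeights(y - iy);
    double sum = 0.0;
    for (int j = 0; j < 4; ++j) {
        for (int k = 0; k < 4; ++k) {
            sum += wy[j] * wx[k] * g.at(ix - 1 + k, iy - 1 + j);
        }
    }
    return sum;
}
```

In 3D the same scheme needs a third nested loop and 64 lookups per sample, which is where the cost comes from.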

So today's quest was: find a fast way to evaluate cubic B-splines using CUDA!

Of course, Google helped me out and pointed me to the following papers:

Daniel Ruijters, Bart M. ter Haar Romeny, and Paul Suetens, “Accuracy of GPU-based B-Spline Evaluation,” In Proc. Tenth IASTED International Conference on Computer Graphics and Imaging (CGIM), Innsbruck, Austria, pp. 117-122, February 13-15, 2008.

Christian Sigg and Markus Hadwiger, “Fast Third-Order Texture Filtering,” In GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr (ed.), Addison-Wesley; chapter 20, pp. 313-329, 2005.

They showed that a fast and accurate evaluation of B-splines is possible by replacing the nearest-neighbor lookups during the B-spline evaluation with linear interpolation, which is hard-wired on the GPU.
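The idea can be checked on the CPU with a small 1D sketch in C++. Here `fetchLinear` only emulates a linearly interpolated texture fetch in double precision; it is not a GPU call, and all function names are made up for this example. The four weighted nearest-neighbor lookups are grouped into two pairs, and each pair is replaced by one linear fetch at a shifted position.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Uniform cubic B-spline basis weights for a fractional offset t in [0, 1).
std::array<double, 4> bsplineWeights(double t) {
    const double t2 = t * t, t3 = t2 * t;
    const double u = 1.0 - t;
    return {{u * u * u / 6.0,
             (3.0 * t3 - 6.0 * t2 + 4.0) / 6.0,
             (-3.0 * t3 + 3.0 * t2 + 3.0 * t + 1.0) / 6.0,
             t3 / 6.0}};
}

// Nearest-neighbor lookup with the index clamped to the valid range.
double fetchNearest(const std::vector<double>& c, int i) {
    assert(!c.empty());
    const int last = static_cast<int>(c.size()) - 1;
    return c[static_cast<std::size_t>(std::max(0, std::min(i, last)))];
}

// CPU stand-in for a linearly interpolated texture fetch at position pos.
double fetchLinear(const std::vector<double>& c, double pos) {
    const int i = static_cast<int>(std::floor(pos));
    const double f = pos - i;
    return (1.0 - f) * fetchNearest(c, i) + f * fetchNearest(c, i + 1);
}

// Reference: four weighted nearest-neighbor lookups.
double evalFourTaps(const std::vector<double>& c, double x) {
    const int i = static_cast<int>(std::floor(x));
    const std::array<double, 4> w = bsplineWeights(x - i);
    double sum = 0.0;
    for (int k = 0; k < 4; ++k) {
        sum += w[k] * fetchNearest(c, i - 1 + k);
    }
    return sum;
}

// Same value from two linear fetches:
// w0*c[i-1] + w1*c[i] = g0 * lerp(c[i-1], c[i], w1/g0), and likewise for w2, w3.
double evalTwoLinearFetches(const std::vector<double>& c, double x) {
    const int i = static_cast<int>(std::floor(x));
    const std::array<double, 4> w = bsplineWeights(x - i);
    const double g0 = w[0] + w[1];
    const double g1 = w[2] + w[3];
    const double p0 = (i - 1) + w[1] / g0;  // between c[i-1] and c[i]
    const double p1 = (i + 1) + w[3] / g1;  // between c[i+1] and c[i+2]
    return g0 * fetchLinear(c, p0) + g1 * fetchLinear(c, p1);
}
```

In 1D this cuts four lookups down to two; applied along each axis, it gives 8 instead of 64 lookups in 3D. On a real GPU the interpolation weights of the texture unit have limited precision, so the two results would agree only approximately, which is the kind of effect the accuracy paper above looks at.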

And best of all: Daniel Ruijters provides some clean CUDA code for download on a website. It helped me a lot. Overall, I gained a speed-up factor of 7 compared to my first naive implementation!
