Sunday, January 16, 2011

Graphics processing unit architecture (revised)

Abstract

The Graphics Processing Unit (GPU) is an important processor in a computer. It is a microprocessor and one of the main processing units in the architecture for computing graphics display. Because the GPU is built on parallel processing concepts and vector processing units, it can be used not only for graphics display but also, very efficiently, for researching or running algorithms that involve a huge amount of mathematical calculation, such as floating-point arithmetic and large-scale matrix or vector operations.


Roles of GPU

The GPU was originally invented to take load off the main system resources (such as the Central Processing Unit (CPU), main memory, and the system bus), so that graphical operations and I/O requests are processed effectively. This is done by providing separate, dedicated graphics resources, including a graphics processor and memory.

Another goal of the GPU is to enable the representation of a 3D world on the computer. Some GPUs are therefore customized to provide extra computational power for 3D tasks.

Compared to the CPU, the GPU is tailored for highly parallel operation and has many parallel execution units. In addition, GPUs have a more advanced memory interface and much deeper pipelines, which provide significantly faster data transfer.



The Classic GPU Architecture - Nvidia GeForce 6800 Series

The Nvidia GeForce 6800 Series architecture is introduced here to make the basic concepts of GPU architecture easier to understand. The following diagram shows the structure of a 6800 GPU:

GeForce 6 Series GPU Block Diagram

The list below shows the graphics processing procedure:
1. Vertex Shader

   The Vertex Shader is a programmable processing function. It is mainly used in 3D environments to add effects to objects.

   Its purpose is to transform each vertex's 3D position in an object (a graphic) into the 2D coordinate at which it appears on the screen. In addition, Vertex Shaders can manipulate position, texture coordinates and color, but note that they cannot create new vertices.

2. Geometry Shader

   The Geometry Shader is a shader program model that generates new graphics primitives such as triangles, lines and points.

   Geometry shader programs are executed after vertex shaders. They take a whole primitive as input and can emit new primitives, which are then rasterized into fragments and passed to the Pixel Shader.

3. Pixel Shader

   A Pixel Shader is a computation kernel function that computes the color and other attributes of each pixel. Pixel Shaders range from always outputting the same color, to applying a lighting value, to doing bump mapping, shadows, specular highlights, translucency and other phenomena.
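As a rough illustration of the range described above, the following Python sketch models a pixel shader as a plain function that is called once per pixel. The function names, the tuple-based color format and the simple diffuse (Lambert) lighting formula are assumptions chosen for illustration; real pixel shaders are written in a shading language and run on the GPU.

```python
import math


def constant_shader(_pixel):
    """Simplest case: always output the same color (RGB in 0..1)."""
    return (1.0, 0.0, 0.0)


def _normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def diffuse_shader(base_color, normal, light_dir):
    """Scale the base color by a lighting value (Lambert's cosine term)."""
    n = _normalize(normal)
    l = _normalize(light_dir)
    # Cosine of the angle between the surface normal and the light direction,
    # clamped so that surfaces facing away from the light receive no light.
    intensity = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(c * intensity for c in base_color)


def shade_image(width, height, shader):
    """Run the shader once for every pixel of a width x height image."""
    return [[shader((x, y)) for x in range(width)] for y in range(height)]
```

For example, `shade_image(2, 2, constant_shader)` fills a 2x2 image with red, while `diffuse_shader` darkens a color as the surface turns away from the light.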

Working Cycle - The Pipeline Technology in the GPU

The GPU receives geometry information (3D data) from the CPU as input and provides the picture/graphic (2D data) as output. The process can be roughly divided into three stages: the Application Stage, the Geometry Stage and the Rasterization Stage.

In the first stage, the Application Stage, the host interface is the communication bridge between the CPU and the GPU. It receives commands from the CPU and geometry information from system memory. It then outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).

The next stage, the Geometry Stage, receives vertices in object space from the host interface and outputs them in screen space.

To translate the scene from 3D data to 2D data, all the objects of a scene have to be transformed through several spaces, each with its own coordinate system, before the 3D image can be projected onto a 2D plane. These transformations are applied on a vertex-by-vertex basis.
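The chain of transformations can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the exact math of any particular GPU: the object is only translated (no rotation or scaling), the camera sits at the origin looking along +z, and the names `object_to_world`, `world_to_screen` and `focal_length` are hypothetical.

```python
def object_to_world(vertex, object_position):
    """Object space -> world space: move the vertex to the object's position."""
    return tuple(v + p for v, p in zip(vertex, object_position))


def world_to_screen(point, focal_length, width, height):
    """World space -> screen space via a perspective projection.

    Assumes a camera at the origin looking along +z, so z must be > 0.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: distant points move toward the image center.
    x_ndc = focal_length * x / z
    y_ndc = focal_length * y / z
    # Map the range [-1, 1] to pixel coordinates; screen y grows downward.
    screen_x = (x_ndc + 1.0) / 2.0 * width
    screen_y = (1.0 - y_ndc) / 2.0 * height
    return (screen_x, screen_y)


def transform_vertices(vertices, object_position, focal_length, width, height):
    """Apply the same transformation chain to every vertex independently."""
    return [
        world_to_screen(object_to_world(v, object_position),
                        focal_length, width, height)
        for v in vertices
    ]
```

Because each vertex is processed without reference to any other vertex, this stage lends itself to parallel execution.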

The pipeline then enters the final stage, Rasterization. The GPU traverses the 2D image and converts the data into a number of "pixel candidates", so-called fragments, which become the pixels of the final image. A fragment is a data structure that contains attributes such as position, color, depth and texture coordinates.

Fragments are generated by checking which part of any given primitive intersects with which pixel of the screen. If a fragment intersects with a primitive but not with any of its vertices, the attributes of that fragment have to be calculated by interpolating the attributes between the vertices. After that, further steps produce the final pixels. Colors are calculated by combining textures with other attributes such as color and lighting, or by combining a fragment with either another translucent fragment or optional fog (a graphical effect).
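The fragment generation and attribute interpolation described here can be sketched as a minimal software rasterizer for one triangle. It is an illustrative sketch, assuming that pixels are sampled at their centers and that barycentric weights are used for interpolation; the function names and the dictionary-based fragment layout are hypothetical, and real hardware uses far more elaborate schemes.

```python
import math


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(v0, v1, v2, c0, c1, c2):
    """Return one fragment per covered pixel, with interpolated color.

    v0..v2 are 2D screen positions, c0..c2 the RGB colors at those vertices.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    fragments = []
    # Only visit pixels inside the bounding box of the triangle.
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights: how strongly each vertex influences p.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # p lies inside the triangle
                color = tuple(w0 * a + w1 * b + w2 * c
                              for a, b, c in zip(c0, c1, c2))
                fragments.append({"position": (x, y), "color": color})
    return fragments
```

Dividing by the signed area makes the inside test independent of whether the vertices are given in clockwise or counter-clockwise order.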



For a modern GPU, the rendering stages are shown below:
Working cycle of GPU

The GeForce 6 Series GPU represents the classic architecture model. Its successor, the GeForce 8 Series, brought GPU architecture to the next generation.

The turning point of GPU Architecture - Unified Shader Model

The architecture now moves to a fully programmable pipeline. The GeForce 8 Series is built around programmable units with a unified shader design. The vertex, geometry and fragment parts all become programmable, which greatly increases the programmability of the pipeline.
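One commonly cited benefit of a unified shader design is load balancing: units are not left idle when a scene is heavy on one kind of work. The toy model below illustrates the idea. It assumes that every unit finishes one job per cycle and that all jobs cost the same, and the function names and unit counts are illustrative assumptions rather than figures from this article.

```python
import math


def cycles_fixed(vertex_jobs, pixel_jobs, vertex_units, pixel_units):
    """Dedicated units: each pool only handles its own kind of job."""
    return max(math.ceil(vertex_jobs / vertex_units),
               math.ceil(pixel_jobs / pixel_units))


def cycles_unified(vertex_jobs, pixel_jobs, total_units):
    """Unified units: any unit can take any job, so work is shared."""
    return math.ceil((vertex_jobs + pixel_jobs) / total_units)
```

In this model, a vertex-heavy workload of 200 vertex jobs and 20 pixel jobs takes 34 cycles on 6 vertex units plus 16 pixel units, but only 10 cycles on 22 unified units. For a workload that happens to match the fixed split (60 vertex jobs and 160 pixel jobs), both designs need 10 cycles.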

Shaders in GeForce 8 Series GPU:

GeForce 8 Series GPU Block Diagram

Two new features, unified shaders and direct access to the compute units through new APIs, map well to a now-famous application of the GPU: general-purpose computing on graphics processing units.

The architecture innovation

The trend of GPU architecture innovation follows the demand of a world that is interested in general-purpose programmability and parallel processing.

GPUs are now evolving to let users hand CPU tasks over to the GPU. When GPUs are used for non-graphical processing, this is known as GPGPU, or general-purpose computing on graphics processing units.

CPUs are primarily serial processors. While they might have multiple cores and threads, they still function by performing many calculations very rapidly, one after the other. GPUs, on the other hand, have become increasingly parallel processors. While an Intel or AMD CPU might have 2, 4, or 8 cores, an Nvidia GeForce or ATI Radeon GPU can have hundreds, all working at the same time. The individual cores on a GPU perform relatively simple functions, but since they all work together simultaneously, they can perform some impressive mathematical feats.
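The contrast can be sketched with SAXPY (computing a*x + y over two vectors), a standard data-parallel example that is not taken from this article. Each output element depends only on its own inputs, so the same simple function can be handed to many workers at once. In this sketch `ThreadPoolExecutor` merely stands in for the many cores of a GPU; Python threads will not deliver GPU-like speedups, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_element(a, x_i, y_i):
    """The simple per-element function each core would run."""
    return a * x_i + y_i


def saxpy_serial(a, x, y):
    """CPU style: one element after the other."""
    return [saxpy_element(a, x_i, y_i) for x_i, y_i in zip(x, y)]


def saxpy_parallel(a, x, y, workers=4):
    """GPU style: elements are independent, so workers can share them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the inputs.
        return list(pool.map(lambda pair: saxpy_element(a, *pair), zip(x, y)))
```

Both versions return the same result; only the way the work is distributed differs.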

Today, GPU performance growth still follows Moore's Law, which is commonly summarized as computing performance doubling about every 18 months.
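To get a feel for what such a growth rate implies, the rule of thumb can be turned into a small calculation. The sketch below assumes that the rule means a doubling of performance every 18 months; the function name and the default parameter are illustrative.

```python
def growth_factor(years, doubling_months=18.0):
    """Performance multiplier after `years`, doubling every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)
```

Under this assumption, `growth_factor(1.5)` is 2.0, and over the ten years from 2000 to 2010 the factor is roughly 100.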

GPU performance graph 2000-2010

Current and Future development

Computing is evolving from “central processing” on the CPU to “co-processing” on the CPU and GPU. The new parallel computing architecture called “CUDA”, provided by NVIDIA, has taken the use of GPGPU to the next generation.

In scientific research, the CUDA GPU architecture, together with programs that support it (e.g. AMBER), has brought GPGPU into important use in academia and pharmaceutical companies worldwide to accelerate new drug discovery.

In the financial market, GPGPU is also used for counterparty risk applications. Well-known analysis tools such as Numerix and CompatibL also support CUDA for GPGPU and have achieved an 18X speedup.

GPU computing is going mainstream. In recently launched operating systems such as Microsoft Windows 7 and Apple Snow Leopard, the GPU no longer acts only as a graphics processor but also as a general-purpose parallel processor accessible to any application.

Intel, one of the world's major chip manufacturers, launched the first Core i3 on January 7, 2010. This type of CPU is integrated with a GPU. Examples are the Core i3-5xx and Core i3XXM. The next series, Core i5, also uses the same architecture concept, for example the Core i5-6XX, Core i5-5XXM and Core i5-5xxUM.

Is our future heading back to the old days of single-processor computing?




Exercise Section

Questions

  1. State the major differences between GeForce Series 6 and GeForce Series 8.
Answer:
GeForce Series 6
  a. The Rendering Pipeline
  b. Vertex Shader, Pixel Shader and Geometry Shader handle different steps of rendering
GeForce Series 8
  a. The Programmable Pipeline
  b. Unified Shader handles all rendering steps

  2. In the GeForce 6800 architecture, which part is responsible for converting a 3D position to a 2D coordinate?
Answer:
Vertex Shader.

  3. What are the new features of the modern GPU GeForce Series 8?
Answer:
  a. Unified shaders
  b. Direct access to compute units in new APIs

  4. What is the job of Rasterization?
Answer:
To traverse the 2D image and convert the data into a number of "pixel candidates" (fragments).

  5. What is the meaning of GPGPU?
Answer:
When GPUs are used for non-graphical processing, this is known as GPGPU, or general-purpose computing on graphics processing units.