Code Acceleration on GPU Architectures

Port and optimize code for GPU architectures based on NVIDIA or AMD chips, using CUDA (NVIDIA only) or OpenCL (both vendors). Extracting reasonable performance typically requires the following steps:

  • Identify acceleration candidates (compute-intensive hotspots) and make an initial port of them to the GPU.
  • Review the hierarchical data flow: from host to device, and within the device (global, shared, and register memory).
  • Tailor data structures to GPU memory access patterns and eliminate redundant data transfers.
  • Reduce register usage and operation counts, and mitigate warp divergence.
  • Auto-tune the optimized kernels for the best performance on each target architecture.
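For the initial port, a common starting point is to map each iteration of a data-parallel loop to one GPU thread. The sketch below is host-only C++ (so it runs without a GPU) that emulates how a 1-D CUDA launch of a hypothetical saxpy kernel covers an array: the number of blocks is rounded up, and each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x and skips indices past the end. The function names and the default block size of 256 are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements (rounded up), i.e. the
// value one would use for gridDim.x in a 1-D CUDA launch.
std::size_t grid_size(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Body of a hypothetical saxpy kernel: one thread handles one element.
// On a GPU, block_idx, thread_idx and block_dim would be blockIdx.x,
// threadIdx.x and blockDim.x.
void saxpy_thread(std::size_t block_idx, std::size_t thread_idx,
                  std::size_t block_dim, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = block_idx * block_dim + thread_idx;  // global thread index
    if (i < y.size()) {  // guard: the last block may be only partly used
        y[i] = a * x[i] + y[i];
    }
}

// Host-side emulation of the launch: visit every (block, thread) pair
// that the GPU would execute in parallel.
void saxpy_emulated(float a, const std::vector<float>& x,
                    std::vector<float>& y, std::size_t block_size = 256) {
    assert(x.size() == y.size());
    std::size_t blocks = grid_size(y.size(), block_size);
    for (std::size_t b = 0; b < blocks; ++b)
        for (std::size_t t = 0; t < block_size; ++t)
            saxpy_thread(b, t, block_size, a, x, y);
}
```

For 1000 elements and 256 threads per block, grid_size returns 4, and the last 24 threads of the fourth block fall outside the array and do nothing.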
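Tailoring data structures for GPUs often comes down to memory layout. A frequent transformation goes from an array of structures (AoS) to a structure of arrays (SoA). When consecutive threads each read the same field of consecutive elements, SoA places those values at consecutive addresses, which lets the hardware coalesce a warp's loads into fewer memory transactions. The C++ sketch below uses a hypothetical particle type to show the conversion and the resulting per-thread address stride. It is an illustration of the technique, not a prescribed data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): the four fields of one particle are adjacent,
// so thread i reading p[i].x and thread i+1 reading p[i+1].x touch
// addresses sizeof(ParticleAoS) bytes apart.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of arrays (SoA): each field is stored contiguously, so
// adjacent threads reading x[i] and x[i+1] touch adjacent floats.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Address stride between adjacent threads that each read the x field
// of their own element, for each layout.
constexpr std::size_t aos_stride_bytes = sizeof(ParticleAoS);
constexpr std::size_t soa_stride_bytes = sizeof(float);

// Number of elements; checks the SoA invariant that all field arrays
// have the same length (AoS guaranteed this implicitly).
std::size_t soa_size(const ParticlesSoA& p) {
    assert(p.y.size() == p.x.size() && p.z.size() == p.x.size() &&
           p.mass.size() == p.x.size());
    return p.x.size();
}

// One-time conversion, e.g. on the host before the data is copied to
// the device, so kernels only ever see the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    out.mass.reserve(in.size());
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    return out;
}
```

Doing the conversion once, before the host-to-device copy, also avoids reshuffling the data on every kernel call.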
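Warp divergence occurs when threads of the same warp take different sides of a branch. The warp then executes both paths one after the other, with the inactive lanes masked off. The host-only C++ model below (the function name is hypothetical) counts how many warps would diverge on a branch whose condition depends on the thread index. It shows why a condition on tid % 2 diverges in every warp, while a condition on tid / 32 (the warp index, assuming 32-wide warps) does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Counts warps that would diverge on `if (pred(tid))`, i.e. warps that
// contain threads taking both sides of the branch.
// warp_size is 32 on NVIDIA GPUs; AMD wavefronts are 64 wide
// (32 on some newer AMD architectures).
std::size_t divergent_warps(std::size_t n_threads, std::size_t warp_size,
                            const std::function<bool(std::size_t)>& pred) {
    assert(warp_size > 0);
    std::size_t divergent = 0;
    for (std::size_t w = 0; w * warp_size < n_threads; ++w) {
        std::size_t first = w * warp_size;
        std::size_t last = std::min(first + warp_size, n_threads);
        bool taken = pred(first);  // branch direction of the warp's first lane
        for (std::size_t t = first + 1; t < last; ++t) {
            if (pred(t) != taken) {  // another lane goes the other way
                ++divergent;
                break;
            }
        }
    }
    return divergent;
}
```

With 1024 threads and 32-wide warps, the even/odd condition makes all 32 warps diverge, while branching on the warp index makes none of them diverge. Register usage, the other half of that step, is usually checked from the compiler's report, for example by passing -Xptxas -v to nvcc.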
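The simplest form of auto-tuning is an exhaustive search: run the kernel once per candidate configuration on the target GPU and keep the fastest. The sketch below assumes the tunable parameter is the thread-block size and that the caller supplies a benchmark callback. Both are illustrative assumptions; real tuners often search several parameters (tile sizes, unroll factors) together.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

struct TuneResult {
    int block_size;
    double time_ms;
};

// Exhaustive search over candidate block sizes.
// `benchmark` runs the kernel with the given block size and returns its
// run time in milliseconds. In a CUDA program it would typically time the
// launch with cudaEventRecord / cudaEventElapsedTime, averaged over repeats.
TuneResult autotune(const std::vector<int>& candidates,
                    const std::function<double(int)>& benchmark) {
    assert(!candidates.empty());
    TuneResult best{candidates.front(), std::numeric_limits<double>::infinity()};
    for (int bs : candidates) {
        double t = benchmark(bs);
        if (t < best.time_ms) best = {bs, t};  // keep the fastest so far
    }
    return best;
}
```

Because the best configuration usually differs between GPU generations and vendors, the search would be rerun (or its result cached) for each target architecture.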