Achieving optimal performance for real-time algorithms on GPUs takes far more than knowing how to use CUDA or OpenCL. Different algorithm architectures require specialized tuning, optimized memory access patterns, careful attention to latency hiding, and more.
We can design algorithms from concept through real-time deployment on GPUs, and we can port existing algorithms to CUDA on GPUs or to other embedded platforms.
Our experience spans many algorithmic domains, including radar signal processing, wireless and wired communications signal processing, computer vision, machine learning, deep learning, electromagnetics, linear algebra, and pattern recognition.
If you have existing CPU-based, non-real-time algorithms that need GPU acceleration, we offer code analysis and profiling, CPU-to-GPU porting and optimization, multi-GPU support, and more. We have deep expertise in CUDA and OpenCL, from porting existing prototype systems and algorithms to designing new ones from scratch.