Utilising a Graphics Processing Unit to its Full Potential

During my Masters studies in artificial intelligence, while researching papers on machine learning and algorithmic trading, I came across a paper by three PhD students from Vilnius University in Lithuania. The paper described the diverse approaches taken by different researchers to using Graphics Processing Units (GPUs) to process high frequency algorithmic trades. This got me thinking. How could a GPU help in the processing of trading ticks? What impact does a GPU have on mining cryptocurrencies? How difficult is it to actually harness the processing power of a GPU to program crypto mining and high frequency trading? To answer these questions, we first need to understand the processing power of a GPU and how it has exploded in the last decade.

Back in the days when I was still in my early twenties and still an avid PC gamer, we saw a surge in the hardware demands of new games. This was because games were no longer just providing relaxation time to the gamer; they were striving to provide a completely immersive gaming experience.

Game development companies started giving us more open-world content and secondary in-game missions, which not only lengthened the time needed to finish a game but also gave the user a more immersive experience of it. And that is without counting the improvement in graphics, which made the games themselves feel more alive. Take, for example, any game in the Assassin's Creed series. This is a third-person action-adventure series in which the main mission is always to work your way up the enemy ranks, gather more information and finally find the location of the main target and put an end to him. A gamer can simply follow that path, and that would be enough to complete the game. However, it would be a pity to miss out on the many side missions each game in the series offers. The producers do a wonderful job of creating a beautiful, graphically appealing, near-to-life open world (since each game is nearly always set in an actual city).

This improvement in games created space for GPUs to be manufactured as dedicated modules in a PC or laptop. We started seeing GPU manufacturers such as Nvidia, ATI, EVGA and MSI produce graphics cards which connected to the PCI Express motherboard bus, roughly 2 to 3 times faster than the earlier PCI and ISA buses. Apart from that, GPUs now draw their power directly from the power supply rather than through the motherboard. What this did, in essence, was let graphics-hungry programs take advantage of the dedicated GPU rather than rely on the CPU's processing power to run them. Dedicated GPUs are autonomous in the sense that they have their own processing cores and memory, independent of the machine's main memory.

The architecture of a GPU is designed to handle many compute-intensive, memory-intensive processes, which makes it the ideal hardware for data-intensive workloads. But what happens when the user is not running a game or a graphically intensive program such as post-production video editing or 3D rendering? Most of the time the GPU's power sits under-utilized. This is where using the GPU for algorithmic trading and crypto mining comes in.

The two main criteria for algorithmic trading are the speed at which the same set of computations can be performed on many different sets of data, and programmability. On both counts the CPU alone is not a suitable component for the job. Running data processes on a GPU delivers a higher standard of processing and quicker throughput, so that by combining multi-core CPU processing with GPU performance we can get the best outcome from a machine learning process.

Programming for a GPU is, however, an intensive task. It is not simple to write a backtesting process that makes good use of GPU resources; in particular, double loops and random access patterns do not map well onto a GPU. For this reason, most of the batches sent to the GPU need to be pre-calculated on the CPU before being passed on. Nvidia was one of the first companies to make its GPU hardware easier to program, through its invention of CUDA (Compute Unified Device Architecture). CUDA lets developers write GPU code in familiar languages such as C and C++ (with bindings for environments like MATLAB and Python): when a program running on the CPU invokes a GPU kernel, many copies of that same kernel are distributed as threads across the GPU's multiprocessors and executed in parallel. This concept has revolutionized the way trades and trading computations have been done over the past decade. Using a GPU for high frequency trading keeps computation latency low and allows a strategy to react to every minimal price movement in a highly volatile market. This caters for the demands of high frequency trading, where the execution of computerized trading strategies is characterized by extremely short position holding periods.
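To make the idea of a CPU-launched GPU kernel concrete, here is a minimal CUDA sketch, purely illustrative and not taken from any real trading system: the host prepares an array of tick prices, copies it to device memory and launches one kernel, and the GPU runs many copies of that kernel as threads spread across its multiprocessors, each computing the log return of a single tick. The data, sizes and function names are my own assumptions for the example.

```cuda
// Illustrative sketch only: one GPU thread per tick computes a log return.
// The host (CPU) prepares the data, the GPU executes many kernel copies.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void logReturns(const float* prices, float* returns, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;    // global thread index
    if (i > 0 && i < n) {
        returns[i] = logf(prices[i] / prices[i - 1]); // one tick per thread
    }
}

int main() {
    const int n = 1 << 20;                       // ~1M ticks (illustrative)
    size_t bytes = n * sizeof(float);

    float* hPrices = (float*)malloc(bytes);
    float* hReturns = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) hPrices[i] = 100.0f + 0.01f * (i % 500);

    float *dPrices, *dReturns;
    cudaMalloc((void**)&dPrices, bytes);
    cudaMalloc((void**)&dReturns, bytes);
    cudaMemcpy(dPrices, hPrices, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                           // threads per block
    int blocks = (n + threads - 1) / threads;    // enough blocks to cover n
    logReturns<<<blocks, threads>>>(dPrices, dReturns, n);

    cudaMemcpy(hReturns, dReturns, bytes, cudaMemcpyDeviceToHost);
    printf("return[1] = %f\n", hReturns[1]);

    cudaFree(dPrices); cudaFree(dReturns);
    free(hPrices); free(hReturns);
    return 0;
}
```

Even in this toy example the structure mirrors the point made above: the CPU does the set-up and data movement, while the GPU handles the embarrassingly parallel, loop-free part of the computation.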

The success of a high frequency trading algorithm depends on its ability to react to a change in the financial situation faster than others. Sometimes the change occurs so quickly that a new term has been coined for this kind of trading: Ultra High Frequency Trading. It allows traders to exploit minimal changes in the financial data so that they can extract the best possible profit.

The most promising machine learning algorithm to run on GPUs is the Support Vector Machine (SVM), which can be conveniently adapted to parallel architectures. Over the last decade, many works have produced programs that accelerate the time-consuming SVM training phase on many-core GPUs.

In a paper on a hierarchical decomposition algorithm for support vector machine training, J. Vanek and colleagues introduced a new approach to GPU-based SVM training which they call Optimized Hierarchical Decomposition SVM (OHD-SVM). It uses an iterative hierarchical decomposition algorithm that allows the kernel matrix values to be calculated with matrix-matrix multiplication. The biggest difference was on the largest datasets, where they achieved speed-ups of up to 12 times over the fastest previously published GPU implementation.
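To illustrate the underlying idea in a very simplified way (this is not a reproduction of the OHD-SVM code), the sketch below assigns one CUDA thread to each entry K[i][j] = exp(-gamma * ||x_i - x_j||^2) of an RBF kernel matrix. An optimized solver such as the one described above would obtain the dot-product term from a tuned matrix-matrix multiplication routine rather than the per-thread loop kept here for readability; all names, sizes and parameters are illustrative.

```cuda
// Simplified illustration: one thread per kernel matrix entry.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void rbfKernelMatrix(const float* X,  // n x d features, row-major
                                float* K,        // n x n kernel matrix
                                int n, int d, float gamma) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    if (i < n && j < n) {
        float dist2 = 0.0f;
        for (int k = 0; k < d; ++k) {                // squared distance
            float diff = X[i * d + k] - X[j * d + k];
            dist2 += diff * diff;
        }
        K[i * n + j] = expf(-gamma * dist2);
    }
}

int main() {
    const int n = 1024, d = 32;                      // illustrative sizes
    const float gamma = 0.5f;

    float* hX = (float*)malloc(n * d * sizeof(float));
    for (int i = 0; i < n * d; ++i) hX[i] = (float)(i % 7) * 0.1f;

    float *dX, *dK;
    cudaMalloc((void**)&dX, n * d * sizeof(float));
    cudaMalloc((void**)&dK, (size_t)n * n * sizeof(float));
    cudaMemcpy(dX, hX, n * d * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16);                              // 256 threads per block
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    rbfKernelMatrix<<<grid, block>>>(dX, dK, n, d, gamma);

    float k01;                                       // fetch one value to check
    cudaMemcpy(&k01, dK + 1, sizeof(float), cudaMemcpyDeviceToHost);
    printf("K[0][1] = %f\n", k01);

    cudaFree(dX); cudaFree(dK); free(hX);
    return 0;
}
```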

Others have also programmed algorithms for GPUs to help in the Deep Learning field. Some developed general principles for massively parallelizing unsupervised learning tasks using graphics processors and showed that these principles can be applied to successfully scale up learning algorithms for both deep belief networks (DBNs) and sparse coding. Their implementation of DBN learning was up to 70 times faster than a dual-core CPU implementation for large models.

The improvement in GPUs and their processing power has not only revolutionized the gaming industry but has also driven the adoption of high frequency algorithmic trading in the Fintech industry. GPU processing power is no longer limited to games and graphical rendering; we can now use the full potential of such a component.