Our client was a quantitative hedge fund that uses mathematical models to evaluate stocks and their performance. They wanted a more profitable trading strategy that would maximize ROI, so we built a reinforcement learning (RL) agent, along with a market simulator to train and evaluate it.
The goal of this study was to understand how the strengths and weaknesses of different indicators shift in volatile markets, so that an RL agent could learn a trading strategy from these changing signals and then trade stocks on the user's behalf.
The ideal trading agent should be fast enough to support high-frequency trading, yet well optimized to maximize the risk-reward ratio.
The Intellekt AI team started by analyzing the signals and their influence on stock price movements. Using the most useful signals, we trained an RL agent that served as a reference model for further improvements. With this model, we analyzed the importance of each signal for the agent's various actions and, based on those observations, further refined the signal selection.
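This kind of signal-importance analysis can be sketched with a simple permutation test: scramble one signal at a time and measure how often the agent's actions change. The `toy_policy` below is a hypothetical stand-in for the trained agent, not the client's actual model, and the signal layout is an illustrative assumption:

```python
import numpy as np

def signal_importance(policy, observations, n_shuffles=5, seed=0):
    """Rank input signals by how much scrambling each one changes
    the policy's actions (a permutation-importance sketch)."""
    rng = np.random.default_rng(seed)
    baseline = policy(observations)          # actions on unperturbed data
    n_signals = observations.shape[1]
    importance = np.zeros(n_signals)
    for j in range(n_signals):
        for _ in range(n_shuffles):
            shuffled = observations.copy()
            shuffled[:, j] = rng.permutation(shuffled[:, j])  # destroy signal j only
            # fraction of actions that flip when signal j is scrambled
            importance[j] += np.mean(policy(shuffled) != baseline)
    return importance / n_shuffles

# Toy policy: buy (1) when the first signal is positive, else hold (0).
toy_policy = lambda obs: (obs[:, 0] > 0).astype(int)
obs = np.random.default_rng(1).normal(size=(500, 3))
scores = signal_importance(toy_policy, obs)
```

Signals whose scrambling barely changes the agent's actions score near zero and are candidates for removal from the observation space.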
During our analysis, we observed the agent's trading patterns under various market conditions and found that it would keep holding a stock even after its profit started to shrink. To overcome that shortcoming, we reworked the reward calculation to encourage the model to capture as much profit as possible. We also shaped the rewards so that the agent learns to cut losses quickly while continuing to hold positions that remain profitable.
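One simple way to sketch that reward shaping (the actual formulation is the client's) is to penalize profit given back from the running peak more heavily than unrealized gains are rewarded, and to pay a bonus for realizing a profit. The function name and the `drawdown_penalty` weight are illustrative assumptions:

```python
def shaped_reward(pnl, peak_pnl, action, drawdown_penalty=2.0):
    """Asymmetric reward sketch: giving back profit from the running
    peak is penalized more heavily than gains are rewarded, nudging
    the agent to take profit instead of holding past the top.
    `pnl` is current unrealized profit, `peak_pnl` the best seen so far."""
    giveback = max(peak_pnl - pnl, 0.0)   # profit surrendered from the peak
    reward = pnl - drawdown_penalty * giveback
    if action == "sell" and pnl > 0:
        reward += pnl                     # bonus for locking in a gain
    return reward
```

Under this scheme, holding at the peak is rewarded in full, while holding through a profit giveback scores worse than selling, which is the behavior we wanted the agent to learn.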
This way, we were able to drastically improve the profit-to-drawdown ratio. We developed a stock simulator to verify the model's performance on unseen data; it accounted for order delays, price slippage, commissions, and a fill model.
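A minimal sketch of such an execution model, assuming a fixed slippage in basis points and a per-share commission (the values and names here are illustrative, not the client's actual cost model):

```python
from dataclasses import dataclass

@dataclass
class Fill:
    price: float       # execution price after slippage
    commission: float  # total commission charged

def simulate_fill(side, delayed_price, qty,
                  slippage_bps=5.0, commission_per_share=0.005):
    """Fill an order at the price observed after the latency delay,
    worsened by slippage; buys pay up, sells receive less."""
    slip = delayed_price * slippage_bps / 10_000
    price = delayed_price + slip if side == "buy" else delayed_price - slip
    return Fill(round(price, 4), round(commission_per_share * qty, 4))
```

Charging the agent these frictions during training keeps it from learning strategies that only look profitable at the mid-price.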
Over several iterations, we observed and optimized the RL agent as it traded throughout the day. In the end, the agent generated a 50% ROI and a 3:1 profit-to-drawdown ratio over six months of trading. We calculated these results on unseen data to ensure the returns did not come from overfitting to historical prices.
Tech stack: PyTorch, Stable Baselines, OpenAI Gym, Python multiprocessing