Reinforcement learning (RL), supervised learning, and unsupervised learning make up the three main branches of machine learning. What differentiates reinforcement learning from the other two is that it learns through trial and error and measures its progress through rewards. Supervised learning instead learns from labeled data, and unsupervised learning finds hidden structure in the data.
Reinforcement learning has been used in various applications in finance and trading, including portfolio optimization and optimal trade execution. Several features of financial problems make RL well suited to them, in some cases more so than supervised learning. Actions taken in finance and trading may have long-term effects that are not immediately measurable. Many financial problems can be framed as sequential decision problems. The environments involved may also be very large or continuous.
In reinforcement learning, an agent (the algorithm) makes a decision at each time step based on the observed state of the environment and the rewards it receives. It learns through trial and error, with the overall goal of selecting actions that maximize the total reward in the long run. Balancing exploration (trying unknown actions) with exploitation (using knowledge the agent already has) can improve the performance of reinforcement learning algorithms.
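Epsilon-greedy action selection is one standard way to balance exploration and exploitation. It is not necessarily the approach used in any particular finance system. The function name and its parameters below are illustrative assumptions. The function takes the agent's current value estimate for each action. With probability `epsilon` it explores by picking a random action, and otherwise it exploits by picking the best-valued action.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given a list of estimated action values.

    With probability epsilon, explore: choose a uniformly random action.
    Otherwise, exploit: choose the action with the highest estimated value
    (ties go to the lowest index).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical estimates for three actions (e.g. buy, hold, sell).
estimates = [0.1, 0.5, 0.2]
print(epsilon_greedy(estimates, epsilon=0.0))                    # pure exploitation
print(epsilon_greedy(estimates, epsilon=0.1, rng=random.Random(42)))
```

With `epsilon=0.0` the agent always exploits and returns action 1. Larger values trade short-term reward for more information about actions it has rarely tried.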
Delayed rewards are an important concept in reinforcement learning. Some actions may not provide a large reward right away but may yield large rewards in the future. The best action therefore depends not only on the immediate reward but also on the future rewards obtained as a result of taking that action.
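One common way to weigh immediate against future rewards is the discounted return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., where gamma is a discount factor between 0 and 1. The discount factor is a standard RL convention and is not mentioned above. The sketch below compares two hypothetical reward sequences. One pays off immediately, and the other pays a small amount now and a large amount later.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the time step."""
    total = 0.0
    # Iterate backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical reward sequences for two alternative actions.
quick_win = [5.0, 0.0, 0.0, 0.0]   # large reward now, nothing later
patient = [1.0, 0.0, 0.0, 10.0]    # small reward now, large reward later

print(discounted_return(quick_win, 0.9))
print(discounted_return(patient, 0.9))
```

With gamma = 0.9, the patient sequence has a discounted return of about 8.29 versus 5.0 for the quick win. An agent that looked only at the first reward would pick the wrong action.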
Important elements in reinforcement learning algorithms include the following:
- Environment: model of the world in which the agent operates. The agent has no direct control over the environment, and it may be fully or partially observable to the agent. In the case of finance applications, the environment could be the market.
- State: current situation of the agent. The agent uses the information in the state to decide which action to take next. For example, the state could contain variables that characterize the market and the agent's position, such as the agent's inventory of shares, market quality measures, and price volatility.
- Policy: behavior function of the agent; a mapping from states of the environment to the probabilities of selecting actions that can be taken in those states.
- Reward: numerical value received by the agent in response to its actions. For example, in finance applications the rewards could reflect the change in profit made by taking a particular action.
- Value function: measure of the total expected (typically discounted) reward an agent will accumulate in the future, starting from a particular state and following its policy. Estimating the value function is central to many RL methods, because it tells the agent which states lead to the greatest reward in the long run.
- Q value: similar to the value function. It is a measure of the total expected future reward given that the agent starts from a particular state, takes a particular action, and follows its policy thereafter.
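The elements above can be connected in a minimal tabular Q-learning sketch. Q-learning is one classic RL algorithm, chosen here for brevity. The example runs on a hypothetical five-state toy environment, not a market model. Here `step` plays the role of the environment, the state is an integer position, the policy is epsilon-greedy over the Q table, and the reward is 1 for reaching the goal state. All names and parameter values are illustrative assumptions.

```python
import random

N_STATES = 5        # hypothetical toy states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GOAL = N_STATES - 1

def step(state, action):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy policy with random tie-breaking among the best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2,
               max_steps=100, seed=0):
    rng = random.Random(seed)
    # Q table: one row per state, one estimated Q value per action.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best Q value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy for each non-terminal state; the value of a state is max(q[s]).
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

With these settings, the learned greedy policy moves right (action 1) from every non-terminal state. The Q values show the delayed-reward effect: moving right from state 0 earns nothing immediately but still has a positive Q value, because it leads toward the reward.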
Research continues both into new application areas for RL in finance and into improvements to existing RL solutions. Some concrete examples of finance problems to which reinforcement learning has been successfully applied include the following:
- Portfolio optimization. The goal of portfolio optimization is to construct an optimal portfolio with respect to the factors that should be maximized or minimized (e.g. expected return, financial risk), while taking any constraints into account. Various types of reinforcement learning algorithms have been successfully applied to portfolio optimization, including Deep Q-learning, policy gradient methods, Proximal Policy Optimization (PPO), and Asynchronous Advantage Actor-Critic (A3C).
- Optimized trade execution. The goal of optimized trade execution is to buy or sell a specific number of shares of a stock within a fixed time period. When selling, the aim is to maximize the revenue received; when buying, it is to minimize the capital spent. Reinforcement learning algorithms have been used to build trading strategies and systems for this problem and have proven well suited to it, with RL trading systems showing improved performance over other types of solutions.
- Market-making. A market maker buys and sells stocks with the goal of maximizing the profit from these trades while minimizing inventory risk. Reinforcement learning has been used successfully to learn price-setting strategies that maximize profit and minimize inventory risk.
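The dual market-making objective above has to be expressed as a single numerical reward. One way to do this is sketched below under two assumptions: positions are valued at the mid price (mark-to-market), and inventory risk is represented by a quadratic penalty on the inventory held. A quadratic penalty is one common modeling choice, not necessarily the one used in the works referred to here. The function name and the `inventory_penalty` parameter are hypothetical.

```python
def market_making_reward(cash_before, cash_after, inv_before, inv_after,
                         mid_before, mid_after, inventory_penalty=0.01):
    """Hypothetical per-step reward for a market-making agent.

    Reward = change in mark-to-market portfolio value (cash plus inventory
    valued at the mid price) minus a quadratic penalty on the inventory held
    after the step, which discourages accumulating large positions.
    """
    value_before = cash_before + inv_before * mid_before
    value_after = cash_after + inv_after * mid_after
    pnl_change = value_after - value_before
    return pnl_change - inventory_penalty * inv_after ** 2

# Example: the agent buys 10 shares at 99.95 (paying 999.5) while the mid stays at 100.
reward = market_making_reward(cash_before=0.0, cash_after=-999.5,
                              inv_before=0, inv_after=10,
                              mid_before=100.0, mid_after=100.0)
print(reward)
```

In this example the trade gains 0.5 mark-to-market, but the penalty of 0.01 * 10^2 = 1.0 makes the reward -0.5. Raising `inventory_penalty` pushes a learned quoting strategy toward keeping inventory close to zero, at the cost of some profit.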
Historically, reinforcement learning emerged from the merging of two areas. The first is learning through trial and error, inspired by how animals learn. The second is optimal control, which uses value functions and dynamic programming in its solutions. Reinforcement learning is applicable to problems in which a process needs to be optimized and a sequence of decisions needs to be learned, but no single correct sequence is known in advance. In addition, it must be possible to build a simulation that models the environment in which the model (algorithm) will operate. This has been our experience at Treelogic (www.treelogic.com), both in research projects and in solutions for clients.
In conclusion, the use cases of reinforcement learning in finance are growing, and RL solutions have been shown to outperform previous approaches on some types of finance problems. Treelogic continues its research and experimentation in this area, developing new reinforcement learning solutions and applying them to areas of finance in which RL has not previously been applied.