
What is reinforcement learning?
Gizem Baruk | 28.03.2022

Reinforcement learning is a form of machine learning in which a software agent learns to perform a task in a dynamic environment through repeated trial-and-error interactions. The agent receives rewards for certain actions. In contrast to supervised and unsupervised learning, no prepared training data set is required. Instead, the agent independently runs through numerous training runs within a simulation environment until it reliably delivers good results. The system is never shown the correct results; it only receives reward signals that guide it. The aim of the training is for the artificial intelligence to be able to solve very complex control problems autonomously, without prior human knowledge. Reinforcement learning is often combined with artificial neural networks, which are loosely modeled on the human brain.

Computer games provide an ideal basis for understanding reinforcement learning. They generally offer various control options, a simulation environment and the ability to influence that environment. In addition, games usually pose a problem or complex task that must be solved. The point systems found in most games also resemble the reward system of reinforcement learning.

How does reinforcement learning work?
Reinforcement learning covers various methods in which a software agent learns a strategy independently. The aim of the learning process is to maximize the total reward collected in the simulation environment. During training, the agent carries out an action within the environment at each time step and receives feedback for it. The agent is not told in advance which action is best; it only receives a reward in certain situations. Over time, the agent learns to assess the consequences of its actions in different situations in the simulation environment. This allows it to develop a long-term strategy.
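To make this interaction loop more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from a particular library: the `Corridor` environment, its five positions, the reward of 1.0 at the right-hand end, and the helper names `run_episode` and `random_policy`.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right, any other action moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # reward only in one specific situation
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy act once per time step and add up the rewards it receives."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment gives feedback
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def random_policy(state):
    # An untrained agent: it picks left (0) or right (1) at random.
    return random.choice([0, 1])
```

Calling `run_episode(Corridor(), random_policy)` plays one episode with an untrained agent. A learning method would gradually replace `random_policy` with a policy that reaches the goal in fewer steps.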

One common method for training a reinforcement learning system is Q-learning. The name comes from the Q function, which estimates the expected benefit of taking a given action in a given state. The aim is then to find the best possible policy. The term "policy" refers to the learned behavior of the software agent: it tells the agent which action to carry out in each situation it encounters in the learning environment.
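As a rough illustration, the following Python sketch applies tabular Q-learning to a tiny made-up task. The task (a corridor with five positions and a reward of 1.0 at the right end), the function names and the parameter values `alpha`, `gamma` and `epsilon` are all assumptions chosen for this example. Real systems usually replace the table with a neural network.

```python
import random

N_STATES = 5      # positions 0..4; position 4 is the goal (assumed toy task)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy transition: reward 1.0 only when the goal is reached."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q table: one row per state
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """The learned policy: the highest-valued action in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Here `greedy_policy(train())` reads the learned behavior off the Q table, one action per state. The discount factor `gamma` is what lets the reward at the goal flow back to earlier states, so the agent learns a long-term strategy rather than reacting only to immediate rewards.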

What fields of application are there?
Neural networks trained with reinforcement learning can encode complex behaviors. This offers an alternative approach to problems that are difficult or impossible to handle using conventional methods. For example, in autonomous driving, the neural network can replace the driver and use multiple sensors, such as camera images and LiDAR measurements, to decide how to turn the steering wheel.

Typical fields of application are problems with the following characteristics:
• The task can be simulated
• The system has to develop its own strategies for finding solutions
• Classic engineering methods are not effective
• Complex solution steps need to be found and optimized

Practical fields of application:
• Autonomous driving
• Traffic light control to minimize traffic jams
• Smart grids
• Factory automation
• Control of robots
• Optimization of supply chain or warehousing
• Dynamic pricing to maximize profits
• Learning to play a computer or console game
• Etc.

What are the benefits of reinforcement learning?
Reinforcement learning offers several advantages over other machine learning methods. It can find solutions to complex problems without prior human knowledge or initial data. It resembles the natural learning process and can produce solutions that humans would not have come up with. In principle, reinforcement learning can be applied to almost any intellectual task, provided the task can be simulated and success can be expressed as a reward.
