Reinforcement Learning in Financial Services: Modelling Payment Switching as a Multi-Armed Bandit Problem
- 1 Department of Computer Science and Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria
- 2 Department of Digital and Computational Studies, Bates College, Lewiston, United States
- 3 Vitruvian Shield PT, LDA, Portugal
- 4 Venture Garden Group, Ikeja, Lagos, Nigeria
- 5 Department of Arts and Social Science Education, Lead City University, Nigeria
Abstract
Digital payments evolve quickly, and the systems behind them must keep improving. This study addresses that need by simulating a model for payment routing, a crucial part of the digital payment ecosystem. Industry professionals were interviewed to inform the approach, and they emphasized data randomization as a basis for effective data collection. A randomized dataset was created in Python, and three Reinforcement Learning (RL) algorithms were implemented and evaluated on it: Epsilon Greedy, Upper Confidence Bound (UCB), and Thompson Sampling. The paper adopts the Multi-Armed Bandit (MAB) framework to model payment routing as a resource allocation problem, offering a computational approach to real-world resource allocation dilemmas. Simulation removes the cost of experimenting on live transactions, so the algorithms can be compared without consequences for customers, businesses, or payment providers. Among the RL algorithms studied, UCB emerges as the most effective on this MAB problem, corroborating findings from prior research. The results suggest both that real-world problems can usefully be modeled as MAB problems and that UCB performs strongly on problems of this kind. The paper underscores the need for increased focus on non-consumer-facing aspects of the financial services industry and for cross-disciplinary research to create infrastructure and software solutions. Researchers can extend this study by applying MAB algorithms in other domains where a system must choose among several options. The simulation-based approach offers a cost-effective means of testing system performance and hypotheses across a range of industries.
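As an illustration of the setup the abstract describes, the following is a minimal sketch of payment routing as a multi-armed bandit. It is not the authors' implementation. It assumes each payment processor is an arm with a fixed, hidden success probability, that a transaction yields reward 1 on success and 0 on failure, that UCB means the standard UCB1 rule, and that Thompson Sampling uses a uniform Beta(1, 1) prior. The function names, the three success rates, and the value of `epsilon` are hypothetical.

```python
import math
import random


def choose(policy, pulls, wins, t, rng, epsilon):
    """Pick a processor (arm) index for transaction number t."""
    k = len(pulls)
    if policy == "thompson":
        # Sample a plausible success rate per arm from its Beta posterior
        # (assumed Beta(1, 1) prior) and route to the highest sample.
        return max(range(k),
                   key=lambda a: rng.betavariate(wins[a] + 1,
                                                 pulls[a] - wins[a] + 1))
    # Epsilon Greedy and UCB1 need one observation per arm first.
    for arm in range(k):
        if pulls[arm] == 0:
            return arm
    if policy == "epsilon_greedy":
        if rng.random() < epsilon:
            return rng.randrange(k)  # explore a random processor
        return max(range(k), key=lambda a: wins[a] / pulls[a])  # exploit
    if policy == "ucb":
        # UCB1: empirical mean plus a bonus that shrinks as an arm is used.
        return max(range(k),
                   key=lambda a: wins[a] / pulls[a]
                   + math.sqrt(2 * math.log(t) / pulls[a]))
    raise ValueError(f"unknown policy: {policy}")


def simulate(policy, success_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Route n_rounds simulated transactions.

    Returns (total successful transactions, transactions sent to each arm).
    """
    rng = random.Random(seed)
    k = len(success_rates)
    pulls = [0] * k  # transactions routed to each processor
    wins = [0] * k   # successful transactions per processor
    for t in range(1, n_rounds + 1):
        arm = choose(policy, pulls, wins, t, rng, epsilon)
        pulls[arm] += 1
        if rng.random() < success_rates[arm]:  # simulated Bernoulli outcome
            wins[arm] += 1
    return sum(wins), pulls


if __name__ == "__main__":
    rates = [0.60, 0.75, 0.95]  # hypothetical processor success rates
    for name in ("epsilon_greedy", "ucb", "thompson"):
        successes, pulls = simulate(name, rates)
        print(name, successes, pulls)
```

Because the outcomes are simulated, the sketch can be rerun with different seeds, rates, or round counts at no transaction cost. Which policy comes out ahead depends on those choices, so a single run should not be read as reproducing the paper's ranking.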
DOI: https://doi.org/10.3844/jcssp.2024.1519.1529
Copyright: © 2024 Ishaya Gambo, Christopher Agbonkhese, Segun Aina, Mogboluwaga Tayo Otegbayo, Johnson Bayo Adekunle, Israel Odetola, Omobola Gambo, Tolulope Oluwadare and Oluwatoni Odetola. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Multi-Armed Bandit Problem
- Reinforcement Learning
- Digital Payments
- Transaction
- Simulation