Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector

Koshelev N.M., Tarlykov A.V., Preobrazhenskiy A.P.

UDC 004.852+336.761

Abstract

This paper examines the application of deep reinforcement learning to the dynamic optimization of investment portfolios composed of energy-sector assets. The portfolio management problem is formalized as a Markov decision process (MDP). The PPO, DDPG, and SAC algorithms are analyzed, with emphasis on how their mathematical formulations differ. The central finding is that no algorithm is universally superior. A2C and PPO consistently outperform SAC in trending markets (cumulative return +12.5% vs. +4.5%), while SAC leads during high-volatility crises (Sharpe ratio 1.18 vs. 0.61 for the Buy & Hold baseline; maximum drawdown −19.3% vs. −38.2%). The paper also discusses interpretability via SHAP and LIME, the non-stationarity of financial environments, and the practical barriers between backtesting and live trading.
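The differences between PPO, DDPG, and SAC come down largely to their training objectives. For reference, the standard forms from the original algorithm papers are written below in our own notation. The authors' exact variants may differ. The notation is as follows: θ and φ are actor parameters, w are critic parameters, and primes or bars mark slowly updated target networks. \hat{A}_t is an advantage estimate, ε is the PPO clip range, α is the SAC entropy temperature, γ is the discount factor, and \mathcal{D} is a replay buffer.

```latex
% PPO (on-policy): clipped surrogate objective
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right]

% DDPG (off-policy, deterministic actor \mu_\phi): critic target and actor gradient
y = r + \gamma\, Q_{w'}\!\left(s', \mu_{\phi'}(s')\right), \qquad
\nabla_\phi J \approx \mathbb{E}_{s \sim \mathcal{D}}\!\left[\nabla_a Q_w(s,a)\big|_{a=\mu_\phi(s)}\,\nabla_\phi \mu_\phi(s)\right]

% SAC (off-policy, stochastic actor): maximum-entropy objective and soft critic target
J(\pi) = \sum_t \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\!\left[r(s_t,a_t) + \alpha\,\mathcal{H}\!\left(\pi(\cdot \mid s_t)\right)\right], \qquad
y = r + \gamma\,\mathbb{E}_{a' \sim \pi(\cdot \mid s')}\!\left[Q_{\bar{w}}(s',a') - \alpha \log \pi(a' \mid s')\right]
```

In short, PPO limits how far each update can move the policy. DDPG learns a deterministic allocation from a replay buffer. SAC adds an entropy bonus that keeps its policy stochastic.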

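A small sketch can make the MDP reward and the headline metrics (cumulative return, Sharpe ratio, maximum drawdown) concrete. It rests on several assumptions:

- The portfolio is long-only, with weights obtained by a softmax over the actor's raw outputs.
- The reward is the log growth of wealth net of a proportional transaction cost.
- The defaults are 252 trading periods per year and a zero risk-free rate.

All function names and the `cost_rate` parameter are hypothetical illustrations, not the authors' code. The sketch does not reproduce the reported numbers.

```python
import math
from statistics import mean, stdev
from typing import Sequence

# Hypothetical helpers: an illustrative MDP reward and standard backtest
# metrics, not the authors' implementation.


def weights_from_action(action: Sequence[float]) -> list[float]:
    """Assumption: map raw actor outputs to long-only weights summing to 1 via softmax."""
    m = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def step_reward(
    weights: Sequence[float],
    prev_weights: Sequence[float],
    asset_returns: Sequence[float],
    cost_rate: float = 0.001,  # assumed proportional cost per unit of turnover
) -> tuple[float, float]:
    """One MDP transition: returns (log-growth reward, net simple return).

    Simplification: prev_weights are used as-is; the drift of weights
    caused by the previous period's price moves is ignored.
    """
    gross = sum(w * r for w, r in zip(weights, asset_returns))
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    net = gross - cost_rate * turnover
    return math.log1p(net), net


def cumulative_return(period_returns: Sequence[float]) -> float:
    """Compounded return over the whole test period."""
    wealth = 1.0
    for r in period_returns:
        wealth *= 1.0 + r
    return wealth - 1.0


def sharpe_ratio(
    period_returns: Sequence[float],
    periods_per_year: int = 252,  # assumption: daily data
    risk_free: float = 0.0,  # annual rate, spread evenly over periods
) -> float:
    """Annualized Sharpe ratio from per-period simple returns (needs >= 2 points)."""
    excess = [r - risk_free / periods_per_year for r in period_returns]
    sd = stdev(excess)
    if sd == 0.0:
        return math.nan  # undefined for a constant return series
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(period_returns: Sequence[float]) -> float:
    """Largest peak-to-trough fall of the wealth curve, as a negative fraction."""
    wealth = peak = 1.0
    worst = 0.0
    for r in period_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


if __name__ == "__main__":
    demo = [0.10, -0.20, 0.05]
    print(round(cumulative_return(demo), 4))  # -0.076
    print(round(max_drawdown(demo), 4))  # -0.2
```

Two design choices matter here. The log-growth reward makes per-step rewards additive, so maximizing the episode return is equivalent to maximizing terminal log wealth. The softmax guarantees a valid long-only allocation whatever the actor outputs.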
Koshelev Nikita Mikhailovich

Voronezh Institute of High Technologies

Voronezh, Russia

Tarlykov Alexander Vyacheslavovich

Voronezh Institute of High Technologies

Voronezh, Russia

Preobrazhenskiy Andrey Petrovich
Doctor of Engineering Sciences, Full Professor

Voronezh Institute of High Technologies

Voronezh, Russia

Keywords: deep reinforcement learning, portfolio optimization, Markov decision process, PPO, DDPG, SAC, regime dependence

For citation: Koshelev N.M., Tarlykov A.V., Preobrazhenskiy A.P. Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector. Bulletin of the Voronezh Institute of High Technologies. 2026;20(1). Available from: https://vestnikvivt.ru/ru/journal/pdf?id=1468 (In Russ.).

Received 11.03.2026

Revised 30.03.2026

Accepted 30.03.2026

Published 31.03.2026