Keywords: deep reinforcement learning, portfolio optimization, Markov decision process, PPO, DDPG, SAC, regime dependence
Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector
UDC 004.852+336.761
This paper examines the application of deep reinforcement learning to the dynamic optimization of an investment portfolio of energy-sector assets. The portfolio management problem is formalized as a Markov decision process (MDP). The PPO, DDPG, and SAC algorithms are analyzed, with emphasis on the mathematical mechanisms that distinguish them. The central finding is that no algorithm is universally superior: A2C and PPO consistently outperform SAC in trending markets (cumulative return +12.5% vs. +4.5%), while SAC performs best during high-volatility crises (Sharpe ratio 1.18 vs. 0.61 for Buy & Hold; maximum drawdown −19.3% vs. −38.2%). The paper also discusses interpretability via SHAP and LIME, the non-stationarity of financial environments, and the practical barriers between backtesting and live trading.
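The MDP formalization of portfolio management can be made concrete with a minimal environment. The sketch below is a hypothetical illustration, not the authors' environment. It assumes a long-only, fully invested portfolio; a state made of the last `window` vectors of price relatives plus the current weights; a softmax that maps an agent's unconstrained action to portfolio weights; and a reward equal to the log growth of portfolio value net of a proportional transaction cost `cost_rate`. The class name `PortfolioMDP` and all of its parameters are illustrative.

```python
import numpy as np


class PortfolioMDP:
    """Hypothetical long-only portfolio MDP used only for illustration.

    State:  last `window` price-relative vectors plus current weights.
    Action: unconstrained real scores, mapped to weights by a softmax.
    Reward: log growth of portfolio value net of proportional costs.
    """

    def __init__(self, prices, window=5, cost_rate=0.001):
        prices = np.asarray(prices, dtype=float)  # shape (T, n_assets)
        # Price relatives y_t = p_t / p_{t-1}: one row per step, one column per asset.
        self.relatives = prices[1:] / prices[:-1]
        self.window = window
        self.cost_rate = cost_rate
        self.n_assets = prices.shape[1]
        self.reset()

    def reset(self):
        self.t = self.window
        self.weights = np.full(self.n_assets, 1.0 / self.n_assets)
        self.value = 1.0
        return self._state()

    def _state(self):
        history = self.relatives[self.t - self.window:self.t]
        return np.concatenate([history.ravel(), self.weights])

    def step(self, action):
        action = np.asarray(action, dtype=float)
        # Softmax keeps weights non-negative and summing to one (no shorts, no leverage).
        z = np.exp(action - action.max())
        new_weights = z / z.sum()
        # Proportional transaction cost charged on turnover from the drifted weights.
        cost = self.cost_rate * np.abs(new_weights - self.weights).sum()
        y = self.relatives[self.t]
        gross = float(new_weights @ y)
        growth = gross * (1.0 - cost)
        reward = np.log(growth)
        self.value *= growth
        # Weights drift with prices until the next rebalance.
        self.weights = new_weights * y / gross
        self.t += 1
        done = self.t >= len(self.relatives)
        return self._state(), reward, done


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prices = 100.0 * np.cumprod(1.0 + 0.01 * rng.standard_normal((60, 3)), axis=0)
    env = PortfolioMDP(prices)
    env.reset()
    done, total_reward = False, 0.0
    while not done:
        _, reward, done = env.step(np.zeros(3))  # zero scores give equal weights
        total_reward += reward
    print(f"terminal wealth {env.value:.4f}, exp(sum of rewards) {np.exp(total_reward):.4f}")
```

With this reward, the sum of per-step rewards equals the log of terminal wealth, so maximizing the episode return is equivalent to maximizing log growth. The softmax lets any continuous-action agent (PPO, DDPG, or SAC) emit unconstrained vectors while the environment enforces the budget constraint.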
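The mathematical differences among the three algorithms come down to their training objectives. For reference, the standard published forms are summarized here in textbook notation; these are not necessarily the exact variants or hyperparameters used in this study. PPO is on-policy and limits each update by clipping the probability ratio $r_t(\theta)$ to $[1-\epsilon, 1+\epsilon]$ around the advantage estimate $\hat{A}_t$. DDPG is off-policy, with a deterministic actor $\mu_\phi$ trained by differentiating the critic $Q_\psi$ through the action on states drawn from a replay buffer $\mathcal{D}$. SAC is off-policy, with a stochastic actor whose return is augmented by a policy-entropy bonus weighted by a temperature $\alpha$.

```latex
% PPO: clipped surrogate objective
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]

% DDPG: deterministic policy gradient through the critic
\nabla_\phi J(\phi) \approx \mathbb{E}_{s \sim \mathcal{D}}\left[
  \nabla_a Q_\psi(s, a)\big|_{a=\mu_\phi(s)}\, \nabla_\phi \mu_\phi(s)\right]

% SAC: maximum-entropy objective with temperature \alpha
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[
  r(s_t, a_t) + \alpha\, \mathcal{H}\left(\pi(\cdot \mid s_t)\right)\right]
```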
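The evaluation metrics quoted in the abstract (cumulative return, Sharpe ratio, maximum drawdown) have standard definitions. The following sketch computes them from a series of periodic simple returns. It assumes daily returns annualized with 252 periods per year, a constant risk-free rate (zero by default), and the sample standard deviation. It also reports drawdown as a negative fraction, which matches the sign convention of the −19.3% figure. The function names are hypothetical, and the paper's exact conventions are not stated here.

```python
import numpy as np


def cumulative_return(returns):
    """Compounded total return of periodic simple returns."""
    return float(np.prod(1.0 + np.asarray(returns, dtype=float)) - 1.0)


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio; `risk_free` is an annual rate (assumed constant)."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    std = excess.std(ddof=1)  # sample standard deviation (a convention choice)
    if std == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a negative fraction."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    # Prepend initial capital so that a loss in the very first period counts.
    equity = np.concatenate([[1.0], equity])
    running_peak = np.maximum.accumulate(equity)
    return float((equity / running_peak - 1.0).min())


if __name__ == "__main__":
    daily = [0.01, -0.02, 0.015, -0.03, 0.02]
    print(cumulative_return(daily), sharpe_ratio(daily), max_drawdown(daily))
```

Because the Sharpe ratio is annualized by $\sqrt{252}$, the same strategy can show a different value if returns are sampled weekly or monthly. Comparisons such as 1.18 vs. 0.61 are meaningful only when both series use the same frequency and risk-free rate.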
For citation: Koshelev N.M., Tarlykov A.V., Preobrazhenskiy A.P. Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector. Bulletin of the Voronezh Institute of High Technologies. 2026;20(1). Available from: https://vestnikvivt.ru/ru/journal/pdf?id=1468 (In Russ.).
Received 11.03.2026
Revised 30.03.2026
Accepted 30.03.2026
Published 31.03.2026