SCIENTIFIC JOURNAL BULLETIN OF THE VORONEZH INSTITUTE OF HIGH TECHNOLOGIES
Online media
ISSN 2949-4443

Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector

Koshelev N.M., Tarlykov A.V., Preobrazhenskiy A.P.

UDC 004.852+336.761

This paper examines deep reinforcement learning for dynamic investment portfolio optimization applied to energy-sector assets. The portfolio management problem is formalized as a Markov decision process (MDP). The PPO, DDPG, and SAC algorithms are analyzed with emphasis on the mechanics behind their mathematical differences. The central finding is that no algorithm is universally superior: A2C and PPO consistently outperform SAC in trending markets (cumulative return +12.5% vs. +4.5%), while SAC leads during high-volatility crises (Sharpe ratio 1.18 vs. 0.61 for Buy & Hold; maximum drawdown −19.3% vs. −38.2%). The paper also discusses interpretability via SHAP and LIME, the non-stationarity of financial environments, and the practical barriers between backtesting and live trading.
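
The abstract compares strategies by cumulative return, Sharpe ratio, and maximum drawdown. As a rough illustration of how such figures are commonly computed from a series of per-period returns, here is a minimal sketch. The function names, the zero risk-free rate, and the 252-period annualization are assumptions made for this example; the abstract does not state the authors' exact conventions.

```python
import math


def cumulative_return(returns):
    """Compound per-period simple returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio using the sample standard deviation.

    risk_free is a per-period rate; both defaults are assumptions.
    Requires at least two observations.
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (a value <= 0)."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Under these conventions, a maximum drawdown of −19.3% means the portfolio value fell at most 19.3% below its running peak during the evaluation window.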

Koshelev Nikita Mikhailovich

Voronezh Institute of High Technologies

Voronezh, Russia

Tarlykov Alexander Vyacheslavovich

Voronezh Institute of High Technologies

Voronezh, Russia

Preobrazhenskiy Andrey Petrovich
Doctor of Engineering Sciences, Full Professor

Voronezh Institute of High Technologies

Voronezh, Russia

Keywords: deep reinforcement learning, portfolio optimization, Markov decision process, PPO, DDPG, SAC, regime dependence

For citation: Koshelev N.M., Tarlykov A.V., Preobrazhenskiy A.P. Deep Reinforcement Learning for Investment Portfolio Optimization: Application in Asset Management of the Energy Sector. Bulletin of the Voronezh Institute of High Technologies. 2026;20(1). Available from: https://vestnikvivt.ru/ru/journal/pdf?id=1468 (In Russ.).

Received 11.03.2026

Revised 30.03.2026

Accepted 30.03.2026

Published 31.03.2026