Frontiers of Recommender Systems Series: An Overview of Reinforcement Learning Methods

Key papers

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System, Ant Financial, ICML 2019
Large-scale Interactive Recommendation with Tree-structured Policy Gradient
TPGR, a reinforcement learning model based on tree-structured policy gradient (related coverage)
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning, KDD 2018
Deep Reinforcement Learning for Page-wise Recommendations, JD, RecSys 2018
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application, KDD 2018
DRN: A Deep Reinforcement Learning Framework for News Recommendation, Microsoft Research Asia and Penn State, WWW 2018
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology, IJCAI 2019
Top-K Off-Policy Correction for a REINFORCE Recommender System, WSDM 2019
Deep Reinforcement Learning for Sponsored Search Real-time Bidding, Alibaba, DQN

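Main contributions:
- The user behavior model and the reward function are learned jointly within a single mini-max framework.
- With this learned model serving as the environment, a cascading DQN method is developed that solves the combinatorial action-selection problem (choosing a set of items to display) with linear complexity.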
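
The linear-complexity claim can be illustrated with a minimal sketch of greedy, level-by-level selection. This is not the paper's implementation: `toy_q` is a hypothetical stand-in for the learned Q-networks, the item tuples are made up, and a single scoring function that sees the already chosen items replaces the separate per-level Q-functions of a real cascade.

```python
def toy_q(state, chosen, item):
    """Hypothetical stand-in for a learned Q-network.

    Scores an item by its value scaled by the state, minus a penalty
    for each already chosen item in the same category.
    """
    value, category = item
    penalty = sum(1.0 for _, c in chosen if c == category)
    return state * value - penalty


def cascade_select(state, candidates, k, q_fn=toy_q):
    """Pick k items greedily, one cascade level at a time.

    Each level scores only the remaining candidates, so the number of
    Q evaluations is at most k * len(candidates) instead of one
    evaluation per k-item subset.
    """
    chosen, remaining, evaluations = [], list(candidates), 0
    for _ in range(k):
        scores = [q_fn(state, chosen, item) for item in remaining]
        evaluations += len(scores)
        best = max(range(len(remaining)), key=scores.__getitem__)
        chosen.append(remaining.pop(best))
    return chosen, evaluations


# Made-up candidates: (value, category).
candidates = [(3.0, "a"), (2.9, "a"), (2.0, "b"), (1.0, "c")]
slate, n_evals = cascade_select(1.0, candidates, k=2)
print(slate, n_evals)
```

With 1,000 candidates and a slate of 5 items, this loop needs 4,990 Q evaluations, whereas scoring every 5-item subset would need roughly 8 × 10^12. The price of the greedy scheme is that each level commits to its choice without looking ahead.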
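Ideas worth borrowing:
- The five-tuple formulation of the reinforcement learning problem.
- The adversarial training setup. G generates the probability of the user's current action from the user's historical behavior sequence; D tries to distinguish the generated behavior sequences from the real ones.
- Two ways of modeling the behavior sequence: LSTM and position weight.
- The greedy, cascading DQN scheme for selecting items.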
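
To make the position-weight idea and the role of G more concrete, here is a rough sketch under stated assumptions. The state is built as position-weighted sums of the last m item feature vectors, and the action probability is a softmax over a toy reward. The function names, the dot-product reward, and the fixed weights are all hypothetical; in the paper the weights and the reward function are learned, and the softmax form is only assumed here for illustration.

```python
import math


def position_weight_state(history, weights):
    """Build a state from the last m clicked items.

    history: m feature vectors, oldest first.
    weights: m rows, one weight per position for each output column.
    Returns the concatenated weighted sums, one block per column.
    """
    m, d = len(history), len(history[0])
    state = []
    for col in range(len(weights[0])):
        for dim in range(d):
            state.append(
                sum(weights[pos][col] * history[pos][dim] for pos in range(m))
            )
    return state


def choice_probabilities(state, items):
    """Softmax over toy rewards (dot product of state and item features).

    Assumes a single weight column, so state and items share a dimension.
    """
    rewards = [sum(s * f for s, f in zip(state, item)) for item in items]
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(r - top) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # oldest to newest
weights = [[0.2], [0.3], [0.5]]  # hypothetical: recent positions weigh more
state = position_weight_state(history, weights)
probs = choice_probabilities(state, [[1.0, 0.0], [0.0, 1.0]])
print(state, probs)
```

Compared with an LSTM, the position-weight variant only looks at a fixed window of recent items, which keeps the state cheap to compute at the cost of ignoring older history.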