Main papers
- Generative Adversarial User Model for Reinforcement Learning Based Recommendation System, Ant Financial, ICML 2019
- Large-scale Interactive Recommendation with Tree-structured Policy Gradient
  TPGR, a reinforcement learning model based on tree-structured policy gradient (related coverage)
- Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning, KDD 2018
- Deep Reinforcement Learning for Page-wise Recommendations, JD, RecSys 2018
- A Deep Reinforcement Learning Framework for News Recommendation, Microsoft, WWW 2018
- Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology, IJCAI 2019
- Top-K Off-Policy Correction for a REINFORCE Recommender System, WSDM 2019
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding, Alibaba, DQN
Detailed notes on key papers
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System, Ant Financial, ICML 2019
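Main contributions:
- The user behavior model and the reward function are learned jointly in a single minimax framework.
- With the learned model serving as the environment, the paper develops a cascading DQN that solves the combinatorial action-selection problem (choosing a set of items to display) with linear complexity.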
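The linear-complexity claim can be illustrated with a minimal sketch of greedy cascading selection. It assumes one Q-function per slate position, each conditioned on the items already chosen; `cascade_select`, `toy_q`, and the parity-based redundancy penalty are hypothetical names and toy choices for illustration, not the paper's implementation or training procedure.

```python
import numpy as np

def cascade_select(q_funcs, state, candidates):
    """Greedily build a slate, one item per Q-function.

    Stage j scores every remaining candidate with q_funcs[j], conditioned
    on the items chosen so far, and keeps the argmax. For k stages and
    K candidates this costs at most k * K evaluations instead of
    enumerating all size-k subsets.
    """
    chosen = []
    remaining = list(candidates)
    for q in q_funcs:
        scores = [q(state, chosen, item) for item in remaining]
        best = remaining[int(np.argmax(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy Q-function: base item value scaled by the state,
# minus a penalty for each already-chosen item with the same parity
# (a stand-in for redundancy between items).
ITEM_VALUES = np.array([0.1, 0.9, 0.5, 0.7])

def toy_q(state, chosen, item):
    redundancy = sum(1 for c in chosen if c % 2 == item % 2)
    return state * ITEM_VALUES[item] - 0.5 * redundancy

slate = cascade_select([toy_q, toy_q], state=1.0, candidates=range(4))
print(slate)
```

With four candidates and a slate of two, the cascade evaluates 4 + 3 = 7 state-action pairs. The result is a greedy approximation: it is only as good as the stage-wise Q-functions' agreement with the joint optimum.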
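Takeaways worth borrowing:
- Formulating the recommendation problem as the five-tuple used in reinforcement learning.
- The adversarial training formulation. G generates the probability of the user's current action from the user's historical behavior sequence; D tries to distinguish the generated behavior sequences from the real ones.
- Two ways to model the behavior sequence: LSTM or position weights.
- The greedy cascading DQN approach.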
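The position-weight state and the generator's action probabilities can be sketched as follows. This is a simplified illustration under stated assumptions: a single weight vector over the last m clicked items, a linear reward over the concatenated state and item features, and a softmax choice rule with temperature `eta`. The names `position_weight_state`, `choice_probs`, and `theta` are hypothetical and do not come from the paper.

```python
import numpy as np

def position_weight_state(history, w):
    """Summarize the last m clicked items as a weighted sum.

    history: (m, d) array of item features, oldest first.
    w:       (m,) array of position weights (learned in practice,
             fixed here for illustration).
    """
    return w @ history  # shape (d,)

def choice_probs(state, item_feats, theta, eta=1.0):
    """Probability of clicking each displayed item.

    Assumes a linear reward r = [state; item] @ theta and a softmax
    over eta * r across the displayed items.
    """
    rewards = np.array([np.concatenate([state, f]) @ theta for f in item_feats])
    z = eta * rewards
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

history = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # last 3 clicks
w = np.array([0.2, 0.3, 0.5])                             # recent clicks weigh more
state = position_weight_state(history, w)

displayed = np.array([[1.0, 0.0], [0.0, 1.0]])            # two items on screen
theta = np.array([0.5, -0.5, 1.0, 0.0])                   # hypothetical reward weights
probs = choice_probs(state, displayed, theta)
print(state, probs)
```

Replacing `position_weight_state` with the final hidden state of an LSTM gives the other sequence-modeling option listed above; the choice model stays the same.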