QinBo Bai
Title
Cited by
Year
Deep learning-based channel estimation algorithm over time selective fading channels
Q Bai, J Wang, Y Zhang, J Song
IEEE Transactions on Cognitive Communications and Networking 6 (1), 125-134, 2019
Cited by 111, 2019
Achieving zero constraint violation for constrained reinforcement learning via primal-dual approach
Q Bai, AS Bedi, M Agarwal, A Koppel, V Aggarwal
Proceedings of the AAAI Conference on Artificial Intelligence 36 (4), 3682-3689, 2022
Cited by 50, 2022
Reinforcement learning for constrained Markov decision processes
A Gattami, Q Bai, V Aggarwal
International Conference on Artificial Intelligence and Statistics, 2656-2664, 2021
Cited by 18, 2021
Reinforcement learning for multi-objective and constrained Markov decision processes
A Gattami, Q Bai, V Aggarwal
arXiv preprint arXiv:1901.08978, 2019
Cited by 14, 2019
Regret guarantees for model-based reinforcement learning with long-term average constraints
M Agarwal, Q Bai, V Aggarwal
Uncertainty in Artificial Intelligence, 22-31, 2022
Cited by 12, 2022
Model-free algorithm and regret analysis for MDPs with peak constraints
Q Bai, A Gattami, V Aggarwal
arXiv preprint arXiv:2003.05555, 2020
Cited by 11*, 2020
Achieving zero constraint violation for constrained reinforcement learning via conservative natural policy gradient primal-dual algorithm
Q Bai, AS Bedi, V Aggarwal
Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 6737-6744, 2023
Cited by 10, 2023
Concave utility reinforcement learning with zero-constraint violations
M Agarwal, Q Bai, V Aggarwal
arXiv preprint arXiv:2109.05439, 2021
Cited by 10, 2021
A reinforcement learning framework for vehicular network routing under peak and average constraints
N Geng, Q Bai, C Liu, T Lan, V Aggarwal, Y Yang, M Xu
IEEE Transactions on Vehicular Technology, 2023
Cited by 8, 2023
Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm
Q Bai, M Agarwal, V Aggarwal
arXiv preprint arXiv:2105.14125, 2021
Cited by 8, 2021
Markov decision processes with long-term average constraints
M Agarwal, Q Bai, V Aggarwal
arXiv preprint arXiv:2106.06680, 2021
Cited by 7, 2021
Escaping saddle points for zeroth-order non-convex optimization using estimated gradient descent
Q Bai, M Agarwal, V Aggarwal
2020 54th Annual Conference on Information Sciences and Systems (CISS), 1-6, 2020
Cited by 7, 2020
Regret analysis of policy gradient algorithm for infinite horizon average reward Markov decision processes
Q Bai, WU Mondal, V Aggarwal
Proceedings of the AAAI Conference on Artificial Intelligence 38 (10), 10980 …, 2024
Cited by 5, 2024
Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach
Q Bai, AS Bedi, M Agarwal, A Koppel, V Aggarwal
Journal of Artificial Intelligence Research 78, 975-1016, 2023
Cited by 3, 2023
Joint optimization of concave scalarized multi-objective reinforcement learning with policy gradient based algorithm
Q Bai, M Agarwal, V Aggarwal
Journal of Artificial Intelligence Research 74, 1565-1597, 2022
Cited by 3, 2022
Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints
Q Bai, V Aggarwal, A Gattami
Journal of Machine Learning Research 24 (60), 1-25, 2023
Cited by 2, 2023
Model-free algorithm and regret analysis for MDPs with long-term constraints
Q Bai, V Aggarwal, A Gattami
arXiv preprint arXiv:2006.05961, 2020
Cited by 1, 2020
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm
Q Bai, WU Mondal, V Aggarwal
arXiv preprint arXiv:2402.02042, 2024
2024