Kwangjun Ahn
Senior Researcher, Microsoft Research
Verified email at microsoft.com
Title
Cited by
Year
Transformers learn to implement preconditioned gradient descent for in-context learning
K Ahn, X Cheng, H Daneshmand, S Sra
Advances in Neural Information Processing Systems 36, 2024
Cited by 86 · 2024
From Nesterov's Estimate Sequence to Riemannian Acceleration
K Ahn, S Sra
Proceedings of Thirty Third Conference on Learning Theory (COLT), PMLR 125 …, 2020
Cited by 76 · 2020
Hypergraph spectral clustering in the weighted stochastic block model
K Ahn, K Lee, C Suh
IEEE Journal of Selected Topics in Signal Processing 12 (5), 959-974, 2018
Cited by 72 · 2018
Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm
S Chewi, C Lu, K Ahn, X Cheng, T Le Gouic, P Rigollet
Conference on Learning Theory (COLT), 1260-1300, 2021
Cited by 66 · 2021
Understanding the unstable convergence of gradient descent
K Ahn, J Zhang, S Sra
International Conference on Machine Learning, 247-257, 2022
Cited by 65 · 2022
SGD with shuffling: optimal rates without component convexity and large epoch requirements
K Ahn, C Yun, S Sra
Advances in Neural Information Processing Systems 33, 17526-17535, 2020
Cited by 65 · 2020
Efficient constrained sampling via the mirror-Langevin algorithm
K Ahn, S Chewi
Advances in Neural Information Processing Systems 34, 28405-28418, 2021
Cited by 57 · 2021
Community recovery in hypergraphs
K Ahn, K Lee, C Suh
IEEE Transactions on Information Theory 65 (10), 6561-6579, 2019
Cited by 41 · 2019
Binary rating estimation with graph side information
K Ahn, K Lee, H Cha, C Suh
Advances in neural information processing systems 31, 2018
Cited by 35 · 2018
Learning threshold neurons via edge of stability
K Ahn, S Bubeck, S Chewi, YT Lee, F Suarez, Y Zhang
Advances in Neural Information Processing Systems 36, 2024
Cited by 32 · 2024
Graph Matrices: Norm Bounds and Applications
K Ahn, D Medarametla, A Potechin
arXiv preprint arXiv:1604.03423, 2020
Cited by 31* · 2020
Linear attention is (maybe) all you need (to understand transformer optimization)
K Ahn, X Cheng, M Song, C Yun, A Jadbabaie, S Sra
ICLR 2024 (arXiv:2310.01082), 2023
Cited by 20 · 2023
Reproducibility in optimization: Theoretical framework and limits
K Ahn, P Jain, Z Ji, S Kale, P Netrapalli, GI Shamir
Advances in Neural Information Processing Systems 35, 18022-18033, 2022
Cited by 16 · 2022
Riemannian perspective on matrix factorization
K Ahn, F Suarez
arXiv preprint arXiv:2102.00937, 2021
Cited by 14 · 2021
Mirror descent maximizes generalized margin and can be implemented efficiently
H Sun, K Ahn, C Thrampoulidis, N Azizan
Advances in Neural Information Processing Systems 35, 31089-31101, 2022
Cited by 12 · 2022
Understanding Nesterov's Acceleration via Proximal Point Method
K Ahn, S Sra
Symposium on Simplicity in Algorithms (SOSA), 117-130, 2022
Cited by 12 · 2022
The crucial role of normalization in sharpness-aware minimization
Y Dai, K Ahn, S Sra
Advances in Neural Information Processing Systems 36, 2024
Cited by 9 · 2024
One-pass learning via bridging orthogonal gradient descent and recursive least-squares
Y Min, K Ahn, N Azizan
2022 IEEE 61st Conference on Decision and Control (CDC), 4720-4725, 2022
Cited by 8 · 2022
On tight convergence rates of without-replacement SGD
K Ahn, S Sra
arXiv preprint arXiv:2004.08657, 2020
Cited by 7 · 2020
From proximal point method to Nesterov’s acceleration
K Ahn
arXiv preprint arXiv:2005.08304, 2020
Cited by 7 · 2020
Articles 1–20