Gradient descent finds global minima of deep neural networks. S Du, J Lee, H Li, L Wang, X Zhai. International Conference on Machine Learning, 1675-1685, 2019. Cited by 1356.

Gradient descent provably optimizes over-parameterized neural networks. SS Du, X Zhai, B Poczos, A Singh. arXiv preprint arXiv:1810.02054, 2018. Cited by 816.

Generalization bounds of SGLD for non-convex learning: Two theoretical viewpoints. W Mou, L Wang, X Zhai, K Zheng. Conference on Learning Theory, 605-638, 2018. Cited by 151.

On the multiple descent of minimum-norm interpolants and restricted lower isometry of kernels. T Liang, A Rakhlin, X Zhai. Conference on Learning Theory, 2683-2711, 2020. Cited by 132.

Consistency of interpolation with Laplace kernels is a high-dimensional phenomenon. A Rakhlin, X Zhai. Conference on Learning Theory, 2595-2623, 2019. Cited by 85.

How many samples are needed to estimate a convolutional neural network? SS Du, Y Wang, X Zhai, S Balakrishnan, RR Salakhutdinov, A Singh. Advances in Neural Information Processing Systems 31, 2018. Cited by 84.

On the risk of minimum-norm interpolants and restricted lower isometry of kernels. T Liang, A Rakhlin, X Zhai. arXiv preprint arXiv:1908.10292, 2019. Cited by 30.

How many samples are needed to estimate a convolutional or recurrent neural network? SS Du, Y Wang, X Zhai, S Balakrishnan, R Salakhutdinov, A Singh. arXiv preprint arXiv:1805.07883, 2018. Cited by 16.

Near optimal stratified sampling. T Yu, X Zhai, S Sra. arXiv preprint arXiv:1906.11289, 2019. Cited by 3.

Transformers are Efficient Compilers, Provably. X Zhai, R Zhou, L Zhang, SS Du. arXiv preprint arXiv:2410.14706, 2024.

Towards Effective Theories for Deep Learning and Beyond. X Zhai. Massachusetts Institute of Technology, 2024.