AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs Q Wang, X Zhang, Y Zhang, Q Yi SC'13: Proceedings of the International Conference on High Performance …, 2013 | 220 | 2013 |
POET: Parameterized optimizations for empirical tuning Q Yi, K Seymour, H You, R Vuduc, D Quinlan 2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007 | 151 | 2007 |
Transforming loops to recursion for multi-level memory hierarchies Q Yi, V Adve, K Kennedy Proceedings of the ACM SIGPLAN 2000 conference on Programming language …, 2000 | 112 | 2000 |
High Performance Fortran compilation techniques for parallelizing scientific codes V Adve, G Jin, J Mellor-Crummey, Q Yi SC'98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, 11-11, 1998 | 111 | 1998 |
POET: a scripting language for applying parameterized source‐to‐source program transformations Q Yi Software: Practice and Experience 42 (6), 675-706, 2012 | 75 | 2012 |
Transforming complex loop nests for locality Q Yi, K Kennedy, V Adve The Journal of Supercomputing 27 (3), 219-264, 2004 | 70 | 2004 |
Understanding stencil code performance on multicore architectures SMF Rahman, Q Yi, A Qasem Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, 2011 | 65 | 2011 |
Improving memory hierarchy performance through combined loop interchange and multi-level fusion Q Yi, K Kennedy The International Journal of High Performance Computing Applications 18 (2 …, 2004 | 58 | 2004 |
Automated empirical tuning of scientific codes for performance and power consumption SF Rahman, J Guo, Q Yi Proceedings of the 6th International Conference on High Performance and …, 2011 | 55 | 2011 |
Advanced optimization strategies in the Rice dHPF compiler J Mellor‐Crummey, V Adve, B Broom, D Chavarría‐Miranda, R Fowler, ... Concurrency and Computation: Practice and Experience 14 (8‐9), 741-767, 2002 | 47 | 2002 |
A highly parallel reuse distance analysis algorithm on GPUs H Cui, Q Yi, J Xue, L Wang, Y Yang, X Feng 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 33 | 2012 |
Semantic-driven parallelization of loops operating on user-defined containers D Quinlan, M Schordan, Q Yi, BR Supinski International Workshop on Languages and Compilers for Parallel Computing …, 2003 | 30 | 2003 |
Applying loop optimizations to object-oriented abstractions through general classification of array semantics Q Yi, D Quinlan International Workshop on Languages and Compilers for Parallel Computing …, 2004 | 28 | 2004 |
Effective use of non-blocking data structures in a deduplication application SD Feldman, A Bhat, P LaBorde, Q Yi, D Dechev Proceedings of the 2013 companion publication for conference on Systems …, 2013 | 27 | 2013 |
Classification and utilization of abstractions for optimization D Quinlan, M Schordan, Q Yi, A Saebjornsen International Symposium on Leveraging Applications of Formal Methods …, 2004 | 27 | 2004 |
Automatic blocking of QR and LU factorizations for locality Q Yi, K Kennedy, H You, K Seymour, J Dongarra Proceedings of the 2004 workshop on Memory system performance, 12-22, 2004 | 25 | 2004 |
Exploring the optimization space of dense linear algebra kernels Q Yi, A Qasem International Workshop on Languages and Compilers for Parallel Computing …, 2008 | 23 | 2008 |
Studying the impact of application-level optimizations on the power consumption of multi-core architectures SMF Rahman, J Guo, A Bhat, C Garcia, MH Sujon, Q Yi, C Liao, ... Proceedings of the 9th Conference on Computing Frontiers, 123-132, 2012 | 21 | 2012 |
A C++ infrastructure for automatic introduction and translation of OpenMP directives D Quinlan, M Schordan, Q Yi, BR Supinski International Workshop on OpenMP Applications and Tools, 13-25, 2003 | 20 | 2003 |
Automated transformation for performance-critical kernels Q Yi, RC Whaley Proceedings of the 2007 Symposium on Library-Centric Software Design, 109-119, 2007 | 19 | 2007 |