AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs
Q Wang, X Zhang, Y Zhang, Q Yi
SC'13: Proceedings of the International Conference on High Performance …, 2013
POET: Parameterized optimizations for empirical tuning
Q Yi, K Seymour, H You, R Vuduc, D Quinlan
2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007
Transforming loops to recursion for multi-level memory hierarchies
Q Yi, V Adve, K Kennedy
Proceedings of the ACM SIGPLAN 2000 conference on Programming language …, 2000
High Performance Fortran compilation techniques for parallelizing scientific codes
V Adve, G Jin, J Mellor-Crummey, Q Yi
SC'98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, 11-11, 1998
POET: a scripting language for applying parameterized source‐to‐source program transformations
Q Yi
Software: Practice and Experience 42 (6), 675-706, 2012
Transforming complex loop nests for locality
Q Yi, K Kennedy, V Adve
The Journal of Supercomputing 27 (3), 219-264, 2004
Understanding stencil code performance on multicore architectures
SMF Rahman, Q Yi, A Qasem
Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, 2011
Improving memory hierarchy performance through combined loop interchange and multi-level fusion
Q Yi, K Kennedy
The International Journal of High Performance Computing Applications 18 (2 …, 2004
Automated empirical tuning of scientific codes for performance and power consumption
SF Rahman, J Guo, Q Yi
Proceedings of the 6th International Conference on High Performance and …, 2011
Advanced optimization strategies in the Rice dHPF compiler
J Mellor‐Crummey, V Adve, B Broom, D Chavarría‐Miranda, R Fowler, ...
Concurrency and Computation: Practice and Experience 14 (8‐9), 741-767, 2002
A highly parallel reuse distance analysis algorithm on GPUs
H Cui, Q Yi, J Xue, L Wang, Y Yang, X Feng
2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012
Semantic-driven parallelization of loops operating on user-defined containers
D Quinlan, M Schordan, Q Yi, BR De Supinski
International Workshop on Languages and Compilers for Parallel Computing …, 2003
Applying loop optimizations to object-oriented abstractions through general classification of array semantics
Q Yi, D Quinlan
International Workshop on Languages and Compilers for Parallel Computing …, 2004
Classification and utilization of abstractions for optimization
D Quinlan, M Schordan, Q Yi, A Saebjornsen
International Symposium on Leveraging Applications of Formal Methods …, 2004
Automatic blocking of QR and LU factorizations for locality
Q Yi, K Kennedy, H You, K Seymour, J Dongarra
Proceedings of the 2004 workshop on Memory system performance, 12-22, 2004
Exploring the optimization space of dense linear algebra kernels
Q Yi, A Qasem
International Workshop on Languages and Compilers for Parallel Computing …, 2008
Effective use of non-blocking data structures in a deduplication application
SD Feldman, A Bhat, P LaBorde, Q Yi, D Dechev
Proceedings of the 2013 companion publication for conference on Systems …, 2013
Studying the impact of application-level optimizations on the power consumption of multi-core architectures
SMF Rahman, J Guo, A Bhat, C Garcia, MH Sujon, Q Yi, C Liao, ...
Proceedings of the 9th Conference on Computing Frontiers, 123-132, 2012
A C++ infrastructure for automatic introduction and translation of OpenMP directives
D Quinlan, M Schordan, Q Yi, BR De Supinski
International Workshop on OpenMP Applications and Tools, 13-25, 2003
Automated transformation for performance-critical kernels
Q Yi, RC Whaley
Proceedings of the 2007 Symposium on Library-Centric Software Design, 109-119, 2007
