An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
S Hong, H Kim
Proceedings of the 36th annual international symposium on Computer …, 2009
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
CK Luk, S Hong, H Kim
2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture …, 2009
An integrated GPU power and performance model
S Hong, H Kim
Proceedings of the 37th annual international symposium on Computer …, 2010
Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers
S Srinath, O Mutlu, H Kim, YN Patt
2007 IEEE 13th International Symposium on High Performance Computer …, 2007
A performance analysis framework for identifying potential benefits in GPGPU applications
J Sim, A Dasgupta, H Kim, R Vuduc
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of …, 2012
GraphPIM: Enabling instruction-level PIM offloading in graph computing frameworks
L Nai, R Hadidi, J Sim, H Kim, P Kumar, H Kim
2017 IEEE International symposium on high performance computer architecture …, 2017
Many-thread aware prefetching mechanisms for GPGPU applications
J Lee, NB Lakshminarayana, H Kim, R Vuduc
2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 213-224, 2010
When prefetching works, when it doesn’t, and why
J Lee, H Kim, R Vuduc
ACM Transactions on Architecture and Code Optimization (TACO) 9 (1), 1-29, 2012
TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture
J Lee, H Kim
IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012
GraphBIG: understanding graph computing in the context of industrial solutions
L Nai, Y Xia, IG Tanase, H Kim, CY Lin
SC'15: Proceedings of the International Conference for High Performance …, 2015
SD3: A scalable approach to dynamic data-dependence profiling
M Kim, H Kim, CK Luk
2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 535-546, 2010
Transparent hardware management of stacked DRAM as part of memory
J Sim, AR Alameldeen, Z Chishti, C Wilkerson, H Kim
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 13-24, 2014
Techniques for efficient processing in runahead execution engines
O Mutlu, H Kim, YN Patt
32nd International Symposium on Computer Architecture (ISCA'05), 370-381, 2005
A mostly-clean DRAM cache for effective hit speculation and self-balancing dispatch
J Sim, GH Loh, H Kim, M O'Connor, M Thottethodi
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 247-257, 2012
Age based scheduling for asymmetric multiprocessors
NB Lakshminarayana, J Lee, H Kim
Proceedings of the conference on high performance computing networking …, 2009
Efficient runahead execution: Power-efficient memory latency tolerance
O Mutlu, H Kim, YN Patt
IEEE Micro 26 (1), 10-20, 2006
Power modeling for GPU architectures using McPAT
J Lim, NB Lakshminarayana, H Kim, W Song, S Yalamanchili, W Sung
ACM Transactions on Design Automation of Electronic Systems (TODAES) 19 (3 …, 2014
MacSim: A CPU-GPU heterogeneous simulation framework user guide
H Kim, J Lee, NB Lakshminarayana, J Sim, J Lim, T Pho
Georgia Institute of Technology, 2012
Effect of instruction fetch and memory scheduling on GPU performance
NB Lakshminarayana, H Kim
Workshop on Language, Compiler, and Architecture Support for GPGPU 88, 2010
Characterizing the deployment of deep neural networks on commercial edge devices
R Hadidi, J Cao, Y Xie, B Asgari, T Krishna, H Kim
2019 IEEE International Symposium on Workload Characterization (IISWC), 35-48, 2019
