Adrián Castelló

Cited by

	All	Since 2019
Citations	484	349
h-index	12	9
i10-index	12	7

Citations per year
Year	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
Citations	19	22	32	54	40	53	62	58	113	23

Public access (based on funding mandates)

37 articles available
7 articles not available

Co-authors

Enrique S. Quintana-Ortí, Universitat Politècnica de València, Spain. Verified email at disca.upv.es
Manuel F. Dolz, Universitat Jaume I. Verified email at icc.uji.es
Jose Duato, Universitat Politècnica de València. Verified email at disca.upv.es
Antonio J. Peña, Barcelona Supercomputing Center (BSC). Verified email at bsc.es
Pavan Balaji, Argonne National Laboratory. Verified email at anl.gov
Sangmin Seo, Klaytn Foundation. Verified email at klaytn.foundation
Pedro Alonso-Jordá, Universitat Politècnica de València. Verified email at upv.es
Francisco D. Igual, Universidad Complutense de Madrid. Verified email at ucm.es
Sergio Iserte, Senior Researcher @ BSC. Verified email at bsc.es
Sandra Catalán, Universitat Jaume I. Verified email at uji.es
Rafael Rodríguez-Sánchez, Dep. Sistemas Informáticos, Universidad de Castilla-La Mancha. Verified email at uclm.es
Cristian Ramírez, Universitat Politècnica de València. Verified email at posgrado.upv.es

Postdoc Fellow @ Universitat Politècnica de València (UPV)

Verified email at disca.upv.es

Interests: Code auto-generation, Programming models, High Performance Computing, Lightweight threading, Deep Neural Networks


Title	Cited by	Year
Argobots: A lightweight low-level threading and tasking framework. S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ... IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017	151	2017
SLURM support for remote GPU virtualization: Implementation and performance study. S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reaño, ... 2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014	34	2014
High Performance and Portable Convolution Operators for Multicore Processors. P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí. SBAC-PAD 2020, 2020	25*	2020
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework. C Reano, F Silla, A Castelló, AJ Pena, R Mayo, ES Quintana-Ortí, J Duato	24	2014
PyDTNN: a user-friendly and extensible framework for distributed deep learning. S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre. The Journal of Supercomputing 77, 9971-9987, 2021	19	2021
A Review of Lightweight Thread Approaches for High Performance Computing. A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí. 2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016	19	2016
Analysis of model parallelism for distributed neural networks. A Castelló, MF Dolz, ES Quintana-Ortí, J Duato. Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019	17	2019
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks. A Castelló, MF Dolz, ES Quintana-Ortí, J Duato. 2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019	14	2019
On the use of remote GPUs and low-power processors for the acceleration of scientific applications. A Castelló, J Duato, R Mayo, AJ Pena, ES Quintana-Ortí, V Roca, F Silla. The Fourth International Conference on Smart Grids, Green Communications and …, 2014	14	2014
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations. A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña. International Conference on Parallel Processing (ICPP-2017), 60-69, 2017	13	2017
Enabling GPU Virtualization in Cloud Environments. S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí. CLOSER 2016, 2016	13	2016
Reformulating the direct convolution for high-performance deep learning inference on ARM processors. S Barrachina, A Castelló, MF Dolz, TM Low, H Martínez, ES Quintana-Ortí, ... Journal of Systems Architecture 135, 102806, 2023	12	2023
Anatomy of the BLIS family of algorithms for matrix multiplication. A Castelló, ES Quintana-Ortí, FD Igual. 2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022	9	2022
Accelerating distributed deep neural network training with pipelined MPI allreduce. A Castelló, ES Quintana-Ortí, J Duato. Cluster Computing 24 (4), 3797-3813, 2021	9	2021
A flexible research-oriented framework for distributed training of deep neural networks. S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre. 2021 IEEE International Parallel and Distributed Processing Symposium …, 2021	9	2021
GLT: A unified API for lightweight thread libraries. A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña. Euro-Par 2017: Parallel Processing: 23rd International Conference on …, 2017	8	2017
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor. C Ramírez, A Castelló, ES Quintana-Ortí. The Journal of Supercomputing 78 (16), 18051-18060, 2022	7	2022
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS. A Castelló, S Barrachina, MF Dolz, ES Quintana-Ortí, P San Juan, ... Journal of Systems Architecture 125, 102459, 2022	7*	2022
Programming parallel dense matrix factorizations with look-ahead and OpenMP. S Catalán, A Castelló, FD Igual, R Rodríguez-Sánchez, ES Quintana-Ortí. Cluster Computing 23, 359-375, 2020	7	2020
On the adequacy of lightweight thread approaches for high-level parallel programming models. A Castelló, R Mayo, K Sala, V Beltran, P Balaji, AJ Peña. Future Generation Computer Systems 84, 22-31, 2018	7	2018

Articles 1–20 shown.