Persistent identifier to cite or link this item:
http://hdl.handle.net/10662/20373
Full Metadata Record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Moreno Álvarez, Sergio | - |
dc.contributor.author | Haut Hurtado, Juan Mario | - |
dc.contributor.author | Paoletti Ávila, Mercedes Eugenia | - |
dc.contributor.author | Rico Gallego, Juan Antonio | - |
dc.date.accessioned | 2024-02-07T18:55:51Z | - |
dc.date.available | 2024-02-07T18:55:51Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Sergio Moreno-Alvarez, Juan M. Haut, Mercedes E. Paoletti, Juan A. Rico-Gallego, Heterogeneous model parallelism for deep neural networks, Neurocomputing, Volume 441, 2021, Pages 1-12, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2021.01.125 (https://www.sciencedirect.com/science/article/pii/S0925231221002320) | - |
dc.identifier.issn | 0925-2312 | - |
dc.identifier.uri | http://hdl.handle.net/10662/20373 | - |
dc.description.abstract | Deep neural networks (DNNs) have transformed computer vision, establishing themselves as the current state-of-the-art for image processing. Nevertheless, the training of current large DNN models is one of the main challenges to be solved. In this sense, data-parallelism has been the most widespread distributed training strategy since it is easy to program and can be applied to almost all cases. However, this solution suffers from several limitations, such as its high communication requirements and the memory constraints when training very large models. To overcome these limitations model-parallelism has been proposed, solving the most substantial problems of the former strategy. However, describing and implementing the parallelization of the training of a DNN model across a set of processes deployed on several devices is a challenging task. Current proposed solutions assume a homogeneous distribution, being impractical when working with devices of different computational capabilities, which is quite common on high performance computing platforms. To address previous shortcomings, this work proposes a novel model-parallelism technique considering heterogeneous platforms, where a load balancing mechanism between uneven devices of an HPC platform has been implemented. Our proposal takes advantage of Google Brain’s Mesh-TensorFlow for convolutional networks, splitting computing tensors across the filter dimension in order to balance the computational load of the available devices. Conducted experiments show an improvement in the exploitation of heterogeneous computational resources, enhancing the training performance. The code is available at: https://github.com/mhaut/HeterogeneusModelDNN. | es_ES |
dc.description.sponsorship | Supported by (1) The European Regional Development Fund ’A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118); (2) The Spanish Ministry of Science and Innovation (Ref. PID2019-110315RB-I00 APRISA); and (3) The computing facilities of Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain. | - |
dc.format.extent | 12 p. | es_ES |
dc.format.mimetype | application/pdf | en_US |
dc.language.iso | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.subject | Deep learning | es_ES |
dc.subject | High performance computing | es_ES |
dc.subject | Distributed training | es_ES |
dc.subject | Heterogeneous platforms | es_ES |
dc.subject | Model parallelism | es_ES |
dc.subject | HPC | - |
dc.subject | Computación de alto rendimiento | - |
dc.subject | Entrenamiento distribuido | - |
dc.subject | Plataformas de computación heterogénea | - |
dc.title | Heterogeneous model parallelism for deep neural networks | es_ES |
dc.type | article | es_ES |
dc.description.version | peerReviewed | es_ES |
europeana.type | TEXT | en_US |
dc.rights.accessRights | closedAccess | es_ES |
dc.subject.unesco | 1203 Ciencia de Los Ordenadores | - |
europeana.dataProvider | Universidad de Extremadura. España | es_ES |
dc.type.version | publishedVersion | es_ES |
dc.contributor.affiliation | Universidad de Extremadura. Departamento de Ingeniería de Sistemas Informáticos y Telemáticos | es_ES |
dc.contributor.affiliation | Universidad Nacional de Educación a Distancia | - |
dc.contributor.affiliation | Universidad de Extremadura. Departamento de Tecnología de los Computadores y de las Comunicaciones | - |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S0925231221002320?via%3Dihub | es_ES |
dc.identifier.doi | https://doi.org/10.1016/j.neucom.2021.01.125 | - |
dc.identifier.publicationtitle | Neurocomputing | es_ES |
dc.identifier.publicationissue | 44 | es_ES |
dc.identifier.publicationfirstpage | 1 | es_ES |
dc.identifier.publicationlastpage | 12 | es_ES |
dc.identifier.publicationvolume | 441 | es_ES |
dc.identifier.orcid | 0000-0002-4264-7473 | es_ES |
dc.identifier.orcid | 0000-0002-1858-9920 | - |
dc.identifier.orcid | 0000-0001-6701-961X | - |
dc.identifier.orcid | 0000-0003-1030-3729 | - |
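The abstract describes balancing a convolutional layer across uneven devices by splitting tensors along the filter dimension. A minimal sketch of one way such proportional partitioning could work is shown below; it is illustrative only and not taken from the paper's code (the device names, speed ratios, and the `partition_filters` helper are assumptions), which assigns each device a share of a layer's output filters proportional to its relative compute speed.

```python
# Hypothetical sketch of filter-dimension load balancing: a convolutional
# layer's output filters are partitioned across devices in proportion to
# each device's relative speed, so faster devices compute more filters.
# Device speeds below are illustrative, not measured values from the paper.

def partition_filters(num_filters, speeds):
    """Split num_filters among devices proportionally to speeds.

    Returns a list of per-device filter counts summing to num_filters.
    Leftover filters go to the devices with the largest fractional shares.
    """
    total = sum(speeds)
    # Ideal (fractional) share of filters per device.
    shares = [num_filters * s / total for s in speeds]
    counts = [int(x) for x in shares]
    # Hand out the remaining filters by largest remainder.
    leftover = num_filters - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Example: 64 filters across a fast GPU, a mid-range GPU, and a CPU
# with assumed relative speeds 4:2:1.
print(partition_filters(64, [4, 2, 1]))  # → [37, 18, 9]
```

Under this scheme every device processes the full input tensor but only its slice of the filter dimension, so per-device work scales with its filter count.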
Collection: | DIEEA - Articles; DISIT - Articles; DTCYC - Articles |
Files
File | Description | Size | Format | |
---|---|---|---|---|
j_neucom_2021_01_125.pdf | (access restricted) | 1.78 MB | Adobe PDF | |
This item is licensed under a Creative Commons License.