<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>10</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Naima Zerari</style></author><author><style face="normal" font="default" size="100%">Samir Abdelhamid</style></author><author><style face="normal" font="default" size="100%">Hassen Bouzgou</style></author><author><style face="normal" font="default" size="100%">Christian Raymond</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Bi-directional recurrent end-to-end neural network classifier for spoken Arab digit recognition</style></title><secondary-title><style face="normal" font="default" size="100%">2nd International Conference on Natural Language and Speech Processing (ICNLSP) IEEE</style></secondary-title></titles><dates><year><style face="normal" font="default" size="100%">2018</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://ieeexplore.ieee.org/abstract/document/8374374/</style></url></web-urls></urls><publisher><style face="normal" font="default" size="100%">IEEE</style></publisher><pub-location><style face="normal" font="default" size="100%">Algiers</style></pub-location><pages><style face="normal" font="default" size="100%">1-6</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Automatic Speech Recognition can be considered the transcription of spoken utterances into text, which can then be used to monitor or command a specific system. In this paper, we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to deal with the non-uniform sequence lengths of speech utterances.
The neural architecture recognizes spoken Arabic digits, uttered as isolated words, using a classification methodology, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal using Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to handle the non-uniform lengths of the sequences. A recurrent LSTM or GRU architecture is used to encode each sequence of MFCC features as a fixed-size vector, which feeds a multilayer perceptron network that performs the classification. The whole neural network classifier is trained in an end-to-end manner. The proposed system outperforms previously published results on the same database by a large margin.</style></abstract></record></records></xml>