<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Naima Zerari</style></author><author><style face="normal" font="default" size="100%">Samir Abdelhamid</style></author><author><style face="normal" font="default" size="100%">Hassen Bouzgou</style></author><author><style face="normal" font="default" size="100%">Christian Raymond</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Bidirectional deep architecture for Arabic speech recognition</style></title><secondary-title><style face="normal" font="default" size="100%">Open Computer Science (De Gruyter)</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2019</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://www.degruyter.com/view/j/comp.2019.9.issue-1/comp-2019-0004/comp-2019-0004.xml</style></url></web-urls></urls><volume><style face="normal" font="default" size="100%">9</style></volume><pages><style face="normal" font="default" size="100%">92–102</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Nowadays, real-life constraints necessitate controlling modern machines using human intervention by means of sensorial organs. The voice is one of the human senses that can control/monitor modern interfaces. In this context, Automatic Speech Recognition is principally used to convert natural voice into computer text as well as to perform an action based on the instructions given by the human.
In this paper, we propose a general framework for Arabic speech recognition that uses a Long Short-Term Memory (LSTM) network and a neural network classifier (Multi-Layer Perceptron, MLP) to cope with the non-uniform sequence length of the speech utterances produced by two feature extraction techniques: (1) Mel Frequency Cepstral Coefficients (MFCC; static and dynamic features) and (2) Filter Bank (FB) coefficients. The neural architecture recognizes isolated Arabic speech via classification. The proposed system first extracts pertinent features from the natural speech signal using MFCC (static and dynamic features) and FB. Next, the extracted features are padded to deal with the non-uniform sequence lengths. Then, a deep architecture, a recurrent LSTM or GRU (Gated Recurrent Unit) network, encodes the sequences of MFCC/FB features as a fixed-size vector that is fed to a Multi-Layer Perceptron (MLP) network to perform the classification (recognition). The proposed system is assessed on two different databases: the first concerns spoken digit recognition, where a comparison with related works in the literature is performed, whereas the second contains spoken TV commands. The obtained results show the superiority of the proposed approach.</style></abstract></record></records></xml>