Abstract
Sign language recognition (SLR) plays a crucial role in facilitating communication for the hearing-impaired community. Conventional SLR methods have struggled to attain both high accuracy and efficiency because of the intricate nature of sign language motions and the variability in articulation. We propose a novel framework that enhances SLR by leveraging EfficientNet-B0 as an efficient feature extractor and incorporating a transformer-based decoding mechanism for classification. Our method aims to improve both the precision and the computational efficiency of SLR systems, thereby making them more viable for real-world applications. We evaluate the proposed framework on two standard, commonly used sign language datasets: American Sign Language (ASL) and ASL with Digits. The proposed model achieves accuracies of 99.59% on the ASL dataset and an outstanding 99.97% on the ASL with Digits dataset, outperforming state-of-the-art methods. These results highlight the effectiveness of our framework in accurately recognizing sign language gestures, making it highly suitable for real-world applications. Our study contributes to the advancement of SLR research by introducing a novel methodology that combines the efficiency of EfficientNet-B0 with the expressive capabilities of transformer-based decoding, ultimately improving communication accessibility for individuals who rely on sign language.
Keywords: EfficientNet-B0, Multi-Head Self-Attention, Sign Language Recognition (SLR), Transformer.
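The architecture summarized above (a CNN feature extractor feeding a transformer with multi-head self-attention, followed by a classification head) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the paper's implementation: the small convolutional stack stands in for EfficientNet-B0's feature extractor, and all hyperparameters (`d_model`, `nhead`, layer counts, class count) are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SLRClassifier(nn.Module):
    """Illustrative sketch: CNN features -> transformer encoder -> classifier.

    The conv stack below is only a stand-in for EfficientNet-B0's feature
    extractor (e.g. torchvision.models.efficientnet_b0(...).features).
    """
    def __init__(self, num_classes=26, d_model=128):
        super().__init__()
        # Placeholder backbone: downsamples the image into a feature map.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=3, stride=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=4, padding=1),
        )
        # Transformer encoder layers apply multi-head self-attention
        # over the spatial feature tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        f = self.backbone(x)                   # (B, C, H, W) feature map
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        enc = self.encoder(tokens)             # self-attention over tokens
        return self.head(enc.mean(dim=1))      # pooled class logits

model = SLRClassifier()
logits = model(torch.randn(2, 3, 64, 64))  # batch of 2 RGB images
print(logits.shape)  # torch.Size([2, 26])
```

Pooling the encoded tokens with a mean before the linear head is one common choice for turning a variable-length token sequence into a single classification vector; the actual decoding mechanism in the paper may differ.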