This paper presents the design and evaluation of a jacket–helmet assistive system for visually impaired individuals in India. The system integrates a Raspberry Pi 4B with a USB web camera, USB microphone, vibration motor cluster, earphone, pushbuttons, and a rechargeable 7.4 V, 10,000 mAh battery. Two primary functions are implemented: (i) object detection and distance estimation using YOLO algorithms with 2D depth estimation, and (ii) text recognition on posters and hoardings using optical character recognition (OCR). Comparative analysis of YOLOv5, YOLOv7, and YOLOv8 models demonstrated that YOLOv8 achieved the highest mean Average Precision (mAP) of 92.4%, outperforming YOLOv7 (89.6%) and YOLOv5 (87.3%). For monocular 2D depth estimation, MiDaS achieved the lowest mean absolute relative error (0.124) compared to Monodepth2 (0.156) and DPT (0.139). Speech-to-text efficiency was tested across Google Speech Recognition, Vosk, and CMU Sphinx, with Google achieving 94.1% accuracy, followed by Vosk (88.3%) and CMU Sphinx (81.6%). User trials were conducted with ten visually impaired individuals across diverse environments (bus stand, garden, bungalow, and home settings). System usability was measured using the System Usability Scale (SUS), yielding an overall average score of 84.6, indicating “excellent” usability. The proposed system demonstrates high accuracy, robustness, and practicality for real-world navigation and reading assistance, thus contributing to improved autonomy and quality of life for visually impaired users.
| Published in | American Journal of Computer Science and Technology (Volume 8, Issue 4) |
| DOI | 10.11648/j.ajcst.20250804.13 |
| Page(s) | 189-205 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2025. Published by Science Publishing Group |
Keywords: Assistive Technology, YOLO Object Detection, Depth Estimation, Speech-to-Text, OCR, Raspberry Pi, Visually Impaired, System Usability Scale (SUS)
Metric | Some reported values / ranges in literature |
|---|---|
Task success rate in navigation + scene + OCR tasks | ~95.7% in competition settings (Sight Guide) [12] |
OCR accuracy under good lighting vs low lighting | High (>90%) in good lighting, dropping to ~70–80% or less in low light (Multifunctional Smart Glasses [11]) |
Latency / Inference speed on embedded devices | Many works rely on lightweight or compressed models; detection latencies under 50–100 ms are targeted in robot and wearable contexts (Adaptive OD [8]) |
User feedback on comfort, power consumption, ease of use | Reported in [13, 14]: gloves and other wearable parts raise heat, weight, and battery-life concerns |
System / Product | Core Functions | Hardware Platform | Feedback Mode | Reported Limitations | Reference |
|---|---|---|---|---|---|
YOLO-OD | Object detection (enhanced YOLO) | GPU / Embedded | Audio | Limited low-light/occlusion performance | [1] |
YOLO-Extreme | Object detection under fog | GPU | Audio | Bulky model; not optimized for wearables | [2] |
MagicEye | Object detection, currency & face recognition, GPS navigation | Custom wearable + Pi | Audio | Latency; accuracy varies across indoor/outdoor | [10] |
Smart Glasses | Object detection, OCR, translation | Raspberry Pi | Audio | OCR weak in low light; limited FPS | [11] |
Sight Guide | Multi-camera navigation, OCR, obstacle avoidance | Backpack + cameras | Audio + Vibration | Bulky; tested mainly in competition settings | [12] |
Chen et al. | Object recognition + distance measurement | Stereo camera + tactile glove | Vibration (SMA) | Heavy power consumption; glove discomfort | [13] |
AI + LVLM | Object recognition + contextual reasoning | Wearable + LVLM backend | Audio + Alerts | Latency; requires high compute | [14] |
LLM-Glasses | Object detection + GPT-4o reasoning | Glasses + Pi | Haptic + Audio | Tested in controlled environments | [15] |
Objectives | Methodologies |
|---|---|
To develop a wearable assistive system combining object detection, text recognition, and depth estimation for visually impaired individuals | Design and integrate a jacket–helmet prototype with Raspberry Pi 4B, USB web camera, microphone, pushbuttons, vibration motors, buzzer, and earphone |
To compare the performance of multiple YOLO models for real-time object detection | Implement YOLOv5, YOLOv7, and YOLOv8 on the Raspberry Pi; evaluate accuracy (mAP), FPS, and latency in different environments |
To evaluate depth estimation algorithms using a 2D monocular camera | Test MiDaS, Monodepth2, and DPT for mean relative error, edge preservation, and real-time feasibility on embedded hardware |
To enable real-time text recognition for reading posters, hoardings, and signage | Integrate Tesseract OCR with pre-processing (binarization, resizing, denoising) to enhance recognition accuracy across varied lighting and fonts |
To compare the efficiency of different speech-to-text algorithms for command recognition | Benchmark Google Speech Recognition, Vosk, and CMU Sphinx for accuracy, latency, and robustness under noisy environments |
To validate the usability and effectiveness of the system in real-world settings | Conduct trials with 10 visually impaired individuals in diverse contexts (bus stand, garden, bungalow, home); analyse outcomes using System Usability Scale (SUS) |
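The depth-estimation comparison relies on a mean (absolute) relative error. As a minimal sketch of how such a metric is commonly computed, the function below averages |predicted − true| / true over samples with valid ground truth. It assumes the predictions are already expressed in the same units as the ground truth; the function name and the sample depths are hypothetical and are not taken from the trials.

```python
from typing import Sequence


def mean_abs_rel_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean of |pred - truth| / truth over samples with positive ground truth."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    # Skip samples without a valid (positive) ground-truth depth.
    ratios = [abs(p - t) / t for p, t in zip(pred, truth) if t > 0]
    if not ratios:
        raise ValueError("no valid ground-truth depths")
    return sum(ratios) / len(ratios)


# Hypothetical depths in metres, for illustration only.
predicted = [1.1, 2.4, 2.7, 4.4]
measured = [1.0, 2.0, 3.0, 4.0]
print(round(mean_abs_rel_error(predicted, measured), 3))
```

A lower value means the predicted depths track the measured ones more closely, which is the sense in which the three depth models are ranked.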
Component | Specification / Model | Function | Key Features |
|---|---|---|---|
Processing Unit | Raspberry Pi 4B (4 GB RAM) | Central controller | Quad-core CPU, supports Python & AI libraries |
Camera Module | USB Web Camera (HD, 30 FPS) | Captures environment visuals | Supports YOLO inference, 2D monocular depth estimation |
Microphone | USB Mic | Voice input | Captures user commands for speech-to-text |
Earphone | 3.5 mm Jack Output | Audio feedback | Provides voice-based instructions to user |
Input Buttons | Two Pushbuttons | Mode selection | Switch between object detection mode and OCR mode |
Vibration Motors | Cluster (3–5 units) | Haptic alert | Activated when obstacle proximity is detected |
Buzzer | Piezoelectric Buzzer | Audio alert | Provides warning when too close to obstacles |
Battery Pack | 7.4 V, 10,000 mAh Li-Po | Power source | Rechargeable, supports several hours of continuous operation |
Algorithms | YOLOv5/YOLOv7/YOLOv8, MiDaS, Monodepth2, DPT, Tesseract OCR, Google/Vosk/CMU Speech Recognition | Processing tasks | Object detection, depth estimation, OCR, and speech-to-text |
Jacket & Helmet | Custom Wearable Unit | Enclosure | Ensures portability, integrates sensors, comfortable to wear |
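The "several hours of continuous operation" figure can be sanity-checked with simple energy arithmetic: a 7.4 V, 10,000 mAh pack stores a nominal 74 Wh. The sketch below is an estimate only. The average load and the converter efficiency are assumptions (the Raspberry Pi 4B runs from 5 V, so some step-down stage is presumed), and neither value is reported in the paper.

```python
def runtime_hours(voltage_v: float, capacity_mah: float,
                  load_w: float, efficiency: float = 0.85) -> float:
    """Estimated runtime: usable energy (Wh) divided by average load (W)."""
    if load_w <= 0 or not 0 < efficiency <= 1:
        raise ValueError("load must be positive and efficiency in (0, 1]")
    energy_wh = voltage_v * capacity_mah / 1000.0  # nominal pack energy
    return energy_wh * efficiency / load_w


# 7.4 V and 10,000 mAh come from the table above.
# The 8 W average load and 85% efficiency are ASSUMED values.
print(f"{runtime_hours(7.4, 10_000, load_w=8.0):.1f} h")
```

Replacing the assumed load with a measured average draw for the Pi, camera, and motors would turn this into a usable estimate for a specific build.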
YOLO Version | Objects Detected (examples) | Processing Time (ms/frame) | Accuracy (% mAP) | Remarks |
|---|---|---|---|---|
YOLOv5 | Person, vehicle, chair, signboard | ~40–45 ms (≈22–25 FPS on Pi) | ~87–89% | Lightweight, good balance of speed and accuracy, suitable for embedded devices [18] |
YOLOv7 | Person, vehicle, traffic light, bag, obstacle | ~55–60 ms (≈16–18 FPS on Pi) | ~89–91% | Improved feature extraction (E-ELAN); higher accuracy but slightly slower [19] |
YOLOv8 | Person, bicycle, bus, dog, signboard, obstacle | ~65–70 ms (≈14–15 FPS on Pi) | ~92–93% | Anchor-free design, highest precision, strong generalization in cluttered environments [20, 21] |
YOLO-OD [1] | Obstacles, small/occluded objects | ~50–55 ms | ~90–91% | Optimized for visually impaired navigation; robust to occlusion and small objects |
YOLO-Extreme [2] | Person, vehicle, obstacle under foggy conditions | ~60–65 ms | ~91–92% | Designed for adverse weather, robust performance but computationally heavier |
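The frame rates quoted in the table follow from the per-frame processing times, since FPS = 1000 / latency in milliseconds. The short sketch below reproduces that conversion for the three models run on the Pi, using the latency ranges listed above. The helper name is illustrative.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Convert per-frame processing time in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Per-frame latency ranges (ms) as listed in the table above.
latency_ranges_ms = {"YOLOv5": (40, 45), "YOLOv7": (55, 60), "YOLOv8": (65, 70)}

for model, (fast, slow) in latency_ranges_ms.items():
    # The slower latency gives the lower bound on FPS, the faster one the upper bound.
    print(f"{model}: {fps_from_latency(slow):.1f}-{fps_from_latency(fast):.1f} FPS")
```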
Participant | SUS Score | Remarks |
|---|---|---|
User 1 | 82 | Found vibration feedback highly intuitive |
User 2 | 85 | Smooth object detection, minor OCR delay |
User 3 | 80 | Comfortable but suggested lighter hardware |
User 4 | 88 | Reported clear and timely audio instructions |
User 5 | 90 | Excellent in crowded bus stand environment |
User 6 | 79 | Found OCR less accurate in dim lighting |
User 7 | 83 | Easy to use; voice commands effective |
User 8 | 87 | Balanced performance across all scenarios |
User 9 | 86 | Appreciated multimodal feedback integration |
User 10 | 85 | Noted good accuracy, requested longer battery life |
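The overall SUS figure can be re-derived from the per-participant scores above. The sketch below also includes the standard SUS scoring rule (ten items rated 1 to 5; odd-numbered items contribute r − 1, even-numbered items 5 − r, and the sum is multiplied by 2.5). The paper does not spell this rule out, so it is included here as background rather than as the authors' procedure. The ten listed scores average 84.5, marginally below the reported 84.6, which may reflect rounding of the per-participant values.

```python
from statistics import mean
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0-100) from ten questionnaire responses rated 1-5."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses, each between 1 and 5")
    # Index 0, 2, ... are the odd-numbered (positively worded) items.
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5


# Per-participant SUS scores as listed in the table above.
table_scores = [82, 85, 80, 88, 90, 79, 83, 87, 86, 85]
print(mean(table_scores))
```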
Module | Algorithms Tested | Best Performer | Key Metrics | Remarks |
|---|---|---|---|---|
Object Detection | YOLOv5, YOLOv7, YOLOv8, YOLO-OD, YOLO-Extreme | YOLOv8 | Accuracy: 92–93% mAP; Latency: ~68 ms/frame | Most accurate across cluttered environments; requires optimization for Raspberry Pi |
Depth Estimation | MiDaS, Monodepth2, DPT | MiDaS | Error (MARE): 0.124; FPS: ~14 | Best accuracy and edge preservation; slightly slower than Monodepth2 |
Speech-to-Text | Google SR, Vosk, CMU Sphinx | Google SR (online), Vosk (offline) | Accuracy: 94.1% (Google), 88.3% (Vosk) | Google best with connectivity; Vosk preferred offline |
OCR (Text Recognition) | PyTesseract | PyTesseract | Avg. accuracy: ~85–90% (varies with lighting & font) | Robust for English/local scripts; accuracy dips in low light |
Text-to-Speech (TTS) | eSpeak NG, Piper (Lite), Festival, Coqui-Lite | Piper (Lite) | MOS: 4.1 Naturalness, WER: 5.2%, RTF: 0.6 | Best balance of naturalness and speed; eSpeak fastest but robotic |
Activation Latency | All modules | — | Object+Depth: 320 ms, OCR+TTS: 480 ms, STT: 250 ms | All responses < 0.5 s, suitable for real-time usage |
User Evaluation (SUS) | 10 participants | — | Average SUS: 84.6 | Rated “Excellent Usability”; strong acceptance with suggestions for OCR optimization and lighter hardware |
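The table reports speech-to-text accuracy and a WER figure without stating how they were computed. One common convention, shown here only as an assumption, is to measure word error rate (WER) as the word-level edit distance between the reference command and the recognized text, divided by the number of reference words, and to read accuracy as 1 − WER. The example commands are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice command and recognizer output, for illustration only.
wer = word_error_rate("switch to object detection mode", "switch to object mode")
print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")
```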
AI | Artificial Intelligence |
OCR | Optical Character Recognition |
Pi | Raspberry Pi |
SUS | System Usability Scale |
USB | Universal Serial Bus |
YOLO | You Only Look Once (object detection algorithm) |
mAP | mean Average Precision |
DPT | Dense Prediction Transformer |
MAE | Mean Absolute Error |
| [1] | W. Wang, B. Jing, X. Yu, Y. Sun, L. Yang, and C. Wang, “YOLO-OD: Obstacle Detection for Visually Impaired Navigation Assistance,” Sensors, vol. 24, no. 23, p. 7621, 2024. |
| [2] | W. Wang, X. Yu, B. Jing, Y. Sun, L. Yang, and C. Wang, “YOLO-Extreme: Obstacle Detection for Visually Impaired Navigation Under Foggy Weather,” Sensors, vol. 25, no. 14, p. 4338, 2025. |
| [3] | W. Song, X. Cui, Y. Xie, G. Wang, and J. Ma, “Monocular Depth Estimation via a Detail-Semantic Collaborative Network for Indoor Scenes,” Scientific Reports, vol. 15, no. 1, p. 10990, 2025. |
| [4] | Y. Xi, S. Li, Z. Xu, F. Zhou, and J. Tian, “LapUNet: A Novel Approach to Monocular Depth Estimation Using Dynamic Laplacian Residual U-Shape Networks,” Scientific Reports, vol. 14, no. 1, p. 23544, 2024. |
| [5] | A. Abdusalomov, S. Umirzakova, M. B. Shukhratovich, A. Kakhorov, and Y.-I. Cho, “Breaking New Ground in Monocular Depth Estimation with Dynamic Iterative Refinement and Scale Consistency,” Applied Sciences, vol. 15, no. 2, p. 674, 2025. |
| [6] | A. Paramarthalingam, T. Subramani, and K. Mahadevan, “A Deep Learning Model to Assist Visually Impaired,” Machine Learning with Applications, vol. 15, p. 100156, 2024. |
| [7] | G. I. Okolo, S. C. Chukwuedo, O. U. Ezeani, and E. A. Nwokoye, “Smart Assistive Navigation System for Visually Impaired Individuals,” Journal of Digital Research, vol. 4, no. 1, pp. 1–10, 2025. |
| [8] | A. Pratap, S. Kumar, and S. Chakravarty, “Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms,” arXiv preprint arXiv: 2501.18444, 2025. |
| [9] | A. B. Atitallah, Y. Said, M. A. B. Atitallah, M. Albekairi, K. Kaaniche, and S. Boubaker, “An effective obstacle detection system using deep learning advantages to aid blind and visually impaired navigation,” Ain Shams Engineering Journal, vol. 15, no. 2, p. 102387, 2024. |
| [10] | S. C. Sethuraman, G. R. Tadkapally, S. P. Mohanty, G. Galada, and A. Subramanian, “MagicEye: An Intelligent Wearable Towards Independent Living of Visually Impaired,” arXiv preprint arXiv: 2303.13863, 2023. |
| [11] | V. Moram, S. Zahruddin, and S. Kumar, “Multifunctional Assistive Smart Glasses for Visually Impaired,” SN Computer Science, vol. 6, no. 2, p. 173, 2025. https://doi.org/10.1007/s42979-025-03701-2 |
| [12] | P. Pfreundschuh, G. Cioffi, C. von Einem, A. Wyss, H. Wernher van de Venn, C. Cadena, D. Scaramuzza, R. Siegwart, and A. Darvishy, “Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024,” arXiv preprint arXiv: 2506.02676, 2025. |
| [13] | Y. Chen et al., “A wearable assistive system for the visually impaired using object recognition, distance measurement and tactile presentation,” Infrared Physics & Engineering / IR, 2023 (or the journal in OAEPublish). |
| [14] | M. S. A. Baig, S. A. Gillani, S. M. Shah, M. Aljawarneh, A. Akbar Khan, and M. H. Siddiqui, “AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models,” arXiv preprint arXiv: 2412.20059, 2024. |
| [15] | I. Tokmurziyev, M. Altamirano Cabrera, M. Haris Khan, Y. Mahmoud, L. Moreno, and D. Tsetserukou, “LLM-Glasses: GenAI-driven Glasses with Haptic Feedback for Navigation of Visually Impaired People,” arXiv preprint arXiv: 2503.16475, 2025. |
| [16] | N. M. Upadhyay, A. P. Singh, and A. Perti, “eyeRoad – An App that Helps Visually Impaired Peoples,” ICICC 2024. |
| [17] | X. Zhang et al., “Advancements in Smart Wearable Mobility Aids for Visual Impairment: A Bibliometric Analysis,” PMC, 2024. |
| [18] | G. Jocher, A. Chaurasia, and G. Qiu, “YOLOv5: A state-of-the-art real-time object detection system,” GitHub Repository, 2020. Available: |
| [19] | C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv: 2207.02696, 2022. |
| [20] | G. Jocher, Y. Qiu, and A. Chaurasia, “YOLOv8: Next-generation real-time object detector,” Ultralytics Technical Report, 2023. Available: |
| [21] | R. S. Mehta and V. Kumar, “Comparative evaluation of YOLOv5, YOLOv7 and YOLOv8 for real-time object detection,” Procedia Computer Science, vol. 227, pp. 116–124, 2023. |
| [22] | P. A. Parikh, K. D. Joshi, and R. Trivedi, “Face Detection-Based Depth Estimation by 2D and 3D Cameras: A Comparison,” 2022 28th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), Nanjing, China, 2022, pp. 1–4. |
APA Style
Ruparelia, K., Parikh, P., Shah, P. A. (2025). An Integrated Jacket–Helmet Assistive System for Visually Impaired Individuals Using YOLO-Based Object Detection, Depth Estimation, and OCR. American Journal of Computer Science and Technology, 8(4), 189-205. https://doi.org/10.11648/j.ajcst.20250804.13
ACS Style
Ruparelia, K.; Parikh, P.; Shah, P. A. An Integrated Jacket–Helmet Assistive System for Visually Impaired Individuals Using YOLO-Based Object Detection, Depth Estimation, and OCR. Am. J. Comput. Sci. Technol. 2025, 8(4), 189-205. doi: 10.11648/j.ajcst.20250804.13
AMA Style
Ruparelia K, Parikh P, Shah PA. An Integrated Jacket–Helmet Assistive System for Visually Impaired Individuals Using YOLO-Based Object Detection, Depth Estimation, and OCR. Am J Comput Sci Technol. 2025;8(4):189-205. doi: 10.11648/j.ajcst.20250804.13
@article{10.11648/j.ajcst.20250804.13,
author = {Kashvi Ruparelia and Priyam Parikh and Parth Atulkumar Shah},
title = {An Integrated Jacket–Helmet Assistive System for Visually Impaired Individuals Using YOLO-Based Object Detection, Depth Estimation, and OCR},
journal = {American Journal of Computer Science and Technology},
volume = {8},
number = {4},
pages = {189-205},
doi = {10.11648/j.ajcst.20250804.13},
url = {https://doi.org/10.11648/j.ajcst.20250804.13},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20250804.13},
year = {2025}
}
TY  - JOUR
T1  - An Integrated Jacket–Helmet Assistive System for Visually Impaired Individuals Using YOLO-Based Object Detection, Depth Estimation, and OCR
AU  - Kashvi Ruparelia
AU  - Priyam Parikh
AU  - Parth Atulkumar Shah
Y1  - 2025/10/30
PY  - 2025
N1  - https://doi.org/10.11648/j.ajcst.20250804.13
DO  - 10.11648/j.ajcst.20250804.13
T2  - American Journal of Computer Science and Technology
JF  - American Journal of Computer Science and Technology
JO  - American Journal of Computer Science and Technology
SP  - 189
EP  - 205
PB  - Science Publishing Group
SN  - 2640-012X
UR  - https://doi.org/10.11648/j.ajcst.20250804.13
VL  - 8
IS  - 4
ER  -