
Please use this identifier to cite or link to this item:
https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272

Full metadata record:
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Marquez-Carpintero, Luis | - |
| dc.contributor.author | Viejo, Diego | - |
| dc.contributor.author | Cazorla, Miguel | - |
| dc.date.accessioned | 2025-11-11T06:52:06Z | - |
| dc.date.available | 2025-11-11T06:52:06Z | - |
| dc.date.issued | 2025-07-10 | - |
| dc.identifier.other | DOI: 10.1109/ACCESS.2025.3584025 | - |
| dc.identifier.uri | https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272 | - |
| dc.description.abstract | Generative Artificial Intelligence (AI) and Large Language Models (LLMs), including Visual Language Models (VLMs) and Multimodal LLMs (MLLMs), have shown transformative potential in education. These technologies address persistent challenges in fostering classroom engagement and interaction. Our study highlights the efficacy of these models in detecting students’ attention levels and emotional states, equipping educators with actionable insights to optimize instructional delivery. However, widespread adoption is hindered by significant barriers such as high computational demands and the limited availability of high-quality datasets. To overcome these challenges, this research proposes the integration of MLLMs with Few-Shot Learning techniques, offering a resource-efficient framework to enable their practical implementation in educational contexts. This study focuses on the application of VLMs and MLLMs to predict student attention in science, technology, engineering and mathematics (STEM) education, evaluating the effectiveness of Few-Shot Training compared to traditional AI methodologies. The research is structured into two phases: the first phase optimizes image frequency and computational costs using MLLMs, while the second phase trains VLMs on classroom data to identify visual cues, including gaze direction and head movement. The results demonstrate that VLMs combined with Few-Shot Learning significantly outperform traditional models in capturing nuanced visual data, allowing for pedagogical adjustments comparable to those made through human labeling. These findings underline the transformative potential of VLMs and MLLMs in education, particularly in resource-constrained environments. Few-Shot Learning emerges as a practical and effective approach for leveraging small datasets to enhance student engagement and instructional quality. | ms_IN |
| dc.language.iso | en | ms_IN |
| dc.publisher | IEEE Access | ms_IN |
| dc.relation.ispartofseries | Volume 13 | - |
| dc.subject | Attention prediction | ms_IN |
| dc.subject | Engineering education | ms_IN |
| dc.subject | Few-shot learning | ms_IN |
| dc.subject | Large Language Models (LLMs) | ms_IN |
| dc.subject | Student engagement | ms_IN |
| dc.subject | STEM | ms_IN |
| dc.subject | Visual Language Models (VLMs) | ms_IN |
| dc.title | Enhancing Engineering and STEM Education With Vision and Multimodal Large Language Models to Predict Student Attention | ms_IN |
| dc.type | Article | ms_IN |
| Appears in Collections: | JABATAN MATEMATIK, SAINS DAN KOMPUTER (Department of Mathematics, Science and Computer) | |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Enhancing Engineering and STEM Education With Vision and Multimodal Large Language.pdf | | 2.88 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
