Please use this identifier to cite or link to this item: https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272
Full metadata record
DC Field | Value | Language
dc.contributor.author | Marquez-Carpintero, Luis | -
dc.contributor.author | Viejo, Diego | -
dc.contributor.author | Cazorla, Miguel | -
dc.date.accessioned | 2025-11-11T06:52:06Z | -
dc.date.available | 2025-11-11T06:52:06Z | -
dc.date.issued | 2025-07-10 | -
dc.identifier.other | DOI: 10.1109/ACCESS.2025.3584025 | -
dc.identifier.uri | https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272 | -
dc.description.abstract | Generative Artificial Intelligence (AI) and Large Language Models (LLMs), including Visual Language Models (VLMs) and Multimodal LLMs (MLLMs), have shown transformative potential in education. These technologies address persistent challenges in fostering classroom engagement and interaction. Our study highlights the efficacy of these models in detecting students’ attention levels and emotional states, equipping educators with actionable insights to optimize instructional delivery. However, widespread adoption is hindered by significant barriers such as high computational demands and the limited availability of high-quality datasets. To overcome these challenges, this research proposes the integration of MLLMs with Few-Shot Learning techniques, offering a resource-efficient framework to enable their practical implementation in educational contexts. This study focuses on the application of VLMs and MLLMs to predict student attention in science, technology, engineering and mathematics (STEM) education, evaluating the effectiveness of Few-Shot Training compared to traditional AI methodologies. The research is structured into two phases: the first phase optimizes image frequency and computational costs using MLLMs, while the second phase trains VLMs on classroom data to identify visual cues, including gaze direction and head movement. The results demonstrate that VLMs combined with Few-Shot Learning significantly outperform traditional models in capturing nuanced visual data, allowing for pedagogical adjustments comparable to those made through human labeling. These findings underline the transformative potential of VLMs and MLLMs in education, particularly in resource-constrained environments. Few-Shot Learning emerges as a practical and effective approach for leveraging small datasets to enhance student engagement and instructional quality. | ms_IN
dc.language.iso | en | ms_IN
dc.publisher | IEEE Access | ms_IN
dc.relation.ispartofseries | ;Volume 13 | -
dc.subject | Attention prediction | ms_IN
dc.subject | Engineering education | ms_IN
dc.subject | Few-shot learning | ms_IN
dc.subject | Large Language Models (LLMs) | ms_IN
dc.subject | Student engagement | ms_IN
dc.subject | STEM | ms_IN
dc.subject | Visual Language Models (VLMs) | ms_IN
dc.title | ENHANCING ENGINEERING AND STEM EDUCATION WITH VISION AND MULTIMODAL LARGE LANGUAGE MODELS TO PREDICT STUDENT ATTENTION | ms_IN
dc.type | Article | ms_IN
Appears in Collections:JABATAN MATEMATIK, SAINS DAN KOMPUTER

Files in This Item:
File | Description | Size | Format
Enhancing Engineering and STEM Education With Vision and Multimodal Large Language.pdf | | 2.88 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.