ENHANCING ENGINEERING AND STEM EDUCATION WITH VISION AND MULTIMODAL LARGE LANGUAGE MODELS TO PREDICT STUDENT ATTENTION

Marquez-Carpintero, Luis; Viejo, Diego; Cazorla, Miguel

Please use this identifier to cite or link to this item: https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272

Full metadata record

DC Field	Value	Language
dc.contributor.author	Marquez-Carpintero, Luis	-
dc.contributor.author	Viejo, Diego	-
dc.contributor.author	Cazorla, Miguel	-
dc.date.accessioned	2025-11-11T06:52:06Z	-
dc.date.available	2025-11-11T06:52:06Z	-
dc.date.issued	2025-07-10	-
dc.identifier.other	DOI: 10.1109/ACCESS.2025.3584025	-
dc.identifier.uri	https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272	-
dc.description.abstract	Generative Artificial Intelligence (AI) and Large Language Models (LLMs), including Visual Language Models (VLMs) and Multimodal LLMs (MLLMs), have shown transformative potential in education. These technologies address persistent challenges in fostering classroom engagement and interaction. Our study highlights the efficacy of these models in detecting students’ attention levels and emotional states, equipping educators with actionable insights to optimize instructional delivery. However, widespread adoption is hindered by significant barriers such as high computational demands and the limited availability of high-quality datasets. To overcome these challenges, this research proposes the integration of MLLMs with Few-Shot Learning techniques, offering a resource-efficient framework to enable their practical implementation in educational contexts. This study focuses on the application of VLMs and MLLMs to predict student attention in science, technology, engineering and mathematics (STEM) education, evaluating the effectiveness of Few-Shot Training compared to traditional AI methodologies. The research is structured into two phases: the first phase optimizes image frequency and computational costs using MLLMs, while the second phase trains VLMs on classroom data to identify visual cues, including gaze direction and head movement. The results demonstrate that VLMs combined with Few-Shot Learning significantly outperform traditional models in capturing nuanced visual data, allowing for pedagogical adjustments comparable to those made through human labeling. These findings underline the transformative potential of VLMs and MLLMs in education, particularly in resource-constrained environments. Few-Shot Learning emerges as a practical and effective approach for leveraging small datasets to enhance student engagement and instructional quality.	ms_IN
dc.language.iso	en	ms_IN
dc.publisher	IEEE Access	ms_IN
dc.relation.ispartofseries	Volume 13	-
dc.subject	Attention prediction	ms_IN
dc.subject	Engineering education	ms_IN
dc.subject	Few-shot learning	ms_IN
dc.subject	Large Language Models (LLMs)	ms_IN
dc.subject	Student engagement	ms_IN
dc.subject	STEM	ms_IN
dc.subject	Visual Language Models (VLMs)	ms_IN
dc.title	ENHANCING ENGINEERING AND STEM EDUCATION WITH VISION AND MULTIMODAL LARGE LANGUAGE MODELS TO PREDICT STUDENT ATTENTION	ms_IN
dc.type	Article	ms_IN
Appears in Collections:	JABATAN MATEMATIK, SAINS DAN KOMPUTER

Files in This Item:

File	Description	Size	Format
Enhancing Engineering and STEM Education With Vision and Multimodal Large Language.pdf		2.88 MB	Adobe PDF
