ENHANCING ENGINEERING AND STEM EDUCATION WITH VISION AND MULTIMODAL LARGE LANGUAGE MODELS TO PREDICT STUDENT ATTENTION

Marquez-Carpintero, Luis; Viejo, Diego; Cazorla, Miguel

Please use this identifier to cite or link to this item: https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272

Title:	ENHANCING ENGINEERING AND STEM EDUCATION WITH VISION AND MULTIMODAL LARGE LANGUAGE MODELS TO PREDICT STUDENT ATTENTION
Authors:	Marquez-Carpintero, Luis; Viejo, Diego; Cazorla, Miguel
Keywords:	Attention prediction; Engineering education; Few-shot learning; Large Language Models (LLMs); Student engagement; STEM; Visual Language Models (VLMs)
Issue Date:	10-Jul-2025
Publisher:	IEEE Access
Series / Report No.:	Volume 13
Abstract:	Generative Artificial Intelligence (AI) and Large Language Models (LLMs), including Visual Language Models (VLMs) and Multimodal LLMs (MLLMs), have shown transformative potential in education. These technologies address persistent challenges in fostering classroom engagement and interaction. Our study highlights the efficacy of these models in detecting students’ attention levels and emotional states, equipping educators with actionable insights to optimize instructional delivery. However, widespread adoption is hindered by significant barriers such as high computational demands and the limited availability of high-quality datasets. To overcome these challenges, this research proposes the integration of MLLMs with Few-Shot Learning techniques, offering a resource-efficient framework to enable their practical implementation in educational contexts. This study focuses on the application of VLMs and MLLMs to predict student attention in science, technology, engineering and mathematics (STEM) education, evaluating the effectiveness of Few-Shot Training compared to traditional AI methodologies. The research is structured into two phases: the first phase optimizes image frequency and computational costs using MLLMs, while the second phase trains VLMs on classroom data to identify visual cues, including gaze direction and head movement. The results demonstrate that VLMs combined with Few-Shot Learning significantly outperform traditional models in capturing nuanced visual data, allowing for pedagogical adjustments comparable to those made through human labeling. These findings underline the transformative potential of VLMs and MLLMs in education, particularly in resource-constrained environments. Few-Shot Learning emerges as a practical and effective approach for leveraging small datasets to enhance student engagement and instructional quality.
URI:	https://repositori.mypolycc.edu.my/jspui/handle/123456789/7272
Appears in Collections:	JABATAN MATEMATIK, SAINS DAN KOMPUTER

File	Description	Size	Format
Enhancing Engineering and STEM Education With Vision and Multimodal Large Language.pdf		2.88 MB	Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.