SpARC LEARNING
SpARC makes invisible forces tangible and collaborative with projection mapping.
The Problem
Spatial Augmented Reality for Collaborative learning (SpARC) is an NSF-funded research project designing software to help architecture students understand structural engineering. These students are visual learners who excel at spatial reasoning but struggle to understand abstract mathematical calculations.
Team
Nurit Kirshenbaum, Lee Friedman, Yasushi Ishida, and Eric Peterson
The learning gap:
Students visualize buildings three-dimensionally but can't translate spatial intuitions into mathematical equations
Structural concepts require understanding forces, loads, and equilibrium that are inherently invisible
Studio courses emphasize collaborative learning around physical models; structures courses isolate students with individual problem sets
This pedagogical discontinuity contributes to persistent difficulties in required engineering courses
My Role
CS Graduate Research Assistant: responsible for research, prototyping, model building, the voice recognition pipeline, AI integration, voice activity detection (VAD) implementation, and UX design.
Research: Understanding the Disconnect
The problem wasn't that architecture students lacked ability. Research showed they have significantly higher spatial reasoning abilities than students in other disciplines. The issue was pedagogical mismatch.
What we discovered:
Students can intuitively understand how buildings stand but struggle when asked to calculate the forces mathematically
Traditional teaching presents equations on paper separate from physical structures, requiring mental translation between disconnected representational systems
Architecture studio culture emphasizes collaborative learning, but structures courses require isolated problem-solving
Students experience "AI stigma" in classrooms—anxiety and guilt about appearing lazy when using computational tools publicly
Physical Models + Projection + Voice
SpARC uses physical models with fiducial markers and computer vision to capture student interaction. Projection mapping displays real-time structural analysis around and directly onto the models, making invisible forces visible in a shared space.
Tension & Compression Module (Currently Built)
Students work together pulling handles connected by strings to a central point. As they pull on the handles, the projection shows:
Direction and magnitude of forces (red = compression, blue = tension)
Real-time force diagrams
How their physical actions translate to structural behavior
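As a rough illustration of how measured pulls could become projected arrows, here is a minimal Python sketch. It is not the project's code: the function names, the pixel scale, the deadband, and the +1/-1 sign convention are all assumptions made for the example. Only the color convention (red = compression, blue = tension) comes from the description above.

```python
import math
from dataclasses import dataclass

# Color convention from the module description: red = compression, blue = tension.
RED = (255, 0, 0)
BLUE = (0, 0, 255)
GRAY = (128, 128, 128)  # assumption: near-zero forces are drawn in neutral gray


@dataclass
class ForceArrow:
    angle_deg: float  # arrow direction in the projection plane
    length_px: float  # arrow length, proportional to force magnitude
    color: tuple      # RGB color encoding tension vs. compression


def force_to_arrow(fx, fy, axial_sign, px_per_newton=4.0, deadband=0.5):
    """Convert a 2D force (newtons) into a drawable arrow.

    axial_sign is a hypothetical convention: +1 for tension, -1 for compression.
    px_per_newton and deadband are illustrative values, not measured ones.
    """
    magnitude = math.hypot(fx, fy)
    if magnitude < deadband:
        color = GRAY
    elif axial_sign > 0:
        color = BLUE
    else:
        color = RED
    angle = math.degrees(math.atan2(fy, fx))
    return ForceArrow(angle, magnitude * px_per_newton, color)


def resultant(forces):
    """Sum 2D force vectors; a result near (0, 0) means the central point is in equilibrium."""
    return (sum(f[0] for f in forces), sum(f[1] for f in forces))
```

For example, three equal pulls spaced 120 degrees apart sum to a resultant near zero, which is the balanced case where the central point stays put.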
The physical design creates necessary interdependence because one person physically cannot pull all three handles. This bypasses social hierarchies that typically determine who participates in group work.
Designing a Voice Interface
Wake Word Detection Challenges
In initial testing with Porcupine alone, the wake word was detected 73% of the time. Detection failed most often when “Hey Sparc” was not followed by a slight pause before the request, or when it was buried mid-sentence.
Having to repeat requests 27% of the time is a major usability issue. My solution uses three processing threads with a fallback that scans the speech transcripts at timed intervals for any “Hey Sparc” that the wake word detector missed.
Figure 1: Primary wake word architecture utilizing Porcupine
Figure 2: Fallback method using transcript detection
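The transcript fallback can be sketched roughly as follows. This is an illustrative Python example under stated assumptions, not the repository's implementation: the list of likely mis-transcriptions, the class and function names, and the five-second window are all hypothetical.

```python
import re
from collections import deque

# Assumed mis-transcriptions of the wake phrase; a real list would come from testing.
WAKE_PATTERN = re.compile(
    r"\bhey[,\s]+(sparc|spark|sparks|sparq)\b[\s,.:!?]*", re.IGNORECASE
)


def find_missed_wake_word(transcript):
    """Return the request text that follows the wake phrase, or None if it is absent."""
    match = WAKE_PATTERN.search(transcript)
    if match is None:
        return None
    return transcript[match.end():].strip()


class TranscriptFallback:
    """Rolling buffer of (timestamp, text) segments, scanned at timed intervals."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds  # hypothetical window length
        self.segments = deque()

    def add_segment(self, timestamp, text):
        self.segments.append((timestamp, text))

    def scan(self, now):
        # Discard segments that are older than the scan window.
        while self.segments and now - self.segments[0][0] > self.window_seconds:
            self.segments.popleft()
        # Join the remaining segments so a wake phrase split across two is still found.
        joined = " ".join(text for _, text in self.segments)
        request = find_missed_wake_word(joined)
        if request is not None:
            self.segments.clear()  # avoid firing twice on the same utterance
        return request
```

A scanning thread would call `scan()` on a timer. In a full pipeline it would also need to skip utterances the primary detector already handled, which this sketch leaves out.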
I'm designing the voice user interface that integrates natural language interaction with the physical system. The interface needs to:
Recognize speech commands to switch between modules, show calculations, change projections
Handle natural questions about structural concepts through Gemini NLP integration
Provide sound design that supports learning without distraction
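One way to picture the split between fixed commands and open questions is a small router that checks a transcript against known phrases first and only falls back to the language model when nothing matches. The Python sketch below is hypothetical: the phrases, action names, and the `llm_handler` callback are placeholders, and no real Gemini API call is shown.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical command phrases mapped to hypothetical action names.
HARDCODED_COMMANDS: Dict[str, str] = {
    "switch module": "SWITCH_MODULE",
    "show calculations": "SHOW_CALCULATIONS",
    "change projection": "CHANGE_PROJECTION",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def route_request(text: str, llm_handler: Callable[[str], str]) -> Tuple[str, str]:
    """Return ("command", action) for known phrases, else ("llm", answer)."""
    cleaned = normalize(text)
    for phrase, action in HARDCODED_COMMANDS.items():
        if phrase in cleaned:
            return ("command", action)
    # Anything that is not hardcoded is passed to the language model callback.
    return ("llm", llm_handler(text))
```

Checking hardcoded phrases first keeps common commands fast and predictable, and reserves the slower model call for questions that actually need it.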
The critical design decision: the original plan was for students to query the AI directly ("Hey Sparc, why did the point move?"). Research on AI stigma in classrooms revealed that students experience anxiety about using AI publicly, worrying that they will appear lazy or incompetent.
The solution: Teacher-mediated AI. Professors control voice commands. Students interact with physical models; AI becomes teaching infrastructure rather than competing expertise. This maintains familiar classroom roles where teachers mediate learning, not AI.
Technical Implementation
Wake Word Detection: Porcupine Neural Network trained on “Hey Spark”
Voice Recognition: OpenAI Whisper for speech-to-text transcription
Natural Language Processing: Gemini handles contextual queries that aren't hardcoded, using prompt engineering to generate contextually relevant responses based on:
Current model state
Learning module objectives
Expected interaction patterns
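To make the prompt engineering step concrete, here is a minimal Python sketch of assembling a prompt from the three context sources listed above. The function name, the field names in the example state, and the instruction wording are all assumptions. The sketch only builds the prompt string and does not call any model API.

```python
import json


def build_prompt(question, model_state, module_objectives, expected_interactions):
    """Assemble a context-rich prompt for a query that is not hardcoded.

    All argument shapes are hypothetical: model_state is a dict, the other two are lists of strings.
    """
    sections = [
        "You are assisting an instructor teaching structural concepts to architecture students.",
        "Answer briefly and refer to what is currently projected on the model.",
        "",
        "Current model state (JSON):",
        json.dumps(model_state, sort_keys=True),
        "",
        "Learning module objectives:",
        *[f"- {objective}" for objective in module_objectives],
        "",
        "Expected interaction patterns:",
        *[f"- {pattern}" for pattern in expected_interactions],
        "",
        "Instructor question: " + question,
    ]
    return "\n".join(sections)
```

Serializing the model state with sorted keys keeps the prompt stable between identical states, which makes responses easier to compare during testing.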
Scene Identification: Computer vision tracks physical model manipulation and user interactions through depth cameras
Projection Mapping: Real-time overlay of structural analysis visualizations mapped to 3D model geometry
Query Management: Interprets voice commands, analyzes structural models, generates appropriate visualizations
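For a flat tabletop, one common way to move between what the camera sees and what the projector draws is a planar homography. The sketch below is a simplification and an assumption: the description above does not say which calibration method SpARC uses, and mapping onto full 3D model geometry would need a complete camera-projector calibration. The matrix values are made up for illustration.

```python
def apply_homography(H, point):
    """Map a camera-space point (x, y) to projector pixels using a 3x3 homography H.

    H is a nested list [[h00, h01, h02], [h10, h11, h12], [h20, h21, h22]],
    assumed to come from a separate calibration step that is not shown here.
    """
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to convert from homogeneous to pixel coordinates.
    return (u / w, v / w)


# Illustrative matrix: scale by 2 and shift by (100, 50); not a real calibration.
EXAMPLE_H = [
    [2.0, 0.0, 100.0],
    [0.0, 2.0, 50.0],
    [0.0, 0.0, 1.0],
]
```

With a mapping like this, the position of a tracked marker in the camera image could be converted to the projector pixel where its force arrow should be drawn.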
Impact & Next Steps
Spring & Summer 2026 Testing:
First classroom implementation with architecture students
Designing evaluation instruments to measure learning outcomes, engagement, knowledge retention
Testing voice identification accuracy and GUI/VUI integration
Research Questions:
How does tangible interaction in spatial AR environments impact learning of fundamental engineering concepts?
Does collaborative physical manipulation reduce cognitive load compared to individual problem-solving?
Can voice interfaces support self-guided learning without undermining teacher authority?
Currently building:
Sound design elements for voice feedback
LLM training and prompt engineering
Integration testing between physical sensors, projection mapping, and voice recognition
Module expansion to other structural topics (Trusses, Arches, Tributary Areas, etc.)
What I Learned
Social construction matters more than technical capability
The most sophisticated AI becomes useless if students feel too ashamed to use it. Design decisions must account for how communities socially construct technology meanings, not just what technology can technically accomplish.
Material form shapes collaboration
Individual screens isolate learners. Shared screens create control bottlenecks. Projection creates commons where information belongs to communities. Technology choices are inherently social choices about who accesses information and whether learning occurs individually or collectively.
Physical constraints work better than social encouragement
You can't bypass status hierarchies by asking everyone to participate nicely. Physical impossibility forces equitable engagement where social encouragement fails.
Private Repository — code available to reviewers upon request.

