Yunge Wen

M.S. Computer Science @New York University
Research Intern at Multisensory Intelligence @MIT Media Lab

Computer Graphics | Human-AI Interaction | Perceptual Engineering

yungew@mit.edu

LinkedIn | Github | Google Scholar

RESEARCH IN PROGRESS

GRAPHICS & CREATIVITY

🎨PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

Yunge Wen, Yuancheng Shen, Paul Pu Liang
Under Review | paper | code | website

We present PaintCopilot, a co-creative neural painting assistant built on differentiable brush representations (SVG Brush Tip and 2D Gaussian). It integrates three lightweight AI models (a ViT-based Target Predictor, an autoregressive Next Stroke Predictor via flow matching, and a VAE-based Region Sampler) so that artists and AI can continuously alternate control throughout the creative process.

Text-Driven Artistic Staging: Pose, Lighting, and Camera References from Paintings

Yunge Wen
Under Review | paper

We introduce text-to-editable 3D staging, jointly generating human poses, dominant lighting, and camera placement from affective text. Using 11,911 text–staging pairs reconstructed from 2,328 figurative paintings, we train a flow-matching transformer that handles multiple figures and produces diverse staging alternatives.

LayerParse: Parsing Graphic Design Images into Editable Layer with VLM

Yunge Wen, Bob Tianqi Wei
Manuscript In Preparation | proposal

We present LayerParse, a vision-language model that parses raster graphic design images into structured, editable layer representations. It jointly predicts a typed element tree, per-element segmentation masks, and continuous 36-dimensional attribute vectors, the last through a novel AttrHead that avoids token quantization error. The model is trained entirely on a procedurally generated synthetic dataset with exact ground truth at zero labeling cost.

Narrative Arc-Conditioned Gameplay Planning

Yunge Wen*, Chenliang Huang*, Hangyu Zhou, Zhuo Zeng, Chun Ming Louis Po, Julian Togelius, Timothy Merino, Sam Earle
CHI Poster 2026 | paper | code | video

We propose Forking Garden, a framework for planning branching games via narrative archetypes (e.g., Hero's Journey). Independent nodes are assembled into a dungeon graph through arc-guided constraints, achieving multimodal alignment of gameplay elements. We demonstrate an end-to-end interactive system instantiating this framework.

AI OLFACTION

SMELLNET: A Large-scale Dataset for Real-world Smell Recognition

Dewei Feng, Carol Li, Wei Dai, Alistair Pernigo, Yunge Wen, Paul Pu Liang
ICLR 2026 | paper | code | video

We present SmellNet, a large sensor-based olfaction dataset with 828K time-series points across 50 substances (nuts, spices, herbs, fruits, vegetables) and 43 mixtures, totaling 68 hours of data. Using it, we developed ScentFormer, a Transformer model with temporal differencing and sliding-window augmentation, targeting applications in allergen detection, process monitoring, and emotion/disease sensing.
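
The two preprocessing ideas named above, temporal differencing and sliding-window augmentation, can be sketched in a few lines. This is only an illustration under stated assumptions: the lag, window size, stride, and the toy two-channel readings are hypothetical and are not taken from SmellNet or ScentFormer.

```python
from typing import List

Reading = List[float]  # one multi-channel sensor sample (channel count is assumed)


def temporal_difference(series: List[Reading], lag: int = 1) -> List[Reading]:
    """Return x[t] - x[t - lag] per channel; the result has len(series) - lag rows."""
    return [
        [cur - prev for cur, prev in zip(series[t], series[t - lag])]
        for t in range(lag, len(series))
    ]


def sliding_windows(series: List[Reading], size: int, stride: int) -> List[List[Reading]]:
    """Cut overlapping fixed-length windows from a time series."""
    return [series[s:s + size] for s in range(0, len(series) - size + 1, stride)]


# Hypothetical readings: five time steps, two sensor channels.
readings = [[1.0, 10.0], [2.0, 12.0], [4.0, 15.0], [7.0, 19.0], [11.0, 24.0]]
diffs = temporal_difference(readings)                # changes between consecutive steps
windows = sliding_windows(diffs, size=2, stride=1)   # overlapping training samples
```

Differencing removes slow baseline drift from the raw signal, and overlapping windows turn one long recording into many training samples.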

AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal LLM

Yunge Wen*, Awu Chen*, Jianing Yu*, Jas Brooks, Hiroshi Ishii, Paul Pu Liang
CVPR Workshop 2026 (Outstanding Paper Award) | paper | code | video

We present AromaGen, an AI-powered wearable interface capable of real-time, general-purpose aroma generation from free-form text or visual inputs. AromaGen is powered by a multimodal LLM that leverages latent olfactory knowledge to map semantic inputs to structured mixtures of 12 carefully selected base odorants, released through a neck-worn dispenser. Users can iteratively refine generated aromas through natural language feedback via in-context learning.

Smell with Genji: Rediscovering Human Perception through an Olfactory Game with AI

Awu Chen, Vera Yu Wu, Yunge Wen, Yaluo Wang, Jiaxuan Olivia Yin, Yichen Wang, Qian Xiang, Richard Zhang, Paul Pu Liang, and Hiroshi Ishii
CHI Poster & Interactive Demo 2026 | poster | interactive demo | video

Genji-kō (源氏香) is a traditional Japanese incense game that structures olfactory experience through comparison, memory, and collective sensing. We reinterpret this ritual as Smell with Genji, a human–AI collaborative system integrating an olfactory sensor and smell AI described in SmellNet, a mobile application, and an AI co-smelling large language model.

MULTIMODAL INTERACTION

DuoZone: A User-Centric, LLM-Guided Mixed-Initiative XR Window Management System

Jing Qian*, George X. Wang*, Xiangyu Li, Yunge Wen, Guande Wu, Sonia Castelo Quispe, Fumeng Yang, Claudio Silva
TVCG 2026; ISMAR 2026 | paper

We propose DuoZone, a mixed-initiative human–AI window management system for Extended Reality (XR) that supports placing, arranging, and resizing virtual windows. It preserves user agency while automating window management through user-created spatial zones and LLM-generated hints for efficiency across multiple scenarios, and is demonstrated on Apple Vision Pro.

Whispering Water: Materializing Human-AI Dialogue as Interactive Ripples

Ruipeng Wang*, Tawab Safi*, Yunge Wen*, Christina Cunningham, Hoi Ling Tang, and Behnaz Farahi
Under Review | paper | video

Whispering Water is an interactive installation that materializes human–AI dialogue through cymatic patterns on water. We propose a novel algorithm that decomposes speech into component waves and reconstructs them in water, establishing a translation between linguistic expression and the physics of material form.

Semantic Zooming Interface for Fruitfly Brain Connectome

Jizheng Dong, Yuancheng Shen, Yunge Wen, Yu Cheng, Emma Obermiller, Erdem Varol, Robert Krueger
Manuscript In Preparation | video

We present an interactive visualization tool for the FlyWire dataset that integrates morphological clustering and single-neuron downsampling with multi-resolution rendering and semantic zooming, enabling intuitive, Google Maps-style exploration and significantly reducing visual clutter for large-scale neuronal networks.

AVA-Align: Generating Rubric-Aligned Feedback from Long Classroom Videos

Ao Qu*, Yuxi Wen*, Jiayi Zhang*, Yunge Wen, Yibo Zhao, Alok Prakash, Andrés F. Salazar-Gómez, Paul Pu Liang, Jinhua Zhao
Under Review | paper | video

We propose AVA-Align (Adaptive Video Agent with Alignment), a video–language model framework that addresses long-context understanding, temporal precision, and instruction following for classroom settings, generating rubric-aligned feedback from long classroom recordings to enhance teacher performance.

"See What I Imagine, Imagine What I See": Human-AI Co-Creation System for 360° Panoramic Video Generation in VR

Yunge Wen | paper

We introduce Imagine360, a proof-of-concept VR system integrating co-creation principles with AI agents for panoramic video generation, enabling users to generate, evaluate, and refine immersive environments in real time through speech-based prompts and egocentric focal point recentering.

PROJECTS

Bayesian Motion Trajectory Prediction

Fine-tuned YOLOv8 on the VisDrone dataset to improve small-object tracking, using Kalman filters to track single and multiple objects and predict their motion trajectories.

[Github]
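
As a rough illustration of the Kalman-filter step, the sketch below tracks one coordinate of an object's center with a constant-velocity model. It is not the project's code: the class name `Kalman1D`, the noise values `q` and `r`, and the synthetic measurements (standing in for detector outputs) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""
    x: float = 0.0     # position estimate
    v: float = 0.0     # velocity estimate
    p00: float = 1.0   # covariance P = [[p00, p01], [p01, p11]]
    p01: float = 0.0
    p11: float = 1.0
    q: float = 1e-3    # process noise added to the diagonal (assumed value)
    r: float = 0.25    # measurement noise variance (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Advance the state by dt and return the predicted position."""
        self.x += dt * self.v                      # x <- F x, F = [[1, dt], [0, 1]]
        p00 = self.p00 + dt * (2 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q                    # P <- F P F^T + Q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.x

    def update(self, z: float) -> None:
        """Correct the state with a position measurement z (H = [1, 0])."""
        s = self.p00 + self.r                      # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s        # Kalman gain
        y = z - self.x                             # innovation
        self.x += k0 * y
        self.v += k1 * y
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1 - k0) * p00                  # P <- (I - K H) P
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01


# Synthetic measurements: an object moving 2 pixels per frame for 20 frames.
kf = Kalman1D()
for z in [2.0 * t for t in range(20)]:
    kf.predict()
    kf.update(z)
next_position = kf.predict()  # one-frame-ahead trajectory prediction
```

For 2D tracking, one such filter per axis (or a single four-dimensional state) plays the same role; multi-object tracking additionally needs a step that assigns detections to tracks.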

Neural Style Transfer

Reproduced the seminal 2015 paper on image style transfer, with step-by-step visualization of content and style convolution results.

[Github]

Video2Video Search

Trained a convolutional autoencoder on the COCO dataset. Extracted feature maps from video screenshots, stored them in a vector database, and matched them against query images by vector similarity.

[Github]
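
The retrieval step can be sketched as a cosine-similarity search over stored feature vectors. This is a minimal in-memory stand-in, not the project's implementation: the frame names, the three-dimensional vectors, and the `search` helper are hypothetical, and a real vector database would replace the dictionary.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: Dict[str, List[float]], query: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings of three video frames (real ones would come from the encoder).
frame_index = {
    "frame_000": [0.9, 0.1, 0.0],
    "frame_120": [0.1, 0.8, 0.3],
    "frame_240": [0.0, 0.2, 0.9],
}
results = search(frame_index, query=[0.2, 0.9, 0.2], k=2)
```

Cosine similarity ignores vector magnitude, which is one common reason to prefer it when comparing learned embeddings.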

Enhancing LLM Accuracy with RAG

Built a Hugging Face web app that demonstrates retrieval-augmented generation for non-technical corporate users.

[Huggingface Page]

COMPUTATIONAL DESIGN
