CHAPTER 2: TECHNICAL ARCHITECTURE AND INFRASTRUCTURE

1. SYSTEM OVERVIEW

The Fat Cat technical system is designed around a core principle: real-time virtual character performance composited into physical environments for live broadcast. This requires low-latency signal flow from motion capture through rendering to stream output, with robust failover capabilities to support multi-hour daily broadcasts.

1.1 Core Pipeline Architecture

The production pipeline consists of five primary stages: (1) Motion Capture Input; (2) Character Animation Processing; (3) Real-Time Rendering; (4) Compositing; and (5) Stream Output. Each stage operates on independent hardware where possible to maximize reliability and minimize single points of failure.

CAPTURE → PROCESSING → RENDERING → COMPOSITING → OUTPUT

Target latency from physical movement to screen output is under 200ms, with acceptable degradation to 500ms under high load. At the 200ms target, latency is effectively imperceptible to audiences and allows genuine real-time interaction between the performer and chat or guests; the 500ms ceiling keeps interaction usable during load spikes.
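As an illustration of how that budget could be enforced, the minimal Python sketch below tracks per-stage timings against the end-to-end target. The per-stage budget values and the LatencyMonitor helper are illustrative assumptions for this sketch, not measured production figures or an existing tool.

    import time
    from dataclasses import dataclass, field

    # Illustrative per-stage budgets (ms) summing to the 200ms end-to-end target.
    # These figures are assumptions for the sketch, not measured production values.
    STAGE_BUDGETS_MS = {
        "capture": 40,
        "processing": 40,
        "rendering": 60,
        "compositing": 30,
        "output": 30,
    }

    @dataclass
    class LatencyMonitor:
        """Tracks wall-clock time spent in each pipeline stage for one frame."""
        timings_ms: dict = field(default_factory=dict)

        def record(self, stage: str, start: float, end: float) -> None:
            self.timings_ms[stage] = (end - start) * 1000.0

        def over_budget(self) -> dict:
            """Stages that exceeded their budget, e.g. to trigger a degraded mode."""
            return {stage: ms for stage, ms in self.timings_ms.items()
                    if ms > STAGE_BUDGETS_MS.get(stage, float("inf"))}

        def total_ms(self) -> float:
            return sum(self.timings_ms.values())

    # Usage: wrap each stage with perf_counter() calls and compare total_ms()
    # against the 200ms target / 500ms degradation ceiling described above.
    monitor = LatencyMonitor()
    t0 = time.perf_counter()
    # ... run the capture stage here ...
    monitor.record("capture", t0, time.perf_counter())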

2. TECHNICAL DEVELOPMENT ROADMAP

The Fat Cat motion capture infrastructure is being deployed in three distinct stages, each representing a significant capability upgrade. A fourth, speculative enhancement stage is outlined in Section 3.

2.1 Stage 1: The Zurk Configuration (Current)

Fig. 1 — The Zurk Desk Configuration

The initial deployment utilizes a recursive VRChat-based setup where a stylized avatar is displayed on a physical TV screen—designated 'The Zurk'—which is then filmed with an iPhone camera in a real physical environment. This lo-fi approach creates a unique aesthetic that blends virtual and physical space while requiring minimal technical infrastructure.

Stage 1 Technical Specifications:

2.1.1 The Zurk: IRL Street Deployment Mode

A key operational variant of Stage 1 involves deploying the Zurk as a mobile IRL unit for street-level content capture. The TV monitor displaying Fat Cat is physically transported into public spaces, enabling direct interaction between the character and unsuspecting pedestrians.

This deployment mode leverages the character's narrative premise: Fat Cat is trapped inside the Zurk portal, visible but unable to escape into the physical world. When members of the public approach the screen, they can converse with Fat Cat in real-time, creating spontaneous interactions that blur the line between digital character and street performer.

IRL Zurk Operational Parameters:

IRL Stunt Categories:

The IRL Zurk configuration is designed for maximum clip generation. Each outing targets 5-10 standalone clips suitable for short-form platform distribution, with at least one potential 'spectacle-grade' moment per deployment.

2.2 Stage 2: Sony Mocopi IMU Configuration (Q1 2026)

Stage 2 introduces the Sony Mocopi 6-point IMU motion capture system, enabling full body tracking with real-time character animation in Unreal Engine. This configuration adds iPhone ARKit facial capture via Live Link Face for 52-blendshape facial animation synchronized with body movement.
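Because Stage 2 merges two independent capture streams, body and face data must be paired in time before driving the rig. The minimal Python sketch below illustrates that alignment step, assuming face frames arrive as timestamped ARKit blendshape dictionaries and body frames as timestamped joint dictionaries; the StreamAligner class, buffer size, and skew tolerance are assumptions for illustration, not the actual Mocopi or Live Link Face packet handling.

    from bisect import bisect_left
    from collections import deque

    class StreamAligner:
        """Pairs each body frame with the nearest-in-time face frame.

        Assumes face frames are pushed in timestamp order as (t, {blendshape: weight})
        and body frames arrive as (t, {joint: transform}); the real Mocopi and
        Live Link Face packet formats are not reproduced here."""

        def __init__(self, max_skew_s: float = 0.05):
            self.face_frames = deque(maxlen=256)
            self.max_skew_s = max_skew_s

        def push_face(self, timestamp: float, blendshapes: dict) -> None:
            self.face_frames.append((timestamp, blendshapes))

        def match_body(self, timestamp: float, joints: dict):
            """Return (joints, blendshapes) if a face frame lies within max_skew_s."""
            if not self.face_frames:
                return None
            times = [t for t, _ in self.face_frames]
            i = bisect_left(times, timestamp)
            candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
            best = min(candidates, key=lambda j: abs(times[j] - timestamp))
            if abs(times[best] - timestamp) > self.max_skew_s:
                return None  # streams drifted apart; hold the last good pose upstream
            return joints, self.face_frames[best][1]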

Stage 2 Technical Specifications:

2.3 Stage 3: OptiTrack PrimeX 13 Optical Motion Capture (Q2 2026)

Stage 3 represents the target production configuration: a professional optical motion capture system using OptiTrack PrimeX 13 cameras. This system enables cinema-quality character animation with sub-millimeter precision across the entire capture volume.

OptiTrack PrimeX 13 Camera Specifications:

Stage 3 System Configuration:

3. STAGE 4: EVENT-BASED CAMERA ENHANCEMENT SYSTEM (Speculative)

Stage 4 represents a speculative future enhancement utilizing Dynamic Vision Sensor (DVS) technology—also known as neuromorphic or event-based cameras—to achieve superior eyelid and finger detection fidelity beyond conventional camera systems.

3.1 Event Camera Technology Overview

Unlike traditional cameras that capture complete frames at fixed intervals (typically 24-120 fps), event cameras asynchronously record only pixel-level brightness changes that exceed a threshold. This bio-inspired approach, modeled on biological retinas, produces a sparse spatiotemporal stream of events with microsecond temporal resolution.
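The per-pixel trigger condition can be summarized with a short sketch: a pixel emits an ON or OFF event when its log-brightness has changed by more than a contrast threshold since the last event it fired. The function name and the 0.2 threshold below are illustrative.

    import math

    def pixel_event(intensity_now: float, intensity_at_last_event: float,
                    contrast_threshold: float = 0.2) -> int:
        """Per-pixel event model: return +1 (ON) or -1 (OFF) when the change in
        log-brightness since the pixel's last event exceeds the threshold, else 0.
        Every pixel applies this test independently and asynchronously."""
        delta = math.log(intensity_now) - math.log(intensity_at_last_event)
        if delta >= contrast_threshold:
            return 1    # ON event: brightness increased
        if delta <= -contrast_threshold:
            return -1   # OFF event: brightness decreased
        return 0        # no event; the pixel stays silent

Each emitted event is typically reported as an (x, y, timestamp, polarity) tuple, which is why the output stream is sparse and carries microsecond-level timing.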

Key Advantages for Motion Capture:

3.2 Eyelid Detection Enhancement

Eyelid movement is critical for character believability but challenging to capture with conventional systems due to the speed of blinks (typically 100-400ms) and subtle lid position changes. Event cameras excel at this task, with recent event-based eye-tracking benchmarks reporting upwards of 97% P10 accuracy (predictions within 10 pixels of ground truth).
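As a minimal sketch of one possible approach: a rapid lid closure produces a dense burst of events inside the eye region, so simply counting events per time window can localize blinks before any finer lid-position analysis. The ROI handling, window size, and rate threshold below are illustrative assumptions.

    def detect_blink_windows(events, eye_roi, window_s=0.01, rate_threshold=2000):
        """Flag candidate blink windows by counting events inside the eye ROI.

        events  : time-ordered iterable of (x, y, t, polarity), t in seconds
        eye_roi : (x_min, y_min, x_max, y_max) in pixel coordinates
        Returns start times of windows whose event rate exceeds rate_threshold
        (events per second); a lid closure shows up as a dense burst of events."""
        x_min, y_min, x_max, y_max = eye_roi
        counts = {}
        for x, y, t, _polarity in events:
            if x_min <= x <= x_max and y_min <= y <= y_max:
                bucket = int(t / window_s)
                counts[bucket] = counts.get(bucket, 0) + 1
        return [bucket * window_s for bucket, n in sorted(counts.items())
                if n / window_s > rate_threshold]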

Implementation Approach:

3.3 Finger Detection Enhancement

Fine finger articulation presents similar challenges: rapid movements, self-occlusion, and the need for high precision across multiple joints.

Implementation Approach:

Reference Systems: EventEgo3D (CVPR 2024) demonstrated 3D human motion capture from egocentric event streams. MoveEnet achieved high-frequency human pose estimation using event cameras. The AIS 2024 Challenge on Event-Based Eye Tracking validated DVS approaches for precision gaze and lid tracking.

4. REAL-TIME MOTION CAPTURE PIPELINE

4.1 Facial Motion Capture (Primary)

Primary facial capture utilizes iPhone ARKit via the Live Link Face application. This solution provides 52 blend shapes derived from Apple's TrueDepth camera system, enabling high-fidelity lip sync, eye tracking, and facial expression capture at minimal cost.
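The incoming coefficients use Apple's standard ARKit blendshape names (jawOpen, eyeBlinkLeft, and so on) and must be remapped onto the character's own morph targets. The sketch below shows the idea in Python; the FatCat_* target names and gain values are hypothetical, and in production this mapping would typically sit in Unreal's Live Link remap asset or the Animation Blueprint rather than in a script.

    # Remap a subset of ARKit's 52 blendshape coefficients (0.0-1.0) onto the
    # character's morph targets. The FatCat_* names and gains are hypothetical.
    ARKIT_TO_CHARACTER = {
        "jawOpen":         ("FatCat_MouthOpen", 1.0),
        "eyeBlinkLeft":    ("FatCat_BlinkL",    1.0),
        "eyeBlinkRight":   ("FatCat_BlinkR",    1.0),
        "mouthSmileLeft":  ("FatCat_SmileL",    0.8),
        "mouthSmileRight": ("FatCat_SmileR",    0.8),
        "browInnerUp":     ("FatCat_BrowRaise", 1.2),
    }

    def remap_face_frame(arkit_weights: dict) -> dict:
        """Convert one frame of ARKit weights into character morph-target weights."""
        out = {}
        for arkit_name, (target, gain) in ARKIT_TO_CHARACTER.items():
            out[target] = max(0.0, min(1.0, arkit_weights.get(arkit_name, 0.0) * gain))
        return out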

Hardware Requirements:

Software Stack:

4.2 Body Motion Capture

Body capture utilizes a tiered approach based on production phase:

5. CHARACTER ANIMATION AND RIGGING SPECIFICATIONS

5.1 Model Specifications

5.2 Animation Pipeline

Live performance data flows through the following pipeline (the blending step is sketched in code after the list):

  1. Live Link Face app captures facial performance data (52 blend shapes)
  2. Data transmitted via WiFi to Unreal Engine Live Link plugin
  3. Live Link plugin maps incoming data to character blend shapes
  4. Body mocap data (if available) retargeted to character skeleton
  5. Animation Blueprint blends face + body + procedural overlays
  6. Final pose rendered in real-time with dynamic lighting
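Step 5 is where live data and procedural motion meet. The minimal Python sketch below illustrates one way to layer them, with live capture taking priority over procedural overlays; the channel names, the breathing curve, and the layering rule are illustrative assumptions, not the actual Animation Blueprint logic.

    import math
    import time

    def procedural_overlay(t: float) -> dict:
        """Step 5's procedural layer: a subtle idle 'breathing' weight driven by
        time. The curve and the channel name are illustrative."""
        return {"FatCat_ChestBreath": 0.5 + 0.5 * math.sin(2 * math.pi * 0.25 * t)}

    def blend_layers(face: dict, body: dict, overlay: dict) -> dict:
        """Step 5: merge face morphs, body-driven channels, and procedural
        overlays. Later updates win, so live performance data always overrides
        procedural motion on any channel they share."""
        pose = dict(overlay)
        pose.update(body)
        pose.update(face)
        return pose

    # Example frame: face capture drives a blink while the overlay adds breathing.
    frame = blend_layers(
        face={"FatCat_BlinkL": 1.0, "FatCat_BlinkR": 1.0},
        body={},
        overlay=procedural_overlay(time.monotonic()),
    )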

6. AI INTEGRATION LAYER

The AI integration layer serves an enhancement role rather than providing core functionality. The primary "AI" of Fat Cat is the human performer: the character presents as an AI entity, but the execution is human performance. AI systems augment rather than replace human judgment.

6.1 Current AI Applications

6.2 Planned AI Enhancements (Phase 2+)