Overview - Vima

Vima turns egocentric construction footage into auditable spatial evidence. Instead of asking a model to guess from raw video, Vima builds a compact evidence layer first, then exposes it to humans and agents through a dashboard, HTTP API, CLI, and MCP server.

Evidence First

Frames are grounded with cached CII labels, zone attribution, and spatial context before they become an answer.

Agent Native

MCP and CLI interfaces let external agents inspect the same production evidence as the dashboard.

Small Production Footprint

Expensive perception is precomputed offline, so the hosted services stay light enough for a 1 vCPU / 2 GB VPS.

Auditable Outputs

Tool responses are structured JSON with frame rows, zone metrics, and temporal evidence instead of ungrounded prose.

Production Surfaces

Surface	URL
Landing	`https://vimaspatial.tech`
API	`https://vimaspatial.tech/api`
MCP	`https://vimaspatial.tech/mcp`
MCP Health	`https://vimaspatial.tech/mcp/health`
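The surfaces in the table share one base URL, so a client can derive each endpoint from it. The following is a minimal sketch, not Vima's own client code. The helper names `surface_url` and `mcp_is_healthy` are hypothetical, and the health check assumes that a healthy service answers a plain GET with HTTP 200; the source does not document the response body.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://vimaspatial.tech"

# Paths taken from the surface table; the dictionary keys are illustrative.
SURFACES = {
    "api": "/api",
    "mcp": "/mcp",
    "mcp_health": "/mcp/health",
}


def surface_url(name: str) -> str:
    """Return the absolute URL for a named surface."""
    return urljoin(BASE_URL, SURFACES[name])


def mcp_is_healthy(timeout: float = 5.0) -> bool:
    """Assumption: a healthy service answers GET /mcp/health with HTTP 200."""
    try:
        with urllib.request.urlopen(surface_url("mcp_health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTP error statuses.
        return False
```

Calling `mcp_is_healthy()` performs one network request and returns `False` on any connection or HTTP error, so it can be used as a cheap pre-flight check before connecting an MCP client.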

Pipeline

Hardhat footage
  -> sampled frames
  -> object boxes + optional Gemini Robotics-ER boxes
  -> merged boxes, masks, and depth
  -> CII labels and spatial zones
  -> object-event episodic memory
  -> cited dashboard / API / CLI / MCP response
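The pipeline above merges boxes from two sources but does not say how. As an illustration only, the sketch below uses simple IoU-based deduplication as a stand-in: every primary detector box is kept, and an optional box is added only if it does not overlap an already kept box too strongly. The function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are all assumptions, not Vima's documented method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(primary: List[Box], optional: List[Box],
                threshold: float = 0.5) -> List[Box]:
    """Keep all primary boxes; add optional boxes that are not near-duplicates."""
    merged = list(primary)
    for candidate in optional:
        if all(iou(candidate, kept) < threshold for kept in merged):
            merged.append(candidate)
    return merged
```

With this stand-in, an optional box that largely coincides with a primary box is dropped, while one covering a region the primary detector missed is retained.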

Current Hosted Facts

/api/cii/summary: 30 masonry frames, 26 P, 0 C, 4 NC, 86.7% wrench time, and 11 raffle tickets.
/api/spatial/zones: three 10-frame zones, currently reporting 90.0%, 100.0%, and 70.0% productive time.
/api/eval: temporal evidence is served from cached or live JSON; the current hosted snapshot examines 8 frames, returns 5 accepted temporal claims, and shows a 0.600 to 0.792 spatial score lift from memory augmentation.
Frontend evidence workspace: 118 temporal episodes are available for the demo review flow.
/mcp/health: the hosted MCP service is deployed and healthy. The /mcp route itself is for streamable HTTP MCP clients, not plain browser GETs.
/api/demo: present for local video-backed demos, but returns video_unavailable on production because the full source video is not bundled on the VPS.
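The figures listed above are mutually consistent, which can be checked with a few lines of arithmetic. This sketch assumes that P, C, and NC stand for productive, contributory, and non-contributory frames, and that wrench time is the productive share of all frames; the source does not define either explicitly. All numbers are copied from the list above, and the variable names are illustrative.

```python
# Figures copied from the hosted facts; variable names are illustrative.
frames_total = 30
productive, contributory, non_contributory = 26, 0, 4
zone_productive_pct = [90.0, 100.0, 70.0]
frames_per_zone = 10

# Assumption: wrench time is productive frames divided by all frames.
wrench_time_pct = round(100 * productive / frames_total, 1)

# Productive frame counts implied by the per-zone percentages.
zone_productive_frames = [
    round(pct / 100 * frames_per_zone) for pct in zone_productive_pct
]

# Spatial score lift from memory augmentation, absolute and relative.
spatial_before, spatial_after = 0.600, 0.792
absolute_lift = round(spatial_after - spatial_before, 3)
relative_lift_pct = round(100 * (spatial_after - spatial_before) / spatial_before, 1)
```

Under these assumptions, the three zones account for 9, 10, and 7 productive frames, which sum to the 26 P frames in the summary, and the spatial score lift of 0.192 is a 32.0% relative improvement over the 0.600 baseline.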

Production serves cached evidence and small JSON tools. Heavy video, reconstruction, SAM, Depth Anything, and COLMAP work stays offline or in precomputed artifacts.
