Phi-3-Vision Multimodal Model

Date: May 21, 2024
Company: Microsoft
Category: Models & Research

Narrative

A 4.2B-parameter multimodal model with combined language and vision capabilities, including chart and diagram understanding. Presented as a multimodal breakthrough for small models.

Reality

Vision capabilities are working: OCR, chart interpretation, and multi-image comparison are all functional. Khan Academy is testing the model for math tutoring. Epic is using it for medical records.

Implication

Multimodal capabilities no longer require massive models. On-device vision-language processing is now viable. The small-model philosophy extends beyond text.

Tags

  • microsoft
  • model-release
  • multimodal
  • small-model
  • vision