The Multimodal
After this drill, you can use images, PDFs, and screenshots as input — not just text — and know when each multimodal format unlocks new capability.
Why this matters
The world runs on images and documents, not just text. Whiteboards captured on phones. Screenshots of error messages. PDFs of contracts. Charts in annual reports. Being able to feed these directly into AI — and ask the right questions about them — is one of the highest-leverage practical skills in this course. This drill covers three scenarios: visual analysis, document extraction, and screenshot debugging.
How to do it
- 1
Scenario 1: Visual analysis — take a photo of something in your physical environment
A whiteboard diagram, a printed chart, a product you're trying to describe. Ask Claude what it sees and what it means.
- 2
Scenario 2: Document extraction — upload a PDF with structured content
A report, a contract section, a form. Ask Claude to extract specific information and answer questions about it.
- 3
Scenario 3: Screenshot debugging — take a screenshot of something that is confusing or broken
A UI element that looks wrong. An error message. A chart you do not understand. Ask Claude to diagnose and explain.
- 4
Identify which scenario produced the most value for your work context
This determines which multimodal use case you prioritize in your workflow.
The prompt
I'm sharing [IMAGE/SCREENSHOT/PDF]. Please: 1. Describe exactly what you see (be specific about numbers, labels, and structure) 2. Answer this specific question: [YOUR QUESTION ABOUT THE CONTENT] 3. Flag anything that seems incorrect, inconsistent, or requires clarification
Success criteria
- ✓You completed all three multimodal scenarios
- ✓You used the "describe first" approach before asking specific questions
- ✓You identified which scenario is most useful for your work
Common mistakes
Asking the question before Claude describes what it sees
→ "What does this chart mean?" without Claude first describing the chart often produces misinterpretations. Always describe first.
Using low-quality images
→ Blurry photos, dark lighting, small text in images all degrade quality significantly. The clearer the image, the better the analysis.