Cheap AI “video scraping” can now extract data from any screen recording




Video scraping is just one of many new tricks made possible now that the latest large language models (LLMs), such as Google’s Gemini and OpenAI’s GPT-4o, are actually “multimodal” models that accept audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to make predictions about which tokens should come next in a sequence.
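To make "predicting which tokens should come next" concrete, here is a toy sketch in Python. It is not how Gemini or GPT-4o work internally: it swaps in a simple bigram counter for a neural network, and the token names (including the made-up `<img_17>` image token) and the function names `train_bigram` and `predict_next` are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens followed it in the sequence."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequently observed next token, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical mixed stream: one invented image token alongside text tokens.
# Real multimodal models map every input type to integer token IDs instead.
stream = ["<img_17>", "cat", "sat", "<img_17>", "cat", "slept", "<img_17>", "cat", "sat"]
model = train_bigram(stream)
print(predict_next(model, "cat"))  # prints "sat", seen twice versus "slept" once
```

The point of the sketch is only that, once everything is a token, an image or video frame sits in the same sequence as words, and the prediction step treats them alike.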

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but a generalized alternative term hasn’t really taken off yet. Whatever you call it, an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below, and here’s a 2015 paper that uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.
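As a rough sketch of what turning a model's reply into a structured list can involve, the Python below cleans up a text response into a de-duplicated list of titles. It is not Willison's code. The prompt wording, the assumption that the model replies with a JSON array, the function names, and the `ask_model` callable standing in for a real video-capable model call are all hypothetical, and the titles are placeholders rather than books from his shelf.

```python
import json

# Hypothetical prompt; the wording Willison actually used is not given here.
PROMPT = "List every book title visible in this video as a JSON array of strings."


def parse_titles(response_text):
    """Turn a model's text reply into a clean, de-duplicated list of titles."""
    # Models often add chatter around the JSON, so keep only the outermost array.
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return []

    seen = set()
    titles = []
    for item in items:
        if not isinstance(item, str):
            continue  # skip anything that is not a title string
        title = " ".join(item.split())  # collapse stray whitespace
        key = title.casefold()  # the same book may be reported from several frames
        if title and key not in seen:
            seen.add(key)
            titles.append(title)
    return titles


def scrape_titles(video_path, ask_model):
    """ask_model is a hypothetical callable: (video_path, prompt) -> reply text."""
    return parse_titles(ask_model(video_path, PROMPT))


# Stand-in for a real model call, returning a canned reply.
def fake_model(video_path, prompt):
    return 'Here are the titles:\n["Dune", "  The Hobbit ", "dune", 42]'


print(scrape_titles("bookshelf.mp4", fake_model))  # ['Dune', 'The Hobbit']
```

Keeping the model call behind a plain callable means the parsing step can be checked without network access, which matters because a model's reply is not guaranteed to be valid JSON.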

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.


The News Spectrum | © 2024 | News