Cheap AI “video scraping” can now extract data from any screen recording

Video scraping is just one of many new tricks made possible now that the latest large language models (LLMs), such as Google’s Gemini and OpenAI’s GPT-4o, are actually “multimodal” models that accept audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to predict which tokens should come next in a sequence.
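To make the idea of predicting “which tokens should come next” concrete, here is a toy sketch in Python. It is not how Gemini or GPT-4o work internally: it is a simple frequency counter over made-up integer token IDs, and the function names `train_bigram` and `predict_next` are invented for this illustration. It shows only the basic loop of observing a token sequence and guessing the most likely follower.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens were seen right after it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequently observed follower of `token`, or None."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Made-up token IDs standing in for tokenized text, audio, or video frames.
sequence = [3, 7, 3, 7, 3, 9, 3, 7]
model = train_bigram(sequence)
print(predict_next(model, 3))  # token 7 followed token 3 most often
```

A real multimodal model replaces the frequency table with a large neural network and conditions on the whole preceding sequence, but the input and output are still sequences of tokens.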

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but a generalized alternative term hasn’t really taken off yet. Whatever you call it, an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below; a 2015 paper even uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.
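The last step of that workflow, turning the model’s reply into a structured list, might look something like the Python sketch below. It rests on several assumptions: the article does not say what output format Willison requested, so the JSON array of `title`/`author` records is a guess, the function name `parse_titles` is invented, and the sample reply contains placeholder titles rather than real results. No model API is called here; the sketch covers only the parsing and cleanup.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def parse_titles(response_text):
    """Parse a JSON array of book records from a model's text reply.

    Tolerates a Markdown code fence around the JSON, skips entries
    without a title, and drops case-insensitive duplicate titles.
    """
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    records = json.loads(text)

    seen = set()
    books = []
    for record in records:
        if not isinstance(record, dict):
            continue
        title = str(record.get("title", "")).strip()
        key = title.casefold()
        if title and key not in seen:
            seen.add(key)
            books.append({"title": title, "author": record.get("author")})
    return books


# Hypothetical model reply with placeholder data (not real output).
sample = (
    FENCE + "json\n"
    '[{"title": "Placeholder Book A", "author": "Author One"},'
    ' {"title": "placeholder book a", "author": "Author One"},'
    ' {"title": "Placeholder Book B"}]\n' + FENCE
)

for book in parse_titles(sample):
    print(book["title"], "|", book["author"])
```

Deduplicating matters for video input in particular, since the same spine can appear in many frames and a model may list it more than once.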

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.

The News Spectrum | © 2024 | News