this post was submitted on 03 Apr 2024
960 points (99.4% liked)
Technology
AI-based video codecs are on the way. This isn't necessarily a bad thing, because they could be designed to be lossless, or at least less lossy than modern codecs. But compression artifacts will likely be harder to identify as such. That's a good thing for film and TV, but a bad thing for, say, security cameras.
The devil's in the details and "AI" is way too broad a term. There are a lot of ways this could be implemented.
I don't think loss is what people are worried about, really - more injecting details that fit the training data but don't exist in the source.
Given the hoopla Hollywood and directors made about frame-interpolation, do you think generated frames will be any better/more popular?
Han shot first.
Over Greedo's dead body.
Correct!
In the context of video encoding, any manufactured/hallucinated detail would count as "loss". Loss is anything that's not in the original source. The loss you see in e.g. MPEG4 video usually looks like squiggly lines, blocky noise, or smearing. But if an AI encoder inserts a bear on a tricycle in the background, that would also be a lossy compression artifact in context.
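To make that definition concrete, here is a minimal sketch that treats loss as any per-pixel deviation from the source. The frames are made-up grayscale grids and the helper names (`mean_squared_error`, `is_lossless`) are hypothetical, not taken from any real codec. Mean squared error is just one common way to score the difference.

```python
def mean_squared_error(source, decoded):
    """Average squared per-pixel difference between two equally sized frames."""
    if len(source) != len(decoded) or any(
        len(a) != len(b) for a, b in zip(source, decoded)
    ):
        raise ValueError("frames must have the same dimensions")
    total = 0
    count = 0
    for src_row, dec_row in zip(source, decoded):
        for s, d in zip(src_row, dec_row):
            total += (s - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def is_lossless(source, decoded):
    """A decode is lossless only if every pixel matches the source exactly."""
    return mean_squared_error(source, decoded) == 0


# Hypothetical 2x2 grayscale frames (values are illustrative only).
source = [[10, 10], [10, 10]]
blocky = [[12, 8], [10, 10]]         # small noise, like classic artifacts
hallucinated = [[10, 10], [10, 200]]  # one invented bright detail

print(mean_squared_error(source, blocky))        # 2.0
print(mean_squared_error(source, hallucinated))  # 9025.0
print(is_lossless(source, source))               # True
```

By this measure, both the noisy frame and the frame with the invented detail are lossy. A pixel metric like this says nothing about how plausible the error looks, which is exactly why a convincing hallucination is harder to spot than blocky noise.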
As for frame interpolation, it could definitely be better, because the current algorithms out there are not good. It will not likely be more popular, since this is generally viewed as an artistic matter rather than a technical matter. For example, a lot of people hated the high frame rate in the Hobbit films despite the fact that it was a naturally high frame rate, filmed with high-frame-rate cameras. It was not the product of a kind-of-shitty algorithm applied after the fact.
I don't think AI codecs will be anything revolutionary. There are plenty of lossless codecs already, but if you want more detail, you'll need a better physical sensor, and I doubt there's anything that can be done to get around that (that actually represents what exists, not a hallucination).
It's an interesting thought experiment, but we don't actually see what really exists. Our brains are essentially AI vision, filling in things we don't actually perceive. Examples are movement while we're blinking, objects and colors in our peripheral vision, the state of objects when our eyes dart around, etc.
The difference is we can't go back frame by frame and analyze these "hallucinations" since they're not recorded. I think AI enhanced video will actually bring us closer to what humans see even if some of the data doesn't "exist", but the article is correct that it should never be used as evidence.
I think there's a possibility for long-format video of stable scenes to use ML for higher compression ratios by deriving a video-specific model of the objects in the frame and then describing their movements (essentially reducing the actual frames to wireframe models instead of image frames, then painting them in from the model).
But that's a very specific thing that probably only works well for certain types of video content (think animated stuff).
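As a toy illustration of the "store a model plus movements, then paint the frames back in" idea, here is a sketch with no ML in it at all. It assumes the object model (a tiny sprite) and its positions are already known, which is the hard part a real system would have to learn. All names (`render`, `encode`, `decode`) are hypothetical.

```python
def render(background, sprite, position):
    """Paint the sprite onto a copy of the background at (top, left)."""
    frame = [row[:] for row in background]
    top, left = position
    for dy, sprite_row in enumerate(sprite):
        for dx, value in enumerate(sprite_row):
            frame[top + dy][left + dx] = value
    return frame


def encode(background, sprite, positions):
    """Store the scene once, plus one (top, left) pair per frame."""
    return {
        "background": background,
        "sprite": sprite,
        "positions": list(positions),
    }


def decode(encoded):
    """Rebuild every frame by painting the sprite at each stored position."""
    return [
        render(encoded["background"], encoded["sprite"], position)
        for position in encoded["positions"]
    ]


# Hypothetical 4x6 static scene with a 1x2 object sliding to the right.
background = [[0] * 6 for _ in range(4)]
sprite = [[9, 9]]
positions = [(1, 0), (1, 1), (1, 2), (1, 3)]

frames = decode(encode(background, sprite, positions))

raw_values = len(frames) * 4 * 6                 # every pixel of every frame
model_values = 4 * 6 + 2 + 2 * len(positions)    # scene + sprite + positions
print(raw_values, model_values)  # 96 34
```

The saving grows with the number of frames, since each extra frame costs only two numbers here. It also shows the limitation: anything the model can't describe (lighting changes, a new object) is simply missing from the decoded frames.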
Nvidia's RTX video upscaling is trying to be just that: DLSS, but you run it on a video stream instead of a game running on your own hardware. They've posited the idea of game streaming becoming lower bit rate just so you can upscale it locally, which to me sounds like complete garbage.
It remains to be seen, of course, but I expect to be able to get lossless (or nearly-lossless) video at a much lower bitrate, at the expense of a much larger and more compute/memory-intensive codec.
The way I see it working is that the codec would include a general-purpose model, and video files would be encoded for that model + a file-level plugin model (like a LoRA) that's fitted for that specific video.
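For anyone unfamiliar with the LoRA part, the core idea is a low-rank update: the big shared weights stay fixed, and each file only ships two small factor matrices whose product gets added on top. The sketch below shows only that arithmetic on a tiny made-up matrix. It is not a codec, and the names (`matmul`, `apply_low_rank_update`) and values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_low_rank_update(base, b_factor, a_factor):
    """Return base + (b_factor @ a_factor) without modifying base."""
    delta = matmul(b_factor, a_factor)
    return [
        [w + d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Shared "general-purpose" weights: shipped once with the codec (assumed).
base = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Per-video rank-1 factors: shipped inside the video file (assumed).
b_factor = [[1], [0], [2]]   # 3x1
a_factor = [[1, 2, 3]]       # 1x3

adapted = apply_low_rank_update(base, b_factor, a_factor)
print(adapted)  # [[2, 2, 3], [0, 1, 0], [2, 4, 7]]

per_file_values = 3 * 1 + 1 * 3   # 6 numbers instead of 9 for a full matrix
```

At this size the saving is trivial, but for an n-by-n layer with rank r the per-file cost is 2nr values instead of n squared, which is what would make a per-video plugin small enough to embed in the file.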
Arguably already here.
Look at this description of Samsung's mobile AI for their S24 phone and newer tablets:
AI-powered image and video editing
Galaxy AI also features various image and video editing features. If you have an image that is not level (horizontally or vertically) with respect to the object, scene, or subject, you can correct its angle without losing other parts of the image. The blank parts of that angle-corrected image are filled with Generative AI-powered content. The image editor tries to fill in the blank parts of the image with AI-generated content that suits the best. You can also erase objects or subjects in an image. Another feature lets you select an object/subject in an image and change its position, angle, or size.
It can also turn normal videos into slow-motion videos. While a video is playing, you need to hold the screen for the duration of the video that you want to be converted into slow-motion, and AI will generate frames and insert them between real frames to create a slow-motion effect.
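To show what "generate frames and insert them between real frames" means mechanically, here is a minimal sketch that uses plain linear blending as a stand-in for the generated frames. This is an assumption for illustration only: it is not how Samsung's feature works, and real interpolators estimate motion rather than cross-fading. The names `blend` and `slow_motion` are hypothetical.

```python
def blend(frame_a, frame_b, t):
    """Linear mix of two equally sized frames; t=0 gives frame_a, t=1 frame_b."""
    return [
        [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def slow_motion(frames, inserted_per_gap):
    """Insert synthetic frames between each pair of real frames."""
    if not frames:
        return []
    output = []
    for current, following in zip(frames, frames[1:]):
        output.append(current)
        for i in range(1, inserted_per_gap + 1):
            t = i / (inserted_per_gap + 1)
            output.append(blend(current, following, t))
    output.append(frames[-1])
    return output


# Two hypothetical 1x2 grayscale frames.
real_frames = [[[0, 0]], [[4, 8]]]

slowed = slow_motion(real_frames, 1)
print(slowed)  # [[[0, 0]], [[2.0, 4.0]], [[4, 8]]]
```

The middle frame never existed in the recording. With a cross-fade that is obvious on inspection, but a generative model would fill the same slot with something that looks real, which is the concern raised earlier in the thread.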