this post was submitted on 06 Oct 2024
111 points (88.8% liked)

Technology

59534 readers
3223 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] FaceDeer@fedia.io 16 points 1 month ago (1 children)

They're not talking about the same thing.

Last week, researchers at the Allen Institute for Artificial Intelligence (Ai2) released a new family of open-source multimodal models competitive with state-of-the-art models like OpenAI’s GPT-4o—but an order of magnitude smaller.

That's in reference to the size of the model itself.

They then compiled a more focused, higher quality dataset of around 700,000 images and 1.3 million captions to train new models with visual capabilities. That may sound like a lot, but it’s on the order of 1,000 times less data than what’s used in proprietary multimodal models.

That's in reference to the size of the training data that was used to train the model.

Minimizing both of those things is useful, but for different reasons. Smaller training sets make the model cheaper to train, and a smaller model makes the model cheaper to run.

[–] General_Effort@lemmy.world 1 points 1 month ago

After a quick skim, seems like the article has lots of errors. Molmo is trained on top of Qwen. The smallest ones are trained on something by the same company as Molmo.