• HakFoo@lemmy.sdf.org
    link
    fedilink
    English
    arrow-up
    4
    ·
    15 hours ago

    But what data would it be?

    Part of the “gobble all the data” perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.

    OTOH maybe there’s probably a sane business in narrow siloed (cheap and efficient and more bounded expectations) AI products: the reinvention of the “expert system” with clear guardrails, the image generator that only does seaside background landscapes but can’t generate a cat to save its life, the LLM that’s a prettified version of a knowledgebase search and NOTHING MORE

    • latenightnoir@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      1
      ·
      edit-2
      12 hours ago

      You’ve highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d be like literally giving someone access to a public library.

      Edit: but to focus on this specific instance, where we have to deal with the here-and-now, I could see them receiving, say, 60-75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn’t what irks most people, it’s calling plagiarism generators and search engine fuck-ups AI and selling them back to the people who generated the databases - or, worse, working toward replacing those people entirely with LLMs! - they used for those abhorrences.

      Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board stackoverflow Jr.? Amazing! Even having it as a mini-assistant on your phone so that you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink is a neat thing, but that would require less advertising and shoving down our throats, and more accepting the fact that you can still do that with five taps and a couple of alarm entries.

      Edit 2: oh, and another thing which would require a buttload of humility, but would alleviate a lot of tension would be getting it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!