• Domi@lemmy.secnd.me
    3 months ago

    Prompt processing at 12.3 t/s, inference at 10.7-11.1 t/s.

    Is that still on CPU or did you get it working on GPU?

    I have seen a few people recommending GLM 4.5 at lower quants, primarily for more intricate writing; it might be worth the lower speed and smaller context for shorter texts.

    Thanks for testing!

    • panda_abyss@lemmy.ca
      3 months ago

      That was on GPU; CPU was 5 t/s.

      I’ve also tested the image processing more: a 512x512 image takes about a minute, 1400x900 takes about 7-10 minutes, and image-to-image takes about 10 minutes.

      Most of the time is spent in the encoder/decoder layers for image-to-image, and decoding is what scales worst with image size.