I was at my local Coney Island bar, commenting on how poorly this cat was sitting, and I was unhappy. I didn't want to spend my life like this. I used to have ambitions. There had to be more in store for me.
This sent me on a deep introspective journey to remote, exotic places like Jackson Heights and even Flushing. I saw more sitting cats. I scoured the internet. More cats. Look at the huge influx of photos at: https://www.reddit.com/r/Catloaf/. There are countless unanswered posts by desperate people.
People say Artificial Intelligence will mold every aspect of our lives, and yet large swaths of the internet are still manually rating cat loaves. Every day the rate of new cat pictures far surpasses the number of qualified loaf judges, and this discrepancy is only going to get more crumby as time goes on.
Cat photos on the internet (in billions), increasing exponentially every year
So here is how you rate your cat loaf faster with Artificial Intelligence. I'll just step through it. Say we upload this fresh loaf.
First, the image gets sent to a YOLOv8 segmentation model. YOLO is an open-source model that's pretty good at recognizing common objects like cats. It checks whether there is a cat in the image at all and highlights it.
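If you want to try this step yourself, here's a minimal sketch using the Ultralytics package (`pip install ultralytics`); the checkpoint name and "fresh_loaf.jpg" are just placeholders for whatever loaf you upload.

```python
import cv2
from ultralytics import YOLO

# Pretrained YOLOv8 segmentation checkpoint; "cat" is one of its COCO classes.
model = YOLO("yolov8n-seg.pt")

results = model("fresh_loaf.jpg")

for box in results[0].boxes:
    name = model.names[int(box.cls)]
    if name == "cat":
        print(f"Found a cat with confidence {float(box.conf):.2f}")

# Save the image with the cat highlighted (boxes and masks drawn on top).
cv2.imwrite("highlighted_loaf.jpg", results[0].plot())
```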
Looks like a pure bread! So now we can run the next model on it and try to find the impurrfections. We remove everything outside the highlighted cat loaf so the next model has a better chance of working. This way it won't get false positives from furniture or other things. Also, it looks cool.
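Here's a rough sketch of that masking step, reusing the `results` object from the segmentation sketch above; the mask handling is my own approximation, not the exact production workflow.

```python
import cv2
import numpy as np

result = results[0]
image = result.orig_img  # original BGR image

if result.masks is not None:
    # Union of all detected masks, resized from inference size to image size.
    mask = result.masks.data.cpu().numpy().any(axis=0).astype(np.uint8)
    mask = cv2.resize(mask, (image.shape[1], image.shape[0]))

    # Black out every pixel outside the loaf so the next model can't get
    # distracted by furniture. Also, it looks cool.
    loaf_only = image * mask[:, :, None]
    cv2.imwrite("loaf_only.jpg", loaf_only)
```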
Then we send it to YOLO-World, another open-source model that is more flexible about the kinds of objects it will look for. Everyone knows bad loaves have paws and tails sticking out, so I gave it the classes "paws" and "tails" and set it loose. It finds paws and tails, but it just circles the entire cat. This model is beginning to look half baked.
I got it to perform better by adding a third class, "cat". The model now separates the cat from a paw or a tail and narrows its focus. You can find more info on why that works here: https://blog.roboflow.com/yolo-world-prompting-tips/
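Here's roughly what that looks like with the Ultralytics YOLO-World wrapper, run on the masked image from above. The class list follows the post; the checkpoint name and confidence setting are my own guesses at reasonable defaults.

```python
from ultralytics import YOLOWorld

world_model = YOLOWorld("yolov8s-world.pt")

# The third class, "cat", gives the whole-animal box somewhere to go so that
# "paws" and "tails" stop swallowing the entire loaf.
world_model.set_classes(["cat", "paws", "tails"])

# Keep even tiny-confidence detections; we filter them ourselves later.
world_results = world_model("loaf_only.jpg", conf=1e-7)

for box in world_results[0].boxes:
    name = world_results[0].names[int(box.cls)]
    print(f"{name}: confidence {float(box.conf):.7f}")
```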
Those sure are impurrfections, but some are just spots on the cat, not its paws. Time to look at the confidence levels for a bunch of photos and see where the false positives tend to start. After looking at a dozen photos, I determined it's sometimes, but not always, around 0.00031% confidence.
You may be thinking, that is ridiculously low, who could trust that amount of confidence? YOLO-World is just built different. The level of confidence it needs to be reliable is extremely low.
Now we're spotting the impurrfections correctly. From there, it's just some simple logic: if I find any paws at 0.00031% confidence or higher, I'll show just those; otherwise I'll show whatever is found. Sometimes science is just banging pots together until something useful happens.
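Sketched out, the simple logic looks something like this, assuming the `world_results` object from the YOLO-World sketch above; treat the 0.00031% floor as an eyeballed starting point, not a constant of nature.

```python
PAW_THRESHOLD = 0.00031 / 100  # 0.00031% expressed as a fraction

detections = [
    (world_results[0].names[int(box.cls)], float(box.conf))
    for box in world_results[0].boxes
]

# Prefer paw detections above the floor; otherwise show whatever was found.
paws = [(name, conf) for name, conf in detections
        if name == "paws" and conf >= PAW_THRESHOLD]
impurrfections = paws if paws else detections

for name, conf in impurrfections:
    print(f"impurrfection: {name} at {conf:.7%} confidence")
```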
Finally, I throw in a call to OpenAI's GPT-4V to comment on what it saw. Their model doesn't return detections, so it can't draw nice circles like YOLO can, but it's got some good cat loaf adjectives and puns for the rating.
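A hedged sketch of that last call with the official OpenAI Python client; "gpt-4o" stands in for whichever vision-capable model you have access to, and the prompt wording is mine.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("fresh_loaf.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Rate this cat loaf out of 10. Be generous with the bread puns."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```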
In total, we chained together three different state-of-the-art AI models to rate your loaf automatically and accurately.
The workflow that runs when you upload an image (Roboflow)
The next natural step is training the model on cat jawlines. Some teenagers on the street told me that if we can detect those reliably, there are methods to fix them.