@yukariin how many are needed to get 75% success rate?
beansofhell commented on Jan 7, 2024
todo for myself:
get a dataset of the new captchas
https://github.com/baudm/parseq
use crnn because of low cpu usage, so can be attached to an API
use an API with a rate limit of 5 requests per minute.
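A minimal sketch of the "5 requests per minute" idea, assuming a per-client sliding window. The class name `SlidingWindowLimiter`, the `client_id` key, and the injectable `clock` are hypothetical and not part of any existing script in this thread; a real API would call `allow()` before running the model.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls per `window_seconds` for each client."""

    def __init__(self, max_requests=5, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = {}  # client id -> deque of accepted request timestamps

    def allow(self, client_id):
        """Return True and record the request if the client is under its limit."""
        now = self.clock()
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps hammering the endpoint is not locked out for longer than one window.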
slabodan commented on Jan 7, 2024
I don't really know shit about AI and all that stuff, but one idea I had was combining this with the previous dataset to generate a dataset of the new captchas. Idk, maybe that's a dumb idea.
yukariin commented on Jan 10, 2024
The current model uses a similar CRNN architecture, but even smaller: you don't need much for captchas of 20 characters with zero semantics. You can check the training notebook I shared in #6 for more details.
You can run this model as an API; moffatman does that for Chance, iirc.
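For context on how a CRNN's output usually becomes text: the sketch below assumes the model is trained with CTC and emits one score vector per image column, with class 0 as the blank. The thread does not confirm either detail, and `ctc_greedy_decode` and its arguments are hypothetical names, so check the notebook in #6 for what the model actually does.

```python
BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores, alphabet):
    """Collapse per-frame class scores into a string.

    frame_scores: list of per-timestep score lists, one score per class.
    alphabet: string where alphabet[i - 1] is the character for class i.
    """
    decoded = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when the class is not blank and differs
        # from the previous frame's class (repeats are merged).
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)
```

A blank frame between two identical classes is what lets the decoder output a doubled character.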
JonseyJones commented on Jan 11, 2024
Time for @drunohazarb to update the script then.
JonseyJones commented on Jan 11, 2024
Why did the development on the script stop?
Do you have the new model?
Any news from moffatman?
How come nobody cares anymore?
yukariin commented on Jan 11, 2024
I'm not the script developer
What's wrong with current one?
Ask him?
Dunno, current script works fine for me
JonseyJones commented on Jan 11, 2024
It doesn't work when the white letters are on the black blob; it doesn't solve those letters/numbers.
yukariin commented on Jan 11, 2024
1k should be fine for fine-tuning.
You also want an even distribution between the different captcha types. For example, if you have 1k new captchas (white letter in black circle), you want 1k old captchas (without circle) and 1k new-old captchas (black letter in black circle with white outline), so a 3k dataset in total.
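The even split could be sketched like this, assuming each sample is a `(path, label, captcha_type)` tuple. That layout and the name `balance_by_type` are made up for illustration and are not the format used in the training notebook.

```python
import random


def balance_by_type(samples, per_type=None, seed=0):
    """Return a shuffled subset with the same number of samples per captcha type.

    samples: list of (path, label, captcha_type) tuples (hypothetical layout).
    per_type: samples to keep per type; defaults to the smallest group's size.
    """
    groups = {}
    for sample in samples:
        groups.setdefault(sample[2], []).append(sample)
    if not groups:
        return []
    # Default to the smallest group so no type is over-represented.
    if per_type is None:
        per_type = min(len(group) for group in groups.values())
    rng = random.Random(seed)
    balanced = []
    for captcha_type in sorted(groups):
        group = groups[captcha_type]
        balanced.extend(rng.sample(group, min(per_type, len(group))))
    rng.shuffle(balanced)
    return balanced
```

With 1k new, 10k old and 16k new-old samples this would keep 1k of each, which matches the 3k total above.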
JonseyJones commented on Jan 11, 2024
The older captchas are the same.
yukariin commented on Jan 12, 2024
Yeah, but you don't want the model to regress on old captchas, which will happen if you train/tune only on new ones.
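One way to catch that kind of regression is to report accuracy per captcha type instead of a single overall number. This is a rough sketch; the `(captcha_type, predicted, expected)` result format and the name `accuracy_by_type` are assumptions, not something from the notebook.

```python
from collections import defaultdict


def accuracy_by_type(results):
    """Return {captcha_type: fraction solved exactly} for evaluation results.

    results: iterable of (captcha_type, predicted, expected) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for captcha_type, predicted, expected in results:
        total[captcha_type] += 1
        # A captcha only counts as solved if the whole string matches.
        if predicted == expected:
            correct[captcha_type] += 1
    return {t: correct[t] / total[t] for t in total}
```

Comparing this dict before and after fine-tuning shows whether the "old" entry dropped while the "new" one improved.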
yukariin commented on Jan 12, 2024
Here's a test model that achieves 98.9% (7655/7735) accuracy on a combined large dataset (10k old + 16k new + 1k new (white letters)).
drunohazarb commented on Jan 12, 2024
It's not that I don't care, there's just not much I can do without collecting captcha samples myself.
Anyways, I just updated the script to use Yukariin's new test model. I will push the update as soon as I can confirm that the model can solve the older captchas as well, for good measure.
As for the suggestion, I think it belongs here: https://github.com/based-org/chana-solver