[idea] using PARSeq, based on the idea of scene text recognition  #13

@beansofhell

Description

@beansofhell
No description provided.

Activity

beansofhell

beansofhell commented on Jan 7, 2024

@beansofhell
Author

todo for myself:
get a dataset of the new captchas
https://github.com/baudm/parseq
use a CRNN because of its low CPU usage, so it can be attached to an API
use an API with a rate limit of 5 requests per minute.
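
A minimal sketch of the "5 requests per minute" limit from the todo list, assuming a single global limit and a sliding 60-second window. The class name `SlidingWindowLimiter` and its parameters are hypothetical; nothing in this thread says how the API would actually enforce the limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls per `window_seconds` (hypothetical helper)."""

    def __init__(self, max_requests=5, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be exercised without sleeping
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        """Return True and record the request if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False
```

An API handler would call `allow()` once per incoming request and reject the request when it returns `False`. A per-client limit would need one limiter per client key, which this sketch leaves out.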

slabodan

slabodan commented on Jan 7, 2024

@slabodan

I don't really know much about AI and all that stuff, but one idea I had was combining this with the previous dataset to generate a dataset of the new captchas. Idk, maybe that doesn't make sense.

yukariin

yukariin commented on Jan 10, 2024

@yukariin
Collaborator

The current model uses a similar CRNN architecture, but even smaller - you don't need much for captchas of 20 characters with zero semantics. You can check the training notebook I shared in #6 for more details.
You can run this model as an API; moffatman does that for chance, iirc.

JonseyJones

JonseyJones commented on Jan 11, 2024

@JonseyJones

Time for @drunohazarb to update the script then.

JonseyJones

JonseyJones commented on Jan 11, 2024

@JonseyJones

The current model uses a similar CRNN architecture, but even smaller - you don't need much for captchas of 20 characters with zero semantics. You can check the training notebook I shared in #6 for more details. You can run this model as an API; moffatman does that for chance, iirc.

Why did the development on the script stop?
Do you have the new model?
Any news from moffatman?
How come nobody cares anymore?

yukariin

yukariin commented on Jan 11, 2024

@yukariin
Collaborator

Why did the development on the script stop?
Do you have the new model?
Any news from moffatman?
How come nobody cares anymore?

I'm not the script developer
What's wrong with current one?
Ask him?
Dunno, current script works fine for me

JonseyJones

JonseyJones commented on Jan 11, 2024

@JonseyJones

What's wrong with current one?

It doesn't work when the white letters are on the black blob; it doesn't solve those letters/numbers.

yukariin

yukariin commented on Jan 11, 2024

@yukariin
Collaborator

@yukariin how many are needed to get 75% success rate?

1k should be fine for fine-tuning.
You also want an even distribution between the different captcha types. For example, if you have 1k new captchas (white letter in black circle), you want 1k old captchas (without circle) and 1k new-old captchas (black letter in black circle with white outline), so a 3k dataset in total.
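
One way to get that even distribution, sketched under the assumption that the samples are already grouped by captcha type in a dict of lists. The function name `balance_by_type` and the type labels in the usage note are hypothetical and not taken from the training notebook.

```python
import random


def balance_by_type(samples_by_type, seed=0):
    """Downsample every captcha type to the size of the smallest one.

    samples_by_type: dict mapping a type label to a list of samples.
    Returns a shuffled list of (type_label, sample) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n = min(len(items) for items in samples_by_type.values())
    balanced = []
    for kind in sorted(samples_by_type):
        picked = rng.sample(samples_by_type[kind], n)
        balanced.extend((kind, item) for item in picked)
    rng.shuffle(balanced)
    return balanced
```

With 1k samples under each of three labels such as `"old"`, `"new_old"` and `"new_white"`, this returns the 3k mixed set described above; with uneven counts it trims the larger groups instead of oversampling the smaller ones.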

JonseyJones

JonseyJones commented on Jan 11, 2024

@JonseyJones

1k old captchas (without circle) and 1k new-old captchas (black letter in black circle with white outline) so 3k total dataset.

The older captchas are the same.

yukariin

yukariin commented on Jan 12, 2024

@yukariin
Collaborator

The older captchas are the same.

Yeah, but you don't want the model to regress on old captchas, which will happen if you train/tune only on new ones.
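
A rough way to watch for that kind of regression is to report accuracy per captcha type rather than one overall number. This is only a sketch: the record layout `(captcha_type, predicted_text, true_text)` and the function name `accuracy_by_type` are assumptions, not something the existing scripts are known to provide.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return {captcha_type: exact-match accuracy}.

    records: iterable of (captcha_type, predicted_text, true_text) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for kind, predicted, truth in records:
        total[kind] += 1
        correct[kind] += int(predicted == truth)  # whole-string match only
    return {kind: correct[kind] / total[kind] for kind in total}
```

Comparing the per-type numbers before and after fine-tuning would show whether the old captchas got worse while the new ones improved.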

yukariin

yukariin commented on Jan 12, 2024

@yukariin
Collaborator

here's what ive got

Here's a test model that achieves 98.9% (7655/7735) accuracy on a combined large dataset (10k old + 16k new + 1k new (white letters))
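
For reference, the quoted figure can be checked directly from the two counts. How the 7735 evaluated samples relate to the roughly 27k combined dataset is not stated in this thread, so the sketch below only covers the arithmetic.

```python
correct, total = 7655, 7735  # counts quoted in the comment above

wrong = total - correct
accuracy = correct / total

print(f"{wrong} wrong out of {total}")
print(f"accuracy: {accuracy:.2%}")  # prints "accuracy: 98.97%"
```

So the quoted 98.9% is the truncated value; rounded to one decimal it would read 99.0%.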

drunohazarb

drunohazarb commented on Jan 12, 2024

@drunohazarb
Owner

Why did the development on the script stop?
Do you have the new model?
Any news from moffatman?
How come nobody cares anymore?

It's not that I don't care, there's just not much I can do without collecting captcha samples myself.
Anyways, I just updated the script to use Yukariin's new test model. I will push the update as soon as I can confirm that the model can solve the older captchas as well, for good measure.

As for the suggestion, I think it belongs here: https://github.com/based-org/chana-solver
