
A banner for soyjak.party

/tech/ - Soyence and Technology


File: Screenshot 2025-11-19 at 2….png (584.95 KB, 4060x1406)

 21351[Quote] [Voice Chat]>>21355

I'm currently working on my fourth attempt at making a decent dark web search engine.
My first attempt was more of a test to see if I could actually use Tor in my code. It didn't have a front end, and searching was limited to each site's description with no real search algorithm apart from "LIKE '%query%'". It all ran in the terminal without actually serving any site to the Tor network.
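For anyone wondering what that kind of "search" amounts to, here is a minimal sketch using SQLite. It is not the code from that attempt: the `sites` table, the `build_index` and `like_search` helpers, and the example URLs are all made up for illustration.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory table of (url, description) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (url TEXT PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO sites VALUES (?, ?)", rows)
    return conn


def like_search(conn, query):
    """Return URLs whose description contains the query as a substring."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT url FROM sites WHERE description LIKE ? ESCAPE '\\' ORDER BY url",
        (f"%{escaped}%",),
    )
    return [row[0] for row in cur]


conn = build_index([
    ("http://example-a.onion", "A privacy-focused email provider"),
    ("http://example-b.onion", "Forum about Linux and privacy tools"),
    ("http://example-c.onion", "Book library"),
])
print(like_search(conn, "privacy"))
# ['http://example-a.onion', 'http://example-b.onion']
```

Substring matching like this has no ranking and no notion of words, which is why it barely counts as a search algorithm.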
My second attempt went better. This time I wanted to focus on the front end (the part I hate the most). I figured out how to host hidden services with Flask and, after spending way too long generating a cool domain, I finally had a working site up and running. It had the same search algo, if you can even call it that, as the last one. It also took forever to load (around 2 minutes without cache) because Nginx was being a selfish little fuck and not letting me use it with Tor (this could have been because I had 2 hidden services running at the same time, since I was also working on another project, Onion365). Picrel is what this attempt looked like. I got carried away and added way too much bloat to the homepage. This one had 129644 sites indexed, though most were just homepages.
My third attempt was just back-end stuff again. This included improving the filters, reworking the scraper to be more efficient, and entirely remaking the search algorithm to use tokenizers for way better search results. It had no front end, and I didn't do much testing with the tokenizers before moving on, so I'm not sure how much that actually helped.
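As a rough illustration of token-based search (a generic sketch, not the algorithm from that attempt), the idea is to split text into tokens, build an inverted index from token to pages, and rank pages by how often the query tokens appear. The `tokenize`, `build_inverted_index`, and `search` names and the sample pages are assumptions for the example.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_inverted_index(docs):
    """Map each token to a Counter of {url: occurrences}."""
    index = defaultdict(Counter)
    for url, text in docs.items():
        for token in tokenize(text):
            index[token][url] += 1
    return index


def search(index, query):
    """Rank URLs by total occurrences of the query tokens (ties broken by URL)."""
    scores = Counter()
    for token in tokenize(query):
        for url, count in index.get(token, {}).items():
            scores[url] += count
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = {
    "a.onion": "Secure email, encrypted email hosting",
    "b.onion": "Linux forum and email help",
    "c.onion": "Book library",
}
index = build_inverted_index(docs)
print(search(index, "email hosting"))  # ['a.onion', 'b.onion']
```

Raw term counts are the simplest possible scoring. Something like TF-IDF or BM25 would be the usual next step, but that is beyond what the post describes.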
My current attempt is focused on remaking the crawler to be more efficient and work asynchronously (the latter I just implemented). It's already way faster than any of my previous ones (around 0.5 seconds per scrape instead of 3-5). Another goal is to finally get Nginx to run correctly. I'm also selectively caching websites this time (only the pure HTML, no media), as current archival hidden services are unreliable. I plan to implement AI-assisted filters to avoid the false detections that come from just using keywords. I will rework part of the front end (more search parameters and such), but not much will change. I have over 200k domains queued up to be crawled.
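A minimal sketch of what an asynchronous crawler with HTML-only caching can look like, using an asyncio worker pool. This is not the project's crawler: `crawl`, `fake_fetch`, and `FAKE_WEB` are hypothetical, and the fetch step is simulated so the example runs offline. A real version would fetch pages through the local Tor SOCKS proxy instead.

```python
import asyncio


async def crawl(seed_urls, fetch, max_workers=10):
    """Crawl concurrently; fetch(url) must return (content_type, body, links)."""
    queue = asyncio.Queue()
    seen = set(seed_urls)
    cache = {}  # url -> HTML body; media responses are never stored
    for url in seed_urls:
        queue.put_nowait(url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                content_type, body, links = await fetch(url)
                if content_type.startswith("text/html"):
                    cache[url] = body
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            except Exception:
                pass  # a real crawler would log the failure and maybe retry
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return cache


# Simulated network (assumption): url -> (content_type, body, outgoing links).
FAKE_WEB = {
    "http://a.onion": ("text/html", "<html>A</html>", ["http://b.onion", "http://c.onion/logo.png"]),
    "http://b.onion": ("text/html", "<html>B</html>", ["http://a.onion"]),
    "http://c.onion/logo.png": ("image/png", "", []),
}


async def fake_fetch(url):
    await asyncio.sleep(0.01)  # stands in for network latency
    return FAKE_WEB[url]


pages = asyncio.run(crawl(["http://a.onion"], fake_fetch, max_workers=3))
print(sorted(pages))  # ['http://a.onion', 'http://b.onion']
```

The speedup comes from overlapping the waiting: while one worker is blocked on a slow hidden service, the others keep pulling URLs off the queue.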
I did not "vibe code" any of this. I strongly dislike that term and those who do or promote it. I like to keep LLM usage to a minimum for my projects, but I did need the asyncio library explained before I could implement it and fix the bugs myself. As I said, I suck at front end, so I also needed some help getting the CSS to look how I wanted.
I am unsure whether it is against the rules to link to my hidden service, even though I aggressively filter out any and all pornography and erotic content from search results, so I'll hold off on that for now. If this post includes anything against the rules, it was not on purpose. I know that the dark web is a touchy subject, but I just wanted to share my hobby project with you guys.
Leave any suggestions or questions you have ITT.

 21353[Quote]

File: 1759554340217h.png (67.64 KB, 255x252)

very aryan
you will share it here once it's done

 21355[Quote]>>21371

>>21351 (OP)
This looks gemmy. Do you have an email I can contact you at? Wordfilters will likely autoban you if you link the hidden service. You could always link a clearnet homepage that links to the service (if you're worried about getting banned, I can post it for you).

What is your filtering method like? Do you have any ideas on how to implement the AI-assisted filters? I've always wondered how that could be done in a lightweight manner.

 21371[Quote]

>>21355
Thanks. You should have gotten my email by now, if the one you put in your email field is real.

It will probably be a while before I get a working website up and running again, due to privacy concerns with owning a clearnet site and not having secured my hidden service yet. When I do, though, I'd be more than happy to have you test it.

I currently just get the title, description, and keywords and check for banned words in there while also accounting for leetspeak. I do this rather than searching the full website for any banned words, as mentioning anything in those meta fields is more intentional and thus might rule out some false positives.
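A rough sketch of that kind of meta-field check, using only the standard library. This is not the actual filter: the leetspeak table, the `MetaExtractor` class, the `normalize` and `is_blocked` helpers, and the sample banned words are placeholders for illustration.

```python
import re
from html.parser import HTMLParser

# Common leetspeak substitutions (an assumed, incomplete table).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description/keywords meta fields."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("description", "keywords"):
            self.fields.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields.append(data)


def normalize(text):
    """Lowercase, undo leetspeak, and turn every non-letter run into a space."""
    return re.sub(r"[^a-z]+", " ", text.lower().translate(LEET_MAP))


def is_blocked(html, banned_words):
    """True if any banned word appears in the title, description, or keywords."""
    parser = MetaExtractor()
    parser.feed(html)
    tokens = set(normalize(" ".join(parser.fields)).split())
    return not tokens.isdisjoint(banned_words)


page = ('<html><head><title>Best C4s1n0 online</title>'
        '<meta name="description" content="Play now"></head>'
        '<body>scam</body></html>')
print(is_blocked(page, {"casino"}))  # True: leetspeak in the title is caught
print(is_blocked(page, {"scam"}))    # False: the body text is not inspected
```

Matching whole tokens rather than substrings avoids flagging innocent words that merely contain a banned word, at the cost of missing words split up with punctuation.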

In the future I hope to use something like pytorch-text (torchtext) to classify websites. This would also allow for further search filtering (like possible scam detection or categorizing hidden services), not just sorting out bad sites. I know Ahmia already does something like this. I use their blocklist and their indexed sites as a starting point, but they seem to have a few false positives.
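To show the shape of the classification idea in a lightweight way, here is a toy multinomial naive Bayes classifier in plain Python. It is a stand-in, not a PyTorch model and not anything from the project: the `NaiveBayes` class, the labels, and the four training snippets are invented for the example, and a real classifier would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
clf.train("marketplace escrow vendor listings", "market")
clf.train("buy products vendor shop escrow", "market")
clf.train("forum discussion threads replies", "forum")
clf.train("community forum board posts", "forum")

print(clf.classify("new vendor shop with escrow"))  # market
print(clf.classify("discussion board threads"))     # forum
```

Unlike a keyword blocklist, a classifier like this weighs all the words on a page together, so a single suspicious word in an otherwise harmless description is less likely to trigger a false positive.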

5014f8ebd867a23f02209e61de4980a0

 21374[Quote]>>21377

File: 1763618149360.jpg (417.05 KB, 1079x685)

A soy version of the Dark Web would unironically be hilarious.

 21375[Quote]>>21377

What are you using to make this? Rust?

 21377[Quote]

>>21374
Yeah I thought of making my own private dark net but my other laptop’s network card is too old for that so I couldn’t test it.
>>21375
I am using Python 3.11 to make it. I’ll give a full list of all the libraries I used later. Although I have attempted to learn Rust several times, it’s just too different from the languages I already know for me to use it effectively in these kinds of projects.



6 replies | 2 images | Page 1