This post will hopefully make the situation a bit clearer for everyone. Most of this info has been posted in one place or another already, but I will attempt to consolidate the important bits as I recall them. I have effectively zero visibility into the operation of cybertip.ca or Project Arachnid (their crawler), so this is based on what I can tell from my end. I have not used this blog in many years, but it's the easiest way I can publish this info now.
To start with, I do not believe they are acting maliciously, and they do not seem to be intentionally using the site to search for images. They are just following links. Dangerous links which spread CSAM (Child Sexual Abuse Material), links which they should be smart enough not to follow, but ultimately, still, just following links. From what I understand, these links primarily come from image boards and similar sites, which helpfully add them next to all posted images. This is great for users, as the links allow for quick and easy source lookups of interesting images as you come across them.
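To make the mechanism concrete, here is a minimal sketch of how such a lookup link might be built and what the search service reads out of it. The endpoint `https://search.example/search`, the `url` parameter name, and both helper functions are assumptions for illustration only, not SauceNAO's or any image board's actual format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical search endpoint and parameter name; real services differ.
SEARCH_ENDPOINT = "https://search.example/search"


def build_lookup_link(image_url: str) -> str:
    """Build the kind of link an image board might place next to a posted image."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'url': image_url})}"


def extract_image_url(lookup_link: str) -> str:
    """Recover the image URL that the search service would be asked to fetch."""
    query = parse_qs(urlsplit(lookup_link).query)
    return query["url"][0]


image_url = "https://board.example/img/12345.jpg"
link = build_lookup_link(image_url)
print(link)
# https://search.example/search?url=https%3A%2F%2Fboard.example%2Fimg%2F12345.jpg
print(extract_image_url(link))
# https://board.example/img/12345.jpg
```

The point of the sketch is that the link carries everything needed to trigger a search. Whoever requests it, a person or a crawler, is directing the search service to download the referenced image.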
The current situation started on the afternoon of the 31st of March, when our host received the first CSAM notification and promptly sent it over to us for review. At the time I was traveling and not online as often as usual, so the report went unnoticed. Around the same time, the host of iqdb.org - another anime reverse image search engine - also started receiving similar notices. Likewise, they forwarded them to the site's operator, but unfortunately the emails wound up in spam and were not immediately noticed. More reports continued coming in over the course of the afternoon and night. I finally noticed them in my email when I came back online later that night.
The view of my inbox was seriously distressing, full of notifications from cybertip and our colocation host. I was more than a bit freaked out, wondering what could possibly be going so wrong for me to have an inbox full of CSAM alerts. I was also very concerned that our host would suddenly pull the plug. It had been several hours, after all, and the wording of the notices is highly alarming. I needed to act fast. Looking at the reports, I was somewhat relieved to see they were reports of /userdata images. Those images are temporary files associated with, and created for, the searches performed in response to the links the crawler accessed. They're also all long gone by now, having been automatically deleted only minutes after creation.
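For readers unfamiliar with this kind of short-lived cache, the following is a rough sketch of age-based cleanup. It is not SauceNAO's actual code: the five-minute limit, the `purge_expired` function, and the file names are all assumptions chosen for illustration.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

# Assumed retention window; the real value is only described as "minutes".
MAX_AGE_SECONDS = 5 * 60


def purge_expired(directory: Path,
                  max_age: float = MAX_AGE_SECONDS,
                  now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than max_age seconds and return their paths."""
    now = time.time() if now is None else now
    removed = []
    for entry in directory.iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            entry.unlink()
            removed.append(entry)
    return removed


# Demo in a throwaway directory standing in for a query image cache.
with tempfile.TemporaryDirectory() as tmp:
    userdata = Path(tmp)
    old_file = userdata / "old_query.jpg"
    new_file = userdata / "new_query.jpg"
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"x")
    # Backdate one file by ten minutes to simulate an expired query image.
    ten_minutes_ago = time.time() - 600
    os.utime(old_file, (ten_minutes_ago, ten_minutes_ago))
    removed_names = [p.name for p in purge_expired(userdata)]
    remaining_names = sorted(p.name for p in userdata.iterdir())

print(removed_names)    # ['old_query.jpg']
print(remaining_names)  # ['new_query.jpg']
```

Run periodically, a job like this means any file a report points at has usually disappeared long before a human reads the report.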
I was far from sure that our colocation host would recognize that distinction on their own, though! The clock was ticking. I quickly purged all query image caches, etc., just to be sure, and responded to the many tickets as fast as possible. I gave mostly the same explanation on each, and luckily SauceNAO's host seems to have understood the situation. I did not receive a response on the tickets, but the site stayed online. Simultaneously, I attempted to contact cybertip directly in response to the notices, explaining what their bot was doing wrong and how it was directly spreading the material. No response.
After the initial tickets were dealt with, I sent a heads up to the group of anime site operators I interact with frequently, including the operator of iqdb. It was getting fairly late though, so most had already gone AFK for the night. The notices kept coming, so I took the emergency action of disabling display of the search query image, since that is what every report was about.
The reports immediately stopped, though cybertip continued to search for bad images, causing them to be uploaded to our servers. Once the image they were uploading was no longer being displayed on the page, there was no longer anything to report...
The next morning, after some pretty terrible sleep, I awoke to the news that iqdb.org was down. Taken down by their host, in response to the abuse notifications sent by cybertip. Abuse notifications they should actually have been sending to themselves.
Luckily, SauceNAO was still online. If I had not noticed the night before, we would probably also have been taken down, with potentially damaging effects to our servers, data, and reputation. Later that day, iqdb was brought back online when its operator was able to respond to the abuse reports, but it could have been so much worse.
Several days later, once I was back home, I started to see many users wondering about the search query image being missing. A few even asked me directly about it, so it was obviously starting to be a problem.
The search query image makes it clear that the image was acquired successfully, is properly formatted, aligned, etc. Clicking on the search query image also allows editing the image to remove borders or search for just a portion of an image to improve result accuracy. It's a very important feature, and everyone was missing it badly. Reluctantly, I re-enabled the search query images, hoping for the best...
It took a few hours, but the notices started flowing again. More reports for the images being searched for: the same images that are created at the direction of the crawler, which then reports them. In frustration, having heard nothing from cybertip, I attempted to contact them again. Shortly after, I posted a pointed message to Twitter, publicly calling them out on their crawler's bad behavior.
By the next afternoon, the tweet was getting a lot of attention. I don't know if it was solely in response to the attention from that tweet, but they finally responded to my initial email. Around the same time, they replied on Twitter with a complete denial.
Via email, I attempted to explain what was wrong and suggested several options for fixing it, but they seem to think their crawler's behavior is completely okay. Consequences be damned, no apparent care for how the modern internet operates.
One good thing did come out of that email communication though: they agreed to notify us directly in the future rather than through our host. This dramatically reduces the chance our host will suddenly decide to drop us as a customer, or take our servers offline.
Shortly after, I replied to their reply on Twitter.
Mostly silence since, and their crawler has continued trying to search for what they call CSAM on our site. In response, we disabled searches from AWS, on which their badly behaved crawler is hosted.
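Blocking by hosting provider generally comes down to checking each client address against a list of network ranges. The sketch below shows one way that check could look. It is not a description of what we actually deployed: the `BLOCKED_NETWORKS` list and the `is_blocked` function are hypothetical, and the ranges are reserved documentation addresses standing in for whatever ranges the provider publishes.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real AWS ranges.
# A real deployment would load the provider's published list instead.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )


print(is_blocked("192.0.2.17"))    # True
print(is_blocked("203.0.113.5"))   # False
print(is_blocked("2001:db8::1"))   # True
```

A blunt check like this also turns away any legitimate searches coming from the same provider, which is part of why it is a workaround rather than a fix.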
While blocking them from searching for abuse material on SauceNAO improves the situation for us, it does not change the fact that their bot is actually spreading the material they claim to be trying to remove from the internet. In my view, it's even worse now since they know what is happening and have promised no action to address the problem.
There are many other services, including big names like Google, Bing, and Yandex, which allow uploading or acting on an image using just an image link embedded in a URL. Each and every one of these is in effect being attacked by the Project Arachnid bot with illegal requests directing their servers to access and in some cases host illegal images. The giants may have the resources to shrug this off, but smaller players like us are being severely impacted by Project Arachnid's misuse of our services.
I am still attempting to work with them; hopefully something positive will come of all this.