Nice :) From a cursory glance the web scraping doesn't appear to check for / obey robots.txt however?
-
Good call lemme add that
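
For what it's worth, a rough sketch of what such a check could look like, using Python's standard `urllib.robotparser`. The helper name `can_fetch`, the `my-scraper` user agent, and the example rules are all made up for illustration; this is not the scraper's actual code.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: takes the robots.txt body as a string so the
    check can be shown without any network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules, only for the example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for path in ("/public/page", "/private/page"):
        url = "https://example.com" + path
        print(url, can_fetch(EXAMPLE_ROBOTS, "my-scraper", url))
```

In a live scraper you would more likely call `set_url()` and `read()` on the parser once per host and cache the result, rather than passing the text in by hand.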
-
What about ai.txt?
-
I've seriously pondered variations of this :) robots.txt is broken on many levels, _especially_ for modern usage. When I was at
@CommonCrawl, the complexities that arise from the excessively simple robots.txt were a recurring bottleneck and frustration.
-
Can't help but wonder if this aims at OpenAI GPT-2 dataset
-
Eight spaces indentation?
-
Amazing!
-
7h15 - 7w337 - 5h0u1d - b3 - 4dd3d - 70 b00km4rk5.. : )
-
And then a problem is filtering out junk like auto-translated, old-fashioned and other unwanted stuff, which leaves you with maybe 30%