https://archive.org/download/AO3_story_dump_continuing
https://archive.org/details/updateablefanfic
https://archive.org/details/FanficRepack_Redux
https://archive.org/details/fanfic-meta-sqlite
https://archive.org/details/fictionpress_save_01_nov_26_2018
I'm BACK! I've uploaded hundreds of gigs of fanfic from archiveofourown.org, fictionpress, and fanfiction.net, along with a SQLite db of the metadata for easy searching. Need a natural language corpus? There are probably better ones out there, but here's this one, in about a dozen different languages too!
Enjoy.
EDIT: ao3continuing and updateable are compilations of data dumps I've had sitting around a while. The AO3 dumps are identical to the previous ones, with the newer dumps added so everything is in one place.
I made these as one-stop shops for those two sites' dumps in the future.
Awesome! I already have a copy of all Reddit comments, but that's fairly hard to parse and usually very conversational.
How do you handle deleted/removed comments? Just dropping them?
Would I be a hero or a demon if I created a bot that continuously posted fanfic made from this online?
I'll call it Infinific.
Go nuts. I just scraped them. I didn't write them.
Although, may I suggest a Twitter bot that grabs title, author, category, genre, word count and summary?
There's a metadata database, just use that.
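A minimal sketch of what pulling a post from the metadata database could look like. The table name `stories` and its column names are assumptions made up for illustration (based only on the fields listed above); check the actual schema of the SQLite file before using it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the metadata db (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (title TEXT, author TEXT, category TEXT, "
        "genre TEXT, word_count INTEGER, summary TEXT)"
    )
    conn.execute(
        "INSERT INTO stories VALUES (?, ?, ?, ?, ?, ?)",
        ("Example Title", "example_author", "Example Category",
         "Adventure", 12345, "A placeholder summary."),
    )
    conn.commit()
    return conn


def pick_random_story(conn):
    """Return one random row from the assumed `stories` table as a dict."""
    conn.row_factory = sqlite3.Row  # applies to cursors created after this line
    row = conn.execute(
        "SELECT title, author, category, genre, word_count, summary "
        "FROM stories ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    return dict(row) if row else None


def format_post(story, limit=280):
    """Build a short post and truncate it to `limit` characters."""
    text = (
        f"{story['title']} by {story['author']} "
        f"[{story['category']} / {story['genre']}, "
        f"{story['word_count']} words]: {story['summary']}"
    )
    if len(text) > limit:
        text = text[: limit - 3] + "..."
    return text
```

To try it against the real file, replace `build_demo_db()` with `sqlite3.connect("path/to/metadata.sqlite")` (path is a placeholder) and adjust the query to the real table and column names.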
WB and TY for your ongoing efforts with this!
FYI, I'm seeding the torrents for all those datasets with a 650Mbit/s connection.
Download it here from my Google Drive. The size is 681MB compressed.
You can visit my GitHub repo here (Python), where I give examples and give a lot more information. Leave a star if you enjoy the dataset!
It's basically every single picture from the site thecarconnection.com. For more details, visit the repo. Picture size is approximately 320x210 but you can also scrape the large version of these pictures if you tweak the scraper. I did a quick classification example using a CNN: Audi vs BMW with CNN.
Complete list of variables included for all pics:
'Make', 'Model', 'Year', 'MSRP', 'Front Wheel Size (in)', 'SAE Net Horsepower @ RPM',
'Displacement', 'Engine Type', 'Width, Max w/o mirrors (in)', 'Height, Overall (in)',
'Length, Overall (in)', 'Gas Mileage', 'Drivetrain', 'Passenger Capacity', 'Passenger Doors', 'Body Style'
Hi everyone,
This weekend I uploaded a new dataset to Kaggle covering NBA games. It has game stats, rankings, and player statistics from the 2004 season to December 2019. NBA games dataset link
I will try to maintain it every month.
You can find more information about the data collection in my GitHub repository here: GitHub nba-predictor repo link
If you have any suggestions I will gladly read them and try to improve the dataset.
I’m not sure if this has been posted before, but the free online book Forecasting: Principles and Practice is not only a great resource, it also comes with many interesting time series datasets that can all be loaded as ready-to-go time series objects by simply loading the fpp2 package in R.
The book: https://otexts.com/fpp2/
A pdf describing the datasets: https://cran.r-project.org/web/packages/fpp2/fpp2.pdf
I am trying to find historical flight prices to build an airline ticket purchase timing recommendation system. Does anyone know of any data providers or APIs to get this data? I am willing to consider paid solutions if there is no good free solution. I looked at the Skyscanner API and several others but none of them seem to support past dates. Do GDS systems like Amadeus/Sabre have the capability to query and export historical prices?
Hello,
I'm currently earning a degree in Biotechnology with a concentration in Bioinformatics. I'm trying to get more experience cleaning datasets. If you have any suggestions or files, please send me a message.
I am planning to make a dataset in any field that is currently in demand in our Kaggle community. Can someone suggest some data that is actually needed but is missing or outdated on websites like Kaggle?
I already have a dataset on Kaggle, which you can view at https://www.kaggle.com/himanshupoddar/zomato-bangalore-restaurants
Suggest something that you guys want.
Hello,
I'm looking for a dataset of groceries, and maybe other products, that are available in local stores.
It would be nice if it's a CSV file that I can import into my database.
A trial dataset might be enough at first.
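For the import step, here is a rough sketch of loading a CSV into SQLite with only the Python standard library. The table name `products` and the sample columns are hypothetical, every column is stored as TEXT, and rows with the wrong number of fields are skipped; a real import would likely want proper types and validation.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_file, table="products"):
    """Create `table` from the CSV header and insert all well-formed rows.

    Returns the number of rows inserted. Column names come straight from
    the header, so this assumes they contain no double quotes.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Skip rows whose field count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample data standing in for a real grocery CSV.
SAMPLE_CSV = "name,category,price\nmilk 1l,dairy,0.99\nrye bread,bakery,2.49\n"


def demo():
    conn = sqlite3.connect(":memory:")
    count = import_csv(conn, io.StringIO(SAMPLE_CSV))
    names = [r[0] for r in conn.execute('SELECT name FROM products ORDER BY name')]
    return count, names
```

With a real file, pass `open("groceries.csv", newline="", encoding="utf-8")` (filename is a placeholder) instead of the `StringIO` object.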
Thanks
I’m having trouble finding historical weather forecast data. Most sites I find provide daily and hourly forecasts, but going backwards in time they only offer observational data.
I’d really like to get a hold of some historical forecasts for the last month for research purposes.
I’m currently specifically interested in data for Australia.
Say for the last 30 years or so. Daily high and low prices of stocks. Is it possible to get such data? Preferably free or low cost?
Hello there, I am new to Reddit and data science. I am looking for datasets that consist of player stats like positioning, number of passes, types of passes, shots, tackles, etc., along with their position. My main goal is to collect features of a player that can indicate their position in the game, i.e. midfielder, winger, striker, etc.
I believe each role has its own signature, like the way they pass, strike or defend. I want a dataset that captures those.
Any pointers on where I can get started would be helpful. I appreciate any help you can offer.
I think this is just such a good idea. The NAACP Legal Defense Fund set up a database of Department of Justice grants to local police departments. Remember that all federal grants come with the Title VI requirements of the Civil Rights Act of 1964, which prohibit discrimination in programs that receive federal funding(!!) In other words, if you've got a problem with racial profiling in your local police department, its federal funding is at risk. Here is the main site:
https://policefundingdatabase.tminstituteldf.org/
Database of Federal grants to local police departments is here:
https://policefundingdatabase.tminstituteldf.org/report
Opinion piece explaining project by director of the NAACP Legal Defense Fund here:
We must hold police departments to account, even if the Trump administration fails to do so
I would only say more work needs to be done. Especially adding other Federal agencies that fund the police, like the Department of Transportation through NHTSA.
I scraped 50,000 abstracts to do some natural language processing. Here they are if anyone else is interested: csv in google drive. Some of them are in Chinese though, so there is some cleaning to do! I've scraped them from earliest to oldest.
* some_q, I know you were interested in the sample data set; here is a more complete version!
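One possible way to handle the cleaning mentioned above is to drop abstracts that are mostly Chinese characters. This is only a sketch: it checks the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the 0.2 threshold is an arbitrary assumption rather than anything taken from the dataset.

```python
def cjk_ratio(text):
    """Fraction of characters in the CJK Unified Ideographs block."""
    if not text:
        return 0.0
    cjk_count = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk_count / len(text)


def keep_non_chinese(abstracts, threshold=0.2):
    """Keep abstracts whose share of CJK characters is below `threshold`."""
    return [a for a in abstracts if cjk_ratio(a) < threshold]
```

A language-detection library would be more robust for mixed-language text; the character-range check is just the simplest thing that needs no extra dependencies.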
Hey guys,
I want to get more familiar with data analysis. It is very useful for my job but not required, so I won't be taking any lessons or seminars. Instead I want to do some research in my free time. I thought about a small study: a comparison of homeopathic and conventional medicines.
But my first problem is getting the required data. I thought about getting it from common websites where you can buy medicine, but my web scraping skills are extremely bad. I want to ask if you can recommend tutorials where I can learn to solve the following problem.
I have websites, let’s say for homeopathic medicines, with index buttons like "a", "b", etc. How can I get the data from all of the subpages? I know how to get data from one page, but not from the whole site.
I will use python.
Thanks a lot. Best wishes and happy holidays!
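The usual approach to the subpage question is to generate the URL of each letter page, fetch it, and collect the links it contains. Below is a rough standard-library sketch. The URL pattern (`<base>/a`, `<base>/b`, ...) is purely an assumption, since real shops differ, and `fetch` is injectable so the logic can be tried without hitting a live site. Check the site's terms and robots.txt, and add a delay between requests before running anything like this for real.

```python
import string
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


def letter_index_urls(base_url):
    """Build one index URL per letter, assuming a `<base>/<letter>` pattern."""
    base = base_url.rstrip("/")
    return [f"{base}/{letter}" for letter in string.ascii_lowercase]


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, page_url):
    """Return absolute URLs for all links found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def fetch_html(url):
    """Download a page and decode it as UTF-8 (an assumption about encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_links(base_url, fetch=fetch_html):
    """Visit every letter page and return the unique links, in first-seen order."""
    seen = []
    for index_url in letter_index_urls(base_url):
        for link in extract_links(fetch(index_url), index_url):
            if link not in seen:
                seen.append(link)
    return seen
```

Each collected link can then be passed to whatever single-page scraping code you already have.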
I'm looking for datasets (non image, non text/NLP datasets) where deep neural networks tend to outperform shallow neural networks. I know that on a number of image, text/NLP, video and audio classification tasks deep networks perform much better than shallow networks. Are there any other domains where deep networks are essential for good performance? If so, are there any popular datasets in those domains?
------EDIT------
One of the reasons for asking this question is that I'm curious to know if deep learning has any real impact on domains where we don't deal with images and text data.
I have a spreadsheet with 1500 school names with city and state but no address. Is there a way to automate a search for all the addresses?
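One way to automate this is to build a search string per row and send it to a geocoding or places-search service. The sketch below shows only the batching logic: `geocode` is a hypothetical function you would supply (wrapping whichever service you choose), and the column names `name`, `city`, and `state` are assumptions about the spreadsheet.

```python
import time


def build_query(row):
    """Join the non-empty name, city, and state fields into one search string."""
    parts = [(row.get(key) or "").strip() for key in ("name", "city", "state")]
    return ", ".join(part for part in parts if part)


def lookup_addresses(rows, geocode, delay_seconds=1.0):
    """Call `geocode(query)` for each row and attach the result as `address`.

    `geocode` is a user-supplied callable returning an address string or None.
    The delay between calls is there to respect service rate limits.
    """
    results = []
    for index, row in enumerate(rows):
        if index:
            time.sleep(delay_seconds)
        address = geocode(build_query(row))
        results.append({**row, "address": address or ""})
    return results
```

Rows can be read from the spreadsheet with `csv.DictReader` after exporting it to CSV. Rows that come back with an empty address are the ones to check by hand.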
Hi, does anyone know where to find a dataset on whether or not a patient developed Alzheimer's/dementia, based on attributes like age, gender, race, or MRI scans that belong to people who later developed dementia? Sorry if I’m doing something wrong, I’m kind of new to this.
Here is a sample dataset of PubMed article information: Link to csv
Right now it includes title, URL, author list, abstract and DOI.
I'm planning to get some more using PubMed's related-articles suggestion widget.
Does anyone know if you can link CPT codes to health providers to understand what services they provide? For example, to understand which mental health facilities provide TMS treatments?
Does anyone know where I can find long-term historical price data for smartphones?
How do you do your job compared to other data scientists?
How much do you know about data science risks and ways to mitigate them?
Check it out through this survey, part of a research project at the University of Pisa.
---> https://forms.gle/ZKMeGBZXA3hFZyf88
For 5 minutes of your time, you will be rewarded with a selection of 10 scientific articles from our database, chosen based on your answers.
Furthermore, the results of this survey will be posted here, in order to share a benchmark overview of how data scientists work.
Here is the responses sheet, updated in real time ---> https://drive.google.com/drive/folders/13QDBwDvlT2MXQ2oiOHdJU23eylNW5Kfx?usp=sharing
If you leave your email in the questionnaire, you will receive the papers within a week.
If you prefer, you can fill it in completely anonymously.
We will send you a notification for the data analysis report in February 2020.
E-mails will be sent from designingdatascience@gmail.com
I’m looking for a dataset that has information about specific workout plans and how they affected someone, like a plus or minus for weight gain or something like that.