Show HN: Never lose a website again (fetching.io)
78 points by flippyhead 279 days ago | 78 comments



Two issues with your landing page

1. Major: breaks the back button.

2. Minor (nitpick): if I click the green "notify me" button and then "cancel" on the prompt dialog instead of "OK", it still shows me the green "Success - you will be notified" message.

Good luck though!

-----


"Never lose a website again" ...because you will never leave this one.

-----


Ha!

-----


Your site breaks the back button (Chrome 35, OS X Mavericks).

-----


You are right! I'm working on a fix.

-----


Out of curiosity, what are you doing with such a simple site (and I don't mean that in a bad way) that requires any conscious effort to fix? It seems like you really need to go out of your way to fuck up the back button and for a fix to be "work".

But, recent nuanced trends in web design/navigation aren't my top skill, so I'm asking the question honestly.

-----


The site is built with the Meteor framework, so it's not just the simple implementation you might expect from the layout. I imagine Meteor's features are useful in the actual app itself, which is probably not a simple site.

-----


Basically, wrong tool for the job. The top page should probably be static; no need to involve Meteor at all.

The rest? Maybe Meteor is the best choice there; I can't really say.

-----


Honestly, I've been taken a bit by surprise that the back button bug has been such a big issue. I did some testing among friends (a group of five or so startup/programmer types) and, believe it or not, it never came up. That said, it'll be fixed in short order.

To the other comments: the service is considerably more complex (and yes, built on Meteor, MongoDB and Elasticsearch), as it serves up search results and updates in real time. Though the UI is simple, it takes a fair bit of effort to organize and index all the browsing information such that search results are relevant.
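For the curious, indexing a visited page into Elasticsearch looks roughly like this. A rough sketch only: the index name ("history") and document shape here are illustrative, not the actual schema.

    // Illustrative sketch: index one visited page into Elasticsearch.
    // The "history" index name and document shape are assumptions.
    interface VisitedPage {
      url: string;
      title: string;
      text: string;      // plain text extracted from the DOM
      visitedAt: string; // ISO 8601 timestamp
    }

    async function indexPage(page: VisitedPage): Promise<void> {
      // PUT keyed by URL, so revisiting a page updates its document
      // instead of creating a duplicate.
      const id = encodeURIComponent(page.url);
      const res = await fetch(`http://localhost:9200/history/_doc/${id}`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(page),
      });
      if (!res.ok) throw new Error(`indexing failed: ${res.status}`);
    }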

-----


Don't mess with the back button and with the scrollbar.

Why not just elasticsearch ?

If you store texts in mongodb, look later for tokumx that compresses data.

-----


It looks like you're using InstantClick. Make sure you're implementing it right.
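In particular, pushState-style navigation needs a popstate handler or the back button breaks. A generic sketch of the pattern (not InstantClick's actual internals):

    // Generic pushState navigation sketch; not InstantClick's internals.
    declare function render(url: string): void; // app-specific page renderer

    function navigate(url: string): void {
      // Record a history entry so the back button has something to pop to.
      history.pushState({ url }, "", url);
      render(url);
    }

    // Without a popstate handler, pressing back changes the address bar
    // but never re-renders the page: the classic broken back button.
    window.addEventListener("popstate", (event: PopStateEvent) => {
      const state = event.state as { url: string } | null;
      render(state?.url ?? location.pathname);
    });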

-----


Shameless plug: https://www.purplerails.com/

(1) Saves an exact copy of the page also.

(2) Indexes the text.

(3) Encrypted (search occurs on your computer).

Been in beta for a while. Thanks for feedback.

-----


Love the cartoon on your front page!

-----


Thank you for your kind words!

-----


I love this idea. Chrome's history is insufficient; I swear, it often doesn't record sites I've visited. But I'm not ready to trust an unknown process sending my entire browsing history to unknown servers. Excited for a potential local version.

-----


Chrome's history is inordinately awful. Get this addon: https://chrome.google.com/webstore/detail/better-history/obc...

-----


I've always thought this, so thanks for the extension; it looks pretty good. It would be nice to have some geek stats, like a top-x list of your most visited sites.

-----


Agreed. I learned recently that you get better history search by going to Google and having it only show 'visited sites'.

-----


'Visited pages' only seems to include the sites you visit immediately after doing a Google search. It does not include pages found through normal browsing.

-----


Yeah, I noticed just today that it doesn't record everything! Very strange. Could Turbolinks-like implementations be throwing it off?

-----


I was just looking for such an indexer this very evening.

Unfortunately, login via Twitter fails with a 500 error bar flashing at the top of the site, and login via FB fails with "App Not Setup: The developers of this app have not set up this app properly for Facebook Login.", so I can't try it.

Nonetheless, my biggest question is why this is a service instead of a standalone package. (Actually, I'd considered trying it in the hope that the plugin might be FOSS and, if so, researching whether it could be hacked to work with a locally-installed Solr/Lucene server.) I'm not really comfortable with directly or indirectly sharing my browsing history with most third parties.

-----


Bah, I'm sorry! The Facebook login has now been fixed.

I totally hear you on some people not wanting to share browsing history externally. This first version was easiest to do as a hosted service. Next up I intend to package it up as an installable app.

-----


Twitter login is still not working.

-----


Please also look at https://www.purplerails.com/

I tried to address the privacy angle from day 1. Data is encrypted and only ciphertext is sent to the cloud. The index is stored only on your own machine, and searching happens there too.

One cool feature is that it also saves an exact HTML copy of the page, images included, if you read it "long enough" (currently 90 seconds).

Been in beta for a while. Thanks for feedback.

-----


Local version: https://github.com/idibidiart/AllSeeingEye

-----


That said, I have found it handy to have access to this searchable history from different locations and devices. For example, on my phone.

-----


I am going to wait for the local version for this one. I am working on a project that needs the full browsing history, including content. You might have read about it here on HN.[1]

The goal is to make my computer a search engine that can also recommend articles based on the ones I have visited. It can also check websites to see if there is any new content.

The project is still in its very early stages and any tool I can use will be very helpful. This looks just like what I need.

[1]: https://news.ycombinator.com/item?id=7822859

-----


Why server-side indexing, rather than something like lunr.js[1] from the beginning?

I like the idea of full-text search on history, but I'm not going to send my full history to some random dude on the internet. No offence intended :)

[1] "Simple full-text search in your browser" http://lunrjs.com/
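For reference, indexing and searching locally with lunr.js looks roughly like this (the documents and field names below are just illustrative):

    import lunr from "lunr";

    // Build an in-browser full-text index over visited pages; nothing
    // leaves the machine. URLs and fields are illustrative.
    const pages = [
      { url: "https://example.com/a", title: "Example A", body: "full text search" },
      { url: "https://example.com/b", title: "Example B", body: "browser history" },
    ];

    const idx = lunr(function () {
      this.ref("url");
      this.field("title", { boost: 10 });
      this.field("body");
      pages.forEach((p) => this.add(p));
    });

    // Returns matching refs (URLs here) ranked by score.
    console.log(idx.search("history"));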

-----


None taken! I looked at lunr.js (and a few other options) but felt that the more sophisticated features of Elasticsearch were worth it (at least to me).

I'm curious, would you be more comfortable if all your content were encrypted such that not even the app developer (me) could read it? Or would only a local index do?

-----


It's not just about storing the history itself. I store bookmarks in the cloud as well.

My main problem is: as I understand it, the plugin sees what I see and just sends everything away. Including payments and balances in my bank account, business messages in Basecamp, code in private GitHub repos, and content on not-yet-public sites I've signed an NDA for.

The benefits don't justify the risks for me. Using incognito mode to avoid a plugin I've installed isn't really feasible.

As a second reason: I have mediocre internet connections most of the time since I'm traveling, so I try to avoid as many requests as possible.

-----


Most people would be okay with plaintext content sitting on your server, even publicly accessible by anyone.

But this is HN, and we wouldn't use even an AES-256-encrypted service if the data has to leave our computers. Our password could still be cracked with enough computing power. So, we'd all be extremely happy if you could make a version for paranoid people like us.

Also, I don't know how your indexing works, but please make it easy to back up (just allowing us to define the index folder would do). Possible data loss is the tradeoff with local content.

-----


No, not really

-----


I've always thought it surprising how poorly the "Search History" feature works in Chrome. It could be super useful if it worked.

-----


The idea is simple enough. My concern would be how many resources it would consume over time, since it's basically duplicating your already existing history.

-----


Maybe not; I think you can get the history from Chrome and perhaps just have the app store an index.

-----


I like the concept of this service.

Basically, we should be able to find _any_ piece of information we have already encountered, at any point in our lives. This service indexes that layer of information.

I would add another layer for information we engage with more actively: liking, linking, sharing, etc.

-----


Cool. I've definitely had a desire for something like this in the past. It's mostly been filled by Evernote and its web clipper now. Any time I have a vague inkling that I might want a page again sometime, I clip it, so I can easily find it with a future search. (Often by accident since you can configure the clipper to show Evernote notes alongside search engine results.)

The downside compared to something like this is that it only works if I have the foresight to clip a page. But the upside is that I don't end up indexing all the crap I definitely won't want again, so it's easier to find things in the remainder.

-----


Thanks. My hope is that the search is good enough that it doesn't matter how much is indexed -- you'll always be able to find what you are looking for. This is one reason I felt it was easier, to start, to index content server-side.

-----


Of course, there's no reason not to do both. A service like yours could catch anything that falls through the cracks with Evernote. If you could eventually create browser extensions like theirs that add results alongside search engine results, it would help people rediscover pages they've used in the past.

-----


What browsers does it work with?

Where's your privacy policy?

-----


Moreover, how does this work with sites that make use of AJAX content? It'd be frustrating to assume that it's recording just fine, only to later find that patches of your history couldn't be recorded.

-----


That's a great point. At the moment it only records what's in the DOM (stripped of all HTML) after the first page-load event. I'm working on how to include AJAX content, but it's not nearly as straightforward.
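Roughly speaking, the content script does something like this (a simplified sketch; the "pageText" message name is made up for illustration):

    // Sketch of a content script: wait for the load event, then ship the
    // page's plain text off for indexing. innerText already strips markup.
    window.addEventListener("load", () => {
      chrome.runtime.sendMessage({
        type: "pageText", // illustrative message name
        payload: {
          url: location.href,
          title: document.title,
          text: document.body.innerText, // rendered text only, no HTML
        },
      });
    });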

-----


Take a look at readability.js (https://github.com/Kerrick/readability-js) and extract/upload the main DOM content after all the JS trickery has completed.

-----


Assuming it completes. I can easily imagine a page with a news ticker that updates once every few seconds.
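A debounced MutationObserver with a hard timeout is one way to cope, so a never-settling page still yields a snapshot. A rough sketch (the 2s/15s thresholds are arbitrary):

    // Sketch: snapshot the page text once DOM mutations go quiet for 2s,
    // or after 15s regardless, so a ticker can't stall capture forever.
    function captureWhenSettled(onText: (text: string) => void): void {
      let done = false;
      let quietTimer: number;

      const finish = () => {
        if (done) return; // guard against the hard cap firing after quiet
        done = true;
        observer.disconnect();
        onText(document.body.innerText);
      };

      const observer = new MutationObserver(() => {
        clearTimeout(quietTimer);
        quietTimer = window.setTimeout(finish, 2000); // 2s of quiet = settled
      });
      observer.observe(document.body, { childList: true, subtree: true, characterData: true });

      quietTimer = window.setTimeout(finish, 2000);
      window.setTimeout(finish, 15000); // hard cap
    }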

-----


Do you just record n-grams indexed against the page URL, and are you then uploading that index? If you're not uploading it, how is there no "local version" available?

It's an interesting idea. Personally I have a script that wgets all the pages I bookmark and I very rarely use that content. What use cases are you anticipating?

-----


It would be useful for pulling back pages that have fallen off the internet and are not in the Wayback Machine.

That's a use case I have hit a few times. I even started backing up useful posts just in case they died.

http://gigablast.com/rants.html is an example. It's a really good insight into the creation of a search engine, and it really should be preserved.

One that did disappear but has since come back is this post http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra... which I went looking for a few years ago but had disappeared from all search indexes.

-----


Pinboard offers an archival account ($25 a year) that works wonderfully for this. Take a look if you don't already know it.

https://pinboard.in/faq/#archiving

-----


What does indexing mean here? I'm not sure what it means to 'index' but not 'store' webpages, as the front page says.

-----


I assumed they meant that they extracted metadata about the page (the page mentions "Winnie the Pooh", for example, like a book index) and kept that associated with a key (e.g. the URL), but didn't actually store the content per se. I was hoping the answer to my question would include that sort of detail - what's stored and where.

-----


That's exactly right. The indexed content is stored (as an inverted index) but not the plain text.
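In other words, roughly this shape. A toy sketch of the idea, not the actual implementation:

    // Toy inverted index: token -> set of URLs containing it. Only this
    // map is kept; the plain text is discarded after tokenizing.
    const index = new Map<string, Set<string>>();

    function addPage(url: string, text: string): void {
      for (const token of text.toLowerCase().split(/\W+/)) {
        if (!token) continue;
        let urls = index.get(token);
        if (!urls) index.set(token, (urls = new Set()));
        urls.add(url);
      }
    }

    function search(term: string): string[] {
      return [...(index.get(term.toLowerCase()) ?? [])];
    }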

-----


Looks interesting; I've often wanted to find old pages I've visited. Though I'll wait until it has a track record before I trust it with my history.

It would also be important to rank searches well, since otherwise an entire history may return far too much. Though this will be a hard area in which to take on Google...

-----


To be blunt, this project will never gain traction, because not enough people are willing to store their browsing history somewhere outside of their control. The back button issue shows a lack of attention to detail, something that's extremely important when dealing with sensitive personal data.

-----


Please know this is the first exposure this project has ever had. I really did try, in my spare time, to get this thing perfect before soliciting feedback from this wondrous community but -- alas! -- there be bugs. It's exactly this kind of feedback I was seeking, and I intend to incorporate it as I drive towards a broader release.

Part of what I'm trying to validate is exactly the point you raised: will people generally be too freaked out to store their browsing history "in the cloud"?

The next thing I'm hoping to determine is whether it'll be enough to encrypt this content in such a way that people are OK with it being stored externally, or whether only a locally installed version will do. Security aside, there's a great deal of advantage to offering this service hosted.

Thanks for your feedback!

-----


I think the only way people would be comfortable is if you use strong client-side encryption, meaning you yourself cannot decrypt a user's data. But to enable trust, being a startup, you would have to have your code reviewed by a reliable third party or open-sourced for public review. Google gets away with it because they have built a brand with a reputation that some people trust enough with their data (I don't personally, but enough do). A much easier path is to create a version that works locally with no external communication. Both options would be ideal, but perhaps more work than you care to take on. No matter how it pans out, it will be a great learning experience.

-----


Ah, so it's like Firefox's URL bar, except with full-text search, rather than just the URL/Title.

-----


If this doesn't repeat websites and categorizes them in a neat manner, it would be amazing. My obsessive bookmarking led me to throw this up: http://goo.gl/WNw5OG

I look forward to seeing what you come up with.

-----


As an Opera user, there have been many times when in-page content search has helped me find stuff. When I'm in different browsers, that loses its usefulness. I'll definitely come back to this when the local version is done.

-----


I'm a bit confused. Is this essentially a browser plugin which extracts plaintext of your browsing and then sends it off to fetching.io's servers for indexing, and presumably some sort of search box area to let me search those?

-----


Is each user's index stored somewhere in their browser or on your servers? It would probably ease peoples' minds if they knew their history was stored locally, but I can see the performance hit that might make.

-----


What does it look like? Would be nice to have a demo page with an example index to try without having to sign up. I bet you could A/B test it to see what affected sign-up rates too.

-----


For people that want something like this but locally: https://github.com/idibidiart/AllSeeingEye

-----


Nice pointer! Not sure how much data it can support, though. It stores screenshots of the pages, which consumes a lot of space.

-----


I may be blind, but you don't seem to inform the user that the addon is currently Chrome-only until after they've subscribed. It's misleading.

-----


Ha, very random but I remember you from an Airbnb listing a few months ago. Almost lived with you in Seattle for a couple weeks! Best of luck.

-----


Why do so many sites require cookies just to tell me what they are all about? Yes, I am of the lunatic fringe that blocks all cookies by default...

...and by default more and more web sites lose me as a potential client/user/whatev because they require cookies just to display static welcoming information...

...I lose nothing by this, as far as I can tell. Perhaps ignorance really is bliss. :->

-----


I guess most websites nowadays are built using one of the myriad web frameworks out there (Django, RoR, you name it). Most of these frameworks enable sessions by default, simply because it's what most websites will want if they manage any kind of state. Nothing nefarious about it.

-----


> Most of these frameworks enable sessions by default

True, but in my experience, the major frameworks don't automatically lock out users with cookies disabled. For example, on a Rails app with no before_filter on the homepage, you can start the server and do this:

    echo "GET / HTTP/1.1" | nc localhost 3000
You should get back the homepage HTML.

-----


One legit reason is to set a CSRF cookie. If there's a login or signup form on the page, many frameworks add a CSRF field to the form; when you POST the form, the cookie and the field must match.
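That's the double-submit pattern, roughly. A framework-agnostic sketch of the check:

    // Double-submit CSRF sketch: the token set in the cookie must match
    // the token echoed back in the form body.
    import { randomBytes, timingSafeEqual } from "crypto";

    const newCsrfToken = (): string => randomBytes(32).toString("hex");

    function csrfValid(cookieToken: string, formToken: string): boolean {
      const a = Buffer.from(cookieToken, "hex");
      const b = Buffer.from(formToken, "hex");
      // timingSafeEqual throws on length mismatch, so check lengths first.
      return a.length === b.length && timingSafeEqual(a, b);
    }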

-----


Sure, that's legit - if login or signup is required.

But for pages containing information intended to encourage a user to spend time on a site and get convinced they need to sign up, it's counterproductive.

-----


This is a great point and not one I'd ever even considered. You are right, the page you see first -- before signing up -- really doesn't need much. I am, in fact, using the Meteor stack for the entire thing, even the informational pages. But that was mostly a convenience.

-----


Thanks for answering and for accepting this comment - I needed the validation, since I lost two karma points on the comment.

Ironic, no?

:->

-----


I don't understand: what is the advantage over using Chrome's history menu?

-----


Chrome doesn't index the contents of the page and is limited in how much history it records. This does full-text indexing of the pages you visit and lasts forever. It's backed by Elasticsearch.

-----


Why do I need to create an account for what sounds like a browser extension?

-----


Not available on Firefox. Can't use it.

-----


When is Firefox support planned?

-----


hmmm, privacy?

-----


Interesting service, but if you consider the market, it's the cross-section of people...

- who are both anal-retentive enough to want to index the content of every page they visit, and yet...

- are not anal-retentive enough to want to index the content of any page visited on a mobile device, and also...

- don't care enough about their privacy to mind sending the address and index of every single page they ever visit to someone.

Good luck.

-----



