Okay, this is a relatively serious proposal to require Google to allow API access to its search index, with the premise that it would democratize the search engine ecosystem. There are some issues with the regulations he proposes (you have to allow throttling to prevent DDoS attacks, and you can't let anyone with API access add content to prevent garbage results), but it's roughly feasible.
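For what it's worth, the throttling part is the easy bit. Here's a minimal token-bucket sketch of how per-key rate limiting on such an API might look. The class name, the parameters, and the injectable clock are all my own assumptions for illustration, not anything from the article or from any real Google API.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second (hypothetical quota)
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request fits in the budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

You'd keep one bucket per API key and reject (or queue) a request whenever `allow()` returns False. That caps any single client without blocking short bursts.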
The main problem is, I think the author is wrong about what Google's "crown jewel" is. Yes, Google has a huge index, but most queries aren't in the long tail. Indexing the top billion pages or so won't take as long as people think.
The things that Google has that are truly unique are 1) a record of searches and user clicks for the past 20 years and 2) 20 years of experience fighting SEO spam. 1 is especially hard to beat, because that's presumably the data Google uses to optimize the parameters of its search algorithm. 2 seems doable, but would take a giant up-front investment for a new search engine to achieve. Bing had the money and persistence to make that investment, but how many others will?
> Yes, Google has a huge index, but most queries aren't in the long tail.
I'm not quite sure about that. 15% of Google searches per day are unique, as in, Google has never seen them before [1]. That's quite an insane number.
Wow, 15% unique searches is indeed quite an interesting figure. With that said, what OP said is definitely not disproved. Just because 15% of searches are unique, that doesn't mean the most relevant result is buried in the tail end. I mean I can think of loads of my own searches that are probably unique or rare, but lead to the same popular results because of typos, improper wording etc.
Without some clear numbers on that from a major search engine, I think this might be very difficult to infer.
Heh, yes, they do. Which is a reminder that devs are not "typical" users.
As a developer, I search using keywords; for example, if I was looking for property for sale in Inverness, I might search for "property Inverness", whereas I've seen and heard "typical" users use something like "find me a 2 bedroom house with a garden for sale in the North of Inverness" - much more verbose, and containing stop words and phrases unlikely to help (I think!).
I search full sentences (questions) from the keyboard. I figure I'm not the only one to have had the question before, so I ask. Also, I find that blog posts, etc. tend to match well for full sentences.
Probably quite a few. New things happen. Politics, wars, famous folks, movies, music, diseases, scientific studies, products, brands, model numbers for products, fads and slang. I'm guessing there are other things as well.
Some of the new things are probably variation as well - as others have mentioned, sentences and voice commands can give lots of new stuff.
I would assess Google (& FB's) "crown jewel" as, ultimately, their market share, which is related to your points... and causation runs both ways.
The user data helps/ed Google create the superior UX, as you say. The reach is what makes Google & FB valuable to advertisers. A search engine with 0.1% of Google's user volume cannot charge advertisers 0.1% of Google's ad revenue. Returns to scale/reach/market-share are very substantial in online advertising.
I'm glad we're talking though. Those tech giants are too powerful.
Ultimately, the old antitrust toolkit is near useless today for dealing with tech monopolies. It's not obvious what "break up Google" even means. There are strong network effects and other returns-to-scale. It's a zero-marginal cost business, which was rare enough in the past that economists ignored it.
We need fresh thinking, a new vocabulary, new tools, but we do need to deal with it.
* an Office suite / enterprise company (Google Cloud + Docs + Gmail + Business)
* a phone company (Android)
* a search company (Google Search + Advertisement)
* and a media company (Google Play Movies, Music, Books and YouTube)
The names would probably become different in time, but you get the gist.
Amazon and Microsoft could be broken up much the same way, in neat categorical 'silos'. Facebook should be trisected into Facebook, WhatsApp and Instagram again. I have no idea how you would break Apple up without utterly destroying their core principle, vertical integration. There is no way to do what Apple does with MacBooks or iPhones if they don't control the entire stack. I'm not saying they shouldn't be, I just see no way.
Yes, my thought was that by breaking everything off from everything else, these silo'd services would suddenly have to compete with the rest of their market at fair terms, instead of being propped up massively by other division(s), and thus would lose marketshare to a multitude of fresh and established competitors.
You are right though, it doesn't deal with the dominance of the search directly. My hope is that a complementary effect to the above also happens: Google no longer gets gobs of personal data from its other services, allowing other search engines to approach its efficacy.
As is clear I'm not really a fan of direct intervention in a single market, I see it as more of a problem when these giants muscle their way and control more and more markets, creating a vicious feedback loop.
Most of those products don't make money by themselves, they exist to keep people in the ecosystem, providing more data for the real moneymaker.
The biggest blow to Google wouldn't be to break it up into lots of small companies, you just need to separate the advertising business from everything else and you've effectively neutered the monopoly. Google's genius isn't in hiring the best engineers to provide a ton of services, it's in convincing people that they're not an advertising company, and that is where Facebook has been falling out of favor recently (I'm guessing that's why they bought Instagram, and why Google bought YouTube).
That's an interesting thought. I agree with you that most of those products are loss leaders for data mining and thus advertisement.
But my thinking was that if you simply cut off advertising all the products still have massive marketshares and could lean on each other, as long as some succeed. Not to mention investors probably willing to prop up such a massive aggregate marketshare (one only has to look at Uber).
If you 'silo' them, success of one division of previously-Google won't lead to all of them dominating.
I'd rather cleave them all vertically anyway, rather than be left with a bunch of mini horizontal monopolies.
Granted most of your examples wouldn't be, except for search, but it still seems more interesting to me to just have a bunch of mini googles made from cleaving teams. Certainly that would make for some crazier competition.
> Indexing the top billion pages or so won't take as long as people think.
This is what makes me wonder why we don't have a LOT of competing search engines. Perhaps I'm vastly underestimating the technology and difficulty (I could well be - it's not my domain), but surely it can't be THAT hard to spawn Google-like weighted crawl-based search results?
It's a long-since solved problem - heck, PageRank's first iteration recently came out of patent protection - it could just be copy-pasted. Why aren't all the big companies Doing Search?
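To make the "it could just be copied" point concrete, here is a rough sketch of textbook PageRank by power iteration. It is the published idea in its simplest form, not whatever Google actually runs; the damping factor of 0.85 is the commonly quoted value, and the function name and graph format are my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its damped rank equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "c" is linked from everywhere, "d" is linked from nowhere.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

On the toy graph, `c` ends up with the highest rank and `d` with the lowest. That part fits in twenty lines; the replies about spam and click data are about everything this sketch leaves out.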
Most likely answer: lack of diversity in revenue models.
Outside of ad revenue, search has always been seen as something of a "charity" effort for the internet. It's "boring" infrastructure work that can be critically useful but doesn't really make money directly on its own. No one wants to pay a "search toll" and there's no government agency in the world that the internet would trust as a neutral index to run it as actual tax-basis infrastructure.
Aside from the quality issues that others have already mentioned, I think that simply gaining traction for a new search engine is incredibly difficult - people typically use whatever is the default in their browser, and/or Google/Baidu/Yandex (which are surely the best known in their respective regions).
Consider DuckDuckGo, which sells itself on privacy, but after more than a decade has only 0.18% market share. Without the power to make it the default in an OS or browser, you'd have to have a really strong value proposition to convince people to switch.
It is because people just stick with their best usually instead of using a variety of search engines. It becomes rather winner takes all.
Google for general search. DuckDuckGo for general search if you want something a bit more private but not extreme enough to run your own spiders. Bing mostly for porn search - not being snarky, some people do consider it to have better results.
Querying an index isn't a solved problem, building it is.
It's easy to gather the necessary data, but it's hard to know which parts of that data are the most relevant for finding good content and avoiding bad content. Is it more relevant if key words show up in links or titles than in the body of the text? If so, SEO spam sites will include a bunch of keywords in links and titles. Is it more relevant if keywords show up in the first 200 visible words of the page? If so, spam pages will make tons of pages with relevant keywords at the top.
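A toy sketch of that trade-off, assuming a naive scorer that counts query terms per field and weights titles and link text above body text. The weights, field names, and sample documents are invented for illustration and do not come from any real engine.

```python
# Hypothetical field weights: a title hit counts three times a body hit.
FIELD_WEIGHTS = {"title": 3.0, "links": 2.0, "body": 1.0}


def score(query, doc, weights=FIELD_WEIGHTS):
    """Sum weighted occurrences of each query term across the document's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in weights.items():
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


honest_page = {
    "title": "Glaciation of the north face",
    "links": "",
    "body": "how glaciers shaped the north face of the mountain",
}
stuffed_page = {
    "title": "north face north face north face cheap",
    "links": "north face deals",
    "body": "buy now",
}

honest_score = score("north face", honest_page)    # 8.0
stuffed_score = score("north face", stuffed_page)  # 22.0
```

The keyword-stuffed page wins easily, which is exactly the problem: any fixed weighting you publish (or that can be reverse-engineered) becomes a recipe for spammers.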
The hard part about building a search engine isn't indexing the internet, it's adapting to spam. Spammers are continually adapting to changes in the algorithm, so the algorithm needs to adapt as well. And the more popular your search engine is, the more money you make and the more able you are to adapt to spam (and the more spammers focus on your engine).
So, the problem isn't that Google has a better index (though I'm sure it does), the problem is that nobody else has the will to spend the money necessary to tune the search algorithm to stay on top of spammers. When Google started, companies didn't care as much about improving their index and instead focused on building their other content (Yahoo, MSN, etc). Google saw the value of search and got a lead on everyone else in terms of curating results, and now they have the momentum to stay in front and have shifted to building content to improve monetization. Nobody else has the monetization network for search that Google has, so they'll continue having the problem that other companies had (Microsoft wants to point you to their other services, DuckDuckGo is limited by their commitment to privacy, etc).
In short, Google wins because:
- it was better when it mattered
- it makes money directly from search
- its other services improve their ability to understand what users want, which improves search quality and ad relevance
You can't make a better algorithm by being clever, you make a better algorithm by having better data, and that's hard to come by these days. The only way I can think of a competitor stepping in is if they target an underserved demographic and focus data collection and monetization there, and DuckDuckGo is close by targeting privacy conscious power users.
I did a search earlier today on Google for "north face glacier" - turns out that the company North Face has a Glacier product so as far as I can tell that's all the search results contain.
Searching for "north face glaciation" did help as the first page of search results did have one entry on the topic I was actually searching on!
Maybe they should have a "I'm not buying anything" flag!
It's not just ML, but the people that provide the labeling for the ML.
Google pays some large number of people to do searches and grade the various results they get to see if the answers are good, which then feeds back into the ML.
Heck, according to this article[0], google has been paying people to evaluate their search results since 2004.
I feel for certain topics, especially anything to do with tutorials or coding, even Google falls foul to SEO content. Just Google ‘android custom ROM <phone model>’ for instance. There’s stock pages for all of them, identical save for the phone model, and clearly not applicable.
PageRank was an innovation at the time, but modern search engines require training models on lots of query logs to get good performance. It's expensive to make a really good search engine.
It's so weird how about 1/3 of the time on DuckDuckGo, I add a !g in frustration .. half the time I still get nothing and I end up posting on Stackoverflow but half the time I get a little more useful information.
Google custom tailors results for each and every machine. Even if you're not signed in, Google uses your browser fingerprint, the OS it's reporting and location/IP data to custom fit results. There is no "stock" google result.
This is something DuckDuckGo et al. can't do if they want to focus on a privacy model. DDG does offer location-specific searches, which can be helpful.
> 1) a record of searches and user clicks for the past 20 years
If a government was serious about getting more players in the search industry, they would force Google (and all other players) to make this data public.
Simply say "All user-behaviour data used to improve the service must be freely published".
Make the law apply to any web service with more than 20 million users globally so small businesses aren't burdened.
If the data cannot be published for privacy reasons, the private parts must be separated and not used by Google or its competitors.
Tangential - but does anyone else feel that Google results are useless a lot of the time? If you search for something, you will get 100% SEO optimized shitty ad-ridden blog/commercial pages giving surface level info about what you searched about. I find for programming/IT topics it's pretty good, but for other topics it is horrible. Unless you are very specific with your searches, "good" resources don't really percolate to the top. There isn't nearly enough filtering of "trash".
Yes, I feel like Google search results have very gradually become more irrelevant and spammy over the past decade or so.
There are 2 issues, I think.
Firstly, the SE-optimised spam, which has become very good at masquerading as genuine content.
Secondly, Google has dumbed search syntax down a bit, and often seems to outright ignore double quoted phrases, presumably thinking it knows better than I what I want.
As a dev, I do accept I may be an outlier though - with the incredible wealth of search history and location data that Google holds, it seems likely things have actually improved for typical users.
Google signed an armistice in the Great Spamsite War some time around '08 or '09, to the effect that spam can have all the search results aside from those pointing at a few top, trusted sites, so long as they provide any content at all. Bad content is fine. Farmed content is fine. Content that was probably machine-generated is fine. Just content. Play the game, make sure your markov chain article generator or mechanical turks post every day, throw some Google ads on your page, and G will happily put your spamsite garbage at result #3.
It has gotten better over the years in some ways even if it feels like it also got worse. I recall pages of "ads and useful-looking search result keywords" being more common in the past.
I'd wager any startup that tries to crawl a few sites like Amazon, Yelp, Linkedin, etc will be blocked. Google, however gets a pass because they're Google. So yes, I believe their huge index, and ability to crawl any site at will is a huge, huge advantage for them.
Via API access you'd be effectively getting access to the index _plus_ the derivative search quality improvements _based on_ user data, even if you're not getting user data itself. That would certainly open the door to competition, especially on a niche basis e.g. you want to build a platform dedicated to drones - you can combine drone reviews and news with videos plus e-commerce results. The result could be awesome in sparking all kinds of small business building on Google's API.
> 2) 20 years of experience fighting SEO spam.
That's probably a key issue here though. Providing an API potentially makes it easier for spammers to identify ways to boost their content in a well automated manner.
But that's where differentiation occurs. Every search engine will get short tail results correct. We go back to Google because it also performs with the weird queries.
I agree that algorithmic superiority will probably perpetuate Google's dominance. But making its index public is (a) legally precedented, (b) conceptually simple and (c) a small step in the right direction.
Gotta say my experience is very varying with long-tail type queries, I usually try DuckDuckGo and if that fails I search Google. They find very different things, DDG tends to be less filtered in terms of spam sites and fake news, but it also finds results of dubious copyright nature, for example.
I've had the same experience with DDG, which I use as my primary search engine. If I'm looking for a specific e.g. scientific paper or a recent news article, it doesn't have it. I run the search through Google. That's purely an indexing problem.
On the other hand, if I have a health-related search, I run it through Google. DDG has the proper content. It's just that it prioritizes the blog spam. That's an algorithm problem.
Relieving the former, as the author's proposal would do, makes DDG more competitive. As a second-order effect, it would also let DDG prioritize resources towards the second problem, making them more competitive still.
Well, considering the complaints I read on HN all the time about Google's search quality going down, I have a theory that highly technical users are adversely affected by the search improvements. So an improved search engine targeting that group would essentially be one that searches on what you typed.
I also happen to think that is the search engine I would prefer. I think I could build that pretty quick if I had the api access.
Since the author compares the proposed API to what startpage.com does, I'm guessing he's not talking about "index" as in "raw documents", but basically Search as an API with all the sorting and ranking done.
Some argue (not necessarily me) that Google isn't necessarily purely optimizing for quality using that 20 year click-and-search log, that they're accepting some inefficiency by biasing for political (left-leaning) gain or "censorship by obscurity". If competitors could more easily build alternatives, which, say, didn't have those biases, then arguably that'd put more competitive pressure on Google to not use their monopoly for bad stuff.
Robert Epstein (born June 19, 1953) is an American psychologist, professor, author, and journalist. He earned his Ph.D. in psychology at Harvard University in 1981, was editor in chief of Psychology Today,
He has also made some questionable claims about google manipulating search results to favor Hillary Clinton.
His research is based entirely on his own experience
“It is somewhat difficult to get the Google search bar to suggest negative searches related to Mrs. Clinton or to make any Clinton-related suggestions when one types a negative search term,” writes Dr. Robert Epstein, Senior Research Psychologist at the American Institute for Behavioral Research and Technology.
Google's claim that the algorithm is generic is demonstrably false. Type in "hillary clinton e" and there is no suggestion for "email", type "donald trump e" and email is the first suggestion. Given the news content that we know is out there, that can only be the result of adjusting the results for clinton specifically (if anything, we would not expect "email" to be autocompleted for trump). This is not research that tells us what exactly Google is doing, but you cannot deny the example.
This is not "research" period. Using one arbitrary search comparison to draw conclusions about the nature of a system that processes billions of queries a day is pretty weak. Additionally, I don't get the same results you do. "hillary clinton e" does not bring up emails, nor does "donald trump e" bring up emails (the first results I see are election, education, england visit, ex wife).
I'm not ruling out the possibility that google actually is manipulating search results, but this is not proof of that.
This is not the most scientific test, since previous searches are generally taken into account. Was this test conducted from a system that mostly searches for / clicks on pro-trump or anti-trump content?
Well, that's kind of the point: it's not scientific, but it's relevant. I believe this was also the example that was recently used in a Project Veritas video, with the same results.
I searched from a Firefox private window over a VPN from the Netherlands. But since the results are the same (regarding presence/absence of "email" as an autocomplete term) I don't think it matters much.
tomweingarten can't see past the tip of his ideological nose. It's gonna be such a shocker to him when his megacorp gets shattered into a million little pieces.
> He has also made some questionable claims about google manipulating search results to favor Hillary Clinton.
Despite it being off topic, can we define why those claims are questionable? Is there data proving those claims wrong? Because with all the Google political controversies over the past few years, and given the political donation history of Google employees, it’s highly plausible that search results are manipulated to favor certain politics over others.
If the “questionable claims” have been disproven or are inaccurate, then it would seem that you’d provide some proof. Essentially, if you are to claim the search engine was not biased towards Clinton, certainly there would be some proof of that? It’s more reasonable to suspect Google of manipulating search results than not, given the political environment at Google.
The real “questionable claim” is that Google is neutral in any way — which is kind of the entire premise of the article. If Google were completely neutral, then why would their monopoly on search need to be broken?
What about Project Veritas? People claim the statements by Google employees were taken out of context, but I've gone back, listened to them, looked at the videos, and it's hard to think in what context anything they said is acceptable.
Even if the specific engineers and managers in the video clips don't have the level of authority to make the changes they're talking about, it's still chilling that their attitude could be common in Google and they see political ends of their great power as being some kind of great responsibility; instead of respecting the idea of equal/diversity of opinion.
> instead of respecting the idea of equal/diversity of opinion.
Going to go out on a limb and say I don't respect "Hitler was a bad man" and "Hitler did nothing wrong" equally. Individual employees are allowed to have opinions...even opinions I don't agree with.
>Despite it being off topic, can we define why those claims are questionable?
The claims are questionable because his methodology is questionable. If he claims Google is biased, he should have a good peer-reviewable study that proves this, not "Google is biased because Google didn't auto-prompt me with 'created AIDS' when I typed in Hillary Clinton"...
And he's the one making the claim that google is biased...The burden of proof is on HIM.
This is a forum for people in the tech world, right? Shouldn't we question N=1 "studies"?
The data that google has holed up that I want more than anything else would be trends, broken down by page number. As interesting as the data of how often people search for a term is, I find it far more interesting when they can't find the thing they're searching for. That's a hole in the market.
So... this article is a good example of how ??? it gets once you move from "We gotta do something about these tech monopolies" into the "what should we do?" phase.
How exactly *do* you break up a Google or a FB so that (only one possible reason, but the one cited here) they don't control too much media/mind share?^
Laws usually want to be general, and my suggestion doesn't necessarily lend to that, but I'll suggest it anyway.
Facebook doesn't need to be broken up into several companies. It can just be shut down.
I don't mean that it should (justice-wise) be shut down. I just mean that we won't lack for social media. We will have social media alternatives the day after FB shuts down. There's a chance we'll get something more open instead. There's a good chance we'll get several small replacements instead. There is 0-chance that we'll lack for ways to share posts and post pictures. This isn't Bell, where we need to keep the phones working. The phones will work fine with or without Facebook.
YouTube is another sort of example. If it shuts down, alternatives will pop up immediately...maybe open ones.
There are justice questions (is it fair to shareholders/employees/zuck?) There are legal validity questions (why FB and not apple?). But, for the practical questions... the problem is an easy one.
^fwiw, I also think this is the most worrying part. These companies have tremendous control over how and what people think. They make Murdoch media look quaint.
"But what about those nasty filter bubbles that trap people in narrow worlds of information? Making Google’s index public doesn’t solve that problem, but it shrinks it to nonthreatening proportions. At the moment, it’s entirely up to Google to determine which bubble you’re in, which search suggestions you receive, and which search results appear at the top of the list; that’s the stuff of worldwide mind control. But with thousands of search platforms vying for your attention, the power is back in your hands. You pick your platform or platforms and shift to others when they draw your attention, as they will all be trying to do continuously."
But this is a huge problem. I'd rather have 10 independent search providers instead of 10 companies proxying the results of Google. It's worse if I don't even know which index the results come from. I guess many people don't know that Startpage shows you Google results.
I don't want Google results! I want different web crawlers ordering the results according to my taste without tracking each and every page impression of me. Give me that and I'll switch in a heartbeat.
The other problem with this is that it still can't change human nature. Ok, so this plan is implemented and any site can serve google results and order them as they want with an API. People are still going to go to their favorite far right or far left outlets, which can now access google results and show only the articles that they know their users want to see. The "filter bubble" problem could even be worse than it currently is in this scenario.
I don’t see any mention in this article of what seems like the most obvious way to split up Google, separating their search and ad businesses. (Edit to add: although maybe the effect would end up being similar, if API users serve their own ads but without access to Google’s ad infrastructure.)
That obviously wouldn’t be a simple job, of course, and maybe there are some interesting reasons why it wouldn’t work well.
> what seems like the most obvious way to split up Google, separating their search and ad businesses
Given the complexity of (a) Google's search and ad integrations and (b) the adtech landscape as a whole, this would be difficult to do legislatively. That leaves settlement with the DoJ, a costly and time-consuming path.
The author's suggestion is not exclusive against a break-up of Google. Its moderation and basis in precedent, however, make it something multiple agencies--not just the DoJ--could implement. Including Congress.
It is actually very easy to do legislatively (laws tell the companies what to do, not necessarily how to do it). The technical challenges fall on Google.
It would be easy to write a law, not to pass one. Generally speaking, when a law is unpredictably disruptive it becomes (a) difficult to pass and (b) time-consuming and costly to defend in court.
The difference in this case is that the usual defenders of private property rights and libertarianism feel as though Google is suppressing them, potentially removing the usual roadblock to this kind of reform.
This is a rare instance where anti-corporate leftists and dejected right wingers could actually do something substantive together.
In most cases I would agree with you that the defenders of private industry are pretty fierce in the US, but all that ideology sort of dissipates when you feel like you are being oppressed (whether or not, or to what degree, it is true).
> the usual defenders of private property rights and libertarianism feel as though Google is suppressing them, potentially removing the usual roadblock to this kind reform
Potentially. It's still more difficult than the author's index proposal. No reason they can't be pursued in parallel.
Regarding a search-ad break-up, I'd guess there would be lots of wrangling over (and lobbying around) defining search and advertising. For example, is Amazon's product search tied to an advertising business, given it sells third parties' products?
> Microsoft said the same thing about internet exploror and it's operating system
And breaking Microsoft's attempted browser monopoly required the DoJ.
To be clear, I believe we will eventually have to break apart--at the very least--Google and Facebook. But there are advantages to the author's proposed tactic of making the index public. It's a cleaner, cheaper, and quicker solution and doesn't harm the odds of a future break-up.
And when it all came down to it, it didn’t matter. Chrome didn’t become dominant on the desktop because of government intervention and Safari didn’t become important because of DOJ either.
Apple and Google competed. Government intervention is rarely the right answer. A bunch of people who are both beholden to lobbyists and ignorant of technology won’t produce the outcome people think they will.
Besides, the last thing anyone should want is more government power. Given the choice between trusting the government - that has the power to take away my liberty and my money - with more power or trusting private corporations, I have much more to fear about government.
Fun timing to read this, this last weekend I was playing around with making my own search engine to understand better how ElasticSearch and Lucene work.
It occurred to me that the two most powerful things Google has to work with are records of clicks, and the time users spent on the webpages Google returned. I've argued against Google monopoly before because I can throw together a web crawler and search engine in a weekend, so it's not like it's a hard market to enter.
> According to W3Techs, Google Analytics is being used by 52.9 percent of all websites on the internet
This is the real problem though. When a search engine sees a new query, it uses everything it's got to assert which pages the user wants, but with Google Analytics, they can test their assertions constantly to see if a user actually wanted that web page. Then your future queries could be compared against previous queries that were validated by a user spending several minutes of active time on the returned page.
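A rough sketch of the kind of feedback loop I mean, assuming a hypothetical log of (query, url, dwell time) events. The log format and the two-minute threshold are made up for illustration; I have no idea what Google actually records or how it uses it.

```python
from collections import defaultdict


def satisfied_click_rate(events, min_dwell_seconds=120):
    """Fraction of clicks per (query, url) where the user stayed at least `min_dwell_seconds`.

    `events` is an iterable of (query, url, dwell_seconds) tuples from a hypothetical log.
    """
    clicks = defaultdict(int)
    satisfied = defaultdict(int)
    for query, url, dwell_seconds in events:
        key = (query, url)
        clicks[key] += 1
        if dwell_seconds >= min_dwell_seconds:
            satisfied[key] += 1
    return {key: satisfied[key] / clicks[key] for key in clicks}


sample_log = [
    ("north face glaciation", "example.org/geology", 300),
    ("north face glaciation", "example.org/geology", 5),
    ("north face glaciation", "example.com/jackets", 4),
]
rates = satisfied_click_rate(sample_log)
```

Here the geology page gets a rate of 0.5 and the jacket page 0.0, so a ranker could prefer the former for similar future queries. The code is trivial; having the events at all is the hard part.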
I'm sure Google's algorithm is great and all, but I really think this is what sets them apart.
I think you misunderstand what I'm saying. I'm not saying Google analytics will get you ranked higher or lower. Just that Google can use the data from analytics to tell if their results were what the user wanted.
The effect of this on Alphabet's revenue would be nil.
The majority of Google's Revenue comes from Google, Youtube, Gmail and Play. They make so much $ because they have the biggest network effect of advertisers-eyeballs in the world along with Facebook. That. Is. Unbreakable. Even more than a social network's network effect, because the friction to switch budgets and people in a company is higher than a guy telling their best friends to download an app.
And then, YT is a network effect. And then, Play/Android is also a network effect. And then there's the branding. But presumably every big company has the latter. Still, what a brand. Everyone knows what Google or Android is. Every. Single. Human.
Finally, because they make all this money, they can pay to be the default on the other half of the devices, Apple's devices, to use Google as default. Last time I checked, $5B a year.
Hence, this article is so bad.
I don't even care about Google, just saying.
edit: did I mention Chrome? They've got chrome too, with the googleverse as default.
Ye and anyway you could return x10 worse results than Google's current results and still become the new dominant search engine if you've got infinite $B a year to outbid Google to be the default on browsers and operating systems and a big salesforce to onboard the advertisers. Man this is the exact playbook they used to become the search engine in the first place. They literally were Yahoo's search bar at some point two decades ago.
I'm of two minds here: Google's whole reason for ascending to where they are is the PageRank algorithm, which Google was created to monetize. I see this in a similar vein to Apple and iOS: would we support calls for Apple to be forced to allow iOS to be installed on non-Apple hardware? If not, then why would we insist on Google giving up its reason for being, the reason a lot of us use it to find relevant information?
Then again, the concentration of power in a handful of operators likely threatens the open internet.
Not a lawyer, but this seems to be conflating patents with copyrights?
i.e. iOS (especially new versions) would fall under copyright protection [0].
PageRank is a patented technique for search. The patent apparently ran out about 6 weeks ago [0].
While both copyright and patents are intended to protect creators for a certain period of time, copyright protects a specific work and patents protect an idea. Patents should generally expire much more quickly since they cover a much broader topic.
I also realize both systems are completely crippled at the moment, but I'm trying to stick to what they're at least intended to be.
Is Google's Search quality that much better than Duck Duck Go or Bing? I get about the same quality of content when using any one of the three.
All three are terrible at serving me up websites to buy crap when I typically want to learn about something. I would love a filter that says "Don't try to sell me anything"!
I believe Google's biggest advantage is in their marketing & that the word "Google" now is a synonym for search.
I don't know the answer to this - you could be right, but just outright questioning that Google's search quality is no better than DDG or Bing and basing it off of a single data point is pretty foolhardy. Your second statement reinforces this point - I've run a lot of Google ads, and they're quite effective. Google ads are very effective at selling things to people.
In the future, when coming up with an opinion about something, I'd encourage you to look at statistics and combine that with your own personal experience. You'll often learn something you didn't know, and come up with a more grounded opinion.
A list of websites and their content is really not useful at all. Anyone can get this themselves with some really simple programming.
The actual hard part is when it comes to ranking and sorting the data in any useful way, and doing it within like 100ms. Plus various other issues like spam protection etc. This is where Google excels (at least in my opinion).
I wouldn't say that anyone can create an index with really simple programming. There are quite a few technical obstacles that "really simple programming" probably couldn't solve. That being said, I agree that any legitimate company would be able to create an index easily enough. The hard part is ranking and spam detection.
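For what it's worth, the "simple" part really is small. Below is a toy sketch of a keyword index with naive term-count ranking; the page contents and domain names are invented for the example, and this is nowhere near what a production engine does. It mostly illustrates why ranking and spam detection are the hard part: the page that just repeats the keywords wins.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """pages: {url: text}. Returns an inverted index {term: {url: count}}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][url] = count
    return index

def search(index, query):
    """Score each page by summed term counts, highest score first."""
    scores = Counter()
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]

# Made-up pages: one ordinary page, one keyword-stuffed page.
pages = {
    "honest.example": "cheap flights to paris and travel tips",
    "spam.example": "cheap cheap cheap flights flights flights buy now",
}
index = build_index(pages)
print(search(index, "cheap flights"))  # prints ['spam.example', 'honest.example']
```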
Because their even more monopolistic competitors are pissed, journalists are pissed because the world changed on them and they have to compete, combined with radicals with a persecution complex and a "no the children are wrong" attitude towards their opinions being unpopular. The tech break up push is one big hypocritical circle jerk from every party.
I don’t want to break Google’s monopoly on search. Google’s search is fantastic. It’s their advertising business knowing too much about me I care about.
The author's proposal, making Google's index publicly accessible, would leave Google search intact. Google's algorithm would presumably remain proprietary.
For people who like Google today, nothing would change. For people for whom Google falls short, there would be new options. Looks like a clear win-win for consumers.
>Google's algorithm would presumably remain proprietary.
This is not a good thing. Basic keyword searching on an index isn't viable to compete, then you go back to when keyword spammers were at the top. Google should be forced to distribute all their search technologies to the public with the threat of the Alphabet corporation being dissolved if they do not comply. That is the only way to get us out of this mess and break Google's stranglehold on the internet.
> the value of Google is the algo. The index is worth little.
But it's worth something. Giving DuckDuckGo direct access to Google's index, including the ability to train models on said index, would improve the competitive landscape.
Yeah it seems like this would spawn a lot of weak competitors who only exist because of the protections and would die off the second they're cut off from the API.
Maybe the practical indexing ability is worth more than the index itself. Most site owners happily let the Googlebot into their websites, while a competitor is likely to be caught into some sweeping anti-bot/Captcha measures.
I kinda disagree. 15 years ago you could generally type anything and expect the first search result to be the legitimate one. Now you have to dig through the list of SEO optimized scams (or at least dubious websites). Albeit it improved somewhat over the last couple of years.
Example: if you type "broadway shows phantom of the opera", the first non-ad link redirects to www.broadway.com. But www.broadway.com doesn't do anything: they just buy tickets from the legitimate website (www.telecharge.com in that case) with an extra ~$30 fee per ticket tacked on top. I won't go as far as to claim that Google is ok with it because broadway.com must spend tons of money on ads, and not telecharge.com... but I'm suspicious.
Counter example: "green card lottery" now points to the official US gov websites, while a few years ago it was scams after scams all the way down.
And really, that seems like a good cutpoint to me. Search might be a natural monopoly. But advertising sure isn't. A lot of their ad stuff came via acquisition; what was bought can be spun off or sold again.
But the profit one can gain from crawling sure grows non-linearly with the number of pages consumed.
Thus, small businesses are less likely to reach the threshold of the index size required to make profit ratio comparable with that of their bigger competitors.
Google's search isn't even that great, to be honest. I use DDG for search, and use the !g operator when the DDG results aren't satisfying. But my experience with that is that the Google results are never good for those searches, either.
The thing the article misses is that search isn't really Google's crown jewel anymore. It's their position at the top of the adtech ecosystem, and while they bootstrapped that with search, I kind of doubt that search is the main thing driving their ad views today.
I think the better move is to find a way to kill the advertising business model that google relies on. If there's no advertising there's not as much need for all this data collection.
>the better move is to find a way to kill the advertising business model that google relies on. If there's no advertising there's not as much need for all this data collection.
What would be the alternative ad-free business model for Google to pay for its datacenters? Paid subscriptions?
I've been performing google searches for 20 years. In an alternate past universe... if I was paying $9.99/month for paywall access to their ad-free search engine, Google Inc would have more sensitive data collection about me -- not less. Google would have decades of my (sometimes embarrassing) search history specifically tied to my paid account.
Ads can reduce privacy but they can also increase privacy by making explicit logins optional. (In other words, the anonymous google searches I did at the office during work are uncorrelated with the searches I do at home or at school. Without explicit Google account logins, the searches I do on my smartphone at Starbucks can be uncorrelated with the searches I do on the desktop at home.)
I don't like ads but the alternative ad-free revenue model is worse: Google Inc having my credit-card payment info (which means my real identity) and all my private search queries tied to it.
I'm not so sure about search being fantastic anymore. I'm frustrated with how most times Google will completely ignore the exact query terms and return very clickbaity/popular links from Medium/Quora/etc. It's been happening for a bit now and usually for any serious search, I'll compare results with DuckDuckGo just to be sure.
Google treads on censorship as it regards political discourse. I’m not referring to things beyond the pale. I mean they will censor seemingly small things like bury a politician’s peccadilloes or surface something that stains an opponent. That’s dangerous.
One obvious search query string is: reddit + candidate and see some surfaced and some less so.
I know it's hard to get definite proof, but what really makes you think Google employees (only a subset of whom have production access) would go out of their way to censor websites or results? If I search for [any US candidate] I always get both a news carousel and their official website. Maybe this happens in other countries where Google can throw their weight around without anyone noticing?
So although everyone likes to believe Google is a monopoly, it's far from it. You have choices: Bing, Baidu, Yandex, DuckDuckGo... There is also nothing about Google's search position that prevents you from building a competitor. What we do have is Peter Thiel backing an administration that's anti-Google, and Russia and China, which are anti-Google. Why? It's a source of truth that challenges their lies. We also have an emergent, cult-like backlash against personalized ads. Combine all of these factors and you get a lot of pressure and misinformation telling you Google is evil. Additionally, karma: Google led the charge against Microsoft with its "do no evil" position, when Microsoft did have an OEM monopoly preventing others from competing. Anyway, that is how I see it... So is Google near to being a monopoly? No. I think they would need to be doing a lot worse things, and there is room to compete and people should
Peter thiel was recently speaking at a conservative whatever in DC spreading nonsense and creating a bogeyman out of Google (but not FB obviously). That they're working on some evil AI. Anyone actually working in the field knows how blown out of proportion and cringy this AI doomsday narrative is but people like thiel don't waste a chance. I'd like someone to tell his Republican audience (pardon the stereotyping) that he is homosexual and recently obtained a New Zealand citizenship just in case. Then watch how his audience reacts.
ah yes, the only reason you could ever have for being anti google, et al, is that you want to lie...
You started with a decent premise, that google isn't an actual monopoly because it has "competitors" (and yes, those quotes are intentional). But then you go off the deep end and completely lost me to the point that I didn't even finish the entirety of your post.
There are many many reasons that someone could be against google's absolute market dominance.
> or browsers globally it's 63% that's not a monopoly...
You should look at markets, not globally. Google doesn't have a global monopoly (and likely never will because China, Russia won't allow their web to be controlled by a US company), but it has a monopoly in lots of markets (read: countries).
Google is a profit-seeking corporation that needs to expand into China to generate more money and bends the knee as much as it can to make inroads there. Far from being a source of truth they follow all Chinese censorship laws (why, because they would be blocked from operating there and unable to make money off of the Chinese). China likes everything about Google other than the fact that it is not a domestic corporation.
A company does not need to be a monopoly to face Anti-trust measures made in the public interest. In the 90s you could have run Linux on a PC, installed Netscape on a PC, or bought a Mac, but Microsoft was still treated as a monopolist. The same with IBM in the 70s. There were at least five to ten other major suppliers of mainframes, but IBM was so dominant in the market that they were aggressively regulated for anti-trust.
Google is not the victim of misinformation. Everyone is just starting to understand how important they are as the de facto gateway to information (regardless of choice) and mulling over the implications of it dominating ad revenue, search, etc.
Carousel requiring (severely favoring?) AMP, requiring exclusively Google javascript/ads is definitely Google using their advanced user base on search to favor their ad business.
They have already been condemned for their Google Shopping service for monopoly abuse on search.
Are you suggesting that we should discard criticism of Google because Russia and China are anti-Google? China does not like Trump either, so should we not criticize Trump because of China?
And I don't even get why you say China and Russia are anti-Google. They are anti-information more than anti-Google. If Google allows them to control what information people get, they will have no problem with Google. They are further anti-West and are concerned that Google will play ball with Western governments to give them access to Chinese and Russian data. Maybe I am missing what you mean by them being anti-Google.
No - but I do think we should be sure we read deeper and question who's motivating a certain position. Ask is there a source of truth that can confirm someone's position. For example, I can see google's search business controls about 90% of the search traffic. I see browser usage around 63%. For advertising (google main source of income) I see they don't dominate they share market with Facebook and more recently a lot of talk about Amazon's position being strong to take more of that market.
I don't necessarily think re-inventing the wheel every so often is the way to go on some things. When it comes to search, you want to type in your query, and search. It doesn't really get much simpler than having a search box, and I am having a hard time understanding how that can be innovated upon. I am however not saying it CAN'T be done, I just don't see why or how it would be.
"DuckDuckGo, which aggregates information obtained from 400 other non-Google sources, including its own modest crawler.)"
I looked into it, and it seems DDG is using the Bing and Yahoo search APIs and lots of other sources. I looked into the pricing of Yahoo's search API / Bing search API; it ranges from $0.80 / 1000 queries to several dollars per thousand queries.
It seems too expensive to be economically viable with ads. What am I missing?
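The arithmetic behind that worry, as a quick sketch. The only figure taken from above is the $0.80 per 1000 queries price; the query volume and the ad revenue figure in the usage lines are purely hypothetical placeholders, not real DuckDuckGo numbers.

```python
def api_cost(queries, price_per_thousand):
    """Total API bill in dollars for a given number of queries."""
    return queries / 1000 * price_per_thousand

def margin_per_thousand(ad_revenue_per_thousand, price_per_thousand):
    """Dollars left per 1000 searches after paying the upstream API.

    A negative value means every search loses money.
    """
    return ad_revenue_per_thousand - price_per_thousand

# Hypothetical volume: one million queries at the low-end price.
bill = api_cost(1_000_000, 0.80)

# Hypothetical ad revenue of $1.00 per 1000 searches.
margin = margin_per_thousand(1.00, 0.80)
```

So at the low-end price, ad revenue per thousand searches has to clear $0.80 before anything else gets paid for, and at "several dollars per thousand" the bar is correspondingly higher.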
I see the duck crawler on my websites a fair bit now so chances are they're eliminating this expense, but even in 2015 the CEO said they were profitable (although this wasn't at the same scale as now) https://fortune.com/2015/10/09/duckduckgo-profitable/
I agree. I've given up recommending DuckDuckGo to non-technical friends/family because they always get distracted by the irritating name. I wish they would change it to "Duck.com".
Did they buy it recently? I recall Google owning it for a bit through an acquisition? Yeah it certainly would be nice if DDG rebranded and pushed Duck as its name.
I wonder if it's even in America's interests to create a weaker Google/Facebook/Amazon/Microsoft. These companies are dominating globally (excluding China) and bring back so much money, jobs, and influence to America. Weakening them might allow real foreign competition to flourish.
Making the index available would be very valuable for scientific research. when I was a googler, we used the index (and google scholar's index) to do all sorts of interesting science projects (DNA search, gene search, etc) But we couldn't publish the results for various reasons. If the index (or a fragment of it) was available sitting in parquet files (or possibly a better format for indices) you could easily sit around doing spark jobs to extract all sorts of interesting web data.
There are tons of projects with serious sizable open source index, for ex, http://commoncrawl.org/. Even crawling 15B pages on AWS is not super expensive on your own using well established OSS tool chain. Bing already provides APIs if you don’t want to do crawling.
Web index, while important, is by no means the most important part of a search engine (which I'd say is relevance). The OP article is written by someone who has little clue that a search engine involves a massive amount of technology besides index and even relevance: everything from spell correction to recommendations to query rewriting to answers to segment-specific searches like images/video/maps/local/product/news/entertainment/events, and so on. This is above and beyond the gigantic infrastructure needed to run all these at scale, at speed and cost-effectively. All of these need to work harmoniously with each other and be designed keeping in mind the weaknesses and strengths of each component. For example, the index for news must be refreshed almost in real time, and relevance needs higher emphasis on location.
Search is not free and no one has figured out if someone just can provide index and someone else can provide relevance and everything can just work as efficiently as before (data point: each query consumes 0.3 Watt-hour of energy at Google).
I don't think this would help our social fabric, which the article says Google is "tearing apart." Why do we need to break Google's monopoly on search again? Which they don't actually have.
Well, look at what they mean: whenever people complain about tearing the social fabric, it usually means "other people aren't conforming to how I think they should, this is new and I don't like it, so therefore it is all its fault and we are doomed if we don't get rid of it".
Given that the previous culprit of that exact same charge was gay marriage, it is clearly a meaningless "family values" style euphemism to try to make their complaints seem valid.
If only replacing Google would be as simple as building another search engine. It’s like trying to beat WhatsApp by launching another messaging app. Any kid could do that. The tech is not the hard part.
I don't personally think Google's 'index' becoming public would make a single bit of difference on its position as number one search engine (and therefore advertiser) on the internet, unless you expand that word to include all the algorithms and infrastructure around it.
Bloomberg talks about 'making the index public' and the way to do it is with the API. But the API behaves like the website of Google and does not make the index of the website content available. Google's new AI-based algorithm is trained by humans with a strong bias. An example of this is the disappearance of websites of respectable medical doctors who use non-mainstream methods. It is one thing to agree or disagree with someone, but completely dropping a website from search results is effectively censorship and who is Google to determine who sees what?
> Google is especially worrisome because it has maintained an unopposed monopoly on search worldwide for nearly a decade. It controls 92 percent of search, with the next largest competitor, Microsoft’s Bing, drawing only 2.5%.
1. The definition of the word "monopoly" is not applicable to this situation.
2. Property expropriation reduces the competition within the society, while also being unethical.
I don't see how this could reduce Google's "Monopoly".
Searching with an API is the same as searching from the browser. However the results are going to be sorted, garbage will be garbage, and should be demoted both in the API and HTML version. And of course, downloading the index is out of the question.
I think that a better way to improve competition would be reducing cloud computing costs (bandwidth in particular). But of course, regulators in USA would say that it is communism or something... Sigh.
> This idea is a pipe dream. It would be nationalizing property
The author references precedent in the 1956 consent decree with Bell Labs [1]. If anything, that was more extreme. AT&T developed those patents.
The author's proposal doesn't involve Google surrendering its algorithm. Just the index it compiled from public resources using a publicly-subsidized Internet infrastructure. All without content owners' permission.
What about the computing and infrastructure resources that Google dedicated to build the index, and continues to dedicate to keep the index up to date?
This is a failed model. There is no incentive for Google to continue to update the public version, and it would quickly fall out of date while Google focuses on their own internal copy. It’s a solution suggested by lawmakers who fundamentally don’t understand how computers work.
> What about the computing and infrastructure resources that Google dedicated to build the index
How is this different from the 1956 consent decree? AT&T spent money developing its patents. But on the basis of longstanding law around public interest, it was forced to license them to third parties. (Note: not give them away.)
> and continues to dedicate to keep the index up to date?
The author explicitly contemplates, again within the context of the 1956 consent decree and many subsequent and preceding actions by the U.S. government (mostly around pipelines, et cetera), use fees paid to Google.
> There is no incentive for Google to continue to update the public version, and it would quickly fall out of date while Google focuses on their own internal copy
Google wouldn't be permitted to maintain dual states. Its algorithms would have to use the public index.
Well, the author is an idiot. A search index (as far as the compiled listings of things, not the tech that runs it) would obviously not be covered under this ruling.
Google can't have a monopoly on data gathered from the publicly available internet, that's absurd.
The difference in this case is that the usual defenders of private property rights and libertarianism feel as though Google is suppressing them, removing the usual roadblock to this kind of reform.
This is a rare instance where anti-corporate leftists and dejected right wingers could actually do something substantive together.
In most cases I would agree with you the defenders of private industry are pretty fierce in the US, but all that ideology sort of dissipates when you feel like you are being oppressed (whether or not, or to what degree it is true).
Some people are arguing that catering search results or what content is allowed on a platform to a specific set of political views makes those platforms publishers rather than mere platforms. Apparently this also has some implications in some political campaign laws I don't really understand.
I think we're heading towards political ideologies having the same protection as religious institutions (which IMO, are exactly the same in all practical matters).
I agree that political ideologies are similar to religions and that protections may be extended to them (although they should already be covered by what is in the constitution). What I don't follow is why it matters that platforms are becoming publishers.
The government has never had the right to meddle in what publishers decide to publish (with exceptions for regulations on pornography and classified information). I can't imagine the government forcing Mother Jones or The Nation to print conservative viewpoints. If the platforms become "publishers" their power to select what information is available increases. They become liable for more of the content on their sites, but they also gain full control over content.
The only way I see to protect freedom of access to information would be to declare certain spaces (Facebook, Reddit, YouTube, etc.) as "privately-owned public forums" where suppression based on ideology would be heavily restricted.
Personally, Bloomberg seems owned by some behind-the-scenes communist group: how can they propose such anti-free-enterprise drivel unless for sensationalism-driven pageclicks or they're pushing a funded agenda? I seriously wonder that.
Bloomberg are obviously rather pro free-enterprise
Though, I'd say totally free enterprise alone doesn't actually work. It naturally leads to monopolies (as is pretty much the case with bloomberg terminals for example), and the law of the jungle in general. If that's the kind of society you want, then head to many 3rd world countries.
> Bloomberg seems owned by some behind-the-scenes communist group
Bloomberg, LP is owned and controlled by Michael Bloomberg [1]. I can't believe I have to write this, but Michael Bloomberg is not a communist. (Nor is the author's proposal remotely communist.)
Not sure if relevant but I've hated google's search for a unique reason: I find it horribly slow, especially compared to what it used to be. It's so slow, I've designed + partially implemented an alternative for my own use: https://github.com/Jeffrey-P-McAteer/dindex
I've only tested with 1000 records, but the query times are all <200ms.
It's easy to achieve good performance on small datasets. But we are speaking of Internet scale!!! And not only that, Google is capable of serving responses to millions of persons!!!
The index takes a lot of disk, memory, computing power, bandwidth... And then you have to handle attacks, spam, dead websites and not going bankrupt in the process.
I don't know about you, but my best toy website is only capable of serving 100k reqs/second in localhost if serving pong...
Maybe it's your part of the world, but my TTFB to google search pages is less than 150ms and the "generated in x seconds" is usually less than 1 second. That's pretty good for an index searching effectively every public internet page.
How does your performance scale? If it scales linearly in records, you're going to have trouble if you're only below 200ms on 1k records. If it scales logarithmically, there are still a couple orders of magnitude between a comprehensive index like Google's and your 1k record index, which is still going to give you trouble.
In short, 200ms is slow, not fast.
EDIT: Just to clarify, I'm not saying it's necessarily easy to be faster than 200ms (I'd need to look closer into fuzzy text searching algorithms to be able to make an educated guess here); I'm just saying it's not fast enough.
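To put rough numbers on the scaling argument, here is a back-of-the-envelope sketch. The 200ms at 1k records comes from the comment above; the target of one billion records is a hypothetical stand-in for "Internet scale", and real systems obviously don't follow either curve exactly.

```python
import math

def extrapolate_linear(base_ms, base_n, target_n):
    """Query time if cost grows in proportion to the record count."""
    return base_ms * target_n / base_n

def extrapolate_log(base_ms, base_n, target_n):
    """Query time if cost grows with the logarithm of the record count."""
    return base_ms * math.log(target_n) / math.log(base_n)

BASE_MS, BASE_N = 200, 1_000      # figures from the comment above
TARGET_N = 1_000_000_000          # hypothetical index size (assumption)

linear_ms = extrapolate_linear(BASE_MS, BASE_N, TARGET_N)
log_ms = extrapolate_log(BASE_MS, BASE_N, TARGET_N)
print(f"linear: {linear_ms / 1000:.0f} s, logarithmic: {log_ms:.0f} ms")
```

Under linear scaling the query goes from milliseconds to days; even under logarithmic scaling it lands well above the sub-second total that Google reports.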