
The pernicious perfidy of page-level web spam (SEM 101)

February 11, 2010, 01:16 PM by Rick DeJarnette

In the exciting world of today's Internet, where the world's information is literally at your fingertips, where you can endlessly communicate, shop, research, and be entertained, spam is a big downer. The unwanted email spam that fills our inboxes also consumes huge portions of the available bandwidth of our routers and trunk lines. But email is not the only spam game in town.

Web spam is the bane (well, one of the banes) of the search engine and web searcher communities. Search engines want to provide search users with a great experience, helping them find what they want as quickly and as easily as possible. Search users want to get the right information they seek as quickly as possible. And webmasters want search users to find their websites, and then to turn those visitors into conversions instead of bounces.

Web spam, the unwanted garbage pages that use overtly deceptive search engine optimization (SEO) techniques and contain no valuable content, frustrates search engines and search users alike, and it ultimately works against the best interests of conversion-seeking webmasters (severely annoying a potential customer is rarely a great sales technique!).

In the previous article that defined web spam and discussed how it is different from junk content, we mentioned that there are two types of web spam. In this article, we're going to delve into the details of the first type: page-level web spam.

Definition of page-level web spam

Page-level web spam uses on-page SEO trickery (not to be confused with link-level web spam, which we'll discuss in an upcoming article). Webmasters and optimizers for these sites do this because they believe they can fool the search engines into giving their webpages a higher-than-deserved ranking based on their content relevancy, oftentimes for subject areas that are completely unrelated to the site's actual content. This is done in an effort to deceive searchers into visiting their spammy sites for a multitude of reasons, none of which usually benefits the end user.

The use of the following questionable SEO techniques will cause Bing to examine your site more deeply for page-level web spam. If your site is determined to be using web spam techniques, your site could be penalized as a result.

Note that Bing recognizes that the core concepts behind many of these techniques can have valid uses. No one is saying that their use always and automatically denotes web spam. The intent behind their use is the distinguishing factor for determining whether web spam is present and whether any site penalties are needed. Please understand that, from a search engine perspective, the web spam effort consistently provides little to no value to end users. The entire effort is directed at fraudulently influencing search engine rankings. As Martha Stewart might say, that's not a good thing.

Keyword, URL, and link stuffing

Definition: This is the use of heavily repeated keywords and phrases with the goal of attaining a more favorable ranking for those words in a search engine index.

Problem: Keywords can be repeated to excess, so much so that they render any text in which they appear unintelligible from a natural language point of view. Those excessive repetitions can also be added in places that are not seen by the end user (meaning outside of displayed page text). Some web spam pages even use repeated keywords that are unrelated to the theme of the page. If any of these conditions are detected, these techniques will draw the attention of Bing as likely web spam.

What we look for: The purveyors of web spam use a variety of methods for keyword stuffing, including:

  • Excessive repetitions of keywords. The number of repetitions relative to the amount of content on the page is a key indicator of web spam. For example, a very long page of text dedicated to a single topic may naturally repeat its primary theme keyword several times, but a page with far less content using the same number of repetitions of that word may be indicative of keyword stuffing.
  • Stuffing words unrelated to the page or site theme. Stuffing the page with words that are known to be heavily searched on the Web when they are irrelevant to the theme of a site can be an indicator of web spam. Relevance is an important factor for evaluating whether keywords are indicators of web spam.
  • Stuffing on-page text. Littering the text of a page with repeated keywords that render the text meaningless and unreadable to humans is a clear problem. When such content on the page is not useful to people, the content is often suspect as web spam.
  • Stuffing in less visible areas of the page. Placing repeated keywords in less visible areas of a page, such as at the bottom of the page, in links, in Alt text, and in the title tag, can be indicative of web spam.
  • Hiding stuffed keywords in the code of a page. Putting keywords into the code of a page so that the search engine crawler (aka a bot) will see them but a web browser will never show them to a human reader can be highly suspicious. Such methods as formatting text in the same color as the background, using extremely small fonts, and hiding stuffed keywords with tag attributes such as style="display: none" and class="hide" (both of which prevent the tagged contents from being shown to the user) will draw the attention of a search engine for closer scrutiny; a contrived example of these patterns appears just below this list.
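To make these patterns concrete, here is a contrived example page (the keyword, file name, and class definition are hypothetical, chosen purely for illustration; this is not a description of how Bing scores pages) showing the kinds of stuffed and hidden text described in the list above:

    <html>
      <head>
        <!-- Title-tag stuffing: the same keyword repeated with no readable meaning -->
        <title>widgets cheap widgets best widgets buy widgets widgets widgets</title>
        <style>
          .hide { display: none; }  /* CSS class used to keep stuffed text out of the rendered page */
        </style>
      </head>
      <body style="background-color: #ffffff">
        <!-- Alt-text stuffing on an image -->
        <img src="logo.png" alt="widgets cheap widgets discount widgets widgets widgets">

        <!-- Keywords a crawler can read but a browser never displays -->
        <div style="display: none">widgets free widgets widgets sale widgets widgets</div>
        <div class="hide">widgets widgets cheap widgets widgets widgets</div>

        <!-- Text rendered in the same color as the background, in a tiny font -->
        <p style="color: #ffffff; font-size: 2px">widgets widgets widgets widgets widgets</p>

        <p>The small amount of real content the human visitor actually sees.</p>
      </body>
    </html>

Each hidden block exists only for the crawler's benefit, which is exactly the mismatch between what the bot reads and what the reader sees that invites closer scrutiny.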

Note that stuffing the keywords <meta> tag alone is not a reason to be judged as web spam. But <meta> tag stuffing could be an indicator that other web spam techniques may be employed and could draw a search engine to take a closer look at such a site.

It is important that webmasters not overreact to this information. A small amount of relevant keyword repetition is common and is not considered web spam as long as it occurs naturally within the page content and the page provides useful, relevant content. The key message is always the same: for the best results, develop your pages for human readers, not for search engine bots. For more information on creating and using keywords wisely, see the blog articles The key to picking the right keywords and Put your keywords where the emphasis is.

Misspelling and computer-generated words

Definition: Pages populated with many different spellings of targeted keywords, especially spellings unrelated to the theme of the page or the site, can indicate that the keyword lists are computer generated.

Problem: Aggressive inclusion of large numbers of misspelled or rare word lists and phrases can be considered web spam when used to excess. The relevance of those words to the theme of the page or the site is the key distinguishing factor here.

What we look for: The Bing team commonly sees the following techniques on web spam sites:

  • Excessive use of misspelled keywords. Huge lists containing all possible iterations of a misspelled word can be so excessive that the page will be worthy of closer inspection for web spam.
  • Large numbers of misspelled words unrelated to the theme of the site. Long lists of word spelling variations whose core definitions are unrelated to the theme of the page or the site can indicate the site is web spam.
  • Common misspellings of popular site URLs in domain names. Sites whose domain names are common misspellings of popular URLs, along with other computer-generated content, are usually considered web spam.

Redirecting and cloaking

Definition: When a web client visits a website, certain traits can be used to identify the user and redirect them to a different page. These include, but are not limited to, redirects based on the referrer, the user agent (bot or human), and the IP address.

Problem: Redirecting can be a legitimate technique in some cases, such as when a web client is limited in what it can display (as with a mobile device web browser), or when a web server uses the client's IP address to determine the language in which to present the content (aka geo-targeting). However, problems arise when sites filter their content based on whether the user agent belongs to an end user web browser or to a search engine bot. This type of filtering can run the gamut from showing the bot a keyword-stuffed version of the page to showing it an entirely different set of content, all of which is an attempt to deceive. When used with this intent, this is web spam.
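As a contrived illustration of where that line gets crossed (the titles and content are hypothetical, and this is not a description of how Bing detects cloaking), compare the two responses a cloaking site might serve for the same URL depending on the requesting user agent:

    <!-- Response served when the User-Agent string looks like a search engine bot:
         a text-heavy, keyword-stuffed page built only to rank well -->
    <html>
      <head><title>cheap flights cheap hotels cheap flights deals cheap flights</title></head>
      <body>
        <h1>Cheap flights</h1>
        <p>cheap flights cheap hotels cheap flights deals cheap flights cheap flights</p>
      </body>
    </html>

    <!-- Response served for the same URL when the User-Agent is an ordinary browser:
         entirely different content on an unrelated (and often illicit) subject -->
    <html>
      <head><title>Online Casino - Play Now</title></head>
      <body>
        <p>Promotional content on a completely different subject, shown only to human visitors.</p>
      </body>
    </html>

The legitimate analogues (a lighter page for mobile browsers, localized text by IP address) change the presentation, not the subject matter; it is the divergence in substance, keyed off the bot's user agent, that makes this cloaking.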

What the webmasters who implement these techniques don't understand is that search engines can detect this attempted deception. We can see when the content presented varies by user agent, and when the differences between those content variations go well beyond the kind of legitimate adjustments made between mobile and desktop browsers.

What we look for: Some webmasters design their websites to use the following deceptive techniques when the detected user agent is a search engine bot:

  • Script-based redirects. The use of JavaScript or <meta> tag refreshes to automatically change which page is displayed is often suspicious in nature and will get more scrutiny from Bing. This is because some sites use JavaScript to redirect all visiting user agents to a new page, and that page may contain web spam. However, since search engine bots don't execute JavaScript natively, they won't follow the redirect and will instead index the contents of the original page (although search engine bots can still detect this behavior). A sketch of these redirect patterns appears after this list.
  • Referral redirects. Some websites consider the referrer when they show a page. When the referrer is a SERP and the target website shows a different page than the one shown when the user directly navigates to the URL, this behavior is considered web spam.
  • Redirect search engine bot to a target page. Some sites detect the user agent specified and send search engine bots to alternate, text-based pages modified with other web spam techniques such as keyword stuffing (but the site provides its normal web content pages to end user web browser user agents). When redirects are filtered on search engine user agents for the purpose of deceiving them, this is a web spam version of cloaking. Bots can detect when they are redirected to special pages. So when this is encountered, it is usually indicative of web spam and will be investigated further.
  • Redirect end users to a target page. Sometimes webmasters use cloaking to work the opposite way from that described immediately above. They may serve highly optimized content pages on Topic A to search engine bot user agents, but when a web browser visits the site, the page shown contains content on a completely different subject (typically an illicit one, such as a page promoting porn, casino or online gambling, illicit pharmaceuticals, and the like). The effort here is to rank well for a commonly searched topic of interest in a search engine results page (SERP). Then, when searchers find that link in their SERP and click it, they are unwittingly redirected to the web spam page.
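Here is a minimal sketch of the script- and referrer-based redirect patterns described above (the URLs are placeholders and the referrer check is deliberately simplistic, purely for illustration). A browser that executes the script is whisked away, while a bot that does not run JavaScript indexes the innocuous content around it:

    <html>
      <head>
        <title>Innocuous page the crawler indexes</title>
        <!-- A <meta> refresh variant would forward every visitor after zero seconds:
             <meta http-equiv="refresh" content="0; url=https://example.com/spam-target"> -->
      </head>
      <body>
        <script>
          // Referrer-based redirect: only visitors arriving from a search results
          // page are sent somewhere other than the page they clicked on.
          if (document.referrer.indexOf("search") !== -1) {
            window.location.replace("https://example.com/spam-target");
          }
        </script>
        <p>Harmless-looking content that the search engine bot sees and indexes.</p>
      </body>
    </html>

A visitor who types the URL directly sees the page as indexed, while a visitor clicking through from a SERP is sent elsewhere; that mismatch between the SERP click and the direct visit is exactly what the referral-redirect bullet describes.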

The problem for webmasters practicing these techniques is that their technical deceptions are not very effective. Search engines use a number of techniques to uncover such fraudulent practices as redirect and cloaking web spam. When they are revealed, the websites of the perpetrators are penalized, sometimes severely. Well-meaning webmasters or online business owners who hire unscrupulous consultants or carelessly take black hat SEO advice from indiscriminate sources on the Web are setting themselves up for trouble. Reviewing the issues identified in this article, as well as the official webmaster guidelines for Bing, Yahoo, and Google, will go a long way toward keeping a website on the right track for search.

In the next article on web spam, we'll discuss link-level web spam in detail. We'll also include some information on what to do if your site has been pegged as web spam and, once the problems have been resolved, how to request reinstatement into the Bing index as a normal website. Stay tuned!

If you have any questions, comments, or suggestions, feel free to post them in our Ranking Feedback and Discussion forum. Until next time...

-- Rick DeJarnette, Bing Webmaster Center


Comments

hotels

Posted On February 12, 2010, 03:23 AM

That was a very interesting read. On my blog I once overused a keyword without realising it and had to start again. Sometimes I get carried away, and I've realised it's better to optimise the webpage for visitors and not search engines.


Edmond

Posted On February 14, 2010, 02:02 PM

Hi Rick,

This is a great post! But I have a few questions in terms of keyword density. Sometimes the information presented in a webpage makes sense to the reader only when a certain word is used repetitively, in my case the word 'tax', for example: 'tax agent', 'tax professional', etc. (the meaning is completely different without the word 'tax', and would thus not be the intended one). So in these cases one can very quickly arrive at a large 'density' for certain words (without which the page would have no meaning or the wrong meaning). Thus the question: what can be done in these cases, and how can Bing distinguish between legitimate cases and spam? Also, how can one know when his/her webpage is considered spam or borderline spam?

Regards,

Edmond

Taximise Pty Ltd  


Sohrab Khan

Posted On February 17, 2010, 11:23 PM

Edmond has a very valid point. Also, some SEO experts put hidden links to their clients' websites in some very high-ranking webpages, e.g. placing a link to a client's website on a spacer image that isn't visible. How does Bing tackle that?


Rick DeJarnette

Posted On February 18, 2010, 04:29 PM

Edmond,

Great questions! The key here always comes back to how the content appears to the human reader. Is it logical? Is it readable? Does it make sense? In this particular case, the repeated use of the word "tax" in content regarding tax services offered is reasonably expected and thus is fine. In fact, including a solid set of explanatory content that defines these keyword phrases only strengthens the case for reasonably repeating this word. If the use of this repeated word makes sense to the reader and is not a clumsy attempt to stuff the word in where it's not necessary or helpful, and you have a good amount of supporting content to accompany it, you'll be fine. Our crawler sees this usage and understands it is legitimate. Just write for the reader's comprehension and the crawler will not penalize you for keyword stuffing.

The important thing to remember is that true web spam often involves multiple violations. As such, it typically takes more than one violation to trigger web spam consequences; having a slightly above-average number of keywords won't automatically torpedo you. Just as you need to do several things well to improve your ranking (build good content, build valuable inbound links, target several keywords, etc.), you need to do several things wrong to really hurt your ranking. That said, if it's obvious that you are trying to abuse the system, even with just one egregious issue, then penalties will ensue.

Lastly, we don't define any borderline between acceptable optimization and web spam. If you think what you've done might be considered web spam because you know you're trying to game the system, then take a different approach to optimizing your pages. I'll repeat my mantra: write content for the human reader, not the crawler. Develop good, unique content that is readable, understandable, and valuable. If you do this without involving any black hat SEO trickery in an effort to artificially boost your ranking, then you'll never have to worry about this being an issue.

Thanks for writing!

Rick DeJarnette

Bing Webmaster Center team


Edmond

Posted On February 20, 2010, 06:23 PM

Hi Rick,

Thanks for your reply!

Your answer is pretty important for me as I am trying to understand if I did something wrong in terms of the content of my webpage (as I have got only my main page indexed by Bing to date): one of the obvious things to question was "keyword stuffing" (I believe it's called), as I cannot do anything about repeating the word 'tax' without altering the meaning for a human reader. With this out of the way, I will now focus on "repetition", increasing the value of the content I provide, etc. Maybe this will help.

Thanks again, I appreciate.

Regards,

Edmond

Taximise Pty Ltd

