Bots Directory

Explore a comprehensive directory of bots and view their detailed information

542 results

The Apple App Site Association is used to support "Universal Links" that can open in native iOS apps. The bot requests a specific path for a given hostname, which returns metadata that associates certain URL patterns with native iOS apps.

Apple
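
As a rough illustration of the lookup described in the Apple App Site Association entry, the sketch below builds the URL requested for a hostname and reads the URL patterns out of a sample file. The well-known path and the `applinks`/`details`/`appIDs`/`components` layout follow Apple's published format as best understood here; the hostname, app ID, and patterns are hypothetical, and `aasa_url` and `patterns_by_app` are illustrative names rather than any real API.

```python
import json

# Well-known path at which the association file is served.
AASA_PATH = "/.well-known/apple-app-site-association"

# Hypothetical file contents: the app ID and URL patterns are made up.
SAMPLE_AASA = """
{
  "applinks": {
    "details": [
      {
        "appIDs": ["ABCDE12345.com.example.app"],
        "components": [{"/": "/products/*"}, {"/": "/help/*"}]
      }
    ]
  }
}
"""


def aasa_url(hostname: str) -> str:
    """Return the URL a client would request for the given hostname."""
    return f"https://{hostname}{AASA_PATH}"


def patterns_by_app(aasa_text: str) -> dict[str, list[str]]:
    """Map each app ID to the URL path patterns associated with it."""
    doc = json.loads(aasa_text)
    result: dict[str, list[str]] = {}
    for detail in doc.get("applinks", {}).get("details", []):
        # Each component may carry a "/" key holding a path pattern.
        paths = [c["/"] for c in detail.get("components", []) if "/" in c]
        for app_id in detail.get("appIDs", []):
            result.setdefault(app_id, []).extend(paths)
    return result
```

Calling `aasa_url("example.com")` gives the address to fetch, and `patterns_by_app(SAMPLE_AASA)` groups the sample patterns under the single hypothetical app ID.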

AdIdxBot is the crawler used by Bing Ads. AdIdxBot crawls ads and follows the websites from those ads for quality control. Just like Bingbot, AdIdxBot has both “desktop” and “mobile” variants.

Microsoft

AhrefsBot is a web crawler that powers the 12 trillion link database for the Ahrefs online marketing toolset. It constantly crawls the web to fill our database with new links and check the status of previously found ones to provide the most comprehensive and up-to-the-minute data to our users.

Ahrefs

AllAfrica Global Media produces, aggregates and distributes news from across Africa, relying on agreements with more than 140 news organizations and over 500 other institutions and individuals. The AllAfrica NewsBot scrapes content from sites with which AllAfrica has written agreements, or whose content is available without licensing restrictions or is otherwise freely distributable. In all cases, the author and institution are credited in full.

AllAfrica Global Media

Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to answer even more questions for customers. Amazonbot is a polite crawler that respects standard robots.txt rules and robots meta tags.

Amazon
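
To make the "respects standard robots.txt rules" claim in the Amazonbot entry concrete, here is a minimal sketch of how a site owner might check what a given robots.txt allows for a user agent, using Python's standard `urllib.robotparser`. The robots.txt content and URLs are hypothetical, and `is_allowed` is an illustrative helper, not part of any crawler's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this sample file, `is_allowed(ROBOTS_TXT, "Amazonbot", "https://example.com/private/page")` is false, while the same URL is allowed for user agents that fall under the `*` group.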

Amazon Kendra is a highly accurate intelligent search service that enables your users to search unstructured data using natural language. It returns specific answers to questions, giving users an experience that's close to interacting with a human expert. It is highly scalable and capable of meeting performance demands, tightly integrated with other AWS services such as Amazon S3 and Amazon Lex, and offers enterprise-grade security.

Amazon

Applebot data is used to power various features, such as the search technology that is integrated into many user experiences in Appleʼs ecosystem including Spotlight, Siri, and Safari.

Apple

The Internet Archive bot, also known as archive.org_bot, is the web crawler for the Internet Archive's Wayback Machine. It systematically crawls and preserves publicly accessible web pages for historical record.

Internet Archive

Audisto Crawler fetches all accessible URLs of a website. Audisto provides a service to audit and monitor websites for its customers. More information about the crawler is available here: https://audisto.com/bot

Audisto

The Authory bot visits websites to back up articles on behalf of journalists and other writers who use the service.

Authory

On Tumblr, post authors can paste a URL in their post, and we'll "unfurl" that URL into a pretty Link "Block" for their post by making a request to the URL and parsing the response.

Automattic

AwarioSmartBot is a web crawler sent by Awario to discover and collect new and updated web data (which is then used by Internet marketers from all over the world).

Awario

The Bazqux Fetcher is how BazQux Reader grabs RSS/Atom feeds and comments when users choose to subscribe to your blog in BazQux Reader. Fetcher collects and periodically refreshes these user-initiated feeds.

Bazqux

SEO PowerSuite Link Explorer (webmeup.com) is the world's freshest backlink index, and the primary source of backlink-related data for the SEO PowerSuite tools. We're dedicated to providing SEOs with the most comprehensive, up-to-date backlink data on the Web.

WebMeUp

Blogtrottr delivers updates from all of your favourite news, feeds, and blogs directly to your email inbox, giving you the flexibility to stay updated whilst on the go.

Blogtrottr

Autopilot is an SEO marketing automation tool that includes features for internal linking and image optimization. We crawl customer sites so that we can determine the best links to use on the site and to find images that need to be optimized.

BrightEdge

Bushbaby is an internal bot used by Cloudflare. Its purpose is to manage and renew SSL certificates for websites that use Cloudflare's services.

Cloudflare

Cert Chief is a certificate monitoring tool that periodically crawls web properties to check their configuration and reports problems and changes when they are detected.

Chief Tools

ChatGPT-User is for user actions in ChatGPT and Custom GPTs. When users ask ChatGPT or a CustomGPT a question, it may visit a web page to help answer and include a link to the source in its response.

OpenAI

Checkly is a high-programmability active monitoring solution. We support users in monitoring their websites and APIs. Puppeteer and Playwright (both supported) are browser automation tools that can be used for a variety of tasks. For testing, they really are about E2E/component testing, not unit testing.

Checkly

Chrome-Lighthouse is an automated, open-source tool for auditing web page quality and does not operate as a traditional web crawler. It runs a series of audits against a given page to generate a report on performance and accessibility.

Google

Renders web pages in headless browsers for Cloudflare customers. Used for browser automation (screenshots, PDF generation, content extraction, etc.) and for AI agents to interact with the web. Used by Cloudflare customers via Workers bindings and REST API. Does not include the /crawl endpoint, which has a separate bot identity (Cloudflare Crawler - Signed Agent).

Cloudflare

The Cloudflare Crawler is a well-behaved crawler that retrieves web content. By default, it self-identifies as a bot, honors robots.txt directives, and cannot bypass CAPTCHAs or bot protection. Used by Cloudflare customers via the Browser Rendering /crawl endpoint.

Cloudflare

URL prefetching means that Cloudflare pre-populates the cache with content a visitor is likely to request next. This setting leads to a higher cache hit rate and thus a faster experience for the user. (https://developers.cloudflare.com/fundamentals/speed/prefetch-urls/)

Cloudflare

Cloudflare CSUP is a bot used by Cloudflare's customer support for diagnostic purposes. It is not a general web crawler and is used to investigate technical issues with customer websites.

Cloudflare

Coinbase Webhooks are automated messages sent from the Coinbase platform to a user's server, used for notifying users about events such as receiving crypto payments.

Coinbase

ContentKing is a cloud-based service that monitors websites from a digital marketing perspective. We monitor the websites for customers such as Netflix, Atlassian, Fedex and IBM and alert their digital marketing teams whenever a technical issue or content change is detected.

ContentKing

Within the GDPR legislation it is mandatory to ask a visitor for permission before placing so-called marketing or tracking cookies. Many websites contain a cookie notice, but what is not clear to everyone is that those cookies may not be placed before the visitor has given explicit permission. Cookie Maestro searches for all cookies that your website places in your visitors browser.

Cookie Maestro

Coveo provides services for websites, customer service, and commerce solutions so they can offer relevant experiences to their end users. These services are based on a unified index, which crawls websites when our customers configure it to do so.

Coveo

DataForSEO Bot is a driving force of our leading product - Backlinks API, which has been developed with a single purpose: providing website owners, webmasters, and SEO professionals with opportunities to analyze the key component of website optimization – backlink analytics. You can learn more about the DataForSEO Bot on this dedicated page: https://dataforseo.com/dataforseo-bot

DataForSEO

Detectify analyses the security level of web applications. To start a scan a user has to add an access key to the domain and agree to their terms.

Detectify

The Mediatoolkitbot is a media monitoring tool that crawls the open internet looking for phrases Determ users search for, helping marketers find relevant opportunities for advertising.

Determ

The discordBot scrapes URLs that are shared within the Discord chat platform. This is done to generate contextual previews of the content, including titles, descriptions, and images.

Discord, Inc.

Dotbot is Moz's web crawler. It gathers web data for the Moz Link Index. The data we collect through Dotbot is available in the Links section of your Moz Pro campaign, Link Explorer, and the Moz Links API.

Moz

We scrape full article/page content to ensure we can optimally automate the content distribution for the digital publishers we work with. Every single article a publisher releases will get scraped approx. 2-4 times by independent services.

Echobox

Ezoic is a technology platform for digital publishers. EzoicBot is our web crawler designed to extract valuable information about how the internet, search engines, and websites all work together. EzoicBot helps publishers better understand how their sites work, including how well search engines like Google can index and rank their content.

Ezoic Inc

The primary purpose of FacebookExternalHit is to crawl the content of an app or website that was shared on one of Meta’s family of apps, such as Facebook, Instagram, or Messenger. The link might have been shared by copying and pasting or by using the Facebook social plugin. This crawler gathers, caches, and displays information about the app or website such as its title, description, and thumbnail image.

Meta

Foregenix performs security and risk scanning on the websites of eCommerce merchants for a number of banks and card brands globally. The service assists these organisations in controlling and identifying fraud and financial losses, with a particular focus on identifying compromised merchants before they end up in the card brand's compromise investigation process. Early detection (prior to fraud losses escalating) can save banks and merchants alike considerable sums. The solution has two primary modes of operation: (1) Scanning for active malware. This normally entails pulling a very limited number of pages within a sandboxed context for analysis at various stages of DOM initialisation. From the target site's perspective, the operation is simply another browser requesting a small number of pages as normal. (2) Scanning for known, publicly exploitable vulnerabilities and outdated software, as these are frequently exploited by threat actors to introduce malware targeting financial information. Typically a complete scan comprises fewer than one hundred requests and is already rate limited on our side. Scanning is always "passive" in nature, relying on GET, HEAD and OPTIONS requests only. The scanning heads abide by the "robots.txt" file by default, but this can be overridden by the scan initiator (usually one of our banking clients). This override, which forces a scan/assessment, is not used frequently.

Foregenix Limited

FullStory is your digital experience analytics platform for on-the-fly funnels, pixel-perfect replay, custom events, heat maps, advanced search, Dev Tools, and more. FullStoryBot fetches and stores assets required to rebuild sites when viewing recorded sessions.

Full Story

Google-InspectionTool is the crawler used by Search testing tools such as the Rich Result Test and URL inspection in Search Console. Apart from the user agent and user agent token, it mimics Googlebot.

Google

Generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#googleother

Google

Verification is the process of proving that you own the property that you claim to own. Search Console needs to verify ownership because verified owners have access to sensitive Google Search data for a site, and can affect a site's presence and behavior on Google Search and other Google properties. A verified owner can grant full or view access to other people.

Google

APIs-Google is the user agent used by Google APIs to deliver push notification messages. Application developers can request these notifications to avoid the need for continually polling Google's servers to find out if the resources they are interested in have changed. To make sure nobody abuses this service, Google requires developers to prove that they own the domain before allowing them to register a URL with a domain as the location where they want to receive messages.

Google

GTmetrix is a free tool that analyzes a page's speed performance. Using PageSpeed and YSlow, GTmetrix generates scores for pages and offers actionable recommendations on how to fix them.

GTmetrix

GuestpostsBot is a web crawler that helps website owners who have registered their sites on the guestposts.com.br platform to monitor them. The bot regularly tracks the sites registered on the platform to check whether the partnerships made on the guestposts platform are still active. It also validates that a site exists before allowing registration, and monitors the status of the site from time to time to warn the website owner of any downtime.

Guest Posts

HelloWork is a French job board, and its bot aggregates job listings for its platform. It crawls company career pages and other sources to collect this information.

HelloWork Group

HIFI is a financial services company for musicians and professional creators. HIFI acts as an agent on behalf of its clients to automate the retrieval and processing of royalty earnings statements. HIFI's clients provide access credentials for each of their portal accounts, and HIFI then automates the otherwise labor-intensive process of logging into each portal, downloading, and processing the relevant CSVs. HIFI analyzes and aggregates the underlying data and presents its clients with a comprehensive business management dashboard.

HIFI

HubSpot offers a full platform of marketing, sales, customer service, and CRM software — plus the methodology, resources, and support — to help businesses grow better. Get started with free tools, and upgrade as you grow.

HubSpot

Huckabot is Huckabuy’s main crawler which is utilized by almost all of Huckabuy’s products. The primary purpose of Huckabot is to crawl and index a customer’s website, which is then rendered and optimized with our Dynamic Rendering Product. Several of the Page Speed product boosters, such as Fold Prioritization, also leverage Huckabot in order to optimize and improve a website’s performance.

Huckabuy

ICC-Crawler automatically crawls the Internet and collects web pages. ICC-Crawler is operated by the Universal Communication Research Institute at the National Institute of Information and Communications Technology (NICT).

NICT

KargoBot-Artemis is Kargo's autonomous content verification bot. It's a simulation of a user on an iOS device. The bot is used to scan sites for content that may be unsuitable for customers on the Kargo ad network.

Kargo

The integration with Klaviyo automatically captures information about who visits your site and views your products, including details on what they viewed, so you can send highly personalized follow-up emails.

Klaviyo

The Library of Congress Web Archive manages, preserves, and provides access to archived web content selected by subject experts from across the Library, so that it will be available for researchers today and in the future. More information on the programme here: https://www.loc.gov/programs/web-archiving/about-this-program/ And information about crawling policy here: https://www.loc.gov/programs/web-archiving/for-site-owners/

United States Library of Congress

LoomlyBot is used to extract metadata from web pages in order to show a social media post preview within Loomly so that clients can see what their social media posts will look like when published.

Loomly

MagiBot is owned by Peak Labs which focuses on the research and development of information extraction and retrieval technology to transform knowledge in natural language into immeasurable value.

Peak Labs

MediaMonitoringBot crawls and indexes the websites of news and media publishers for new material, tries to match it against keywords provided by our customers (subscribers), and sends them updates based on that information.

MediaMonitoringBot

This bot is used to aggregate data about a popular online multiplayer game from consenting hosts who have opted-in to this collection. The data that is aggregated is reflected in a panel where players can freely search through the aggregated data. Any modification or deletion of data from the sources (consenting hosts) is reflected within the application's database within 30 minutes. The application scrapes each host for new data every five minutes, with a more thorough check for modified data every 30 minutes.

MelonMesa
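
The two polling intervals described in the MelonMesa entry can be sketched as a simple schedule. This is only an illustration under stated assumptions: the source gives the five-minute and 30-minute intervals, but whether the thorough check replaces or accompanies the regular scrape at the 30-minute mark is not stated, so the sketch assumes it replaces it. The names `check_for_minute` and `schedule` are hypothetical.

```python
from typing import Optional

# Intervals taken from the description above.
NEW_DATA_INTERVAL_MIN = 5
THOROUGH_INTERVAL_MIN = 30


def check_for_minute(minute: int) -> Optional[str]:
    """Return which check (if any) is due at the given elapsed minute."""
    # Assumption: the thorough check takes the place of the regular one.
    if minute % THOROUGH_INTERVAL_MIN == 0:
        return "thorough"
    if minute % NEW_DATA_INTERVAL_MIN == 0:
        return "new-data"
    return None


def schedule(total_minutes: int) -> list[tuple[int, str]]:
    """List the (minute, check) pairs due in minutes 1..total_minutes."""
    plan = []
    for minute in range(1, total_minutes + 1):
        kind = check_for_minute(minute)
        if kind is not None:
            plan.append((minute, kind))
    return plan
```

Under these assumptions, `schedule(30)` yields five "new-data" checks followed by one "thorough" check at minute 30, which matches the stated guarantee that source changes are reflected within 30 minutes.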

Analytics and email automation service used by eCommerce businesses. Metorik syncs data from customer sites by making API requests to their sites.

Metorik

MJ12bot is the web crawler for Majestic. MJ12Bot does not currently cache web content or personal data. Instead it maps the link relationships between websites to build a search engine. This data is available to technologies and the public, either by searching for a keyword or a website at Majestic.

Majestic

This is for the official public hosting of the open-source project https://github.com/synzen/MonitoRSS so that the bot may poll for RSS feeds of Cloudflare-protected sites to deliver news articles. Feeds are chosen by paid users, and the bot adds them to a schedule to be polled at a regular interval of 2-10 minutes.

MonitoRSS

MSNBot was the web crawler for Microsoft's MSN Search, which has since been replaced by Bing. Its purpose was to index web pages for inclusion in the MSN search engine.

Microsoft

Clearscope is an AI-driven SEO content optimization platform developed by Mushi Labs. It assists content creators, marketers, and SEO professionals in producing high-quality, search-optimized content by providing real-time keyword recommendations, content grading, and insights into search intent. By analyzing top-performing content, Clearscope offers actionable suggestions to enhance content relevance and visibility in search engine results.

Mushi Labs

Yeti is the web crawler for Naver, a South Korean search engine. It indexes websites to provide search results and power other services on the Naver platform.

Naver

We run a cloud based site speed optimization solution. As such, we need to make requests to our clients' sites in order to fetch the content that needs to be optimized. We have several sub systems that can fire requests and each one can be identified based on the user agent suffix.

NitroPack Ltd

Our microservice downloads JavaScript files from our users' servers in order to format them and show users a human-readable version. This makes it easier to resolve errors associated with those files.

Noibu

Our API is used mostly by consumer-facing products to preview links shared on their platforms. For example, when a link is shared on Facebook or Slack, those platforms show a description, title, and image to make the content more enticing.

Opengraph

The Orlo Link Preview bot is used by the Orlo social media management platform. It fetches previews of links that are scheduled to be published in social media posts.

Orlo

Overcast is a podcast player application, and its bot fetches RSS feeds and audio files from podcast hosting servers. This keeps the podcast directory and episodes updated for its users.

Overcast Radio

A component that loads previews for external and internal links. For external links, information from the Open Graph tags specified on the page (title, description, images/video) is used whenever possible. For references to internal objects, the internal representation is used (in the form of specialized blocks in the topic).

Ozon
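
Several link-preview bots in this directory, including the Ozon component above, build previews from Open Graph tags. The following is a minimal sketch of that extraction step using only Python's standard `html.parser`. It is not any operator's actual implementation; `OpenGraphParser` and `extract_open_graph` are hypothetical names, and a real fetcher would also handle HTTP, encodings, and fallbacks for pages without such tags.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> values from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, content)


def extract_open_graph(html: str) -> dict[str, str]:
    """Return a mapping of Open Graph property names to their content."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Given a page whose head contains `<meta property="og:title" content="Example">`, `extract_open_graph` returns a dictionary with `og:title` mapped to `Example`, which a preview renderer could then use for the title line.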

PayPal webhooks are part of PayPal's Instant Payment Notification message service, which automatically notifies merchants of events related to PayPal transactions.

PayPal

Qualys Web Application Scanner is a cloud-based service that provides automated crawling and testing of custom web applications to identify vulnerabilities including cross-site scripting (XSS) and SQL injection.

Qualys

PetalBot accesses both PC and mobile websites and builds an index database that enables users to search the content of your site in the Petal search engine and presents content recommendations to users in Huawei Assistant and AI Search services. Both services are powered by the Petal Search engine.

Huawei

Pinterestbot is Pinterest’s web crawler. Pinterestbot crawls, or visits, public websites to index their content, with the aim of driving traffic back to those websites. It also scrapes content to make sure Pin details, like price and title, are up to date, and to detect and remove broken website links behind Pins.

Pinterest

The PressEngine Bot verifies coverage created by video games press as genuine and their own creation. When a member of the video games press is granted a review key for a video game they will create an article, known in the industry as "coverage". When they submit a URL to us as "coverage" we automatically verify this URL exists and is viewable. This automated code announces itself as the PressEngine Bot.

PressEngine

Extract Content to Show Print Friendly version. Publishers typically embed our button - https://www.printfriendly.com/button - so that their visitors can view a Print Friendly Page and/or create a PDF

PrintFriendly.com

Project Shield, created by Google Cloud and Jigsaw and powered by Google Cloud Armor, provides free unlimited protection against DDoS attacks, a type of digital attack used to censor information by taking websites offline.

Google

PWABuilder (pwabuilder.com) is a free, open source developer tool from Microsoft that helps developers build progressive web apps and publish them in app stores. The PWABuilder tool analyzes a developer's website for Progressive Web App capabilities, such as a web manifest or service worker.

Microsoft

Readable is a collection of text analysis tools, primarily focused on clarity and plain language. We spider customers' websites, find the content of each page, analyse it, and present that to the customer.

Added Bytes Ltd

With the iboss Cloud Platform, each customer gets dedicated source cloud IP Addresses which are associated with the organization. Because of this, any data traversing the global cloud containerized gateways in the Platform will have a uniquely associated IP Address that can be mapped to the organization. This means that users always appear to be accessing the Internet from within the organization regardless of whether they’re in the office or on the road. This preserves the critical connectivity requirements that IT departments need when migrating to a cloud gateway platform.

Reward Gateway

Seekport is an internet search engine. Originally founded in 2003, the search engine has been operated by SISTRIX, a platform intelligence provider from Bonn (Germany), since December 2014. The search engine is a public, free and independent alternative to Google. Seekport does not store user data and does not profile users. Seekport is also operated without advertising and has no conflicts of interest in the display of search results.

SISTRIX

Splunk Attack Analyzer (formerly known as TwinWave), visits URLs submitted by customers using a headless Chrome browser. DOM (Document Object Model), HAR (HTTP Archive), and other relevant data from these visits are analyzed to determine if the page is hosting malicious content.

Splunk

The Google StoreBot is a search-engine-based program that automatically 'crawls' through web pages to gather and analyse data. Google uses crawlers that go through product pages and checkout processes using machine learning algorithms to fill in forms with information such as delivery addresses, and help compile other information on price, delivery, payments and more.

Google

The Stripe Webhooks service allows Stripe to push real-time event data to customers' application webhook endpoint when events happen in their Stripe account.

Stripe

Stripebot is the Stripe automated web crawler that collects data from their users' websites. They use the collected data to provide services to their users and to comply with financial regulations.

Stripe

The Sucuri bot is part of the Sucuri website security platform. It crawls websites to scan for malware, security risks, and blacklisting status.

Sucuri

Scalable webhook platform featuring automatic retries, signature verification, deep observability, and a static-IP delivery bot—deploy hosted or self-hosted.

Svix Inc.
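
Signature verification, mentioned in the Svix entry, generally means the receiver recomputes a keyed hash over the payload and compares it with the signature sent alongside the request. The sketch below shows that general idea with HMAC-SHA256 from Python's standard library. It is an assumption-laden illustration and not Svix's or Stripe's actual signing scheme; real providers define their own header names, timestamp handling, and encodings, and the `sign` and `verify` helpers here are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    """Check a received signature against one recomputed locally."""
    expected = sign(secret, payload)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

A receiver would call `verify` with the shared secret, the exact request body bytes, and the signature value from the request, and reject the delivery if it returns false.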

Help Center Export is a Zendesk-approved app that integrates with any Zendesk help center and helps the customers with these tasks: Export all your articles and any meta-data: title, section, link, labels, updated time. Export all references to internal and external docs. Detect and export broken links and images for each article. In order to check for broken links the app is using a bot that attempts to access each link present in help center articles and check the response for errors.

Swifteq Ltd

Talkwalker delivers the consumer insights that help brands drive business impact. In a world full of conversations, the most successful global brands have switched to Talkwalker because we provide them with a powerful software platform to uncover, understand and derive the most valuable insights from internal and external data. Our listening and analytics platform enables more than 2,500 companies worldwide to protect their brands, measure their impact and gain the key consumer insights that drive purchase decisions.

Trendiction S.A.

Turnitin.com offers various services to the educational community. Most prominently, we provide a widely used and effective plagiarism detection service. Part of the plagiarism prevention service relies on comparing student papers to content found on the Internet. Since we do not know ahead of time which pages on the Internet a student will use we need to gather them all for comparison. However, we do have automated ways of throwing away content and links that would be irrelevant to our service.

Turnitin

A Twitter bot is a type of bot software that controls a Twitter account via the Twitter API. The bot software may autonomously perform actions such as tweeting, re-tweeting, liking, following, unfollowing, or direct messaging other accounts.

Twitter

VaultPress is a subscription service developed by Automattic, the company behind WordPress, that offers automated daily and real-time backups of WordPress websites onto WordPress.com's cloud servers. It is known for its ease of use, secure backups, and proactive security scanning.

Automattic

Crawler that extracts the newest articles from a publisher's website (via feed or by parsing HTML) to build a carousel with images, links, and text for our native ads module, in order to improve recirculation on the publisher's website. It only crawls our publishers' webpages.

Digital Green

WebPageTest is one of the most popular and free tools for measuring webpage performance and enables you to run web performance tests on your site from a number of different locations across the world in a number of different browsers.

WebPageTest

Citoid is a Wikimedia service in VisualEditor that generates citations from URLs, DOIs, and ISBNs, relying on the Zotero Translation Server (see wikimedia-zotero) for accurate metadata, processed on demand from website visitors.

Wikimedia Foundation

The Wikimedia Foundation's Zotero Translation Server is a customized metadata extraction tool that powers Citoid (see wikimedia-citoid), retrieving citation data from URLs, DOIs, and ISBNs using Zotero translators, on demand from website visitor requests.

Wikimedia Foundation

The Worldline Bot is associated with Worldline, a payment and transactional services company. It handles notifications and callbacks related to payment processing.

Worldline

Yahoo Mail Proxy is a content fetch proxy that retrieves the page content of URLs that are embedded within emails sent to Yahoo Mail users. Having the content displayed through the proxy improves the security for email users while reducing overall network usage.

Yahoo

Easy automation for busy people. Zapier moves info between your web apps automatically, so you can focus on your most important work.

Zapier Inc.

Zoominfobot is an indexing robot for a web search engine, similar to Google. Created by Zoom Information Inc. (www.zoominfo.com), Zoominfobot’s patented technology continually scans millions of corporate websites, press releases, electronic news services, SEC filings and other online sources. Using advanced natural language processing algorithms, ZoomInfo has created a next generation search engine focused on finding pages with information about businesses and business professionals.

ZoomInfo