3

I need to extract all the text content from a web page. I have used 'document.body.textContent'. But I get the javascript content as well.How do I ensure that I get only the readable text content?

function myFunction() {
  var str = document.body.textContent
  alert(str);
}
<html>
<title>Test Page for Text extraction</title>

<head>I hope this works</head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>

<body>
  <p>Test on this content to change the 5th word to a link
    <p>
      <button onclick="myFunction()">Try it</button>
</body>
</hmtl>

CC BY-SA 3.0

2 Answers 2

5

Just remove the tags you dont want read before doing body.textContent.

function myFunction() {
  var bodyScripts = document.querySelectorAll("body script");
  for(var i=0; i<bodyScripts.length; i++){
      bodyScripts[i].remove();
  }
  var str = document.body.textContent;
  document.body.innerHTML = '<pre>'+str+'</pre>';
}
<html>
<title>Test Page for Text extraction</title>

<head>I hope this works</head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>

<body>
  <p>Test on this content to change the 5th word to a link
    <p>
      <button onclick="myFunction()">Try it</button>
</body>
</hmtl>

CC BY-SA 3.0
0

Try document.body.innerText.

This MDN article describes the differences between textContent and innerText:

Don't get confused by the differences between Node.textContent and HTMLElement.innerText. Although the names seem similar, there are important differences:

  • textContent gets the content of all elements, including <script> and <style> elements. In contrast, innerText only shows "human-readable" elements.
  • textContent returns every element in the node. In contrast, innerText is aware of styling and won't return the text of "hidden" elements.
CC BY-SA 4.0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.