I have a collection of .pdf files with comments that were added in Adobe Acrobat. I would like to be able to analyze these comments, but I'm kind of stuck on extracting them. I've looked at the pdftools package, but it seems to only be able to extract the text and not the comments. Is there a method available for extracting the comments within R?

CC BY-SA 4.0

3 Answers

PyMuPDF (https://pymupdf.readthedocs.io/en/latest/) is the only Python library I have found that works for this.

Installation in Debian/Ubuntu-based distributions:

apt-get install python3-fitz

Script:

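import fitz  # PyMuPDF is imported under the name "fitz"

doc = fitz.open("example.pdf")
# Iterating over the document yields its pages; this avoids the
# deprecated doc.pageCount attribute.
for page in doc:
    for annot in page.annots():
        # annot.info is a dict; "content" holds the comment text
        print(annot.info["content"])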
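
Since the goal is to analyze the comments in R, it may help to write them to a CSV file instead of printing them. The following is only a sketch under a few assumptions: the helper names `annotation_rows` and `write_comments_csv` and the choice of columns are mine, not part of PyMuPDF. It relies on `annot.type` (a tuple whose second item is the annotation type name) and on the "title" and "content" keys of `annot.info`, where "title" normally holds the author.

```python
import csv


def annotation_rows(pages):
    """Yield one dict per annotation from an iterable of PyMuPDF pages."""
    for page_number, page in enumerate(pages, start=1):
        # Guard against a falsy return value on pages without annotations.
        for annot in page.annots() or []:
            info = annot.info
            yield {
                "page": page_number,
                "type": annot.type[1],            # e.g. "Text" or "Highlight"
                "author": info.get("title", ""),  # "title" usually stores the author
                "content": info.get("content", ""),
            }


def write_comments_csv(pdf_path, csv_path):
    """Write all annotations of pdf_path to csv_path; return the row count."""
    import fitz  # PyMuPDF; imported here so annotation_rows has no hard dependency

    doc = fitz.open(pdf_path)
    rows = list(annotation_rows(doc))
    doc.close()

    fields = ["page", "type", "author", "content"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_comments_csv("example.pdf", "comments.csv")` (the output file name is just an example) should produce a file that `read.csv("comments.csv")` loads into R as a data.frame.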

Did you try PoDoFo or another open-source tool that can access the PDF elements? You can also look at "Extracting PDF annotations/comments" here on Stack Overflow if you are willing to do a little programming.

  • I've tried a few tools, but they all seem focused on extracting images and text. The Python method you linked to, combined with the reticulate package, looks promising, and I'd actually played around with that a bit last week, but the poppler module doesn't seem to want to install. I guess there isn't a native solution in R. Jun 14, 2018 at 15:06
  • I got it. Sometimes it's hard to find a working solution for such specific cases. Have you tried looking for a paid solution that would work? Some of them offer a free trial. Which programming language and platform do you prefer?
    – PDFix
    Jun 15, 2018 at 13:49
  • My preference would be a method that imported the comments into R on Windows as a data.frame. I was finally able to get poppler working using the Linux subsystem on Windows which is less than optimal, but better than nothing. Jun 18, 2018 at 19:11

Export the comments as an Excel file, then import it into R?

For example, in PDF-XChange Editor, go to Comment > Summarize Comments and export into whatever format you want. The procedure is similar in Adobe Acrobat.

