
How to remove metadata from a PDF file

PDF Metadata

PDF, or Portable Document Format, is almost a de facto standard for sharing documents and other information electronically. It owes its success mainly to three things: it preserves the appearance of content across many types of clients, it leaks less unwanted data, and it offers a broad set of features that support varying user requirements. As the use of PDF files has grown, much research has gone into identifying the kinds of hidden data that PDF files may contain, along with the associated risks. At its core, a PDF file uses a binary format whose language is loosely based on the PostScript (PS) language.

Metadata, that is, data about data, is a well-known risk area for PDF files, as it can be placed anywhere within a file. One valid mechanism for storing metadata in a PDF file uses XML in the form of XMP (Extensible Metadata Platform) entries; the other is the Info Dictionary, which consists of key-value pairs. Metadata in a PDF file is not usually compressed, so in unencrypted documents it appears in a form that anyone can read. XMP is more robust and powerful than the Info Dictionary, and it is thus the most widely used metadata standard for PDF files.
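
Because uncompressed metadata sits in the file as readable text, even a plain byte scan can surface it. The following is a minimal Python sketch under that assumption, not a full PDF parser. The function name `scan_pdf_bytes`, the sample bytes and the set of Info keys it looks for are illustrative choices; it will miss metadata held in compressed object streams, encrypted files or hex-encoded strings.

```python
import re

# Matches simple literal-string entries such as "/Author (Jane Doe)".
# Assumption: values are plain literal strings without nested or escaped
# parentheses; hex strings and compressed objects are not handled.
INFO_ENTRY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer|CreationDate|ModDate)"
    rb"\s*\(([^()]*)\)"
)
# An XMP packet is XML wrapped in an <x:xmpmeta> element.
XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)


def scan_pdf_bytes(data: bytes) -> dict:
    """Return Info-style entries and raw XMP packets found in plain PDF bytes."""
    info = {}
    for match in INFO_ENTRY.finditer(data):
        key = match.group(1).decode("ascii")
        info[key] = match.group(2).decode("latin-1")
    xmp = [
        match.group(0).decode("utf-8", errors="replace")
        for match in XMP_PACKET.finditer(data)
    ]
    return {"info": info, "xmp": xmp}


# Hypothetical fragment standing in for the contents of a real file.
sample = (
    b"1 0 obj\n<< /Title (Quarterly report) /Author (Jane Doe) "
    b"/Producer (ExampleWriter 1.0) >>\nendobj\n"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF/></x:xmpmeta>\n'
)
found = scan_pdf_bytes(sample)
print(found["info"])
print(len(found["xmp"]), "XMP packet(s) found")
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function. An empty result does not prove the file is clean; it only means nothing was visible in uncompressed form.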

Metadata is very commonly found in PDF files, as the applications used to generate them typically populate it automatically. There are many ways to view the metadata in a PDF file, both online and offline. Among offline software, Adobe's free PDF reader provides this feature, while the paid Adobe Acrobat Professional also allows the metadata to be manipulated.

 
PDF Metadata Viewer Online

Many PDF metadata viewers are available online. They extract the metadata from a PDF file and display it, which is a useful first step towards removing it. A few examples are as follows:

Ø  MetaWiper - Very robust metadata online viewer and editor with support for manipulating metadata in multiple files. URL - https://www.metawiper.com/

Ø  ExtractMetadata - An online metadata viewer that provides users with an option to upload files, or feed a URL which hosts the relevant PDF file. URL - http://www.extractmetadata.com/

Ø  Metashield Analyzer - It provides a free analysis of metadata in documents. For cleaning up the metadata, there is a paid service. URL - https://metashieldanalyzer.elevenpaths.com/

Ø  Jeffrey's Image Metadata Viewer - URL - http://regex.info/exif.cgi
 
Apart from online viewers that display the metadata of a PDF file, there are many metadata scrubbers available, which help to edit and remove metadata from a PDF file. These tools are discussed next.

Edit PDF Metadata

In Adobe Acrobat Professional on Windows and Mac, the easiest way to view the metadata in most versions of the software is to open the ‘File’ menu and select the ‘Properties’ option. This displays a window showing a brief overview of the metadata. To add metadata to the PDF file, a user simply needs to fill in the Title, Author, Subject and Keywords fields. From the same interface, it is also possible to manually remove some information, such as the document title, the author, the application name, etc.

To reach the advanced interface, the user needs to click the ‘Additional Metadata’ button in the basic interface window and then click the ‘Advanced’ tab. This interface shows the XMP and Info Dictionary metadata for the document. The information is grouped by schema, and each entry can individually be replaced, appended or deleted. In recent versions such as Adobe Acrobat XI, hidden metadata such as the document creation time, the modification time, the device used to create the file, etc., can be removed by navigating to the ‘Tools’ menu, clicking ‘Protection’, and selecting the option ‘Remove hidden information’.
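
To illustrate what "grouped by schema" means outside the Acrobat interface, the sketch below groups the properties of an XMP packet by the XML namespace they belong to. It is a simplified Python example, not Adobe's implementation. The function name `group_xmp_by_schema` and the sample packet are made up for illustration, and only simple properties (attributes and plain-text child elements) are handled; structured values such as language alternatives or lists are skipped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def group_xmp_by_schema(xmp_packet: str) -> dict:
    """Group simple XMP properties by the namespace URI (schema) they use."""
    root = ET.fromstring(xmp_packet)
    grouped = defaultdict(dict)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Properties written as XML attributes on rdf:Description.
        for name, value in desc.attrib.items():
            if name.startswith("{"):
                ns, local = name[1:].split("}", 1)
                if ns != RDF_NS:  # skip rdf:about and similar
                    grouped[ns][local] = value
        # Properties written as child elements with plain text content.
        for child in desc:
            if not child.tag.startswith("{"):
                continue
            ns, local = child.tag[1:].split("}", 1)
            text = (child.text or "").strip()
            if text:  # nested structures have no direct text; skipped here
                grouped[ns][local] = text
    return dict(grouped)


# Hypothetical packet; the values are invented for the example.
sample_packet = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmp:CreatorTool="ExampleWriter 1.0"
        pdf:Producer="ExamplePDF Library">
      <xmp:CreateDate>2020-01-01T10:00:00Z</xmp:CreateDate>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

for schema, props in group_xmp_by_schema(sample_packet).items():
    print(schema, props)
```

Each namespace URI corresponds to one schema, so creation tool and creation time end up together under the XMP basic schema while the producer falls under the PDF schema.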

Remove metadata from a PDF file

Apart from Adobe's products, the following software is also available on Windows for editing metadata:

Ø  MetaWiper - Very robust metadata online viewer and editor with support for manipulating metadata in multiple files. URL - https://www.metawiper.com/
Ø  Autometadata - Free installable software for viewing and editing metadata, with support for batch processing. URL - https://www.evermap.com/autometadata.asp
Ø  BeCyPDFMetaEdit - URL - http://www.becyhome.de/becypdfmetaedit/description_eng.htm
Ø  4dots - URL - http://www.4dots-software.com/free-pdf-metadata-editor/
Ø  Hexonic PDF Metadata Editor - URL - http://www.hexonic-software.com/index.php/hexonic-pdf-metadata-editor
Ø  Debenu PDF Tools - URL - http://www.debenu.com/products/desktop/debenu-pdf-tools/

On open source platforms such as Linux, many free tools are available to edit and remove metadata from a PDF file. A few examples are as follows:

Ø  One such tool is PDF Mod; its drawback, however, is that it does not support removing the document creation time, the document modification time, or the device type used to create the PDF file.
Ø  Another tool is ‘pdftk’, which can update the Info Dictionary of a PDF file, although it has no provision for altering the XMP metadata.
Ø  ‘ExifTool’ is primarily used for reading and writing XMP metadata from and to individual PDF files.
Ø  ‘PDFMtEd’, short for PDF Metadata Editor, is a set of tools that let a user view and edit metadata graphically. URL - https://github.com/Glutanimate/PDFMtEd
Ø  Xpdf is another metadata viewer for PDF files, and it is completely open source. URL - http://www.foolabs.com/xpdf/about.html
Ø  The ‘PDF Metadata Editor’ tool can also be used to edit the metadata of a PDF file. It is available as an installable application for Windows, Linux and macOS. URL - http://broken-by.me/pdf-metadata-editor/

An easier way to remove metadata from a PDF file is to avoid creating it in the first place. This can be achieved by printing Word documents to PDF, on both Windows and Linux. The ‘Print to PDF’ function produces PDF files with flattened output and so acts as a PDF metadata scrubber. This requires PDF printer software. On Linux and macOS the feature is built in, while on Windows there are many cheaper or free alternatives to Adobe Acrobat.