2023-08-29

Add Summary to PDF's Metadata

This entry describes how to generate a summary from a PDF document and then add the summary to the document's metadata.

The summary is produced by a Python web application endpoint, similar to the one described here, and a JavaScript command-line client compiled with Deno. The entire workflow is described in this short guide.
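The way the pieces fit together can be sketched as a single Bash step. The step runs the client on a PDF and stores the client's output next to the PDF as a .summary file. This is a hypothetical sketch: the binary name ./summarize and its interface (PDF path as the only argument, summary text on stdout) are assumptions, because the guide defines the real client and endpoint. The SUMMARIZER override variable and the make_summary function are also illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical glue step: run the Deno-compiled client on a PDF and save its
# output next to the PDF as <name>.summary.
# Assumption: the client is ./summarize, takes the PDF path as its only
# argument and prints the summary on stdout (the real client may differ).
SUMMARIZER="${SUMMARIZER:-./summarize}"

make_summary() {
  local pdf="$1"
  # Python_Summarization_Algorithms.pdf -> Python_Summarization_Algorithms.summary
  local out="${pdf%.pdf}.summary"
  local tmp
  # Write to a temporary file first so a failed request leaves no partial summary.
  tmp="$(mktemp)" || return 1
  if "$SUMMARIZER" "$pdf" > "$tmp"; then
    mv "$tmp" "$out" && printf '%s\n' "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, make_summary Python_Summarization_Algorithms.pdf prints the path of the new Python_Summarization_Algorithms.summary file when the client succeeds. If the client fails, the function returns a non-zero status and leaves any existing summary untouched.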

Here is the summary generated from the guide itself, saved to a local file named Python_Summarization_Algorithms.summary:

   In natural language processing (NLP), text summarization is the process of creating a shorter version of a longer text while retaining its most important information.
   Sumy is a Python library for text summarization that provides several algorithms. You can use it with the command line in a Python virtual environment.
   The Huggingface transformer pipeline is a high-level API that simplifies the use of models for inference. It automatically loads a default model and a preprocessing class.
   Deno compile command produces a binary executable that can be run on the same platform as the machine it was compiled on.

To update the "Subject" metadata tag, I downloaded the standalone version of ExifTool, unzipped it, and renamed the executable (exiftool(-k).exe in the Windows package) to exiftool.exe.

The following command (Bash/Bash for Windows):

   ./exiftool.exe -Subject="$(<./Python_Summarization_Algorithms.summary)" Python_Summarization_Algorithms.pdf

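will write the contents of the summary file into the "Subject" tag. The quoted $(<file) substitution reads the whole file and passes it to ExifTool as a single argument. On Linux, the stored summary can then be read back with, for instance, pdfinfo (from Poppler), which prints the Subject field.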
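The one-liner can be wrapped in a small Bash sketch. The sketch refuses to write an empty Subject and then reads the tag back for a quick check. The function names add_summary and read_subject and the EXIFTOOL override variable are hypothetical. The -s3 option is ExifTool's switch for printing tag values only. As an alternative to shell substitution, ExifTool can read the value directly from the file with "-Subject<=Python_Summarization_Algorithms.summary". The quotes stop Bash from treating < as a redirection.

```bash
#!/usr/bin/env bash
# Minimal sketch: copy a .summary file into a PDF's Subject tag with ExifTool,
# then read the tag back.
# Assumption: the renamed standalone binary sits in the current directory;
# set EXIFTOOL=exiftool if it is on your PATH instead.
EXIFTOOL="${EXIFTOOL:-./exiftool.exe}"

add_summary() {
  local pdf="$1" summary="$2"
  # Fail early with a clear message instead of writing an empty Subject.
  [[ -f "$pdf" ]] || { echo "missing PDF: $pdf" >&2; return 1; }
  [[ -s "$summary" ]] || { echo "missing or empty summary: $summary" >&2; return 1; }
  # Same substitution as the one-liner: $(<file) reads the file (trailing
  # newlines are dropped) and the quotes keep it a single argument.
  "$EXIFTOOL" -Subject="$(<"$summary")" "$pdf"
}

read_subject() {
  # -s3 prints only the tag value, without the "Subject :" label.
  "$EXIFTOOL" -s3 -Subject "$1"
}
```

After sourcing the script, the whole update becomes:

   add_summary Python_Summarization_Algorithms.pdf Python_Summarization_Algorithms.summary
   read_subject Python_Summarization_Algorithms.pdf

By default, ExifTool keeps the unmodified file as Python_Summarization_Algorithms.pdf_original. Add -overwrite_original to the write command if you do not need that backup.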

Alternatively, most PDF viewers can display the document's metadata. For Adobe Acrobat Reader DC, follow these steps:

1. Open the PDF file in Adobe Acrobat Reader DC.

2. Click on File in the top-left corner of the window.

3. Select Properties from the drop-down menu.

4. Click on the Description tab.

5. Scroll down if needed to view the metadata of the PDF file; the summary appears in the Subject field.