Add Summary to PDF's Metadata
This entry describes how to generate a summary from a PDF document and then add the summary to the document's metadata.
The summary generation workflow uses a Python web application endpoint similar to the one described here and a JavaScript command-line client compiled with Deno. The entire workflow is described in this short guide.
Here's the summary generated from the guide itself, which was saved to a local file named Python_Summarization_Algorithms.summary:
In natural language processing (NLP), text summarization is the process of creating a shorter version of a longer text while retaining its most important information.
Sumy is a Python library for text summarization that provides several algorithms. You can use it with the command line in a Python virtual environment.
The Huggingface transformer pipeline is a high-level API that simplifies the use of models for inference. It automatically loads a default model and a preprocessing class.
Deno compile command produces a binary executable that can be run on the same platform as the machine it was compiled on.
To update the "Subject" info tag in the metadata, I downloaded the standalone version of ExifTool, unzipped it, and renamed the executable to exiftool.exe.
The following command (Bash/Bash for Windows):
./exiftool.exe -Subject="$(<./Python_Summarization_Algorithms.summary)" Python_Summarization_Algorithms.pdf
will update the Subject info tag with the new summary. Here, $(<file) is Bash shorthand that expands to the contents of the file. On Linux, the summary can then be read back from the document with, for instance, pdfinfo.
Alternatively, any good PDF viewer should be able to display the document's metadata. For Adobe Acrobat Reader DC, follow these steps:
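If the update needs to run from a script rather than an interactive Bash session, the same ExifTool call can be sketched in Python. This is a minimal sketch, not part of the original workflow: the function names build_exiftool_args and write_subject are hypothetical, and it assumes the summary file is UTF-8 encoded and that the renamed exiftool.exe sits in the current directory.

```python
import subprocess
from pathlib import Path


def build_exiftool_args(summary_path, pdf_path, exiftool="./exiftool.exe"):
    """Return an argument list equivalent to the Bash command above."""
    # Bash's $(<file) drops trailing newlines; mirror that behaviour here.
    # Assumption: the summary file is UTF-8 encoded.
    summary = Path(summary_path).read_text(encoding="utf-8").rstrip("\n")
    # One list element per argument, so no shell quoting is needed.
    return [exiftool, f"-Subject={summary}", str(pdf_path)]


def write_subject(summary_path, pdf_path, exiftool="./exiftool.exe"):
    """Run ExifTool; raises CalledProcessError on a non-zero exit status."""
    args = build_exiftool_args(summary_path, pdf_path, exiftool)
    return subprocess.run(args, check=True, capture_output=True, text=True)
```

Calling write_subject("Python_Summarization_Algorithms.summary", "Python_Summarization_Algorithms.pdf") would then perform the update. Passing the arguments as a list avoids quoting problems when the summary contains quotes or other shell metacharacters.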
1. Open the PDF file in Adobe Acrobat Reader DC.
2. Click on File in the top-left corner of the window.
3. Select Properties from the drop-down menu.
4. Click on the Description tab.
5. Scroll down to view the metadata of the PDF file.
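As a scripted alternative to the viewer steps above, the pdfinfo output mentioned earlier can be parsed to read the Subject field back. The following is a rough sketch under a few assumptions: pdfinfo is on the PATH, its output consists of "Key: value" lines, and the summary fits on a single line (a Subject containing line breaks would not be captured in full). The names parse_pdfinfo and read_subject are hypothetical.

```python
import subprocess


def parse_pdfinfo(output):
    """Parse 'Key:   value' lines printed by pdfinfo into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            info[key.strip()] = value.strip()
    return info


def read_subject(pdf_path, pdfinfo="pdfinfo"):
    """Run pdfinfo and return the Subject field, or None if it is absent."""
    result = subprocess.run([pdfinfo, str(pdf_path)], check=True,
                            capture_output=True, text=True)
    return parse_pdfinfo(result.stdout).get("Subject")
```

Calling read_subject("Python_Summarization_Algorithms.pdf") should then return the summary text written by ExifTool, provided the update succeeded.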