Consolidating Jupyter Notebooks into a Single Markdown Document with Python
In the world of data science and programming, organization is key. As I work on various projects, I often find myself dividing my local filesystem into directories by year. This structure helps me keep track of my work, but it can also lead to a scattered collection of files, especially when it comes to Jupyter notebooks (.ipynb).
Recently, I faced a challenge: I needed to consolidate all my Jupyter notebooks from a specific year into a single, comprehensive Markdown document. This would not only make it easier to review my work but also allow me to generate a PDF document for sharing. To tackle this, I turned to the markitdown library in Python, which simplifies the process of converting Jupyter notebooks into Markdown format.
MarkItDown is a Python library designed for converting various file formats into Markdown, making it useful for text analysis, indexing, and integration with AI models. It supports formats like Jupyter Notebooks, PDFs, Word documents, Excel sheets, images (with OCR), and even audio files (with transcription capabilities).
The Use Case
Imagine I have a directory structure like this:
/workspace
    /2024
        notebook1.ipynb
        notebook2.ipynb
    /2025
        notebook3.ipynb
        notebook4.ipynb
For my current project, I want to consolidate all the notebooks from the year 2025 into a single Markdown file.
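Before converting anything, it can help to confirm which notebooks a given year folder actually contains. Here is a minimal sketch using pathlib; the helper name `find_notebooks` and the `.ipynb_checkpoints` filter are my own additions for illustration, not part of the original workflow:

```python
from pathlib import Path


def find_notebooks(year_dir):
    """Return all .ipynb files under year_dir, sorted by path.

    Hypothetical helper for illustration: it skips Jupyter's autosave
    copies in .ipynb_checkpoints folders so they are not listed twice.
    """
    root = Path(year_dir)
    return sorted(
        path
        for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    # Assumes the layout shown above; adjust the path to your workspace.
    for notebook in find_notebooks("/workspace/2025"):
        print(notebook)
```

If the list looks right, the same folder is the one to run the conversion from.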
First, I created a virtual environment using the VOX PowerShell script. Next, I installed the markitdown library by running:
pip install markitdown
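To confirm that the install landed in the active virtual environment, a quick check like the following may help. It is a small sketch using only the standard library; the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("markitdown")
    if version is None:
        print("markitdown is not installed in this environment")
    else:
        print(f"markitdown {version} is available")
```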
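After changing to the root folder of the notebooks I want to consolidate (here, /workspace/2025), I ran the following Python script, passing the file extension as its only argument (for example, python to_markdown.py ipynb). It writes the combined Markdown to combined_output_{file_extension}.md.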
import os
import sys
import time

from markitdown import MarkItDown


def convert_files_to_markdown(extension, output_file):
    """Convert every file under the current directory whose name ends with
    extension, and write the results into output_file."""
    markdown_converter = MarkItDown(enable_plugins=False)  # Set to True to enable plugins
    output_path = os.path.abspath(output_file)

    with open(output_file, 'w', encoding='utf-8') as md_file:
        for root, dirs, files in os.walk('.'):
            for file in files:
                if file.endswith(extension):
                    file_path = os.path.join(root, file)
                    # Never feed the output file back into itself (e.g. extension "md")
                    if os.path.abspath(file_path) == output_path:
                        continue
                    try:
                        md_content = markdown_converter.convert(file_path)
                        # Get the last modified time
                        last_modified_time = os.path.getmtime(file_path)
                        # Convert it to a readable format
                        readable_time = time.ctime(last_modified_time)
                        md_file.write(f"<!-- {file_path} - {md_content.title} - {readable_time} --> \n\n")
                        md_file.write(md_content.markdown)
                        md_file.write("\n\n---\n\n")  # Separator between files
                        print(f"Processed: {file_path}")  # Debugging output
                    except Exception as e:
                        print(f"\nError processing file {file_path}: {e}\n")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python to_markdown.py <file_extension>")
        sys.exit(1)
    file_extension = sys.argv[1]
    # Include the file extension in the output file name
    output_markdown_file = f'combined_output_{file_extension}.md'
    convert_files_to_markdown(file_extension, output_markdown_file)

# Copyright (c) 2025 S. Tessarin
# All rights reserved.
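One way to make the core loop easier to test is to separate the combining step from MarkItDown itself. The sketch below is an illustrative variant, not the script above: `combine_markdown` and its `convert` parameter are hypothetical names, and `convert` is assumed to be any callable that takes a file path and returns Markdown text.

```python
import os
import time


def combine_markdown(paths, convert, output_file):
    """Write the converted content of each path into one Markdown file.

    convert is any callable mapping a file path to Markdown text
    (hypothetical parameter, so the loop can be tested without MarkItDown).
    Returns the list of paths that failed to convert.
    """
    failed = []
    with open(output_file, "w", encoding="utf-8") as md_file:
        for path in paths:
            try:
                markdown = convert(path)
                modified = time.ctime(os.path.getmtime(path))
            except Exception as exc:  # keep going, as the script above does
                print(f"Error processing file {path}: {exc}")
                failed.append(path)
                continue
            # Same layout as the script: comment header, content, separator.
            md_file.write(f"<!-- {path} - {modified} -->\n\n")
            md_file.write(markdown)
            md_file.write("\n\n---\n\n")
    return failed
```

With MarkItDown installed, the callable could wrap the same calls the script uses, for example `converter = MarkItDown()` followed by `combine_markdown(paths, lambda p: converter.convert(p).markdown, "combined.md")`. Returning the failed paths makes it easy to see at the end which notebooks need a second look.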
Generating a PDF Document
Once I have the consolidated Markdown file, the next step is to generate a PDF document, as demonstrated in the video below.
For this process, I utilized Typst in conjunction with the cmarker package.
#import "@preview/cmarker:0.1.5"
#cmarker.render(
  read("./combined_output_ipynb.md"),
  scope: (
    image: (path, alt: none) => image("figures/" + path, alt: alt),
  ),
)
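To run this step from the same Python workflow, the Typst source can be generated and compiled with a short script. This is a sketch under assumptions: the file names `combined.typ` and `combined.pdf` and both helper names are hypothetical, and it assumes the `typst` command-line tool is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Same Typst source as above, with the Markdown path left as a placeholder.
TYPST_TEMPLATE = '''#import "@preview/cmarker:0.1.5"
#cmarker.render(
  read("{markdown_path}"),
  scope: (
    image: (path, alt: none) => image("figures/" + path, alt: alt),
  ),
)
'''


def write_typst_source(markdown_path, typ_path):
    """Write a Typst file that renders markdown_path through cmarker."""
    source = TYPST_TEMPLATE.format(markdown_path=markdown_path)
    Path(typ_path).write_text(source, encoding="utf-8")
    return source


def typst_command(typ_path, pdf_path):
    """Build the `typst compile` invocation as an argument list."""
    return ["typst", "compile", str(typ_path), str(pdf_path)]


if __name__ == "__main__":
    markdown = Path("combined_output_ipynb.md")
    if not markdown.exists():
        print(f"{markdown} not found; run the conversion script first")
    elif shutil.which("typst") is None:
        print("typst not found on PATH; skipping PDF generation")
    else:
        write_typst_source(f"./{markdown}", "combined.typ")
        subprocess.run(typst_command("combined.typ", "combined.pdf"), check=True)
```

Typst resolves the path passed to `read` relative to the .typ file, so the sketch keeps the Typst source next to the Markdown file.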