2025-05-01

Consolidating Jupyter Notebooks into a Single Markdown Document with Python

In the world of data science and programming, organization is key. As I work on various projects, I often find myself dividing my local filesystem into directories by year. This structure helps me keep track of my work, but it can also lead to a scattered collection of files, especially when it comes to Jupyter notebooks (.ipynb).

Recently, I faced a challenge: I needed to consolidate all my Jupyter notebooks from a specific year into a single, comprehensive Markdown document. This would not only make it easier to review my work but also allow me to generate a PDF document for sharing. To tackle this, I turned to the markitdown library in Python, which simplifies the process of converting Jupyter notebooks into Markdown format.

MarkItDown is a Python library designed for converting various file formats into Markdown, making it useful for text analysis, indexing, and integration with AI models. It supports formats like Jupyter Notebooks, PDFs, Word documents, Excel sheets, images (with OCR), and even audio files (with transcription capabilities).

The Use Case

Imagine I have a directory structure like this:

    /workspace
        /2024
            notebook1.ipynb
            notebook2.ipynb
        /2025
            notebook3.ipynb
            notebook4.ipynb

For my current project, I want to consolidate all the notebooks from the year 2025 into a single Markdown file.
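
As a quick check before converting anything, the notebooks for one year can be listed with a few lines of standard-library Python. This is only a sketch: the function name `find_notebooks` is my own for illustration, and skipping `.ipynb_checkpoints` folders assumes Jupyter's default autosave location.

```python
from pathlib import Path


def find_notebooks(workspace: str, year: str) -> list[Path]:
    """Return the .ipynb files under <workspace>/<year>, sorted by path."""
    year_dir = Path(workspace) / year
    # rglob descends into subfolders; skip Jupyter's autosave copies.
    return sorted(
        p for p in year_dir.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


if __name__ == "__main__":
    # "/workspace" and "2025" mirror the example layout above.
    for notebook in find_notebooks("/workspace", "2025"):
        print(notebook)
```

If the list looks wrong here, it is easier to fix the folder layout now than after the combined file has been generated.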

First, I created a virtual environment using the VOX PowerShell script. Next, I installed the markitdown library by running:

    pip install markitdown


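After changing into the root folder to scan (the script walks the current directory and all of its subfolders), running the following Python script generates the output file combined_output_{file_extension}.md in Markdown format. For example, passing ipynb as the argument produces combined_output_ipynb.md.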

    import os
    import sys
    from markitdown import MarkItDown
    import time


    def convert_files_to_markdown(extension, output_file):
        markdown_converter = MarkItDown(enable_plugins=False) # Set to True to enable plugins
        with open(output_file, 'w', encoding='utf-8') as md_file:
            for root, dirs, files in os.walk('.'):
                for file in files:
                    if file.endswith(extension):
                        file_path = os.path.join(root, file)
                        try:
                            md_content = markdown_converter.convert(file_path)

                            # Get the last modified time
                            last_modified_time = os.path.getmtime(file_path)

                            # Convert it to a readable format
                            readable_time = time.ctime(last_modified_time)

                            md_file.write(f"<!-- {file_path} - {md_content.title} - {readable_time} -->  \n\n")
                            md_file.write(md_content.markdown)
                            md_file.write("\n\n---\n\n")  # Separator between files
                            print(f"Processed: {file_path}")  # Debugging output
                        except Exception as e:
                            print(f"\nError processing file {file_path}: {e}\n")

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("Usage: python to_markdown.py <file_extension>")
            sys.exit(1)

        file_extension = sys.argv[1]
        output_markdown_file = f'combined_output_{file_extension}.md'  # Include the file extension in the output file name
        
        convert_files_to_markdown(file_extension, output_markdown_file)


    # Copyright (c) 2025 S. Tessarin
    # All rights reserved.
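
One thing to keep in mind is that os.walk does not promise any particular file order, so the notebooks may not appear chronologically in the combined document. A minimal sketch of one way to collect the files oldest-first, using the same modification time the script already writes into each header, could look like this. The helper name `files_by_mtime` is hypothetical and not part of the script above.

```python
import os


def files_by_mtime(root: str, extension: str) -> list[str]:
    """Collect files under root whose names end in extension, oldest first."""
    matches = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(folder, name))
    # Sort by modification time; the path breaks ties so the order is stable.
    return sorted(matches, key=lambda path: (os.path.getmtime(path), path))


if __name__ == "__main__":
    for path in files_by_mtime(".", ".ipynb"):
        print(path)
```

The conversion loop could then iterate over this list instead of walking the tree directly.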

Generating a PDF Document

Once I had the consolidated Markdown file, the next step was to generate a PDF document, as demonstrated in the video below.

For this process, I utilized Typst in conjunction with the cmarker package.

    #import "@preview/cmarker:0.1.5"
    #cmarker.render(
      read("./combined_output_ipynb.md"),
      scope: (
        image: (path, alt: none) => image("figures/" + path, alt: alt),
      ),
    )
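
To run the compilation from the same Python environment, the Typst command-line tool can be called through subprocess. This is a sketch under a few assumptions: the Typst CLI is installed and on the PATH, the Typst source above has been saved as report.typ (a file name I made up for illustration), and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def typst_command(source: str, output: Optional[str] = None) -> list[str]:
    """Build the `typst compile` command line for a .typ source file."""
    # Default to a PDF next to the source, e.g. report.typ -> report.pdf.
    target = output or str(Path(source).with_suffix(".pdf"))
    return ["typst", "compile", source, target]


def compile_pdf(source: str, output: Optional[str] = None) -> int:
    """Run Typst and return its exit code (0 means success)."""
    if shutil.which("typst") is None:
        raise FileNotFoundError("typst executable not found on PATH")
    return subprocess.run(typst_command(source, output), check=False).returncode


if __name__ == "__main__":
    # report.typ is an assumed file name for the Typst snippet above.
    print(typst_command("report.typ"))
```

Keeping the command construction separate from the call makes it easy to print and inspect the exact command before running it.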