Combine Multiple PDF Files Into One

I often have to send PDF documents via email. When I do, I prefer to send one document that merges all those PDFs. From a recipient’s point of view, I find it better to receive one attachment, because it’s easier to manage and keep track of. The problem is that I’ve yet to find an easy way to stitch together multiple PDF files. Preview is supposed to let you do it, but I usually can’t get it to work, and when I do, the process is painful 1.

Recently, I came up with a way to do just that, thanks to a Python script I found in the “Automate the Boring Stuff with Python” book. This script takes a folder of documents as input, searches for all the PDF files in that folder, and combines them into one PDF file.

However, I had to modify this script to fit my workflow better. My PDF files are all over the place, and I don’t want to move them around just for the sake of merging them. I therefore tinkered a little with the original script, so that it takes a list of file paths as input. Here’s my modified version:

#! /usr/local/bin/python3
# combinePdfsFromFiles.py -
# Gets a list of PDF file paths and combines them into one file.
# I use it together with Keyboard Maestro.

import PyPDF2
import pyperclip

pdfFiles = []

# Get PDF filenames from the clipboard
for filename in pyperclip.paste().split(','):
    if filename.endswith('.pdf') and filename != '':
        pdfFiles.append(filename)

pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the PDF files.
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    # Loop through all the pages and add them.
    # To discard the first page (e.g. a cover), change the first argument of range to 1.
    for pageNum in range(0, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

# TODO: add an argument that determines whether the cover should be included.

#Save the resulting PDF to a file.
pdfOutput = open('/Users/ygilad/Desktop/allminutes.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
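
The clipboard-parsing step is the part that differs from the book's folder-based version, so it can help to isolate it and try it without Keyboard Maestro. The sketch below assumes, as the script above does, that the clipboard holds a comma-separated list of paths. The function name parse_pdf_paths is hypothetical, and the whitespace stripping and case-insensitive extension check are additions that the script above does not perform.

```python
def parse_pdf_paths(clipboard_text):
    """Return the PDF paths found in a comma-separated string, sorted case-insensitively."""
    paths = []
    for raw in clipboard_text.split(','):
        # Assumption: spaces or newlines around each path are not part of the path
        path = raw.strip()
        if path.lower().endswith('.pdf'):
            paths.append(path)
    return sorted(paths, key=str.lower)


if __name__ == '__main__':
    # Hypothetical sample paths, only for illustration
    sample = '/tmp/b.pdf, /tmp/notes.txt,/tmp/A.PDF,'
    print(parse_pdf_paths(sample))
```

The returned list could be fed to the page-copying loop in place of pdfFiles.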

I created a simple Keyboard Maestro macro that goes along with this script and serves as its interface:

[Image: combine_pdfs_macro.jpg]

Now, all I have to do is select in Finder the files I want to stitch:

[Image: stich_pdf_-_select_files.png]

I can then execute the KM macro, which passes the list of files to the Python script for processing.

I know this process might sound tedious, and even more painful than using Preview for the job. But that’s the beauty of automation – you pay once and use it freely ever after.

Footnotes:

1

To get it done with Preview, you’ll have to open each of the PDFs, expose the thumbnail sidebar, and start dragging and dropping the pages you would like to combine. If you’re still interested, here’s Apple’s support guide.

A Recipe To Work-In-Progress Documents

I recently stopped using Evernote and started to manage my notes exclusively in Dropbox. My configuration revolves around a Notes folder. I use nvAlt to browse through the notes in that folder and add new ones. If I want to do more than just a scribble, I use the command-e key binding in nvAlt to open the document in MultiMarkdown Composer.

Storing all my notes in one folder has a major limitation, though. As notes accumulate, looking for a specific note becomes impossible. This is actually one of the main reasons for my departure from Evernote. To avoid this problem, I set Hazel to monitor my Notes folder and move everything that wasn’t modified in the last 30 days to a designated archive folder. Archived notes don’t show in nvAlt, yet they remain easily accessible through Finder.
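
For readers who don't use Hazel, the archiving rule can be approximated in plain Python. This is only a sketch of the same idea, not how Hazel works internally. The function name archive_old_notes and its parameters are hypothetical, and it judges age by each file's last-modified time.

```python
import os
import shutil
import time


def archive_old_notes(notes_dir, archive_dir, max_age_days=30, now=None):
    """Move files in notes_dir not modified within max_age_days into archive_dir.

    Returns the names of the files that were moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(notes_dir)):
        path = os.path.join(notes_dir, name)
        # Skip sub-folders (including the archive itself, if it lives inside notes_dir)
        if not os.path.isfile(path):
            continue
        if os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

A script like this would have to be run on a schedule, whereas Hazel watches the folder continuously.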

Now that I have a home for my notes, I would like to add some logic to streamline my writing workflow. To begin with, I would like to bring documents that I’m working on, but that live in other folders, into my main notes repository.

For example, I’m currently writing a readme file for one of my git repositories. This repo lives within its own folder, where the readme file resides as well. Keeping this file out of my Notes folder means that it’s a hassle to go back and open it when needed. It also means that I can’t work on it when I’m on my iPhone 1.

So, what I needed was a way to mark a document, and have it magically show up in my Notes folder, hence available in nvAlt. Following is the recipe I came up with to address this need.

Let’s start with the ingredients:

  • Finder
  • Hazel
  • Python

And here’s how to mix these components together:

  1. Open Finder and tag the document I want to work on with wip, which marks it to be made available in nvAlt. [Image: tag_wip.png]
  2. Configure a Hazel rule that monitors my home folder, looking for files carrying the wip tag 2. [Image: hazel_3.png]
  3. Create a Python script that takes a file’s path as input and places a symbolic link to it in my Notes folder:
#! /usr/local/bin/python3

import os, sys
import logging

# Write log messages to a file in my user Library's log folder.
# logging.disable(logging.CRITICAL) silences every log call; remove that line to turn logging on.
logging.disable(logging.CRITICAL)
logging.basicConfig(filename='/Users/ygilad/Library/Logs/Python/myPythonLogs.log', level=logging.DEBUG, format=' %(asctime)s - %(levelname)s - %(message)s')

def moveFileToNote(filePath):
    # Build the link name from the original file's full path.
    # The path is included for two reasons:
    # 1) avoid naming conflicts and
    # 2) remind myself where this file came from
    fileName = 'link' + filePath.replace('/', '_').lower()
    logging.debug('Filename: ' + fileName)

    # Make sure that the input is a file and not a folder
    if os.path.isfile(filePath):
        try:
            # Add the link to my central note repository
            os.symlink(filePath, '/Users/ygilad/Dropbox/Notes/link-' + fileName)
            logging.debug('Created a file link')
        except FileExistsError:
            logging.debug('File already exists at the target folder')
    else:
        logging.debug('Input is not a file')

# Accept the path coming from Hazel
hazelLocalFile = sys.argv[1]
logging.debug(hazelLocalFile)

# The body of the script
moveFileToNote(hazelLocalFile)
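
To try the linking step without touching the real Notes folder, the same idea can be written with the target folder passed in as a parameter. This is a minimal sketch under that assumption. The name link_into_notes and its return value are hypothetical and not part of the script above.

```python
import os


def link_into_notes(file_path, notes_dir):
    """Create a symlink to file_path inside notes_dir and return the link's path.

    Returns None when file_path is not a regular file.
    """
    if not os.path.isfile(file_path):
        return None
    # Same naming idea as the script above: flatten the full path into the name
    link_name = 'link-' + file_path.replace('/', '_').lower()
    link_path = os.path.join(notes_dir, link_name)
    try:
        os.symlink(file_path, link_path)
    except FileExistsError:
        pass  # the link was already created on an earlier run
    return link_path
```

Pointing notes_dir at a temporary folder makes it easy to check that the link resolves to the original file before wiring the script into Hazel.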

There is one drawback I wasn’t able to solve – nvAlt doesn’t show the content of the linked file. All it shows is the path of the original document.

[Image: nvAlt_and_linked_files.png]

While I can’t edit the file directly in nvAlt, I can still do it in MultiMarkdown Composer or Editorial on my iPhone.

Footnotes:

1

I keep git repositories in a local folder out of Dropbox’s reach, because I heard that you shouldn’t mix the two.

2

I found that creating a rule that monitors a folder and its sub-folders is a bit tricky, but eventually learned how to do it thanks to this post.

Find Repetitive Words Using Python

Read this question from Stack Overflow:

Paris in the
the spring. Not that
that is related.

Why are you laughing? Are my my regular expressions THAT bad??

Have you noticed the repetitions? Chances are you haven’t. The eye sees what the eye wants to see, and it’ll take away any obstacle to let your brain comprehend. I too often catch myself writing the same word twice. The problem is that when I do, it’s usually too late: the email has been sent or the post has been published.

To make sure I find those repetitions in time, I wrote a simple Python script that removes superfluous spaces and highlights duplicated words, using CriticMarkup. I run this script as soon as I finish writing. It works much better than my eyes at finding those elusive duplications.

Here’s the script:

#! /usr/local/bin/python3
# removeRepeatWords.py - remove repeated spaces and mark repeated words with CriticMarkup
import logging
logging.basicConfig(filename='/Users/ygilad/Library/Logs/Python/myPythonLogs.log', level=logging.DEBUG, format=' %(asctime)s - %(levelname)s - %(message)s')

logging.disable(logging.CRITICAL)

import pyperclip, re

text = str(pyperclip.paste())

# regex definition for repeated spaces
repeatSpacesRegex = re.compile(r'\b(\s)+\1+\b')

# regex definition for repeated words
repeatWordsRegex = re.compile(r'\b(\w+)\b[\s\r\n]*(\1[\s\r\n])+', re.IGNORECASE|re.DOTALL)

# remove the extra spaces
repeatSpaces = repeatSpacesRegex.findall(text)

# A single match is enough to warrant a cleanup
if len(repeatSpaces) > 0:
    text = repeatSpacesRegex.sub(r'\1', text)
    print(str(len(repeatSpaces)) + ' repeat spaces were removed.')

# mark repeated words with CriticMarkup (they are not removed)
repeatWords = repeatWordsRegex.findall(text)
logging.debug(repeatWords)

if len(repeatWords) > 0:
    text = repeatWordsRegex.sub(r'{~~\1 \2~>\1 ~~}{>>repeating words<<}', text)

pyperclip.copy(text)

To use it, copy the text you want to check to the clipboard. Then run the script, and its output will be waiting for you back in the clipboard. Just paste it over the original text. Note that if the script finds repetitions, it won’t remove them but will mark them using CriticMarkup. If your editor supports CM, you can decide whether to accept or reject those changes.

Running this script on the quote from Stack Overflow above produces the following output:

Paris in {~~the  the ~>the~~}{>>repeating words<<}spring. Not {~~that  that ~>that~~}{>>repeating words<<}is related.

Why are you laughing? Are {~~my  my ~>my~~}{>>repeating words<<}regular expressions THAT bad??
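
If your editor doesn't support CriticMarkup, the marked changes can also be resolved with a few lines of Python. The sketch below is an assumption-laden illustration: it handles only the two constructs the script emits, substitutions of the form {~~old~>new~~} and comments of the form {>>text<<}. The name resolve_critic_markup is hypothetical, and the sample string is hand-written in the shape of the script's substitution template.

```python
import re

# {~~old~>new~~} is a CriticMarkup substitution; {>>text<<} is a comment
SUBSTITUTION = re.compile(r'\{~~(.*?)~>(.*?)~~\}', re.DOTALL)
COMMENT = re.compile(r'\{>>.*?<<\}', re.DOTALL)


def resolve_critic_markup(text, accept=True):
    """Keep the new text of each substitution (accept) or the old text (reject), and drop comments."""
    replacement = r'\2' if accept else r'\1'
    text = SUBSTITUTION.sub(replacement, text)
    return COMMENT.sub('', text)


if __name__ == '__main__':
    marked = 'Are {~~my my ~>my ~~}{>>repeating words<<}regular expressions THAT bad??'
    print(resolve_critic_markup(marked))                # accept the fix
    print(resolve_critic_markup(marked, accept=False))  # keep the original wording
```

Accepting everything at once gives up the review step that makes the markup useful, so this is best kept for texts you have already skimmed.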