Category: python

Polished

The goal of polished is to show all of the meticulous tweaks that go into a website. My resume is a good example: dozens of hours of work and tweaking went into a pretty basic final product. Showing the blood, sweat, and hilarious tears in between should be pretty entertaining. Watch pages undulate, stretch, break, grow, and shrink into place.

How does it work?

Once you've installed polished, it works like this:

  1. Fires up the selected backend (for example, PelicanBackend if you use the Pelican blog site generator)
  2. Gets the history of your git repo
  3. Iterates through that history, preparing each page and finally screen capping it (a rough sketch of this loop follows the list)
  4. If, after reviewing the images, you find bugs, you can go in and @polish out the kinks so it's a nice smooth video
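
To make that loop concrete, here's a minimal sketch of the idea, not polished's actual internals: it assumes GitPython for walking the repo and selenium for the screenshots (polished really does drive a selenium webdriver, as the @polish example below shows, but the rest here is purely illustrative):

import os
import subprocess

from git import Repo          # GitPython, assumed here just for walking the repo
from selenium import webdriver

repo = Repo(".")
driver = webdriver.Firefox()

# Oldest commit first, so the video plays forward in time
commits = list(repo.iter_commits("master"))[::-1]

for index, commit in enumerate(commits):
    repo.git.checkout(commit.hexsha)        # step the working tree back in time
    subprocess.call(["make", "html"])       # regenerate the static site for this commit
    driver.get("file://" + os.path.abspath("output/index.html"))
    driver.save_screenshot("polished_output/%s.%s.polished.png" % (index, commit.hexsha))

driver.quit()
repo.git.checkout("master")                 # back to the present

polished wraps all of that up for you; the only thing you end up writing is the per-commit fixups, like the @polish decorators below.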

Examples

Resume page video without polishing

Polished resume page video

And I had to "polish" these videos to get them just right, fixing bad links for some commits:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from polished.backends import PelicanBackend
from polished.decorators import polish


class EricPelicanBackend(PelicanBackend):

    def _patch_image_srcs(self):
        # Wait until at least one <img> is visible before touching the page
        wait = WebDriverWait(self.DRIVER, 10)
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, 'img')))

        # Rewrite absolute /images/... srcs to relative paths so older commits render
        self.DRIVER.execute_script("""
            var img_array = document.getElementsByTagName('img');

            for(var i=0; i<img_array.length; i++) {
                var href_replaced = img_array[i].getAttribute('src').replace(/^\/images/, "../images");
                img_array[i].setAttribute("src", href_replaced);
            }
        """)

    @polish(urls=["output/pages/about.html"], commit_indexes=range(112, 135))
    def fix_image_links_on_about_me_page(self):
        self._patch_image_srcs()

    @polish(urls=["output/pages/resume.html"], commit_indexes=range(68,134))
    def fix_resume_page_broken_images(self):
        self._patch_image_srcs()

Installation

Requires:

Then the usual:

> pip install polished

For more detailed instructions please check out the repo readme.

Usage

> polished

The default behavior is to capture "index.html" at each commit

> polished output/index.html

Captures a local file

> polished http://localhost:8000/

Captures from a locally running server

By default, the screenshots are saved to polished_output/<commit count>.<sha>.polished.png and the final video to polished_output/output.mp4.

— 18 April 2014

Writing my first python package

I've been looking to make a little python package to launch on pypi for quite some time. With all of my recent Pelican blog tweaking, I'm having about 50 urges per second to improve the development workflow a little bit.

The problem

One of the things that kept irking me while writing posts was constantly having to double check that I'd put in the correct value for each link in markdown:

So there I was, at [Some Restaurant I can't Remember]() downtown and I totally saw Becky macking on Jonathon!

Right as I am getting into the juicy gossip I normally would blog about, now I have worries:

  1. I will forget to fix that link
  2. I will think about "did I fix that link?" even if I already did
  3. I. will. forget. to. fix. that. LINK!
  4. Years later I will wake up in a cold sweat wondering, "Did I fix that link?"

The attempted solution

My first idea was to make some kind of hook in the Pelican system that rewrote [text](url) links with empty urls, filling them in with the first google search result for the text description.

That idea sucks for so many reasons:

  • It could completely destroy posts
  • It modifies the repository
  • Did I fix that link?

The real solution: Existence

After playing around with a few different ideas I finally thought: screw this. I was thinking too hard; all of the solutions were too convoluted. When I stepped away from the computer and thought for a while, I realized the problem I was really trying to fix: broken links!

After some searching I didn't find a nice simple python module that ran through static html files, tried links, and spit out which ones were bad.

Writing the module

I want this to run quickly or I'll never use it, so my plan was to use requests asynchronously after scanning all of the files for links. However, after trying my damnedest I couldn't get grequests to work right... I was hitting weird errors in the background that I couldn't debug easily.

Then I saw this nice little example; here's my version:

import urllib2
from threading import Thread

# (url, file name, line number) tuples for every link that failed to open
BROKEN_URLS = []


def async_check_url(url, file_name, line_number):
    try:
        urllib2.urlopen(url)
    except urllib2.URLError:
        BROKEN_URLS.append((url, file_name, line_number))


def check_urls(urls):
    '''
    expected format of urls is list of tuples (url, file name, source line) i.e. ("google.com", "index.html", 32)
    '''
    threads = list()

    # one thread per URL so slow links don't hold up the rest
    for u in urls:
        t = Thread(target=async_check_url, args=(u[0], u[1], u[2]))
        t.start()
        threads.append(t)

    # wait for every check to finish before reporting
    for thread in threads:
        thread.join()
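
That covers trying the links; the other half is collecting the (url, file name, line number) tuples from the generated html in the first place. I haven't pasted existence's actual scraping code here, but since it depends on lxml and cssselect (they show up in install_requires below), a rough sketch of that gathering step could look something like this:

import os

from lxml import html


def gather_urls(deploy_path):
    '''
    Walk a directory of static .html files and return (url, file name, line number)
    tuples for every external link, ready to hand to check_urls().
    '''
    urls = []

    for root, dirs, files in os.walk(deploy_path):
        for file_name in files:
            if not file_name.endswith(".html"):
                continue

            path = os.path.join(root, file_name)
            tree = html.parse(path)

            # cssselect lets us grab every anchor with an href in one query
            for anchor in tree.getroot().cssselect("a[href]"):
                href = anchor.get("href")
                if href.startswith("http"):
                    urls.append((href, path, anchor.sourceline))

    return urls

Feed the result into check_urls and BROKEN_URLS ends up holding everything that needs fixing.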

Getting it on pypi

1. sign up for an account on pypi

2. create .pypirc in home directory with login info

[distutils]
index-servers = pypi

[pypi]
username:ckcollab
password:hunter2

3. fill out setup.py

import os
from setuptools import setup


try:
    with open('README.md') as readme:
        long_description = readme.read()
except (IOError, ImportError):
    long_description = ''

setup(
    install_requires = [
        "lxml>=3.3.4",
        "cssselect>=0.9.1"
    ],
    name="existence",
    py_modules=["existence"],
    version="0.0.8",
    author="Eric Carmichael",
    author_email="[email protected]",
    description="Checks static .html files for bad links",
    long_description=long_description,
    license="MIT",
    keywords="link checker",
    url="https://github.com/ckcollab/existence",
    classifiers=[
        'Intended Audience :: Developers',
        'Natural Language :: English',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Topic :: Software Development :: Libraries :: Python Modules',
    ],
)

4. submit

> python setup.py register
> python setup.py sdist upload
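
Once the upload finishes the package is live, so the quickest sanity check is to install it from pypi the same way anyone else would:

> pip install existence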

Plugging it into fabric

import os
from existence import get_bad_urls
from fabric.api import *


# Local path configuration (can be absolute or relative to fabfile)
env.deploy_path = 'output'
DEPLOY_PATH = env.deploy_path


def clean():
    if os.path.isdir(DEPLOY_PATH):
        local('rm -rf {deploy_path}'.format(**env))
        local('mkdir {deploy_path}'.format(**env))


def deploy():
    clean()
    local("make html")

    print "Checking URLs"
    bad_urls = get_bad_urls(DEPLOY_PATH)

    if not bad_urls:
        print "URL's are looking good"
        local("git push")
        local("git push heroku master")
    else:
        for url in bad_urls:
            print "Broken link found in file %s on line %s linking to %s" % (url[1], url[2], url[0])

Now I just run

> fab deploy

and I will never ever, ever have to worry about broken links in my blog! Thanks existence!

— 14 April 2014