Sharing data between processes with SQLite

Because of the global interpreter lock in CPython, it is sometimes beneficial to use separate processes to handle different tasks. This can pose a challenge for sharing data: it's generally best to avoid sharing memory between processes for reasons of safety [1]. One common approach is to use pipes, queues, or sockets to communicate data from one process to another. This works quite well, but it can be cumbersome to get right when more than two processes are involved and you only need to share a small amount of infrequently changing data (say, some configuration settings that are loaded after worker processes have already been spawned). In such cases, a file that each process can read is a simple solution, but it can run into trouble when reads and writes happen concurrently. Thankfully, SQLite can handle this situation easily!

I have created a small module (Permadict) which uses SQLite to persist arbitrary (picklable) Python objects behind a dict-like interface. This is not a new idea, but it was fun and simple to implement using only the Python standard library. A basic usage example:

>>> from permadict import Permadict
>>> d = Permadict("db.sqlite")
>>> d["key"] = "value"
>>> print(d["key"])
value

Because context managers are great, you can also use permadicts that way:

>>> with Permadict("db.sqlite") as d:
...     d["something"] = 1.2345
...
>>> with Permadict("db.sqlite") as d:
...     print(d["something"])
...
1.2345
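
For a sense of how this works under the hood, here is a minimal sketch of the idea (not the actual Permadict source; the class and table names are my own):

import pickle
import sqlite3

class MiniPermadict:
    """Sketch of a dict-like store backed by SQLite."""

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value BLOB)")
        self.conn.commit()

    def __setitem__(self, key, value):
        # Pickle the value so arbitrary Python objects can be stored as blobs
        self.conn.execute("INSERT OR REPLACE INTO store VALUES (?, ?)",
                          (key, pickle.dumps(value)))
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM store WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.conn.close()

SQLite does its own file locking to serialize writes, which is what makes the shared-file approach described above workable without any extra coordination between processes.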

[1] Of course, Python allows you to share memory among processes …

more ...

Simplifying argparse usage with subcommands

One of the best things about Python is its standard library: it's frequently possible to create complex applications while requiring few (if any) external dependencies. For example, command line interfaces can be easily built with the argparse module. Despite this, several alternative, third-party modules exist (e.g., docopt, click, and begins). These all tend to share similar motivations: while argparse is powerful, it is inherently verbose and therefore cumbersome to use for more complex CLIs that need advanced features such as subcommands. Nevertheless, I tend to prefer sticking with argparse, partly because I am already familiar with the API and partly because using it means I don't need to bring in another dependency from PyPI just to add a small bit of extra functionality. The good news is that with a simple decorator and a convenience function, writing CLIs with subcommands in argparse is pretty trivial and clean.

Start by creating a parser and subparsers in cli.py:

from argparse import ArgumentParser

cli = ArgumentParser()
subparsers = cli.add_subparsers(dest="subcommand")

Note that we are storing the name of the called subcommand so that we can later print help if no subcommand is given or an unrecognized one is used. Now we can define a decorator to turn a function into a subcommand:

def subcommand(args=[], parent=subparsers):
    def decorator(func):
        # Name the subcommand after the function and use its docstring
        # as the description shown in the subcommand's help
        parser = parent.add_parser(func.__name__, description=func.__doc__)
        for arg in args:
            parser.add_argument(*arg[0], **arg[1])
        parser.set_defaults(func=func)
        return func  # return func so the decorated name isn't rebound to None
    return decorator

What this does …
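
As an illustration of how this might all be wired up (the argument helper and the hello subcommand below are hypothetical examples of mine, not part of the excerpt above):

def argument(*name_or_flags, **kwargs):
    # Packages arguments in the ([names], {kwargs}) form the decorator expects
    return (list(name_or_flags), kwargs)

@subcommand([argument("-n", "--name", default="world")])
def hello(args):
    """Print a friendly greeting."""
    print("Hello, {}!".format(args.name))

if __name__ == "__main__":
    args = cli.parse_args()
    if args.subcommand is None:
        cli.print_help()  # no subcommand given, so show usage
    else:
        args.func(args)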

more ...

Javascript for Python programmers

Unless you're just writing a simple HTTP API server, any amount of web programming in Python will likely require at least a little bit of Javascript. Like it or not (and I will try to argue in this post that you should like it for what it's good at), Javascript is really the only game in town when it comes to client-side scripting on a web page. Sure, there are a number of Python-to-Javascript transpilers out there, but using these tends to limit your ability to use new Javascript features as they are rolled out to browsers and may limit your ability to use third-party Javascript libraries. At the very least, using one of these transpilers adds complexity to deploying a web app [1].

In this post, I will describe some things I've learned about Javascript from the perspective of someone who prefers to use Python as much as possible. This guide is mainly aimed at scientists and others who are not primarily programmers but who may find it useful to make a web app for their main work. It is assumed that the reader is at least moderately familiar with Javascript (Mozilla has a nice tutorial to get you up to speed if not).

Namespaces, encapsulation, modularization, and bundling

Modules in Python make it very easy to encapsulate components without polluting the global namespace. In contrast, Javascript in the browser will make everything a global if you are not careful [2]. The good news is that it doesn't …

more ...

Getting Matplotlib's colors in order

Matplotlib can be very easy to use at times, especially if you just want to make a simple "y vs. x" type of plot. But when it comes to specialized customization, it can be challenging to find the proper solution, and the situation is not helped by the fact that, often, an obscure answer on Stack Overflow no longer works because the API has changed.

One common need is to color different plot elements consistently. For example, say you want to plot two dependent variables with widely different scales that share an independent variable. This is often represented with two separate vertical axes, colored to match the lines or markers of each data set. The most basic approach is to manually assign colors to the lines and axes, but if we are using a custom style, such as the ggplot style, we need a way to access that style's color cycle in order to remain consistent with it. Here is the best way I have found, which works at least in version 1.5:

import matplotlib.pyplot as plt

colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

colors is now a list of the current style's cycle colors, in order.
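
As a quick sketch of the two-axis scenario described above (the data and specific plotting calls here are my own illustration):

import matplotlib.pyplot as plt
import numpy as np

plt.style.use("ggplot")
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

x = np.linspace(0, 10, 100)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second vertical axis sharing the same x axis

# Color each line and its axis ticks from the style's cycle
ax1.plot(x, np.sin(x), color=colors[0])
ax1.tick_params(axis="y", colors=colors[0])
ax2.plot(x, 1000 * np.cos(x), color=colors[1])
ax2.tick_params(axis="y", colors=colors[1])

plt.show()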

more ...

Fitting with lmfit

General-purpose fitting in Python can sometimes be a bit more challenging than one might at first suspect, given the robust nature of tools like Numpy and Scipy. First we had scipy.optimize.leastsq. It works, although it often requires a bit of manual tuning of initial guesses and always requires manual calculation of standard errors from a covariance matrix (which isn't even one of the return values by default). Later we got curve_fit, which is a bit more user friendly and even estimates and returns standard errors for us by default! Alas, curve_fit is just a convenience wrapper on top of leastsq and suffers from some of the same general headaches.

These days, we have the wonderful lmfit package. Not only does lmfit make fitting more user friendly, but it is also quite a bit more robust than using scipy directly. The documentation is thorough and rigorous, but that also means it can be a bit overwhelming to get started with. Here I work through a basic example in two slightly different ways in order to demonstrate how to use it.

Generating the data

Let's assume we have data that resembles a decaying sine wave (e.g., a damped oscillator). lmfit has quite a few pre-defined models, but this is not one of them. We can simulate the data with the following code:

import numpy as np

x = np.linspace(0, 10, 100)         # independent variable
y = np.sin(5*x) * np.exp(-x/2.5)    # decaying sine wave

Real data is noisy, so let's add …
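
The excerpt cuts off here; as a preview of where this goes, a minimal lmfit version of the fit might look like the following (a sketch only: the noise level, parameter names, and initial guesses are my assumptions, not the author's code):

from lmfit import Model

# Simulate a noisy measurement (assumed noise level)
y_noisy = y + np.random.normal(scale=0.1, size=x.size)

def damped_sine(x, amplitude, frequency, decay):
    return amplitude * np.sin(frequency * x) * np.exp(-x / decay)

# lmfit builds a model from the function signature; keyword arguments
# to fit() provide the initial parameter guesses
model = Model(damped_sine)
result = model.fit(y_noisy, x=x, amplitude=1, frequency=4, decay=2)
print(result.fit_report())  # best-fit values with standard errors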

more ...