Using Postgres as a time series database

Time series databases (TSDBs) are quite popular these days; well-known examples include InfluxDB, Graphite, Druid, Kairos, and Prometheus. All aim to optimize storage and querying of time-based data, which is highly relevant in a physics lab, where there is a multitude of "metrics" (to borrow a phrase used frequently in TSDB documentation) that naturally lend themselves to time series representation: lab (and individual device) temperatures, vacuum chamber pressures, and laser powers, among others. Ideally, one could log various data to one of these databases and then use a tool like Grafana to visualize it. Sadly, more traditional relational databases like SQLite and PostgreSQL are not (currently) supported by Grafana (although this is now being addressed by a datasource plugin in development).

Nevertheless, there are good reasons to favor a traditional RDBMS over a newfangled TSDB. To name a few:

  • Longevity: SQL has been around since the 1970s and became standardized in the 1980s.
  • Ubiquity: almost every server (web or otherwise) already has a SQL database installed. If not, SQLite doesn't even require a server!
  • Community: not to suggest there aren't good communities with TSDBs, but the Postgres and SQLite communities in particular are generally quite helpful. Combined with the longevity aspect, any question one may have about how to accomplish a particular task with a SQL database is likely to be easily answerable with a simple web search.

In this post, I will outline a few things I have learned in using …

more ...

Importing one Mercurial repository into another

In the ion trap group, we usually use Mercurial for version control of the software we write for experimental control, data analysis, and so on. This post outlines how to import the full history of one repository into another. This can be useful for cases where it makes sense to move a sub-project directly into its parent, for example.

Convert the soon-to-be child repository

With the Mercurial convert extension, you can rename branches and move or filter files. As an example, say we have a repo with only the default branch, which is to be imported into a super-repository.

For starters, we will want all our files in the child repo to be in a subdirectory of the parent repo and not include the child's .hgignore. To do this, create a file filemap.txt with the following contents:

rename . child
exclude .hgignore

The first line will move all files in the repo's top level into a directory named child.

Next, optionally create a branchmap.txt file for renaming the default branch to something else:

default child-repo

Now convert:

hg convert --filemap filemap.txt --branchmap branchmap.txt child/ converted/

Pull in the converted repository

From the parent repo:

hg pull -f ../converted

Ensure the child commits are in the draft phase with:

hg phase -f --draft -r <first>:<last>

Rebase as appropriate

hg rebase -s <child rev> -d <parent rev>

To keep the child's changed branch name, use the --keepbranches option.
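
For example, the rebase above with the branch name preserved (same revision placeholders as before):

hg rebase --keepbranches -s <child rev> -d <parent rev>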

References

  • https://mercurial.selenic.com/wiki/ConvertExtension
  • https://mercurial.selenic.com/wiki/Phases …
more ...

Running (possibly) blocking code like a Tornado coroutine

One of the main benefits of using the Tornado web server is that it is (normally) a single-threaded, asynchronous framework that can rely on coroutines for concurrency. Many drivers already exist to provide a client library utilizing the Tornado event loop and coroutines (e.g., the Motor MongoDB driver).

To write your own coroutine-friendly code for Tornado, there are a few different options available, all requiring that you somehow wrap blocking calls within a Future so as to allow the event loop to continue executing. Here, I demonstrate one recipe to do just this by utilizing Executor objects from the concurrent.futures module. We start with the imports:

import random
import time
from tornado import gen
from tornado.concurrent import run_on_executor, futures
from tornado.ioloop import IOLoop

We will be using the run_on_executor decorator, which requires that the class whose methods we decorate have some type of Executor attribute (by default it looks for the executor attribute, but a different Executor can be specified via a keyword argument passed to the decorator). We'll create a class to run our asynchronous tasks and give it a ThreadPoolExecutor for executing them. In this contrived example, our long-running task just sleeps for a random amount of time:

class TaskRunner(object):
    def __init__(self, loop=None):
        self.executor = futures.ThreadPoolExecutor(4)
        self.loop = loop or IOLoop.instance()

    @run_on_executor
    def long_running_task(self):
        tau = random.randint(0, 3)
        time.sleep(tau)
        return tau

Now, from within a coroutine, we can let the tasks run as …
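
A minimal sketch of what this might look like, assuming the TaskRunner class above (the main coroutine here is just for illustration):

@gen.coroutine
def main():
    runner = TaskRunner()
    # Yielding the decorated method suspends this coroutine (but not the
    # IO loop) until the thread pool finishes the blocking sleep.
    tau = yield runner.long_running_task()
    print('Task slept for %d seconds' % tau)

if __name__ == '__main__':
    IOLoop.instance().run_sync(main)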

more ...

Background tasks with Tornado

I have been using Tornado lately for distributed control of devices in the lab where an asynchronous framework is advantageous. In particular, we have a HighFinesse wavelength meter which we use to monitor and stabilize several lasers (up to 14 at a time). Previously, a custom server for controlling this wavemeter was written using Twisted, but that has proven difficult to upgrade, distribute, and maintain.

One thing that is common in such a control scenario is that data needs to be refreshed continuously while still accepting incoming connections from clients and appropriately executing remote procedure calls. One method is to periodically interrupt the Tornado IO loop to refresh data (and in fact, Tornado has a class to make this easy for you: tornado.ioloop.PeriodicCallback). This can be fine if refreshing is quick, but all other operations are blocked until the callback finishes, which is a problem if the refreshing operation is slow. Another option is to run an additional thread, separate from the Tornado IO loop, that handles refreshing data. This certainly works, but it adds the complexity of thread-safe communication to stop the thread when the main application is shut down, or when other tasks depend on the successful completion of the refresh.
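
For illustration, here is a minimal sketch of the PeriodicCallback approach (the refresh function is a hypothetical placeholder; the interval is given in milliseconds):

from tornado.ioloop import IOLoop, PeriodicCallback

def refresh():
    # Hypothetical placeholder: poll the wavemeter for new readings.
    # The IO loop is blocked until this function returns.
    pass

PeriodicCallback(refresh, 500).start()  # run refresh every 500 ms
IOLoop.instance().start()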

Luckily, Tornado also includes a decorator, tornado.concurrent.run_on_executor, to run things in the background for you using Python's concurrent.futures module (which is standard starting in Python 3.2 and backported …

more ...

Flask and server-sent events

I recently discovered the existence of the HTML5 server-sent events standard. Although it lacks the bidirectional communication of a WebSocket, SSE is perfect for the publish-subscribe networking pattern. This pattern fits naturally with writing software to remotely monitor hardware that many people might want to check in on at the same time.

In order to try SSE out within a Flask framework, I put together a simple demo app using gevent. The core of the demo on the Python side looks like this:

import json

import gevent
from flask import Flask, Response, render_template
from numpy import random

app = Flask(__name__)

def event():
    # An endless generator of SSE messages: each 'data: ...\n\n' block
    # is sent to the client as a single event.
    while True:
        yield 'data: ' + json.dumps(random.rand(1000).tolist()) + '\n\n'
        gevent.sleep(0.2)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/stream/', methods=['GET', 'POST'])
def stream():
    return Response(event(), mimetype="text/event-stream")

This can be run using either gevent's WSGI server or gunicorn with gevent workers.
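
For instance (a sketch, assuming the module above is saved as app.py):

# Option 1: serve directly with gevent's WSGI server
from gevent.pywsgi import WSGIServer
WSGIServer(('', 5000), app).serve_forever()

# Option 2 (shell): gunicorn with gevent workers
#   gunicorn -k gevent app:app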

Update 2016-04-21: There is now a very nice Flask extension called Flask-SSE which handles all of this for you. It additionally supports the concept of channels in order to fine tune what notifications a given client receives.

more ...