Optimizing Rust Binary Size

I develop and maintain a git extension called git-req. It enables developers to check out pull requests from GitHub and GitLab by their number instead of branch name. It started out as a bash script that invoked Python for harder tasks (e.g., JSON parsing). It worked well enough, but I wanted to add functionality that would have been painful to implement in bash. Additionally, one of my goals was to make it as portable as possible, and requiring that a Python distribution be available ran counter to that. That meant that I needed to distribute this as a binary instead of a script, so I set about finding a programming language to use. After surveying what was available, and determining what would be the best addition to my toolbox, I selected Rust.

The programming language has a steep learning curve, but has been fun to learn and immerse myself in. The community is great, and I'm excited to find more opportunities to use Rust in the future.

The rewrite took a while to accomplish, but when all was said and done, everything worked, and worked well. I was able to implement some snazzy new features as well as polish some rough edges. However, for how "simple" I felt the underlying program to be, it clocked in at 13 megabytes. That felt like a lot. So, I decided to see what could be done.

Supercharging your Reddit API Access

This was originally posted to a former employer's blog. It has been mirrored here for posterity.

Here at HumanGeo we do all sorts of interesting things with sentiment analysis and entity resolution. Before you get to have fun with that, though, you need to bring data into the system. One data source we've recently started working with is reddit.

Compared to the walled gardens of Facebook and LinkedIn, reddit's API is as open as open can be: everything is nice and RESTful, rate limits are sane, the developers are open to enhancement requests, and one can do quite a bit without needing to authenticate. The most common objects we collect from reddit are submissions (posts) and comments. A submission can either be a link or a self post with a text body, and can have an arbitrary number of comments. Comments contain text, as well as references to parent nodes (if they're not root nodes in the comment tree). Pulling this data is as simple as GET http://www.reddit.com/r/washingtondc/new.json. (Protip: pretty much any view in reddit has a corresponding API endpoint that can be generated by appending .json to the URL.)
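As a rough illustration of the .json trick, here is a small Python sketch that uses only the standard library. The helper names json_endpoint and fetch_listing and the User-Agent string are made up for this example, and it assumes the endpoint answers unauthenticated requests as described above.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def json_endpoint(view_url):
    """Turn a reddit view URL into its JSON endpoint by appending .json to the path."""
    parts = urlsplit(view_url)
    # Drop any trailing slash so "/new/" and "/new" both become "/new.json".
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_listing(view_url, user_agent="example-scraper/0.1 (hypothetical)"):
    """Fetch and decode the JSON behind a view URL (performs a network request)."""
    request = Request(json_endpoint(view_url), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(json_endpoint("http://www.reddit.com/r/washingtondc/new"))
```

Keeping the URL rewriting separate from the fetch makes the former easy to check without touching the network.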

With little effort a developer could hack together a quick 'n dirty reddit scraper. However, as additional features appear and collection breadth grows, the quick 'n dirty scraper becomes more dirty than quick, and you discover bugs (er, features) that others utilizing the API have already encountered and possibly addressed. API wrappers help consolidate communal knowledge and best practices for the good of all. We considered several, and, being a Python shop, settled on PRAW (Python Reddit API Wrapper).

Accessing Webcams with Python

So, I've been working on a tool that turns your commit messages into image macros, named Lolologist. This was a great learning exercise because it gave me insight into things I haven't encountered before - namely:

  1. Packaging Python modules
  2. Hooking into Git events
  3. Using PIL (through Pillow) to manipulate images and text
  4. Accessing a webcam through Python on *nix-like platforms

I might talk about the first three at a later point, but the last was the most interesting to me as someone who enjoys finding weird solutions to nontrivial problems.

Searching the internet for "python webcam linux" turns up two third-party tools: Pygame and OpenCV. Great! The only problem is that these come in at 10MB and 92MB respectively. Wanting to keep the package light and free of unnecessary dependencies, I set out to find a simpler solution...

Disabling caching in Flask

It was your run of the mill work day: I was trying to close out stories from a jam-packed sprint, while QA was doing their best to break things. Our test server was temporarily out of commission, so we had to run a development server in multithreaded mode.

One bug came across my inbox, flagging a feature that I had developed. Specifically, a dropdown that displayed a list of saved user content wasn't updating. I attempted to replicate the issue on my machine: no luck. Fired up my IE VM to test and, yet again, no luck. Weird.

I walked over to the tester's desk and asked her to walk me through the bug. Sure enough, upon saving an item, the contents of the dropdown did not update. Stumped, I returned to my machine and spoofed her account, wondering if there was an issue affecting her account within our mocked database backend (read: flat file). I was able to see the "missing" saved content. Now I was getting somewhere.

I walked back to her machine, opened the F12 Developer Tools, and watched the network traffic. The GET for that dynamically-populated list was resulting in a 304 status, and IE was using a cached version of the endpoint. Of course, I facepalmed as this was something I've not normally had to deal with as I usually use a cachebuster. However, for this new application, my team has been trying to do things as cleanly as possible. Wanting to keep our hack count down, I went back to my machine to see what our web framework could do.

Coming from a four-year stint in ASP.NET land, I was used to being able to set caching per endpoint via the [OutputCache(NoStore = true, Duration = 0, VaryByParam = "None")] ActionAttribute. Sadly, there didn't appear to be something similar in Flask. Thankfully, it turned out to be fairly trivial to write one.

I found this post from Flask creator Armin Ronacher that set me on the right track, but the snippet provided didn't work for all the Internet Explorer versions we were targeting. However, I was able to whip together a decorator that did.
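A decorator in that spirit can be sketched as follows. This is a guess at the general shape, not the exact code from the post: the header values are assumptions chosen to cover older Internet Explorer versions, and add_no_cache_headers is a helper name invented here. Flask's make_response is imported inside the wrapper only so that the module loads even where Flask is not installed.

```python
from functools import wraps

# Assumed header set; the original decorator may have used different values.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",
    "Expires": "0",
}


def add_no_cache_headers(headers):
    """Set the cache-defeating headers on any mutable header mapping."""
    for name, value in NO_CACHE_HEADERS.items():
        headers[name] = value
    return headers


def nocache(view):
    """Decorate a Flask view so its response is never cached by the browser."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        # Deferred import: Flask is only needed once a view is actually called.
        from flask import make_response

        # make_response normalizes strings, tuples, and Response objects alike.
        response = make_response(view(*args, **kwargs))
        add_no_cache_headers(response.headers)
        return response

    return wrapper
```

Because the wrapper uses functools.wraps, the decorated view keeps its original name, which matters when Flask derives the endpoint name from the function.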

To invoke it, all you need to do is import the function and apply it to your endpoint.

from nocache import nocache

@app.route('/my_endpoint')
@nocache
def my_endpoint():
    return render_template(...)

I took this back to QA and was met with success. The downside of this manual implementation, however, is one needs to be religious in applying it. To this day we still stumble upon cases where a developer forgot to add the decorator to a JSON endpoint. Thankfully, code review processes are perfect for catching that sort of omission.

Multithreading your Flask dev server to safety

Flask is awesome. It's lightweight enough to disappear, but extensible enough to be able to get some niceties such as auth and ACLs without much effort. On top of that, the Werkzeug debugger is pretty handy (not as nice as wdb's, though).

Things were going swimmingly until our QA server went down. While the server may have stopped, development didn't, and we needed a way to get testable builds up and running for QA. One of my fellow developers quickly stood up an instance of our application on a lightly-used box that was nearing the end of its usefulness. To get around the fact that the machine wasn't outfitted for httpd action, the developer just backgrounded an instance of the Flask development server and established an SSH tunnel for QA to use. This was deemed acceptable by our QA team of one. We rejoiced and went back to work.

Days passed and our system engineer's backlog remained dangerously full, with the QA server remaining a low priority. Thankfully, aside from having to restart the process a few times, the Little Server That Could kept on chugging. But then disaster struck - we hired another tester! Technically her joining was a great thing; when you bring in fresh blood you get not only another person working the trenches but also an influx of fresh ideas. However, her arrival brought some unforeseen trouble - the QA server would get more than one user! This wouldn't normally be an issue, but at the moment our QA server was nothing more than a single-threaded developer script running as an unmanaged process. Oops.

Sure enough the complaints from QA started bubbling forth: "Test is down!", "Test isn't loading!", "Test is unusable!" To preserve harmony in the office, and to hold to our end of the developer-tester bargain, we had to do something (the sysengineer was busy rebuilding a dead Solr cluster). We needed to buy some time, so I started investigating what we could do with what we already had - maybe there was a way to handle the load with the development server.

Flask uses the Werkzeug WSGI library to manage its server, so that would be a good place to start. Sure enough, the docs state that the WSGIref wrapper accepts a parameter specifying whether or not it should use a thread per request. Now, how to invoke this from within Flask?

This, much like with everything else in the framework, was surprisingly easy. Flask's app.run function accepts an option collection which it passes on to Werkzeug's run_simple. I updated our dev server startup script...

app.run(host="0.0.0.0", port=8080, threaded=True)

... and then threw it back over the wall to QA. I moseyed on over to test-land shortly thereafter to discover them chipping away at a backlog of tasks, with the webserver serving away as if nothing happened.
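For the curious, the thread-per-request idea behind threaded=True can be sketched with nothing but the standard library. This is not Werkzeug's implementation, just a minimal illustration that mixes socketserver.ThreadingMixIn into the wsgiref server; the names ThreadedWSGIServer, whoami_app, serve_in_background, and fetch_thread_name are invented for the example.

```python
import http.client
import threading
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadedWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGI server that hands every incoming request to its own thread."""
    daemon_threads = True  # do not let worker threads block interpreter exit


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def whoami_app(environ, start_response):
    """Tiny WSGI app that replies with the name of the thread serving it."""
    body = threading.current_thread().name.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve_in_background(wsgi_app, host="127.0.0.1", port=0):
    """Start the threaded server on a free port and return it."""
    server = make_server(host, port, wsgi_app,
                         server_class=ThreadedWSGIServer,
                         handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever,
                     name="accept-loop", daemon=True).start()
    return server


def fetch_thread_name(port):
    """Request / from the local server and return the response body as text."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    server = serve_in_background(whoami_app)
    print(fetch_thread_name(server.server_port))
    server.shutdown()
    server.server_close()
```

The accept loop only accepts connections; each request runs on a separate worker thread, so one slow request no longer stalls every other tester.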

A few weeks later, the new test server is almost ready for prime time, and our multithreaded Flask server (which should never be used for any sort of production purposes) is still holding down the fort.

Running wdb alongside your development server

We've been using the wonderful wdb WSGI middleware tool at work to aid in debugging our Flask app. The features it brings to the table have helped immensely with development, and it has quickly established itself as an integral part of my toolbox.

One minor issue I had with it, though, was the fact that one needs to manually start the wdb.server.py script before launching a development server. If only I could start and stop wdb whenever Flask did. Some Google-fu introduced me to bash's signal trapping features. After some experimentation, I devised a wrapper script that does exactly that.

This has been working quite reliably for us. Here's hoping it helps you!
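The same start-and-stop-together idea can also be expressed in Python instead of bash. The sketch below is not the wrapper script described above; it is a minimal illustration that swaps bash signal trapping for a context manager, and the name companion_process is invented for it.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def companion_process(command):
    """Run a helper process for as long as the with-block is active."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Reached on normal exit and on Ctrl-C (KeyboardInterrupt) alike.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the helper ignores the polite request
            proc.wait()


if __name__ == "__main__":
    # Stand-in helper; in practice the command might be ["wdb.server.py"],
    # with the development server started inside the with-block.
    with companion_process([sys.executable, "-c", "import time; time.sleep(60)"]) as helper:
        print("helper running with pid", helper.pid)
```

One limitation worth noting: the cleanup runs when the block exits or an exception propagates, but not if the parent process itself is killed outright.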

Quick and dirty parameter passing in Angular 2; The Angling.

This is a followup to an earlier post.

After pushing that hack out to the rest of my team, I felt some modicum of pride - I had, through understanding little-used parts (at least, publicly) of a poorly-documented framework, managed to reason out a nice solution.

Days passed and the codebase expanded. One day a teammate approached me and said, "Hey, you know that trick you used to pass the ID into the controller?" "Yes," I replied cautiously. "You didn't need it."

Insert "Duck typing" joke here.

Wat.

As a refresher, here's what I had ended up implementing:

<div data-ng-controller="ParentController">
    <div data-ng-repeat="childGuid in childrenGuids" data-ng-controller="ChildController" data-id="{{childGuid}}"></div>
</div>

... which had some companion JavaScript in the repeated controller that read the value of data-id when the view was updated. There was no other way to pass data in to the ChildController, right? Wrong. Let's step back for a second.

Scope in Angular is special. While the framework goes out of its way to make it seem like things are tightly bound to the encompassing controller and that the DOM merely exists for presentation purposes, nothing (especially in web development) is ever that cut and dried.

Quoth the Angular documentation:

Scopes are arranged in hierarchical structure which mimic the DOM structure of the application.

and...

The Model is scope properties; scopes are attached to DOM where scope properties are accessed through bindings.

As my coworker reminded me, the scope referenced within the ChildController isn't bound to the controller, it's bound to the DOM. And, in the case of the repeater, the scope on the repeater node includes the childGuid property.

What the code looks like now...

<div data-ng-controller="ParentController">
    <div data-ng-repeat="childGuid in childrenGuids" data-ng-controller="ChildController"></div>
</div>

var ChildController = function($scope) {
    $scope.id = $scope.childGuid; // childGuid is inherited from the ng-repeat scope

    var init = function() {
        // do initialization
    };

    init();
};

As you can see, on the second line of the controller, I'm directly referencing the repeated element. You don't need to re-assign it like I'm doing in this example, by the way; referencing it directly won't (or at least shouldn't) present any issues.

An important caveat: if you think the controller might get used elsewhere, it would be beneficial to guard against instantiation without the required arguments...

var ChildController = function($scope, $exceptionHandler) {
    $scope.id = '';

    var init = function() {
        if (typeof $scope.childGuid === "undefined") {
            // Report the misuse and stop before using the missing value
            $exceptionHandler(new Error("The ChildController must be initialized with a childGuid in scope"));
            return;
        }
        $scope.id = $scope.childGuid;
        // continue initialization
    };

    init();
};

And with that my hack was dead. This was a good thing as hacks are, by nature, smells that should be minimized.

The moral of the story? Write self-congratulatory blog posts about slightly-hacky solutions. Someone will submit a patch.

Quick and dirty parameter passing in Angular

Don't use this code. Instead, read this post and then read the follow-up.

Since my last post I've switched teams at work after completing a massive redesign of all of my old team's webapps. The overhaul saw the unification of the look and feel of the product's core webapps, with a significant amount of instrumentation and modularity to enable the less aesthetically-inclined people within the team to contribute without reducing the overall coherence of the application.

It just so happens that I joined my new team just as they were planning a massive redesign of their frontend! After assessing our users' needs and desires, we set out to redesign our stack. I'm not going to go into the full details of the transition in this post, but we settled on a Python stack running AngularJS, Flask, uWSGI and nginx (a huge improvement, development-wise, from our current JQueryUI, TurboGears, mod_wsgi and Apache system).

Prior to this redesign effort, my only experiences with client-side MVC frameworks were with KnockoutJS, a great JavaScript library for managing data binding. However, when you scaled it up to include things like validation, complex objects, RESTful behavior against remote resources, and DOM manipulation, it really started to show its seams. Part of what attracted me to Angular was its similar approach to data binding, but with a stricter separation of concerns between the controller and view layers.

I could continue to wax poetic about our decision, but I'll save that for another post. I just wanted to provide some backstory for this post.

Just the other day I found myself writing a controller that instantiates new child controllers via a repeater...

<div data-ng-controller="ParentController">
    <div data-ng-repeat="childGuid in childrenGuids" data-ng-controller="ChildController"></div>
</div>

This worked fine. Pushing a new item to ParentController's $scope.childrenGuids array would generate a new ChildController. However, I soon discovered that what I was attempting to engineer would require each ChildController to have a unique ID, generated by the ParentController prior to the child's instantiation. I attempted to use templating to my advantage in various ways, for example:

<div data-ng-controller="ParentController">
    <div data-ng-repeat="childGuid in childrenGuids" data-ng-controller="ChildController">
        <input type="hidden" value="{{childGuid}}" />
    </div>
</div>

No dice. Why not try broadcasting from the parent scope after I push something to the stack? I feared race conditions. Why not a directive? It felt like too much code for something that should be simple. After reading the relevant portions of the Angular docs several times over, I threw together a nice little hack that got the job done.

<div data-ng-controller="ParentController">
    <div data-ng-repeat="childGuid in childrenGuids" data-ng-controller="ChildController" data-id="{{childGuid}}"></div>
</div>

var ParentController = function($scope) {
    $scope.childrenGuids = [];

    $scope.addChild = function() {
        var guid = newGuid(); // newGuid() is assumed to be defined elsewhere
        $scope.childrenGuids.push(guid);
    };
};

var ChildController = function($scope, $attrs) {
    $scope.id = '';

    var init = function() {
        //finish initialization
    };

    // data-id is interpolated after the controller is constructed, so wait for it
    $attrs.$observe('id', function(value) {
        if (!$scope.id) { //defensive sanity check
            $scope.id = value;
            init();
        }
    });
};

Essentially, what I'm doing is passing the ID in via the controller declaration. However, because of the order in which Angular digests the markup, interpolation of templated attributes happens after the construction of the controller. Because of that, I had to instruct the framework to do some extra processing when the interpolation event fired and the data-id attribute was updated. After that the controller is free to finish its initialization phase.

Happy with the outcome, I pushed my changes out to my team and made a note to revisit this block when I get around to refactoring this section (it'll happen, don't worry).

EDIT: It happened.

Implementing Custom Alert DialogFragments

So, I've been working on further revisions to my app. The most notable of these revisions involves switching action bar implementations from GreenDroid to ActionBarSherlock. While this will benefit me tremendously as I'll be able to use native Android 3.0+ components when running on those platforms, it means that I need to start using Fragments as well as the rest of the Android Support Package.

[Image: Steve Ballmer, captioned "Refactor Refactor Refactor!"]

I'll be saving much of what I have learned for another post, but I felt like sharing this tidbit right now. One thing that quickly became apparent was that a custom dialog that I was using (with positive/negative buttons) was not going to cut it, as the system dialogs were rendering the button strip differently...

[Image: a dialog, captioned "The button strip isn't displayed correctly"]

Action Bar and You!

So I've been developing an Android application on and off for the past few months. In the latest iteration of the app, I wanted to make some options that were hidden in a menu more discoverable. The accepted way of doing this in Android is through adopting the Action Bar design pattern. As described in the official Android Developers Blog entry on the matter:

The Action bar gives your users onscreen access to the most frequently used actions in your application. We recommend you use this pattern if you want to dedicate screen real estate for common actions. Using this pattern replaces the title bar. It works with the Dashboard, as the upper left portion of the Action bar is where we recommend you place a quick link back to the dashboard or other app home screen.

Awesome, right? A unified UI paradigm for Android. Let me just add the widget to my Activity... umm, uhh... where is it? Nowhere in the Android 2.x SDKs. Instead, Google decided to kill two birds with one stone and encouraged developers to look at the source code for the Google I/O schedule app. This way people could not only learn how to use and implement the pattern, but also see what a well-written Android app is supposed to look like.

That's all well and good, but what about the lazy (er, efficient) among us, who don't want to reinvent the wheel and write the same instrumentation every time we make an Activity? Once again, Google to the rescue. With the advent of library projects, the ADT plugin for Eclipse makes importing standalone/reusable components into your existing project easy. After this feature was made available, the number of rich, third-party libraries for Android skyrocketed. Among these were several that had reusable Action Bar controls. After some experimentation, I settled on GreenDroid.

So, GreenDroid

GreenDroid's claim to fame is the notion that you need to alter very little of your existing code to get attractive, functional UI components. Since my project was mostly complete at the time of implementation, this was a huge plus. And, for the most part, it ended up being true.
