Open a terminal with projectile in emacs
I find myself constantly switching between projects and running scripts from the project root. Here’s a little emacs function and key binding to quickly switch to a reusable ansi-term for the project. Enjoy!
;; Enable Projectile globally
(projectile-global-mode)

(defun projectile-term ()
  "Create an ansi-term at the project root"
  (interactive)
  (let ((root (projectile-project-root))
        (buff-name (concat " [term] " (projectile-project-root))))
    (if (get-buffer buff-name)
        (switch-to-buffer-other-window buff-name)
      (progn
        (split-window-sensibly (selected-window))
        (other-window 1)
        (setq default-directory root)
        (ansi-term (getenv "SHELL"))
        (rename-buffer buff-name t)))))

(global-set-key (kbd "C-x M-t") 'projectile-term)
Setting up a Nix environment for Rails
Reproducible builds are one of those things that drive developers insane when attempting to debug or set up local/staging/production environments. The number of hours I’ve spent tracking down inane differences in packages between servers, or wondering why something that worked perfectly on my laptop causes apocalyptic bit fires in staging/prod, is too damn high. There must be a better way.
I’ve gone through many tools attempting to solve problems related to reproducibility including Chef, Puppet, Ansible, and Vagrant. Most of these tools aim to solve automated deployment and address reproducibility orthogonally. If all servers are in the same state and the state of the world has not changed, you can rely on a deterministic outcome.
Both Docker and Nix directly address reproducibility, but in different ways. Docker creates jails for processes to run in and images to use to replicate across many machines. Nix solves the problem in a different way, by building declaratively from immutable packages so that builds are “stateless”. Nix can be used along with Docker to distribute images or by sharing the Nix expression that builds the environment. In my time spent in the world of functional programming with Clojure, the approach of immutability and functional purity to solve state problems (environments are state) strongly appeals to me.
Trying out Nix
You don’t need to use a separate operating system (NixOS) to take advantage of what Nix has to offer. Instead we will use nix-env and nix-shell to create a local development environment that is completely isolated and, hopefully, reproducible anywhere that Nix runs. Here is a simple Nix expression for configuring an isolated Ruby environment.
with (import <nixpkgs> {});

stdenv.mkDerivation {
  name = "MyProject";
  version = "0.0.1";
  buildInputs = [
    stdenv
    git
    # Ruby deps
    ruby
    bundler
  ];
}
Now run nix-shell in the same directory as the file above to build the environment.
Ruby gems
After setting up the environment you’ll notice that you can’t install gems using bundler (write permission error). Even if you could, you would have a dirty build: part of the build would use the Nix approach and the rest would be built however each gem author decided to package and install their gems. Luckily, we can use bundix to generate Nix expressions that install gems with the same guarantees as the rest of Nix.
Installing gems with Bundix
Update the Gemfile with your deps.
source 'https://rubygems.org'

# OAuth
gem 'rails'
Now run bundix inside the shell to generate a gemset.nix file, which contains automatically generated Nix expressions for installing the gems.
nix-shell --run "bundler lock && bundix -d"
Note: On OSX you may get bitten by file encoding problems which will result in a gemset.nix missing the sources key. See https://github.com/manveru/bundix/issues/8
Update the Nix config to use bundlerEnv, Nix’s built in method for installing gems, and require it in our build.
let
  rubyenv = bundlerEnv {
    name = "rb";
    # Setup for ruby gems using bundix generated gemset.nix
    inherit ruby;
    gemfile = ./Gemfile;
    lockfile = ./Gemfile.lock;
    gemset = ./gemset.nix;
  };
in stdenv.mkDerivation {
  ...
}
Native libraries
Some gems will depend on native libraries being present in order to be installed properly. For rails that means we need to add the following build deps:
clang libxml2 libxslt readline sqlite openssl
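Putting those together with the earlier derivation, the buildInputs might end up looking roughly like this (a sketch; rubyenv is the bundlerEnv from above, and exact package names can vary between nixpkgs versions):

buildInputs = [
  stdenv
  git
  # Ruby deps
  ruby
  bundler
  rubyenv
  # Native libs needed to compile gems like nokogiri and sqlite3
  clang
  libxml2
  libxslt
  readline
  sqlite
  openssl
];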
Note: On OSX you may get bitten by another bug where compiling native libs fails since Nix will use your host system to compile them rather than your environment :(. The most likely solution is xcode-select --install.
Putting it all together
You can find the working Nix rails environment config here https://gist.github.com/alexkehayias/f5538193a40d04c48f872bdad505b740. You should be able to run nix-shell in a directory with all of the files from the gist and get a working environment with rails installed.
Summary
Overall the process was painful due to a steep learning curve with the Nix expression language and working outside of bundler to get the ruby environment set up. The OSX issues in compiling native libs and the file encoding problems with bundix give me less confidence that I could port the Nix config unchanged to other developer machines without also using NixOS to get OS-level reproducibility. Still, it feels like the most correct approach to solving the problem and hopefully nicer solutions will evolve for dealing with language specific libraries.
Super simple API mocking for frontend development
Single page web applications tend to need a lot of mocked out data from an API to make local development possible. This leads us to manually mocking out each endpoint separately, creating dummy data for the response, and wiring up the local development server. It often takes a lot of work to create realistic looking data and even more work to maintain it as we iterate.
Here’s something I’ve used before to mock out APIs for the frontend when another team is hacking away at the API server. I’ve found this strategy makes it easy to change, fix bugs, and work with real data locally. You simply save API responses to file in a well structured directory tree and that’s it. No rewriting routes, no manual changes to fields, no mess!
Mock API Server
Here’s a simple Clojure function and Compojure route that will look through our project resource directory and return a 200 json response:
(require '[clojure.java.io :as io])
(require '[compojure.core :refer [defroutes GET]])

(defn mock-api-response [prefix-path uri]
  (let [resource-path (format "%s/%s/index.json" prefix-path uri)]
    {:status 200
     :headers {"Content-Type" "application/json"
               "Access-Control-Allow-Origin" "*"}
     :body (slurp (io/resource resource-path))}))

(defroutes routes
  (GET "/mock/:uri{[\\/\\w\\d]+}" [uri]
    (mock-api-response "mock_api" uri)))
Your entire API will now be mocked out, prefixed by /mock/, without having to manually specify all the routes and responses. Responses are automatically generated from json files as we’ll see next.
Adding endpoints
To add new endpoints and resources, create a directory to match the resource name and save a json file (maybe from prod endpoints). The only rule is that anything at the resource root should be named index.json
Here’s what the directory structure looks like that will mock /things/ and /things/{id}/attributes:
|mock_api/
|--things/
|----1/
|------index.json
|------attributes/
|--------index.json
|----2/
|------index.json
|------attributes/
|--------index.json
Now we can get a well formed response for our frontend, for example, by hitting http://{hostname}/mock/things/1 and http://{hostname}/mock/things/2/attributes. Cool!
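If you don't already have a dev server wired up, here is a minimal sketch of serving these routes locally, assuming ring's Jetty adapter is on the classpath:

(require '[ring.adapter.jetty :refer [run-jetty]])

;; Serve the mock API on http://localhost:3000/mock/...
(def server (run-jetty routes {:port 3000 :join? false}))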
Limitations
Does not handle responses other than 200 so you won’t be able to test the unhappy path of API calls that fail
Query params are ignored and the same response will be returned regardless of changing params
Does not handle auth and this may be something you want to test within your application
Summary
In a few lines of Clojure and a simple directory structure for storing our mocked data, we can produce a mock API server from static files that’s easy to change. It’s not without its limitations, but I’ve gotten a lot of mileage out of this strategy when building out single page apps. Let me know how you’ve solved this before!
Building the Ergodox Keyboard
I recently built the Ergodox keyboard from the last Massdrop in November 2014 and figured I would share the things I wished I had known before building it and some initial impressions.
The Build
As a first time solderer this may not have been the best project to learn on, but I made it through with a fully functioning keyboard and the gratification of a job done well enough to be ok. There are two sets of instructions that I followed. The first is the instructions from Massdrop and the other was a helpful, high definition video of someone putting it together. Although there was no narration in the video, I found it to be much clearer when it came to the details of how to actually complete the steps outlined in the Massdrop instructions.
Practice soldering before you build
It's a good idea to get some practice soldering if this is your first time. My dad had an old board laying around and I practiced soldering resistors into tight spaces. This proved to be useful; when it came time to do it for real I was not worried about messing up and could focus on doing it well.
Which way is up?
What the instructions gloss over is that both PCB boards are identical. I thought I had put all the resistors and the Teensy board on the wrong side because the labels "Right Hand Side" and "Left Hand Side" exist on both halves. Follow the pictures exactly, using them as reference for each step. Don't try to use the aforementioned labels because it will get confusing and lead to mistakes.
Surface mount diodes are tiny
They are required on each key and must be facing the correct direction or the key will not work. You need to use a very small amount of solder or you will end up shorting the key by filling in the hole next to where the diode is supposed to go. If you look at the helpful video mentioned above you will see that they are using some sort of conductive glue instead of hand soldering each point. I did not have such fine tools so I had to solder them by hand. The technique I used was to first put a small amount of solder on one contact point only. Then grab the diode with tweezers and place it on top of the solder and use the iron to melt it into the spot. Release the iron first, leaving the tweezers, and then release the tweezers. It will be held in place for you to solder the other side. This technique was very effective and I only had two keys I had to redo. In retrospect though, I wish I had some conductive glue as this was super time intensive and panic inducing. 
The USB Cable Dance 
Nowhere in the instructions do they tell you how to gut a USB cable. I don't find it to be common knowledge so I had to be a little creative in doing it. In the video, the guy uses some sort of drill to cut through the rubber of the cable and de-shell it. I did not have such nice tools, nor did I trust a drill bit right next to my fingers, so I opted for a different technique. First, I took a box cutter and, keeping my hands clear, pressed the blade into the fat part of the connector down the middle. I did that on both sides and used my fingers to peel it away from the metal. The rubber went further in so I had to cut a bit around it, but I was able to remove it. Then I used the razor to shave away the rubber around the wire in the same way you would peel a carrot. Eventually you get to the mesh and can peel it away with your fingers to reveal the wires inside.
Once you have the internal wires you then need to strip them. They are extremely thin so it is very easy to accidentally go right through it. I ended up using a wire cutter to strip it, but it took several tries. Once the wires are exposed then you have to somehow get it through the correct holes, hold it there, and solder it. Best advice I can give there is to leave yourself enough wire length to mess up and try multiple times.
Correcting mistakes
Soldering mistakes can be fixed with a desoldering pump or desoldering braid. I had neither of these things. You can usually reflow the solder enough to correct the mistakes. An obvious problem is any place where the solder touches another joint, connecting two points that you did not intend. What I ended up doing was taking the iron and drawing a line across the problem area to separate both points and prevent a short, then reflowing so the solder attached to the iron and wiping it off to remove any excess. Your results may vary...
When it comes time to plug the keyboard in to make sure it works, try out each key individually and mark which ones are not working. More than likely it is a connection that you should reflow or a diode on a key that is not connecting. I had two keys where the diodes were not touching somehow, which was fixed by reflowing to make a better connection.
Keyboard impressions
My initial impressions of the keyboard have been positive and I am especially enjoying the firmware customization via tmk_keyboard. So far my favorite part is being able to toggle a mouse and use that instead of taking my hands off of the keys. You can easily customize the layout and position of the keyboard to fit your needs (albeit with some tinkering).
The split design allows for a more comfortable arm/hand position and I'm slowly getting used to the matrix key layout. The thumb cluster inner keys don't seem to be in a convenient place so I'm not sure how useful they will be. As an emacs user, I'm hoping to gain some productivity from placing ctrl and meta in the thumbs. Also mechanical keys are clearly a winner compared to the Apple chicklet keys after a few days of use. 
Overall, I have never been this excited for a keyboard and it mainly appeals to my desire to customize and tinker. If you're not into that and have no emotional attachment to building things then there is no way it will be enjoyable. I write software so I know the pleasure that can come from pain :-)
Simple centered text mode in Emacs
Here's a dead simple Emacs mode for centering text in the middle of the visible buffer area (what Emacs calls a window).
(defun center-text ()
  "Center the text in the middle of the buffer. Works best in full screen"
  (interactive)
  (set-window-margins
   (car (get-buffer-window-list (current-buffer) nil t))
   (/ (window-width) 4)
   (/ (window-width) 4)))

(defun center-text-clear ()
  (interactive)
  (set-window-margins
   (car (get-buffer-window-list (current-buffer) nil t))
   nil
   nil))

(setq centered nil)

(defun center-text-mode ()
  (interactive)
  (if centered
      (progn (center-text-clear)
             (setq centered nil))
    (progn (center-text)
           (setq centered t))))

(define-key global-map (kbd "C-c M-t") 'center-text-mode)
Now hit C-c M-t to toggle centering the text in the middle of the window. That's it!
Circle based collision detection in Clojure/ClojureScript
Imagine there are two entities on a two dimensional plane. Let's call them the Player and the Monster. Let's assume both the player and monster have some "mass" in that they are not a single pixel on the screen. The widest part of their mass is what we need to check if they are in contact with one another (if we bump together it's not like our innards touch). How can we tell if some oddly shaped entities are touching in my game?
We need to reduce the problem by reducing the complexity of the entities' shape. Let's imagine a circle is drawn around these two entities that we will use to represent their mass. To determine if the monster and player are colliding we simply need to determine if two circles are touching on every run through the game loop. This is extremely practical and fast.
(defn exp
  "Raise x to the exponent of n"
  [x n]
  (reduce * (repeat n x)))

(defn collision?
  "Basic circle collision detection. Returns true if circles 1 and 2
   are colliding. Two circles are colliding if the distance between
   the center points is less than the sum of the radii."
  [x1 y1 r1 x2 y2 r2]
  (<= (+ (exp (- x2 x1) 2)
         (exp (- y2 y1) 2))
      (exp (+ r1 r2) 2)))
The x and y variables represent the entities' location on the two dimensional plane and the radius (r) determines the size of the circle. When we compare the two using a simple formula, we get a boolean that can be checked for things like damage, movement, etc. Pretty fast and simple!
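For example, with made-up coordinates at the REPL:

;; Centers are 6 apart and the radii sum to 8, so the circles overlap
(collision? 10 10 5 16 10 3)
;; => true

;; Centers are 20 apart, well beyond the radii sum of 8
(collision? 10 10 5 30 10 3)
;; => false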
Check out this SO answer for more.
Entity Component Model in ClojureScript
I'm working on a game engine using ClojureScript and Pixijs for rendering. Object oriented anything doesn't mesh well with Clojure (and ClojureScript), so modeling our game world using objects and inheritance is out. The entity component model, while object oriented, is actually a great fit for a ClojureScript game. Using records and protocols we can create an expressive game engine with all the benefits of immutable data and coordinated state changes. It's super easy to mix and match components to create new elements of a game.
A Simple Entity Component System
There are 3 parts we need to create: entities (defrecords), components (defprotocols), and systems (a vector of functions). Here is an example of a few simple components and their implementation.
(defprotocol Moveable
  (move [this state] "Move the entity"))

(defprotocol Attackable
  (attack [this state] "Attack the entity"))

(defrecord Monster [id sprite x y]
  Moveable
  (move [this state]
    (assoc this :x 0 :y 0))
  Attackable
  (attack [this state]
    (js/console.log "Ouch!")))

(defrecord Player [id sprite x y]
  Moveable
  (move [this state]
    (assoc this :x 0 :y 0)))
By building to an interface (protocols) we can mix and match components. Say we want to make a Tree entity that's attackable, but can't move. Simple: just implement the Attackable protocol and not the Moveable protocol. No overwriting class methods or mixins or tracing where a method comes from. Protocols have no implementation details, just the names and the arguments. Records are immutable and can implement protocols. They can be used with all the functions that work on hashmaps! We get the flexibility of an object with the best part of Clojure, functional manipulation of core data structures.
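Here is a minimal sketch of that Tree example (the console log is just a stand-in for real damage handling):

;; A tree can be attacked but never moves, so it only implements Attackable
(defrecord Tree [id sprite x y]
  Attackable
  (attack [this state]
    (js/console.log "The tree shudders")))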
Systems
We want to keep all the components that work on our entities in a vector of functions we'll call systems. They will be iterated over on each trip through the game loop. Examples of systems include moving, attacking, AI, etc. Each system function will take the game state, apply a function (a component method) to each relevant entity, and return the updated game state. Since we are using records and we are updating state, we need to ensure that any component methods that result in state changes return a new record. Here's an example.
(def game-state (atom {:entities [player enemy]}))

(defn movement-system [state]
  (assoc state :entities
         (map #(if (satisfies? Moveable %) (move % state) %)
              (:entities state))))

(defn update-game-state [state]
  (-> state
      movement-system
      collision-system
      render-system))

(defn update-game []
  (swap! game-state update-game-state))
We've kept the systems functional (they're just functions that take a hashmap) and we've encapsulated the logic of how to move/collide/render into records. 
Conclusion
Building an entity component model was straightforward to implement in ClojureScript with some added benefits. All the state changes are coordinated and have a clear update path through systems. Records and protocols allow us to extend our game and create different behaviors that we can mix and match easily without the nightmares of inheritance. Even better, we have maintained immutability throughout and remained close to core data structures thanks to ClojureScript!
You can take a look at a less trivial example in my WIP project, The Chocolatier. (still really early!)
Batching Writes in a Storm Topology
A common use case for Storm is to process some data by transforming streams of tuples. At Shareablee, we use Storm for collecting data from various APIs, and one of our needs is to write denormalized data to Amazon S3 so that it can be processed later using Hadoop. If we stored each tuple individually we would end up with a huge number of tiny files (which would take forever in Hadoop). How do you accumulate data (state!) so you can batch the writes to S3 in a stream processing framework like Storm?
Coming to terms with state
In general, keeping your topologies free from state will save you lots of grief. The key to stateful bolts is to remember the following properties of Storm bolts:
Prepared bolts allow you to maintain state for a bolt
Prepared bolts process one tuple at a time
They can be tried more than once
We can safely accumulate state inside a bolt without worrying that some other process will change it right out from under us. State is local to the process and since it will only process one tuple at a time, it is safe to change the state within the bolt. 
Since bolts can be retried (see the guaranteed message processing in the Storm wiki), we need to model our problem in a way that is safe to do more than once. Since our problem has side effects (we want to write to files and S3), anything that relies on this data must know that there may be duplicates. This was not an issue for our use case (we remove duplicates on the map-reduce side), but you may want to stick to writing to transactional databases if it is.
Tick tock clock spout
A simple way to model our problem is to create a spout that emits every n seconds. We assume that in the time between ticks we will be accumulating data. The accumulator bolt will be listening to the clock stream and perform the writes (in batch) only when it receives a tick from the clock spout. Here's a simple clock spout that can be configured in the Storm config:
;; Interval is set in the topology config clock.interval.seconds
;; In the event of a failure, waits until the interval before emitting
(defspout clock-spout ["timestamp"]
  [conf context collector]
  (let [interval (* 1000 (Integer. (get conf "clock.interval.seconds")))]
    (spout
     (nextTuple []
       (Thread/sleep interval)
       (log-message "Tick tock")
       (emit-spout! collector [(System/currentTimeMillis)]))
     (ack [id]))))
It does a simple task, which is to emit the current time every n seconds. By default it waits before emitting, so in the event of a failure you don't send premature ticks.
Accumulator bolt for batch writes
The accumulator bolt that will batch writes will perform two tasks; accumulate data and export it periodically. It will be a prepared bolt (stateful) that accepts two different streams (you must specify this when declaring the topology). Here's a skeleton:
(defn gen-tmp-file
  "Generates a temporary file with the given prefix and suffix.
   Prefix should have a trailing underscore. Returns a file path.
   Example: (gen-tmp-file \"my_data_\" \".csv\")"
  [prefix suffix]
  (let [tmp-file (java.io.File/createTempFile prefix suffix)]
    (.getAbsolutePath tmp-file)))

(defbolt accumulate-data ["f1" "f2"] {:prepare true}
  [conf context collector]
  ;; State is captured in a single atom, adjust as you need
  (let [state (atom {:last-tick (System/currentTimeMillis)
                     :tmp-file (gen-tmp-file "my_tmp_" ".csv")})]
    (bolt
     (execute [tuple]
       ;; This bolt can take two kinds of tuples: a timestamp from the
       ;; clock stream or a data tuple to accumulate. We can assume that
       ;; the IO is safe as each bolt is a separate process and will not
       ;; run more than 1 tuple at a time with the given state.
       (if (:timestamp tuple)
         (do
           (log-message "Received clock tick: " (:timestamp tuple))
           (log-message "Last tick: " (:last-tick @state))
           ;; Store the tick time in case you want to check how long it has
           ;; been since the last write before deciding whether to write out
           (swap! state assoc :last-tick (:timestamp tuple))
           ;; Do something with your accumulated file, then start a fresh one
           (output-tmp-file (:tmp-file @state))
           (swap! state assoc :tmp-file (gen-tmp-file "my_tmp_" ".csv")))
         ;; If this is not a tick from the clock spout it's data to accumulate
         ;; (assumes the data stream declares a "content" field)
         (do
           (log-message "Appending to file " (:tmp-file @state))
           (spit (:tmp-file @state) (:content tuple) :append true)))
       (ack! collector tuple)))))
Accumulating the data by appending it to temp files allows us to keep the data on disk rather than in memory. All we are storing in the state is the temporary file's path. We use the JVM's built in utilities to generate unique temporary file names to avoid collisions with other processes that may be accumulating to file. The tick is merely a signal to perform the export of the accumulated data. The storing of the last tick timestamp is not necessary in this simplified example, but would be useful if you wanted to only write if the file gets large enough or a certain threshold of time has passed.
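For reference, wiring both streams into the accumulator with the Clojure DSL could look roughly like this (data-spout and the component names are placeholders, and the groupings should match your own topology):

(defn mk-topology []
  (topology
   ;; Spouts: the clock plus whatever produces your data tuples
   {"clock" (spout-spec clock-spout)
    "data"  (spout-spec data-spout)}
   ;; Every accumulator task needs to hear the tick, hence the :all grouping
   {"accumulator" (bolt-spec {"clock" :all
                              "data"  :shuffle}
                             accumulate-data
                             :p 2)}))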
Wrapping up
Using this method you can easily parallelize the accumulation and batch writing of data for your topology. Since we are using a clock, we are guaranteed to have all batch writes performed after n seconds, where n is the interval from the clock spout. If we were to decide when to export based on the file size alone, we could get into a situation where data sits there forever.
We used files to accumulate data, but that may not make sense for your situation. If this were to batch writes to a database I would be more inclined to store the data in memory (in the state atom) and have a shorter interval from the clock spout so we don't use up too much memory. Writing to files can be error prone (what if you forget a newline!) and keeping the data structures in memory is much simpler. 
In the event of a failure, the tuple could be tried again (for the clock or the data stream) so keep that in mind when modeling your problem. It is also possible that a failure that crashes the topology (Storm has a fail fast design) will lose any data that is accumulated in memory. Writing to temporary files could potentially allow you to recover from that, but I haven't needed to design around that.
Dead simple feature flags w/Django
There are a lot of great libraries and tools for implementing feature flags. The concept is showing or hiding features depending on the user. Maybe you want to beta test a feature with a select group of users or only provide it for a certain tier of customers. Here's a dead simple way of implementing this out of the box in Django: Groups.
Each User object has an m2m field called groups. Groups are meant to be permission groups. Assigning a user to a group means they inherit all of its permissions. Forget permissions. We just want to check membership in the group.
Simply create a new group with no permissions and name it whatever you want. Check if the user is in the group to determine whether to show or hide a particular feature. We can do all this in a single DB call that safely defaults to False and monkey patch it onto the User object:
def user_features(user):
    groups = user.groups.values()
    feature1 = filter(lambda x: x['name'] == "Feature 1", groups) or False
    feature2 = filter(lambda x: x['name'] == "Feature 2", groups) or False
    return dict(feature1=feature1, feature2=feature2)

User.features = property(user_features)
Now we have a dict that we can easily grab from a django template via the user property (request.user.features) or as a serialized object that can live on the frontend.
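In a template that might look something like this (feature1 being the hypothetical flag from the snippet above):

{% if user.features.feature1 %}
  <a href="/beta/">Try the beta feature</a>
{% endif %}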
Using this method I've been able to build up features with beta users or our internal team without exposing them to everyone else. It's dead simple and an admin can easily add or remove the feature for a user without ever touching code.
Understanding the #' reader macro in Clojure
Let's say you have a function, foo, used by something in an event loop, bar, running in the background. The problem arises when reloading foo doesn't automatically change what's going on in bar. You would need to reload both definitions for the changes to take effect. In the case of an event loop, you would need to stop it and start it again. Can't we just "inject" the new form right in there without restarting?
Thankfully, there is a special form that will return the var object rather than the value. It's called var and has a handy reader macro #'. This means that reloading foo will automatically use the new version inside bar! No need to remember to reload in order, start, stop, etc. Using this effectively can make building up and growing your app in the repl simple.
Here's a trivial example you can run in the repl to see the difference. 
(defn my-fn []
  (Thread/sleep 200)
  (println "hi"))

(def my-loop
  (future (doall (repeatedly my-fn))))
Now try redefining just the function my-fn to print out "hi again". You'll see that it changes nothing.
Cancel the loop with (future-cancel my-loop) and let's redefine it to use the var my-fn using the #' macro. 
(def my-loop
  (future (doall (repeatedly #'my-fn))))
You can now freely reload my-fn and the changes will be immediately reflected in my-loop without reloading my-loop or restarting it.
(defn my-fn []
  (Thread/sleep 200)
  (println "yo"))
Hopefully this prevents some REPL gotchas that you may experience when it comes to passing around functions and reloading code.
Python vs Clojure, Why programming with values is more intuitive
What would you expect the results of the following code to be?
init_data = {"x": 1, "y": 1}
results = {i: init_data for i in ["a", "b"]}
results['a']['x'] = 50
If you're familiar with Python, you would realize that init_data is a reference to a mutable data structure, a dict. When changing the value of the reference, anything else that is referencing it will also change. Thus the end result is:
{'a': {'x': 50, 'y': 1}, 'b': {'x': 50, 'y': 1}}
Let's look a a similar example in Clojure. What would you expect the results of the following code to be?
(let [init_data {"x" 1 "y" 1}
      result (reduce into {} (for [i ["a" "b"]] {i init_data}))]
  (assoc-in result ["a" "x"] 50))
If you're familiar with Clojure, you would realize that init_data is an immutable data structure and its value is always the same. Even the assoc-in on the result is not modifying the result, but returning a new value. The end result is:
{"a" {"x" 50, "y" 1}, "b" {"x" 1, "y" 1}}
Because I end up switching back and forth between Clojure and Python, I sometimes get bitten by the fact that a dict is a reference to a mutable object. In Clojure, I know that I am always using values, which are immutable and always return the same thing. While this was a trivial example, imagine what happens when complexity increases and somewhere in your code you are modifying a shared reference when you think you are working with a value.
I find it more intuitive that, by default, calling init_data should give me the same result every time (in Python you would have to use dict.copy, which has its own nuances when it comes to references vs values). Why would I expect something I defined to suddenly change? Especially in this case, because it is indirectly being changed. I never changed the "b" key in the results so why should it now be different?
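For example, dict.copy only protects the top level, so nested structures are still shared references:

init_data = {"x": 1, "y": {"z": 2}}
copied = init_data.copy()

copied["x"] = 50        # top-level keys are now independent
copied["y"]["z"] = 50   # but the nested dict is still the same object

print(init_data)  # {'x': 1, 'y': {'z': 50}}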
When jQuery gets slow
jQuery can grind the browser to a halt when used inappropriately. All the convenience we gain from using jQuery to manipulate the dom can get lost in an innocuous looking, performance destroying for loop. 
Imagine you need to add a list of items dynamically to the dom. You might try something like this:
$.each(Array(5000), function(x){$("#mylist").append('<li>item ' + x + '</li>')});
Try it: http://jsfiddle.net/alexkehayias/veYHt/
What you'll notice is how slow the rendering to the dom actually takes. Each trip through the loop updates the dom which is expensive. A simple optimization would be to build up all the updates into a string and make a single dom update:
var accum = "";
$.each(Array(5000), function(x){
  accum += '<li>item ' + x + '</li>';
});
$("#mylist").html(accum);
Try it: http://jsfiddle.net/alexkehayias/tNAYn/1/
This is a trivial example, but imagine you were making many dom updates inside of a loop. Moral of the story is to bulk perform all of your updates especially if it lends itself to accumulation like this example does.
For a non trivial example check out what's going on in select2, a popular library for making presentable multi select fields. A simple modification to build up all the changes in a single dom update shaves the rendering of 200 items from 21 seconds (on my browser) to 1 second. That's an optimization! 
Check out the code here: https://github.com/alexkehayias/select2/compare/ivaynberg:3.4.2...bulk_render_multi_select_choices
Guiding Principles for a Growing Codebase
After working on some large apps I've come up with some guiding principles that are helping me take my programming to the next level.
Toggle between synchronous/asynchronous
Whether you are writing concurrent or async code or using distributed workers, it should be easy to turn asynchronous code into synchronous code. I find myself making things asynchronous after I have written them synchronously. A process is much easier to reason about end to end when it's synchronous, and production ready when it can be run asynchronously.

Profiling as a first class citizen
I haven't found the answer to this yet, but having tooling to easily profile your code means you will actually profile your code. The usual pitfalls of premature optimization apply, but it can be hard to unravel what is slow if you don't have the tooling.

Easy to test
Tests are great when they can be written and run easily. No one runs tests that are hard to run. Similarly, if your code is hard to test you probably won't jump to maintain the tests.

Seamless local vs production environments
Spending time wrangling settings based on your environment is a waste. Make it as easy as possible to not have to deal with configuration all the time between your local, staging, and production environments. You shouldn't need to worry if your app is suddenly going to blast real emails out or overwrite data in production.

Real data sets during local development
It should be easy to have mocked or real data while developing locally. Data you use locally should be representative of what is actually there so there are fewer surprises later.

It's always about moving forward
If you're not first you're last (Talladega Nights reference). Everything should be about making the end result of what you're building better. The world doesn't care how incredibly clever your codebase is. You have to always put yourself (and your codebase) in a position to succeed.
What's wrong with Dotcloud and why we moved to EC2
I've been using Dotcloud for the past 2 years, so I have a large body of experience to qualify what I'm about to say. As a single freelance developer and startup hacker, I was immediately drawn to platforms like Dotcloud and Heroku. These PaaS offerings promise to make it super easy to deploy and scale your application. Just like most things, it's great until it isn't.
The Good
Early days, building my side projects and getting them up was a perfect fit for Dotcloud. I couldn't care less about system administration, and anything that helped me deploy faster was where I focused. You pay a slight penalty for building your app according to the stack available (on Dotcloud the options are plentiful), but the good CLI tool meant deploying and pushing new code was fast. Adding additional servers to scale horizontally was 4 or 5 words away in the command line. The Dotcloud team has always been nice to me and I appreciate everything they've done for helping me.
The Bad
The platform that promises to help you scale doesn't scale. Shareablee is growing at a crazy pace and early days we were on Dotcloud for the same reasons I listed above. Velocity of iterations is the most important part of any startup. The bad parts happen when you move beyond the average needs of a Paas user. 
Poor database performance: we were using Postgres with a very high write volume as we pull down a lot of data from various social platform APIs. We were also aggregating certain metrics in Postgres which led to a constant battle with Dotcloud to get the IO performance we needed. We were moved many times to new containers with less activity and were even moved to dedicated SSD drives (via EBS). At one point it was at a breaking point and our database would go down every single day at least once. Moving our DB directly to EC2 resolved all of these problems and I have yet to experience any of the "IO" issues Dotcloud said we were having. 
You have neighbors within neighbors: Dotcloud and Heroku are essentially virtualized containers within virtualized containers. They are built off of EC2 and inherit all related problems. These include inconsistent EBS volume performance, periods of intermittent high latency during peak times and potentially "noisy neighbors" who may be consuming a larger chunk of the virtualized hardware. Compound this when you introduce a Paas like Dotcloud. There were wild inconsistencies in latency and complete outages in which an entire container of users would go down. Your neighbors have neighbors within them and there is nothing you can do about it.
Horizontally scaling means nothing on Dotcloud: There is an insistence on horizontally scaling your infrastructure. This is a fantastic idea for making things durable, but the fatal flaw of Dotcloud is that the platform itself (usually the container we were in) would go down much more often than any single server failure within your infrastructure. If your horizontally scaled instances are in the same virtualized container and the container goes down, guess what happens? Everything goes down. You can't choose what container your instances are in. They are much more likely to go down than you are. Just look at https://status.dotcloud.com/ if you don't believe me. Every time they restart a "live host" every single user in that container will go down temporarily.
No support SLA: You're at the mercy of their mail queue if you are no one of importance. Everyone who has seriously built anything large on Heroku will tell you the same thing. If your app goes down you need an answer right away. You can't just sit there being down for two hours because no one is answering their emails. Only after some severe outages did we catch the attention of Dotcloud and get on priority support, in which a support email would actually have an SLA. Worse yet, you have to pay for them to actively monitor and proactively take action on incidents.
Expensive: Compared to any other hosting, Dotcloud and Heroku are ridiculously expensive. When you are small it's worth the premium to get easy deploys and management of infrastructure, but as soon as you need instances with more memory, you'll be paying out the nose. Worse yet, there is almost no savings while you scale the number of instances or memory needs.
Routing to your instance can also go down: In order to get traffic to your virtualized instance within a container it has to go through a router/proxy. This can also go down. This can also slow down. It's another point of failure you can do nothing about.
Move off of Dotcloud/Heroku if you actually need to scale
Scaling is a nice problem to have, but as soon as you need to, I would recommend moving off of Dotcloud or Heroku. We decided to move directly to EC2, which is much less scary than you think. The gains in performance and flexibility have been amazing by comparison. We have over 20-25 servers up at any one time. Tools like Ansible, Fab, AMIs, etc. make deployments just as easy as you get from a Dotcloud or Heroku CLI. Bite the bullet, bring on someone who has the chops to get it done with the urgency that a startup demands. You will be much better off and more empowered to adjust your setup to scale according to your special needs. I've never met a non-trivial web app that hasn't needed to customize its setup to scale.
Lastly, I do want to thank Dotcloud for all of their help. Their support team is super awesome; it's just unfortunate we were a bad fit for the platform as we were growing so fast. I'm not trying to badmouth them, just trying to make everyone aware of what happens when you go beyond the sweet spot for these services.
Importing Postgres Data Into Hadoop HDFS
Recently I've been getting a crash course in big data. I ran into an issue where the analytics that used a certain table on our Postgres database was so large it would not run. Like to the point where it would grind a server with 8GB of memory to a near halt on a single query. While I had gone through several optimizations, it still was not performant enough at the scale needed. In fact it caused havoc with our host due to IO issues. We're talking 50 million rows or more that need to be analyzed in a scalable way.
Enter Hadoop
I'm sure Hadoop needs no introduction for most, but the great benefit is a scalable way to take massive amounts of data and answer the questions you have.
How do you get existing data into Hadoop?
I had a ton of data on a Postgres server that I needed to get into HDFS. I needed a way to dump the data into a usable format so that I could use it properly with Hadoop. Luckily there is an Apache project called Sqoop (yes, Sqoop Hadoop is just an amazing combination of words).
Running Sqoop Jobs
Sqoop requires that you have Java and Hadoop installed on the machine that is going to run the job. The great news is that it doesn't have to be on the same machine as your database server. You can run it from any machine, but you typically will run it from a Hadoop cluster. That means it doesn't matter where your server lives or if your host doesn't allow you to install your own packages. You also don't have to dump the data to the machine that the Sqoop job will be running on. In my case I wanted all my data in S3 as a super portable replacement for HDFS.
Unfortunately, it took a lot of trial and error to get it working. Make sure that you install the proper JDBC driver by putting it on your java classpath or put it in the sqoop directory (usually in /usr/bin/sqoop).
Here's a local example using Postgres:
sqoop import --connect jdbc:postgresql://localhost:5432/mydb \
  --table my_table \
  --username myusername \
  --password mypassword \
  --direct
This will dump the table my_table on a local postgres server. The --direct part is a speedup that's unique to Postgres and MySql connectors for sqoop.
To dump data directly to S3:
sqoop import --connect jdbc:postgresql://localhost:5432/mydb \
  --username myusername \
  --password mypassword \
  --table my_table \
  --target-dir s3n://MYS3APIKEY:MYS3SECRETKEY@bucketname/folder/ \
  --hive-drop-import-delims \
  --escaped-by "\\"
Some notes to help your blood pressure:
Make sure your bucket doesn't have an underscore in it. For some reason I wasn't able to get it to work if there was an underscore in the bucket.
Make sure you use trailing slashes. This should help your blood pressure.
--direct doesn't work with --escaped-by
If you have freeform text fields in your data and there is \n it will fuck up your Hadoop job if you try to read new lines of data by splitting on \n. Use the --hive-drop-import-delims to parse those out. I don't know why it works, this has nothing to do with Hive, it just cleans up your output.
--escaped-by should always be used if you have freeform text in your data. This will help you parse the data when you run a job because you can ignore important characters (such as a comma) if they are escaped, letting you split data into fields more easily.
Appending Data
What happens if we need to get updated data off of our database? HDFS doesn't really do an "update" to existing data, but we can append all the new rows added past a certain point. To do that we can use the --incremental argument.
sqoop import --connect jdbc:postgresql://localhost:5432/mydb \
  --username myusername \
  --password mypassword \
  --table my_table \
  --incremental append \
  --check-column id \
  --hive-drop-import-delims \
  --escaped-by "\\"
Append Data To S3 Using Sqoop 
I wanted to keep my data independent of the cluster that would run the job. This gives you the flexibility to run your jobs wherever you want, on a cluster or on something like Elastic MapReduce. You have the freedom to build and destroy clusters on a whim because there is no persistent data to lose.
There's no point in appending if you don't know the last thing you appended (which row was last to go in the database). Sqoop has a "job" command that will store the last id (or date) of the last items dumped. It then queries only the rows after that point. Very cool!
Unfortunately (again), I couldn't get this to work by dumping directly to S3 using the --incremental argument so I had to use a workaround. First run the command then use distcp to copy the data to S3. For some reason, incremental appends using S3 as the --warehouse-dir does not work even when specifying a -fs argument.
Hopefully this saves you as much time as I spent figuring it out.
First save the job:
sqoop job --create myjob -- import \
  --connect jdbc:postgresql://myaddress:12345/dbname \
  --table mytable \
  --hive-drop-import-delims \
  --escaped-by "\\" \
  --username myusername \
  --password mypassword \
  --incremental append \
  --check-column id \
  --split-by id
Then run it:
sqoop job --exec myjob
Then copy it in a parallelized way to S3:
hadoop distcp mytable s3n://MY_S3_KEY:MY_S3_SECRET@mybucket/folder/
Boom, data on S3 with no dependency on your cluster living another day! I should note that you will want to use a centralized metastore for saving the last rows run by append. See the sqoop docs for more info.
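As a rough sketch (hostname is a placeholder; the shared metastore is started with the sqoop metastore command and jobs point at it with --meta-connect):

# Start the shared metastore on a long-lived machine (default port 16000)
sqoop metastore &

# Run the saved job from any machine, pointing at the shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --exec myjob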
Django model creation shortcut with kwargs
Let's say you've got a Django model with a lot of fields that you need to create. You've already mapped out the data, but you think you need to write out each argument in the Model.objects.create method. Thankfully, these methods allow **kwargs to be passed in, so you can do this:
# Remember that kwargs is just a dictionary that's mapping
# arguments to values (in this case model fields to values)
MyModel.objects.create(**some_dict)

# Or if we are updating it
MyModel.objects.update(**some_dict)

# Conveniently we could do something like this,
# assuming your field names are mapped exactly to the model names.
# Sometimes you don't want a model form so this works.
MyModel.objects.create(**request.POST)

# Oh, but I have a foreign key! No prob.
MyModel.objects.create(
    fk_field=some_object,
    **some_dict
)
Django Gotcha: Duplicate models when using get_or_create
When you're saving some object to the db, there's a gotcha with DateTimeFields that drove me nuts. It took me way too long to figure this out, but if you have a DateTimeField with the option auto_now_add, it will supersede a datetime object in a Model.objects.get_or_create call. This means that if you are trying to prevent duplicates based on date (say you're saving some data from an API that has a timestamp) you will still get duplicates. It's in the docs, but not the implications when using get_or_create. Check out this example:
# Given this simple model
class Foo(models.Model):
    name = models.CharField(max_length=100)
    date_added = models.DateTimeField(auto_now_add=True)

# This will always be true, even if an instance
# with this name and today's date already exists
bar, created = Foo.objects.get_or_create(
    name='Alex',
    date_added=some_datetime_obj,
)
print created
# >> True

# The problem is, auto_now_add does some stuff that
# makes it uneditable, and fucks up my expectations
# when using it with get_or_create

# Here's the solution
class Foo(models.Model):
    name = models.CharField(max_length=100)
    date_added = models.DateTimeField(default=datetime.today())

bar, created = Foo.objects.get_or_create(
    name='Alex',
    date_added=some_datetime_obj,
)
print created
# >> False