Diving into Haskell with the Haskell Book
A few months back I was motivated to dive back into Haskell and, after surveying the recent landscape, picked up Haskell Programming from First Principles. It's a book that promises a soup-to-nuts approach to Haskell, starting from the mathematical concepts underpinning the language (the lambda calculus) and moving through to a full-blown, production-ready project. I've so far gotten a few chapters in, and, while I haven't learned anything new yet, the authors' approach to the basics -- data types, constructors, etc. -- is conversational and comprehensive, and they do a good job of answering the questions that arise naturally while reading. (I'll ignore the likelihood that these same authors are leading me into these questions, and answering them, to make me feel better about myself while reading. Why? Because it makes me feel better about myself, that's why. Well done, authors!)
Reading through this book, I reflected on my first run-through of Learn You a Haskell For Great Good when it came out (IIRC it was early 2013). My programming career hasn't swerved headlong into functional programming; I'm coding Ruby now instead of the Python I was slinging back then, but those are still pretty apples-to-apples as far as languages go. I liked LYAHFGG a lot -- it's fun and thorough too -- but while set comprehensions and string operations made sense, I wasn't really making the jump from Python to typeclasses. The book that did help me crack into functional programming concepts was Functional Programming in Scala, not LYAHFGG. Here are some reflections on why:
Haskell syntax is very spare, rather like poetry to Scala's prose; FPIS, by virtue of its medium, surrounds novel FP concepts with familiar programming syntax (i.e., extra parens and curly braces). This may not matter to everybody, but it seems to have been a bridge I needed.
Pattern matching, a concept at the core of Haskell's type definitions, doesn't really make sense in languages based on duck typing. Python and Ruby, lacking type constructors, can only go as far as offering some degree of tuple destructuring based on positional assignment. Scala and Haskell are solidly in the world of type-constructor pattern matching, which is a vital concept on the road to understanding typeclasses.
Guards, 'where', and 'let' make sense to anybody with a discrete mathematics background, and expressions are familiar from languages like Lisp and Scheme, but none of these are particularly prevalent in Python. Ruby does have stronger support for expression syntax (i.e., assigning an entire 'if' or 'begin/rescue' statement to a variable).
Some other random notes:
The quicksort algorithm is the go-to for demonstrating Haskell's declarative power, and the quicksort section in LYAHFGG does a great job showing that. Algorithms in Haskell translate cleanly from general computer science; they tend to be composed definitions, not recipes.
I get that (+3) is a partial function adding 3 to whatever is passed in. I understand that it's really (+) 3. But it'll likely never be as clear to me as (\x -> x + 3), because that explicitness makes more sense. Once again, poetry vs prose.
FPIS nailed home the constructor-pattern matching thing in a way that made sense to my Java-related brain.
Working with legacy monolithic Java projects in a module-based system
One of the projects I've been working on recently at work is adapting a legacy Java application to fit into a modern, dynamic, parallelized framework. Legacy design aside, we chose this application for a few reasons:
it's open source and mature;
the primary maintainer is easy to communicate with;
it has a well-established academic basis.
Even so, I don't believe the last point would have been enough if the first two weren't true; even with the most rigorously established tenets, a project based in solid academia can still falter when pushed to its limits in a production system.
As with many established legacy Java projects, this one was written before modularity entered the province of the professional programmer. It happens to be a gigantic monolith of Java code that attempts to provide everything from a secure authentication system, to a GUI for editing the data, to an API interface. As such, it's been a challenge in some cases to incorporate it as an internal component of a broader system, and even more so to integrate it as a redundant component (it's designed to be a single running monolith, an adorable trait from the pre-EC2 days).
As we've worked with the system, we've found a few patterns helpful:
Creating an API wrapper: putting a thin API in front of the main engine has done a huge amount to make this system work for us. The API layer can compensate for some of the more annoying features (e.g., the auth system, which is cumbersome in our fully private setup), and it provides a place to standardize access to the system.
Hooks to reflect the system's internal state changes to other components: we were lucky that this particular system has some support for event hooks. We've taken this ability and combined it with a publishing mechanism -- in our case, a Kafka queue that acts as an RPC stream (more on this in a later post). Combined with our standardized API interface, this has allowed us to offload complex computations to an external system (see my post on Samza and JRuby) and then push the results back into the system.
Database roles, row-level security, and configuring the connection at startup to use the right role (thank you, PostgreSQL): this idea isn't as clear-cut as the previous two. This system is written to behave as though it is the only system in the universe and has sole access to its database, so if we were to scale it up to work in parallel (and we are considering on the order of thousands of parallel instances here), creating a new database for each instance would be a nightmare. Thankfully, PostgreSQL provides role-based, row-level security, which allows a single database to provide isolated access to many different users based on permissions (see the first sketch after this list). Whether you can use this will vary, but when possible it can be a great boon when dealing with many hundreds of instances that don't understand how to share amongst themselves.
Incorporate an external service for cross-system synchronization: also not as hard-and-fast as the first and second items, this can be of great use when trying to figure out when instance A should act and instance B is clear to follow. Since we're already incorporating Apache Samza into our stack, we have ZooKeeper available. ZooKeeper provides some great primitives -- for instance, its notion of ephemeral nodes is a great help for publishing service availability (this is how Kafka does it; see the second sketch below).
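To make the roles-plus-row-level-security idea concrete, here's a minimal sketch using psycopg2. The table, column, role, and policy names are hypothetical, and the policy shown is just one way to express per-instance isolation (PostgreSQL 9.5+), not the actual system's schema:

import psycopg2
from psycopg2.extensions import quote_ident

## One-time setup, run by an administrator: every row carries the role that owns it,
## and the policy hides everyone else's rows. (Hypothetical table/column names.)
SETUP_SQL = """
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
CREATE POLICY per_instance_isolation ON events
    USING (owner_role = current_user);
"""

def connect_as_instance(dsn, role_name):
    """Open a connection and assume the instance's role before handing it back."""
    conn = psycopg2.connect(dsn)
    with conn.cursor() as cur:
        ## SET ROLE takes an identifier, not a bind parameter, so quote it safely.
        cur.execute("SET ROLE %s" % quote_ident(role_name, cur))
    conn.commit()
    return conn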
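And a small sketch of using ZooKeeper ephemeral nodes to advertise instance availability, using the kazoo client library; the paths and payload here are made up for illustration, not what the real system uses:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

## The node disappears automatically if this process dies or loses its session,
## so other components can watch /services/legacy-engine to see who's alive.
zk.create("/services/legacy-engine/instance-0001",
          b"10.0.0.12:8080",
          ephemeral=True,
          makepath=True)

## Anyone curious can list the live instances:
print(zk.get_children("/services/legacy-engine"))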
We arrived at these principles after many months of slow integration. My feeling is that the first and second items are basically required when dealing with a legacy system like this; if you don't have this basic flexibility, then everything else is going to be much harder.
Samza and Jruby, or Streaming Dynamic Typing to the Masses (Pt 1)
PREFACE (AND A SHORTCUT)
If you're impatient, Elias Levy was (to my knowledge) the first to do this and make it publicly available. His port used version 0.9.1 of Samza and JRuby 1.7.23 (equivalent to Ruby 1.9.3). His work provided an invaluable starting point. However, if you'd like to work with a different version of Samza or JRuby, or just generally want a stronger understanding of how JRuby (and other JVM-based languages) can integrate into Samza's build process, read on!
(I'm only going as back-to-basics as the official hello-samza repository from the Samza project. The actual core of the project is elsewhere, but we're going to stick with adapting the officially-sanctioned project baseline.)
INTRO
Samza
Samza is a general-purpose distributed stream processing framework that uses queue-based message passing for communication (by default via Kafka) and guarantees at-least-once delivery of messages. It stubs out pluggable functionality for message serialization/deserialization, metrics aggregation, node-local key-value storage (by default, RocksDB), and more. It's pretty open-ended in how it can be used -- basically, so long as your message can be serialized, you can do whatever you want with it. The only limits on the content or structure of the message are those imposed by the JVM or the subsystems.
On the flip side, this freedom can make implementation an arduous process; since Samza is general by design, it doesn't prescribe how a system should be designed, and thus there are no real tools for topology declaration. That onus lies with the developer; I've personally found it helpful to settle on a basic message-passing structure up front (e.g., action/payload pairs -- a rough sketch follows).
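As an illustration of what I mean by action/payload pairs -- this is a hypothetical convention of my own, not anything Samza prescribes -- a quick sketch in Python:

import json

## Every message names an action and carries a payload; any task can then
## dispatch on the action without caring where the message came from.
def make_message(action, payload):
    return json.dumps({"action": action, "payload": payload})

def dispatch(raw_message, handlers):
    msg = json.loads(raw_message)
    handler = handlers.get(msg["action"])
    if handler is None:
        raise ValueError("no handler for action %r" % msg["action"])
    return handler(msg["payload"])

## hypothetical usage
handlers = {"resize": lambda payload: "resizing image %s" % payload["image_id"]}
print(dispatch(make_message("resize", {"image_id": 42}), handlers))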
JRuby
JRuby is a project to port Ruby to the JVM. Version 0.9.0 was released in 2006, and it has continued since then to be an actively maintained project with a host of contributors. There are two currently maintained release branches: the 1.7.x releases track Ruby 1.9.x, and the 9.x.x.x releases track Ruby 2.2.x. We're going with the latter (9.1.6.0) in this project.
FIRST STEPS
To start, let's pull down the latest version of hello-samza from Github into the local directory jruby-hello-samza:
$ git clone https://github.com/apache/samza-hello-samza jruby-hello-samza
Open pom.xml, the file used to coordinate Maven builds. Let's add the JRuby runtime (jruby-complete) to the <dependencies></dependencies> section:
<dependency>
  <groupId>org.jruby</groupId>
  <artifactId>jruby-complete</artifactId>
  <version>9.1.6.0</version>
</dependency>
Let's also add some plugins for JRuby-to-Java source transpilation and for downloading JRuby gems; add these within the <plugins></plugins> section:
<plugin>
  <groupId>de.saumya.mojo</groupId>
  <artifactId>jruby-maven-plugin</artifactId>
  <version>1.1.5</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>compile</goal>
      </goals>
      <configuration>
        <generateJava>true</generateJava>
        <generatedJavaDirectory>${jruby.generated.sources}</generatedJavaDirectory>
        <verbose>true</verbose>
      </configuration>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>de.saumya.mojo</groupId>
  <artifactId>gem-maven-plugin</artifactId>
  <version>1.1.5</version>
  <configuration>
    <includeRubygemsInResources>true</includeRubygemsInResources>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>initialize</goal>
      </goals>
    </execution>
  </executions>
</plugin>
While you're there, remove the org.apache.rat configuration from the <plugins> section; it's there to ensure that all source code files have a license attached, which is just going to be a bloody nuisance for our current project.
Let's also make a directory to store the Ruby source code:
$ mkdir -p src/main/ruby
and remove the Java source code and configuration files:
$ rm -r src/main/java
$ rm -r src/main/config/*
At this point we can follow the instructions from the Samza project's Hello Samza documentation:
$ ./bin/grid bootstrap
$ mvn clean package
$ mkdir -p deploy/samza
$ tar -zxf ./target/hello-samza-0.11.0-dist.tar.gz -C deploy/samza
You now have a Samza build system running the latest version of Samza with a recent version of JRuby. Those last three lines build your Samza source code and "deploy" it. You'll need to run them every time you make changes to your source or configuration files.
THERE'S NO RUBY LIKE J-RUBY
Now that we have Samza running with JRuby in tow, let's write some JRuby. We're going to start by creating a very simple task, one which will mirror the very basic elements of a Samza stream task. The purpose of this task is ludicrously simple(-minded): get a message from the Kafka input stream and write it to a file. While not the slightest bit useful, it will demonstrate the minimum we need to get the two systems cooperating.
Add the following source code to a new file src/main/ruby/SourceStreamTask.rb:
require 'java'

java_package 'hello.jruby.test'

java_import 'org.apache.samza.system.IncomingMessageEnvelope'
java_import 'org.apache.samza.task.MessageCollector'
java_import 'org.apache.samza.task.TaskCoordinator'
java_import 'org.apache.samza.task.StreamTask'

class SourceStreamTask
  java_implements StreamTask

  java_signature 'void process(IncomingMessageEnvelope, MessageCollector, TaskCoordinator)'
  def process(envelope, collector, coordinator)
    msg = envelope.getMessage
    File.open("/tmp/message-stream-output.txt", "a") { |f| f.write("#{msg}\n") }
  end
end
The important elements here are:
java_package 'hello.jruby.test'
This is the full package path (important for the properties file).
java_implements StreamTask
java_signature 'void process(IncomingMessageEnvelope, MessageCollector, TaskCoordinator)'
The basic Java interface stream tasks must implement to process data. There are others; we'll get to those later.
CONFIGURATION
Each stream task needs a *.properties config file (in Java properties file format) that tells Samza where to find the class, what systems it works with, etc. We see from the docs that the only truly required attributes are job.factory.class, job.name, task.class, and task.inputs, but let's fill out a few more items to demonstrate some of the basic configurability. Save the following to the file src/main/config/source-task.properties:
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=source-task

# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz

# Task
task.class=hello.jruby.test.SourceStreamTask
task.inputs=kafka.source-input

# Serializers
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Kafka System
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=string
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.bootstrap.servers=localhost:9092

# Job Coordinator
job.coordinator.system=kafka
job.coordinator.replication.factor=1
A quick overview:
job.factory.class -- almost always going to be YarnJobFactory
job.name -- a unique identifier for this task
task.class -- this is the fully-qualified name of our class; for JRuby, it's the contents of the java_package statement followed by the Ruby class name
task.inputs -- the system (kafka) and queue name (source-input) this task will read messages from
serializers.registry.string.class -- the class used to serialize/deserialize data; serdes (serializer/deserializers) are instantiated via serde factories and are basically the translators between the local environment and the queueing system
systems.kafka.samza.msg.serde -- declares that the serde defined above ("string") will be used to translate messages in and out of Kafka; we can also declare a separate serde for translating the key data, and serdes can be defined on a per-queue basis (more on this later)
The rest of the options are system-level configuration options and can be left as-is. As you can probably see, cranking out more than a few of these properties files can be somewhat tiring; even if you reuse many of the options, this is still a lot of redundancy. Samza currently lacks a standard topology definition mechanism (a la Storm); this is by intent, as Samza aims to be a general stream processing framework (pass in anything, do anything, I don't care).
We now need to update our assembly instructions to include this properties file in the build. Open up the file at src/main/assembly/src.xml, and find <files> within the <assembly> section. You'll see several entries for the deleted Wikipedia files; remove all of these <file> entries. Then add the following to the <fileSets> section:
<fileSet>
  <directory>${basedir}/src/main/config</directory>
  <includes>
    <include>**/*.properties</include>
  </includes>
  <outputDirectory>config</outputDirectory>
  <filtered>true</filtered>
</fileSet>
This tells the Maven assembly plugin to read and interpret every *.properties file in that directory. The corollary here is that every properties file in that directory will need to be valid -- i.e., all of the Wikipedia *.properties files (if you haven't removed them) will fail, since we've removed all of their corresponding Java classes.
Quick overview of the other properties for this option:
<outputDirectory>config</outputDirectory> means this file will be copied to the config/ directory in the Samza deployment package
<filtered>true</filtered> means that the variable placeholders will be substituted with real values (i.e., ${basedir} -> /complete/path/to/basedir); a complete list of interpolatable variables can be found here
COMPILING/RUNNING
We now have a valid, albeit silly, stream task that simply waits for something to come in on its input and writes that message to a file. Let's go ahead and compile it:
$ ./bin/grid stop all   ## just in case
$ mvn clean package
$ tar -zxf ./target/hello-samza-0.11.0-dist.tar.gz -C deploy/samza
$ ./bin/grid start all
Now we'll run it using Samza's run-job.sh script:
$ ./deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:///Users/user-account/hello-samza/deploy/samza/config/source-task.properties
Notice how we're using a complete path to the assembled version of the properties file, not the one we're editing (ie not in src/main/config); the variables in this one have been interpolated by Maven.
Give the task a few (maybe 10) seconds to get running. You can see the input queue for this task listed as one of the available Kafka queues:
$ ./deploy/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --list
Let's go ahead and throw some data at it (using one of the scripts available for interaction with Kafka):
$ echo "This is a great line" | ./deploy/kafka/bin/kafka-console-producer.sh --topic source-input --broker-list localhost:9092
We can see that this was written to our output file:
$ cat /tmp/message-stream-output.txt
Hooray! A totally pointless, bare-basics demonstration of writing a Samza task using JRuby. Next, we'll actually do something useful.
Reflections on Freelancing, mid-2015 edition
Much of my professional working life has been spent in an office of some sort. I've worked for bigger companies (several floors' worth of office space) and smaller startups (just one room!). There are some positive aspects to this kind of life:
Social interaction: I've been lucky to work with many very bright people. I've gotten to participate in interesting discussions on a wide variety of topics, be part of project planning meetings, and maintain a closely-coordinated pace.
Focus: Offices can be great for working! Being in a space that is set up to facilitate working can reduce distraction and increase productivity.
Time management: going to/from work really helps with breaking up the day into working and non-working portions. (I know there's research showing that this line is blurring.)
Although these are substantial benefits, working in the office can also bring challenges:
Social interaction: Usually, when I'm "going to work", I want to work. I mean, I want to get work done when I want to get work done, and at those times productivity is more important than sociability. Some places are great for this, but there's always the chance that a PM is going to walk up to your cube and distract all of that carefully created programmer-state right out of your head.
Focus: You have to focus while programming. Really. A lot of abstract little details are floating around in your head at once. I've heard that some offices are really well set up for allowing programmers to dive into a task uninterrupted (e.g., Microsoft, Fog Creek), but much more often it's an open office plan with little intellectual privacy (Facebook being the most recently notorious example).
Time management: Productivity is generally higher when you're prepared to work, not always when you're expected to work. Everybody has their own schedule for this.

A little over a year ago, I decided to make a go of it doing remote development full-time. I'd previously taken some gigs between jobs; this time, however, I didn't have any immediate leads, and so I struck out into the vast and messy world of freelancing websites. It was a rocky start; the better known of these (Elance, oDesk) are review- and rating-driven, which tends to bias toward users with previous positive reviews (a sort of herd instinct). Wanting to give it a shot, I took on a few short-term, low-paying projects to test the waters and build my profile.
One thing that immediately struck me about this process was the flood of worker candidates offering rates well below my local market rate. This is a commonly quoted problem with these sites: the torrent of job-seekers offering $5/hour for what -- at least on the surface -- seems like equivalent work. Python development is Python development, right?
Dear reader, please believe me when I say: I am a professional software developer. Really. And like any experienced tradesperson, when you come to me with a problem, I'm going to draw on my skills and experience to help you find a solution. I've worked with web frameworks, databases, queueing systems, cloud clusters, authentication systems. So when I look at these jobs and see 30 applicants with technofuture organization names offering barebones rates, it feels...muddied. Conversely, there are also so many potential employers asking for something like "Bitcoin exchange site- $500" that I sometimes lose faith in humanity.
While recently checking out the current freelancing landscape, I came across TopTal. TopTal seems to be a curated freelancing site; they profess to rigorously screen both freelancers and clients in an effort to maintain a high potential for quality interaction. No more $7/hr freelancer meets $250/website employer -- this site promises to connect people who are serious about doing work with people who are serious about their work requests.
I've only just begun the application process, but so far, things already seem much improved over <>. The first step in the screening process involves filling out an application form. This form has questions covering work history and experience, code samples, and personal achievements, as well as desired work and culture. This is just step 1 of 5 (the others include coding tests and phone interviews, which is also a promising sign), but I already feel like this process has more focus than many others out there.
So how will things go? Will I make it in, and will I like it if I do? I'll try and post updates as things go along.
Endpoint autodocumentation in flask
Every once in a while, when writing, say, a REST endpoint or something, you'll want to provide some way to get usage documentation across to potential users. Well, if you're using Flask and Python, and creating function docstrings as you go (and you should be, right?), this can be a fairly simple matter. Observe:
@app.route("/usage") def usage(): return make_response("\n\n".join(["%s\n%s\n%s" % (r.rule, "=" * len(r.rule), func.func_doc) for func in func_list for r in app.url_map.iter_rules() if r.endpoint == func.func_name]), 200, {"Content-type":"text/plain"})
Given a function list func_list of route endpoint handlers, this will extract the route data from the Flask app and print it in a friendly, monospaced format to whoever loads http://yoursite/usage. Easy-peasy! (A sketch of how func_list might be put together follows.)
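In case it's unclear where func_list comes from, here's a minimal, self-contained sketch; the routes and docstrings are made up, and it follows the same Python 2-era Flask style as the snippet above:

from flask import Flask

app = Flask(__name__)

@app.route("/widgets")
def widgets():
    """GET /widgets -- list all widgets as JSON."""
    return "[]"

@app.route("/widgets/<int:widget_id>")
def widget(widget_id):
    """GET /widgets/<id> -- fetch a single widget by id."""
    return "{}"

## The usage() view above walks this list, matching each function's name
## against the registered URL rules and printing its docstring.
func_list = [widgets, widget]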
You can fork/improve this code snippet here.
Webapp rearchitecture
Some years ago, as I was dipping my toe into the broad landscape of web programming, I (as with many others) realized that there were certain advantages to having my own website to tinker with unbounded by the confines that a paying job places on one's work. With my own website, I could design what I wanted and try out whatever different technologies caught my fancy. Indeed, having one's own website, in addition to the obvious benefits of giving an individual a certain citizenship on the Internet, is somewhat akin to owning a project house -- you can pretty much build on it however you see fit, and whether it stands or falls is (largely) your own business.
2008
I was working at Meetup.com. Meetup at that point was just starting to peek out from under its Tomcat/JSTL roots, but was still built up as a monolithic Java app. My primary language was (and still is) Python, so, since I hadn't built a website of any complexity before, I took my cues from what I knew and decided that I, too, would build a monolithic web app. Searching around through the Python-based frameworks available at the time, I settled on what seemed to be the most powerful and flexible of the ones I encountered: CherryPy.
At the time, CherryPy was probably a good decision. CherryPy is a layered HTTP application framework, with abstractions ranging from full-fledged dispatchers down to a bare WSGI server. It's a flexible, object-oriented system that allows a very Java-esque style of programming, i.e., it encourages the subclassing and use of various system fundamentals (e.g., the Application itself). While other frameworks encourage subclassing certain bits and pieces to complement well-defined functionality (e.g., subclassing form processors in Django), CherryPy really allows one to build a web app of any sort, from blog to API to basic CDN, no holds barred.
CherryPy was a great learning platform, and, along with some help from Cheetah and SQLObject (both abandoned now, it would appear), I wrote a fairly reasonable first stab at a big, complex web application with lots of request-munging and inline API interfaces. Trouble was, I hadn't heard the piper's song of regression/unit testing yet, and so every deploy was one mis-click away from dumping a stacktrace to the page (in debug mode only, of course!). Still, I had development and production environment isolation and a one-step deploy, and so this system lasted me pretty well for several years.
Today
Years passed with minimal changes to the basic application itself. I was using Tumblr via API calls as a CMS for my blog (and I still do, dammit!), Flickr via their API for my photos, and some other APIs just to show off (github and del.icio.us, then diigo when the latter ate the toothpaste), and so I was able to update content and do style tweaks here and there without too much fear of breakage. A few years and jobs on, I decided I wanted to dig back in and put up some new content. However, I really didn't want to re-learn the mess I'd written while getting the site up the first time, and the lack of regression tests terrified me.
So, seizing on the unparalleled opportunity which being master of one's own dominion offers, I decided to do a full rewrite. This time, however, I wanted something more robust and flexible, and so decided to go the loosely-coupled system of modules approach, meaning independent components linked via APIs or XMLRPC or whatever protocol I feel like using that day.
The structure
There is an ecosystem of (mostly) mature tools available now which readily support a distributed paradigm, like Puppet, Chef, Fabric, Salt, et al. In order to keep things flexible, and to allow myself creative room, I've settled on a basic routing layer which delegates requests to other components, retrieves the data, and returns the results. It's simple, robust, and flexible; a rough sketch of the idea follows.
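As a rough illustration only -- the component names, ports, and URL scheme here are invented, not the site's actual layout -- the routing layer can be little more than a Flask app that proxies each request to whichever internal component owns its path:

from flask import Flask, Response
import requests

app = Flask(__name__)

## Hypothetical mapping of URL prefixes to internal components.
COMPONENTS = {
    "blog": "http://127.0.0.1:5001",
    "photos": "http://127.0.0.1:5002",
}

@app.route("/<component>/<path:rest>")
def delegate(component, rest):
    base = COMPONENTS.get(component)
    if base is None:
        return Response("unknown component", status=404)
    ## Fetch from the owning component and hand the result straight back.
    upstream = requests.get("%s/%s" % (base, rest))
    return Response(upstream.content,
                    status=upstream.status_code,
                    content_type=upstream.headers.get("Content-Type"))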

Two tools that have really helped with this are Fabric for deployment and Docker (LXC-based at the time) for application isolation (more on this in a future post). (An additional unsung hero is rsync, ever present and usually sidelined as "old reliable" these days.) Docker is a very young (<1 year) project, but it's partially derived from an existing production system (dotCloud). These two projects allow rapid iteration of multiple components in lockstep, and, as a bonus, make it fairly simple to stage isolated testing environments as well; a Fabric-flavored sketch follows.
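Here's a minimal, Fabric 1.x-style sketch of what a per-component deploy can look like; the host name, paths, and image names are placeholders, not the site's real configuration:

from fabric.api import env, run, cd

env.hosts = ["deploy@example-host"]

def deploy(component="router", tag="latest"):
    """Pull the latest code for one component and restart its container."""
    with cd("/srv/%s" % component):
        run("git pull --ff-only")
        run("docker build -t %s:%s ." % (component, tag))
        ## Tear down the old container (if any) and start the fresh one.
        run("docker rm -f %s || true" % component)
        run("docker run -d --name %s %s:%s" % (component, component, tag))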
Summary
The site itself is still (always) in development, but I think that this component-wise approach will allow an iterative system in which buggy modules can fail without jeopardizing the site as a whole. While not as declarative as it could be, Flask works well as a basic routing layer for a reasonably pluggable infrastructure. Docker is an amazing tool, allowing an entire system's worth of dependencies to be easily constructed and deployed, over and over again.
In the realm of ideas, everything depends on enthusiasm; in the real world, all rests on perseverance.
Goethe
Realtime Metrics Using Ordered Sets and Redis
At IQ Engines, our system generates several different types of features when we process images; with the release of our SmartAlbum API, we wanted to be able to show what is happening across multiple feature detectors in as near real-time as possible. Additionally, we have the constraint that we'd like to keep a record of all events; in other words, we want this to be as lossless as possible.
I've found Redis to be a great data storage tool; it's very fast, robust, mature, and provides operations at the data-structure level. It pretty much takes your Intro to Data Structures class and turns it into a database. It's the Swiss Army knife to MySQL's hammer.
Prior research
Before rolling our own, we took a good look at another solution out there, Spool's metrics tracking. Their method was clever, and involved using Redis as well (the bitset (now SETBIT) and BITCOUNT operations), which -- provided one uses a hash with reasonably low collision potential -- would total up unique events within a certain window. The windows could then be identified via the string name, e.g., unique_users_2007_12_04 (unique users for December 4, 2007). Redis' SETBIT operation runs in O(1) time, and its BITCOUNT operation runs in O(N) time. Therefore, each time a user logs in, it's an operation of ultimate simplicity to record, and retrieval (a much less-frequent task) takes respectably little time as well. Spool also gets clever with calculating other time frames using Redis' BITOP operation, performing unions across multiple time frames to retrieve longer-spanned metric information. (A quick sketch of this approach follows.)
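For concreteness, here is a small sketch of that style of counting with redis-py; the key naming scheme and the assumption that user IDs map directly to bit offsets are illustrative, not taken from Spool's actual implementation:

from datetime import date
from redis import Redis

conn = Redis()

def record_login(user_id, day=None):
    day = day or date.today()
    ## Each user flips their own bit for the day; repeated logins are idempotent.
    conn.setbit("unique_users_%s" % day.strftime("%Y_%m_%d"), int(user_id), 1)

def unique_users(day):
    ## BITCOUNT tallies how many distinct bits were set for that day.
    return conn.bitcount("unique_users_%s" % day.strftime("%Y_%m_%d"))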
Although this is a great example of Redis' power and the tools it offers, there are a few caveats with this approach:
It only works for events that have a unique identifier; and
It requires defining the metrics' window in advance.
We wanted to be able to reconstruct the statistics in various ways post-recording; in short, to record events, not summaries.
Our approach
I wanted a system that would record all of the events that occurred, and would do so on objects for which computing a unique hash might be difficult. I basically wanted a running stream of events, categorized by time but not pre-sorted into prescribed windows.
Enter ordered sets
Redis offers native operations on hashes, lists, strings, and (un)ordered sets, among others. It's the ordered sets that we utilized in our system. Ordered sets arrange unique members by (possibly repeating) scores, e.g., Bob->1, Susie->2, Jean->2, Fred->3, etc. Redis provides an efficient array of tools for working with ordered sets, including O(log(N)) insertion and O(log(N)) retrieval. Yes, that's right: although it's a bit more expensive to insert into a set than a string, retrieval is actually faster.
The basic idea is this: for each event, give it a unique name and assign its score to be its timestamp. Then, when you go to retrieve the events, you can use ZRANGEBYSCORE to get just the events that occurred between time A and time B.
The obvious caveat here is that every event must have a unique name. However, this name can be anything, from something timestamp-ish, to a random number, to - my favorite - a ticker stored in Redis that provides a unique number every time using the INCR command. (With this last one, you are absolutely assured of no collisions, provided the relationship between the db storing the ticker and the one recording the events is consistent.)
Example
So let's say we're recording events for when somebody submits a blob to our API. We have a field in Redis called blob:submit, and our counter is called blob:index. In Python:
## we're using Andy McCurdy's redis-py,
## https://github.com/andymccurdy/redis-py
from redis import Redis
from time import time

conn = Redis(host=<host>, db=<db>)

def add_blob(*args, **kwargs):
    ## stuff stuff stuff
    ## add it to the metrics database!
    conn.zadd('blob:submit', conn.incr('blob:index'), time())
That's it! We've successfully recorded an event of type 'submit' for object 'blob'!
Retrieval
Ok, so we're plugging along, recording data. But data's pretty meaningless unless you can display it in a pretty manner using d3, right? Well, of course. Now let's assume you want to retrieve the data from the last day using a window size of 5 seconds, left-aligned (meaning here, for a window count of N > 1, the first window is the full window size and the last may be cut off). Let's pull the events, chopped into windows:
conn = Redis(host=<host>, db=<db>)

def retrieve(window, timespan):
    """
    retrieve a sequence of windows of size <window> over the course of
    the last <timespan> seconds; time.time() returns <second>.<millisecond>,
    so our scores are in seconds
    """
    now = time.time()  ## keep a consistent reference
    data = conn.zrangebyscore('blob:submit', now - timespan, now, withscores=True)
    ## optionally chop into windows
    windowed_data = []
    ## python's forced-int in the division below is actually helpful here for
    ## window alignment
    for i in xrange(timespan / window):
        item = []
        ## pull off every event whose timestamp falls before this window's right edge
        while len(data) > 0 and data[0][1] < (now - timespan) + window * (i + 1):
            item.append(data.pop(0))
        windowed_data.append(item)
    return windowed_data
And voila! data broken up into windows.
If we're just interested in totals, and not the datapoints themselves, it might be tempting to use ZCOUNT, like so:
def retrieve(window, timespan):
    now = time.time()
    windowed_data = []
    for i in xrange(timespan / window):
        windowed_data.append(conn.zcount('blob:submit',
                                         (now - timespan) + (i * window),
                                         (now - timespan) + ((i + 1) * window)))
    return windowed_data
It certainly seems easier, but we have to remember that ZCOUNT, like ZRANGEBYSCORE, is an O(log(n)) function, where n is the size of the entire set, including all of the data outside of the interesting range; thus, where our first example only traverses the entirety of the data once, the second example does so once for each window. Yikes. Instead of using ZCOUNT, we can modify the loop in the first example a bit:
for i in xrange(timespan / window):
    count = 0
    while len(data) > 0 and data[0][1] < (now - timespan) + window * (i + 1):
        count += 1
        data.pop(0)
    windowed_data.append(count)
And there we go -- the count, instead of the data itself.
Caching
As with any system where aggregated information is regenerated over and over, we want some caching to keep things lean on subsequent requests. Well, not surprisingly, Redis can help us out here! A simple, solid option for caching static data is the Redis hash, and its attendant functions HGET and HSET (and, if we do it right, the bulk equivalents HMGET and HMSET).
Let's go ahead and write some simple caching code. We don't need anything complex; a statcache:<type> hash keyed by <start>:<window> should be sufficient.
from collections import defaultdict

def get_cached(type, span, window):
    """
    In this example, we are going to return all of the timestamps.
    This could be easily modified to return a simple count of events, too.
    """
    now = time.time()
    values = defaultdict(list)
    missing_indices = []
    for step in xrange(int(now - span), int(now), window):
        ## retrieve from our cached hash, <type> => <start>:<size>
        cached = conn.hget('statcache:%s' % type, '%s:%s' % (step, window))
        if not cached:
            missing_indices.append(step)
        else:
            values[step] = cached
    replacement_indices = missing_indices[:]  ## make a copy for modification
    ## now cache all values that we don't have
    if len(replacement_indices) > 0:
        data = conn.zrangebyscore('blob:submit', now - span, now, withscores=True)
        ## we can assume the data is ordered
        for d, tstamp in data:
            if tstamp >= replacement_indices[0]:
                while len(replacement_indices) > 1 and tstamp >= replacement_indices[1]:
                    ## move to the next window
                    replacement_indices.pop(0)
                ## we've advanced to the relevant window
                values[replacement_indices[0]].append(tstamp)
        ## now cache the new values
        [conn.hset('statcache:%s' % type, '%s:%s' % (i, window), values[i])
         for i in missing_indices]
    return values
Looks pretty good so far, but we're leaving out one detail that's crucial for consistency: key normalization. In this case, we need to normalize our cache indices somehow, or else the cache jumps all over the place. For example, if we have a 5 second window and timespan, and we poll for it every second, then we will get a new set of data every time, i.e.:
## assuming instantaneous calculation
now = time.time()
get_cached('stat', 5, 5)  ## returns [now-5, now]
time.sleep(1)
get_cached('stat', 5, 5)  ## returns [now+1-5, now+1], oh noes!
We can correct this problem with some simple window alignment. Let's add the following to the beginning of the function above:
...
now = time.time()
offset = now % window
now = now - offset  ## now everything will be aligned!
...
Now, given the same window size, we will be able to use the same cached values moving forward, and our beautiful d3 graphs will transition smoothly.
Further improvements
It is probably obvious that storing all of this information could take a whole bunch of memory. Redis is an in-memory database, so, depending on how beefy your machine is, this could become a problem sooner or later.
This problem could be mostly eliminated by periodic preening. As a simple example, let's say that you only need to know about events in the last month. Let's also assume that the only window size being employed is the 5-second window (obviously this can change depending on timespan). In that case, Redis' ZREMRANGEBYSCORE could be employed:
import re

## we only want the latest month's data
now = time.time()
one_month = 2592000
conn.zremrangebyscore('blob:submit', '-inf', now - one_month)

## of course we want to remove the cached values, too
keys = conn.hgetall('statcache:%s' % type).keys()
## let's pipeline this to keep roundtrips down
pipe = conn.pipeline(transaction=False)
for k in keys:
    cur_start = \
        re.match('^(?P<start>.*):(?P<window>.*)$', k).groupdict()['start']
    if float(cur_start) < now - one_month:
        pipe.hdel('statcache:%s' % type, k)
pipe.execute()
You can run this (or some variant) however often you feel is necessary. Additionally, you can cache to different time scales for different caching periods, and clean those out accordingly; for example, you may want 5 second windows for recent data, but only need 1 hour windows for the last year, so cleaning out only 5 second windows for anything older than a month, and everything for windows older than a year would be appropriate.
Conclusion
Redis provides a very powerful toolbox for doing all sorts of neat tricks. We've coerced the ordered set operations into doing our bidding for tracking events across time; more generally, it can be used to track any numerically-ordered set of unique events. More advanced usage could use UUID's as the keys into a more complex dataset, and then index into that using several metrics via the ordered set ops. And this is just one data type; I encourage everyone to check out the Redis docs to see what else this amazing tool can do.
Good design is intelligence made visible.
Le Corbusier
Fully closed, oxygen-free wort transfer, Part 1: Overview
Homebrewing has really come into its own for equipment as of late. It's rather remarkable that with 3 keggle conversions (cheap for stainless steel!) and 5 10 20 uncountable hours of labor, a single person in the comfort of their backyard can create some remarkable tasting beer and then drink it on draft from their own kegerator.

In spite of this, our means of transferring beer from one container to another -- a vital part of the process -- are still rudimentary. Unless you own a conical, you'll find yourself dealing with bottling buckets, racking canes, or one of the varied types of autosiphons. All of these suffer from a few drawbacks:

* it's hard to purge with CO2, thus almost ensuring some [oxidation](http://en.wikipedia.org/wiki/Alcohol_oxidation); and
* these processes are cumbersome to seal fully, allowing potential airborne contaminants into your brew (fruit flies are never distant enough when transferring)

At [Barley to Bottles](http://www.facebook.com/BarleyToBottles), we've been tuning our system bit by bit. We've got a three-tank [all-grain](http://www.howtobrew.com/section3/index.html) system going (HLT, mash tun, and boil kettle); our mash tun and boil kettle (which doubles as hot-liquor heater for now) both have thermometers and separate burners; and we've got a pump-and-tubing system for moving liquids around, including a sweet whirlpool-esque recirculating cool-down system (more on that later).

Up until recently, the weak point in our system was how we transferred between fermenters, namely, carboy-to-carboy and carboy-to-keg. The auto siphon, although basically a miraculous advancement compared with the old take-a-shot-of-whiskey-and-suck method, is still open to the environment and not immune to oxidation. Thus, after much thought and some refinement, we've worked out a (mostly) closed system that uses CO2 to purge/push fluid. This has allowed us, in our post-mortem on each batch, to confidently eliminate yet another vector for off-flavors.

## Components

### Carboy-to-carboy

* 2 stainless steel racking canes
* Orange carboy hoods that fit the tops of the carboys (NOTE: 5/6 gallon carboys use a different size than 6.5 gallon carboys!)
* 3/8" ID food-grade tubing, enough to reach between carboys (PVC/vinyl works fine)
* 2 hose clamps

### Carboy-to-corny keg

* 1 stainless steel racking cane
* 1 orange carboy hood (that fits the carboy in question)
* 3/8" ID food-grade tubing for carboy-keg distance
* 2 hose clamps
* "In" ball lock attachment (I use flare fittings -- they fit nicely into 3/8" tubing -- but barbs will work)

### Both

* 1/4" male flare to 1/4" barb adapter
* 1/4" female flare to 1/4" barb adapter
* hose clamps to get the female flare and tank hooked onto the gas line
* 1/4" ID gas line
* CO2 tank/regulator etc
* inline HEPA filter (optional) plus hose clamps
* some sort of screen filter (optional)

## Fundamentals

A few ideas that have guided the development of this process:

1. We're trying to keep containers as sealed as possible; with kegs this is easy, but with carboys you'll need to be a bit more creative (sanitized jar lids over the tops work for us)
2. Every space that will contain beer for any amount of time needs to first be sufficiently purged with CO2
3. CO2 is heavier than oxygen (whew!), so adding beer to the bottom of the carboy will push oxygen out the top
4. CARBOYS EXPLODE!!!! TURN YOUR REGULATOR ALL THE WAY DOWN BEFORE OPENING THE MAIN TANK VALVE!!!! (This will make sense in the next installment.)

To be continued!
SQLAlchemy and MySQL connection dropping
I've finally solved the problem of an error that's been popping up. When the SQLAlchemy+MySQL app I've been writing is left overnight and then used again the next day, an exception is thrown:
OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
This has been baffling me for a while. Some Internet research revealed that MySQL has a time limit on how long it allows a connection to stay open; by default, this is 8 hours. Additionally, changing the limit appears to require some tweaking of the MySQLdb Python module, which is a bit dirtier than I'd like to get if possible.
After digging around a bit I found that SQLAlchemy allows a parameter to be passed when creating a new engine via the pool_recycle option. The example code is as follows:
engine = create_engine('mysql+mysqldb://...', pool_recycle=3600)
The only sensible unit here would be seconds: 3600 seconds, i.e., 1 hour. After using this directly, I was still getting the same error. This was frustrating me to the dickens. Today, just for the hell of it, I spent some time playing around with this parameter. I set pool_recycle=1 on my engine objects. After futzing a bit and then letting it be, lo and behold, what message should I begin to see in my server logs but:
sqlalchemy.pool.QueuePool.0x...e0ac Connection <_mysql.connection open to 'localhost' at 9fe7174> exceeded timeout; recycling
Long story short, it turns out that the pool_recycle time is in minutes, not seconds. Definitely a minor discovery, but a significant boon when the alternative is that the entire server comes to a halt when not corrected for.
update 4/8: this seems not to be the case when using the engine explicitly. Perhaps it's only when using scoped_session objects, or maybe there's something more complicated happening with the session.
Percepts and concepts interpenetrate and melt together, impregnate and fertilize each other. Neither, taken alone, knows reality in its completeness. We need them both, as we need both our legs to walk with.
William James (quoted @ http://c2.com/cgi/wiki?ThinkingOutLoud)
SQLAlchemy across multiple databases
A central part of my current job involves working with a large amount of tracking data stored across multiple servers. We have a central database that houses user data, and we want to pull the tracking data in and associate it with the user data. This is a pretty common problem in business analytics.
The tool I've turned to for this is SQLAlchemy. SQLAlchemy is an advanced object-relational mapper (ORM) for Python which allows several degrees of database-model decoupling. SQLAlchemy also allows session scoping (one db-session per process), merging sessions into open transactions, data sharing across databases, and a whole slew of features that I have yet to discover.
In this particular case, the tracking data is being imported onto the same local server as the user data, but is being kept in a separate MySQL database. The data is being kept separate following a separation-of-concerns approach: if the tracking system changes, or if how the data is accessed changes, I'd like to be able to modify it without affecting other things that the user data is being used for.
SQLAlchemy uses an 'engine' to describe a connection to a database, 'metadata' to describe the structure of the tables, and a 'Session' to access the data. When a query is made, it is made through the Session. So far as I can tell, either the Session or the table metadata can be employed to determine which engine the table model is associated with.
The main steps necessary for setting up a cross-database environment are roughly as follows:
define one engine for each database
associate metadata w/ the engine (one metadata object per engine)
define/reflect your tables with all of the appropriate foreign keys
setup a session to access the tables
In the following example, we've got a setup where we're tracking how many french fries a customer at a fast food chain (of your choice) eats. The local table is Customer, with all of the information we care about each customer, and the remotely-imported table is a record of each fry consumed, Fry. (A single Customer might go to several different restaurants and eat multiple french fries, so we've got to be open to sourcing the data remotely.) We're eventually going to integrate this all into an enterprise-calorie-tracking system, so we want to be able to associate entries in the Customer table with several Fry entries.
from sqlalchemy import MetaData, create_engine, Table
from sqlalchemy.schema import ForeignKey, Column
from sqlalchemy.types import String, Integer, DateTime
from sqlalchemy.orm import (mapper, relationship,
                            backref, scoped_session, sessionmaker)

customer_db = "mysql+mysqldb://db_user:db_pass@db_host/customer_db"
purchase_db = "mysql+mysqldb://db_user:db_pass@db_host/purchase_db"

## create engine connections
customer_engine = create_engine(customer_db)
purchase_engine = create_engine(purchase_db)

Session = sessionmaker()  # AFAIK, this can be a scoped_session as well

## make some metadata associated w/ the engines
customer_meta = MetaData(customer_engine)
purchase_meta = MetaData(purchase_engine)

##
## now create some python objects to map to the database; models are
## created first then mapped to tables/table-objects
##
## we're defining the local objects here, and using SQLAlchemy's 'reflection'
## capability to get the schema from the imported data
##
class Customer(object):
    def __init__(self, **kwargs):
        for k in kwargs.keys():
            ## just take whatever is passed, no error checking for now
            setattr(self, k, kwargs[k])

    def __unicode__(self):
        return "%s, %s" % (self.last_name, self.first_name)

    def __str__(self):
        return self.__unicode__()

class Fry(object):
    ## ...similar to above; we're only focusing on the
    ## Fry table from the purchase DB
    pass

## first bring in all of the tables from the external database; we'll
## reflect all of the tables and then fiddle w/ the ones we care about
purchase_meta.reflect()
fry_table = Table('fry', purchase_meta,
                  useexisting=True)  ## useexisting=True b/c we already imported it

## map the table instance to the python model class; make sure
## we define a primary column
mapper(Fry, fry_table, primary_key=[fry_table.c.id])

## now define our local table, tying this one to the customer_engine
customer_table = Table('customer', customer_meta,
                       Column('id', Integer, primary_key=True),
                       Column('first_name', String(128)),
                       Column('last_name', String(128)))

## map the table to the Customer class, adding a property
## for a foreign key into the Fry table; assume the
## customer_id col exists
mapper(Customer, customer_table, properties={
    'fries': relationship(
        Fry, backref='customer',
        primaryjoin=customer_table.c.id == fry_table.c.customer_id,
        foreign_keys=[fry_table.c.customer_id]),  ## foreign_keys is a list
})

## create the local tables
customer_meta.create_all()
This will setup relationships between the tables and object-model relations to the tables. Each table can be accessed in the normal way:
## create a session instance
session = Session()

## do any sort of query
session.query(Customer).filter(Customer.last_name == 'Jones').all()
session.query(Fry).filter(Fry.purchase_date > datetime.now()).count() \
    * calorie_per_fry
The tricky bit here is doing cross-database querying. The Session object can track which object is related to which database (it can be configured explicitly by passing a 'binds' dictionary during Session() instantiation, i.e., session = Session(binds={Customer: customer_engine, Fry: purchase_engine})), but, so far as I know, it won't actually use different databases when doing normal object-relation querying. For example:
## this does not work; SQLAlchemy tries to find the 'fry' table in the customer_db
rows = session.query(Customer).join((Fry, Customer.id == Fry.customer_id)).all()

## by being explicit in our query (using raw SQL), we can remedy the db ambiguity
rows = session.query(Customer).from_statement(
    "select * from customer_db.customer as cust "
    "right join purchase_db.fry as fry on cust.id = fry.customer_id").all()
The downside here is that, although the 'from_statement' function returns a Query object, it is quite limited. Most of the methods usually available -- 'distinct()', 'count()', 'join()', et al -- are not. Pretty much the only available method is pulling all the rows down via 'all()', then manipulating them in Python. If you keep your SQL clever, this ought not to be a problem, and each individual db can still be manipulated with all of the expected ORM goodness.
One more thing: according to some research I dug up when looking into this, this trick will only work for databases that don't enforce database-level integrity. This includes MySQL+MyISAM and sqlite, but not MySQL+InnoDB or PostgreSQL. I haven't tested anything but MySQL+MyISAM myself, and the above code works there.
East of Eden
Today marks my seventh day back in New York. I arrived last Wednesday evening to a city filled with warm autumnal weather, requiring no more dress than a t-shirt and shorts. By Friday that balmy paradise had transformed into the first hintings of Winter, with nighttime temperatures reaching the upper 30s. This was particularly relevant during Critical Mass on Friday night, where a friend of mine, dressed as Robin, had no more protection than tights, a t-shirt, and a cape. How ever did those protectors of Gotham survive the winter? Commissioner Gordon certainly always sported a heavy trenchcoat.

# The Job

Today also marks my fourth day of work, and the third day in a row where I have started my commute to work over the Williamsburg Bridge at sundown. The primary factor facilitating my visit to the city is some contract programming work for an affiliate marketing company a friend works for. It's a very small, non-traditional company, where stimulant-fueled night-long work sessions are by far the norm. Thus, arriving at the office at around 5 or 6 in the evening -- a result of getting home at maybe 8 or 9 in the morning -- is by no means out of bounds. The only window into the office is the door, which has a heavy shade on it, so natural indicators of time are virtually absent. I'm sure this will be interesting for the two weeks I'm here, but it's likely not something I could continue into perpetuity.

# The Living Quarters

The sublet I've been staying at is a first-floor apartment in a factory-turned-loft in South Williamsburg, five blocks from the bicycle entrance to the Williamsburg Bridge. The rent isn't bad (by NYC standards, of course), the roommates are chill, and there are two adorable resident cats, one of whom loves sleeping on my lofted bed. The building butts up close against its neighbors, and thus, although the apartment is windowed, almost no ambient light gets through; certainly nothing even remotely resembling direct sunlight. As a result, the weak light that makes it in looks the same morning, noon, and evening. This further facilitates the sense of timelessness in my current schedule; the ambient light when I go to sleep in the morning is indistinguishable from the light in the afternoon when I wake up. So from cave to cave, opportunities for synchronizing my internal sense of time are less than forthcoming.

# The Transportation

New York is most engaging when traveled by bike, and my good friend Damian has been kind enough to offer the use of his spare cruiser. It's an old gaspipe 3-speed by a manufacturer called Hercules. It's heavy, a bit too small for me, rattles menacingly when hitting bumps, and the front wheel has about an inch of side-to-side play, but it has two functional brakes and a relatively comfortable seat, and so all of the former attributes can be overlooked in a flat city like New York. My ride to the office starts with an out-of-saddle climb from Kent Street to the summit of the Williamsburg Bridge bike path, and then I spin down the other side, onto Delancey, and up 3rd Avenue to midtown. (Unfortunately, the new bike lanes on 1st and 2nd Avenue are somewhat miswrought -- primarily because of the sudden uncontrolled turns that cars make across them -- and so I find the exposed interaction on 3rd Ave much preferable.) I've found a certain satisfaction in careening around on a bicycle that provides no space for vanity.

My internal narrative imagines that this sort of thing broadens any spectators' appreciation of how useful a bicycle can be in a city; but more likely, I come across as a reckless jackass startling unwitting pedestrians out of staring at their iPhones as they cross the street. That's if anybody is actually paying attention, of course.
Secure port forwarding with ConnectBot on Android
Android is a pretty hot hacker platform. You can pretty much do whatever you want with an Android device, even to the point of bricking it (be careful with those 3rd-party kernels!). I recently gave in to my irresponsible streak and shelled out for a used Nexus One. The first thing I did was gleefully root it and flash the CyanogenMod-flavored kernel that everybody's been raving about. Supposedly, CM runs slicker and faster, and is rumored to even provide better reception, on the N1. The custom-kernel thing really excites me; it sounds like the Cyanogen team are going to add in a driver to turn on Wireless-N on the Broadcom chip in the N1 in an upcoming release. That same Broadcom chip also includes an FM receiver/transmitter just waiting to be turned on. I'm not a kernel hacker (I may be some day, but I'm certainly not starting with a mobile device). But I am a hacker, and so the whole portable-console idea of Android gets me all flustered. Yeah, Google Apps integration is great. Yeah, there are lots of cool 3rd-party apps to look at the SD card, sync with Dropbox, remote-control your torrent downloads, add a drop-in replacement browser, etc. But the really cool thing that I just got going is SSH tunneling with an app called ConnectBot.

SSH tunneling is an amazing feature. With it, you can add transport-layer security (TLS) to any program where you have a server-side SSH account. This comes in quite handy for anything where passwords are sent in clear-text (i.e., via HTTP-Auth). When using a device over the air (e.g., any and all smartphones) you should always employ some form of encryption, be it password-level or session-level.

For running on a mobile device, ConnectBot is a very impressive SSH client. It facilitates SSH shell sessions, local/remote SSH port forwarding, shell-less SSH sessions (for port-forwarding only), and public key management. It also makes good progress in overcoming the ridiculous barrier of using a touchscreen keyboard to command a UNIX shell.

I'm going to walk through the steps required to set up port forwarding with ConnectBot. I'm going to assume here that I've got SSH access on a box running a webserver on port 80. By the end, we will have our own home-rolled HTTP+TLS.

+ Download ConnectBot from the Android market (it's free).
+ Launch ConnectBot. It'll give you a nice little overview of the features, e.g., how to use the Ctrl key.
+ Enter a username@server in the bottom text box.
+ ConnectBot will initiate the connection. As this is very likely a key new to your phone, ConnectBot will ask you if you want to continue connecting (anybody who's SSH'd into a box for the first time has seen this). Select "Yes". (On the N1, the onscreen keyboard stays up and hides the dialog box at the bottom of the screen. Hold down the menu softkey at the bottom until the keyboard disappears, then select "Yes".)
+ Enter your password. You've now got a live connection to the server!
+ Tap the Menu key. Select the "Port Forwards" option. Tap the Menu key again and select "Add port forward".
+ Ok, you're now at the point where you can set up the forward. ConnectBot gives the option of local forwards (equivalent to the "-L" ssh command-line flag) and remote forwards (equivalent to "-R"). I always use local forwarding for this sort of thing, but YMMV (your method may vary). Enter the "Source port", i.e., which port you want to connect to on your local device, and the "Destination", where you want to connect to on the destination network. For a webserver running on the same box we're connecting to, we'll use these values:

  > Source port: 8080
  > Destination: localhost:80

  What this means is that we're going to connect to "localhost:8080" in our browser, and that will tunnel a connection to "localhost" on the remote end (the server we're connected to) on port 80 (the standard port reserved for HTTP).
+ Tap "Create port forward".

And that's it! You can now load your browser of choice, type "localhost:8080" into the location bar, and voila, you have a TLS-enabled connection to the remote server! Now, of course you're not going to be using this much for remote web browsing, as you likely don't have SSH accounts on all of your favorite web servers. But you can definitely use this for any sort of web interface that you might have on a box at home or at work.

## EXTRAS

### Desktop shortcut

ConnectBot lets you add shortcuts directly to server profiles. Tap and hold on the Android desktop as if you were going to add an Application shortcut. Select Shortcuts, then select ConnectBot. So friggin cool.

### Headless connections

There may be times when you don't want to deal with the (minimal) overhead of a shell running on the remote machine, and just want the secure ports. Well, worry not, ConnectBot in its infinite spiffyness has provided for just such a thing. From ConnectBot's server-list screen, tap and hold on a server, then select "Edit host" from the popup menu. This gives you access to a list of options that you didn't see the first time around. Scroll down to "Start shell session" and uncheck it. There, a shell-less, forward-only connection.

### Public keys

Passwords are a pain on an on-screen keyboard. Public/private key pairs are the way to go, fo sho. To set up a public key, navigate to ConnectBot's server list screen, tap the menu button, and select "Manage Pubkeys". Tap the Menu button again, and either Generate or Import a key of your choice. If you generate a key, you'll need to get the public part of the key to the ~/.ssh/authorized_keys file on your server. Generate the key, do the fun entropy-generating game, then, when you've returned once again to the list of public keys, tap and hold on the new key and select "Copy public key". You've now got that data on the clipboard, and you can get it to the server however you want (e.g., via email).

**I strongly encourage you to add some sort of (at least easy) password on this key, e.g., "123" or "nnn" or some other such nonsense.** Just as the eyes are the windows to the soul, with a public-private key pair installed thus, your smartphone is now a window to your entire computer. Moniti estis.
For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
Richard Feynman
Last day in México, for a while
It's a warm, typically subdued afternoon here. In office buildings somewhere, I'm sure business people are hurriedly working on maximizing their profits (it's a big city for business, I'm told), but on the street -- the only part of the city I've interacted much with -- passers-by feel the heat, and there's a sort of blanketing lethargy. I too feel it.

This afternoon at 1300 CDT I finished my last class here. I'd have another tomorrow, but the cheapest flight I could find leaves tomorrow morning at 0645, and so, also, do I. I'm now in a bit of an interstice. Coming to México to study Spanish and see the culture was a fine distraction. The stuff of Life back home hasn't suffered for lack of me. When I return, though, I'll need work, a place to live, and some structure within which to tick off checkboxes on the great laundry list of life.

This afternoon, now, here, there is nothing. I've got an Habano in my pocket, and need to smoke it or give it away before leaving. That gives me a small bit of motivation, a tiny task for the ticking. Maybe I'll settle in at a bar for a while, perhaps with a book, to whittle away the day. Strange to be bored in a foreign town. The onus for entertainment is admittedly on me, but I have tired a bit of walking directionlessly around Guadalajara, and in retrospect I give myself a solid D- in the subject of making-local-friends-to-hang-out-with. At least, I hope, my skill with Spanish is a bit better.