Conspire
The Technology of Relationships
It's not the size of your professional network, it's how you use it
There are a couple of things that most people agree on: (1) A professional’s network is one of the most important assets in their career. (2) In order to be helpful, a relationship—professional or otherwise—must be based on actual affinity between two people.
Where does that leave us? We need to build a strong professional network by cultivating many relationships. For most of us, it feels like we’re never doing enough.
I’m here to argue that the focus on expanding our networks is misplaced. The focus should be on accessing our existing networks.
The theory of weak ties
The seminal paper The Strength of Weak Ties by Mark Granovetter teaches us important lessons. This is a quick (and incomplete) summary.
Our strong ties, i.e. our closest relationships, cluster in small groups of people that all know each other well. Consequently, it is our weak ties that create bridges to other social groups and give us access to a much broader world.
As an example, the paper presents survey data on job seekers. For people who found a new job through one of their contacts, 56% interacted with their helpful contact only occasionally and 28% interacted with their helpful contact only rarely. So the vast majority of these new jobs came via weak ties—the people we have sporadic contact with.
The implication is that the power of our professional networks comes from weak ties—for job seeking, recruiting, sales, business development, reference checking, fundraising, finding mentors, etc.
The issue with weak ties
Accessing those weak ties is where it gets difficult. By their very nature, our weak ties are to people we don’t interact with often and don’t know well.
From the paper: “Chance meetings or mutual friends operated to reactivate [weak] ties. It is remarkable that people receive crucial information from individuals whose very existence they have forgotten.”
This is a problem. Our weak ties have information or connections that could greatly help us (and vice versa), but we have no idea which ones are relevant right now. We’re missing opportunities because we don’t know who to talk to. This is a problem of accessing our networks, not the size of our networks.
Do we have to leave it to chance meetings?
Create serendipity
We can do better. We can make serendipity happen. As more of our lives leave digital breadcrumbs, the opportunity grows for software to help us with the most human of all things—personal relationships.
Conspire is a piece of that puzzle. It’s a professional network that tells you the best way to get an introduction to any person you want to meet. By analyzing email communication patterns, our network understands how well people know each other. If you want to talk to Mark Cuban about your new company or ask a Dropbox employee what it’s like to work there, we find the strongest path of connections in your professional network to get you there.
Connecting with the right person in your network isn’t left up to a chance encounter. Conspire tells you exactly who to ask. In many cases this involves weak ties. These are people we wouldn’t send a Christmas card to. But with a specific request that we know they can help with, engaging our networks is easy.
Conspire gives access to the strength of weak ties. And it’s a two-way street where my weak ties also know when to ask me for help.
What it means
Expanding the size of our professional network is helpful, but we get far more mileage out of engaging our existing networks intelligently.
For those of us who aren’t natural networkers, this takes some pressure off. In fact, we don’t need to do the type of networking that feels like a chore. Engage with people that share one of your interests. It is so much easier to form a relationship when you are comfortable and authentic. If you do that, the size of your network won’t be a problem. You just need to access it.
If you have startup stock options, check your option plan
Startup stock options are complicated. Do you know what happens to your options if your company gets acquired? You might not get what you think you should. I’ll go over one big issue that you can check on right now.
We’re going to check whether you get full vesting acceleration if your company gets acquired *AND* the acquirer does not assume the company’s option plan. WTF does that mean?
(This is related to, but different from, single-trigger and double-trigger acceleration, which you may have read about. I’m not covering those here. See Fred Wilson and Brad Feld for primers on vesting.)
Your Startup Succeeds
Let’s say you’re an early employee at a startup and you get a grant for options representing 1% of the company with 4 year vesting. Two years after you accept this offer, the company gets acquired for tens of millions of dollars without taking any new funding. This is great! You can exercise your options for a tiny amount, and they are worth a lot.
Since you’ve been there for 2 years, half of your options are vested. No one can take those away from you. But what happens to the other half?
The acquirer—your new boss—comes to you and says that the remaining half of your options are gone. If you want to keep your job, you have to sign a new equity agreement which won’t make you nearly as much money and comes with a new 4 year vesting schedule that starts today. Can they do that?
Maybe.
Your company’s option plan
The answer depends on your company’s option plan—most often called an equity incentive plan. Your company has one of these to govern all options granted to employees. If you don’t have a copy, get one from whoever handles HR at your company.
Here’s what you’re looking for. From Brad Feld and Jason Mendelson’s Venture Deals in the section “Assumption of Options”: “Most contemporary option plans have provisions whereby all granted options fully vest immediately prior to an acquisition should the plan and/or options underneath the plan not be assumed by the buyer.” An example of what this provision looks like is at the bottom of this post.
This standard provision is good for you. It means that the acquirer has a few options: (1) keep your same option agreement in place—same exercise price, same number of shares, same vesting, (2) replace your option agreement using the acquirer’s stock in a way that is no worse to you in terms of total payout and vesting schedule, or (3) let you exercise all your vested and unvested options now.
In short, you’ll be in at least as good a position after the acquisition as you were before it. (But note this doesn’t mean everything will be perfect. If the acquirer decides that you are no longer needed, they could keep your option agreement intact and terminate your employment. You wouldn’t get any further vesting unless you have single-trigger or double-trigger acceleration, and you’d be out of a job. It would be the same as if you’d been fired by your company before the acquisition.)
Times are changing
The problem is that some acquirers and investors are trying to change this. They want option plans to say that there is no vesting acceleration regardless of whether the acquirer assumes the option plan. This means that the acquirer can simply cancel any unvested options and offer you whatever equity compensation they want.
A few Bay Area serial acquirers are pushing for this. It allows them to structure retention packages for their new employees in any way they see fit. Some investors also like it because cancelled options mean they get more of the payout. (To be fair, all shareholders including founders get more of the payout.)
PERSONAL OPINION ALERT: This new trend is BS. Employee and company agreed to a deal. That deal is being pulled away by the acquirer in a way that the employee probably never knew was possible. Most employees have never even seen their option plan. As a founder, I’d feel incredibly sleazy about having this type of landmine hidden away in the option plan. By keeping to the standard option plan language, acquirers have a much harder time forcing bad deals down their new employees’ throats.
The situation that is most likely to lead to problems is if the acquirer wants to keep an employee but at reduced equity compensation. Full acceleration means the acquirer can't force a much less lucrative deal on the employee by saying they'll be out of a job if they don't accept the new deal. It instead forces the acquirer to honor the existing option agreement if they believe the employee is valuable. It changes the starting point of the negotiation between employee and their new boss.
Conclusion
Your option plan should give you full acceleration if an acquirer does not assume the company’s option plan (or replace it with an equivalent plan). If you don’t have that, it is worth thinking through how it could be used against you. This is especially true if you are likely to have a lot of unvested options when your company gets acquired.
Sample provision regarding assumption of option plan. This is a long, complicated provision. The key language is the sentence beginning “In the event that the successor corporation does not assume or substitute for the Award.”
Merger or Change in Control.  In the event of a merger or Change in Control, each outstanding Award will be treated as the Administrator determines (subject to the provisions of the following paragraph) without a Participant’s consent, including, without limitation, that (i) Awards will be assumed, or substantially equivalent Awards will be substituted, by the acquiring or succeeding corporation (or an affiliate thereof) with appropriate adjustments as to the number and kind of shares and prices; (ii) upon written notice to a Participant, that the Participant’s Awards will terminate upon or immediately prior to the consummation of such merger or Change in Control; (iii) outstanding Awards will vest and become exercisable, realizable, or payable, or restrictions applicable to an Award will lapse, in whole or in part prior to or upon consummation of such merger or Change in Control, and, to the extent the Administrator determines, terminate upon or immediately prior to the effectiveness of such merger or Change in Control; (iv) (A) the termination of an Award in exchange for an amount of cash and/or property, if any, equal to the amount that would have been attained upon the exercise of such Award or realization of the Participant’s rights as of the date of the occurrence of the transaction (and, for the avoidance of doubt, if as of the date of the occurrence of the transaction the Administrator determines in good faith that no amount would have been attained upon the exercise of such Award or realization of the Participant’s rights, then such Award may be terminated by the Company without payment), or (B) the replacement of such Award with other rights or property selected by the Administrator in its sole discretion; or (v) any combination of the foregoing.  In taking any of the actions permitted under this subsection 13(c), the Administrator will not be obligated to treat all Awards, all Awards held by a Participant, or all Awards of the same type, similarly. In the event that the successor corporation does not assume or substitute for the Award (or portion thereof), the Participant will fully vest in and have the right to exercise all of his or her outstanding Options and Stock Appreciation Rights, including Shares as to which such Awards would not otherwise be vested or exercisable, all restrictions on Restricted Stock and Restricted Stock Units will lapse, and, with respect to Awards with performance-based vesting, all performance goals or other vesting criteria will be deemed achieved at one hundred percent (100%) of target levels and all other terms and conditions met.  In addition, if an Option or Stock Appreciation Right is not assumed or substituted in the event of a merger or Change in Control, the Administrator will notify the Participant in writing or electronically that the Option or Stock Appreciation Right will be exercisable for a period of time determined by the Administrator in its sole discretion, and the Option or Stock Appreciation Right will terminate upon the expiration of such period. 
For the purposes of this subsection 13(c), an Award will be considered assumed if, following the merger or Change in Control, the Award confers the right to purchase or receive, for each Share subject to the Award immediately prior to the merger or Change in Control, the consideration (whether stock, cash, or other securities or property) received in the merger or Change in Control by holders of Common Stock for each Share held on the effective date of the transaction (and if holders were offered a choice of consideration, the type of consideration chosen by the holders of a majority of the outstanding Shares); provided, however, that if such consideration received in the merger or Change in Control is not solely common stock of the successor corporation or its Parent, the Administrator may, with the consent of the successor corporation, provide for the consideration to be received upon the exercise of an Option or Stock Appreciation Right or upon the payout of a Restricted Stock Unit, for each Share subject to such Award, to be solely common stock of the successor corporation or its Parent equal in fair market value to the per share consideration received by holders of Common Stock in the merger or Change in Control. Notwithstanding anything in this Section 13(c) to the contrary, an Award that vests, is earned or paid-out upon the satisfaction of one or more performance goals will not be considered assumed if the Company or its successor modifies any of such performance goals without the Participant’s consent; provided, however, a modification to such performance goals only to reflect the successor corporation’s post-Change in Control corporate structure will not be deemed to invalidate an otherwise valid Award assumption.
Where do I find my IMAP settings?
Here's a quick guide to find them based on which email provider/client you use.
If you have a Microsoft Office 365 account:
Email server host: outlook.office365.com
Username: your full email address
Password: your password
If that doesn’t work, find more information here
If you use Microsoft Exchange (other than Office 365):
Follow the instructions here
If you access your email account with Apple Mail:
Select from the menu bar “Mail” -> “Preferences”
Select the “Accounts” tab
There you will see “Incoming Mail Server” and “User Name”
If you access your email account with Outlook 2007-2013:
Follow the instructions here
If you access your email account on an iPhone:
Open up the settings app
Tap "Mail, Contacts, Calendars" and select your IMAP email account
Select the account again and you will see the relevant information in the “Incoming Mail Server” section
Conspire is now for everyone
It’s been quite a year for Conspire. We rolled out to beta communities a year ago, raised funding in August, launched publicly in October and built up a network of over 37 million people.
It’s our community of users and the network they build that makes Conspire work. We are very thankful to everyone who has helped just by becoming part of the network. Thank you!
Today we are excited to open up the network to more people—anyone that uses an email account that supports the common IMAP protocol. Conspire isn’t just for Gmail and Google Apps email users anymore. Please let your friends and colleagues know. It will make your network more powerful as well as theirs.
I also want to highlight a few new things we’re excited about.
The profiles for all 37 million people in the network are now editable by the community. Every user can update photos, work history, and links to Twitter handles and other social profiles. This is a giant, shared repository for the community that helps everyone find and keep track of the people in their extended network.
The UX of the entire product got a major upgrade. Please take it for a spin and let us know what you think.
If you’re reading this, it means you are part of the Conspire community. I’d love to hear from you about how we can get better. Contact me any time at [email protected] or @alexdevkar.
Conspire named to the OnCloud 50 Companies to Watch
We're excited to be included in such great company. See the full list here.
We are happy customers/users of a few of the other companies, including CircleCI, CoreOS, and Zapier. Congrats to all!
The Hardest Part of Startup MVPs
The lean startup and minimum viable product (MVP) methodology is a powerful way to get your startup off the ground. But it isn’t easy. These are the two biggest challenges I had.
We’re not really testing hypotheses
The canonical example of an MVP test is putting up a mock landing page for a non-existent product to see if lots of people click ‘purchase’ or sign up for a mailing list. This will give us the answer about whether there is demand for the product before we build it.
What does this experiment really test? Sure, there is some weak signal about whether people want the product. But primarily what we’re testing is our ability to drive traffic to a landing page and entice people to give up their email addresses. It’s an important skill, and some people are very good at it irrespective of the product.
The point is that even the simplest examples of MVPs are not well-defined, scientific tests. We build an MVP and hope people love it. Invariably the feedback and data will be open to interpretation. There won’t be any statistical conclusions about whether a hypothesis is true or not.
An MVP is more accurately described as an unstructured search for feedback. It is a process that is supplemented by that feedback but driven by our intuition at every turn. There’s nothing wrong with that. We just need to recognize that we’re not going to get clear answers.
We misinterpret and overweight feedback
We build a feature, test it with a few hundred people, and end up with a set of feedback. Now what?
Interpreting feedback is more art than science. Imagine you get a mix of comments like these:
“Perfect.” - person who immediately becomes hooked
“Great idea. Love it!” - person who never uses it again
“I could see other people wanting this.” - person who uses it occasionally
“It would be better if it did X also” - person who rarely uses it
“I wasn’t sure what to do” - person who never uses it
It’s easy to see whatever you want in this feedback. If our intuition was to add X, Y and Z to improve the product, we’ll see justification for it. Is that the right next step? Maybe, maybe not.
Written and verbal feedback is misleading. Humans are bad at identifying precisely why they use or buy something. (See conjoint analysis studies, which surface reasons people pick products that the buyers themselves don’t realize.) Beware adding features because some people said they’d use the product if you did.
Usage data—week after week, month after month—from happy users is the only thing we can trust. Other kinds of feedback are prone to misinterpretation.
The MVP methodology is aimed at learning quickly. It does just that, and I wouldn’t attempt the early stages of a startup without it. The challenge is that what we “learn” at each step isn’t obvious and, in fact, might be wrong. We have to continuously reflect and trust our intuition.
What makes the new $150 million Techstars fund different
The folks at Techstars announced their latest fund today. See The WSJ and TechCrunch (among others) for coverage.
We at Conspire are thrilled to be one of the first investments. Techstars is a unique organization and this fund, as a part of the Techstars ecosystem, is no exception.
What sets the Techstars fund apart from a typical VC fund is that it is tied into the Techstars network. Techstars, at its core, is a network of knowledgeable startup and technology people—entrepreneurs, operators, and investors. And that network is evolving at an incredible pace. Just this month they announced Techstars Berlin.
As a company that focuses on professional networks, we are particularly interested in being part of Techstars. But a strong network is vital for any startup. A strong network helps you assemble the information and resources you need to be successful. This new fund is a piece of the puzzle that makes the Techstars network so powerful.
The challenge will be maintaining a supportive and cohesive network as Techstars expands in size and scope. They’ve been up to the task so far because of the people involved. The “give first” mantra is something we see from all the Techstars fund partners.
David Cohen identifies gigantic potential when only a buggy demo exists.
Mark Solon knows what it takes to go from promising startup to meaningful company.
Nicole Glaros cuts straight to what matters and keeps you focused.
Ari Newman breaks down how you get from point A to point B into actionable steps.
Jason Seats points out the flaws in your thinking and makes you laugh at yourself. (Then he may or may not help you find a solution.)
Congrats to them and the entire Techstars family!
How much traffic does a TechCrunch post generate?
I didn’t know what to expect. And I couldn’t find more than a few (outdated) pieces of data from other people. Here’s how it went for us.
The TechCrunch effect
A TechCrunch post sets in motion several things that bring you traffic: other publications doing follow-on coverage, translations, social shares, and your site being posted to aggregators like HackerNews and Reddit.
The significant sources of traffic for us were:
The TechCrunch post itself
Posts in other publications that saw the TechCrunch post and took different angles, including articles in Portugal, Turkey and China
Appearances by both the TechCrunch article and our site on the HackerNews front-page
Social shares of all of the above via Twitter, LinkedIn and Facebook - the most shared was the TechCrunch post (511 tweets, 509 LinkedIn shares, and 492 Facebook likes)
I’m going to take an expansive view of the TechCrunch effect and count all of that in the numbers below.
[Chart: Google Analytics sessions attributed to the TechCrunch effect]
In the six days since the TechCrunch post went live, Google Analytics shows just over 25k user sessions that we credit to the TechCrunch effect. Note that the post went live on a Friday. Our highest traffic day was the following Monday.
This is, of course, just one data point. Many factors could make the traffic numbers higher or lower, including day and time the post goes live, quality of the coverage, how appealing your product/company is to a general tech audience, how controversial you are, and whether you actively push other publications to cover you at the same time.
What does it all mean?
TechCrunch can drive a lot of traffic. But traffic in and of itself is meaningless. The important part is how it serves your company’s goals. If your goal is investor interest, you might want to highlight the grand vision on your landing page and in how you brief the reporter. If your goal is sales leads, you might want to highlight customer successes and optimize your squeeze page.
Our goal was to sign up early adopters and grow the size of our network. This means that the value of the traffic depended on our landing page conversion rate and viral sharing rate by new users. We were happy with the results—the reach of the Conspire network expanded by over 10 million people in 6 days. And we were meticulous about collecting data to improve conversion and viral sharing rates. Next time we’ll try to be even better.
Why we aren't using the Gmail API (yet)
Update: A Google engineer reached out to us and explained some of the finer points of the API's quota and speed limits. With his help, many of the complaints in this post are no longer applicable. We have now been able to reliably achieve 11,000 messages per minute with the Gmail API.
When Google announced the new Gmail API in June, we were excited to switch. The new API promised to fix our biggest problem with IMAP: the lack of read-only permissions. Conspire takes privacy very seriously, but IMAP does not allow us to request the limited access we need; instead, we must request both read and write access from Google during our OAuth flow. The Gmail API lets us request read-only access.
At Conspire, we’ve expended significant engineering time on our IMAP integration. Currently we use JavaMail, a crusty old API that leaves much to be desired. Fully 20% of our IMAP code is error handling, dealing not only with the quirks of IMAP but also with Google’s own idiosyncrasies in their implementation of IMAP—everything from connection timeouts to the sole error that causes Google to return HTML instead of JSON in the response body. We’ve also spent a lot of time keeping it fast—currently we can process between 10 and 20 thousand messages per minute using IMAP.
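For readers who haven’t fought with JavaMail, here is a minimal sketch of the kind of read-only, headers-only IMAP fetch involved (hypothetical account details, and none of the error handling described above):

import java.util.Properties
import javax.mail.{FetchProfile, Folder, Session}

object ImapHeadersExample extends App {
  val props = new Properties()
  props.setProperty("mail.store.protocol", "imaps")
  val session = Session.getInstance(props)

  val store = session.getStore("imaps")
  store.connect("imap.gmail.com", "user@example.com", "app-password") // hypothetical account

  val inbox = store.getFolder("INBOX")
  inbox.open(Folder.READ_ONLY)

  // One bulk FETCH of envelope headers instead of a round trip per message.
  val messages = inbox.getMessages()
  val profile = new FetchProfile()
  profile.add(FetchProfile.Item.ENVELOPE) // headers only; we never read bodies
  inbox.fetch(messages, profile)

  messages.foreach(m => println(s"${m.getSentDate} ${m.getSubject}"))

  inbox.close(false)
  store.close()
}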
Within a week of Google’s announcement we began prototyping a new integration using the Gmail API. There is no question that the new API is cleaner and simpler than IMAP (few APIs are hairier than JavaMail). We decided to replace our existing IMAP integration with the Gmail API.
Then we started looking more closely at the interplay between our use case and the API’s two major limits:
Google limits the number of requests per user through their concept of quota units
The Gmail API is amazingly, hilariously slow
The first limit is workable. Google allows API consumers to request higher limits, and certainly they need to prevent abuse of their API. The limits were high enough for us, even though we wouldn’t be able to process messages quite as fast as with IMAP.
The second limit is an insurmountable problem. As mentioned earlier, we can process between 10 and 20 thousand messages per minute per user. Any replacement at least needs to get close to this. We’d be willing to sacrifice some speed in exchange for more granular privacy settings, but we still need to be able to function.
Our initial integration was written in Ruby using Google’s API gem. We followed Google’s performance tips—batching, partial responses—and reached between three and four thousand messages per minute. Their rate limiting was not consistent and allowed for some bursting. In any case, this was less than half our current throughput.
But Ruby can be slow, so we rewrote the prototype in Scala to compare like with like on the JVM.
Once again, we followed Google’s recommended best practices. We requested messages in batches of 100 and only asked for the message fields we required.
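For illustration, here is a rough sketch of those best practices using Google’s Java client from Scala. The GmailFetch object, the authorized gmail handle and processHeaders are assumptions for the sketch, not our production code:

import com.google.api.client.googleapis.batch.json.JsonBatchCallback
import com.google.api.client.googleapis.json.GoogleJsonError
import com.google.api.client.http.HttpHeaders
import com.google.api.services.gmail.Gmail
import com.google.api.services.gmail.model.Message

object GmailFetch {
  def processHeaders(msg: Message): Unit = () // stand-in for our real pipeline

  // `gmail` is an already-authorized service handle; construction omitted.
  def fetchBatch(gmail: Gmail, ids: Seq[String]): Unit = {
    val batch = gmail.batch()
    val callback = new JsonBatchCallback[Message] {
      def onSuccess(msg: Message, headers: HttpHeaders): Unit = processHeaders(msg)
      def onFailure(err: GoogleJsonError, headers: HttpHeaders): Unit = () // real code would retry with backoff
    }
    ids.take(100).foreach { id => // batches are capped at 100 requests
      gmail.users().messages().get("me", id)
        .setFormat("metadata")                    // headers, not bodies
        .setFields("id,threadId,payload/headers") // partial response
        .queue(batch, callback)
    }
    batch.execute() // one HTTP round trip for the whole batch
  }
}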
After spending a day tweaking and trying to make sure we weren’t doing anything foolish in our integration, we arrived at a max throughput of five thousand messages per minute per user, and that only during a burst. Taking into account usage limits, we could typically request 1000-1500 messages before being blocked for a significant (i.e., >1 minute) amount of time, lowering our real throughput even further.
That was June, when the API debuted. Running our tests again today shows improved speeds. We can reach roughly 3000 messages per minute today during a burst with the real throughput significantly lowered due to rate limiting.
That’s maybe three thousand messages per minute against 10-20 thousand per minute under our current integration. Is the Gmail API always destined to be the slower cousin of IMAP, or will Google give it some horsepower? Our tests show that Google has increased the burst limits of the API, which is likely fine for many use cases.
We recognize that Conspire’s use case may not fall within the scope of this API. In an ideal world, we would be able to request read-only access to only message headers. Google appears to be targeting developers who only need access to a specific, well-defined subset of a user’s messages, hence the support for threading and search in the API, two of the biggest features missing from IMAP. Unfortunately, the API today would require, we think, too much of a compromise on performance, especially for new users as it would take us several hours to discover their connections.
We sincerely hope that Google improves the API. We want to use it and we are keeping an eye on its performance. Once it gets faster, we will switch.
Forget LinkedIn, Conspire Analyses Email To Be Your Next Networking Tool
[Reposted from TechCrunch. Read the original article.]
LinkedIn — now with over 300 million users — has become the go-to platform for networking in the business world. Now, a new startup called Conspire is hoping to dig into that marketshare (and mindshare) with a new approach for how to meet people. It taps into your communications platforms — starting first with email and those who use Gmail or Google Apps email — and then uses big data analytics to find the most reliable chain of connections between you and the person you want to meet.
“Reliable” becomes the operative word here: the chain is based not on whom you know, but how well you know them, analysing things like frequency of contact, how recently you have interacted, the response time, and relationship length.
Then, the logic goes, when the chain of introductions gets created to connect you through to someone, you are travelling on far more secure footing — with resulting mapped paths marked as “strong” if they are really reliable. The maximum number of “hops” in the path is three — meaning two people between you and your target.
Here’s how a link might look:
[Screenshot: a mapped chain of connections in Conspire]
Conspire, which is currently free to use, does all this with privacy in mind: it does not alter your email when it analyses it; it doesn’t store your email, it doesn’t send messages on your behalf, and it doesn’t look at the body of the message, just the header.
The relationship graph is then created and updated in the background after you sign up for the service — meaning it’s always there for you to tap. “We see a network as something that is constantly evolving and changing and we want to update that for you,” founder Alex Devkar says. It also lets you keep tabs on your connections, letting you explore your volume of email, average response times, most frequent contacts, and a list of people you’re losing touch with.
Conspire has been quietly growing for the last several months and says that it now has some 13 million people placed in its analytical web from a user base that is far smaller than that — under 10,000. It’s also now raised a seed round — $2.5 million from David Cohen and the Techstars incubator that he runs out of Boulder, Colorado, where Conspire is also based. The funding will be used to build out Conspire’s team and product.
The challenge that Conspire is tackling is an all-too-familiar one for those of us who use LinkedIn.
Although LinkedIn indicates to us the path of people that connect us to others — including details such as whether we are linked by one, two, three or more degrees of separation — the problem is actually in LinkedIn itself.
It’s become very big, and can be used very loosely by people, so much so that unless you are really on top of your LinkedIn game, or very conservative with how you accept and make connections, you will likely have a lot of “connections” in there whom you have never met, spoken to or corresponded with. With invitations to connect on LinkedIn all too easy to send and accept, you may not even know who some of the people are in your LinkedIn network, or even how they got there. That makes using those contacts as a networking bridge tenuous at best.
“We want to understand who knows each other but also how strong those connections are,” Devkar says. “Looking through email, we can see the length of an interaction and the frequency.”
Or as Cohen describes the predicament, “Someone you met for 5 minutes at a conference is very different than a colleague you’ve worked with every day for 5 years. However, existing tools give the same weight to these connections and require you to manually maintain your network.”
Devkar says that on average, a person can have had as many as 3,500 contacts pass through their inboxes, but of those only around 1,500-2,000 have had any kind of interaction or reaction from the user in question.
Yet part of that may have to do with something else: Email is the bane of many people’s existence and some have gone so far as to stop using it or to use it as little as possible. Conspire understands that, too, and so it plans to incorporate other communication platforms (and data troves) like voice, text messages, Facebook and Twitter into its analysis. Twitter is likely to be the next network added. “These are all places where you can get valuable insights,” Devkar says. He adds that they would love to use LinkedIn “but they don’t make their user [data] readily accessible.” (Unsurprisingly.)
Conspire’s beta-testing period is an encouraging sign of the platform’s effectiveness. Launched within Techstars, Conspire turned to this and other VC and startup networks as early users, specifically to put founders in touch with potential backers and partners. The connections made there, the company says, led to funding — from seed investments through to $15 million rounds — as well as business partnerships.
Right now, the frequency of “strong” paths between current users and networking targets is around 54%.
“It may not be a reliable way to find a path to President Obama or Scarlett Johansson today, but in the tech community we are getting much higher rates,” Devkar says.
He tells me that he envisions Conspire tapping into lots of other verticals as it grows, and with more users come stronger network effects.
Two-Factor Auth on EC2 with Public Key and Google Authenticator
Public key authentication is a powerful authentication mechanism, but it presents a problem when the device a user or employee uses to connect to the protected machine gets compromised or stolen. Adding a second factor mitigates the problem by requiring information not stored on the compromised device.
This post shows you how to set up Google Authenticator as a second factor on a vanilla EC2 instance. When you're done, logging in remotely will require a user to specify the identity (private key) file and then type a fresh authentication code read from the screen of her Google Authenticator device (e.g. smartphone).
Warning: If this process goes awry, you could find yourself unable to SSH into your EC2 instance. Test it out on a fresh instance first, or make sure you have another way in if something goes wrong so you can patch your box up.
The first step is to clone the Google Authenticator source:
git clone https://code.google.com/p/google-authenticator/
Next, install dependencies and build the PAM module:
cd google-authenticator/libpam/
sudo yum install make gcc pam-devel
make
sudo make install
Next, to enable the PAM module, edit your /etc/pam.d/sshd file so the first lines look like this:
#%PAM-1.0
auth       required     pam_sepermit.so
auth       required     pam_google_authenticator.so
# auth       substack     password-auth
account    required     pam_nologin.so
...
Note that the pam_google_authenticator.so module has been added as the second auth line and the password-auth rule commented out below it. The password-auth rule is removed because we just want to require a key plus a Google Authenticator code, not a password.
Now, update your sshd configuration in /etc/ssh/sshd_config. First, change the ChallengeResponseAuthentication line to read:
ChallengeResponseAuthentication yes
Second, add the following line at the end of sshd_config:
AuthenticationMethods publickey,keyboard-interactive
This second line tells sshd you want to use your public key and two-factor auth.
Next, configure the user(s) for which you want to require two-factor authentication by su-ing to the user and running:
google-authenticator
That will walk you through the per-user configuration and, early in the process, spit out some info like the following:
https://www.google.com/chart?...
Your new secret key is: *****
Your verification code is *****
Your emergency scratch codes are:
  *****
  *****
  *****
  *****
  *****
The first line is a link to a QR code you can use to configure the user's Google Authenticator automatically. Alternatively, you can use the secret key to configure Authenticator manually.
Finally, restart sshd to apply your changes. If the user runs cron jobs, be sure to restart crond too, or you may have authentication issues:
sudo /etc/init.d/sshd restart
sudo /etc/init.d/crond restart
Next time the user logs in, she'll be prompted for her current Google Authenticator token from her mobile device:
$ ssh -i key.pem mybox
Authenticated with partial success.
Verification code:
Last login: Thu Sep 11 03:11:25 2014 from ...

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|
Conspire keeps users' data secure. Asking our ops team to pull out their iPhone each time they log into a production resource is a small price to pay for this additional layer of protection in the event an employee's machine is compromised.
Coding the Movies (Don't Fake It)
When David Fincher hired me to build software effects sequences—animations of code and computer interfaces—for Zuckerberg’s and others’ monitors in The Social Network, he threw hackers everywhere a bone we’d been slavering over for a long time.
At one point during the project a set decorator asked me, “Why did they hire you?” After the ensuing moment of crushing self-doubt passed, I answered, “So it doesn’t end up like Swordfish.” Sullen, he confessed he’d worked on the movie.
Movie code has been in the news lately thanks to moviecode.tumblr.com and those reporting on it. Earlier this year, Wired gave a shout-out to The Social Network and another project I worked on with Fincher’s team, The Girl With the Dragon Tattoo:
[D]irector David Fincher gets it right: When his characters—like Lisbeth Salander in The Girl with the Dragon Tattoo or Mark Zuckerberg in The Social Network—enter the digital realm, their coding language is consistent with what they’re trying to accomplish. Lisbeth at least knows SQL code, while Zuckerberg is using legitimate code that appears to have been created for the film.
Indeed, the code that appeared in The Social Network, in both text and deployed (i.e. webpage) form, was written just for the movie. But why?
Three reasons why you should consider real code for your movie
1. You’d be surprised how many people can spot a fake
This was what first got me excited about the Social Network project. Finally, someone was making a movie that would portray software realistically. The more people I talked to about my new project, the more I realized it’s not just professional geeks who notice dumb software effects sequences. Other people who are going to call you out include:
Investment bankers who script Microsoft Excel
Designers and 3D artists who write ActionScript and MEL
High school cheerleaders who learn HTML in the computer lab
Any of the millions of untrained-but-still-very-skilled hobbyist programmers
Anyone who knows a software engineer (as you can see, we’re absurdly vocal about BS code in movies, hence this post)
These days, everyone codes.
2. When you use real code, you get real interfaces for free
When you build a set, you actually build it. You have a mason lay the tile and a carpenter put up the trim. You also have the carpenter hang the door on hinges, drill a hole and install a doorknob. Why?
In The Social Network, as in most movies with code, characters interacted with the deployed form of the software others were building/hacking, too. In one scene, Zuckerberg codes “Facemash.” In another, tipsy college students use the website to rank one another’s photos. Later, Mark and his colleagues write, in scene after scene, Facebook. Interleaved with those scenes are others where engrossed early adopters use the product. We used the same code for both sorts of scenes, on the one hand in text form and on the other running on a real web server.
It’s not only The Social Network that has this interplay between code and interface. Swordfish could have benefitted from a tighter (read: any) coupling between the hack and the system being hacked. Even when you don’t see source code, it’s usually obvious when a “webpage” has been slapped together in Photoshop versus coded in HTML and CSS.
So why do you put a real doorknob on the door? Because after sitting prettily in the background for awhile, it’s going to have to open for someone. You wouldn’t have a sculptor build you a fake door out of Fimo, so why would you ask a graphic designer to build your website, an animator to make it seem functional and a set decorator to scrounge the web for some bogus source code?
3. Real code keeps you honest
It’s hard to translate “[Protagonist] circumvents the firewall and hacks the mainframe” into an effects sequence without digging a lot deeper. What data is she accessing and how? What OS is she on? (Unix/Linux or maybe OS X. In any case, it should match her hardware.) Is she using a window manager and terminal window, or did she boot straight to the command line? (These days, probably the former.) Is she using ssh or telnet/s_client/etc. or coding her attack?
Instead of asking and answering these questions, filmmakers often resort to eye candy as a substitute. But eye candy isn’t what you see when you use a computer. And we all use computers. Slick 3D animation and sound effects (why is it beeping??) yank us back out into our dirty, real theater seats and remind us that what’s unfolding on the screen is a fiction.
By coding and/or capturing real software, the filmmaker is forced to examine a technology story more closely, filling in the blanks inevitably left by the script. The result is a more convincing, immersive experience.
How to get an effective email introduction
A well-executed introduction to whoever you want to meet—a potential customer, employer, employee or investor—can be the difference between success and failure. Here are the most important points to remember.
Give your friend an email she can forward. When you ask for an intro, remember that you’re asking your friend for a favor. Make it easy on her. Your friend should be able to forward your message on to your target without doing any work. If you do it right, she can do this from her phone.
In order to make this possible, your email needs to be self-contained. Your friend may know everything about you, but your target does not. Describe who you are and why you want to meet.
Keep it short. Busy people are drowning in email. You show respect for everyone’s time and have a better chance of success by getting to the point quickly.
From: Alex Founder
To: Diana Friend
Subject: Intro to John Investor
Hi Diana,
As you know, we’re raising a seed round and would love to talk to John Investor. Could you put us in touch? A brief description of what we do is at the bottom of this email.
Thanks,
Alex
My company (ExcitingCo) improves corporate wikis by letting employees pull important content out of email and into the appropriate wiki with one click. We ran a private beta with 100 companies over the last 3 months. We just opened it up to the public and are signing up paying customers now.
Diana can then forward it on with a recommendation.
From: Diana Friend
To: John Investor
Cc: Alex Founder
Subject: Fwd: Intro to John Investor
John, please meet my friend Alex. His company is doing exciting things. I think you’ll be very interested.
Sent from my mobile device
On Mar 28, 2014, at 10:42 PM, Alex Founder wrote:
This is a great start to the introduction, but your work isn’t done.
Follow up immediately. As soon as your friend connects you to your target, you should follow up. It is your responsibility to drive the process. Thank your friend for making the intro and move her to bcc, so she doesn’t get spammed with the scheduling details.
Make a clear ask. Don’t make the next step for your target ambiguous by saying something like: “Let me know what you think.” Be direct.
From: Alex Founder
To: John Investor
Bcc: Diana Friend
Subject: Intro to John Investor
Thanks, Diana! (to bcc)
John, great to meet you. Do you have availability for a 10 minute call next week? I’ll work around your schedule.
On Mar 28, 2014, at 10:42 PM, Diana Friend wrote:
Always be respectful, responsive and prepared. No matter what happens with your target, remain professional. Your friend vouched for you by making the introduction. You’re trading on her reputation as well as yours.
The multi-step dance Alex Founder went through above won’t fit every situation, but the tips will help you get effective introductions. If you want to know who the best person in your network to ask for an intro is, check out Conspire.
Akka at Conspire [Part 5]: The Importance of Pulling
In this final post of our series about Akka, we’re going to cover a common pattern we used in building our backend: pulling. This pattern is not our creation; our work here is largely based on work done by the Akka team (including the code itself). This post is intended to explain the motivation and benefits of this pattern, why we find it so useful at Conspire, and why we think it is a necessity in a clustered environment.
Pushing work is simple
Pushing work in Akka is very simple and is the right place to start. Don’t try to optimize before you’ve identified your bottlenecks. Pulling work should not be your default. We use pulling in certain situations:
Dispatching work to remote nodes
Concerns about stability of worker actors
Specific control over the amount of work done concurrently
These three situations are where pushing begins to cause problems. Essentially, tracking work becomes more difficult, failures harder to recover from and concurrency more complicated to reason about. I’ll go into detail on these problems.
The problem with routers
Routers are the most obvious way to concurrently process work in Akka. They’re seemingly great—change some config settings and bam! Instant concurrency. For some tasks, this is adequate but the simplicity comes at a cost: routers can create blind spots in your architecture. Blindly pushing work to a router relinquishes control to the underlying router implementation. There is no built-in way to know which units of work were received by which worker, nor can that be reliably tracked. Once a message is sent to a router it is placed in a routee’s mailbox but the sender has no way of knowing which mailbox. This can cause problems, especially when used with remote routees.
Accounting for work is difficult and unreliable
As noted, routers make work more difficult to track. Imagine the following scenario: 1000 work units are sent to a RoundRobinRouter using 10 remote routees. Each routee receives 100 messages. One of your remote nodes fails and consequently, one of your routees is now dead. The dead routee still may have had an unknown number of messages in its mailbox but we have no way of knowing which messages now need to be reprocessed.
When using remote workers in a situation like this, we could (and probably should!) send an acknowledgement back to the sender when the worker begins working on a unit of work. But acking doesn’t help if the message hasn’t been processed yet—the acknowledgement won’t have been sent. Those messages are now lost. Depending on your use case, this may or may not be a problem.
We could track all dispatched work and assume a unit of work is lost if no acknowledgement is received within a certain time, but this complicates our code when the real problem is inherent in pushing work.
Controlling concurrency is complicated
Pushing work can also complicate the task of controlling concurrency. When working in the cloud, CPU resources are at a premium. What works great on your quad-core dev machine may fall over on a relatively anemic VPS instance. We ran into this problem several times as we moved our backend into production on AWS and term it “rogue concurrency.”
This problem doesn’t manifest itself if your worker actors do all their processing within their receive function—no futures, no child actors. Controlling concurrency in such a situation is just a matter of managing the number of worker actors and perhaps tweaking their dispatcher. The issue becomes far more complicated when your worker actors have a series of child actors or futures that are part of their processing.
Imagine your worker actor first must fetch data from an external REST service and we’re using a non-blocking client for this (we use Play’s web service library). The worker actor will send off the request and receive a future, continuing work in the future’s callback. As soon as that future is created, the worker actor moves on to the next message. This could lead to a major problem: if that worker suddenly receives a large number of work requests and fires off more REST requests than it can handle, you might end up with an OutOfMemoryError due to too much data coming back from the external service. Or, if the processing is CPU intensive, you could render your node unresponsive due to CPU thrashing.
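Here is a minimal sketch of that failure mode, assuming Play’s WS client and a hypothetical FetchUserProfile message:

import akka.actor.Actor
import play.api.libs.ws.WSClient

case class FetchUserProfile(id: String) // hypothetical work message

class ProfileWorker(ws: WSClient) extends Actor {
  import context.dispatcher // ExecutionContext for the future's callbacks

  def receive = {
    case FetchUserProfile(id) =>
      // The actor is "done" with this message as soon as the future is
      // created, so the mailbox keeps draining and nothing bounds the
      // number of in-flight requests or buffered responses.
      ws.url(s"https://api.example.com/profiles/$id").get().map { response =>
        crunch(response.body) // CPU-heavy work running outside the actor
      }
  }

  private def crunch(body: String): Unit = () // stand-in for real processing
}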
There are certainly workable techniques for controlling concurrency while still pushing work but in my opinion, pulling is far more elegant and much easier to reason about.
Our architecture is largely adapted from the Akka team’s blog post on the work pulling pattern, and that post should be read before continuing.
Here’s our implementation. Note that each of the IMAP, analytics and mailer services implements the work pulling pattern:
We use slightly different terminology: a Leader is the master coordinator for a given service; that service has member Nodes, each of which does its work inside a Processor actor; and all work requests and responses are wrapped in messages inheriting from a common base of Start/Acknowledged/Completed/Failed traits, allowing us to use this pattern generically across our backend. The Pipeline Manager uses a similar pattern to manage tasks at a high level across our various backend services.
To summarize, work is sent to a Leader, which holds a queue of both work and workers requesting work. In our implementation, the Leader spawns workers using a cluster-aware router but never sends messages to that router—it is used only so that the creation of remote routees is done automatically by Akka. Nodes are sent WorkIsReady messages whenever work becomes available. In return, nodes request work and will be sent work if any is available. As outlined in the Akka post, this is entirely event driven; no polling is required.
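In outline, the protocol between a Leader and its Nodes looks something like the sketch below. The names are illustrative, not our actual traits (those follow the Start/Acknowledged/Completed/Failed scheme mentioned above):

import akka.actor.ActorRef

object WorkPulling {
  case class Submit[T](work: T)                      // client -> leader: queue this unit
  case object WorkIsReady                            // leader -> node: work is queued
  case class RegisterWorker(worker: ActorRef)        // node -> leader: sent at startup
  case class RequestWork(worker: ActorRef)           // node -> leader: idle, send work
  case class Work[T](unit: T)                        // leader -> node: one unit of work
  case class WorkDone(worker: ActorRef, result: Any) // node -> leader: unit completed
}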
How does this pattern fix these problems?
Pulling largely eliminates the problems outlined above. Because a specific actor must pull work from the coordinator, the coordinator always knows which unit of work each worker has. Failure recovery becomes much simpler: if a worker dies, the coordinator knows which unit of work to reprocess (or quarantine for inspection). Messages don’t sit in the workers’ mailboxes, so the loss of those mailboxes isn’t an issue; the coordinator keeps its own queue. Since each worker will only request more work once it completes its current task, there are no concerns about a worker receiving or starting more work than can be handled concurrently. The use of futures or child actors won’t lead to rogue concurrency. Let’s explore these benefits in more detail.
Accounting can be synchronous
As demonstrated, pushing work makes the tracking of each piece of work more difficult which in turn complicates failure recovery. There is no reliable method of knowing where a given piece of work is once it has been sent: tracking can only be done once your worker sends back an acknowledgement, leaving a hole in your failure recovery strategy.
By contrast, accounting for work becomes trivial when using pulling. As you’ll see in the code samples below, a worker actor must specifically request more work. Work is only sent when there is both a worker requesting work and work ready for processing. Under those conditions, the work queue and worker queue are both dequeued, allowing the leader to track the worker receiving this particular unit of work. By using Akka’s built-in DeathWatch, we will be notified if the worker dies, and we will know exactly which piece of work needs reprocessing.
Obviously the leader itself still has to hold a queue of work—if the leader should die, that queue will be lost. Pulling doesn’t eliminate that problem but it does centralize it so that work is only queued in one place. Past that, your own use case and requirements will dictate how recovery of a failed leader should proceed.
Failure recovery is easier
Stemming from simplified work accounting, failure recovery becomes far easier. Should a node die—and its workers along with it—work will return to the work queue until workers become available. In our implementation, failed nodes are restarted automatically if they fall out of the cluster or the JVM dies. Once the actor system is back up, the cluster-aware router will create new workers which will register themselves with the leader. By using this pattern, our backend is able to heal itself (with a little help from Ubuntu’s upstart utility) without our intervention.
Concurrency is easier to control
Our workers often have to fetch multiple pieces of data asynchronously and then perform some fairly CPU intensive tasks on that data. Because of this, we can easily starve ourselves of CPU or memory resources when running on VPS instances. We have to take care not to do too much work at once or we run the risk of Linux’s out-of-memory killer killing our JVM or rendering the JVM unresponsive, causing it to fall out of the cluster. Pulling makes this much easier to manage.
By configuring the cluster-aware router to only create a certain number of instances per node, we can confidently cap the amount of work done concurrently. A worker will only request more work once its task is complete, regardless of how that task is implemented within the Processor. This frees us from concerns about futures and child actors; we aren’t beholden to the implementation of the Processor actor. Switching to this pattern improved the stability of our backend immeasurably. Before, we had nodes crash fairly often due to heavy load. After, no issues whatsoever. Pulling work makes concurrency control not just easier to implement but easier to *reason about*. We could come up with a series of one-off solutions for controlling the amount of work in each specific worker based on its implementation, effectively reducing ourselves to locks and semaphores—or we can switch to pulling and unify this control structure. We keep our backend DRY and much, much simpler.
Routers still serve a specific purpose
As noted above, routers can lead to unforeseen issues, but they still serve a significant purpose. We use cluster-aware routers to manage the creation of workers on remote nodes, but we never use that router’s actual reference. This allows us to spin up worker nodes as blank canvases on which the cluster-aware router creates workers. Routers are still useful in a number of situations, but those are out of scope for this post.
Each of the problems outlined above can be solved individually while still pushing work but pulling allows us to elegantly solve all three while retaining a simple, unified approach to dispatching work.
How do we implement this?
By and large, our implementation is identical to the code in the Akka blog post linked at the beginning of this post. Our version is generic: a Leader/Node pair can be created by simply subclassing the two classes presented here. In our case, a Leader is created on our supervisor node and Node actors are created remotely on worker nodes. E.g., the supervisor node has an AnalyticsLeader which will create a certain number of AnalyticsNodes on each Akka node with the analytics role, based on the configuration of the cluster-aware router in the AnalyticsLeader.
(Akka devs: we hope you don’t mind that the bulk of this code is from your blog post)
A few notes: The facade is just an actor that tracks the location of the supervisor within the cluster. The implementation of Processors is left out because this pattern doesn’t rely on the implementation of the processor so long as it adheres to our message protocol. That protocol is available in the full gist.
(View the full example)
This is our Leader class. It's parameterized on the type of work and Node it manages.
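The original gist embed doesn’t survive here, so below is a consolidated sketch of such a Leader, assuming the hypothetical protocol above and Akka 2.2-era APIs; the walkthrough that follows refers to its pieces.

```scala
import scala.collection.immutable.Queue
import akka.actor._
import akka.routing.FromConfig

abstract class Leader[W] extends Actor with ActorLogging {
  import WorkProtocol._

  def nodeProps: Props  // subclasses supply Props for the Node type they manage

  // The cluster-aware router, configured in the deployment config. It only
  // creates workers on remote nodes; we never route messages through it.
  val workerRouter: ActorRef =
    context.actorOf(nodeProps.withRouter(FromConfig()), "workerRouter")

  // Pending work, paired with the original requester so replies can be routed.
  var workQueue = Queue.empty[(W, ActorRef)]
  // All known workers: None = idle, Some((work, requester)) = busy.
  var workers = Map.empty[ActorRef, Option[(W, ActorRef)]]

  // Notify every worker we believe to be idle that work is available.
  def notifyWorkers(): Unit =
    if (workQueue.nonEmpty)
      workers.collect { case (worker, None) => worker }.foreach(_ ! WorkIsReady)

  // Subclasses call this when a new unit of work arrives (see the full gist).
  protected def enqueueWork(work: W, requester: ActorRef): Unit = {
    workQueue = workQueue.enqueue((work, requester))
    notifyWorkers()
  }

  def handleWorkers: Receive = {
    case RegisterWorker(worker) =>
      context.watch(worker)  // DeathWatch: we get Terminated if the worker dies
      workers += (worker -> None)
      notifyWorkers()

    case WorkRequest(worker) =>
      if (workers.get(worker) == Some(None) && workQueue.nonEmpty) {
        val ((work, requester), rest) = workQueue.dequeue
        workQueue = rest
        workers += (worker -> Some((work, requester)))
        worker ! WorkToBeDone(work)
      }

    case WorkIsDone(result, worker) =>
      workers.get(worker) match {
        case Some(Some((_, requester))) =>
          requester ! result  // route the reply to the original requester
          workers += (worker -> None)
        case _ =>
          log.warning("WorkIsDone from unknown or idle worker {}", worker)
      }

    case Terminated(worker) =>
      // Return any in-flight work to the queue, then forget the dead worker.
      workers.get(worker).foreach(_.foreach {
        case (work, requester) => enqueueWork(work, requester)
      })
      workers -= worker
  }

  def handleWork: Receive  // subclass-specific work intake and queuing

  def receive = handleWorkers orElse handleWork
}
```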
We then create the cluster-aware router based on the configuration.
We create our work queue and workers map. The workers map tracks all workers and uses Option to track each worker’s current state. If the value for a given worker is None, that worker is ready for work; otherwise, we store both the unit of work and its original requester so that we can route replies appropriately.
This function is used to notify nodes that more work is available. You’ll see how it is used in the Node class later on. If work is available, we notify each worker we believe to be idle.
Upon creation, nodes immediately register themselves with the leader and are added to the workers map as idle. We also register for DeathWatch on the worker using context.watch so that we will receive a Terminated message if the worker dies.
When a worker requests work, we dequeue a unit of work if any is available and send it to the worker, tracking both the original requester and the worker.
When work is done, we send whatever message was included to the original requester and mark the worker as idle.
If a node dies, we want to know about it. Akka will send a Terminated message to all actors watching the now-dead actor. In this case, we return the worker’s in-flight work to the queue for reprocessing and remove the worker so it isn’t reused.
The remaining code for the Leader is available in the full gist and won’t make much sense outside the context of our architecture but it does demonstrate queuing work.
Let’s move on to the Node class. Here we make use of Akka’s ability to change the behavior of an actor dynamically. Our nodes have two states, working and idle, and we use context.become to switch between them as needed. In our implementation of this pattern, Processor actors are created as needed for each work request and never reused. This may or may not be appropriate for your use case; in our experience, creating a new Processor for each request helps reduce leaks.
Depending on the node’s current state, we respond to work notifications from the Leader with work requests or we ignore the notification entirely. Note that we don’t respond to the leader directly. We route all responses through our Facade so that we can continue working even if the supervisor dies and is restarted on a different node in the cluster.
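Again as a hedged sketch—the Facade lookup path and the Processor reply message are our assumptions, not necessarily the original code:

```scala
import akka.actor._

abstract class Node[W] extends Actor {
  import WorkProtocol._

  // The Facade tracks the supervisor's location; all replies go through it so
  // we keep working even if the supervisor restarts on a different node.
  val facade: ActorRef = context.actorFor("/user/facade") // Akka 2.2-era lookup

  def processorProps(work: W): Props  // a fresh Processor per work request

  override def preStart(): Unit =
    facade ! RegisterWorker(self)  // register with the leader immediately

  // Idle: respond to work notifications with a work request.
  def idle: Receive = {
    case WorkIsReady =>
      facade ! WorkRequest(self)
    case WorkToBeDone(work) =>
      context.actorOf(processorProps(work.asInstanceOf[W])) // starts processing
      context.become(working)
  }

  // Working: ignore notifications entirely until the current task is done.
  def working: Receive = {
    case WorkIsReady => // busy; deliberately ignored
    case ProcessorDone(result) =>
      context.stop(sender)               // Processors are never reused
      facade ! WorkIsDone(result, self)
      facade ! WorkRequest(self)         // idle again, so pull more work
      context.become(idle)
  }

  def receive = idle
}
```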
How do we actually use these two classes? Simple.
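A hypothetical concrete pair for the analytics role might look like this; the names and the toy Processor are illustrative, not our production code.

```scala
import akka.actor._

case class AnalyticsWork(userId: String)

class AnalyticsProcessor(work: AnalyticsWork) extends Actor {
  import WorkProtocol._
  override def preStart(): Unit = {
    val result = s"analytics for ${work.userId}"  // stand-in for the real work
    context.parent ! ProcessorDone(result)
  }
  def receive = { case _ => }
}

class AnalyticsNode extends Node[AnalyticsWork] {
  def processorProps(work: AnalyticsWork) = Props(new AnalyticsProcessor(work))
}

class AnalyticsLeader extends Leader[AnalyticsWork] {
  def nodeProps = Props[AnalyticsNode]
  def handleWork: Receive = {
    case work: AnalyticsWork => enqueueWork(work, sender)
  }
}

// On the supervisor node, at startup:
// val leader = system.actorOf(Props[AnalyticsLeader], "analyticsLeader")
// leader ! AnalyticsWork("user-42")
```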
That’s it: the pulling pattern for whatever work we need! Instantiate the Leader at startup and start passing work to it. Feel free to use the Leader and Node classes provided here—they will likely need modification for your specific use case. We didn’t create this pattern, but hopefully our version will spark some ideas on how pulling can improve your Akka backend.
This is the final post in our series on how we use Akka at Conspire. If you’re using Akka already, we hope that you can learn from our mistakes. If you aren’t already using Akka, we hope that this series has shown you how Akka can help build a better backend. We’re definitely fans and we’re very excited about the direction Akka and Scala are headed. Happy hacking!
Akka At Conspire
How We Built Our Backend on Akka and Scala
Why We Like Actors
Making Your Akka Life Easier
Don’t Fall Into Our Anti-Pattern Traps
The Importance of Pulling
goconspire-blog · 12 years ago
Akka at Conspire [Part 4]: Don't Fall Into Our Anti-Pattern Traps
Our last post covered some lessons we learned while building our backend with Akka. In this post, we're going to go into detail on some of those lessons. First, we'll show how futures can violate Akka's guarantees about concurrent state modifications, then we'll talk about how to avoid that trap. Following that, we’ll touch on design considerations when coordinating work across a cluster.
(Shameless plug: To join Conspire and get a personalized weekly update about your email relationships, sign up at goconspire.com.)
Be Careful with Futures
We'll start with an example. First let's get some definitions out of the way.
Our Worker actor just performs some sort of processing for a user; we aren’t particularly concerned with its details right now.
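The definitions gist isn’t reproduced here; these stand-ins (names ours) are enough to follow along:

```scala
import akka.actor._

case class Work(user: String)
case class WorkIsDone(user: String, result: String)

// Performs some processing for a user and stops itself when finished.
class Worker extends Actor {
  def receive = {
    case Work(user) =>
      sender ! WorkIsDone(user, s"processed $user")  // placeholder "processing"
      context.stop(self)
  }
}
```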
Imagine we want to have an actor to create and coordinate work requests and worker actors. We might try something like this:
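Here is a sketch of that first attempt—don’t copy this one; it contains the bug discussed below:

```scala
import akka.actor._
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

class BadWorkerCoordinator extends Actor {
  import context.dispatcher              // ExecutionContext for the callback
  implicit val timeout = Timeout(5.minutes)

  def receive = {
    case work: Work =>
      val worker = context.actorOf(Props[Worker])
      (worker ? work).foreach {
        case WorkIsDone(_, result) =>
          sender ! result  // BUG: sender is re-evaluated on the callback thread
      }
  }
}
```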
When work requests come in, we create a Worker, pass it the unit of Work within an ask(), and hoist a callback onto the returned future to pass the result to the requesting actor. What’s the issue with this implementation? For that, let’s take a look at the Akka API for the Actor class.
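The relevant part of the Actor trait looks roughly like this (paraphrased from the Akka API docs):

```scala
trait Actor {
  // ...
  // A def, not a val: its value depends on the message currently
  // being processed when it is called.
  final def sender(): ActorRef = context.sender()
  // ...
}
```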
Look at the definition for sender: it's a function, not a variable. The callback for our future may or may not be processed on the same thread and sender may or may not have the value we expect, especially if other messages come in to the Coordinator before the callback is executed. We could end up sending the result to the wrong actor.
Sender isn't the only potential problem. If our actor contained some piece of local mutable state, say a Map of current workers, and we tried to modify that state within the callback, we could wind up with a ConcurrentModificationException. Passing state into the callback of the future violates basic guarantees provided by Akka, namely that the internal state of an actor will not be accessed or modified outside of an actor's receive function. Futures make violations of this rule easy to write.
Closing over sender—or any form of mutable state—within an actor is a huge recipe for disaster. At the moment Akka and Scala can't enforce this (though efforts are being made in that direction). As a programmer you have to ensure you don't close over mutable state. Scala makes it very easy to write code like this: it's a trap to be aware of.
State management is not the only task made more difficult by futures. As outlined in our previous post, futures can lead to what we term “rogue concurrency”: far more work being done at once than expected. On a beefy local development machine this isn’t an issue, but once you move your application into the cloud, you’ll quickly find that your JVM becomes unresponsive. Futures make controlling the amount of work done concurrently more difficult.
How can we fix this implementation?
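Here is a sketch of the fixed coordinator, again with our illustrative names:

```scala
import akka.actor._

class GoodWorkerCoordinator extends Actor {
  // user -> the actor that originally requested that user's work
  var workers = Map.empty[String, ActorRef]

  def receive = {
    case Work(user) =>
      workers += (user -> sender)  // safe: we're inside receive
      context.actorOf(Props[Worker]) ! Work(user)

    case WorkIsDone(user, result) if workers.isDefinedAt(user) =>
      workers(user) ! result  // reply goes to the right requester
      workers -= user         // assumes the Worker stops itself when done
  }
}
```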
First of all, we've eliminated the future. Without futures, violating the rule mentioned above is much more difficult to do accidentally. Now we use an internal map to keep track of requesters so we can route results to the appropriate actor. That map is only ever accessed or modified within the actor's receive function: we will never throw any sort of ConcurrentModificationException or find that our expectation of the world within this actor doesn't match reality.
Note our use of a pattern guard within the receive (if workers.isDefinedAt(user)). This allows us to gracefully handle unexpected WorkIsDone messages for users we don't know to be in progress. Such messages fall through to Akka's unhandled() handler, which publishes them to the actor system's event stream as UnhandledMessage events.
In our WorkIsDone case, we safely pass the result to the appropriate requesting actor and remove the user reference from our map.
This code still has a number of problems which are left as an exercise to the reader: concurrent requests for the same user are not gracefully handled. Typically this pattern would be used to implement some form of supervision or throttling but these are not present here. This implementation also assumes that the Worker shuts itself down upon completion—if we don't know this to be true, we should call context.stop on the worker when we get WorkIsDone back in order to avoid leaks.
We can slightly improve upon GoodWorkerCoordinator by eliminating the workers variable. This trick uses Akka's become() function to dynamically replace the receive function of an actor. Instead of keeping the map of requesters as a variable, we can pass it as a parameter to our receive function, updating the map with context.become. This approach is almost identical to GoodWorkerCoordinator and if it feels like clever-for-clever's-sake, there is little downside in using the previous implementation. This approach is mostly presented to show one use of context.become, an underutilized tool in Akka's kit.
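A sketch of the become-based variant:

```scala
import akka.actor._

class BecomeWorkerCoordinator extends Actor {
  def receive = running(Map.empty)

  // The requester map is a parameter of the receive function itself;
  // "updating" it means swapping in a new receive via context.become.
  def running(workers: Map[String, ActorRef]): Receive = {
    case Work(user) =>
      context.actorOf(Props[Worker]) ! Work(user)
      context.become(running(workers + (user -> sender)))

    case WorkIsDone(user, result) if workers.isDefinedAt(user) =>
      workers(user) ! result
      context.become(running(workers - user))
  }
}
```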
Don’t Split Supervision Across Your Cluster
First, a brief refresher on our architecture: we have one supervisor node which dispatches work to a cluster of worker nodes. Each worker node has a given role denoting its service (e.g., “analytics”) and the supervisor remotely creates worker actors for the appropriate service on the worker node as needed.
The original design for our backend had each role supervising itself: using Akka’s cluster-aware routers and ClusterSingletonManager, roles would use the oldest member node of the same role as their leader, which would remotely deploy workers to nodes of that role. That is, the oldest node with the “analytics” role held the AnalyticsLeader, which remotely deployed analytics workers to all nodes with that role. If that node died, the next oldest node with that role would take over. This approach is roughly the same as that described here. This pattern works but poses some problems.
Work had to be tracked by both the master supervisor and the role leader. In our design, only the leader kept a queue of work requests; if it died, that queue was lost and the supervisor had to start over. Doing this pattern right means tracking work in both places so that a dead leader doesn’t lose state. We never actually added work tracking to the supervisor because we realized the pattern was broken: tracking the same state in two places is redundant and unnecessary.
The supervisor and all worker nodes for a given role must track the oldest node. Should that node die, a hole opens: nodes may try to send work requests or route completion notifications through the leader despite its death if that member hasn’t yet been removed from the cluster. Messages could get lost.
This pattern gave us multiple single points of failure. Ultimately, our cluster requires the supervisor to be running for any work to be dispatched. Given the nature of our work, this is a point of failure we’re comfortable with. Introducing role leaders as another potential point of failure pushed us into a corner: nodes could be perfectly ready for work and the supervisor ready to dispatch it, but that work could get lost because the leader had crashed but had not yet been removed from the cluster. We ran into all sorts of headaches with this.
Ultimately, we moved the leaders onto the supervisor. Akka made this change extremely simple: we changed one line in our config. Cluster-aware routers have a setting to toggle the deployment of routees on the same node as the router. By toggling this to “off”, the role leader could be on the supervisor node but only create workers on analytics nodes.
The only change was to the allow-local-routees setting.
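In sketch form—paths and counts hypothetical—the deployment block went from allowing local routees to forbidding them:

```hocon
akka.actor.deployment {
  /analyticsLeader/workerRouter {
    router = round-robin
    nr-of-instances = 100
    cluster {
      enabled = on
      max-nr-of-instances-per-node = 2
      use-role = "analytics"
      # Before: allow-local-routees = on
      # After: the leader runs on the supervisor node but never deploys
      # workers there, only to nodes with the analytics role.
      allow-local-routees = off
    }
  }
}
```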
Our initial design caused problems, but Akka’s power made the solution simple. Rolling out this change took almost no time at all. As a side benefit, we eliminated the cluster singleton pattern from our workers, allowing each node of a given role to be treated exactly the same. In fact, all our worker nodes are spun up with nothing more than a basic ActorSystem and a single actor which facilitates communication with the supervisor. We also eliminated work pushing in favor of work pulling, a change we will detail in our final post.
In our final post for this series, we’ll go into more detail about the design of our actor system and hierarchy and how we arrived at our final architecture.
(Hint: pushing work is easy to do but hard to do right)
Akka At Conspire
How We Built Our Backend on Akka and Scala
Why We Like Actors
Making Your Akka Life Easier
Don’t Fall Into Our Anti-Pattern Traps
The Importance of Pulling
goconspire-blog · 12 years ago
Akka at Conspire [Part 3]: Making Your Akka Life Easier
Over the course of the last few months building our backend, we’ve learned a lot about working with Akka to build a stable, resilient cluster. We went down some dead-ends and most definitely had some hair-raising moments of frustration, especially when it came to cluster stability. This post will briefly cover some of the lessons we learned in the hope that others don’t experience our exasperation.
(Shameless plug: To join Conspire and get a personalized weekly update about your email relationships, sign up at goconspire.com.)
Cap the amount of processing caused by each message
Our original implementation simply processed all of a user's message headers at the same time. Generally this isn't a problem, but some of our users have millions of messages. When those users hit—or worse, when multiple users of this scale hit at the same time—our nodes would become unresponsive and be marked as unreachable by the rest of our Akka cluster, despite eventually completing successfully. This issue only manifested itself on AWS. Our first attempts to solve the problem involved tweaking Akka's configuration settings: increasing the threshold for failure detection and moving Akka's internal clustering and remoting onto their own dispatchers. Sometimes this helped, but we still ran into too many failures. Major problems occurred when a stop-the-world garbage collection was triggered—the entire node would become unresponsive. Because we generate huge amounts of data, much of which gets promoted to the tenured generation, a full garbage collection could take quite some time. Tweaking JVM settings helped but wasn't a long-term solution. If you aren't familiar with garbage collection on the JVM, I highly recommend reading Oracle's documentation.
Ultimately we had to move to incremental processing and cap the amount of work done concurrently. We chunked our message headers into blocks of 60,000, equivalent to roughly 30 megabytes of uncompressed JSON. At this size any given chunk would be processed fast enough (and cleaned up quickly enough) to avoid rendering the worker node unresponsive. This significantly ameliorated our stability issues without seriously compromising performance.
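The idea, as a minimal sketch—the types and the processing function are illustrative stand-ins for our real ones:

```scala
// Stand-ins for our real types and processing.
case class MessageHeader(raw: String)
def processChunk(chunk: Seq[MessageHeader]): Unit = ()  // the real work goes here

val ChunkSize = 60000  // roughly 30MB of uncompressed JSON per block

// Each block is processed (and becomes collectible) quickly enough that the
// node stays responsive.
def processIncrementally(headers: Iterator[MessageHeader]): Unit =
  headers.grouped(ChunkSize).foreach(processChunk)
```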
Lesson: Don’t try to do too much work at once, especially on a VPS.
Tune your cluster dispatcher
This recommendation is quite common on the Akka mailing list; we're just repeating it here for emphasis: move clustering to its own dispatcher. This won’t solve all problems and won’t survive a major garbage collection, but it will improve clustering stability in a cloud environment.
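A sketch of that configuration; the thread-pool sizes are illustrative:

```hocon
# Run Akka's cluster internals on a dedicated dispatcher so heavy application
# load can't starve heartbeats and failure detection.
akka.cluster.use-dispatcher = cluster-dispatcher

cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 4
  }
}
```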
Lesson: The cloud is a scary place, adjust accordingly.
Improve testability by not creating actors directly
Having actors directly create the actors that interface with external systems isn't conducive to testing. If your processing actor creates its own persistence actor, you're going to have to jump through hoops in your unit tests to keep that external system out of the test.
We favor two approaches. First, pass in such actors at creation. If you're dealing with a SQL database, this approach also makes it easier to keep your persistence actors on their own dispatcher—avoiding blocking within the default dispatcher—and to manage the number of actors handling database connections at once. In testing you can pass in mock actors while passing in concrete implementations in production.
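A minimal sketch of the first approach; the types and messages are hypothetical:

```scala
import akka.actor._

case class User(id: String)
case class SaveUser(user: User)

// The persister is injected rather than created internally; in tests you can
// hand in a TestProbe's ref, in production the real database actor.
class UserProcessor(persister: ActorRef) extends Actor {
  def receive = {
    case user: User =>
      // ... processing ...
      persister ! SaveUser(user)
  }
}

object UserProcessor {
  def props(persister: ActorRef): Props = Props(new UserProcessor(persister))
}
```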
The second approach is to use the cake pattern. We aren't using this at the moment but are considering moving parts of our codebase to it in the future.
Lesson: Avoid patterns that prohibit or complicate testability.
Make sure your nodes can spin themselves back up
JVMs crash and cloud environments can be unreliable and noisy. We tuned our failure detection settings to allow for some leeway in this regard, but we don't want to be so lax that we miss legitimate failures. We mitigate this problem with a two-pronged approach: ensure your nodes can restart themselves and monitor for failure of the cluster. In our case, nodes which fall out of the cluster (that is, mark every other node as unreachable) kill their actor systems and shut down their JVM. Ubuntu's upstart daemon monitors the process and restarts the JVM if it exits with a non-zero code. This allows us to accept and be comfortable with unexpected failures—the node will simply rejoin and any lost work will be marked for reprocessing by the supervisor.
Lesson: Make sure your system can heal itself.
Automated deployment is a necessity
Spinning up multiple EC2 instances and provisioning them manually isn't feasible. Mistakes will be made—you must have automated deployment if you want to save your sanity. We use Chef and Vagrant to provision and set up new EC2 instances. We can't say we're huge fans of Chef—it's definitely got its warts—but for the moment we're content. Writing the recipes and getting everything configured correctly took quite some time, but once we reached a working state this setup proved quite stable.
Lesson: Tying in with the previous lesson, automated deployment will make it easier to both scale and heal your cluster. We can add a new worker node with one command via Vagrant.
The FSM is your friend
One of Akka's best features is its finite-state machine helper. Strictly speaking, the FSM isn't needed, but in certain situations it lends itself to far more readable code and guides your thought process towards more predictable, maintainable code. We also found that the FSM helper makes it easier to catch corner cases we didn't initially predict. This structure isn't a magic bullet, but writing a complex process with the FSM mixin will force you to clearly enunciate each state and transition within the process and more quickly find potential issues or holes in your logic.
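A minimal FSM sketch; the states, data, and messages are invented for illustration:

```scala
import akka.actor._

sealed trait State
case object Idle extends State
case object Fetching extends State

sealed trait Data
case object NoData extends Data
case class Pending(userId: String) extends Data

case class Fetch(userId: String)
case object FetchComplete

class FetchFsm extends Actor with FSM[State, Data] {
  startWith(Idle, NoData)

  when(Idle) {
    case Event(Fetch(userId), NoData) =>
      goto(Fetching) using Pending(userId)  // every transition is explicit
  }

  when(Fetching) {
    case Event(FetchComplete, Pending(_)) =>
      goto(Idle) using NoData
  }

  // Corner cases surface here instead of disappearing silently.
  whenUnhandled {
    case Event(_, _) => stay
  }

  initialize()
}
```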
Lesson: Your mental model of your system is very, very important and using helpers like the FSM can help keep everything straight when you translate from thought to code.
Rogue concurrency: the overreaching actor (aka be very very careful mixing actors and futures)
One of our biggest problems was what we term "rogue concurrency." This commonly occurs when intensive work is done in the callback of a future. Recall that an actor processes one message at a time and immediately processes the next message upon completion of the current. Also recall that a future operates on a different dispatcher and an actor which triggers a future within its receive still returns immediately. These are good things for scalability and performance but also form a potential trap.
Imagine the following scenario: a request for CPU intensive work is sent to a worker node. In order to begin, the worker node sends an asynchronous request to the persistence layer to retrieve a large amount of data (say, 50 megabytes); when this future is fulfilled, work begins and can take quite some time.
Now imagine that 100 requests for such work come in at roughly the same time. Because each request is non-blocking, the actor immediately starts working on all 100 requests at once. All 100 requests for data are sent off, leading to five gigabytes of data coming in from your persistence service. Even if you don't get an OutOfMemoryError (we'll assume you're using a beefy machine with 32GB RAM), processing that much data at once causes your node to become unresponsive or run out of threads (which also causes an OutOfMemoryError). In either case, your node either dies unexpectedly or is marked unreachable by the rest of the cluster. We sometimes found on AWS m1.medium instances that the JVM would die silently with no error logged—ultimately, rogue concurrency was found to be the culprit.
Futures aren't required for this problem to occur—this same problem would happen if we never operated within a future’s callback but relied on Akka's pipeTo helper and only did processing within an actor's receive function. The 100 requests for data would go out and be piped into an actor for processing. This actor may only process one at a time but letting that amount of data pile up in the actor's mailbox can also cause problems. Piping to a router of actors may help but not in every circumstance (see next lesson). What if the actor you pipe to delegates CPU intensive work to a child actor? Rogue concurrency.
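As a sketch of the pipeTo variant of the trap—the types and the fetch are stand-ins:

```scala
import akka.actor._
import akka.pattern.pipe
import scala.concurrent.Future

case class ProcessUser(userId: String)
case class UserData(userId: String, blob: Array[Byte])

class HungryWorker extends Actor {
  import context.dispatcher

  // Imagine this returns ~50MB per user from the persistence layer.
  def fetchUserData(userId: String): Future[UserData] =
    Future(UserData(userId, Array.empty[Byte]))

  def receive = {
    case ProcessUser(userId) =>
      // Non-blocking: receive returns immediately, so 100 requests trigger
      // 100 concurrent fetches regardless of how fast we can crunch them.
      fetchUserData(userId) pipeTo self
    case UserData(userId, blob) =>
      crunch(blob)  // one at a time, while fetched data piles up in the mailbox
  }

  def crunch(blob: Array[Byte]): Unit = ()  // stand-in for CPU-intensive work
}
```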
There are several ways around this but our favorite is to always pull work rather than push it. This allows you to better manage the amount of work in progress at a given time.
Lesson: Avoid mixing concurrency patterns and always keep an eye on the amount of work done at any given time.
Pull, don't push
In the early days of development we blindly pushed work from the supervisor node to a router of workers. This approach is terrible. When (not if) your nodes crash, any in-progress work is lost with no record of the queue. Without any throttling, your nodes can fall into rogue concurrency, where far more work is attempted at once than is feasible. We'll go into greater detail on the alternative in part 5 of this series. For now, read the Akka blog's post on balancing workload, which forms the basis of the pulling pattern.
Lesson: Pushing work to workers contributes to rogue concurrency and inhibits resilience without some additional form of record-keeping. Letting workers pull instead both limits the amount of work done at once and makes recovery simpler in case a worker dies while processing.
Most of our lessons revolve around keeping the cluster stable and resilient. Akka certainly offers a lot of help in this regard but it’s not perfect and careful thought is still required. Ultimately, we’re now very happy with where our backend is but we had to go through some very hairy moments to get here.
Akka At Conspire
How We Built Our Backend on Akka and Scala
Why We Like Actors
Making Your Akka Life Easier
Don’t Fall Into Our Anti-Pattern Traps
The Importance of Pulling
goconspire-blog · 12 years ago
Akka at Conspire [Part 2]: Why We Like Actors
As we talked about in part one, Conspire makes heavy use of Akka for our backend. Akka itself provides terrific features for scalable, resilient processing but the heart of all of this is a concept known as “actors.”
(Shameless plug: To join Conspire and get a personalized weekly update about your email relationships, sign up at goconspire.com.)
What is an actor?
Akka’s benefits stem from a simple concurrency model known as the “actor model.” An actor is the most basic unit of computing within this model. An actor is very limited in its capabilities, but these limitations provide Akka its power: an actor can receive messages, send messages to other actors, create actors and monitor those actors it creates. You can only interact with an actor by passing a message to an actor’s reference—the actual actor is never accessible. The decoupling of an actor from its reference is key: a reference can point to an actor on the same machine or a different machine and in the event of failure, the actor behind the reference can be replaced without impacting the reference. An actor processes only one message at a time, and its state cannot be accessed or modified except by its receive function.
(See Akka's own description of actors and actor systems for a better overview)
An actor consists of state, behavior, a mailbox (i.e., a queue of messages), a supervisor and, potentially, children. State is not accessible from outside the actor—client code only interfaces with an actor via message passing to its reference. Messages are sent to an actor's reference and placed in its mailbox. Its behavior (that is, receive function) processes one message at a time, potentially changing the actor's state. An actor can create other actors, for which it serves as a supervisor, handling failures within the child actors.
To the code!
Let's take a look at an example. In this example, we'll show a common pattern: request comes in, create an actor for it, stop that actor when complete. We'll also show how actors provide resilience by specifying a supervisor strategy for child actors and how to manage state safely (hint: nothing special required for this one!).
(View complete example)
First of all, let's get some definitions out of the way. Assume that we define a NewUserQueue actor elsewhere in the system which continuously streams users for processing to actors which register themselves with this queue. Also, assume that UserIndexer is an actor which indexes relevant documents for a user: this process could take quite a long time. Finally, assume that UserPersister just handles writing users to a database. To begin, we'll stub out some actors and define our messages.
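The original gist isn’t embedded here, so these hedged stand-ins—message names ours—will serve for the walkthrough:

```scala
import akka.actor._

case class User(id: String)

// Registration protocol for the queue:
case class RegisterForUsers(consumer: ActorRef)
case class UnregisterForUsers(consumer: ActorRef)
case class NewUser(user: User)

// Replies from an indexing run:
case class IndexingSucceeded(user: User)
case class IndexingFailed(user: User, reason: Throwable)

// Continuously streams NewUser messages to registered consumers.
class NewUserQueue extends Actor {
  def receive = { case _ => /* elided */ }
}

// Indexes relevant documents for a user; may take a long time. Replies to its
// parent with IndexingSucceeded or IndexingFailed.
class UserIndexer(user: User) extends Actor {
  def receive = { case _ => /* elided */ }
}

// Writes users to the database.
class UserPersister extends Actor {
  def receive = { case user: User => /* elided */ }
}
```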
Our pipeline will marshal users from the NewUserQueue to new UserIndexers.
Within the pipeline we'll create our own UserPersister actor and keep a map that tracks the in-flight UserIndexer actors by user.
We'll set up the supervisor strategy for the actors we create. This lets us handle restart logic on a per-exception basis for exceptions that can't be cleanly handled internally by the child actors.
Then, on start we'll ask the NewUserQueue to start sending us users. On stop, we'll tell it we're no longer interested.
Finally, we'll define this actor's behavior, known as the receive function:
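Since the individual gist embeds aren’t preserved, here is a consolidated sketch of the whole pipeline actor described above and below; the details are illustrative:

```scala
import akka.actor._
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._

class UserPipeline(queue: ActorRef) extends Actor with ActorLogging {

  // Our own persistence actor.
  val persister = context.actorOf(Props[UserPersister], "persister")

  // Restart logic on a per-exception basis for failures the children can't
  // cleanly handle themselves.
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: java.io.IOException => Restart
      case _: Exception           => Stop
    }

  // In-flight UserIndexers, tracked by user id.
  var indexers = Map.empty[String, ActorRef]

  override def preStart(): Unit = queue ! RegisterForUsers(self)
  override def postStop(): Unit = queue ! UnregisterForUsers(self)

  def receive = {
    case NewUser(user) =>
      val indexer = context.actorOf(Props(new UserIndexer(user)))
      context.watch(indexer)  // Terminated if the indexer dies unexpectedly
      indexers += (user.id -> indexer)

    case IndexingSucceeded(user) =>
      persister ! user  // write the updated user back to the database
      stopIndexer(user.id)

    case IndexingFailed(user, reason) =>
      log.error(reason, "indexing failed for user {}", user.id)
      stopIndexer(user.id)

    case Terminated(indexer) =>
      // An indexer died without reporting success or failure; forget it.
      indexers = indexers.filterNot { case (_, ref) => ref == indexer }
  }

  // In either completion case: stop watching, stop the actor, drop the entry.
  private def stopIndexer(userId: String): Unit =
    indexers.get(userId).foreach { indexer =>
      context.unwatch(indexer)
      context.stop(indexer)
      indexers -= userId
    }
}
```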
For each user, we're going to create a UserIndexer to do the actual work for us. We'll watch this actor (context.watch) so that we're notified in case of failures. If it succeeds, we'll write the updated user back to the database.
When a UserIndexer succeeds, we'll send that user on to the persister and stop the worker actor.
When a UserIndexer fails, we'll handle that too.
In either case, we want to stop watching that actor and remove it from our map of current indexers.
We also want to know if one of our indexers dies unexpectedly, for that we'll handle Terminated messages:
What did we just do?
In this example, you've seen actors modify their internal state (thread-safe!), interact with other actors and supervise their children. You've also seen a very common pattern: create a new actor for every request, clean it up when finished. This pattern allows each request to start from a clean slate, and actors are lightweight enough that the cost of creating many of them is negligible (depending, of course, on how much state those actors contain).
Why is this better?
Replace your mental model of a thread with an actor, the difference being that state is now contained and must be communicated between actors solely through message passing. The rules imposed by the actor model are the very thing that allow Akka to keep complex systems conceptually simple. The actor model cleanly compartmentalizes state and, in turn, the way we think about it. This is how a small team of three was able to build an elastically scalable, fault tolerant backend in so little time. Akka isn't all roses though: later in this series we'll cover pitfalls we ran into over the last few months.
Up next...
How We Built Our Backend on Akka and Scala
Why We Like Actors
Making Your Akka Life Easier
Don’t Fall Into Our Anti-Pattern Traps
The Importance of Pulling