A deep-dive into lessons learned using Amazon Kinesis Streams at scale
Best practices discovered while processing over 200 billion records on AWS every month with Amazon Kinesis Streams
After building a mission-critical data production pipeline at ironSource that processes over 200 billion records every month, we’d like to share some of the rules we’ve written in blood.
The data pipeline architecture using Kinesis Streams
An Overview of Amazon Kinesis Streams
Kinesis is an infinitely scalable stream-as-a-service built from shards. It is commonly chosen for its ease of use, low operational overhead, and competitive pricing — qualities that often differentiate Kinesis Streams from Kafka.
Like any managed service, Amazon Kinesis has limitations you should be familiar with — and know how to overcome through scaling and throttling. It’s wise to use the AWS-provided producers, consumers, and tools, which implement many of these best practices for you.
Kinesis Streams diagram showing the use of shards within a stream
Reduce costs with the Amazon Kinesis Producer Library (KPL)
At large scale, it’s hard to change the architecture once it’s in production, and cost becomes a very big pain. The service is billed per 25KB payload unit, so it makes sense to aggregate messages if your records are smaller than that.
When sending data into your Kinesis stream you should compress and aggregate several messages into one in order to reduce costs.
The Amazon Kinesis Producer Library (KPL) aggregates and compresses (using Protocol Buffers) multiple logical user records into a single Amazon Kinesis record for efficient puts into the stream. The library is built by AWS in C++ and has (only) Java bindings. An open-source version in Golang is available.
KPL explained showing how messages are being aggregated using Protocol Buffers
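If you can’t use the KPL (for example, from a language without bindings), a rough manual equivalent is to batch and compress several logical records yourself before putting them to the stream. The sketch below is a simplified illustration using boto3 and gzip; it does not reproduce the KPL’s Protocol Buffers wire format, and the stream name is hypothetical:

```python
import gzip
import json
import boto3

kinesis = boto3.client("kinesis")

def put_aggregated(records, stream_name="my-stream", partition_key="agg"):
    """Pack many small logical records into one compressed Kinesis record.

    A simplified stand-in for KPL-style aggregation: consumers must know to
    gunzip and json-decode the payload before processing it.
    """
    payload = gzip.compress(json.dumps(records).encode("utf-8"))
    if len(payload) > 1_000_000:  # stay under the 1 MB per-record limit
        raise ValueError("aggregated payload too large, split the batch")
    kinesis.put_record(
        StreamName=stream_name,   # hypothetical stream name
        Data=payload,
        PartitionKey=partition_key,
    )
```

On the consumer side you would gunzip and split the payload back out before handing the records to your processing logic.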
Use the Kinesis Client Library (KCL)
The KCL is written by AWS and supports automatic de-aggregation of KPL user records. It takes care of many of the complex tasks associated with distributed computing — such as load-balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to resharding.
The KCL provides language bindings for Java, Node.js, .NET, Python, and Ruby.
Processing methods — On-Demand / Spot-instances / Lambda
While a Kinesis stream can be processed using on-demand instances, we highly recommend leveraging AWS spot instances to process your stream — it is the most cost-effective method.
You can also process the data using AWS Lambda with Kinesis, together with the Kinesis Record Aggregation & Deaggregation Modules for AWS Lambda. It is very easy to hook a Kinesis stream up to a Lambda function — but you must take cost into consideration and see if it makes sense for your specific use case.
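For reference, a minimal Lambda handler for a Kinesis event source might look like the sketch below (plain, non-aggregated records; with KPL aggregation you would also run the deaggregation module over each record):

```python
import base64
import json

def handler(event, context):
    """Process a batch of Kinesis records delivered to a Lambda function.

    Each record's payload arrives base64-encoded; here we assume the
    producer wrote JSON documents (and no KPL aggregation).
    """
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        document = json.loads(payload)
        # ... business logic goes here ...
        print(record["kinesis"]["partitionKey"], document)
    return {"processed": len(event["Records"])}
```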
Monitoring Kinesis Streams
There are two sets of metrics you should take into consideration when monitoring your Kinesis Streams with CloudWatch:
Basic Stream-level Metrics
Enhanced Shard-level Metrics
For the stream-level metric, it’s good practice to set up an alarm on the GetRecords.IteratorAgeMilliseconds to know if your workers are lagging behind on the stream.
However, sometimes a specific worker/shard is out of sync — but this won’t be reflected at the stream level in the global IteratorAgeMilliseconds average. To overcome this, I recommend running a Lambda function every minute that queries IteratorAgeMilliseconds at the shard level and alerts if needed.
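A rough sketch of such a check is below. It assumes enhanced shard-level metrics are enabled on the stream; the stream name and alert threshold are hypothetical:

```python
import datetime
import boto3

kinesis = boto3.client("kinesis")
cloudwatch = boto3.client("cloudwatch")

STREAM = "my-stream"      # hypothetical stream name
THRESHOLD_MS = 60_000     # alert if any shard lags by more than a minute

def handler(event, context):
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(minutes=5)
    # describe_stream returns the first batch of shards; paginate for big streams
    shards = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"]
    lagging = []
    for shard in shards:
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/Kinesis",
            MetricName="IteratorAgeMilliseconds",
            Dimensions=[
                {"Name": "StreamName", "Value": STREAM},
                {"Name": "ShardId", "Value": shard["ShardId"]},
            ],
            StartTime=start,
            EndTime=end,
            Period=300,
            Statistics=["Maximum"],
        )
        points = stats["Datapoints"]
        if points and max(p["Maximum"] for p in points) > THRESHOLD_MS:
            lagging.append(shard["ShardId"])
    if lagging:
        print("Lagging shards:", lagging)  # publish to SNS / page someone here
    return lagging
```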
Amazon Kinesis Streams Metrics
AWS recommends monitoring the following metrics:
GetRecords.IteratorAgeMilliseconds Tracks the read position across all shards and consumers in the stream. Note that if an iterator’s age passes 50% of the retention period (by default 24 hours, configurable up to 7 days), there is a risk of data loss due to record expiration. AWS advises the use of CloudWatch alarms on the maximum statistic to alert you before this loss is a risk. For an example scenario that uses this metric, see Consumer Record Processing Falling Behind.
ReadProvisionedThroughputExceeded When your consumer side record processing is falling behind, it is sometimes difficult to know where the bottleneck is. Use this metric to determine if your reads are being throttled due to exceeding your read throughput limits. The most commonly used statistic for this metric is average.
WriteProvisionedThroughputExceeded This serves the same purpose as the ReadProvisionedThroughputExceeded metric, but for the producer (put) side of the stream. The most commonly used statistic for this metric is average.
PutRecord.Success, PutRecords.Success AWS advises the use of CloudWatch alarms on the average statistic to indicate whether records are failing to be put to the stream. Choose one or both put types depending on what your producer uses. If using the Kinesis Producer Library (KPL), use PutRecords.Success.
GetRecords.Success AWS advises the use of CloudWatch alarms on the average statistic to indicate whether records are failing to be read from the stream.
Throttling Kinesis Streams
If you push it to the limit, Kinesis will start throttling your requests and you’ll have to re-shard your stream. Throttling can have several causes. For example, you may have sent more than 1 MB of payload or 1,000 records per second into a single shard. But throttling can also be caused by DynamoDB limits.
As noted in Tracking Amazon Kinesis Streams Application State, the KCL tracks the shards in the stream using an Amazon DynamoDB table. When new shards are created as a result of re-sharding, the KCL discovers the new shards and populates new rows in the table. The workers automatically discover the new shards and create processors to handle the data from them. The KCL also distributes the shards in the stream across all the available workers and record processors. Make sure you have enough read/write capacity in your DynamoDB table.
Re-Sharding a Kinesis Stream
When re-sharding a stream, scaling is much faster when you double or halve the shard count. You can re-shard your stream using the UpdateShardCount API. Note that scaling a stream with more than 200 shards is not supported via this API; in that case, you can use the Amazon Kinesis scaling utils instead.
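For example, doubling a stream’s shard count with boto3 might look like this (the stream name and counts are hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

# Doubling (or halving) the shard count completes much faster than arbitrary targets.
kinesis.update_shard_count(
    StreamName="my-stream",        # hypothetical stream name
    TargetShardCount=8,            # e.g. scaling up from 4 shards
    ScalingType="UNIFORM_SCALING",
)
```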
Re-sharding a stream with hundreds of shards can take time. An alternative method involves spinning up another stream with the desired capacity, and then redirecting all the traffic to the new stream.
An example of re-sharding a Kinesis stream
AWS Kinesis Resources
Developing Kinesis Producers & Consumers
Kinesis Producer Library — KPL
Kinesis Producer Library — KPL — Golang
Kinesis Client Library — KCL
Monitoring Kinesis
Amazon Kinesis Producer Library — the Kinesis Producer Library (KPL) provides metrics per shard, worker, and KPL application.
CloudWatch metrics — Streams sends Amazon CloudWatch custom metrics with detailed monitoring for each stream.
Amazon Kinesis Agent — The Amazon Kinesis Agent publishes custom CloudWatch metrics to help assess if the agent is working as expected.
API logging — Streams uses AWS CloudTrail to log API calls and store the data in an Amazon S3 bucket.
Troubleshooting Kinesis
Some Streams Records are Skipped When Using the Kinesis Client Library
Records Belonging to the Same Shard are Processed by Different Record Processors at the Same Time
Consumer Application is Reading at a Slower Rate Than Expected
GetRecords Returns Empty Records Array Even When There is Data in the Stream
Shard Iterator Expires Unexpectedly
Consumer Record Processing Falling Behind
Amazon Kinesis Scaling Utils
Shimon Tolts is the Co-Founder and CTO at CodeLaborate.io and is an AWS Community Hero. Thanks for reading!
Some things you should know before using Amazon’s Elasticsearch Service on AWS
Elasticsearch is a powerful but fragile piece of infrastructure with a ton of things that can cause the AWS service to become unstable
I write this following a particularly frustrating day of thumb twiddling and awaiting slack messages from the AWS support team. Our Elasticsearch cluster was down for the better part of a day, and we were engaged with AWS support the whole time.
At my previous job working for Loggly, my team and I maintained a massive, multi-cluster Elasticsearch deployment. I learned many lessons and have a lot of tricks up my sleeves for dealing with Elasticsearch’s temperaments. I feel equipped to deal with most Elasticsearch problems, given access to administrative Elasticsearch APIs, metrics and logging.
AWS’s Elasticsearch, however, offers access to none of that. Not even APIs that are read-only, such as the /_cluster/pending_tasks API, which would have been really handy, given that the number of tasks in our pending task queue had steadily been climbing into the 60K+ region.
This accursed message has plagued me ever since AWS’s hosted Elasticsearch was foisted on me a few months ago:
{ "Message":"Your request: '/_cluster/pending_tasks' is not allowed." }
Thanks, AWS. Thanks….
Without access to logs, without access to admin APIs, without node-level metrics (all you get is cluster-level aggregate metrics) or even the goddamn query logs, it’s basically impossible to troubleshoot your own Elasticsearch cluster. This leaves you with one option whenever anything starts to go wrong: get in touch with AWS’s support team.
9 times out of 10, AWS will simply complain that you have too many shards.
It’s bitterly funny that they chide you for this because by default any index you create will contain 5 shards and 1 replica. Any ES veteran will say to themselves: heck, I’ll just update the cluster settings and lower the default to 1 shard! Nope.
{ "Message": "Your request: '/_cluster/settings' is not allowed for verb: GET" }
Well, fuck (although you can work around this by using index templates).
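For reference, that workaround might look something like the sketch below — a legacy-style index template (ES 5.x era) applied over plain HTTP with Python’s requests library; the endpoint and template name are hypothetical:

```python
import requests

# Hypothetical domain endpoint; assumes an IP-based access policy
# (IAM-authenticated domains need SigV4-signed requests instead).
ENDPOINT = "https://search-my-domain.us-east-1.es.amazonaws.com"

template = {
    "template": "*",                  # legacy (pre-6.x) index pattern field
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
}

resp = requests.put(f"{ENDPOINT}/_template/default-shards", json=template)
resp.raise_for_status()
print(resp.json())
```

Every index created after the template is in place picks up the single-shard default, sidestepping the blocked cluster settings API.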
Eventually, AWS support suggested that we update the instance size of our master nodes, since they were not able to keep up with the growing pending task queue. But they advised us to be cautious, because making any change at all will double the size of the cluster and copy every shard.
That’s right. Increasing the instance size of just the master nodes will actually cause AWS’s middleware to double the size of the entire cluster and relocate every shard in the cluster to new nodes. After which, the old nodes are taken out of the cluster. Why this is necessary is utterly beyond me.
Adding an entry to the list of IP addresses that have access to the cluster will cause the cluster to double in size and migrate every stinking shard.
In fact, even adding a single data node to the cluster causes it to double in size and all the data will move.
Don’t believe me? Here is the actual graph of our node count as we were dealing with yesterday’s issue:
The node count increased by 10x for a period of time
Back at Loggly, we would never have considered doing this. Relocating every shard in any respectably sized cluster all-at-once obliterates the master nodes and would cause both indexing and search to come to a screeching halt. Which is precisely what happens whenever we make any change to our Elasticsearch cluster in AWS.
This is probably why AWS is always complaining about the number of shards we have… Like, I know Elasticsearch has an easy and simple way to add a single node to a cluster. There is no reason for this madness given the way Elasticsearch works.
I often wonder how much gratuitous complexity lurks in AWS’s Elasticsearch middleware. My theory is that their ES clusters are multi-tenant. Why else would the pending tasks endpoint be locked down? Why else would they not give you access to the ES logs? Why else would they gate so many useful administrative APIs behind the “not allowed” Cerberus?
I must admit though, it is awfully nice to be able to add and remove nodes from a cluster with the click of a button. You can change the instance sizes of your nodes from a drop-down; you get a semi-useful dashboard of metrics; when nodes go down, they are automatically brought back up; you get automatic snapshots; authentication works seamlessly within AWS’s ecosystem (but makes your ES cluster obnoxiously difficult to integrate with non-AWS libraries and tools, which I could spend a whole ‘nother blog post ranting about), and when things go wrong, all you have to do is twiddle your thumbs and wait on slack because you don’t have the power to do anything else.
Elasticsearch is a powerful but fragile piece of infrastructure. Its problems are nuanced. There are tons of things that can cause it to become unstable, most of which are caused by query patterns, the documents being indexed, the number of dynamic fields being created, imbalances in the sizes of shards, the ratio of documents to heap space, etc. Diagnosing these problems is a bit of an art, and one needs a lot of metrics, log files and administrative APIs to drill down and find the root cause of an issue.
AWS’s Elasticsearch doesn’t provide access to any of those things, leaving you no other option but to contact AWS’s support team. But AWS’s support team doesn’t have the time, skills or context to diagnose non-trivial issues, so they will just scold you for the number of shards you have and tell you to throw more hardware at the problem. Although hosting Elasticsearch on AWS saves you the trouble of needing a competent devops engineer on your team, it absolutely does not mean your cluster will be more stable.
So, if your data set is small, if you can tolerate endless hours of downtime, if your budget is too tight, if your infrastructure is too locked in to AWS’s ecosystem to buy something better than AWS’s hosted Elasticsearch: AWS Elasticsearch is for you. But consider yourself warned…
How does language, memory and package size affect cold starts of AWS Lambda?
AWS Lambda Cold Starts
Comparing the cold start times of AWS Lambda using different languages, memory allocation, and sizes of deployment package
This post looks at the cold start times of the C#, Java, Python, and Node.js runtimes on AWS Lambda
In a recent blog, we examined the performance difference between the runtimes of languages that AWS Lambda supports natively. Since that experiment was specifically interested in the runtime differences of a ‘warm’ function, the ‘cold start’ times were intentionally omitted.
A cold start occurs when an AWS Lambda function is invoked after not being used for an extended period of time resulting in increased invocation latency.
Since the cold start times of AWS Lambda is an important performance consideration, let’s take a closer look at some experiments designed to isolate the variables which may impact the first-time invocations of functions.
Testing methodology
In my experience running Lambda functions in production environments, cold starts usually occurred when an AWS Lambda function was idle for longer than five minutes. More recently, some of my functions didn’t experience a cold start until after 30 minutes of idle time. Even if you keep your function warm, a cold start will occur about every 4 hours when the host virtual machine is recycled — just check out the metrics from IOpipe.
For testing purposes, I needed a reliable method for consistently ensuring a cold start of an AWS Lambda function. The only surefire way to create a cold start is by deploying a new version of a function before invocation.
For the experiment, I created 45 variations of the same AWS Lambda function. Using the Serverless framework setup below, it was easy to create variants of the function with different memory sizes.
http://ift.tt/2rtQxb2
I recursively deployed all 45 functions and invoked each of them programmatically using the simple script below.
http://ift.tt/2sd1WsS
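In spirit, the deploy-and-invoke loop does something like the Python sketch below: force a cold start by touching each function’s configuration (similar in effect to redeploying), then time the next invocation. The function names and the environment-variable trick are illustrative, not the exact script used:

```python
import time
import uuid
import boto3

lam = boto3.client("lambda")

# Hypothetical function names, one per memory size variant
FUNCTIONS = [f"coldstart-nodejs-{mb}" for mb in (128, 256, 512, 1024, 1536)]

def force_cold_start(name):
    """Touching the configuration makes Lambda provision fresh containers,
    so the next invocation is guaranteed to be a cold start."""
    lam.update_function_configuration(
        FunctionName=name,
        Environment={"Variables": {"CACHE_BUSTER": uuid.uuid4().hex}},
    )

for name in FUNCTIONS:
    force_cold_start(name)
    time.sleep(5)  # give the configuration update a moment to settle
    started = time.time()
    lam.invoke(FunctionName=name, Payload=b"{}")
    print(name, round((time.time() - started) * 1000), "ms round trip")
```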
The deployment and invocation loop took about three minutes. To collect a meaningful amount of data points, I ran the experiment for over 24 hours.
My initial hypothesis
I based the hypothesis on my knowledge that the amount of CPU resources is proportional to the amount of memory allocated to an AWS Lambda function.
C# and Java would have higher cold start time
Memory size affects cold start time linearly
Code size affects cold start time linearly
Now it was time to see if the experiments supported my hypothesis.
Experiment # 1 — Cold start time by runtime & memory
To evaluate the impact of memory on cold starts, I created 20 functions: 5 variants with different memory sizes for each language runtime. The supported languages are C#, Java, Node.js, and Python.
I created 5 variants of the same hello world function (using different memory sizes) for each language runtime
After running the experiment for a little over 24 hours, I collected the following data — here are the results:
Observation: C# and Java have much higher cold start time
The most obvious trend is that statically typed languages (C# and Java) have over 100 times higher cold start time. This clearly supports our hypothesis, although to a much greater extent than I originally anticipated.
Feel free to play around with the interactive plot.ly chart here
Observation: Python has ridiculously low cold start time
I’m pleasantly surprised by how little cold start time the Python runtime experiences. OK, there were some outlier data points that heavily influenced some of the 99th percentile and standard deviation figures — but you can’t argue with a 0.41ms cold start time at the 95th percentile for a 128MB function.
Observation: memory size improves cold start time linearly
The more memory you allocate to your function, the smaller the cold start time — and the smaller the standard deviation. This is most obvious with the C# and Java runtimes, as the baseline (128MB) cold start time for both is very significant.
So far, the data from the first experiment supports the initial hypothesis.
Experiment # 2 — cold start time by code size & memory
To evaluate the impact of memory and package size on cold starts, I created 25 functions with various code and memory sizes. Node.js was the constant language for this experiment.
Here are the results from this experiment:
Observation: memory size improves cold start time linearly
As with the first experiment, the memory size improves the cold start time and standard deviation in a roughly linear fashion.
Observation: code size improves cold start time
Interestingly, the size of the deployment package does not increase the cold start time. I would have assumed that a bigger package would mean more time to download and unzip; instead, a larger deployment package seems to have a positive effect, decreasing the overall cold start time.
To see if the behavior is consistent, I would love for someone else to repeat this experiment using a different language runtime. The source code used for these experiments can be found here, including the scripts used to calculate the stats and generate the plot.ly box charts.
Conclusions
Here are a few things I learned from these experiments:
functions are no longer recycled after ~5 minutes of idleness, which makes cold starts far less punishing than before
memory size improves cold start time linearly
C# and Java runtimes experience ~100 times the cold start time of Python and also suffer from much higher standard deviation
you should consider running your C#/Java Lambda functions with a higher memory allocation than you would Node.js/Python functions
bigger deployment package size does not increase cold start time
Thanks for reading! If you like what you read, hit the ❤ button below so that others may find this. You can follow me on Twitter.
How to build a cloud-first IT organization that’s as much about people as technology
Many IT departments are bloated bureaucracies of interconnected silos that work to enforce the status quo — let’s fix it now
If you work in Corporate IT — congratulations — your department most likely sucks
Can I help you?
No, I just waited 30 minutes to say “hi”.
The cloud era will likely be the most disruptive experience IT departments have ever faced. There’s never been a more exciting — and painful — time to be involved in technology. Companies are rightfully expecting their technology organizations to add value, create products, solve problems, and not be so, you know, dreadful.
Every company is now a technology business — and that has a seismic impact on the structure of IT organizations. By predicting the outcome and making steps to get there more quickly, we can leapfrog the intermediate pain.
In the new world, IT is a strategic organization responsible for generating revenue, and tactical technology teams exist in every division. So far you’ve primarily seen this in marketing departments, but it will quickly spread to finance, facilities, R&D, purchasing, HR, and everywhere else. To address the challenges of managing technology when it’s everywhere, CIOs will begin replacing CEOs. Why wait until then to get started? Let’s fix it now.
Sugar-coats off — your IT sucks!
If you ask the denizens of corporate America about their IT staff, the responses echo sentiments of disappointment, rage and bewilderment. Workers have come to accept that IT departments are no longer the source of innovation and competitive advantage, but a bloated bureaucracy of interconnected silos that work to enforce the status quo. If you work in IT in Corporate Land — congratulations!— you currently suck.
But still better than United’s customer service.
For years this has been acceptable to companies that look to IT, a division they barely understand, to protect their castle and simply keep systems operational. IT has been about stopping people from crossing the moat, opening the drawbridge occasionally, and frowning when people ask questions. In summary, “No Chrome for you, Slack’s out of the question and let me brick your Android device while I’m here.”
But Winter’s Coming, my friends, and the Summer of Suckage that has defined corporate IT is ending. If mobile was the tremor, cloud is the earthquake.
Of course, some companies will botch this transition terribly and will pay a hefty price in the marketplace. Others will seize this moment to build a new, better organization around a cloud-first, customer-focused IT department. If you’re wondering what this will look like, think of the Be Our Guest number where the plates and cutlery start singing when a customer shows up.
Life is so unnerving for a coder who’s not coding.
Silos suck — destroy the silos
You know who hates your silos? Your customers. And your employees. Everyone, actually. Silos are the enemy of agility, yet IT departments are usually quick to organize themselves into PowerPoint-friendly operational groups.
From database administrators to network engineers, each group acts as a gatekeeper to change with a collective slowing effect that brings change to a grinding halt. Silos are also the enemy of accountability, so when something truly horrific happens publicly, everyone can point at someone else and dodge the blame ball.
You had one job, marketing.
With cloud services and massive amounts of automation, we can outsource a major part of what different silos do on a daily basis. This means that 50–70% of your existing technical workforce won’t have to put the cap on the toothpaste tube and can be deployed for more useful jobs. “Being more useful” is my euphemism for “doing work for customers” — namely, shipping features, solving their problems, and making them happy.
Now silo-lovers will scoff at this suggestion, claiming that employee X could never learn the skills of employee Y because of blah blah blah. They’ll also argue that nobody will ever be allowed to go on vacation in a non-silo model. Well the answer to that is — Google, Facebook and all the other tech companies we admire, who cross-train merrily and vacation frequently.
If you create small, product-driven teams with a specific actual customer goal, you’ll be amazed how stuff gets done and quality improves. The cross-pollination of knowledge is a by-product of the increased morale and motivation of the group. Seriously, try it out — and if it doesn’t work, you can always go back.
Your cloud skills suck — start with the training
Ignorance breeds strong opinions. People with the poorest understanding of a topic often have the most extreme views. The cloud is no different. I’ve seen over and over how IT staff can overestimate their knowledge of cloud technologies, and then express the loudest anti-cloud objections.
I used to tackle this head-on, but it becomes an exhausting and fruitless exercise as people dig in their heels to avoid being wrong. You can skip this pain by identifying the problem actors ahead of time, pulling them aside, and charging them with the responsibility of being the key expert in Google Big Query, Amazon Redshift, Azure IoT or whatever.
Critically, you must send these people for training and, most importantly, get them certified. You’d be shocked how you can convert ferocious anti-clouders to strong cloud warriors who will drive your goals. Unlike many certification programs (Oracle’s Java certification, OMG) cloud training generally has a surprising number of included aha moments that serve to motivate students.
Your leadership sucks — trust your teams
Here’s a thing — software engineers tend to be smart, conscientious people and often enjoy working on hard problems. So it boggles my mind when companies use security teams, legal, and other silos to block their work and pose so many questions to the point where nothing gets done and motivation fizzles out.
I worked at one financial services company that didn’t trust any software coming from its prized developers “for security reasons”. So I asked why there wasn’t a security guard watching every single bank teller’s transaction to make sure they weren’t giving away the shop. “Oh, we trust them to safeguard our assets,” they replied. So we trust the minimum-wage, high-turnover front-end of the business but don’t trust our brain trust that produces the software they use? That’s ridiculous — and any developer working in that environment should find a better job immediately.
Any decent developer tends to be full-stack by nature. They often know more about security holes than the security teams, and they come pre-installed with skills like database management and testing automation. They are often good at the programming languages that don’t appear on their resume and have weekends where they submit pull requests for open source projects for fun.
These are skilled people. You need to foster an environment that feeds their natural curiosity, develops their skill sets, and demonstrates that you trust them to drive your business forward.
Focus on customer problems — not your problems
Moving to the cloud can seem like an insurmountable challenge without an obvious place to start. It’s as much a people problem as a technical issue, so you have to find a common goal to rally everyone around.
I’ve found that starting with the technical side can be a mistake. The effort usually gets its wheel stuck in the mud of upgrades that don’t help and arcane mini-projects. These are technical debt pet-projects that take six months, go nowhere, and there should be a drinking game for every time somebody wants to start one.
For when you hear: “CentOS upgrades first!”, “Migrate to Exadata first!” or “World peace first…”
Like any massively overwhelming task from “reduce CO2 emissions” to “lose weight”, you have to look for immediate wins that have the most visible impact. This isn’t about looking good — it’s about building confidence to know that change is possible, change is beneficial, and change is happening.
I guarantee that your business has a stack of major customer problems that have been put on the back-burner for a long time. Let’s dust-off the list of issues and commit to fixing them at the beginning of the cloud-first initiative.
For example, I once worked at a company where the cloud was having trouble taking off. They were trying to “fix stability”, which was a compound problem riddled throughout their existing systems. We quietly moved away from this amorphous goal, talked to various customer-facing groups (and actual customers, wow) and settled on three of their biggest immediate issues:
customers could not see the same products online as in the stores
customers had multiple sign-ups when dealing with different product areas and could not remember how to log-in from one to the other
customers could not figure out how to contact us when they had a problem outside the physical stores (very common)
Each of these problems was well recognized and was used to stand up one of three cloud-oriented teams of 8–12 people. Since these problems had existed since the creation of fire, the initial sprint involved the usual “this will never get solved” pessimism. Within 5–6 sprints, each one was fully addressed.
How could multi-year customer issues be fixed in 12 weeks? While the success was attributed to the cloud initiative, it was really the success of a team-oriented structure. The issues were well-defined, had C-level support, and ultimately had a customer’s smiling face as the reward for delivery. People like helping customers — who knew?
TL;DR — Just the facts, James!
Embrace your current failure. Your IT sucks. Look in the mirror and accept the reality — this is the first step to improvement. IT’s role is about to change dramatically, so prepare for this strategic marathon: cue the Rocky music and start the pound-shedding montage.
Cloud will aggressively promote change, but organizational silos are the enemy of change. The faster you eliminate silos, the faster your cloud adoption will succeed, and the faster you start delivering customer value.
Your technology teams don’t yet understand cloud, so they will need training and certification. This converts resisters into advocates, and provides essential skills for understanding fault-tolerance, availability, automation and good security practices.
Your IT people are smart and capable, though probably beaten down. You need to trust these teams fully, and allow them to deliver up to the finish line without dealing with endless internal roadblocks. These people are critical to your success — so help them help you.
Refocusing the cloud migration and adoption efforts on actual customer problems yields surprisingly effective results. Pick a couple of well-defined customer aches (by talking to customers first, preferably) and create teams whose sole purpose is to solve those problems. Once they succeed, rinse and repeat.
With any revolution, the beginning is the most difficult time. In cloud, the technology is not the most difficult problem — the people and corporate environment are the key factors for determining whether your implementation will ultimately succeed or fail. So let’s fix it now.
How blockchain and serverless processing fit together to impact the next wave
As application patterns evolve to event-driven architectures, the more likely serverless and blockchain will be used together
Blockchain and Serverless Processing
At face value, blockchain networks and serverless computing appear to have little in common. Serverless is stateless, blockchain is stateful; serverless is ephemeral, blockchain is persistent; serverless processing relies on trust between parties, blockchain is designed to be trustless; serverless has high scalability, blockchain has low scalability.
On closer examination, serverless and blockchain actually have a fair amount in common. For example, both are event-driven and highly distributed. Another key similarity is where they both process functions — in a layer above the infrastructure/server level. Even when the characteristics aren’t shared or similar, they are complementary. For example, serverless is stateless — but blockchain can be used as a state machine for validating transactions.
Taking a look at the similarities and differences helps to gain a better understanding of both serverless and blockchain. The deeper analysis also informs how each of the technologies might impact this next wave of computing cycles.
Venn Diagram — Serverless and Blockchain
It all adds up … the sum is larger than their parts
Over the past few months, I’ve spent a lot of time diving into the world of blockchain. Much like serverless computing, there are a lot of pieces to consume and absorb. It took a fair amount of effort to work through the basic workflows, understand the different platforms, and relate the various components to one another.
I started on my blockchain discovery journey a few months ago after a conversation with Lawrence Hecht from The New Stack. During our call, we discussed how a high percentage of Alexa Voice Service/Echo applications are using AWS Lambda as their primary processing engine. In other words, Alexa plus Lambda equals application.
His hypothesis — one that I wholeheartedly subscribe to — is that serverless processing in combination with emerging API-driven platforms will be leading the way in serverless adoption.
Alexa + Lambda = Application
Certainly serverless processing will touch on almost all areas of IT computing. But this theory implies that the most rapid growth and adoption will be in combination with new platforms. In other words, new platforms plus serverless processing equals higher rates of growth.
Since these applications are starting from scratch, there is no legacy code or existing architectures to worry about. Serverless processing supports rapid development and quick access to scale. As a result, it’s a smart choice for people wanting to develop applications quickly and scale them without much additional effort.
Another driver is that new applications are often event-driven — or at least have large event-processing needs. This characteristic lends itself to microservices and functions as a service (FaaS) architectures. The new tools that serverless processing provides are perfect for handling event-driven computing patterns.
New Platform + Serverless = Many Applications
For this reason, I believe that the combination of blockchain plus serverless will have a sum far larger than their parts. The combination will be a predominant method for building and supporting blockchain-related applications — especially when it comes to private blockchain networks.
An introduction to blockchain
While exploring the world of blockchain, I found that most of the published material was either too high-level, too effusive, or dived too deep into cryptography and consensus algorithms. What was missing was a clear explanation — directed at architects and developers — that addressed the practical questions about building blockchain-based applications for business use.
To lay the groundwork, readers should have some familiarity with the basics of serverless processing. If a refresher on serverless is needed, review my recent Thinking Serverless series, the blogs by Martin Fowler, or articles curated from the serverless community at A Cloud Guru.
Blockchain is not the same as digital currency
Digital currency is based on blockchain — and the currency exists because of blockchain technology. Digital currency transactions are essentially notarized by the processing and storage nature of the blockchain. In other words, the currency’s value is protected by the chained hashed blocks and the distribution of the networks.
It is difficult, if not impossible, for digital currency to exist outside of a distributed blockchain network. For that to happen, digital currency would have to put faith into a single trusted entity — and hope that it wouldn’t game the system, get hacked, nor inflate the monetary supply. These are options that very few adherents of digital currency are willing to trust.
Blockchain networks are powered by digital currency. Organizations that operate nodes within the network are rewarded via the use of digital currency within the system. For example, payment for processing transactions in the Bitcoin network are paid in Bitcoin. Processing on the Ethereum platform is paid via Ether — the coin used in Ethereum.
Did you know that non-currency assets can also be traded and the transactions preserved within a blockchain?
Since blockchain technology can also be used for applications separate from digital currency, there is significant activity and investment in the space. Companies are using blockchain technology to create transactional-based solutions for trading and transferring physical and digital assets in a manner that is secure and readily verifiable.
With traditional marketplaces and trading platforms, transactions are stored in a centralized ledger owned by a single entity. Blockchain platforms allow these transactions to be digitally signed and preserved as an immutable record. It also stores the transactions in a distributed ledger across multiple independent and replicated nodes.
At its core, a fully formed blockchain network becomes a mechanism for designing the rules of a transactional relationship. The blockchain network acts as programmed adjudication or final settlement, reducing the need to appeal to human institutions. As a result, the blockchain becomes a programmable social contract which allows for trusted, validated, and documented interactions between parties at a very low cost.
Ethereum Lightning Network and Beyond
What is a blockchain network?
Blockchain networks are comprised of near identical nodes operating in a distributed but independent manner. The network of nodes is used to validate sets of transactions and encapsulate them within blocks in a chain. At the core of the blockchain platforms is a distributed transaction processing engine that validates and cryptographically seals transactions.
These transactions are maintained in a distributed ledger that is replicated, shared, and synchronized within any participating nodes. Blockchains use cryptographic technology to create a permanent record of transactions. A set of transactions are cryptographically stored within a “block”. Successive blocks are added in a chain — secured and preserved in order using hashing algorithms.
The ledgers for public blockchains, such as Bitcoin and Ethereum, are stored in thousands of nodes across the world. Private blockchain networks, on the other hand, may only have a few nodes. Typically, any full participant within a blockchain network would want to maintain an active and operating node — ensuring the validity of the ledger independently from anyone else.
A consistent record of truth is made possible by the cryptographic and shared nature of the ledger. This is critical for addressing several of the problems blockchains are trying to solve:
eliminating the need for central source of truth
eliminating contention when parties dispute the status of transaction(s)
Blockchain is a consistent record of truth that is shared among participants in the network. It becomes the ultimate arbitrator — eliminating any version of “he said, she said” disputes.
What are the blockchain platforms?
As noted earlier, there are a number of blockchain platforms. While Bitcoin is the most popular cryptocurrency, Ethereum has one of the most popular blockchain platforms — especially for purposes that go beyond simply storing digital currency. Hyperledger is a Linux Foundation project and has received significant support from large finance and technology institutions. It is also popular as measured by the number of projects using it and the level of community support.
Blockchain networks may be public, private, or hybrid. This means that a public transaction would be encrypted into the public ledger, while private transactions would be stored in a private ledger. Private transactions could also be stored in the public ledger, hence, the hybrid designation.
To support hybrid use cases, the Enterprise Ethereum Alliance is working hard to keep the public Ethereum network and private platform compatible. According to a source monitoring the effort, a big topic for the group is private transactions on a permissioned, or private, blockchain. Chase Bank’s fork of the Ethereum Go client (Quorum) has added private transactions — where only the sender and receiver know the details of the transaction. Compatibility with the interactions of the public chain, though, is still a driving tenet.
Digital currencies, tokens, and assets
Digital currencies are tied to particular blockchains. Transactions involving a particular currency are represented, denoted, and enshrined in the distributed blockchain for that currency.
Bitcoin transactions are handled on the Bitcoin platform and stored in a Bitcoin ledger. Ethereum transactions are handled on the Ethereum platform and stored in an Ethereum distributed ledger. Hyperledger transactions are handled on the Hyperledger platform and stored in a Hyperledger distributed ledger. And so on.
Some blockchains feature the concept of digital tokens as a secondary asset or currency. These assets are priced in the base digital currency. The tokens can most often be used for services on a particular application or sub-platform within that blockchain platform. A look at the listings in TokenMarket will show the digital assets that are available under the Ethereum blockchain.
For example, envision an Uber digital token that could be used as currency for any Uber block-chain enabled service. The service could simply draw from any Uber digital tokens in your Uber account. The tokens could be tied to a digital currency such as Ether, or just be built on the platform and gain or lose value within the platform — as well as via speculative interest.
Digital tokens are being released in ICOs — initial coin offerings. In some ways, it’s similar to pre-registration stock offerings in the 1800s and early 1900s. Any entity can create a token and offer it for sale in an ICO. The transaction would be performed in the blockchain’s digital currency.
A report referenced in Blockchain News indicated that one quarter or $250M of the $1B of investment raised by blockchain companies was the result of ICOs.
Digital assets simply refer to the digital or physical items at the heart of a trade. Examples of assets that might be purchased and transferred include a house, a car, shares in a company, or a painting. These transactions would be registered within a blockchain, and the asset becomes the item referenced in the exchange. A shipment could also be considered an asset at the core of a set of blockchain transactions — a blockchain use case that is already in operation.
Blockchain platforms are the processing engines
All blockchain platforms contain a processing component that is a critical part of transaction assurances. The blockchain networks are set up for “miners” that competitively “forge” each successive block in the chain.
The terms mining and forging are used to describe the process of validating and preserving transactions in blocks, as well as receiving new digital currency tokens in return for the work. The process of mining introduces new currency into the system at a reasonably fixed rate of frequency.
Miners are compensated for being the first to solve a block. This means being the first to calculate a hash for the block that is less than a threshold set automatically by the network. Blockchain platforms are self-arbitrating with respect to setting and adjusting these thresholds — this allows the aggregated set of miners to mine blocks within specific and regular time windows.
All miners have access to each transaction. As part of forging a block, or a set of transactions, they process the code for each transaction and attempt to arrive at the hash solution.
In the article, “How Bitcoin Mining Works”, the author provides a detailed explanation of the process. It describes how each block’s hash is produced using the hash of the block before it, becoming a digital version of a wax seal. The process confirms that this block — and every block after it — is legitimate, because if you tampered with it, everyone would know.
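As a toy illustration of that idea (hugely simplified; real networks hash binary block headers and adjust difficulty automatically), the chaining and threshold check look roughly like this:

```python
import hashlib
import json

DIFFICULTY = 2 ** 240  # toy threshold; real networks tune this automatically

def mine_block(previous_hash, transactions):
    """Find a nonce so the block's hash falls below the difficulty threshold.

    Because previous_hash is part of the input, tampering with any earlier
    block changes every hash after it: the 'wax seal' effect.
    """
    nonce = 0
    while True:
        header = json.dumps(
            {"prev": previous_hash, "txs": transactions, "nonce": nonce},
            sort_keys=True,
        ).encode("utf-8")
        digest = hashlib.sha256(header).hexdigest()
        if int(digest, 16) < DIFFICULTY:
            return digest, nonce
        nonce += 1

genesis_hash, _ = mine_block("0" * 64, ["coinbase -> alice"])
block_one_hash, _ = mine_block(genesis_hash, ["alice -> bob: 5"])
print(genesis_hash, block_one_hash)
```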
This distributed nature of blockchain processing means that each active node can theoretically execute each transaction within the system. For example, the transfer of a public digital currency from one person to another might get processed by every node in the entire network.
In the case of Bitcoin or Ethereum, this could mean over 10,000 nodes executing the code for a given transaction. This same processing replication takes place for every type of transaction — whether it’s a simple transfer of a digital asset or a transaction with extremely complex processing logic.
Number of Bitcoin Transactions Per Day — Theoretically executed by every node in the network Source: blockchain.info
Estimated Number of Hashes Per Second on Bitcoin Blocks Source: blockchain.info
Coding smart contracts for processing transactions
The code for each transaction is referred to as a smart contract, and the processing is referred as on-chain processing.
In the case of Ethereum, the smart contracts are written in Solidity — a high-level contract-oriented language whose syntax is similar to that of JavaScript and designed to target the Ethereum Virtual Machine (EVM).
Functions are uploaded as part of the process of creating an asset type in the blockchain. The uniformity of the language and the processing model ensures a high degree of determinism — meaning the outcome of executing a smart contract should be the same on each node within the system.
In the case of Hyperledger Fabric, smart contract processing is performed in Go. OCaml is being used for Tezos, a new blockchain platform expected to be released in June 2017. According to CryptoAcademy, “OCaml [is] a powerful functional programming language offering speed, an unambiguous syntax and semantic, and an ecosystem making Tezos a good candidate for formal proofs of correctness.”
Within public blockchain networks, there is a charge for processing transactions — a cost borne by the party submitting the transaction. In the case of Ethereum, the processing charge is called the “gas” and it’s submitted as part of the transaction in the form of ether, the base currency in Ethereum.
Private blockchain networks, however, can be more flexible with respect to transaction costing methods. The operating costs could be borne by the network participants with little or no cost, or could be accounted in a manner as determined by the network participants.
Cost per Bitcoin Transaction Source: blockchain.info
Various blockchain platforms are working on improved forms for arriving at processing consensus. The current form is called Proof of Work. A proposed approach within the Ethereum network is called Proof of Stake. You can read about the proposal and debate here, here, and here.
Comparing blockchain & serverless processing
The blockchain processing model differs significantly from serverless processing. Not only do most serverless platforms support multiple languages, but the goal for serverless processing is one-time processing.
Processing the same transaction on a concurrent basis is antithetical to the serverless processing philosophy. Within blockchain platforms, however, this concurrent identical processing model is a key feature — ensuring and maintaining the blockchain as a validated and trusted source of transaction history and current state.
Given the closed nature of blockchain processing, there is no need — nor any entry way — for serverless processing of on-chain transactions for execution of smart contracts. However, there is a tremendous amount of need for processing off-chain transactions — especially in terms of setting up transactions, helping to perfect transactions, and addressing post-transaction workflows.
Why is there such a need for off-chain transactions? The reasons are because 1) on-chain processing capabilities are severely limited and 2) on-chain data storage is severely limited. As a result, off-chain data processing will need to take place for transactions that are complex and/or data-heavy.
With respect to the first issue of limited processing capabilities, on-chain transaction logic needs to be kept to a minimum in order to arrive at effective transaction throughputs. Cost mechanisms for processing transactions — ones that provide equitable rewards for processing transactions and operating the ledger — also impose costs for transactions.
Without the “gas” charged for transaction processing, transaction parties would get free rides on the network. In addition, they could easily overwhelm the processing capabilities of the network — blockchain DDOS attacks anyone?
To arrive at the optimal balance of performance, cost, and consistency for each transaction, the transaction logic for blockchain applications needs to adequately address both on-chain and off-chain processing. The same applies to on-chain and off-chain data. As a result, effective blockchain design means using the blockchain network only for the minimal amount of data processing and data storage necessary to perfect and preserve the transaction.
A game of chess to compare on-chain & off-chain
The separation of on-chain versus off-chain is illustrated by a recent post that describes using the Ethereum network as a mechanism for validating a game of chess.
The writer, Paul Grau, describes how it’s not practical to submit every move as a transaction on the chain. Each transaction would not only take a significant amount of time to settle, but could also be cost-prohibitive because of the per-transaction cost. For example, a move by a player might take seconds, whereas committing the move to the blockchain might take minutes.
In assessing the problem, the team realized each move did not need to be a part of the chain. Independent arbitration — blockchain processing — is only needed to establish the start of a game as well as resolve a dispute in the game. Once a game is established, each player submits their move along with their perceived state of the game to the opposing player as a signed transaction off-chain. If the opposing player signs and submits a subsequent move, the prior move is deemed accepted.
A transaction is submitted to the blockchain only in situation where there is a dispute — a player believes there is a checkmate, stalemate, or timeout condition.
In such a situation, the smart contract’s processing would determine whether the condition was true, thereby dictating the outcome or next step — continued play, declared winner, or stalemate. Questions will inevitably arise as to how complex the on-chain logic should and can be. In the case of chess, it’s proposed that a reasonably quick algorithm can be created to assess the condition within a smart contract. More complicated games such as Go, however, are another matter.
Median Time for Transaction to be Accepted into a Mined Bitcoin Block Source: blockchain.info
How about off-chain processing and data?
The split between on-chain processing and off-chain processing means that off-chain processing needs must be able to set up transactions and manage any post processing needs. Remember, blockchain provides a trusted record of transactions — but parties that are making use of that transaction don’t need to do so in an on-chain manner.
For example, a voting application that uses blockchain to verify the eligibility of voters can access blockchain records for confirmation. Aside from registering a vote, none of its processing needs to happen within a blockchain platform. Likewise, using a blockchain ledger to preserve the accident and repair history of an automobile will have incidents committed to the blockchain, but any actions performed on those incidents do not need to be done in an on-chain manner.
The limitations on storing data on-chain also have implications that directly relate to serverless processing. Any data supporting a transaction will need to be digitally preserved and linked as part of the transaction — like the actual contract that is recorded in the blockchain as being signed. Services are already being developed to perform this capability within several public blockchains. Tierion, for example, is performing this for Ethereum-based applications. For private blockchain networks, potential candidates for serverless processing include preparing the data, validating it, and accessing it post-processing.
The difference between what is processed on-chain versus off-chain largely comes down to trust levels. On-chain processing is designed to be trustless — meaning the parties do not have to trust each other in order to perfect a transaction. Off-chain processing performed by one party in a serverless environment is suited for cases where 1) no transaction is effected, 2) two or more parties trust each other enough to forego any sort of consensus algorithm, or 3) there is a consensus algorithm in place to verify the results of the off-chain processing.
The catalyst of event-driven architectures
Blockchain and serverless processing are two independent technical innovations which are markedly different, but they share a number of things in common. While serverless is intended to be stateless, blockchain provides a publicly and independently verifiable way to maintain transactional states.
As application patterns quickly evolve to event-driven architectures, the need for independently verifiable transactional states will increase — and the more likely it is that serverless and blockchain will be used together. The use case for this combination is especially strong in private and/or permissioned blockchains, where the trust level is higher and the allowances for using outside components and services are more tolerable.
What’s next?
The next article will explore blockchain platforms and components in more depth, as well as dive into some of the issues blockchain platforms are trying to address. In addition, we’ll cover a few areas where serverless platforms can build hooks to better support the development and operation of blockchain applications.
Thanks for reading! If you like what you read, hit the ❤ button below so that others may find this. You can follow me on Twitter.
Special thanks to the following for assisting in the crafting of this post. Their collective insights and generosity in answering questions is greatly appreciated — James Poole, Paul Grau, and Robert Mccone.
Focus on features — not versions — when building products in the cloud
Why are IT shops using versions to trade stability for productivity? Have your cake and eat it too by deploying features in the cloud!
After years of consuming a heavy diet of Microsoft Office and Oracle, we’ve been conditioned to think that software improvements are packaged in versions. This atomic ‘all or nothing’ upgrade comes from the days when vendors batched all the updates for a release, scripted the installation procedures, and sent you a CD that contained all the magic.
The user didn’t know whether the latest build contained a single change or 100% new code— it was just ‘the next version’. We also came to expect pain and unpleasantness from this method of upgrades. For most IT shops, upgrades are synonymous with outages, instability and irate calls — not fun.
This approach to upgrades is single-handedly the most unhelpful idea when it comes to making better applications for users. We must shift this way of thinking about software changes.
I’ve been trumpeting the idea of features over versions for ages — and corporate IT leaders still look at me like I’m crazy. But do you know which version of Google search you’re using? What’s the current version of Amazon.com’s homepage? And what are the odds that you’re using the exact same version of those pages as me?
In the world of cloud applications, nobody — nobody — is talking about versions. Features are all users care about — and with the cloud, we can now deliver without the tyranny of traditional deployment.
James’ Book Listing Service
I’ve been busy again with another unicorn venture. This time I’ve built an incredibly useful Book Listing Service that allows you to add book titles and authors, and then list them. In expectation of its success and VC funding, I’m reserving my new Tesla.
Under the covers, the Book Listing Services looks like this:
As users are prone to do, they immediately provide feedback with a list of new feature requests which they never mentioned before. Tssk, users! And the developers also injected a couple of changes as well — so my backlog is already looking like this:
The Post-Its will stop if we just shut down 3M
There are now hundreds of active paying users expecting me to start rolling out these changes — so what are my options? Traditionally, there are only a couple of ways to release changes into a production environment:
The Big Bang. We take the system offline late at night, run whatever processes are needed to make the changes, and put it back with fingers crossed so hard we’re practically cutting off the circulation.
Build a duplicate. We buy new hardware, install the latest version, see if it looks okay and then cut everyone across.
As a Product Manager, I want to roll all these features together into version 2 because multiple deployments are painful. But one of my best customers is unhappy and is demanding an immediate bug fix or she’ll stop using the product — you can only add book titles up to 50 characters. What to do!?!
Step 1: Blame the Cloud.
A feature-driven approach to the rescue
Fortunately, our best developer is a master of micro-services, a curator of the cloud, and likes nothing more than using her services sorcery to solve problems like these. She decomposes the back-end design further:
We start mapping features to required changes in the main three components and realize we can do some clever things. It turns out that our two main services are actually Lambda functions, and our front-end talks to AWS API Gateway to reach them.
I call our star customer and ask if she’s interested in becoming a beta user — we’re going to make the change today and she will be the only person to receive the feature. If it looks good and doesn’t cause any issues, we’ll then roll it out to everyone.
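As a rough sketch of how that beta gate might work — the user list, limits and field names below are made up for illustration, not the real service’s code — the Lambda function behind the Add Book endpoint could simply branch on who is calling:

```python
import os

# Hypothetical beta gate: comma-separated user ids injected as an env var.
BETA_USERS = set(filter(None, os.environ.get("BETA_USERS", "").split(",")))

OLD_TITLE_LIMIT = 50   # the behaviour everyone has today
NEW_TITLE_LIMIT = 250  # the fix our beta customer gets first


def add_book(event, context):
    user_id = event.get("userId", "")
    title = event.get("title", "")

    # Beta users see the new feature; everyone else stays on the old version.
    limit = NEW_TITLE_LIMIT if user_id in BETA_USERS else OLD_TITLE_LIMIT

    if len(title) > limit:
        return {"statusCode": 400,
                "body": f"Title must be {limit} characters or fewer"}

    # ... persist the book exactly as before ...
    return {"statusCode": 201, "body": "Book added"}
```

Promoting the feature to everyone then becomes a configuration change rather than a redeployment of the whole system.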
We will allow a subset of our users to see a different version in one component of the system, while everyone else stays on the old version:
Fast forward a week and this change is promoted to all active users — and I’m now choosing the color and interior finish of my Tesla. I work with my lead developer and very soon we are flowing testers and developers onto different versions of the components so everyone sees a slightly different ‘version’ of the system:
Our developers are now working on a feature that will check the book price using Amazon’s API, but it’s working against production versions of the other services. Needless to say, our QA people are doing back-flips down the hallway.
In this model of our coding factory, the features list— the backlog — is our list of customer orders, and we keep the conveyor belt full of code moving into production in a fairly constant, continuous state. We can easily roll back changes if needed, apply automated testing to the release process, and generally feel good that we are both delivering customer value and not hurting stability. Just so much winning.
The Myth of the Next Version
Although my book listing example is very simple, you can see how we can do things very differently with service oriented architectures and the cloud. By decomposing functionality into smaller and smaller units, we can move versioning down from the overall product level down to the code level.
This approach has positive impacts on customers and product managers:
You can blend development, testing and production environments together. That’s crazy, right? Well no, not really — look at Gmail Labs features as an example. By throwing beta-level functionality to a subset of users, Google can launch testing-level features into a production space.
You can A/B test ideas. You see this on webpages all the time (or not, since it should be invisible) where companies test multiple features to see which perform better. You can adopt this approach in any other software to compare how features are used.
You can release constantly. Looking at the feature board for products I’ve worked on, they’re usually composed of customer-driven features that can often be released one at a time (or in small groups). There’s no need to wait for some “next version” — just keep pushing out the features.
You can make upgrades to systems that previously were considered untouchable (the ‘too important to fail’ problem of monolithic systems).
You can change direction more easily, making it much more compatible with agile-minded product management.
Though not new to the technical audience, I’m surprised how many business leaders have no idea we can do this.
Types of upgrade in the cloud
The Books Listing application illustrates a micro-service approach, but there are several other types of upgrade that commonly affect cloud infrastructures.
1. Upgrading your instances
If you’re managing your own code on EC2 instances, this isn’t too different from on-premise upgrades and has the same likelihood of success or failure. An upgrade script runs on the instance and if it works — profit! — and if not, we attempt to roll back and hope the rolled-back state is just like it was before.
Alternatively, and preferably, you have a stateless army of instances. You upgrade one, test it, and if it works as expected create an image to generate new clones. If it doesn’t work, you terminate it — a few pennies poorer but the lights are still on.
Amazon Machine Images (AMIs) are grossly underused among the clients I’ve worked with. You can have an inventory of different OS’s, stacks and applications at different version numbers all stored as AMIs, waiting patiently to get spun up into live instances whenever you need them.
Whenever making a build change, always ask “Should I create an AMI?” before just doing it anyway.
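If the answer is yes, baking the image can be a few lines of boto3 rather than a manual console exercise — the instance id and naming scheme below are placeholders for illustration:

```python
import boto3

ec2 = boto3.client("ec2")


def bake_ami(instance_id, version):
    """Create an AMI from an instance that has already passed testing."""
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=f"book-listing-service-{version}",
        Description=f"Book Listing Service build {version}",
        NoReboot=True,  # skip the reboot; accept the file-system consistency trade-off
    )
    image_id = response["ImageId"]

    # Wait until the image is available before launching clones from it.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    return image_id
```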
2. Managed application upgrades
If you’re on Google’s App Engine or Amazon’s Elastic Beanstalk, these environments create a safe space to do tightly-controlled upgrades with reliable rollbacks and versioning baked in. These are almost always a better way to deploy apps than juggling EC2 instances and really handle a lot of the administrative pain for you. These are fantastic services that developers fall in love with quickly.
3. Complex project upgrades
This is where on-premise IT often shuns the cloud believers — “You could never perform an Exchange/PeopleSoft/Dynamics/[insert horrific software here] upgrade in the cloud!” they proclaim. Well, we can and we do, more quickly and with better results than the equivalent on-premise plate-spinning spectacle.
The reason is we have CloudFormation — which I’ve come to believe is the greatest thing since Python. Even if your upgrade has a mix of server types, databases, firewalls and security changes, we can script the upgrade in CloudFormation and safely test by spinning up entire cathedrals of virtual hardware with the new version. Once it passes the tests, we just point users to the new stack. It’s a beautiful thing.
Under the covers of CloudFormation might be less sophisti-cat-ed than we hoped.
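A minimal sketch of that stack-duplication step — the stack name, template location and parameters here are invented for illustration:

```python
import boto3

cfn = boto3.client("cloudformation")


def spin_up_candidate_stack(version, template_url):
    """Stand up a complete copy of the environment at the new version.

    Once it passes testing, point users (e.g. via a DNS record) at the
    new stack and tear the old one down.
    """
    stack_name = f"upgrade-candidate-{version}"
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=[{"ParameterKey": "AppVersion", "ParameterValue": version}],
        Capabilities=["CAPABILITY_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return stack_name
```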
The factory that produces nothing
The reason I’m fascinated by versioning is because the process is central to getting new software out the door. As a lifelong Product Manager and entrepreneur, ‘releasing stuff’ is the most important thing a development team can do for me and, by extension, my customers.
In most IT shops, the traditional versioning and upgrade cycle becomes a reason not to do things. But I’m more interested in working in a cycle that makes the lack of change the exception, not the rule. I want to ferret out anything that doesn’t change and increment its version number just to make a point. User feature requests never stop — if they do, our product stinks. And we need to keep delivering — if we don’t, our product will stink.
Technology managers often see a contention between change and stability. Their empire is a factory where widgets roll off an assembly line and anything that threatens the widget-rolling is bad. The problem is that just being operational doesn’t mean you’re producing anything of value to the customer.
We are standing up whole operations where virtual conveyor belts run with nothing on them. But in software, the feature releases are the widgets. We need to turn this thinking on its head — when you release nothing, you produce nothing.
Keeping the lights on and ‘being stable’ is expected — it’s not an excuse for releasing nothing.
TL;DR — Let’s release features not versions
Pushing versioning from the overall application level down to the code level can have a major positive impact on agility and stability.
Using the tools available in the cloud, we have enormous flexibility in implementing a highly controlled and testable approach to versioning.
Being stable is nothing to celebrate — it’s expected that you can talk and chew gum at the same time.
Don’t be the factory that produces nothing — your customers want to see a steady stream of improvements and feature releases.
Versioning isn’t atomic — we can reveal different versions of infrastructure to different audiences and even put testing, development and production environments together if we want.
Thanks for reading! If you like what you read, hit the ❤ button below so that others may find this. You can follow me on Twitter.
Focus on features — not versions — when building products in the cloud was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qz2OX4
0 notes
mikegchambers · 7 years
Text
Serverless Event Sourcing at Nordstrom
Building a unified event stream on AWS
Leading up to Serverlessconf Austin this year we held the inaugural Serverless Architecture competition. The competition was launched to encourage people to share their Serverless Architectures.
Hello Retail by the team at Nordstrom is a well deserving winner of the competition.
What is Hello Retail?
Hello Retail is a proof-of-concept Serverless architecture for a retail store. The team at Nordstrom built the project to experiment with Serverless Event Sourcing.
Nordstrom is an early adopter of Serverless architectures. The team has built Serverless microservices for a number of production systems. The use cases for Serverless include API backends and stream processing.
Microservices allowed Nordstrom to create smaller, cohesive services. When these microservices need to interact, services call the API of another service. But this approach creates code and operational dependencies between microservices.
Code dependencies created by calling other services creates complexity. The caller has to know which dependent services to call and how to call them. This becomes complex to manage in code as the number of dependencies grows.
Operational dependencies between services can affect performance and availability of the application. Services that are dependent on an API depend on the performance of that API. Increased latencies or failures in one service will impact other services.
The solution to these problems is to reverse these dependencies by using events. Creating services that produce and consume events allows you to decouple them.
Event Sourcing is a well understood solution to this problem. But applying this solution to a completely Serverless application is new.
The Concept
The team at Nordstrom built Hello Retail with one scenario in mind: a merchant adding a product to their store.
When a product is added to the store, two things need to occur. A photographer needs to take a photo of the product. After this, customers should see the new product with the new photo in the product catalog.
The Hello Retail project solves this problem with events. The three major events in this scenario are:
Register Photographer
New Product
New Photo
Various microservices in the system produce and consume these events. A central Event Log stream connects these producers and consumers together.
Implementing Hello Retail
The best way to understand a system that uses Event sourcing is to follow the flow of events. Hello Retail has two main event flows: photographer registration and product creation.
Photographer Registration
Hello Retail requires a database of photographers. But, the system does not have a traditional Create Photographer API. Instead, the front-end creates a Register Photographer event.
To create the event, the front-end calls an API endpoint that triggers a function. This function writes the new event to the central event stream.
A second function is listening for the Register Photographer event. This function uses the event data to write a new photographer into the database.
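A stripped-down sketch of that pair of functions — the stream, table and field names are my own stand-ins, not the actual Hello Retail code:

```python
import base64
import json
import os

import boto3

kinesis = boto3.client("kinesis")
dynamodb = boto3.resource("dynamodb")


def register_photographer(event, context):
    """Producer: invoked via API Gateway, writes the event to the central stream."""
    body = json.loads(event["body"])
    record = {"eventType": "RegisterPhotographer",
              "photographerId": body["id"],
              "name": body["name"]}
    kinesis.put_record(StreamName=os.environ["EVENT_STREAM"],
                       Data=json.dumps(record),
                       PartitionKey=body["id"])
    return {"statusCode": 202}


def on_register_photographer(event, context):
    """Consumer: triggered by the stream, materialises the photographer database."""
    table = dynamodb.Table(os.environ["PHOTOGRAPHERS_TABLE"])
    for item in event["Records"]:
        record = json.loads(base64.b64decode(item["kinesis"]["data"]))
        if record.get("eventType") != "RegisterPhotographer":
            continue
        table.put_item(Item={"id": record["photographerId"],
                             "name": record["name"]})
```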
Product Creation
The product creation process takes this architecture a step further. This process spans multiple microservices and events.
As before, there is no Create Product API call; to create a new product, the front-end raises a New Product event. When a New Product event is written to the event stream, two functions are triggered.
The Product Service writes product information to the products and categories databases. This allows customers to view the new product in the product catalog.
The Photograph Management Service assigns a photographer to take a photo of the new product. It is important to note here that the Product Service did not make a direct call to the Photograph Management service to initiate this process.
So without a direct call, how does the Product Service know when a photo of the new product has been taken?
When the photo of the new product has been taken, the Photograph Management service creates a New Photo event. The event triggers a function in the Product Service which updates the database with the new photo.
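The Product Service simply subscribes to that event. A hedged sketch of its consumer — the event shape and table names are assumptions, not the project’s actual code:

```python
import base64
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")


def on_new_photo(event, context):
    """Consumer in the Product Service, triggered by New Photo events.

    The Photograph Management service never calls this service directly —
    the only contract between them is the shape of the event.
    """
    products = dynamodb.Table(os.environ["PRODUCTS_TABLE"])
    for item in event["Records"]:
        record = json.loads(base64.b64decode(item["kinesis"]["data"]))
        if record.get("eventType") != "NewPhoto":
            continue
        products.update_item(
            Key={"productId": record["productId"]},
            UpdateExpression="SET imageUrl = :url",
            ExpressionAttributeValues={":url": record["imageUrl"]},
        )
```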
Challenges
This architecture has many benefits as previously discussed. But there are also a number of challenges that must be overcome.
Failure Handling
Hello Retail uses a Kinesis stream as the central Event Log. The consumers of the stream process events in batches from the end of the stream.
If there is an error with the consumer, the failed batch will remain at the end of the stream. The consumer will retry processing the batch until it is fixed or the events expire (configurable up to 7 days).
In an active system, events will continue to be added to the stream while the consumer is not processing events. This creates a backlog of unprocessed events called a log jam.
An example of a log jam
Poison pill data is a common cause of log jams. These are malformed or unexpected events on the stream. These events need to be removed from the stream and stored for manual processing.
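One way to keep a poison pill from jamming the stream — a sketch only, with an invented dead-letter bucket — is to catch the parse failure and park the raw record for later inspection instead of failing the whole batch:

```python
import base64
import json
import os

import boto3

s3 = boto3.client("s3")


def handle(record):
    # Normal business logic would go here; elided for the sketch.
    pass


def process_events(event, context):
    for item in event["Records"]:
        raw = base64.b64decode(item["kinesis"]["data"])
        try:
            handle(json.loads(raw))
        except (ValueError, KeyError):
            # Park the malformed event for manual processing and keep moving,
            # rather than retrying forever and creating a log jam.
            s3.put_object(Bucket=os.environ["DEAD_LETTER_BUCKET"],
                          Key=item["kinesis"]["sequenceNumber"],
                          Body=raw)
```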
Even with careful handling of events, sometimes log jams occur. When the consumer is fixed the unprocessed events will be processed automatically. But what happens when there is a logic error in the consumer?
Replay and Log persistence
In a system using Event Sourcing there are two sets of data, the Application State and the Event Log. Unlike a traditional system, it is the Event Log, not the Application state, that is the critical data to manage.
A system that is employing Event Sourcing should be able to rebuild the entire Application State from the Event Log at any time. Version control and an accounting ledger are examples of systems that use Event Sourcing.
So what happens when there is a logic error in a consumer? After the logic error is fixed, old events can be replayed through the consumer. The fixed consumer can then rectify the application state.
Hello Retail does not maintain a historical log of events. As a result, events cannot be replayed through consumers. This architecture needs a mechanism to persist events and replay events.
Eventual Consistency
All events in Hello Retail are processed asynchronously. This introduces eventual consistency into all reads in the system.
Eventual consistency can be challenging to handle correctly. Systems with eventual consistency require a user experience that reflects this characteristic.
Securing Events
A central Event Log presents interesting security challenges. A central Event Log will include events that contain private information. In a production system, Microservices may only be authorised to access a subset of events or event data.
A system to protect events and event data will be required to take this proof-of-concept to production. Nordstrom is investigating a system to encrypt data on the stream. Controlling the ability to decrypt data will allow Nordstrom to control which services can access events.
Wrapping up
This project solves a common problem teams encounter when adopting microservices. It is a great starting point for Event Sourcing in a Serverless architecture.
The team at Nordstrom needs to solve three problems before this is production ready.
Improvements to handling of poison pill data — a system to catch and store bad events in the stream
Stream persistence and replay
Securing sensitive data in events
I am confident the great team at Nordstrom will be able to develop solutions to these problems.
I want to thank the team at Nordstrom for creating Hello Retail and sharing it with the community. It is a great example of applying a well understood architectural pattern to a Serverless project.
What’s Next
If you are interested in diving deeper into this project you can view the code on Github or take the Hello Retail workshop.
I also recommend watching this presentation by Martin W. Fowler if you are unfamiliar with Event Sourcing.
The Nordstrom serverless team is hiring talented developers with a passion for learning and trying new things. If you’re interested, drop them a line at [email protected] and let them know what you think of Hello, Retail!
If you want to read more on Serverless don’t forget to follow me @johncmckim on Twitter or Medium.
Serverless Event Sourcing at Nordstrom was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2rlsARE
0 notes
mikegchambers · 7 years
Text
Don’t strangle your monolith when migrating to the cloud — starve it to death
Cloud is key to reinvigorating your IT infrastructure, but you won’t get very far until you stop the behavior that feeds your monolith
Google’s home page
AirBNB for Fitness
Imagine that you’ve secured VC funding for a new concept called Muscle Unbound. Silicon Valley refers to your concept as the AirBNB for fitness — homeowners can rent out their exercise equipment when it’s not being used.
In preparation for the big summer launch, you’ve started deploying your cloud architecture and finalizing the design of a mobile UX. The entire platform is coming together so quickly that Werner Vogels is calling you on Chime about a keynote presentation at re:Invent — and now is leaving messages on your Alexa.
Back in reality, you’re the technical lead for a national gym chain with 1000 locations. The company is planning to introduce the exact same concept — but you have to make it work with the existing on-premises systems.
Your company stores all transactional data in an Oracle nightmare, accounting data in PeopleSoft, member logins through a third-party application, your product data arrives lazily through mainframe batches, and there’s a security governance team approving code releases monthly.
Every one of these platforms will be touched as part of your project implementation. Can you hear that sound? That, my friend, is the airy sound of the candles burning on your retirement cake and all hope evaporating — long before version one is ever rolled out the door.
Welcome to the Monolith
This experience is common to anyone who’s worked in a reasonably large company — with the added pleasure of being security-slapped and Docker-blocked to the point where all-day meetings seem productive. After a while, you become obsessed with the idea there’s a better way. You’ve read about large organizations like Google, Amazon, Facebook and others shipping code like a start-up, although they have the advantage of employing more engineers than Starbucks has baristas.
While trying to find a better way, you might’ve heard about using micro-services to strangle the monolith. While it’s inspiring, it’s not at all clear how you get there. Even with the cloud’s latest suite of goodies, it’s hard to strangle something you can’t get your hands around — and monolith fights back. Surprisingly, a monolith’s survival usually has more to do with the people than the technology involved.
The behavior and interaction of teams is a big driver of dysfunctional infrastructure design. When something fails, whatever the management group lacks in knowledge they often make up for with loud opinions. Based on how individuals and teams are incentivized, the technical teams will often decide that it’s much safer for their careers to release less often, resist change, and avoid failure at all costs. The dynamic between the two groups results in monolith-building.
The mainframe never died. It’s still here.
Oh, how we laugh at those old school companies with their IBM contracts. We picture how they have one special room with tons of air conditioning housing one big computer. We mock how it’s nursed by an army of middle-aged, well-dressed engineers who still use a Casio FX calculator and a pencil. Too smart for a job at the Geek Squad and too scared of heights to become cable installers — they are the sworn protectors of the mainframe.
In reality, we all build mainframes everywhere we go — no matter how small an application starts. The team adds features, bolts on unexpected interfaces, and lassos crap around something that was once nimble. The monolith is a virtual mainframe — it’s an unmovable black box of ordered chaos that always arises out of corporate systems.
You laughed at the actual mainframe programmer. But now you’re policing who gets to interface with your system, planning downtime — and dammit — you’ve got a pencil in the top corner pocket of your pressed white shirt. Richard Matheson would be proud.
When the tech moved from mainframe to client-server, then to n-tier and mobile, the premise was to move the work away from some central source. The process of decentralization itself supposedly breaks apart this rigidity, magic happens, and cue the end credits. But it doesn’t, it hasn’t, and it won’t.
The virtual mainframe of having one central system is now spread across lots of machines. It’s still there — hardening itself with every passing day. How did it survive when we thought we watched it die? How is this happening all over again?
“Wireless”
Why does everyone build a monolith?
I’ve pondered this question extensively while pretending to watch The Crown. There are three things I’d like to mention as background in my evolving theory of why monoliths occur:
Conway’s Law, which says your infrastructure design mirrors your organization’s communication structure. So, pyramidal bureaucratic hierarchical structures produce hierarchical IT systems. A friend who worked at Dell said internally they were organized around enterprise, consumers and students. Hence when you want to buy a laptop on their website, you’re forced to declare yourself one of these first. Conway, by the way, was brilliant and well worth reading.
Human nature, which likes to simplify complicated things by keeping similar things together and expanding the surface area of entities that seem to work well. Like your house? Add a garage. Like your customer database? Keep adding columns. Attaching onto things that already work seems to be the right thing to do, but in IT it creates the ball of mud.
Software development preaches loose-coupling and code reuse but it rarely happens after the first version of something is out of the door. Code gets added on and kluges happen increasingly often until it becomes fragile and untestable. Object-oriented techniques all too frequently decompose into a wicked-slick object model that works primarily because of a God object in the middle. That’s a baby monolith right there.
This pattern repeats itself in businesses over and over. You raise the golden goose, it’s laying eggs, you keep feeding it. Over and over. The architecture always ends up looking the same — a hub-and-spoke diagram with little boxes emanating from the giant monolith in the middle. Here we go again.
All it needs is a spider in the middle.
But why does it have to stop?
While we’ve been feeding our own private monster, the IT world has gone from mainframe and in-house architecture to open source, cloud-based and disparate solutions, all while the rate of change has accelerated. The fundamental fact is that this monolith will never work with these newer paradigms, and your company will never be able to keep up with customer technical demands.
I’m convinced this is why new start-ups are effectively trouncing the old dinosaurs — it’s not that they have beanbags in the office and pajama days on Fridays.
Why did Lemonade think of AI-centric insurance claims that are processed in 3 seconds and not Geico? Geico has a monolith — managed by a British lizard — which prevents radical change.
Why isn’t a single major traditional retailer beating Amazon? You can rest assured that the heart beating in the middle of Sears, Macy’s and Nordstrom’s is a cold, concrete monolith that will never be delivering the hundreds of features a day that Amazon is shipping.
That’s why. And beanbags.
Backpain prevents monolithism.
The 5 Steps to Starving The Monolith
Getting past this problem requires some rethinking of how things work, because we cannot build truly distributed, agile systems this way.
1. Commit to starving the monolith
Don’t kick the can down the road and decide you’ll just add some more technical debt this one time. From now on, the monolith doesn’t get fed. That’s it. I know your Kanban board is growing relentlessly and you want a promotion, but we have to draw the line in the sand today.
Conway (brilliant, remember) also observed in systems design behavior that there’s never enough time to do it right but always enough time to do it over. So let’s just do it right for once.
2. “Two pizza teams”
Amazon is the only large company I’ve known that slayed the monolith violently and directly. And from the Amazonians I know, it sounds like the two-pizza team concept was key.
Let’s steal that. In your project, get the right 8–10 people together and own every single part of your solution. Don’t depend on a dozen other teams and getting prioritization in their queues because dinosaurs will be walking the earth again before your customers get any software. And let’s not wheel out the usual excuses of who’s going to be upset.
3. Build generically
When you’re building the shipping label system for your company, imagine it’s actually a start-up for shipping labels that will have thousands of external users. There is no existing monolith to connect into. You have to build everything you need to support your user base and their wide variety of systems. On your virtual private island of pristine code beaches, only a handful of APIs will connect to these systems of which you know nothing. Make those APIs rock.
If you’re not convinced, think about PayPal — they have a widget embedded on millions of websites and successfully manage payments with no idea about how any of their customers operate technically. Make sure you are always building the consumable widget or service that doesn’t know how its consumers work. Be RESTful and use the standards out in the wild that will help you.
4. Learn to embrace eventual consistency
In science fiction, HAL, the Matrix and the Terminators were all monoliths — they were single systems that knew everything going on real time. But notice how they could never get any upgrades out? The T-800 series was excellent at pursuing the Connors, but in reality Skynet could never have deployed a successful mobile app.
In our new world, our independent systems will be slightly out of sync with each other but that’s okay. Only when we realize that we don’t need to know the exact number of sticks of spaghetti in every retail store can we allow the zen-like feeling to wash over us. We are going to be building lots of small systems with their own independent data stores that don’t always know the score … just yet.
5. Use your cloud superpowers
Working on-premise encourages you to repeat the same behaviors. Move your code to GitHub. Work remotely. Use Slack. Try decomposing into serverless functions. The sheer lack of compatibility between these approaches and monolithic behaviors is the beginning of the revolution. It will feel odd at first — but as you build more and more away from the old, it will start to decay and die.
[Spoilers!] How does this end …
Companies that have monoliths sporadically realize the technical noose is tightening and occasionally launch initiatives that clearly come from business people and not developers. Their 5-year plans to slowly migrate away lose steam after 6 months. The noose tightens. And the plan to document the old system and build the new one either duplicates the problem in a different version on newer hardware, or it never gets funded because the consultant’s analysis was so expensive.
Starving the monolith ultimately leads to dozens, hundreds and thousands of other systems, functions and processes that slowly but surely take over. The King is Dead, but there’s no single point when the heart stopped, and no single point when the revolution started — we just put the monolith out to pasture and stopped feeding.
Don’t strangle your monolith when migrating to the cloud — starve it to death was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2rHhJCk
0 notes
mikegchambers · 7 years
Text
What’s the Community Saying About Serverless?
What’s the Community Saying About Serverless?
The Top 5 Blogs: Written by the community, for the community
Node is the wrong runtime for serverless — Python is a more natural fit compared to the complexity of async code (3 min read) Ben Kehoe
The serverless approach to testing is different and may actually be easier — but requires a different approach and tools (7 min read) Paul Johnston
Engineers at OpsGenie are using AWS Lambda for new product features and moving existing apps to serverless (8 min read) sezgin k. karaaslan
Environmental sympathy with development & operations is the only way to gain confidence with serverless (6 min read) Matt Weagle
Fourthcast uses AWS Lambda to host Alexa skills and has learned a lot of pitfalls and valuable lessons (6 min read) Mitchell Harris
New Course! Learn how to build scalable and available serverless applications using The Serverless Framework with GraphQL.
Teaching the world to cloud
What’s the Community Saying About Serverless? was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2r4Ut0a
0 notes
mikegchambers · 7 years
Text
Service Discovery as a Service: The missing serverless lynchpin
Service Discovery as a Service is the missing serverless lynchpin
Changing a function’s dependent resources after deployment is a critical step towards feature parity with traditional architectures
When we talk about serverless architectures, we often talk about the natural fit between serverless and microservices. We’re already partitioning code into individually-deployed functions — and the current focus on RPC-based microservices will eventually change into event-driven microservices as serverless-native architectures become more mature.
We can generally draw nice microservice boundaries around components in our architecture diagrams. But that doesn’t mean we can actually implement those microservices in a way that achieves two important goals: 1) loose coupling between services, and 2) low end-to-end latency in our system.
This blog is the first part of a series exploring the missing pieces to achieve a vision for loosely-coupled, high-performance serverless architecture, using AWS as an avatar for all serverless platform providers.
Missing Piece #1: Service Discovery as a Service
In this post, I’ll focus on loose coupling. In particular, I propose that the lack of Service Discovery as a Service as part of a provider’s ecosystem causes customers to implement their own partial solutions.
I’ll define loose coupling as the ability to change the resources that a given function uses after deployment. There are two important use cases for this:
circular dependencies between services — meaning that all the resource names generally cannot be known until after deployment
the ability to update a microservice without requiring redeployment of the dependent microservices
Serverless deployments without Service Discovery as a Service
In serverless deployments without Service Discovery as a Service, the functions exist in the same namespace. They are connected to other functions within their deployment through environment variables, which are fixed at deployment time. Updating one function requires an update/deploy to all callers — and every function must be deployed with the full physical resource ids that it uses.
Service discovery allows us to keep our code from having to know exactly where to find the resources it depends on. An important part of this is the service registry, which gives us the ability to turn a logical name (e.g., UsersDatabase) into a physical resource id (arn:aws:dynamodb:us-east-1:123456789012:table/UsersDatabase-MVX3P).
If this mapping is known at deployment time, serverless platform providers generally have a way of including it in the deployment; for example, environment variables in AWS Lambda functions. But these mechanisms don’t allow for change without redeploying the function, so they don’t fulfill our need.
My experience has been that everybody who has implemented a serverless system has built their own way of solving this — which is pretty much the definition of undifferentiated heavy lifting. Any remote parameter store that is updatable at runtime will suffice. The EC2 Systems Manager parameter store is a good option.
At iRobot, we have solved this by using our tooling to inject a DynamoDB table into every deployment (i.e., it writes it into the CloudFormation template) to act as a runtime-updatable key-value store. The auto-generated name of this table is injected into each Lambda function’s environment variables using the CloudFormation resource support for env vars.
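In the spirit of that approach — a sketch rather than our actual library — the lookup side is just a table read keyed by logical name, with the table name itself injected at deploy time:

```python
import os

import boto3

dynamodb = boto3.resource("dynamodb")
# The registry table name is fixed at deploy time, but the values inside it
# can change at runtime without redeploying any function.
registry = dynamodb.Table(os.environ["SERVICE_REGISTRY_TABLE"])


def resolve(logical_name):
    """Turn a logical name (e.g. 'UsersDatabase') into a physical resource id."""
    item = registry.get_item(Key={"name": logical_name}).get("Item")
    if item is None:
        raise KeyError(f"No registry entry for {logical_name}")
    return item["value"]


# Example usage: users_table = dynamodb.Table(resolve("UsersDatabase"))
```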
Service discovery in a traditional architecture
With service discovery in a traditional architecture, the service registry provides a mapping from logical name (e.g., “A”) to a physical resource id (e.g., v1.a.domain in Deployment 1, v2.a.domain in Deployment 2). The isolation by VPC or subnet provides separation between the deployments.
Separation of Environments
To step back a bit further, there’s another advantage provided by service discovery mechanisms in traditional microservice architectures: separation of environments.
Infrastructure as a Service offerings like on EC2 have comprehensive mechanisms for separating groups of resources. On AWS, this is accomplished at the highest level with Virtual Private Clouds (VPCs), which, as the name implies, completely partition EC2 resources into separate silos. Within a VPC, subnets can be used to further isolate instances from each other.
This separation is leveraged to create independent sets of service discovery information, such that the service discovery information itself can have a well-known name, rather than also needing some sort of lookup. For example, it can be accomplished through DNS, which works because the networks of different VPCs are isolated, so the DNS lookups for the same name in each can have different results.
Another option is a configuration manager like Zookeeper, etcd, or Consul — which works because the configuration manager deployments in different VPCs don’t know about each other. As a result they don’t conflict, but have a well-known name within each VPC/subnet.
As noted by Martin Fowler, this separation isn’t currently present in any provider’s offering. On AWS, Lambda functions can be run in a VPC, but that is heavy-handed and complicated just to gain logical separation between the functions. This means that, for whatever remote parameter store is being used, there still needs to be a mechanism for separating those parameters between deployments.
With EC2 Systems Manager parameter store, this means the Lambda functions need to understand prefixing, and that prefix needs to be delivered to the function through its env vars. For iRobot’s solution, we create a DynamoDB table with each deployment, inject its name into an environment variable in every Lambda, and a library, injected into each packaged Lambda’s code, uses it as a parameter store.
Azure actually provides this capability in Azure Service Fabric, but it is currently not available for use with Azure Functions.
Service Discovery as a Service
With Service Discovery as a Service, functions are tagged and make a non-namespaced call to the service discovery service (e.g., Get(“A”)), which uses the tag to index into the namespace (e.g., Env1). At deployment time, the functions need only be tagged with an immutable identifier.
Service Discovery as a Service
The functionality that is really needed is a new feature or service as part of the providers’ platforms. We need Service Discovery as a Service (SDaaS) — or more precisely, Service Registry as a Service.
What would this look like? I see it as relatively simple; a key-value store with multiple distinct namespaces. But the crux is this: when making a Get call, the namespace is chosen based on some property of the caller, rather than selected explicitly. Of course, explicit selection would also be available.
For example, a standalone version of this service could use the IAM role of the caller. This would have the added advantage of being usable by server-based implementations as well. A version integrated into AWS Lambda could leverage the recently-added tagging functionality.
To be fully functional as SDaaS, the service would have to allow phased rollouts of changes to the namespace selections. That is, it should support blue-green updates to the values that a given caller receives.
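Since no provider offers this today, the closest approximation is caller-side: derive the namespace from a tag on the calling function and prefix the lookup with it. The tag and parameter naming below are assumptions, and a real SDaaS would resolve the namespace service-side rather than in the caller:

```python
import boto3

ssm = boto3.client("ssm")
lam = boto3.client("lambda")


def get(logical_name, context):
    """Approximate Get(): the namespace comes from who is calling, not the call."""
    tags = lam.list_tags(Resource=context.invoked_function_arn)["Tags"]
    namespace = tags["deployment"]  # e.g. "env1" or "env2"
    param = ssm.get_parameter(Name=f"/{namespace}/{logical_name}")
    return param["Parameter"]["Value"]
```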
Whatever form this service takes, it would eliminate the need for customers to build their own solutions, allowing them to focus on the tasks specific to their needs and reducing the barrier to entry in the serverless space. As a critical step towards feature parity with traditional architectures, Service Discovery as a Service is the missing lynchpin for serverless.
Update: Tim Wagner, the GM of AWS Lambda and API Gateway, asked some good questions and I wrote a long response that forms an appendix to this post.
Service Discovery as a Service: The missing serverless lynchpin was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qlvLpa
0 notes
mikegchambers · 7 years
Text
A vision for loosely-coupled, high-performance serverless architecture
A vision for loosely-coupled and high-performance serverless architecture
Explore the key missing pieces required to achieve loose coupling between services and low end-to-end latency in your systems
Shel Silverstein, The Missing Piece
When we talk about serverless architectures, we often talk about the natural fit between serverless and microservices. We’re already partitioning code into individually-deployed functions — and the current focus on RPC-based microservices will eventually change into event-driven microservices as serverless-native architectures become more mature.
We can generally draw nice microservice boundaries around components in our architecture diagrams. But that doesn’t mean we can actually implement those microservices in a way that achieves two important goals: 1) loose coupling between services, and 2) low end-to-end latency in our system.
In this series of posts, I’ll explore the key missing pieces using AWS as an avatar for all serverless platform providers.
Missing Piece #1: Service Discovery as a Service
Service Discovery is an essential part of a modern microservice architecture. The lack of Service Discovery as a Service as part of a provider’s ecosystem causes customers to implement their own partial solutions.
Missing Piece #2: Asynchronous RPC architectures
Because FaaS is billed by execution time, time spent waiting is money wasted — and synchronous invocation of other functions means double billing. However, despite steps in the right direction, asynchronous call chains are not sufficiently supported by providers’ platforms.
Missing Piece #3: Event-driven architectures
Event-driven architectures are a more natural fit for FaaS and serverless, but there are key difficulties with existing services, such as limited fanout and lack of checkpointing support, that prevent robust implementations.
Missing Piece #4: Microservices and API Gateways
On the cloud side, a microservice should control the APIs it exposes to other services and to clients. On the client side, there should be one cloud endpoint exposing an API that brings together all the services. These two goals are in conflict; existing API gateways don’t facilitate a good solution.
Missing Piece #5: Deployment
The ability to perform a controlled, phased rollout of new code is essential to operations at scale. Existing serverless platforms don’t provide this functionality at either the FaaS or API gateway level, and we need it in both places.
Missing Piece #6: Inter-service permissions
Permissions in serverless architectures are highly dependent on the providers’ IAM systems, which may use some mix of role-based access control, policy-based access control, and perhaps other schemes. These can present difficulties by coupling together infrastructure components between microservices.
Missing Piece #7: Serverless availability zones
In IaaS, availability zones allow customers to build resiliency in the face of provider incidents without incurring the high overhead of cross-region architectures. Serverless platforms are usually region-wide and therefore resilient in the face of incidents in the underlying IaaS, but they need an availability-zone-like concept to allow customers to be resilient in the face of software problems in the serverless platform itself.
A vision for loosely-coupled, high-performance serverless architecture was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2rlEv31
0 notes
mikegchambers · 7 years
Text
Build an Alexa Skill to “Speak Up!” for a social cause and win a lifetime subscription
Use the Alexa voice platform to amplify messages of social change and worthy causes into the homes of millions of users worldwide
“Those who are crazy enough to think they can change the world usually do.” ― Steve Jobs
Be a Voice of Change to Millions of Alexa Users
A Cloud Guru will teach you how to build an Alexa skill so you can help change the world. Join the Alexa “Speak Up!” Challenge by simply publishing a skill that amplifies a positive message or cultivates awareness and understanding of a cause — Speak Up!
Get inspired by finding a cause that you care about — consider building a skill that advocates for a non-profit, connects the local community, or supports individuals fighting a personal battle.
Alexa is an incredible platform for amplifying messages of social change and justice into the homes of millions of users worldwide.
Amazon owns 70 percent of the voice-enabled speaker market, and analysts predict that more than 35 million people in the US will use one of these stand-alone devices at least monthly in 2017.
How to join the Alexa ‘Speak Up!’ Challenge
The Alexa ‘Speak Up!’ Challenge is open to developers worldwide.
Publish your Alexa skill to Amazon’s US or UK skill store
Submit the entry form below anytime between June 1st — August 18th
Follow A Cloud Guru on both Medium and Facebook
The Judging Criteria for the Alexa Challenge
The panel consists of several Alexa Champions — individuals formally recognized by Amazon as some of the most engaged developers and contributors in the community.
25 points: The Voice User Interface (VUI) Design (best practices)
25 points: A Voice-First User Experience (VUX) (considerations)
25 points: Integration with AWS services and APIs (example)
25 points: Champion a cause, create awareness, and positively impact the community. “Speak Up!” about what matters most to you and let Alexa amplify your voice to millions of households worldwide.
The Prizes
The winners will be announced on our Facebook page on August 31st!
1st Place: Lifetime Subscription to A Cloud Guru + Amazon Echo Show
Introducing the all-new Echo Show: https://t.co/FcqHNUkzb5
 — @alexadevs
2nd Place: Two-Year Subscription to A Cloud Guru + Amazon Tap
3rd Place: One-Year Subscription to A Cloud Guru + Amazon Dot
Bonus: Special prizes will be awarded to top skills that are affiliated with an AWS or Alexa user group.
Don’t Forget Your Free T-Shirt!
Every month, Amazon offers developers of Alexa Skills a free T-shirt once they publish a skill. All you need to do is fill out their form with the name of your published skill and submit to Amazon!
There's a new Alexa #developer t-shirt for May! Publish a skill, get a shirt. #nodejs #code templates available: https://t.co/PTAaYXt8IA 👕
 — @alexadevs
Resources to You Get Started
Below are a few resources to help you publish your first Alexa Skill!
Don’t have an Echo? The Alexa Skill Testing Tool (EchoSim.io) is a browser-based interface that allows developers to test their skills during development.
A Free Introduction to Alexa: The “Alexa Course for Absolute Beginners” allows anyone to learn how to build skills for Alexa. The beginner guide to Alexa will walk you through setting up an AWS account, registering for a free Amazon Developer account, and then building and customizing two Alexa skills using templates.
Dive Deeper with Alexa Development: A Cloud Guru also offers an extended version of the course for developers that want to extend their skills. Learn how to make Alexa rap to Eminem, how to read Shakespeare, how to use iambic pentameter and rhyming couplets with Alexa, and more.
AWS Promotional Credits for Alexa Developers: Developers with a published Alexa skill can apply to receive a $100 AWS promotional credit and can also receive an additional $100 per month in AWS promotional credits if they incur AWS usage charges for their skill — making it free for developers to build and host most Alexa skills.
User Groups: Join a local user group! The AWS User Group in South Wales and the Alexa User Group in Richmond, Virginia (RVA) are both offering free workshops on building Alexa skills. As a bonus, anyone that attends their Alexa workshop and publishes 3 skills in 30 days will also receive a free Amazon device!
Complete the form to submit your entry!
http://ift.tt/2qSS88O
Build an Alexa Skill to “Speak Up!” for a social cause and win a lifetime subscription was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qSHezO
0 notes
mikegchambers · 7 years
Text
Scaling the serverless summit requires environmental sympathy with dev & ops
The only way to gain confidence that a feature branch will work in the cloud is to run it in the cloud — with environmental sympathy
In the wake of Serverlessconf 2017 in Austin, there’s been an increasing number of discussions about today’s cold reality of serverless. While we can see the glory of serverless manifesting in the not-too-distant future, the community still finds it difficult to test, deploy, debug, self-discover, and generally develop serverless applications.
The discussion has been amplified in recent days with tweet storms and the great threads on the Serverless Slack channel from Paul Johnston that prompted this post. The common sentiment is that the difficulty with serverless gets more acute when developing applications composed of multiple sets of functions, infrastructure pieces, and identities evolving over time.
On the one hand, the serverless approach to application architecture does implicitly address some of the high-availability aspects of service resiliency. For instance, you could assume — without empirical evidence — that AWS transparently migrates Lambda execution across Availability Zones in the face of localized outages. This is unlike a more traditional VM/container model, where you must explicitly distribute compute across isolated failure domains and load balance at a higher logical level (e.g. ELB and ALB).
While this intrinsic reliability is undoubtedly a good thing, overall resiliency isn’t so easily satisfied. Take for instance the canonical “Hello Serverless” application: an event-based thumbnailing workflow. Clients upload an image to an S3 bucket, a Lambda function handles the event, thumbnails the image, and posts it back to S3. Ship it.
Except, how do you actually test for the case when the S3 bucket is unavailable? Or can you? I’m not thinking of testing against a localhost mock API response, but the actual S3 bucket API calls — the bucket you’re accessing in production, via a dynamically injected environment variable.
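To make that concrete, here is a bare-bones sketch of the thumbnailing handler — the bucket variable, key layout and pass-through “thumbnailer” are all placeholders. The interesting part is the failure branch, which local mocks rarely exercise:

```python
import os

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")


def make_thumbnail(image_bytes):
    # Real resizing (e.g. with Pillow) elided — pass-through stub for the sketch.
    return image_bytes


def handler(event, context):
    bucket = os.environ["THUMBNAIL_BUCKET"]  # injected at deployment time
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        try:
            obj = s3.get_object(Bucket=source_bucket, Key=key)
            thumb = make_thumbnail(obj["Body"].read())
            s3.put_object(Bucket=bucket, Key=f"thumbs/{key}", Body=thumb)
        except ClientError as err:
            # What should actually happen when the bucket is unreachable or
            # access is denied? The only honest test of this branch runs in the cloud.
            print(f"S3 call failed for {key}: {err}")
            raise
```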
Another example is when you have two Lambda functions, loosely coupled. The functions are blissfully ignorant of one another, although they share a mutual friend: Kinesis. In this use case, “Function A” publishes a message, perhaps with an embedded field whose value is another service’s event format (like an S3 triggering event) that’s consumed by “Function B”. While there’s no physical coupling, there’s potentially a deep logical coupling between them — one which might only appear at some future time as message contents drift across three agents in the pipeline.
How can we guard against this? How can we be certain about the set of functions which ultimately defines our service’s public contract?
Are they coherent? Are the functions secure? Resilient? Correct? Observable? Scalable? How can we reduce uncertainty around non-functional requirements?
Serverless is an implementation detail, not an architectural pattern.
 — @mweagle
The non-functional requirements of serverless
The great thing about non-functional requirements is that they’re … non-functional. They speak to a system’s characteristics — how it should be — not what it should do, or how it should be done. In that sense, non-functional requirements both have nothing and everything to do with serverless.
The slide from Peter Bourgon’s presentation on the microservice toolkit for Go
The slide above is from Peter Bourgon’s excellent presentation on the design decisions behind go-kit, a composable microservice toolkit for Go. The concerns listed apply equally to a JVM monolith, a Go-based set of microservices, or a NodeJS constellation supported by FaaS. If you’re running something in production, those *-ilities lurk in the shadows whether or not they’re explicitly named.
In that sense, serverless is less a discontinuity with existing practice and more the next stage in the computing continuum — a theme emphasized in Tim Wagner’s closing keynote. It’s a technique that embeds more of the *-ilities into the vendor platform itself, rather than requiring secondary tools. Serverless enables us to deliver software faster and with fewer known unknowns — at least those that are externally observable.
Although serverless offloads more of these characteristics to the vendor, we still own the service. At the end of the day, each one of us is responsible to the customer, even when conditions change. We need to own it. And that means getting better at Ops. Or more specifically — cloud-native development.
Charity Majors does an excellent job describing the operational best practices for serverless
The Base Camp — “Works on My Machine”
For many of us, the end result of our furious typing is in many cases a cloud-native application. In more mature organizations, our software constructs go through a structured CI/CD pipeline and produce an artifact ready to ship. This artifact has a well-defined membrane through which only the purest configuration data flows and all dependencies are dynamic and well behaved.
On a day-to-day basis, though, there is often a lot of bash, docker-compose, DNS munging, and API mocks. There is also a lot of “works on my machine” — which may be true, at least at this instant — but probably doesn’t hold for everyone else on the team. And it definitely doesn’t provide a lot of confidence that it will work in the cloud.
The only way to gain confidence that a feature branch will work in the cloud is to run it in the cloud.
Operations is the sum of all of the skills, knowledge and values that your company has built up around the practice of shipping and maintaining quality systems and software. — Charity Majors, WTF is Serverless Operations
If everyone on the team is developing their service feature branch in the cloud, complete with its infrastructure, then we’re all going to get better at ops. Because it’s development and ops rolled together. And we’re all going to share a sense of Environmental Sympathy.
To the Summit — From #NoOps to #WereAllOps
Environmental Sympathy, inspired by Mechanical Sympathy, is about applying awareness of our end goal of running in the cloud to the process of writing software.
While it’s always been possible to provision isolated single-developer clusters complete with VMs, log aggregators, monitoring systems, feature flags, and the like, in practice it’s pretty challenging and expensive. And perhaps most aggravating, it can be very slow. Short development cycles are critical to developer productivity and that’s not really a hallmark of immutable, VM-based deploys.
Serverless, precisely because it’s so heavily reliant on pre-existing vendor services and billed like a utility, makes it possible for every developer to exclusively develop their “service” in the cloud.
The service can have its own persistence engine, cache, queue, monitoring system, and all the other tools and namespaces needed to develop. Feature branches are the same as production branches and both are cloud-native by default. If, during development, the *-ilities tools prove too limiting, slow, or opaque, developer incentives and operational incentives are aligned. Together we build systems that make it easier to ship and maintain quality systems and software. Which will also help to minimize MTTR.
Serverless, for both financial and infrastructure reasons, makes it possible to move towards cloud-native development and Environmental Sympathy. It represents a great opportunity to bring Dev and Ops (and QA, and SecOps) together. This allows us to move from “worked on my machine” to “works in the cloud — I’ll slack you the URL.”
From #NoOps to #WereAllOps.
Scaling the serverless summit requires environmental sympathy with dev & ops was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2rFik4J
0 notes
mikegchambers · 7 years
Text
My personal journey to the cloud from my first job on a trading floor to startups
“Hi, my name is James. I’m a self-confessed cloud-oholic and I’ve been off-premise for 7 years now.”
Please do not share this picture with anyone … it was my pre-cloud late 90’s look.
There’s no self-help group for someone like me — technology is in my blood. I owned my first computer when I was 5, and from then onward I collected technology like most kids had stuffed toys.
I started writing for a local computer magazine at 12, then a national UK magazine. When I got to college, I had a regular column on a British platform called Oracle Teletext.
“You’ll never need more than 16KB of RAM.” Let me just type that reminder into my 4GB Pixel….
It would have been easier to do cool kid things and hang out at the mall but spending every waking hour thinking about all the problems that can be fixed with machines kept me too busy.
My first job on a trading floor
After earning a CS degree at college, starting work as a software developer was a jarring experience. Having to work in teams of developers is not something they taught at school, nor did they train anyone on how to make sense of ancient convoluted systems. At school, you start from a blank screen and build something beautiful all by yourself, which very rarely happens at work.
If you show this picture to ANYONE… oh wait, it’s the Internet.
My group built electronic trading software for hedge funds, portfolio managers, and the like. Most trading happens at the open and close like a twice-daily Black Friday that stresses the back-end. Our infrastructure was killing us as the platform grew and the amount of data choked any attempt to solve the problem without taking the system offline.
Best of all, our users were traders — and they were assholes on a good day. Their needs changed constantly and were communicated telepathically. They completely lost it during an outage and had no tolerance for bad data, missing functions or sloppy UI.
Smart, volatile and demanding users — working for them ended up being the very best training for today’s user base, and it taught me several life lessons:
Users don’t know what they want. At least, not the final version of what they want. They can read the road for the next hundred feet but anything beyond the headlights is unknown. The paradox is we can only know what to build in the short term, yet our scaffolding has to work forever.
Developers underestimate how much effort is involved. The one constant across all developers is a total failure to estimate how long something will take. 80% completion happens quickly and then the last 20% takes forever, if at all. You pad their estimates, your boss pads your estimates and the delivery is still late.
Stability is everything. Your system should not fail. Do not, under any circumstances, allow it to fail. But if you do fail, every single outage must be investigated and remedied so it never happens again.
On Error Resume Next: Product Management
In the mid-2000s, I moved to the Bay Area and worked in startups for several years as a Technical Product Manager. This was during the phase when everybody who previously had an idea for a website now had ideas for mobile.
In the Bay Area, a Product Manager is a coder who is yelled at by customers and also produces road-maps nobody uses.
As we moved into mobile, our users were regular people with cell phones, and our competitors were either well-funded startups or the established technical luminaries. Our development teams were much smaller, budgets were tighter and yet our epic aspirations didn’t seem to notice we were horribly equipped for success.
Mobile made scaling problems insurmountable for start-ups — buying new servers sucked up budgets, configuring load balancers and database replication wasted development time that should have been spent perfecting the UI. And investors and founders, usually bored with the grind of their real jobs and attracted to the gold rush, were on a mission to become the next billion dollar app with no revenue and an army of users.
At the time, there was iOS, Android, Windows and BlackBerry, all using different frameworks and languages, and it looked like these could fragment further. We were trying to put together apps that were essentially a dozen screens which could have been built as a .NET desktop app in a day. And yet we did manage to release apps, solve problems and build some businesses.
I learned:
You don’t know enough. Your team’s knowledge has gaps in networking, security, scaling, electrical engineering, machine code, you name it. When you face problems that veer into these areas, it’s like quicksand for your product. Developers like tough problems and have curious minds, so these types of issues are a siren’s call.
Complexity is death to progress. When your team owns all the pieces, they write complex code that locks systems together. But when developers can only use APIs to talk to other systems and don’t know how they are implemented, they write simple code that makes the system modular.
Dreams aren’t code. If you can’t make your idea function in a spreadsheet or a flowchart, it cannot be built in code, no matter how simple the investor or VC says it is.
Understanding the problem domain is key to building good solutions.
Discovering a better way
Sometime around 2010 it became clear to me that as a development group, we could confidently write solid applications running on machines in the same building. But deployment was difficult — and once apps hit production they weren’t performing as well.
We had been using some cloud apps for a while but hadn’t seriously used AWS until it became absolutely necessary. A client app had started to gain momentum and we didn’t have the money to scale up on-premise, so we became AWS users very quickly. It was a fortuitous but mildly alarming moment to realize we didn’t have any alternatives — but it quickly became the de facto way to build our products.
I had some lightbulb moments during this time:
Infrastructure is hell. It brings out the inner tinkerer in everyone, and it’s a distraction that stops you writing code. You also can’t manage it well no matter how hard you try. So don’t.
Dev-staging-prod doesn’t work. It’s not sophisticated enough, doesn’t stop bugs reaching the customer and ultimately just provides an illusion of quality. Every service needs versioning at every stage with incoming traffic routed accordingly.
Agile is beautiful. We were doing it while also doing waterfall because that was considered professional. When I read the Agile Manifesto I almost wept — I knew this was how we would build software from now on.
What happens in Vegas … becomes a career
In 2012 I attended the very first AWS re:Invent conference in Vegas and that changed everything. Witnessing the entire ecosystem around the platform, it was obvious that many people had been grappling with the same issues and there were a slew of great solutions available.
There was a haunting question about why nobody else was offering this — Amazon was the only game in town and either they were incredibly prescient or we were all being gleefully over-optimistic about this whole cloud thing. This lag continued for years — it gave AWS a 6-year head start, which is why its capabilities still smoke the competition.
In our shop we weren’t the first to the cloud by any measure but we embraced it wholeheartedly. Within 6 months there were a number of unexpected side-effects:
We became truly agile. Our users still didn’t know what they wanted and the devs still underestimated the work, but the dynamic in building products had changed. We could spin on a dime and make radical shifts without blowing the house down — or blowing the budget up.
The things we didn’t understand well were understood for us. Cloud took many of the computer sci-ency problems away and solved them. This allowed us to focus on building only the apps and our productivity (and profitability) sky-rocketed.
Our apps became really good. Many weren’t popular and didn’t survive investment rounds but they were extremely stable, scalable and looked like the products of a much bigger team. I cried for the apps that didn’t make it.
My future as a Technical Product Manager in cloud
In using cloud solutions as the backbone to all the products I’ve worked on, I’ve had to step up my technical game constantly. It’s not enough to be a Product Manager with road-maps and wire-frames — I need to know reliable patterns and trusted practices to create the best technical architecture.
This has meant constant training, taking on programming projects and learning new frameworks as the environment changes. It’s also meant making a commitment to conferences and workshops, which has become an automatic line-item in my budget.
On the business side, cloud has given me the confidence to assess viability and likely cost, predict timeframes more reliably and help business partners understand where the business ideas and the technology meet. In many ways, the concepts between agile, cloud and lean are so intertwined that I often think they are different views around the same thing.
Fail fast, waste little, learn constantly and always deliver customer value — cloud is central to making this work.
There are still a few road bumps
There are still plenty of naysayers. I worked for some more traditional companies after the California days and it was like jumping in the DeLorean and setting the clock to ‘Fail’.
They all grappled with an aging, fragile, expensive IT infrastructure that delivered limited business value and had no hope of helping them innovate or differentiate in the future. Those companies are waiting for a generation of executives to retire and competitive threats to reawaken the appetite that once made them giants.
There are also the fakers in the industry, the ones who for years dismissed cloud, laughed at Amazon and claimed it could never work. Now they scramble to promote their own clouds with the same limited tools and restrictive contracts they had on-premise.
The me-too players like Oracle serve to bring the laggards into the cloud ecosystem but they offer nothing fundamental or game-changing to the technology. 5 years ago they said cloud wasn’t secure and now they say only their clouds are safe, so I suppose fear can drive sales in anything.
But I live by the mantra “Go where you are celebrated, not tolerated.” I’m not here to convince yesteryear’s IT professionals that our industry’s change is accelerating geometrically. I’m here because I’m committed to using the cloud and its toolbox to build the next generation of software that solves the next round of problems. I want to get to machine learning and AI, and move from onClick to onPrediction — the cloud is where all of this will happen.
So that’s my story. Most of us geeky kids who grew up with computers didn’t become Steve Jobs or Jeff Bezos but it’s been an amazing ride. The opportunities are everywhere and the future has never been brighter. My name is James. I’ve been a self-confessed cloud-oholic for the last 7 years. I don’t think that’s ever going to change.
My personal journey to the cloud from my first job on a trading floor to startups was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qBkBh3
0 notes
mikegchambers · 7 years
Text
Serverlessconf Austin ’17 in Photos
Check out some pictures of the presenters, attendees, booths, events, and donuts from the hottest conference in the cloud!
On April 26–28 we hosted the 4th conference on serverless technologies and architectures in Austin, TX. Serverlessconf was attended by 450 serverless aficionados who listened to 35 fantastic presentations.
You can watch all the presentations on our YouTube channel right now!
Serverlessconf
Here’s a small collection of photos from the conference. Be sure to check out the Imgur album to see all of the pictures from Serverlessconf!
A short teaser before the start of the conference. The conference was held at the Zach Theatre in Austin, TX.
The Topfer Theatre MC, Mike Chambers, takes the stage and welcomes attendees.
Welcome note from Sam Kroonenburg and Peter Sbarski. We had custom donuts on Day 1 of the conference. They were good. Real good (but not better than our speakers and their talks).
Before the start of the conference we ran a competition to find great serverless architectures. Here Sam and Peter announce the winner — Hello Retail by the Nordstrom team.
Austen Collins, Serverless Inc, delivers the first keynote of the day.
John Gossman, Microsoft, delivers the second keynote.
430 packed the Topfer theatre for the opening keynotes.
I told you we had custom donuts! These were delicious and so very popular.
A Cloud Guru had an amazing t-shirt printing machine in the sponsor’s pavilion. Free custom-made swag for all attendees.
The A Cloud Guru t-shirts were made to order!
DJ Gatsby entertained everyone during lunch and at the conference party on Thursday.
The last donut picture. I promise.
Conference t-shirts were a hit!
Our sponsor Google Cloud gave out delicious Firebase hot sauce as swag. I think I took 3 bottles.
After a break Sam Kroonenburg, A Cloud Guru, spoke about building a serverless startup.
Dragos Haut, Adobe, spoke about the benefits of OpenWhisk.
Donna Malayeri & Chris Anderson, @micr, described what life is like for an Azure Serverless developer.
We had BBQ for lunch on Thursday. It was so good. And there’s that Firebase sauce!
I wish I could go back…
Our sponsor’s pavilion was big enough for everyone to have lunch, for sponsor booths, our DJ and entertainment.
Our second track was held in a second theatre, Kleberg, which was a thrust stage. A thrust stage is where the audience surrounds you on 3 sides. We had 3 massive 80 inch monitors set up so that everyone could see the presentation.
Aaron Kammerer, iRobot, spoke about building serverless ops for the robot army. Cool stuff!
Mike Roberts, Symphonia, delivered a popular talk on combining Agile and Serverless in the modern age.
Our returning champion Paul Johnston, Movivo, delivered a great talk titled, “Less is more — thoughts from a pragmatic CTO”.
Rob Gruhl, Nordstrom, showed how they do things serverless-ly at Nordstrom.
Our two speakers, Ryan Brown and Sam Kroonenburg, discussing something important. Probably.
Srini Uppalapati, CapitalOne, told the audience how a big bank is leading the way in cloud adoption and implementation of serverless architectures.
David Pollak, Funcatron, spoke about porting existing .NET/Java apps to Serverless.
Lynn Langit, AWS Community Hero/Cloud Architect/Author, spoke about Serverless SQL queries with Amazon Athena.
Chris Anderson, FaunaDB, spoke about the cool new kid on the block (i.e. their serverless database).
A powerful talk from Michael Corning, UbiModo, that gained a lot of praise.
Tom Myers, Accenture, spoke about Serverless in the enterprise.
It’s 5.30pm and after a day of talks it’s time to relax and have a bit of a party!
See, we didn’t just have donuts. We had veggies too.
All smiles :)
Women Who Code Austin had a great presence at the conference!
Our awesome friends from Japan flew for many hours to be there!
The Microsoft team had a great booth and presentation!
The AWS team had a fantastic presence at the event.
And IBM had a very popular booth too.
Lots of networking and great conversations took place. Everyone was so friendly!
Accenture was a sponsor of Serverlessconf — thank you!
The team from IOpipe were awesome. Check out their insights and service!
Meeting new friends from Spotinst (they flew from Israel)!
Did I tell you how much swag there was? Thank you Serverless Inc.!
Attendees wanted to stay and chat — that was a great feeling (for a conf organizer).
Just hangin’. Peter Sbarski, Jason McGee, Ryan Kroonenburg and Sam Kroonenburg
Anne-Jeanette was one of several attendees from our sponsor Capital One DevExchange!
We also had a great camera crew that took interviews and photos.
The A Cloud Guru team was working hard at the t-shirt printing press.
Charity Majors, Honeycomb, delivered an ultra-popular talk on operations in the Serverless era.
Bret McGowen and Mike McDonald from Google delivered a talk about peanut butter and jelly. And, also about serverless architectures with Google Cloud Functions and Firebase.
Guy Podjarny, Snyk, spoke about security and Serverless.
The second day there was more food (believe it or not).
This was very good. I think I had too many extra brownies.
Jared Short, Trek10, nailed his presentation on Single Page Apps and Serverless backends.
Florian Motlik, Cloudthropology, spoke about infrastructure tooling.
Marcia Villalba, Rovio, spoke about pains and gains of migrating to Serverless.
Forrest Brazeal, Infor, spoke about migrating from SWF to Step Functions. He also performed a rap at the end!
Ben Kehoe, iRobot, spoke truth about what’s missing at the moment.
Andreas Naurez & Michael Behrendt, IBM, told us about new developments in the OpenWhisk land.
Randall Hunt, AWS, delivered a super entertaining presentation on lesser known AWS services.
Tim Wagner, AWS, made a heroic effort to attend the conference after numerous delays and flight cancellations. He closed the conference with a bang. He also managed to use a blender on stage.
The full team. The last hurrah!
And one more group photo of the awesome A Cloud Guru team!
So that brings us to the end of Serverlessconf Austin 2017. Our sincere thanks to our speakers, attendees, and sponsors who made this conference so interesting and exciting. We love the passion in our community. It makes Serverlessconf a lot of fun to organize and run.
I also want to thank the amazing A Cloud Guru team who worked extra hard to make Serverlessconf Austin ’17 a special event. Thank you all for your hard work and infectious enthusiasm.
With that, I hope to see you at the next Serverlessconf!
Serverlessconf Austin ’17 in Photos was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qxlCZl
0 notes
mikegchambers · 7 years
Text
The serverless approach to testing is different and may actually be easier
Discovering the inherent advantages for testing smaller bits of uncoupled logic requires a different approach — and tools
I’ve been thinking a lot about testing recently. At work we have significantly increased the number of our lambda functions due to new client applications and new features. Developing those features isn’t a massive deal, but something has started to bug me (if you’ll excuse the pun).
Testing is a “good thing”
I’m all for creating tests. Whether it’s true “Test Driven Development” — or whatever the testing methodology du jour is now — is immaterial to me. Sometimes in a startup, you just have to deploy something fast, and write a test later (I know, I know — but I’m just giving people who’ve never worked in a startup the real world scenarios). And sometimes, the tests never get written because you think that your use case is already caught (it isn’t).
Often tests get written because of bugs occurring in the production environment. This will always occur unless you have endless money and time — which you won’t in a startup.
Tests are vitally important.
But if you’re using the prevailing testing wisdom — serverless is hard.
Testing interactions with “service-full” architecture
Serverless architecture uses a lot of services — hence why some prefer to call the architecture “service-full” instead of serverless. Those services are essentially elements of an application that are independent of your testing regime.
An external element.
A good external service will be tested for you. And that’s really important. Because you shouldn’t have to test the service itself. You only really need to test the effect of your interaction with it.
Here’s an example …
Let’s say you have a Function as a Service (e.g. Lambda function) and you utilise a database service (e.g. DynamoDB). You’ll want to test the interaction with the database service from the function to ensure your data is saved/read correctly, and that your function can deal with the responses from the service.
Now, the above scenario is relatively easy because you can utilise DynamoDB from your local machine, and run unit tests to check the values stored in the database. But have you spotted something with this scenario? It’s not the live service — it’s a copy of it. But the API is the same. So, as long as the API doesn’t change we’re ok, right?
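To make that concrete, here's a minimal sketch (I'll use Python for these examples) of what a test against a local copy of DynamoDB might look like. The save_user helper, the table name and the localhost endpoint are assumptions for illustration, not code from any real project.

```python
# A sketch (under the assumptions above) of exercising hypothetical save_user()
# logic against DynamoDB Local instead of the live service.
import boto3
import pytest

LOCAL_ENDPOINT = "http://localhost:8000"  # default port for DynamoDB Local


def save_user(table, user_id, email):
    # The logic under test: write a record, then read it back.
    table.put_item(Item={"userId": user_id, "email": email})
    return table.get_item(Key={"userId": user_id}).get("Item")


@pytest.fixture
def users_table():
    # Point boto3 at the local copy; credentials are dummies because nothing leaves the machine.
    dynamodb = boto3.resource(
        "dynamodb",
        endpoint_url=LOCAL_ENDPOINT,
        region_name="us-east-1",
        aws_access_key_id="fake",
        aws_secret_access_key="fake",
    )
    table = dynamodb.create_table(
        TableName="users-test",
        KeySchema=[{"AttributeName": "userId", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "userId", "AttributeType": "S"}],
        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    )
    yield table
    table.delete()  # keep each test run isolated


def test_save_user_round_trips(users_table):
    item = save_user(users_table, "u-1", "test@example.com")
    assert item == {"userId": "u-1", "email": "test@example.com"}
```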
To be honest, I’ve reached a point where I’m realising that if we use an AWS service, the likelihood is that AWS have done a much better job of testing it than I have. So we mock the majority of our interactions with AWS (and other) services in unit tests. This makes it relatively simple to develop a function of logic and unit test it — with mocks for services required.
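Here's roughly what that mock-heavy version looks like in a unit test. Again, the handler is a hypothetical stand-in; the only thing under test is the function's own logic and the shape of its calls to the service.

```python
# A sketch of the mock-heavy unit test: the table object is a MagicMock, so only
# the handler's own logic and the shape of its call to the service are checked.
# The handler itself is a made-up example, not code from any real project.
from unittest.mock import MagicMock


def handler(event, table):
    # Hypothetical Lambda logic: validate input, persist it, report the outcome.
    if "userId" not in event:
        return {"statusCode": 400}
    table.put_item(Item={"userId": event["userId"]})
    return {"statusCode": 201}


def test_handler_persists_valid_event():
    fake_table = MagicMock()
    response = handler({"userId": "u-1"}, fake_table)
    fake_table.put_item.assert_called_once_with(Item={"userId": "u-1"})
    assert response == {"statusCode": 201}


def test_handler_rejects_missing_user_id():
    fake_table = MagicMock()
    assert handler({}, fake_table) == {"statusCode": 400}
    fake_table.put_item.assert_not_called()
```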
This is similar to when using a framework such as Rails. You shouldn’t be testing that the ORM works. That’s the ORM maintainers job, not yours. So it stands to reason that if a service provides an interface and documentation about how the interface works, then it should be fine — right?
Hopefully…
What about other parts of testing — beyond unit tests?
Here’s where there is a problem with serverless… sort of. Unit tests are easy with a FaaS function because the logic is often tiny. There is a tendency towards over-reliance on mocks, in my view, but it works.
All other forms of testing are hard. In fact, I’d say we’ve possibly moved into needing a different paradigm to discuss this.
Through years of building monolithic applications, we’ve got absolutely obsessed that certain types of testing are absolutely vital — and if we don’t have them we’re “wrong”.
So let’s just step back a bit.
We’ve actually been having the discussion about distributed systems and testing for a while. The microservice patterns have shown us that it’s not always appropriate and often expensive to try to test everything in the way we do a monolith.
The key for integration testing with a microservice pattern is that you test the microservice and its integration with external components. Which is interesting, because you’re still imagining some sort of separation here.
With Lambda, in this context, every single function then needs to be treated as a microservice for testing. Which means that your function’s unit tests (with mocks) need to be expanded to integration tests by removing the mocks, and using the actual service or stubbing the service in some way.
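As a sketch of what removing the mocks might look like, the test below invokes the deployed function through the real Lambda API and then checks the real table. The function name, table name and payload are hypothetical placeholders for a dev-stage deployment.

```python
# A sketch of widening the boundary: invoke the deployed function through the real
# Lambda service and assert against the real DynamoDB table in a dev/test account.
# The function name, table name and payload shape are hypothetical placeholders.
import json
import boto3

lambda_client = boto3.client("lambda")
users_table = boto3.resource("dynamodb").Table("users-dev")


def test_deployed_function_writes_to_dynamodb():
    payload = {"userId": "integration-test-1", "email": "it@example.com"}

    result = lambda_client.invoke(
        FunctionName="my-service-dev-saveUser",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    assert result["StatusCode"] == 200  # the invocation itself succeeded

    # ConsistentRead avoids a race with eventually consistent reads.
    item = users_table.get_item(
        Key={"userId": "integration-test-1"}, ConsistentRead=True
    ).get("Item")
    assert item is not None
    assert item["email"] == "it@example.com"
```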
Unfortunately not every external service is easily testable in this way. And not every service provides a test interface for you to work with — nor do some services make it easy to stub themselves. I would suggest that if a service can’t provide you with a relatively easy way to test the interface in reality, then you should consider using another one.
This is especially true when a transaction is financial. You don’t want a test to actually cost you any real money at this point!
Going beyond unit and single function integration tests
For me, the easiest way to test a serverless system as a whole is to generate a separate system in a non-linked AWS account (or other cloud provider). Then make every external service essentially link to a “test” service or, as best we can, limit our exposure to cost.
This is how I’ve approached it — and it relies on Infrastructure as Code to make it happen. Hence, the use of something like Terraform or CloudFormation.
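Here's one possible sketch of that approach: a small script that stands up a CloudFormation stack named after the current git branch, so each branch gets its own isolated environment to test against. The template file and the Stage parameter are assumptions; Terraform workspaces would get you to the same place.

```python
# A sketch of one way to do it: stand up a CloudFormation stack named after the
# current git branch so every feature branch gets its own isolated environment.
# The template file and the "Stage" parameter are assumptions for illustration.
import subprocess
import boto3


def deploy_branch_stack(template_path="template.yaml"):
    branch = (
        subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True)
        .strip()
        .replace("/", "-")
    )

    with open(template_path) as f:
        template_body = f.read()

    cfn = boto3.client("cloudformation")
    stack_name = f"my-service-{branch}"
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=[{"ParameterKey": "Stage", "ParameterValue": branch}],
        Capabilities=["CAPABILITY_IAM"],
    )
    # Block until the environment exists before pointing the test suite at it.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return stack_name
```

Tearing the environment down again when the branch is merged is just the matching delete_stack call.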
But interestingly, when you go beyond a single function like this in a microservice approach, you get onto things like component testing and then system testing. Essentially testing is about increasing the test boundary each time. Start with a small test boundary and work out.
Unit testing, then integration, and so on …
But interestingly, our unit tests are already doing the job of testing the boundary of each function, exercising the logic itself, and covering the function’s relation to external services reasonably well. So the next step is to test a combination of the services together.
But since we’re using external services for the majority of our interactions, and not invoking functions from within functions very often, then the test boundaries are actually relatively uncoupled.
Hmm… so basically, the more uncoupled a function’s logic is from other functions’ logic, the closer the successive test boundaries sit to each other as we move outwards in tests.
So after good unit and integration tests on a function-by-function basis, what comes next? Is it simply end-to-end testing? This becomes really interesting, since that means testing the entire “distributed system” in a staging-style environment with reasonable data.
Wait! Did we just … ?
Basically, what seems to happen with a Function as a Service approach is that the suite of tests seems a lot simpler than what you would normally have with a monolithic or even a microservice approach.
The test boundaries for unit testing a FaaS Function appears to be very close to an integration test versus a component test within a microservice approach.
Quick Caveat: if you do lots of function to function invocations, then you are coupling those functions and the test boundaries will change. Functions invoking functions create a separate test boundary to worry about.
Which comes back to something else very interesting. If you build functions, and develop an event driven approach utilising third party services (e.g. SNS, DynamoDB Triggers, Kinesis, SQS in the AWS world) as the event connecting “glue” — then you may be able to essentially limit yourself to testing the functions separately and then the system.
Hmm … so testing is simpler?
Not exactly, but close.
I would suggest the system testing is harder. If you’re purely using an API Gateway with Lambdas behind it, then you can use third party tools to test the HTTP endpoints and build a test suite that way. It’s relatively understood.
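For that API Gateway case, the end-to-end suite can be as plain as a few HTTP calls against a test stage, along these lines (the URL and routes are invented for illustration).

```python
# A sketch of an end-to-end test against a dedicated test stage of an API Gateway
# deployment. The base URL and routes are made up for illustration.
import requests

BASE_URL = "https://example-api-id.execute-api.us-east-1.amazonaws.com/test"


def test_create_user_endpoint():
    response = requests.post(f"{BASE_URL}/users", json={"email": "e2e@example.com"})
    assert response.status_code == 201
    assert "userId" in response.json()


def test_unknown_user_returns_404():
    response = requests.get(f"{BASE_URL}/users/does-not-exist")
    assert response.status_code == 404
```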
But if you’re doing a lot of internal event triggering, such as DynamoDB triggers setting off a chain of events across multiple Lambdas, then you have to do something different. This form of testing is harder, but since everything is a service — including the Lambda — it should be relatively simple to do.
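One way I'd approach that harder case is to drive the chain from its entry point and poll for the eventual side effect. Everything in this sketch is hypothetical: the table names, the trigger chain, and the notification record the downstream Lambdas are assumed to write.

```python
# A sketch of a system test for an internal event chain: write to the table that
# fires the trigger, then poll for the side effect the downstream Lambdas should
# produce. Table names and the expected "notification" record are hypothetical.
import time
import boto3

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("orders-test")
notifications = dynamodb.Table("notifications-test")


def test_order_insert_triggers_notification():
    orders.put_item(Item={"orderId": "system-test-1", "amount": 10})

    # Asynchronous chains need polling with a deadline, not an immediate assertion.
    deadline = time.time() + 30
    while time.time() < deadline:
        item = notifications.get_item(Key={"orderId": "system-test-1"}).get("Item")
        if item:
            assert item["status"] == "sent"
            return
        time.sleep(2)

    raise AssertionError("downstream notification never appeared within 30 seconds")
```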
The person that builds the tool for this kind of system testing with serverless will do very well. At present, the CI/CD and testing tools we have around serverless are not (quite) good enough.
Testing and Serverless is different
When I started thinking about this article, I was expecting to figure out a lot of things around how to fit better testing regimes into our workflow.
As this article has come together, what’s happened is an identification of why serverless approaches are different to monolithic and microservice approaches. As a result, I’ve realised the inherent advantages for testing of smaller bits of uncoupled logic.
You can’t just drag your old “Testing for Monoliths Toolbox” into the serverless world and expect it to work any more.
Testing in serverless is different.
Testing in serverless may actually be easier.
In fact, testing in Serverless may actually be easier to maintain as well.
But we’re currently lacking the testing tools to really drive home the value — I’m looking forward to when they arrive.
Some final thoughts
I’m often a reluctant test writer. I like to hack and find things out as I go before building things to “work”. I’ve never been one of the kinds of people to force testing in any scenario so I may be missing the point in some of this. There are definitely people more qualified to talk about testing than me, but these are simply thoughts on testing.
Additional Resources
Testing Strategies in a Microservice Architecture
bliki: TestPyramid
Just Say No to More End-to-End Tests
The serverless approach to testing is different and may actually be easier was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.
from A Cloud Guru - Medium http://ift.tt/2qtKWzD
0 notes