#location of aws ec2 bootstrap scripts in an ec2 instance
amalgjose · 2 years ago
Where is the location of the bootstrap user-data script within an AWS EC2 instance (Linux)?
A bootstrap script, or user-data, is a custom script used to perform installation or configuration steps on an EC2 instance at provisioning time. It runs on top of the base AMI: the user-data or bootstrap script is copied to the instance immediately after provisioning and then executed. The script is located in the following location in case of a Linux operating…
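While the post above is truncated, here is a short, general way to inspect user-data on a running Linux instance. This reflects standard EC2 and cloud-init behaviour rather than the specific location the truncated post goes on to name, and the cloud-init paths are the usual defaults, which can vary by distribution.

# Retrieve the raw user-data from the instance metadata service (add an IMDSv2 token first if IMDSv2 is enforced)
curl -s http://169.254.169.254/latest/user-data

# On cloud-init based AMIs the executed user-data script and its output usually live here
sudo ls /var/lib/cloud/instance/scripts/
sudo less /var/log/cloud-init-output.log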
View On WordPress
iasflaherty · 7 years ago
Creating a Chef Automate Workflow Pipeline
My company's Chef Automate workflow pipelines were designed as part of a middleware infrastructure project. The project required three auto-scaled instances, each sitting behind its own AWS ELB, and enlisted the services of five teams, each with its own specialization. The Infrastructure team created AWS CloudFormation Templates (CFTs), CFT cookbooks, VPCs, security groups and ELBs. The Middleware team created the cookbooks for the respective instances, including the underlying base cookbooks which will be utilized by our company for future projects. The QA team created and iterated upon smoke and functional testing for single instances and the communication between instances. Finally, the Security team determined the compliance testing necessary for the instances and helped create the security testing that would stop a pipeline should servers fall out of compliance.

When designing the infrastructure and procedures for these pipelines we came across a number of hurdles. First, when provisioning instances via our CFT cookbook, the nodes are bootstrapped with Chef Client by a user-data script. After Chef Client is installed, the script has the node run its first-boot.json, which contains the name of the cookbook for the current project pipeline. If the recipe fails during this initial bootstrapping, however, the node will not be attached properly to the Chef server. This bootstrapping process is a necessary component for auto-scaled instances: if new instances are booted as part of an AutoScaling event, those nodes must be bootstrapped with the latest cookbooks. Therefore, testing of the cookbook needs to be independent of the CFT deployment steps. To get around this, my company developed a pipeline that calls not only on our internal CFT provisioning cookbook but also on Test Kitchen for our acceptance nodes.
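For context, a minimal sketch of the kind of user-data bootstrap described above, assuming Chef Client is installed from Chef's omnitruck installer; the Chef server URL, validator name and run_list entry are placeholders, not values from the original post.

#!/bin/bash
# Install Chef Client
curl -L https://omnitruck.chef.io/install.sh | bash

mkdir -p /etc/chef
# (the validator .pem would also need to be placed in /etc/chef, e.g. fetched from a secure bucket)

# first-boot.json names the project cookbook the node should converge on first boot
cat > /etc/chef/first-boot.json <<'EOF'
{ "run_list": ["recipe[my_project_cookbook]"] }
EOF

# client.rb points the node at the Chef server (placeholder URL and validator)
cat > /etc/chef/client.rb <<'EOF'
chef_server_url        "https://chef.example.com/organizations/my_org"
validation_client_name "my_org-validator"
validation_key         "/etc/chef/my_org-validator.pem"
EOF

# First converge: registers the node with the Chef server and runs the project cookbook
chef-client -j /etc/chef/first-boot.json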
By using kitchen-ec2 we are able to converge and destroy our cookbooks in acceptance to verify their viability before passing them to our user-data script. This is made easier with the inclusion of the delivery-sugar cookbook. Delivery-sugar contains resources that allow for the creation, convergence and destruction of EC2, Azure, Docker and vSphere instances using the delivery_test_kitchen resource. My company is currently calling on kitchen-ec2 for instance creation. Using the EC2 driver currently requires ALL of the following components in order to run successfully.
Test Kitchen Setup (Acceptance Stage Provisioning):
In order to enable this functionality, please perform the following prerequisite steps. Add ALL of the following items to the appropriate data bag within your Chef server.
You can convert the private key content to a JSON-compatible string with the following command. 
chef exec ruby -e 'p ARGF.read' automate_kitchen.pem >> automate_kitchen.json
Since the private key must be kept secure, this data bag should be encrypted. To add an encrypted data bag to the Chef server you must first have sufficient access to the Chef server to run a knife command against it. Once that permission is in place, run the following command.
knife data bag create delivery-secrets <ent>-<org>-<project> --secret-file encrypted_data_bag_secret
Where <ent> is the name of your enterprise, <org> is the name of your organization and <project> is the name of the pipeline you are creating. In order to decrypt this data, the encrypted_data_bag_secret file used to encrypt the data bag must be added to your Chef build servers at the following location.
/etc/chef/
Once these components are deployed, customize your kitchen YAML file with all the required information needed by the kitchen-ec2 driver. NOTE: this kitchen.yml file is the one found in your .delivery/build_cookbook, not the one found under your project cookbook.
Delivery-sugar will expose the following ENV variables for use by kitchen.
KITCHEN_INSTANCE_NAME - set to a unique instance name built from values provided by delivery-cli
KITCHEN_EC2_SSH_KEY_PATH - path to the SSH private key created from the delivery-secrets data bag in the step above
These variables may be used in your kitchen YAML as in the first sketch below. Once the prerequisites are in place you can use delivery_test_kitchen within your .delivery/build_cookbook/provision.rb to deploy instances through Test Kitchen: trigger a kitchen converge and destroy action using the EC2 driver and point it at the .kitchen.ec2.yml in delivery (second sketch below). NOTE: when adding a repo_path my company chooses #{workflow_workspace_repo}/.delivery/build_cookbook/. This is by preference, and the location of the .yml file can sit wherever the user requires. You can also trigger a kitchen create that passes extra options for debugging, or a kitchen create that extends the timeout to 20 minutes.
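A minimal .kitchen.ec2.yml sketch using the environment variables above. The region, instance type, platform, key pair name and run_list entry are placeholders rather than values from the original post, and the attribute names should be checked against the kitchen-ec2 and Test Kitchen documentation.

driver:
  name: ec2
  region: us-east-1                  # placeholder
  instance_type: t2.micro            # placeholder
  aws_ssh_key_id: automate_kitchen   # placeholder EC2 key pair name

transport:
  ssh_key: <%= ENV['KITCHEN_EC2_SSH_KEY_PATH'] %>

provisioner:
  name: chef_zero

platforms:
  - name: centos-7                   # placeholder

suites:
  - name: <%= ENV['KITCHEN_INSTANCE_NAME'] %>
    run_list:
      - recipe[my_project_cookbook::default]   # replace with the cookbook this pipeline builds

And a sketch of the delivery_test_kitchen call in .delivery/build_cookbook/recipes/provision.rb. The property names (driver, yaml, repo_path, options, timeout) follow our reading of the delivery-sugar README and should be treated as assumptions; my_project_cookbook above and the resource name below are placeholders.

# Converge and destroy an EC2 instance through Test Kitchen
delivery_test_kitchen 'acceptance_kitchen' do
  driver 'ec2'
  repo_path "#{workflow_workspace_repo}/.delivery/build_cookbook/"
  yaml '.kitchen.ec2.yml'
  options '--log-level debug'   # extra options for debugging
  timeout 1200                  # extend the timeout to 20 minutes
  action [:converge, :destroy]
end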
Since we are only using kitchen on our acceptance nodes, my company adds logic along the lines of the sketch below to ensure Test Kitchen is not used outside of the acceptance stage (workflow_stage is a helper provided by delivery-sugar).
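A sketch of that guard around the delivery_test_kitchen block from the previous example:

# Only run Test Kitchen when the pipeline is in the acceptance stage
if workflow_stage == 'acceptance'
  delivery_test_kitchen 'acceptance_kitchen' do
    driver 'ec2'
    yaml '.kitchen.ec2.yml'
    action [:converge, :destroy]
  end
end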
Version Pinning
The second issue we were presented with in creating our workflow pipelines came with the version pinning of our environments.
If base cookbooks are used for multiple projects, pinning should not be done on the base cookbook itself. Since cookbooks are pinned at the environment level, if the base cookbook is pinned in the environment and then updated, that base cookbook update will in effect alter all projects using it in that environment (acceptance, union, rehearsal, delivered). To prevent this pinning from taking place through Workflow, under
.delivery/build_cookbook/provision.rb
comment out
delivery-truck::provision
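In other words, the build cookbook's provision recipe keeps the delivery-truck include commented out (a sketch; only the commented line matters here):

# .delivery/build_cookbook/recipes/provision.rb
# Environment pinning via delivery-truck is intentionally disabled for shared base cookbooks:
# include_recipe 'delivery-truck::provision'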
In turn, if we version pin only the role cookbook at the environment level, that cookbook being project specific, any changes made to the role cookbook should not have an effect on any other project.
This does mean that for a base cookbook to be updated in a project, its version must be changed in the role cookbook; for every underlying cookbook change, the role cookbook needs a version bump. This is a much more manual process, but it protects other projects from breaking when a single base cookbook changes. It also has the added benefit of version controlling any version bumps in our environments for a given project's nodes. Since the only version pins in an environment fall on the role cookbook, all other version changes are controlled through the role cookbook's metadata and Delivery CLI commands. These commits can be tied back to individual users and version changes, which better stabilizes the environments.
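As an illustration, the role cookbook's metadata.rb carries the pins for its underlying cookbooks; the cookbook names and version numbers here are placeholders:

# metadata.rb of the project's role cookbook (placeholder names and versions)
name    'my_project_role'
version '1.4.0'   # bumped for every underlying cookbook change

depends 'my_base_cookbook', '= 2.3.1'
depends 'my_middleware_cookbook', '= 0.9.0'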
When base cookbooks are not project specific, the leading measure in Workflow should sit with the role cookbooks. These cookbooks should be used to provision servers and to version pin the underlying cookbooks when going through the Union, Rehearsal and Delivered stages of Chef Automate Workflow, keeping each project's version pinning separate.
Setting up Metadata.rb, Provision.rb, Kitchen.yml and Berksfile in .delivery/build_cookbook
NOTE: before adding the workflow provisioning steps to the build_cookbook, please add the project cookbook to the Chef server, either through Automate Workflow or through a knife command. If the project cookbook is not available on the first run of the pipeline, the pipeline will fail when trying to download cookbooks.

With these two problems resolved, and explained, it is now time to set up the rest of our workflow pipeline. We will start by modifying our Berksfile within .delivery/build_cookbook/. Since we will be calling on cookbooks that are currently stored on the Chef server, we need to make sure the workflow pipeline can reach out to it to find cookbooks. We do this by adding the Chef server source.
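A sketch of such a Berksfile; the :chef_server source assumes a Berkshelf version that supports it:

# .delivery/build_cookbook/Berksfile
source 'https://supermarket.chef.io'
source :chef_server   # resolve cookbooks already uploaded to the Chef server

metadata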
Next, we will modify our metadata.rb file. We need to make sure we are pulling in delivery-sugar, delivery-truck, the current project cookbook for the pipeline, and the cookbook we are using to provision our servers.
NOTE: we only need to call the provisioning cookbook here if this is the role cookbook.
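A sketch of that metadata.rb; my_project_cookbook and my_provision_cookbook are placeholder names:

# .delivery/build_cookbook/metadata.rb
name    'build_cookbook'
version '0.1.0'

depends 'delivery-sugar'
depends 'delivery-truck'
depends 'my_project_cookbook'     # the project cookbook this pipeline builds
depends 'my_provision_cookbook'   # only if this is the role cookbook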
We will also configure our kitchen.yml (which we have named kitchen.ec2.yml here) as described in the steps above. This file is used for our kitchen converge and destroy in the acceptance provisioning stage.
NOTE: do not forget to change the cookbook called in the kitchen.yml to reflect the cookbook we are currently working in (see the run_list).
Finally, we will modify our provision.rb file. Its contents depend on whether we are in a role cookbook or in a base or wrapper cookbook (see the section on version pinning above for further explanation).
In a ROLE cookbook we will call upon the provisioning cookbook if we are in the union, rehearsal or delivered stage. This check can be made using the delivery-sugar helper workflow_stage, which returns the stage the pipeline is currently running in.
We will also call on the delivery-truck::provision recipe to pin our environment.
NOTE: the delivery-truck::provision recipe is included AFTER the run of our provisioning cookbook (see the section on version pinning for more information).
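Putting these pieces together, a sketch of the role cookbook variant of provision.rb; my_provision_cookbook is a placeholder and the delivery_test_kitchen properties are the assumptions noted earlier:

# .delivery/build_cookbook/recipes/provision.rb -- role cookbook variant
if workflow_stage == 'acceptance'
  delivery_test_kitchen 'acceptance_kitchen' do
    driver 'ec2'
    repo_path "#{workflow_workspace_repo}/.delivery/build_cookbook/"
    yaml '.kitchen.ec2.yml'
    action [:converge, :destroy]
  end
elsif %w(union rehearsal delivered).include?(workflow_stage)
  include_recipe 'my_provision_cookbook::default'
end

# Pin the environment via delivery-truck AFTER the provisioning cookbook has run
include_recipe 'delivery-truck::provision'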
If we are NOT in a role cookbook, delivery-truck::provision will not be called, and we also do not need to include the provisioning recipe for union, rehearsal or delivered. To keep things simple, and to avoid making too many modifications to our code, we simply add a warning message in place of the provisioning cookbook includes (see the sketch below).
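A sketch of the non-role variant, using Chef's log resource for the warning; the acceptance-stage Test Kitchen block is the same as in the role cookbook sketch above:

# .delivery/build_cookbook/recipes/provision.rb -- base/wrapper cookbook variant
# (acceptance-stage delivery_test_kitchen block as in the role cookbook sketch)
unless workflow_stage == 'acceptance'
  log 'skip_provisioning' do
    message "Not a role cookbook: skipping provisioning in the #{workflow_stage} stage."
    level :warn
  end
end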
Once these changes are saved we can version bump our project cookbook, either through its metadata.rb file or the Delivery command, and run delivery review. NOTE: this version bump is done in the PROJECT COOKBOOK, not the build cookbook.
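For example, from the project cookbook's repository (the branch name and version number are examples, not values from the original post):

git checkout -b bump-cookbook-version
# edit metadata.rb: version '0.2.0'
git commit -am "Bump project cookbook to 0.2.0"
delivery review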
This will push the cookbook into Automate and start the Chef Automate Workflow Pipeline.
via Blogger http://ift.tt/2hSnFFJ
globalmediacampaign · 5 years ago
Migrating a Neo4j graph database to Amazon Neptune with a fully automated utility
Amazon Neptune is a fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. You can benefit from the service's purpose-built, high-performance, fast, scalable, and reliable graph database engine when you migrate data from an existing self-managed graph database such as Neo4j. This post shows you how to migrate from Neo4j to Amazon Neptune by using an example AWS CDK app that utilizes the neo4j-to-neptune command-line utility from the Neptune tools GitHub repo. The example app completes the following tasks:

Sets up and configures the Neo4j and Amazon Neptune databases
Exports the movies graph from the example project on the Neo4j website as a CSV file
Converts the exported data to the Amazon Neptune bulk load CSV format by using the neo4j-to-neptune utility
Imports the converted data into Amazon Neptune

Architecture
The architecture consists of the building blocks that you need to build a loosely coupled app for the migration. The app automates the creation of the following resources:

An Amazon EC2 instance to download and install a Neo4j graph database and an Apache TinkerPop Gremlin console for querying Amazon Neptune. This instance acts both as the migration source and as a client to run AWS CLI commands, such as copying exported files to an Amazon S3 bucket and loading data into Amazon Neptune.
An Amazon S3 bucket from which to load data into Neptune.
An Amazon Neptune DB cluster with one graph database instance.

Running the migration
Git clone the AWS CDK app from the GitHub repo. After ensuring you meet the prerequisites, follow the instructions there to run the migration. The app automates the migration of the Neo4j movies graph database to Amazon Neptune. After the app runs successfully, your terminal shows output listing the created resources. Record the values, such as NeptuneEndpoint, to use in later steps.

The app provisions the Neo4j and Amazon Neptune databases and performs the migration. The following sections explain how the app provisions and runs the migration, and show you how to use the Gremlin console on the EC2 instance to query Neptune and validate the migration.

Migration overview
The AWS CDK app automates three essential phases of the migration:

Provision AWS infrastructure
Prepare for the migration
Perform the migration

Provisioning AWS infrastructure
When you run the app, it creates the following resources in your AWS account.

Amazon VPC and subnets
The app creates an Amazon VPC, denoted by VPCID. You must create Neptune clusters in a VPC, and you can only access their endpoints within that VPC. To access your Neptune database, the app uses an EC2 instance that runs in the same VPC to load data and run queries. Two /24 public subnets are created, one in each of two Availability Zones.

EC2 instance
A single EC2 instance, denoted by EC2Instance, performs the following functions:

Downloads and installs a Neo4j community edition graph database (version `4.0.0`)
Runs AWS CLI commands to copy local files to Amazon S3
Runs AWS CLI commands to load data into Neptune
Runs Apache TinkerPop Gremlin commands to query and verify the data migration to Neptune

S3 bucket
The app creates a single S3 bucket, denoted by S3BucketName, to hold data exported from Neo4j. The app triggers a bulk load of this data from the bucket into Neptune.

Amazon S3 gateway VPC endpoint
The app creates the Neptune database cluster in a public subnet inside the VPC.
To make sure that Neptune can access and download data from Amazon S3, the app also creates a gateway type VPC endpoint for Amazon S3. For more information, see Gateway VPC Endpoints.

A single-node Neptune cluster
This is the destination in this migration: the target Neptune graph database, denoted by NeptuneEndpoint. The app loads the exported data into this database. You can use the Gremlin console on the EC2 instance to query the data.

Required AWS IAM roles and policies
To allow access to AWS resources, the app creates all the roles and policies necessary to perform the migration.

Preparing for the migration
After provisioning the infrastructure, the app automates the following steps.

Create a movie graph in Neo4j
The app uses bootstrapping shell scripts to install and configure Neo4j community edition 4.0 on the EC2 instance. The scripts then load the Neo4j movies graph into this database.

Export the graph data to a CSV file
The app uses the following Neo4j Cypher call to export all nodes and relationships into a comma-delimited file:

CALL apoc.export.csv.all('neo4j-export.csv', {d:','});

The exported file is saved at the following location:

/var/lib/neo4j/import/neo4j-export.csv

As part of automating the Neo4j configuration, the app installs the APOC library, which contains procedures for exporting data from Neo4j, and edits the neo4j.conf file with the following setting so that APOC can write to a file on disk:

apoc.export.file.enabled=true

The app also whitelists Neo4j's APOC APIs in the neo4j.conf file so they can be used:

dbms.security.procedures.unrestricted=apoc.*

Performing the migration
In this phase, the app migrates the data to Neptune. This includes the following automated steps.

Transform Neo4j exported data to Gremlin load data format
The app uses the neo4j-to-neptune command-line utility to transform the exported data to the Gremlin load data format with a single command:

$ java -jar neo4j-to-neptune.jar convert-csv -i /var/lib/neo4j/import/neo4j-export.csv -d output --infer-types

The neo4j-to-neptune utility creates an output folder and writes the results to separate files, one each for vertices and edges. The utility has two required parameters: the path to the Neo4j export file (/var/lib/neo4j/import/neo4j-export.csv) and the name of a directory (output) where the converted CSV files are written. There are also optional parameters that let you specify node and relationship multi-valued property policies and turn on data type inferencing. For example, the --infer-types flag tells the utility to infer the narrowest supported type for each column in the output CSV as an alternative to specifying the data type for each property. For more information, see Gremlin Load Data Format.

The neo4j-to-neptune utility addresses differences between the Neo4j and Neptune property graph data models. Neptune's property graph is very similar to Neo4j's, including support for multiple labels on vertices and for multi-valued properties (sets but not lists). Neo4j allows homogeneous lists of simple types that contain duplicate values to be stored as properties on both nodes and edges. Neptune, on the other hand, provides set and single cardinality for vertex properties, and single cardinality for edge properties. The utility therefore provides policies to migrate Neo4j node list properties that contain duplicate values into Neptune vertex properties, and Neo4j relationship list properties into Neptune edge properties. For more information, see the GitHub repo.
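For orientation, the converted files follow Neptune's Gremlin load data format, which uses system columns such as ~id, ~label, ~from and ~to plus typed property columns. The rows below are illustrative of the movies graph rather than the utility's exact output:

vertices.csv (illustrative)
~id,~label,name:String,title:String,released:Int
1,Person,Tom Cruise,,
10,Movie,,Top Gun,1986

edges.csv (illustrative)
~id,~from,~to,~label,roles:String
100,1,10,ACTED_IN,Maverick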
Copy the output data to Amazon S3
The conversion creates two files, edges.csv and vertices.csv, located in the output folder. The app copies these files to the S3 bucket created specifically for this purpose (the bucket name is the S3BucketName value recorded earlier):

$ aws s3 cp /output/ s3://<S3BucketName>/neo4j-data --recursive

Load data into Neptune
The final step of the automated migration calls the Neptune bulk loader to load the edges and vertices into Neptune. The endpoint, bucket, account ID, role and region below are placeholders for the values recorded earlier:

curl -X POST -H 'Content-Type: application/json' https://<NeptuneEndpoint>:8182/loader -d '
{
  "source": "s3://<S3BucketName>/neo4j-data",
  "format": "csv",
  "iamRoleArn": "arn:aws:iam::<account-id>:role/<neptune-load-role>",
  "region": "<region>",
  "failOnError": "FALSE"
}'

For more information, see Loading Data into Amazon Neptune.

Verifying the migration
After the automated steps are complete, you are ready to verify that the migration was successful. Amazon Neptune is compatible with Apache TinkerPop3 and Gremlin 3.4.5, which means you can connect to a Neptune DB instance and use the Gremlin traversal language to query the graph. To verify the migration, complete the following steps:

Connect to the EC2 instance after it passes both status checks. For more information, see Types of Status Checks.
Use the value of NeptuneEndpoint to execute the following command:

$ docker run -it -e NEPTUNE_HOST=<NeptuneEndpoint> sanjeets/neptune-gremlinc-345:latest

At the prompt, execute the following command to send all your queries to Amazon Neptune:

:remote console

Execute the following command to see the number of vertices migrated:

g.V().count()

You can now, for example, run a simple query that gives you all the movies in which Tom Cruise acted (a sketch of such a query appears at the end of this post).

Cleaning up
After you run the migration, clean up all the resources the app created:

npm run destroy

Conclusion
Neptune is a fully managed graph database service that lets you focus on building great applications for your customers instead of worrying about database management tasks like hardware provisioning, software patching, setup, configuration, or backups. This post demonstrated how to migrate Neo4j data to Neptune in a few simple steps.

About the Author
Sanjeet Sahay is a Sr. Partner Solutions Architect with Amazon Web Services.

https://probdm.com/site/MTg5MzY
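As a closing example, here is the kind of Gremlin query referred to in the verification section above for finding all the movies in which Tom Cruise acted. The label and property names (Person, name, ACTED_IN, title) assume the Neo4j movies schema carried over unchanged, which is an assumption rather than something shown in this copy of the post:

// From the Gremlin console connected to Neptune (after :remote console)
g.V().has('Person', 'name', 'Tom Cruise').out('ACTED_IN').values('title')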