#portmap | Explore Tumblr Posts and Blogs

terabitweb · 5 years

Text

Original Post from Talos Security Author:

By Warren Mercer and Paul Rascagneres.

Introduction

Cisco Talos recently observed attackers changing the file formats they use in an attempt to thwart common antivirus engines. This can happen across other file formats, but today, we are showing a change of approach for an actor who has deemed antivirus engines perhaps “too good” at detecting macro-based infection vectors. We’ve noticed that the OpenDocument (ODT) file format for some Office applications can be used to bypass these detections. ODT is a ZIP archive with XML-based files used by Microsoft Office, as well as the comparable Apache OpenOffice and LibreOffice software.

There have recently been multiple malware campaigns using this file type that are able to avoid antivirus detection, due to the fact that these engines view ODT files as standard archives and don’t apply the same rules it normally would for an Office document. We also identified several sandboxes that fail to analyze ODT documents, as it is considered an archive, and the sandbox won’t open the document as a Microsoft Office file. Because of this, an attacker can use ODT files to deliver malware that would normally get blocked by traditional antivirus software.

We only found a few samples where this file format was used. The majority of these campaigns using malicious documents still rely on the Microsoft Office file format, but these cases show that the ODT file format could be used in the future at a more successful rate. In this blog post, we’ll walk through three cases of OpenDocument usage. The two first cases targets Microsoft Office, while the third one targets only OpenOffice and LibreOffice users. We do not know at this time if these samples were used simply for testing or a more malicious context.

Case study No. 1: ODT with OLE object and HTA script

The first campaign we’ll look at used malicious ODT documents with an embedded OLE object. A user must click on a prompt to execute the embedded object. We saw attackers use this methodology to target both Arabic and English-speaking users.

In both campaigns, the OLE Object deployed an HTA file and executed it:

The two HTA scripts downloaded a file on top4top[.]net. This website is a popular Arabic file-hosting platform:

The two campaigns downloaded a remote administrative tool (RAT). In the Arabic campaign, the payload is the longstanding NJRAT malware. The C2 server in this case is amibas8722[.]ddns[.]net, which pointed to an Algerian ISP:

RevengeRAT was the payload in the English campaign, with its C2 server hidden behind the portmap platform (wh-32248[.]portmap[.]io). The PE is stored in registry and executed with a scheduled task and PowerShell script:

The operating mode is similar to the one we previously published here. In both cases, the same RAT with the same patches, the payload is stored in the registry, a PowerShell script decodes and executes it and, finally, the Portmap platform hides the final IP of the attacker infrastructure. Based on these elements, we assess with medium confidence that these two cases are linked by the same actor or framework.

Case study No. 2: ODT with OLE object and embedded malware

In the second case, the ODT file also contains an OLE object:

$ unzip -l 80c62c646cce264c08deb02753f619da82b27d9c727e854904b9b7d88e45bf9e Archive: 80c62c646cce264c08deb02753f619da82b27d9c727e854904b9b7d88e45bf9e Length Date Time Name --------- ---------- ----- ---- 39 1980-01-01 00:00 mimetype 1540 1980-01-01 00:00 settings.xml 805 1980-01-01 00:00 META-INF/manifest.xml 1026 1980-01-01 00:00 meta.xml 491520 1980-01-01 00:00 Object 1 17784 1980-01-01 00:00 ObjectReplacements/Object 1 3354 1980-01-01 00:00 content.xml 6170 1980-01-01 00:00 styles.xml --------- ------- 522238 8 files

Again, this document requires user interaction. The OLE execution writes “Spotify.exe” to the victim machine, which is clearly not the legitimate Spotify platform executable. This .NET binary deflates a new binary stored as a resource. The new PE is a new binary packed with a multitude of different packers such as Goliath, babelfor.NET and 9rays.

Once all the layers are unpacked, the final payload is AZORult. We can see the infamous strings of this stealer in the final binary:

Case study No. 3: ODT with StarOffice Basic

We also discovered a third campaign that targeted OpenOffice and LibreOffice, but not Microsoft Office. In this case, the attackers used the equivalent of macros in Microsoft Office documents in the StarOffice Basic open-source software. StarOffice Basic’s code is located in the Basic/Standard/ repository inside the ODT archive:

$ unzip -l 525ca043a22901923bac28bb0d74dd57 Archive: 525ca043a22901923bac28bb0d74dd57 Length Date Time Name --------- ---------- ----- ---- 0 2019-08-19 12:53 Thumbnails/ 728 2019-08-19 12:52 Thumbnails/thumbnail.png 10843 2019-08-19 12:52 styles.xml 0 2019-08-19 12:53 Basic/ 0 2019-08-19 13:22 Basic/Standard/ 1317 2019-08-19 13:00 Basic/Standard/Module1.xml 348 2019-08-19 12:52 Basic/Standard/script-lb.xml 338 2019-08-19 12:52 Basic/script-lc.xml 8539 2019-08-19 12:52 settings.xml 0 2019-08-19 12:53 Configurations2/ 0 2019-08-19 12:53 Configurations2/accelerator/ 0 2019-08-19 12:52 Configurations2/accelerator/current.xml 0 2019-08-19 12:53 META-INF/ 1390 2019-08-19 12:52 META-INF/manifest.xml 899 2019-08-19 12:52 manifest.rdf 1050 2019-08-19 12:52 meta.xml 39 2019-08-19 12:52 mimetype 3297 2019-08-19 12:52 content.xml --------- ------- 28788 18 files

Here is an example:

The code downloads and executes a binary called “plink.” The software creates SSH communications. The IP is a local network IP and not an IP available on the internet, which is interesting because the other documents we identified download an executable from the local network. We do not know if it is a test, a pentest framework, or if it was used in a specific context. There is the possibility that an actor could use this to carry out additional lateral movement within an already compromised environment.

We identified attempts to download Metasploit payloads:

And finally, some more obfuscated versions using WMI in order to execute the downloaded payload:

These samples only targets users using OpenOffice and StarOffice. We still do not know the final payload or the context under which this document was deployed.

Conclusion

Microsoft Office is a commonly attacked platform and is considered the most popular productivity suite on the market. This, similarly to the Microsoft Windows operating system, makes it a prime target for threat actors.

By attacking known platforms, attackers increase their chances of gaining access to machines. And the use of the ODT file format shows that actors are happy to try out different mechanisms of infection, perhaps in an attempt to see if they are these documents have a higher rate of infection or are better at avoiding detection. As we point out some AV engines and sandboxes do not handle these file formats with the appropriate method so they become “missed” in some instances. Whilst less people may avail of these pieces of software the actor may have a higher success rate due to low detections. The potential for specifically targeted attacks can also increase with the use of lesser used file formats. This can be coupled with OSINT from an attacker to understand who has potentially began to use LibreOffice formats by referring to the LibreOffice public migration page here, whilst this is a nice feature to show the uptake in their software it also leaves a valuable piece of information pertaining to what infrastructures are running their software.

Coverage

Intrusion prevention systems such as SNORT® provide an effective tool to detect this activity due to specific signatures present at the end of each command. In addition to intrusion prevention systems, it is advisable to employ endpoint detection and response tools (EDR) such as Cisco AMP for Endpoints, which gives users the ability to track process invocation and inspect processes. Try AMP for free here.

Additional ways our customers can detect and block these threats are listed below.

Cisco Cloud Web Security (CWS) or Web Security Appliance (WSA) web scanning prevents access to malicious websites and detects malware used in these attacks.

Email Security can block malicious emails sent by threat actors as part of their campaign.

Network Security appliances such as Next-Generation Firewall (NGFW), Next-Generation Intrusion Prevention System (NGIPS), and Meraki MX can detect malicious activity associated with this threat.

AMP Threat Grid helps identify malicious binaries and build protection into all Cisco Security products.

Umbrella, our secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs, and URLs, whether users are on or off the corporate network.

Open Source SNORTⓇ Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.

IOCs

Case #1

ODT Documents:

de8e85328b1911084455e7dc78b18fd1c6f84366a23eaa273be7fbe4488613dd f24c6a56273163595197c68abeab7f18e4e2bedd6213892d83cdb7a191ff9900

PE:

02000ddf92ceb363760acc1d06b7cd1f05be7a1ca6df68586e77cf65f4c6963e 19027327329e2314b506d9f44b6871f2613b8bb72aa831004e6be873bdb1175d

C2 servers:

wh-32248[.]portmap[.]io amibas8722[.]ddns[.]net

Payload storage:

top4top[.]net

Case #2

ODT document: 80c62c646cce264c08deb02753f619da82b27d9c727e854904b9b7d88e45bf9e

PE: 20919e87d52b1609bc35d939695405212b8ca540e50ce8bece01a9fccfa70169

Case #3

2f4aa28974486152092669c85d75232098d32446adefeeef3a94ad4c58af0fc8 d099eac776eabf48f55a75eb863ad539a546202da02720aa83d88308be3ce4ca 84cb192cc6416b20293dfb8c621267e1584815a188b67757fa0d1af29a7cfdcd b2b51864fa2f80f8edbdaf6721a6780e15a30291a748c2dfc52d574de0d8c3ed f9138756639104e2c392b085cc5a98b1db77f0ed6e3b79eacac9899001ed7116 efb81fb8095319f5ee6fd4d6741b80386a824b9df05460d16d22cad1d6bbb35d f5194cc197d98ed9078cceca223e294c5ec873b86cbeff92eb9eaca17fc90584 429d270195bed378495349cf066aee649fd1c8c450530d896844b1692ddddc77

#gallery-0-5 { margin: auto; } #gallery-0-5 .gallery-item { float: left; margin-top: 10px; text-align: center; width: 33%; } #gallery-0-5 img { border: 2px solid #cfcfcf; } #gallery-0-5 .gallery-caption { margin-left: 0; } /* see gallery_shortcode() in wp-includes/media.php */

Go to Source Author: Open Document format creates twist in maldoc landscape Original Post from Talos Security Author: By Warren Mercer and Paul Rascagneres. Introduction Cisco Talos recently observed attackers changing the file formats they use in an attempt to thwart common antivirus engines.

#Talos Security

0 notes

letterbead93-blog · 5 years

Text

Pentaho Data Integration: WebSpoon on AWS Elastic Beanstalk and adding EBS or EFS Storage Volumes

This is a long overdue artilce on Hiromu’s WebSpoon. Hiromu has done a fantastic work on WebSpoon - finally bringing the familiar Spoon Desktop UI to the web browser.

For completeness sake I took the liberty of copying Hiromu’s instructions on how the set up the intial AWS Elastic Beanstalk environment. My main focus here is to provide simple approaches on how to add persistant storage options to your WebSpoon setup, some of which are fairly manual approaches (which should be later on replaced by a dedicated automatic setup). The article is more aimed toward users which are new to AWS.

Note: Once finished, always remember to terminate your AWS environment to stop occuring costs.

Overview

This guide will give you an example of how to deploy webSpoon to the cloud with the following requirements.

Deploy webSpoon to the cloud

Deploy webSpoon with plugins

Auto-scale and load-balance

Initial Quick Setup

To easily satisfy the requirements, this example uses the Docker image with the plugins and deploys it to AWS Elastic Beanstalk.

These are the rough steps using the new Amazon Configuration Interface:

Head to the AWS Console and sign into the console. Choose Beanstalk from the main menu. Then click Get Started or Create New Application:

In the Choose Environment dialog pick Web server environment.

In the Platform section, choose Multi-container Docker as a Preconfigure platform.

While we have a Dockerfile to provision our machine(s), we still need a method of orchestrating the setup of our cluster. This is where the Dockerrun.aws.json file comes in. In the Application code section, tick Upload your code and choose Dockerrun.aws.json as an Application code - contents copied below for convenience:

"AWSEBDockerrunVersion": 2, "containerDefinitions": [ "name": "webSpoon", "image": "hiromuhota/webspoon:latest-full", "essential": true, "memory": 1920, "environment": [ "name": "JAVA_OPTS", "value": "-Xms1024m -Xmx1920m" ], "portMappings": [ "hostPort": 80, "containerPort": 8080 ] ]

Click Configure more options instead of Create application.

On the next screen under Configuration presets you have two options. If you just want to run the container on one node choose Custom Configuration, otherwise choose High Availability.

On the same page, change EC2 instance type from t2.micro to t2.small or an instance type with 2GB+ memory. Click Save.

Optional: If you plan to ssh into the EC2 instance, edit the Security settings: Define your key pair.

Click Create Environment. Wait until your application and environment are created, which will take a few minutes:

High Availability option only: The screen will look like this one the environment is ready. Click on Configuration in side panel.

High Availability option only: Scroll down to find the Load Balancing section and click on the wheel icon:

High Availability option only: On the following screen enable session stickiness:

You can access the Spoon Web Interface via this URL:

http://<your-beanstalk-app-url>.com:8080/spoon/spoon

Your Beanstalk App URL is shown on the AWS Beanstalk application overview page.

Adding Volumes

The main aim of adding volumes is to persist the data outside of the Docker Container. We will have a look at various options:

Adding an EC2 Instance Docker Volume

Sources:

Note: Choosing this option does not really provide much benefit: We only map Docker container folders to local folders on the EC2 instance. So if you were to terminate your Beanstalk environment, the files would be gone as well. The main benefit here is that if the Docker Container gets terminated, the file at least survice on the EC2 instance.

Create a new project folder called beanstalk-with-ec2-instance-mapping.

Inside this folder create a Dockerrun.aws.json with following content:

"AWSEBDockerrunVersion": 2, "volumes": [ "name": "kettle", "host": "sourcePath": "/var/app/current/kettle" , "name": "pentaho", "host": "sourcePath": "/var/app/current/pentaho" ], "containerDefinitions": [ "name": "webSpoon", "image": "hiromuhota/webspoon:latest-full", "essential": true, "memory": 1920, "environment": [ "name": "JAVA_OPTS", "value": "-Xms1024m -Xmx1920m" ], "portMappings": [ "hostPort": 80, "containerPort": 8080 ], "mountPoints": [ "sourceVolume": "kettle", "containerPath": "/root/.kettle", "readOnly": false , "sourceVolume": "pentaho", "containerPath": "/root/.pentaho", "readOnly": false , "sourceVolume": "awseb-logs-webSpoon", "containerPath": "/usr/local/tomcat/logs" ] ]

First we create two volumes on the EC2 instance using the top level volumes JSON node: one for the .kettle files and one for the .pentaho files. Note that the sourcePath is the path on the host instance.

Note: This defines volumes on the hard drive of the EC2 instance you run your Docker container on. This is pretty much the same as if you were defining volumes on your laptop for a Docker container that you run. This does not set up magically any new EBS or EFS volumes.

Next, within the containerDefinitions for webSpoon, we add three mountPoints within the Docker Container. Here we map the container paths to the volumes we created earlier on (kettle and pentaho). The third mount point we define is for writing the logs out: This is a default requirement of the Beanstalk setup. For each container Beanstalk will create automatically a volume to store the logs. The volume name is made up of awseb-logs- plus the container name: In our case, this is: awseb-logs-webSpoon. And the logs we want to store are the Tomcat server logs.

The Beanstalk environment setup procedure is exactly the same as before, so go ahead and set up the environment.

Note: On the EC2 instance the directory /var/app/current/ is where the app files get stored (in our case this is only Dockerrun.aws.json). This folder does not required sudo privileges. If you ran the Docker container on your laptop you might have noticed that by default Docker stored name volumes in /var/lib/docker/volumes. On the EC2 instance this directory requires sudo privileges.

Once the environment is running, we can ssh into the EC2 instance.

Note: You can find the public DNS of your EC2 instance via the EC2 panel of the AWS console.

It is beyond the scope of this article to explain how to set up the required key pairs to ssh into an EC2 instance: Here is a good article describing the required steps. If you want to ssh into your instance, read this first. You also have to make sure that your Beanstalk environment knows which key you want to use. You can configure this via the main Configuration Panel under Security. This will restart the EC2 instance.

ssh -i <path-to-pem-file> ec2-user@<ec2-instance-public-dns>

We can double check now that the volume directories got created:

$ ls -l /var/app/current/ total 12 -rw-r--r-- 1 root root 1087 Dec 26 09:41 Dockerrun.aws.json drwxr-xr-x 2 root root 4096 Dec 26 09:42 kettle drwxr-xr-x 3 root root 4096 Dec 26 09:43 pentaho

Adding an EBS Volume

Sources:

Note: An EBS Drive is a device that will be mounted directly to your EC2 instance. It cannot be shared with any other EC2 instance. In other words, every EC2 instance will have their own (possibly set of) EBS Drive(s). This means that files cannot be shared across EC2 instances.

Two ESB Drives to Two Docker Volumes

For the next steps to work the volume mapping from Docker container to the EC2 instance has to be in place (as discussed in the previous section). We cover this below again.

Basically we have to create two layers of volume mapping:

Docker Container to EC2 instance

EC2 instance to EBS

There is no way to define an EBS volume in the Dockerrun.aws.json file: You have to create another file with a .config extension, which has to reside in the .ebextensions folder within your project’s folder. So the project’s folder structure should be like this:

. ├── Dockerrun.aws.json └── .ebextensions └── options.config

If you are not familiar with how mounting drivers on Linux works, read this article first.

Important: Configuration files must conform to YAML formatting requirements. Always use spaces to indent and don’t use the same key twice in the same file.

On the EC2 instance we will create a mount point under a new /data directory, which has less chance to interfere with any other process.

Let’s get started:

Create a new project folder called beanstalk-with-ebs-two-volumes.

Inside this folder create a Dockerrun.aws.json with following content:

"AWSEBDockerrunVersion": 2, "volumes": [ "name": "kettle", "host": "sourcePath": "/data/kettle" , "name": "pentaho", "host": "sourcePath": "/data/pentaho" ], "containerDefinitions": [ "name": "webSpoon", "image": "hiromuhota/webspoon:latest-full", "essential": true, "memory": 1920, "environment": [ "name": "JAVA_OPTS", "value": "-Xms1024m -Xmx1920m" ], "portMappings": [ "hostPort": 80, "containerPort": 8080 ], "mountPoints": [ "sourceVolume": "kettle", "containerPath": "/root/.kettle", "readOnly": false , "sourceVolume": "pentaho", "containerPath": "/root/.pentaho", "readOnly": false , "sourceVolume": "awseb-logs-webSpoon", "containerPath": "/usr/local/tomcat/logs" ] ]

Inside your project directory (beanstalk-with-ebs-two-volumes), create a subdirectory called .ebextensions.

In the .ebextensions directory create a new file called options.config and populate it with this content:

commands: 01mkfs: command: "mkfs -t ext3 /dev/sdh" 02mkdir: command: "mkdir -p /data/kettle" 03mount: command: "mount /dev/sdh /data/kettle" 04mkfs: command: "mkfs -t ext3 /dev/sdi" 05mkdir: command: "mkdir -p /data/pentaho" 06mount: command: "mount /dev/sdi /data/pentaho" option_settings: - namespace: aws:autoscaling:launchconfiguration option_name: BlockDeviceMappings value: /dev/sdh=:1,/dev/sdi=:1

These instructions basically format our two external volumes and then mount them. Note that at the very end under option_settings we specified that each EBS volume should be 1 GB big (which is very likely quite a bit too much for the pentaho volume, however, this is the minimum you can define).

Now we have to zip our files from within the project root directory:

[dsteiner@localhost beanstalk-with-ebs-two-volumes]$ zip ../webspoon-with-ebs.zip -r * .[^.]* adding: Dockerrun.aws.json (deflated 65%) adding: .ebextensions/ (stored 0%) adding: .ebextensions/options.config (deflated 43%)

Note: The zip file will be conveniently placed outside the project directory.

Next via the Web UI create a new Beanstalk environment. The approach is the same as before, just that instead of the Dockerrun.aws.json you have to upload the zip file now.

Important: You have to create the new Beanstalk environment in exactly the same Availability Zone within your Region as your EBS Drive resides in! Otherwise you can’t connect it! You can define the Availability Zone in the Capacity settings on the Configure env name page.

Once the environment is running, ssh into the EC2 instance.

Note: You can find the public DNS of your EC2 instance via the EC2 panel of the AWS console.

ssh -i <path-to-pem-file> ec2-user@<ec2-instance-public-dns>

We can check the mount points now:

$ ls -l /data total 8 drwxr-xr-x 3 root root 4096 Dec 26 16:12 kettle drwxr-xr-x 4 root root 4096 Dec 26 16:11 pentaho $ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdh 202:112 0 1G 0 disk /data/kettle xvdi 202:128 0 1G 0 disk /data/pentaho

As you can see, now everything looks fine.

Once you have the new configuration running, you might want to check if the new volumes got created: You can do this by going to the EC2 section of the AWS console. On the side panel under Elastic Block Storage click on Volumes:

Note: Elastic Beanstalk will by default create also a volume /dev/xvdcz to store the Docker image.

One ESB Drive to Two Docker Volumes

Actually, what we did was a bit too complex (but might be required in some scenarios): We could have simply just mapped the /root folder of the Docker container to the /data folder on the EC2 instance and created a mount point /data that links the EC2 instance directory to the EBS volume. This way all the data is contained in one drive. Well, as it turns out, this is actually not a good idea, as we get loads of other files/folders as well:

[ec2-user@ip-xxx-xx-xx-xx ~]$ ls -la /data total 40 drwxr-xr-x 7 root root 4096 Dec 26 21:36 . drwxr-xr-x 26 root root 4096 Dec 26 21:24 .. drwxr-x--- 3 root root 4096 Dec 26 21:36 .java drwxr-x--- 3 root root 4096 Dec 26 21:37 .kettle drwx------ 2 root root 16384 Dec 26 21:24 lost+found drwxr-x--- 3 root root 4096 Dec 26 21:26 .m2 drwxr-x--- 3 root root 4096 Dec 26 21:36 .pentaho

Ok, so instead of this, we can leave the original Docker container to EC2 instance volume mapping in place:

Docker container path EC2 instance volume path /root/.kettle /data/kettle /root/.pentaho /data/pentaho

And just use one ESB volume, which we mount to /data.

Create a new project directory called beanstalk-with-ebs-one-volume.

Add a new Dockerrun.aws.json file to this folder, which looks like this (it’s exactly the same as when we added the volumes originally):

Inside your project directory (beanstalk-with-ebs-one-volume), create a subdirectory called .ebextensions. In the .ebextensions directory create a new file called options.config and populate it with this content:

commands: 01mkfs: command: "mkfs -t ext3 /dev/sdh" 02mkdir: command: "mkdir -p /data" 03mount: command: "mount /dev/sdh /data" option_settings: - namespace: aws:autoscaling:launchconfiguration option_name: BlockDeviceMappings value: /dev/sdh=:1

Now we have to zip our files from within the project root directory:

[dsteiner@localhost beanstalk-with-ebs-one-volume]$ zip ../webspoon-with-ebs.zip -r * .[^.]* adding: Dockerrun.aws.json (deflated 65%) adding: .ebextensions/ (stored 0%) adding: .ebextensions/options.config (deflated 43%)

Note: The zip file will be conveniently placed outside the project directory.

Next via the Web UI create a new Beanstalk environment. The approach is the same as before, just that instead of the Dockerrun.aws.json you have to upload the zip file now.

When we shh into our EC2 instance, we can see:

[ec2-user@ip-172-31-14-201 ~]$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdh 202:112 0 1G 0 disk /data [ec2-user@ip-172-31-14-201 ~]$ ls -l /data/ total 24 drwxr-xr-x 3 root root 4096 Dec 26 22:33 kettle drwx------ 2 root root 16384 Dec 26 22:05 lost+found drwxr-xr-x 3 root root 4096 Dec 26 22:33 pentaho

As you can see, our /data directory looks way tidier now.

Avoid Conflicts

If we had specified /var/app/current/kettle and /var/app/current/pentaho as mount points we would have run into problems. Everything specified in .ebextensions gets executed before anything in Dockerrun.aws.json. So this approach would have mounted the EBS volumes first under /var/app/current and then later on when Dockerrun.aws.json would have tried to deploy our project, it would have seen that the /var/app/current already exists. In this case it would have renamed it to /var/app/current.old and deployed the app to a fresh new /var/app/current directory.

You can see this when you run the lbslk command to check how the devices were mounted:

[ec2-user@ip-xxx-xx-xx-xx ~]$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdh 202:112 0 1G 0 disk /var/app/current.old/kettle

Conclusion: That’s why we need a different mount point! We want to specify a custom location that does not interfere with any other process.

Really Persisting the Data

Note: Here we only discuss a simple manual approach. This is only sensible if you run a single EC2 node with WebSpoon on it. For a more complex setup with load balancer and auto-scaling an automatic solution should be put in place.

So what is the point of this exercise really? Why did we do this? Our main intention was to have some form of persistent storage. Still, if we were to terminate the Beanstalk environment now, all our EBS volumes would disappear as well! However, via the EC2 panel under Elastic Block Storage there is a way to detach the volume:

The normal Detach Volume command might not work, because the volume is still used by our EC2 instance. You can, however, choose the Force Detach Volume command, which should succeed. Wait until the state of the drive shows available.

Next terminate your current Beanstalk environment. Once it’s terminated, you will see that your EBS Drive is still around. Start a new Beanstalk environment (this time just use the Dockerrun.aws.json file from this section, not the whole zip file - we do not want to create an new EBS drive).

Next, on the EC2 page in the AWS Console go to the EBS Volumes, mark our old EBS drive and right click: Choose Attach Volume. In a pop-up window you will be able to define to which instance you want to attach the EBS drive to.

Once it is attached, grab the public URL of your EC2 Instance from the EC2 area of the AWS console (click on Instances in the side panel). Ssh into your EC2 instance and then run the following:

$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdf 202:80 0 1G 0 disk xvdcz 202:26368 0 12G 0 disk

Originally I asked the EBS drive to be named sdf, but due to OS specifics it ended up being called xvdf as we can see from running the lsblk command. Note that the last letter remains the same. Also, you can see that it doesn’t have a mount point yet. So now we want to mount the EBS drive to the /data directory:

$ sudo mount /dev/xvdf /data

Next you have to restart the Docker container so that the changes can be picked up.

$ sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 18dfb174b88a hiromuhota/webspoon:latest-full "catalina.sh run" 15 minutes ago Up 15 minutes 0.0.0.0:80->8080/tcp ecs-awseb-Webspoon-env-xxx 0761525dd370 amazon/amazon-ecs-agent:latest "/agent" 16 minutes ago Up 16 minutes ecs-agent $ sudo docker restart 18dfb174b88a

Note: If you create the EBS Driver upfront separate from the Beanstalk environment, when you later shutdown the environment, the EBS Driver does not get terminated.

Alternative Approach: Manually create EBS Drive

Note: This approach is only really sensible if you were to run one EC2 instance only. The beauty of the previous approach is that every EC2 instance spun up via the auto-scaling process will have exactly the same setup (so e.g. one EC2 instance with one EBS drive). So for the approach outlined below, you do not need a load balancer and also not auto-scaling.

If you want to go down the manual road, you can as well create the EBS Drive manually:

Go to the EC2 area within the AWS Console and click on Volumes under Elastic Blockstorage (in the side panel).

Click on the Create Volume button. Fill out the required settings and confirm to create the volume.

Next go to the Elastic Beanstalk area of the AWS Console. Start a new Beanstalk environment: Use the Dockerrun.aws.json file from the beanstalk-with-ebs-one-volume project (if you skipped the previous sections, the instructions for setting up the Beanstalk environment are quite at the beginning of this article). This time also change edit the Capacity settings on the Configure env name page. For the Availability Zone define the same zone as your EBS Drive resides in.

Once the environment is up and running, you can Attach the Volume. On the EC2 page in the AWS Console go to the EBS Volumes:

Mark our EBS drive and right click: Choose Attach Volume. In a dialog you will be able to define to which instance you want to attach the EBS drive to.

Once it is attached, grab the public URL of your EC2 Instance from the EC2 area of the AWS console (click on Instances in the side panel). Ssh into your EC2 instance and then run the following:

$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 8G 0 disk └─xvda1 202:1 0 8G 0 part / xvdf 202:80 0 1G 0 disk xvdcz 202:26368 0 12G 0 disk

Originally I asked the EBS drive to be name sdf, but due to OS specifics it ended up being called xvdf as we can see from running the lsblk command. Note that the last letter remains the same. Also, you can see that it doesn’t have a mount point yet.

Since it is a fresh new EBS drive, we have to format it first:

$ sudo mkfs -t ext3 /dev/xvdf

Next we want to mount the EBS drive to the /data directory:

$ sudo mount /dev/xvdf /data

Next you have to restart the Docker container so that the files can be picked up:

Once you terminate your environment, the EBS Drive will still be available, so you can later on easily attach it to a new EC2 instance. It behaves this way because you created the EBS Drive separately from the Beanstalk environment.

Making use of Snapshots

If you don’t want to use WebSpoon nor the EBS drive for some time, you can take a snapshot of the data on the EBS Drive and store the snapshot on S3. Then you can get rid of the EBS Drive. Whenever you decide to get WebSpoon running again, you can restore the data from the snapshot onto a EBS Drive and attach it to the EC2 instance that is running WebSpoon and all is back to normal again.

Adding EFS Volume

Note: An EFS Volume is a network file storage (available in all AWS regions that support it) that can be shared between many EC2 instances at the same time. Because it is a network storage, it will be a bit slower than an EBS Volume (which is a device directly attached to an EC2 instance). Another advantage of an EFS Volume is that it grows or shrinks automatically according to your storage needs.

“You can create an Amazon EFS volume as part of your environment, or mount an Amazon EFS volume that you created independently of Elastic Beanstalk.”

Important: Before you start make sure that EFS is available in your region! If not, change the region selector in the top right hand corner of the AWS console.

Manual Approach: Via Web UI and Command Line

Note: This approach is only really suitable for smaller setups.

Go to the EFS Console and click on Create file system button. Provide the relevant details in the wizard dialog. This is really quite easy. Your network drive should be ready in a matter of minutes.

Next you can either ssh into your EC2 instances and run a few commands to mount the EFS Drive or add an additional config file to the beanstalk config files (and there are a few other options available as well). We will go with the first option for now.

Follow these steps:

Create a new Beanstalk environment with the Dockerrun.aws.json file from the beanstalk-with-ebs-one-volume project.

In the EFS Console, expand the details on your EFS Volume and you will find a link on how to mount the volume to an EC2 instance. The steps below are mostly a copy and paste of these instructions.

Check which security group is assigned to your EFS Volume Mount targets. Chances are that it is the default security group. We have to add the same security group to our EC2 instance(s), so that the instance can access the EFS mount target.

Head over to the EC2 console and click on Instance, then right click on your currently running instance: Networking > Change Security Groups.

Tick the default security group and confirm changes.

Next copy the Public DNS of your EC2 instance, then from the command line, ssh into your instance.

Once logged in, let’s install the NFS client: sudo yum install -y nfs-utils

In our case, the mount directory already exists (/data), hence we can directly mount the NFS Volume like so:

sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-9c912d55.efs.eu-west-1.amazonaws.com:/ /data

If you run the ls command on the /data directory you will see that it is empty now. Restart the Docker Container now so that these changes can be picked up.

Via Config

Important: Make sure that the EC2 instance has the same Security group assigned as the EFS Volume, otherwise you won’t be able to connect the two!

Elastic Beanstalk provides the option to:

create a new EFS (using the storage-efs-createfilesystem.config - Example). Note that this volume will be deleted once the environment is terminated since it is defined within the configuration detailed of this particular environment.

mount an existing EFS (using the storage-efs-mountfilesystem.config - Example).

These options are not exclusive: So you can either create a new EFS Volume and mount it or mount an existing EFS volume (which you already created separately at some point in the past - the configuration details are not part of the Beanstalk configuration details, in which case the EFS Volume will still exist after the Beanstalk environment was terminated).

If your plan is to not have the EFS Volume running all times (and WebSpoon for that matter), you can take a backup of the drive and reinstate it later on.

The configuration files have to be stored in the .ebextensions directory in your source code.

Note: “EFS requires a VPC. You can use the default VPC or create a custom VPC to use. Use the VPC management console to determine the number of available AZs and the default subnet for each.”

Mount Existing EFS

We will deploy new artifacts one by one to make sure that everything is working correctly:

Create a new project folder called beanstalk-with-efs-mount-existing-volume.

Copy and paste the Dockerrun.aws.json file from the beanstalk-with-ebs-one-volume project into this directory.

Create a new Beanstalk environment with this Dockerrun.aws.json file.

Next go to the EFS Console and create a new EFS Volume. Once the volume is ready, expand the details on your EFS Volume and copy the ID of Volume.

Head over to the EC2 console and click on Instance, then right click on your currently running instance: Networking > Change Security Groups.

Tick the default security group and confirm changes.

Inside this your project directory (beanstalk-with-efs-mount-existing-volume), create a subdirectory called .ebextensions.

In the .ebextensions directory create a new file called storage-efs-mountfilesystem.config and populate it with this content:

option_settings: aws:elasticbeanstalk:application:environment: # Use variable in conjunction with storage-efs-createfilesystem.config or # replace with EFS volume ID of an existing EFS volume # FILE_SYSTEM_ID: '`"Ref" : "FileSystem"`' FILE_SYSTEM_ID: 'fs-5d6fd394' # Replace with the required mount directory MOUNT_DIRECTORY: '/data' ############################################## #### Do not modify values below this line #### ############################################## REGION: '`"Ref": "AWS::Region"`' packages: yum: nfs-utils: [] jq: [] commands: 01_mount: command: "/tmp/mount-efs.sh" files: "/tmp/mount-efs.sh": mode: "000755" content : | #!/bin/bash EFS_REGION=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.REGION') EFS_MOUNT_DIR=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.MOUNT_DIRECTORY') EFS_FILE_SYSTEM_ID=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.FILE_SYSTEM_ID') echo "Mounting EFS filesystem $EFS_DNS_NAME to directory $EFS_MOUNT_DIR ..." echo 'Stopping NFS ID Mapper...' service rpcidmapd status &> /dev/nulljq if [ $? -ne 0 ] ; then echo 'rpc.idmapd is already stopped!' else service rpcidmapd stop if [ $? -ne 0 ] ; then echo 'ERROR: Failed to stop NFS ID Mapper!' exit 1 fi fi # our mount point already exists, hence commented this section out # echo 'Checking if EFS mount directory exists...' # if [ ! -d $EFS_MOUNT_DIR ]; then # echo "Creating directory $EFS_MOUNT_DIR ..." # mkdir -p $EFS_MOUNT_DIR # if [ $? -ne 0 ]; then # echo 'ERROR: Directory creation failed!' # exit 1 # fi # else # echo "Directory $EFS_MOUNT_DIR already exists!" # fi mountpoint -q $EFS_MOUNT_DIR if [ $? -ne 0 ]; then echo "mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $EFS_FILE_SYSTEM_ID.efs.$EFS_REGION.amazonaws.com:/ $EFS_MOUNT_DIR" mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $EFS_FILE_SYSTEM_ID.efs.$EFS_REGION.amazonaws.com:/ $EFS_MOUNT_DIR if [ $? -ne 0 ] ; then echo 'ERROR: Mount command failed!' exit 1 fi chmod 777 $EFS_MOUNT_DIR runuser -l ec2-user -c "touch $EFS_MOUNT_DIR/it_works" if [[ $? -ne 0 ]]; then echo 'ERROR: Permission Error!' exit 1 else runuser -l ec2-user -c "rm -f $EFS_MOUNT_DIR/it_works" fi else echo "Directory $EFS_MOUNT_DIR is already a valid mountpoint!" fi echo 'EFS mount complete.'

Replace the FILE_SYSTEM_ID with the id of your EFS Volume Id. The config file I use here is fairly similar to this example, just that I commented out the section that creates the mount point, since this one already exists in our case: Remember that this directory gets created already by the instructions in our Dockerrun.aws.json (where we ask it to create the Docker volumes).

Next zip the two files up as we did before.

Go to the Dashboard page of your Beanstalk app and click on Upload and Deploy.

Add the zip file we just created and provide a version number. Confirm changes. It will attempt to deploy the new configuration: The good things now is that if this fails, it will automatically revert to the previous version, so at no point your are left with a broken environment.

Once the deployment succeeds, ssh into the EC2 instance and verify that the mounting was successful:

[ec2-user@ip-xxx-xx-xx-xxx ~]$ mount -t nfs4 xxxxxx.efs.eu-west-1.amazonaws.com:/ on /data type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx-xx-xx-xxx,local_lock=none,addr=xxx-xx-xx-xxx)

Create EFS and Mount It

This time we will create and mount the EFS volume via config settings.

The AWS Guide recommends deploying the storage-efs-createfilesystem.config with the source code code first (without any other changes), make sure that this deployment succeeds and only deploy everything with storage-efs-mountfilesystem.config. “Doing this in two separate deployments ensures that if the mount operation fails, the file system is left intact. If you do both in the same deployment, an issue with either step will cause the file system to terminate when the deployment fails.”

We actually have to do this in three major steps:

Create a new project folder called beanstalk-with-efs-create-and-mount-existing-volume.

Copy and paste the Dockerrun.aws.json file from the beanstalk-with-ebs-one-volume project into this directory.

Create a Beanstalk environment only with the Dockerrun.aws.json file.

Inside the your project directory (beanstalk-with-ebs-two-volumes), create a subdirectory called .ebextensions.

In the .ebextensions directory create a new file called storage-efs-createfilesystem.config with following content:

option_settings: - namespace: aws:elasticbeanstalk:customoption option_name: VPCId value: "vpc-9acf2ffc" - namespace: aws:elasticbeanstalk:customoption option_name: SubnetEUWest1a value: "subnet-9e9d92f9" - namespace: aws:elasticbeanstalk:customoption option_name: SubnetEUWest1b value: "subnet-b02536f9" - namespace: aws:elasticbeanstalk:customoption option_name: SubnetEUWest1c value: "subnet-d74e058c" Resources: FileSystem: Type: AWS::EFS::FileSystem Properties: FileSystemTags: - Key: Name Value: "EB-EFS-FileSystem" PerformanceMode: "generalPurpose" Encrypted: "false" ## Mount Target Resources # specify one mount target by availability zone MountTargetA: Type: AWS::EFS::MountTarget Properties: FileSystemId: Ref: FileSystem SecurityGroups: - Ref: MountTargetSecurityGroup SubnetId: Fn::GetOptionSetting: OptionName: SubnetEUWest1a MountTargetB: Type: AWS::EFS::MountTarget Properties: FileSystemId: Ref: FileSystem SecurityGroups: - Ref: MountTargetSecurityGroup SubnetId: Fn::GetOptionSetting: OptionName: SubnetEUWest1b MountTargetC: Type: AWS::EFS::MountTarget Properties: FileSystemId: Ref: FileSystem SecurityGroups: - Ref: MountTargetSecurityGroup SubnetId: Fn::GetOptionSetting: OptionName: SubnetEUWest1c ############################################## #### Do not modify values below this line #### ############################################## MountTargetSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupDescription: Security group for mount target SecurityGroupIngress: - FromPort: '2049' IpProtocol: tcp SourceSecurityGroupId: Fn::GetAtt: [AWSEBSecurityGroup, GroupId] ToPort: '2049' VpcId: Fn::GetOptionSetting: OptionName: VPCId

Note: There are various syntax options available: shorthand and longform expressions. To understand how they work, take a look at Option Settings. Also, validate the YAML file with one of the available online services, e.g. Code Beautify.

Once Beanstalk environment is created, the dedicated VPC is set up as well. Go to the VPN management console and get the VPC ID. Paste it into the relevant section within your storage-efs-createfilesystem.config file.

Next get the Subnet IDs. Paste them into the relevant section within your storage-efs-createfilesystem.config file.

Next zip up the two files as we did before.

Go to the Dashboard page of your Beanstalk app and click on Upload and Deploy.

Next add the storage-efs-mountfilesystem.config from the beanstalk-with-efs-mount-existing-volume project. We just have to make a small modification: Replace the hardcoded FILE_SYSTEM_ID with a variable:

option_settings: aws:elasticbeanstalk:application:environment: # Use variable in conjunction with storage-efs-createfilesystem.config or # replace with EFS volume ID of an existing EFS volume FILE_SYSTEM_ID: '`"Ref" : "FileSystem"`' # FILE_SYSTEM_ID: 'fs-5d6fd394' # Replace with the required mount directory MOUNT_DIRECTORY: '/data' ############################################## #### Do not modify values below this line #### ############################################## REGION: '`"Ref": "AWS::Region"`' packages: yum: nfs-utils: [] jq: [] commands: 01_mount: command: "/tmp/mount-efs.sh" files: "/tmp/mount-efs.sh": mode: "000755" content : | #!/bin/bash EFS_REGION=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.REGION') EFS_MOUNT_DIR=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.MOUNT_DIRECTORY') EFS_FILE_SYSTEM_ID=$(/opt/elasticbeanstalk/bin/get-config environment | jq -r '.FILE_SYSTEM_ID') echo "Mounting EFS filesystem $EFS_DNS_NAME to directory $EFS_MOUNT_DIR ..." echo 'Stopping NFS ID Mapper...' service rpcidmapd status &> /dev/nulljq if [ $? -ne 0 ] ; then echo 'rpc.idmapd is already stopped!' else service rpcidmapd stop if [ $? -ne 0 ] ; then echo 'ERROR: Failed to stop NFS ID Mapper!' exit 1 fi fi # our mount point already exists, hence commented this section out # echo 'Checking if EFS mount directory exists...' # if [ ! -d $EFS_MOUNT_DIR ]; then # echo "Creating directory $EFS_MOUNT_DIR ..." # mkdir -p $EFS_MOUNT_DIR # if [ $? -ne 0 ]; then # echo 'ERROR: Directory creation failed!' # exit 1 # fi # else # echo "Directory $EFS_MOUNT_DIR already exists!" # fi mountpoint -q $EFS_MOUNT_DIR if [ $? -ne 0 ]; then echo "mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $EFS_FILE_SYSTEM_ID.efs.$EFS_REGION.amazonaws.com:/ $EFS_MOUNT_DIR" mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $EFS_FILE_SYSTEM_ID.efs.$EFS_REGION.amazonaws.com:/ $EFS_MOUNT_DIR if [ $? -ne 0 ] ; then echo 'ERROR: Mount command failed!' exit 1 fi chmod 777 $EFS_MOUNT_DIR runuser -l ec2-user -c "touch $EFS_MOUNT_DIR/it_works" if [[ $? -ne 0 ]]; then echo 'ERROR: Permission Error!' exit 1 else runuser -l ec2-user -c "rm -f $EFS_MOUNT_DIR/it_works" fi else echo "Directory $EFS_MOUNT_DIR is already a valid mountpoint!" fi echo 'EFS mount complete.'

Zip everything up. Deploy this new archive to your Beanstalk environment.

After successful deployment, you can ssh into the EC2 instance and double check that the EFS volume is mounted:

[ec2-user@ip-xxx-xx-xx-xxx ~]$ mount -t nfs4 fs-cb48f402.efs.eu-west-1.amazonaws.com:/ on /data type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx-xx-xx-xxx,local_lock=none,addr=xxx-xx-xx-xxx)

Pentaho Server

You could also use Elastic Beanstalk to spin up a Pentaho Server with the Jackrabbit Repository and use the latter to store you files.

Source: http://diethardsteiner.github.io/pdi/2017/12/30/Webspoon-on-AWS.html

0 notes

theresawelchy · 6 years

Text

Working with the Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) allows you to both federate storage across many computers as well as distribute files in a redundant manor across a cluster. HDFS is a key component to many storage clusters that possess more than a petabyte of capacity.

Each computer acting as a storage node in a cluster can contain one or more storage devices. This can allow several mechanical storage drives to both store data more reliably than SSDs, keep the cost per gigabyte down as well as go some way to exhausting the SATA bus capacity of a given system.

Hadoop ships with a feature-rich and robust JVM-based HDFS client. For many that interact with HDFS directly it is the go-to tool for any given task. That said, there is a growing population of alternative HDFS clients. Some optimise for responsiveness while others make it easier to utilise HDFS in Python applications. In this post I'll walk through a few of these offerings.

If you'd like to setup an HDFS environment locally please see my Hadoop 3 Single-Node Install Guide (skip the steps for Presto and Spark). I also have posts that cover working with HDFS on AWS EMR and Google Dataproc.

The Apache Hadoop HDFS Client

The Apache Hadoop HDFS client is the most well-rounded HDFS CLI implementation. Virtually any API endpoint that has been built into HDFS can be interacted with using this tool.

For the release of Hadoop 3, considerable effort was put into reorganising the arguments of this tool. This is what they look like as of this writing.

Admin Commands: cacheadmin configure the HDFS cache crypto configure HDFS encryption zones debug run a Debug Admin to execute HDFS debug commands dfsadmin run a DFS admin client dfsrouteradmin manage Router-based federation ec run a HDFS ErasureCoding CLI fsck run a DFS filesystem checking utility haadmin run a DFS HA admin client jmxget get JMX exported values from NameNode or DataNode. oev apply the offline edits viewer to an edits file oiv apply the offline fsimage viewer to an fsimage oiv_legacy apply the offline fsimage viewer to a legacy fsimage storagepolicies list/get/set block storage policies Client Commands: classpath prints the class path needed to get the hadoop jar and the required libraries dfs run a filesystem command on the file system envvars display computed Hadoop environment variables fetchdt fetch a delegation token from the NameNode getconf get config values from configuration groups get the groups which users belong to lsSnapshottableDir list all snapshottable dirs owned by the current user snapshotDiff diff two snapshots of a directory or diff the current directory contents with a snapshot version print the version Daemon Commands: balancer run a cluster balancing utility datanode run a DFS datanode dfsrouter run the DFS router diskbalancer Distributes data evenly among disks on a given node httpfs run HttpFS server, the HDFS HTTP Gateway journalnode run the DFS journalnode mover run a utility to move block replicas across storage types namenode run the DFS namenode nfs3 run an NFS version 3 gateway portmap run a portmap service secondarynamenode run the DFS secondary namenode zkfc run the ZK Failover Controller daemon

The bulk of the disk access verbs most people familiar with Linux will recognise are kept under the dfs argument.

Usage: hadoop fs [generic options] [-appendToFile <localsrc> ... <dst>] [-cat [-ignoreCrc] <src> ...] [-checksum <src> ...] [-chgrp [-R] GROUP PATH...] [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...] [-chown [-R] [OWNER][:[GROUP]] PATH...] [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>] [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>] [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...] [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>] [-createSnapshot <snapshotDir> [<snapshotName>]] [-deleteSnapshot <snapshotDir> <snapshotName>] [-df [-h] [<path> ...]] [-du [-s] [-h] [-v] [-x] <path> ...] [-expunge] [-find <path> ... <expression> ...] [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>] [-getfacl [-R] <path>] [-getfattr [-R] {-n name | -d} [-e en] <path>] [-getmerge [-nl] [-skip-empty-file] <src> <localdst>] [-head <file>] [-help [cmd ...]] [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]] [-mkdir [-p] <path> ...] [-moveFromLocal <localsrc> ... <dst>] [-moveToLocal <src> <localdst>] [-mv <src> ... <dst>] [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>] [-renameSnapshot <snapshotDir> <oldName> <newName>] [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...] [-rmdir [--ignore-fail-on-non-empty] <dir> ...] [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]] [-setfattr {-n name [-v value] | -x name} <path>] [-setrep [-R] [-w] <rep> <path> ...] [-stat [format] <path> ...] [-tail [-f] <file>] [-test -[defsz] <path>] [-text [-ignoreCrc] <src> ...] [-touchz <path> ...] [-truncate [-w] <length> <path> ...] [-usage [cmd ...]]

Notice how the top usage line doesn't mention hdfs dfs but instead hadoop fs. You'll find that either prefix will provide the same functionality if you're working with HDFS as an endpoint.

A Golang-based HDFS Client

In 2014, Colin Marc started work on a Golang-based HDFS client. This tool has two major features that stand out to me. The first is that there is no JVM overhead so execution begins much quicker than the JVM-based client. Second is the arguments align more closely with the GNU Core Utilities commands like ls and cat. This isn't a drop-in replacement for the JVM-based client but it should be a lot more intuitive for those already familiar with GNU Core Utilities file system commands.

The following will install the client on a fresh Ubuntu 16.04.2 LTS system.

$ wget -c -O gohdfs.tar.gz \ https://github.com/colinmarc/hdfs/releases/download/v2.0.0/gohdfs-v2.0.0-linux-amd64.tar.gz $ tar xvf gohdfs.tar.gz $ gohdfs-v2.0.0-linux-amd64/hdfs

The release also includes a bash completion script. This is handy for being able to hit tab and get a list of commands or to complete a partially typed-out list of arguments.

I'll include the extracted folder name below to help differentiate this tool from the Apache HDFS CLI.

$ gohdfs-v2.0.0-linux-amd64/hdfs

Valid commands: ls [-lah] [FILE]... rm [-rf] FILE... mv [-nT] SOURCE... DEST mkdir [-p] FILE... touch [-amc] FILE... chmod [-R] OCTAL-MODE FILE... chown [-R] OWNER[:GROUP] FILE... cat SOURCE... head [-n LINES | -c BYTES] SOURCE... tail [-n LINES | -c BYTES] SOURCE... du [-sh] FILE... checksum FILE... get SOURCE [DEST] getmerge SOURCE DEST put SOURCE DEST df [-h]

As you can see, prefixing many GNU Core Utilities file system commands with the HDFS client will produce the expected functionality on HDFS.

$ gohdfs-v2.0.0-linux-amd64/hdfs df -h

Filesystem Size Used Available Use% 11.7G 24.0K 7.3G 0%

The GitHub homepage for this project shows how listing files can be two orders of magnitude quicker using this tool versus the JVM-based CLI.

This start up speed improvement can be handy if HDFS commands are being invoked a lot. The ideal file size of an ORC or Parquet file for most purposes is somewhere between 256 MB and 2 GB and it's not uncommon to see these being micro-batched into HDFS as they're being generated.

Below I'll generate a file containing a gigabyte of random data.

$ cat /dev/urandom \ | head -c 1073741824 \ > one_gig

Uploading this file via the JVM-based CLI took 18.6 seconds on my test rig.

$ hadoop fs -put one_gig /one_gig

Uploading via the Golang-based CLI took 13.2 seconds.

$ gohdfs-v2.0.0-linux-amd64/hdfs put one_gig /one_gig_2

Spotify's Python-based HDFS Client

In 2014 work began at Spotify on a Python-based HDFS CLI and library called Snakebite. The bulk of commits on this project we put together by Wouter de Bie and Rafal Wojdyla. If you don't require Kerberos support then the only requirements for this client are the Protocol Buffers Python library from Google and Python 2.7. As of this writing Python 3 isn't supported.

The following will install the client on a fresh Ubuntu 16.04.2 LTS system using a Python virtual environment.

$ sudo apt install \ python \ python-pip \ virtualenv $ virtualenv .snakebite $ source .snakebite/bin/activate $ pip install snakebite

This client is not a drop-in replacement for the JVM-based CLI but shouldn't have a steep learning curve if you're already familiar with GNU Core Utilities file system commands.

snakebite [general options] cmd [arguments] general options: -D --debug Show debug information -V --version Hadoop protocol version (default:9) -h --help show help -j --json JSON output -n --namenode namenode host -p --port namenode RPC port (default: 8020) -v --ver Display snakebite version commands: cat [paths] copy source paths to stdout chgrp <grp> [paths] change group chmod <mode> [paths] change file mode (octal) chown <owner:grp> [paths] change owner copyToLocal [paths] dst copy paths to local file system destination count [paths] display stats for paths df display fs stats du [paths] display disk usage statistics get file dst copy files to local file system destination getmerge dir dst concatenates files in source dir into destination local file ls [paths] list a path mkdir [paths] create directories mkdirp [paths] create directories and their parents mv [paths] dst move paths to destination rm [paths] remove paths rmdir [dirs] delete a directory serverdefaults show server information setrep <rep> [paths] set replication factor stat [paths] stat information tail path display last kilobyte of the file to stdout test path test a path text path [paths] output file in text format touchz [paths] creates a file of zero length usage <cmd> show cmd usage

The client is missing certain verbs that can be found in the JVM-based client as well as the Golang-based client described above. One of which is the ability to copy files and streams to HDFS.

That being said I do appreciate how easy it is to pull statistics for a given file.

$ snakebite stat /one_gig

access_time 1539530885694 block_replication 1 blocksize 134217728 file_type f group supergroup length 1073741824 modification_time 1539530962824 owner mark path /one_gig permission 0644

To collect the same information with the JVM client would involve several commands. Their output would also be harder to parse than the key-value pairs above.

As well as being a CLI tool, Snakebite is also a Python library.

from snakebite.client import Client client = Client("localhost", 9000, use_trash=False) [x for x in client.ls(['/'])][:2]

[{'access_time': 1539530885694L, 'block_replication': 1, 'blocksize': 134217728L, 'file_type': 'f', 'group': u'supergroup', 'length': 1073741824L, 'modification_time': 1539530962824L, 'owner': u'mark', 'path': '/one_gig', 'permission': 420}, {'access_time': 1539531288719L, 'block_replication': 1, 'blocksize': 134217728L, 'file_type': 'f', 'group': u'supergroup', 'length': 1073741824L, 'modification_time': 1539531307264L, 'owner': u'mark', 'path': '/one_gig_2', 'permission': 420}]

Note I've asked to connect to localhost on TCP port 9000. Out of the box Hadoop uses TCP port 8020 for the NameNode RPC endpoint. I've often changed this to TCP port 9000 in many of my Hadoop guides.

You can find the hostname and port number configured for this end point on the master HDFS node. Also note that for various reasons HDFS, and Hadoop in general, need to use hostnames rather than IP addresses.

$ sudo vi /opt/hadoop/etc/hadoop/core-site.xml

... <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> ...

Python-based HdfsCLI

In 2014, Matthieu Monsch also began work on a Python-based HDFS client called HdfsCLI. Two features this client has over the Spotify Python client is that it supports uploading to HDFS and Python 3 (in addition to 2.7).

Matthieu has previously worked at LinkedIn and now works for Google. The coding style of this project will feel very familiar to anyone that's looked at a Python project that has originated from Google.

This library includes support for a progress tracker, a fast AVRO library, Kerberos and Pandas DataFrames.

The following will install the client on a fresh Ubuntu 16.04.2 LTS system using a Python virtual environment.

$ sudo apt install \ python \ python-pip \ virtualenv $ virtualenv .hdfscli $ source .hdfscli/bin/activate $ pip install 'hdfs[dataframe,avro]'

In order for this library to communicate with HDFS, WebHDFS needs to be enabled on the master HDFS node.

$ sudo vi /opt/hadoop/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>dfs.datanode.data.dir</name> <value>/opt/hdfs/datanode</value> <final>true</final> </property> <property> <name>dfs.namenode.name.dir</name> <value>/opt/hdfs/namenode</value> <final>true</final> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> <property> <name>dfs.namenode.http-address</name> <value>localhost:50070</value> </property> </configuration>

With the configuration in place the DFS service needs to be restarted.

$ sudo su $ /opt/hadoop/sbin/stop-dfs.sh $ /opt/hadoop/sbin/start-dfs.sh $ exit

A configuration file is needed to store connection settings for this client.

[global] default.alias = dev [dev.alias] url = http://localhost:50070 user = mark

A big downside of the client is that only a very limited subset of HDFS functionality is supported. That being said the client verb arguments are pretty self explanatory.

HdfsCLI: a command line interface for HDFS. Usage: hdfscli [interactive] [-a ALIAS] [-v...] hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH hdfscli -L | -V | -h Commands: download Download a file or folder from HDFS. If a single file is downloaded, - can be specified as LOCAL_PATH to stream it to standard out. interactive Start the client and expose it via the python interpreter (using iPython if available). upload Upload a file or folder to HDFS. - can be specified as LOCAL_PATH to read from standard in. Arguments: HDFS_PATH Remote HDFS path. LOCAL_PATH Path to local file or directory. Options: -A --append Append data to an existing file. Only supported if uploading a single file or from standard in. -L --log Show path to current log file and exit. -V --version Show version and exit. -a ALIAS --alias=ALIAS Alias of namenode to connect to. -f --force Allow overwriting any existing files. -s --silent Don't display progress status. -t THREADS --threads=THREADS Number of threads to use for parallelization. 0 allocates a thread per file. [default: 0] -v --verbose Enable log output. Can be specified up to three times (increasing verbosity each time). Examples: hdfscli -a prod /user/foo hdfscli download features.avro dat/ hdfscli download logs/1987-03-23 - >>logs hdfscli upload -f - data/weights.tsv <weights.tsv HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.

The following finished in 68 seconds. This is an order of magnitude slower than some other clients I'll explore in this post.

$ hdfscli upload \ -t 4 \ -f - \ one_gig_3 < one_gig

That said, the client is very easy to work with in Python.

from hashlib import sha256 from hdfs import Config client = Config().get_client('dev') [client.status(uri) for uri in client.list('')][:2]

[{u'accessTime': 1539532953515, u'blockSize': 134217728, u'childrenNum': 0, u'fileId': 16392, u'group': u'supergroup', u'length': 1073741824, u'modificationTime': 1539533029897, u'owner': u'mark', u'pathSuffix': u'', u'permission': u'644', u'replication': 1, u'storagePolicy': 0, u'type': u'FILE'}, {u'accessTime': 1539533046246, u'blockSize': 134217728, u'childrenNum': 0, u'fileId': 16393, u'group': u'supergroup', u'length': 1073741824, u'modificationTime': 1539533114772, u'owner': u'mark', u'pathSuffix': u'', u'permission': u'644', u'replication': 1, u'storagePolicy': 0, u'type': u'FILE'}]

Below I'll generate a SHA-256 hash of a file located on HDFS.

with client.read('/user/mark/one_gig') as reader: print sha256(reader.read()).hexdigest()[:6]

Apache Arrow's HDFS Client

Apache Arrow is a cross-language platform for in-memory data headed by Wes McKinney. It's Python bindings "PyArrow" allows Python applications to interface with a C++-based HDFS client.

Wes stands out in the data world. He has worked for Cloudera in the past, created the Pandas Python package and has been a contributor to the Apache Parquet project.

The following will install PyArrow on a fresh Ubuntu 16.04.2 LTS system using a Python virtual environment.

$ sudo apt install \ python \ python-pip \ virtualenv $ virtualenv .pyarrow $ source .pyarrow/bin/activate $ pip install pyarrow

The Python API behaves in a clean and intuitive manor.

from hashlib import sha256 import pyarrow as pa hdfs = pa.hdfs.connect(host='localhost', port=9000) with hdfs.open('/user/mark/one_gig', 'rb') as f: print sha256(f.read()).hexdigest()[:6]

The interface is very performant as well. The following completed in 6 seconds. This is the fastest any client transfer this file on my test rig.

with hdfs.open('/user/mark/one_gig_4', 'wb') as f: f.write(open('/home/mark/one_gig').read())

HDFS Fuse and GNU Core Utilities

One of the most intuitive ways to interact with HDFS for a newcomer could most likely be with the GNU Core Utilities file system functions. These can be run on an HDFS mount exposed via a file system fuse.

The following will install an HDFS fuse client on a fresh Ubuntu 16.04.2 LTS system using a Debian package from Cloudera's repository.

$ wget https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key -O - \ | sudo apt-key add - $ wget https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/cloudera.list -O - \ | sudo tee /etc/apt/sources.list.d/cloudera.list $ sudo apt update $ sudo apt install hadoop-hdfs-fuse

$ sudo mkdir -p hdfs_mount $ sudo hadoop-fuse-dfs \ dfs://127.0.0.1:9000 \ hdfs_mount

The following completed in 8.2 seconds.

$ cp one_gig hdfs_mount/one_gig_5

Regular file system commands run as expected. The following gives how much space has been used and is available.

Filesystem Size Used Avail Use% Mounted on fuse_dfs 12G 4.9G 6.8G 42% /home/mark/hdfs_mount

The following shows how much disk space has been used per parent folder.

This will give a file listing by file size.

$ ls -lhS hdfs_mount/user/mark/

-rw-r--r-- 1 mark 99 1.0G Oct 14 09:03 one_gig -rw-r--r-- 1 mark 99 1.0G Oct 14 09:05 one_gig_2 -rw-r--r-- 1 mark 99 1.0G Oct 14 09:10 one_gig_3 ...

Thank you for taking the time to read this post. I offer consulting, architecture and hands-on development services to clients in North America & Europe. If you'd like to discuss how my offerings can help your business please contact me via

DataTau published first on DataTau

#Technology #Finance

0 notes

knoldus · 6 years

Link

In this blog, we are going to deploy a sample load balanced app over DCOS and expose it to the outside of the cluster using Marathon-lb. Here we will be using a containerized application which serves a DCOS site.

Here we are using Marathon-lb as an external load balancer. It is based on HAProxy which provides proxying and load balancing for TCP and HTTP based applications.

Prerequisites–

You should have access to a running DCOS cluster.

You should have at least one public and one private agent.

Now we will follow several steps to make our app running and accessible from outside of the cluster.

Steps–

Install DCOS CLI on the local machine.

curl -O https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.10/dcos sudo mv dcos /usr/local/bin chmod +x /usr/local/bin/dcos dcos cluster setup address_of_master_node

Note: You can leave this step if you have already installed DCOS CLI and It is pointing to running DCOS cluster.

2. Install Marathon-lb using DCOS CLI. Marathon-lb by defaults runs on public agents.

dcos package install marathon-lb

3. Create Marathon JSON for your application.

{ "id": "sample-dcos-website", "container": { "type": "DOCKER", "portMappings": [ { "hostPort": 0, "containerPort": 80, "servicePort": 10004 } ], "docker": { "image": "mesosphere/dcos-website:cff383e4f5a51bf04e2d0177c5023e7cebcab3cc" } }, "instances": 2, "cpus": 0.25, "mem": 100, "networks": [ { "mode": "container/bridge" } ], "healthChecks": [{ "protocol": "HTTP", "path": "/", "portIndex": 0, "timeoutSeconds": 2, "gracePeriodSeconds": 15, "intervalSeconds": 3, "maxConsecutiveFailures": 2 }], "labels":{ "HAPROXY_DEPLOYMENT_GROUP":"dcos-website", "HAPROXY_DEPLOYMENT_ALT_PORT":"10005", "HAPROXY_GROUP":"external", "HAPROXY_0_REDIRECT_TO_HTTPS":"true", "HAPROXY_0_VHOST": "Public-Node-IP" } }

Note: Set the value of HAPROXY_0_VHOST to IP address of the public node.

4. Run your application using this command with DCOS CLI.

dcos marathon app add path_to_above_json_file

Note: It will take one or two minutes to deploy your application over DCOS.

5. Now go to DCOS UI and you can see 2 instances of your service running there.

6. Now you can go to the browser and check the service running there.

Public_Node_IP:Port

Note– Here Public_Node_IP is the address of the public node where Marathon-lb is running and Port is the servicePort defined in Marathon JSON above which is 10004 in our case.

Thanks.

References–

Mesosphere

#WordPress

0 notes

pcnerds-tech · 7 years

Text

Local port Forwarding on Windows

What is port forwarding? In computer networking, port forwarding or portmapping is an application of network address translation (NAT) that redirects a communication request from one address and port number combination to another while the packets are traversing a network gateway, such as a router or firewall. What is cool about local ssh Port Forwarding? SSH port forwarding, or TCP/IP…

View On WordPress

0 notes

securitynewswire · 7 years

Text

rpcinfo Portmap DUMP Call Amplification Distributed Denial Of Service

SNPX.com : http://dlvr.it/PSWkSG

0 notes

mlbors · 7 years

Text

Setting up a workflow with Docker, GitHub, Travis and AWS

In this post, we are going to set up a workflow to test and deploy automatically a PHP app on Amazon Web Services Elastic Beanstalk.

The aim of this article is to set up a basic workflow using a few tools to automate our deployment process. First of all, we need to have an account on each of these platforms: GitHub, Docker Hub, Amazon Web Services (AWS) and Travis CI. We also need to have Docker and Git installed on our machine.

Basically, we are going to do the following things: use Docker to create our environment, develop our app, create a repository, commit and push everything to GitHub, let Travis CI do the tests, build a Docker image with our code and push everything to Elastic Beanstalk (EB).

Setting up the environment

To create our environment, we are going to use the PHP Docker Boilerplate from WebDevOps. We can clone the repository by simply use:

git clone https://github.com/webdevops/php-docker-boilerplate.git our-appt

Cloning webdevops/php-docker-boilerplate

We will just adjust a very few things. First, in the Dockerfile.production file, at the end of it, we are going to add this simple line:

COPY ./app/. /app/

Here, we are copying our app folder instead of sharing it with the host.

If we use another image, we may also need to expose ports and run a command when launching our container like so:

EXPOSE 80 EXPOSE 443 CMD ["supervisord"]

Initializing our project

We can now go to GitHub and create a new repository. After that, we can make our first commit:

git init git add . git commit –m "First commit" git remote add origin https://github.com/user/repository.git git push –u origin master

Initializing our repository

For our example, we create a really simple app using Composer. We must be sure to have the appropriate .gitignore file. A .dockerignore file could also be a good idea:

*.md .git* backup/* bin/* documentation/*

.dockerignore file example

We can now link our GitHub repository to Travis CI.

AWS / EB

We are now going to set a few things on AWS. First, we need to create a new application on EB. We can name it however we want. Then, we have to create new environment. Here, we need to select "Docker Multicontainers" for the platform option. We also need to enable VPC. We can then proceed.

When it is done, we need to assure that our instance profile can communicate with Amazon ECS. To be sure of this, we can attach the AWSElasticBeanstalkMulticontainerDocker policy to it. To set the instance profile, in the EB dashboard, we can go to Configuration > Instances settings. To attach a policy to a role, in the IAM dashboard, we can go to Roles, choose the concerned role and then attach the desired policy.

We also need to create a new user. If there is no available group, we have to create on first. When our user is created, we can request credentials. We need to be sure to keep the secret key somewhere because we won't be able to access it later.

Information we need

Before going any further, we need to be sure to have the following information:

Our app name (APP_NAME)

Our Docker username (DOCKER_USERNAME)

Our Docker repository name (DOCKER_REPOSITORY)

Our Docker password (DOCKER_PASSWORD - secure)

Our email adress used with Docker (DOCKER_EMAIL)

An image name (IMAGE_NAME)

The AWS bucket name (BUCKET_NAME)

Our deployment region (DEPLOYMENT_REGION)

The deployment bucket (DEPLOYMENT_BUCKET)

The deployment environment (DEPLOYMENT_ENV_NAME)

The deployment's id (DEPLOYMENT_ENV - secure)

Our AWS access key (AWS_ACCESS_KEY - secure)

Our AWS secret key (AWS_SECRET_KEY - secure)

Because we are using Travis CI, we can set these values in our Travis CI account in the settings sections of our project or put them in the .travis.yml file and encrypt the private ones with the following command (Travis CI gem need to be installed):

travis encrypt SOMEVAR=secretvalue --add

Docker Configuration

In case we would like to use a private repository on Docker Hub, we need to create a file named .dockercfg like so:

{ "https://index.docker.io/v1/": { "auth": "", "email": "" } }

.dockercfg file

The Amazon ECS container agent will use this file for authentication. Because Docker create a file with another format, we are going to feed this file later with the values that in the files generated by Docker.

Now we can create another file named Dockerrun.aws.json. This file will be used by EB to deploy our Docker containers. Here again, we are going to replace values in this file later.

{ "AWSEBDockerrunVersion": 2, "authentication": { "bucket": "", "key": ".dockercfg" }, "volumes": [ { "name": "storage", "host": { "sourcePath": "/var/data" } }, { "name": "app", "host": { "sourcePath": "/var/app/current/app" } } ], "containerDefinitions": [ { "name": "db", "image": "mysql:5.6", "essential": true, "memory": 512, "portMappings": [ { "hostPort": 3306, "containerPort": 3306 } ] "mountPoints": [ { "sourceVolume": "storage", "containerPath": "/var/lib/mysql" } ], "environment": [ { "name": "MYSQL_ROOT_PASSWORD", "value": "password" }, { "name": "MYSQL_DATABASE", "value": "my_db" } ] }, { "name": "app", "image": ":", "essential": true, "memory": 256, "portMappings": [ { "hostPort": 80, "containerPort": 80 } ], "links": [ "db" ], "mountPoints": [ { "sourceVolume": "app", "containerPath": "/var/www/html", "readOnly": true } ] } ] }

Dockerrun.aws.json file

We tell Travis CI what to do

Now, with the .travis.yml file, we can tell Travis CI what to do.

sudo: required language: php python: - "3.4" - "pypy-5.3.1" services: - docker before_install: # Install dependencies - gem update --system - sudo apt-get install -y python3.4 - sudo apt-get install --upgrade -y python-pip - sudo apt-get install jq - sudo pip install --user virtualenv # Create a virtual environment for AWS CLI - virtualenv my_py3 --python=/usr/bin/python3.4 - source my_py3/bin/activate - pip install --upgrade awscli - pip install --upgrade awsebcli # Set AWS information - aws configure set aws_access_key_id $AWS_ACCESS_KEY - aws configure set aws_secret_access_key $AWS_SECRET_KEY - aws configure set default.region $DEPLOYMENT_REGION - aws configure set metadata_service_timeout 1200 - aws configure set metadata_service_num_attempts 3 - aws configure list # Copy the docker-compose.production.yml file to docker-compose.yml file - cp docker-compose.production.yml docker-compose.yml # Build and create our containers - docker-compose up -d - docker ps -a before_script: # Install dependencies in the app container - docker-compose exec -T app composer self-update - docker-compose exec -T app composer install --no-interaction - docker-compose exec -T app composer dump-autoload -o script: # Run unit tests in the app container - docker-compose exec -T app vendor/bin/php-cs-fixer fix app --verbose - docker-compose exec -T app vendor/bin/phpunit --coverage-clover=coverage.xml after_success: # Stop containers and build our image - docker-compose stop - docker-compose build --no-cache # Send coverage information to Codecov (if needed) - bash <(curl -s https://codecov.io/bash) # Push image to Docker Hub and update EB environment - if [ "$TRAVIS_BRANCH" == "master" ]; then docker login -u="$DOCKER_USERNAME" -p="$DOCKER_PASSWORD"; docker tag $IMAGE_NAME $DOCKER_USERNAME/$DOCKER_REPOSITORY:$TRAVIS_BUILD_ID; docker push $DOCKER_USERNAME/$DOCKER_REPOSITORY:$TRAVIS_BUILD_ID; ./scripts/upload_image_to_elastcbeanstalk.sh $TRAVIS_BUILD_ID $DEPLOYMENT_BUCKET $DEPLOYMENT_ENV $APP_NAME $DEPLOYMENT_REGION $IMAGE_NAME $DEPLOYMENT_ENV_NAME $DOCKER_USERNAME $DOCKER_REPOSITORY $DOCKER_PASSWORD $DOCKER_EMAIL; fi notifications: email: false env: global: - APP_NAME=app_name - DOCKER_USERNAME=docker_username - DOCKER_REPOSITORY=docker_repository (here same as app_name) - IMAGE_NAME=image_name - BUCKET_NAME=bucket_name (here same as app_name) - DEPLOYMENT_REGION=us-east-2 (for example) - DEPLOYMENT_BUCKET=elasticbeanstalk-us-east-2-xxxxxxxxxxxx - DEPLOYMENT_ENV_NAME=env_name - DOCKER_EMAIL=docker_email - secure: some_secure_value - secure: some_secure_value - secure: some_secure_value - secure: some_secure_value

.travis.yml file

Let's recap what we are doing here:

We install the dependencies we need

We install AWS CLI in a virtual environment to avoid conflicts and set information

Create a docker-compose.yml file from the docker-compose.production.yml file

Start our containers

Install dependencies in the app container

Run tests in the app container

Build our image

If everything is alright and if we are on master branch, we push the image to Docker Hub and call our script to update the EB environment

EB environment

In our .travis.yml file, we call a script named upload_image_to_elastcbeanstalk.sh. Let's see what it is:

#! /bin/bash # Variables DOCKER_TAG=$1 DOCKERRUN_FILE="Dockerrun.aws.json" DOCKERCFG=".dockercfg" DOCKER_CONFIG="/home/travis/.docker/config.json" EB_BUCKET=$2 EB_ENV=$3 PREFIX="deploy/$DOCKER_TAG" APP_NAME=$4 DEPLOYMENT_REGION=$5 IMAGE_NAME=$6 DEPLOYMENT_ENV_NAME=$7 DOCKER_USERNAME=$8 DOCKER_REPOSITORY=$9 DOCKER_PASSWORD=${10} DOCKER_EMAIL=${11} DOCKER_IMAGE="$DOCKER_USERNAME/$DOCKER_REPOSITORY" # Generate dockercfg echo "::::: Creating .dockercfg file :::::" DOCKER_AUTH=($(sudo jq -r '.auths["https://index.docker.io/v1/"].auth' $DOCKER_CONFIG)) cat "$DOCKERCFG" \ | sed 's||'$DOCKER_AUTH'|g' \ | sed 's||'$DOCKER_EMAIL'|g' \ > $DOCKERCFG sleep 30 aws s3 cp $DOCKERCFG s3://$EB_BUCKET/.dockercfg sleep 30 echo "::::: Creating Dockerrun.aws.json file :::::" # Replace vars in the DOCKERRUN_FILE cat "$DOCKERRUN_FILE" \ | sed 's||'$EB_BUCKET'|g' \ | sed 's||'$DOCKER_IMAGE'|g' \ | sed 's||'$DOCKER_TAG'|g' \ > $DOCKERRUN_FILE sleep 30 aws s3 cp $DOCKERRUN_FILE s3://$EB_BUCKET/$PREFIX/$DOCKERRUN_FILE sleep 30 echo "::::: Creating new Elastic Beanstalk version :::::" # Run aws command to create a new EB application with label aws elasticbeanstalk create-application-version \ --region=$DEPLOYMENT_REGION \ --application-name $APP_NAME \ --version-label $DOCKER_TAG \ --source-bundle S3Bucket=$EB_BUCKET,S3Key=$PREFIX/$DOCKERRUN_FILE sleep 30 echo "::::: Updating Elastic Beanstalk environment :::::" aws elasticbeanstalk update-environment \ --environment-id $EB_ENV \ --environment-name $DEPLOYMENT_ENV_NAME \ --application-name $APP_NAME \ --version-label $DOCKER_TAG echo "::::: Removing file :::::" sleep 30 rm $DOCKERCFG rm $DOCKERRUN_FILE

upload_image_to_elastcbeanstalk.sh file

Let's recap what we are doing here:

We parse the file /home/travis/.docker/config.json and extract the "auth" value from it

We replace values in the .dockercfg and upload it on AWS S3

We replace values in Dockerrun.aws.json file and upload it on AWS S3

We create a new version for the application

We update the EB environment

We remove the configuration files

The end!

If everything is alright, our app will be deployed on AWS EB by the next push.

Test locally and fix problems

If we need to run our image locally, we can do the following thing:

docker pull username/repository:tag docker run --expose 80 -p 80:80 -it username/repository:tag

Or if we need to use sh:

docker run -ti username/repository:tag sh

To get console output from the instance, we can use the following command (awscli has to be installed; we may need to run aws configure first):

aws ec2 get-console-output --instance-id instance_id

Getting console output

We may also want to connect to our instance using SSH. To do so, we just have to use the following command (awsebcli has to be installed; we may need to run eb init first):

eb ssh --setup // If needed eb ssh environment-name

Using SSH

We can retrieve and display logs like so:

eb logs

Displaying logs

Further configuration and optimization

We may want to extend our configuration. For that, we can place some .config files in the .ebextentions folder. For example, we may need a script to clean unused images:

option_settings: - namespace: aws:elasticbeanstalk:command option_name: Timeout value: 1200 commands: docker_clean_containers: command: docker rm -v $(docker ps -a -q) ignoreErrors: true docker_clean_images: command: docker rmi $(docker images -q) ignoreErrors: true

Cleaning unused images - .ebextentions/docker.config

In our example, we also use a MYSQL container for our database. Using AWS RDS instead can be a reliable choice for production.

Conclusion

Setting up this whole process may seem to be quite a headache that involves many different tools. On the other hand, we have the opportunity to automate our deployment process while preserving control and security.

#git #github #travisci #docker #aws #eb #amazon web services #elastic beanstalk #php #deployment #continuous integration

0 notes

computernotes · 8 years

Text

Configuring the OSX firewall to block specific ports in 10.8+ using pf

Apple changed the firewall from ipfw to pf sometime around 10.8. Sometimes, one must block specific ports or perform more advanced firewalling than ApplicationFirewall allows. For example, I sought to block portmap reflection attacks on UDP 111, which cannot be done using ApplicationFirewall or the Security system preference.

I found this discussion of pf on OSX very helpful in building these rules: http://blog.scottlowe.org/2013/05/15/using-pf-on-os-x-mountain-lion/

The steps:

1) Create your own pf.conf file. We create a different pf.conf file rather than modifying the system one because Apple has a habit of overwriting system files in updates.

sudo cp /etc/pf.conf /etc/my.pf.conf sudo nano /etc/my.pf.conf

2) Add these lines to the bottom:

anchor "myrules" load anchor "myrules" from "/etc/pf.anchors/my.rules"

3) Then save and exit. Note that OSX has a tendency to replace " with “ which will cause issues. Quotes must be regular quotes and not the fancy OSX ones.

4) Now, create your ruleset. This is a basic set that allows SSH, HTTPD and AFP only. Create this file in /etc/pf.anchors/my.rules

sudo nano /etc/pf.anchors/my.rules

set block-policy drop set fingerprints "/etc/pf.os" set ruleset-optimization basic set skip on lo0 # Scrub incoming packets scrub in all no-df # Antispoof antispoof log quick for { lo0 en0 en2 }

# Block to/from illegal destinations or sources block in log quick from no-route to any

# Block by default block in log

# Block to/from illegal destinations or sources block in log quick from no-route to any

# Block portmap reflection attacks block in quick proto udp from any to port 111

# Allow critical system traffic pass in quick inet proto udp from any port 67 to any port 68

# allow ssh, http, AFP pass in quick inet proto tcp from any to port 22 pass in quick inet proto tcp from any to port 80 pass in quick inet proto tcp from any to port 548

# Allow outgoing traffic pass out inet proto tcp from any to any keep state pass out inet proto udp from any to any keep state

5) Test the ruleset. You should see your rules printed. If there are any syntax errors or etc here, fix them.

sudo pfctl -vnf /etc/my.pf.conf

6) Create a new LaunchDaemon plist file for the firewall to load on boot

sudo nano /Library/LaunchDaemons/my.pf.plist

7) Fill the file with this XML, and note you may have to modify the location of your custom pf config file generated in step 1 if you used something other than /etc/my.pf.conf

<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE plist PUBLIC "-//Apple Computer/DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>my.pf</string> <key>Program</key> <string>/sbin/pfctl</string> <key>ProgramArguments</key> <array> <string>/sbin/pfctl</string> <string>-e</string> <string>-f</string> <string>/etc/my.pf.conf</string> </array> <key>RunAtLoad</key> <true/> <key>ServiceDescription</key> <string>FreeBSD Packet Filter (pf) daemon</string> <key>StandardErrorPath</key> <string>/var/log/pf.log</string> <key>StandardOutPath</key> <string>/var/log/pf.log</string> </dict> </plist>

8) Change ownership of the file to root

sudo chown root /Library/LaunchDaemons/my.pf.plist

9) Fire it up with launchctl!

sudo launchctl load /Library/LaunchDaemons/my.pf.plist

10) And verify it’s loaded:

launchctl list | grep my

All done!

#osx #firewall #pf #portmap #network security

0 notes

terabitweb · 5 years

Text

Original Post from Talos Security Author:

By Edmund Brumaghin and Holger Unterbrink.

Executive summary

Orcus RAT and RevengeRAT are two of the most popular remote access trojans (RATs) in use across the threat landscape. Since its emergence in 2016, various adversaries used RevengeRAT to attack organizations and individuals around the world. The source code associated with RevengeRAT was previously released to the public, allowing attackers to leverage it for their own malicious purposes. There are typically numerous, unrelated attackers attempting to leverage this RAT to compromise corporate networks for the purposes of establishing an initial point of network access, the performance of lateral movement, as well as to exfiltrate sensitive information that can be monetized. Orcus RAT was in the news earlier this year due to Canadian law enforcement activity related to the individual believed to have authored the malware.

Cisco Talos recently discovered a threat actor that has been leveraging RevengeRAT and Orcus RAT in various malware distribution campaigns targeting organizations including government entities, financial services organizations, information technology service providers and consultancies. We discovered several unique tactics, techniques, and procedures (TTPs) associated with these campaigns including the use of persistence techniques most commonly associated with “fileless” malware, obfuscation techniques designed to mask C2 infrastructure, as well as evasion designed to circumvent analysis by automated analysis platforms such as malware sandboxes.

The characteristics associated with these campaigns evolved over time, showing the attacker is constantly changing their tactics in an attempt to maximize their ability to infect corporate systems and work toward the achievement of their longer-term objectives.

Malicious email campaigns

There have been several variations of the infection process associated with these malware distribution campaigns over time. In general, the emails in every case claim to be associated with complaints against the organization being targeted. They purport to be from various authorities such as the Better Business Bureau (BBB). Below is an example of one of these emails:

Phishing email

In addition to Better Business Bureau, Talos has also observed emails purporting to be associated with other entities such as Australian Competition & Consumer Commission (ACCC), Ministry of Business Innovation & Employment (MBIE) and other regional agencies.

Earlier malware campaigns contained a hyperlink that directed potential victims to the malicious content responsible for initiating the malware infection. The attacker made use of the SendGrid email delivery service to redirect victims to an attacker-controlled malware distribution server.

The link in one example email was pointed to the following SendGrid URL:

https://u12047697[.]ct[.]sendgrid[.]net/wf/click?upn=X2vR6-2FdIf8y2XI902U8Tc8qh9KOPBogeTLss4h7AKXe0xRjCQw1VcMTssPPPTU28KY7PwUPERvVvIa8n4VQD-2Fw-3D-3D_tIiqtngjMfK6xwiZyGxyMuaZ5weLruJKBoFJsVrKYBziY2h51ElcQ2ocLru0oJCxt-2FOlkcr6RH8ktqTc-2B-2BQjmMscOQaeiy2zw8OOUb6nD0f1srQnQG-2B-2BIXtpubqjWMnnIHxJg3TvgFRq0itu75WQHjsdUv1O1g-2FrQzQAyJkGQN6vC9fH5R4R4FyLG9ahUnvbnHt-2FEmdUJQuft0jfw2c5uPBA2M5Yspgi-2Fodr8cEU2b8-3D

This URL is responsible for redirecting the client to a URL hosted on an attacker-controlled server that hosts a ZIP archive containing the malicious PE32 used to infect the system. Below, you can see the HTTP GET request that is responsible for retrieving this and continuing the infection process.

ZIP File download

A PE32 executable is inside of the ZIP archive. It needs to be executed by the victim to infect the system with Orcus RAT. The PE32 filename features the use of double extensions (478768766.pdf.exe) which, by default on the Windows operating system, will only display the first extension (.PDF.) The PE32 icon has been set to make the file appear as if it is associated with Adobe Acrobat.

Double extensions trick

This loader (478768766.pdf.exe) is protected by the SmartAssembly .NET protector (see below), but can easily be deobfuscated via d4dot. It is responsible for extracting and decrypting the Orcus RAT. It extracts the Orcus executable from its Resource “人豆认关尔八七” as shown in the screenshots below.

Orcus loader resources

The Class5.smethod_1 method, shown in the screenshot below, decodes the content from the resource section and restores the original Orcus RAT PE file.

Resource section payload decoding

The smethod_3 shown below finally starts another instance of the loader (478768766.pdf.exe) and injects the Orcus PE file into this loader process. Then it resumes the process, which executes the Orcus RAT PE file in memory in the 478768766.pdf.exe process context. This means the original Orcus RAT PE file is never written to disk in clear text. This makes it more difficult for anti virus systems to detect it.

Process injection method

The loader achieves persistence by creating a shortcut that points to its executable and storing the shortcut in the following Startup directory:

C:UsersAppDataRoamingMicrosoftWindowsStart MenuProgramsStartup

The dropper also copies itself over to %APPDATA%Roamingtrfgtfrfgrf.exe and creates and starts the rfgrf.exe.bat file, which you can see below. The bat file executes the copy of the loader every 60 seconds.

rfgrf.exe.bat

In later campaigns, the adversary modified the infection process and emails no longer leveraged the SendGrid URLs. Later emails featured the same themes and verbiage but were modified to contain ZIP archive attachments.

Phishing email

The attached ZIP archives contain malicious batch files responsible for retrieving the malicious PE32 file and executing it, thus infecting the system. Early versions of the batch file retrieved additional malicious content from the same server previously used to host the ZIP archives.

Malicious .bat downloader

One interesting thing to note about the batch files was the use of an obfuscation technique that is not commonly seen. In early campaigns, the attacker prepended the bytes “FF FE 26 63 6C 73 0D 0A” into the file, causing various file parsers to interpret the file contents as UTF-16 LE, resulting in the parsers failing to properly display the contents of the batch file.

Unicode obfuscation standard editor

The hex view of the same file shows these prepended bytes which are responsible for this parsing issue.

Unicode obfuscation hex view

This is a well-known technique as can be observed in the forum thread here.

Later versions of the .bat downloader featured the use of obfuscation in an attempt to make analysis more difficult. They are using a simple obfuscation method and are just replacing all characters by variables that are resolved at runtime.

Obfuscated RevengeRat .bat downloader

The decoded version of the .bat file looks like this. Like in the non-obfuscated versions of the .bat file, the adversaries are downloading the .js file to a local directory (C:windowsr2.js) and executing it.

Decoded obfuscated .bat file

This r2.js file is another obfuscated script. It is filled with a bunch of rubbish and one long line of code.

Downloaded r2.js file

This scripts writes the ‘TVqQ…’ string into the registry.

r2.js payload

Stored encoded malware in registry key

It loads this string at the end of the infection process, decodes it and executes it.

r2.js payload decoding routine

Decompiling this payload in dnSpy shows an old friend: RevengeRAT.

RevengeRAT decompiled binary

Command and control (C2) obfuscation

As is the case with many popular RATs, the C2 infrastructure was observed leveraging Dynamic Domain Name System (DDNS) in an attempt to obfuscate the attacker’s infrastructure. In the case of these malware campaigns, the attacker took an additional step. They pointed the DDNS over to the Portmap service to provide an additional layer of infrastructure obfuscation.

Portmap is a service designed to facilitate external connectivity to systems that are behind firewalls or otherwise not directly exposed to the internet.

Port forwarding service

These systems initiate an OpenVPN connection to the Portmap service, which is responsible for handling requests to those systems via port mapping. We have recently observed an increase in the volume of malicious attackers abusing this service to facilitate the C2 process across various malware families.

HTTPS certificate

As demonstrated above, the DNS configuration for the DDNS hostname used by the malware for C2 has actually been pointed to the Portmap service. Let’s Encrypt issued the SSL certificate associated with this host.

Payload analysis

The adversaries used at least two different RATs in the campaigns which we have closely analyzed: Orcus RAT and RevengeRAT. For both RATs, the source code was leaked in the underground and several adversaries have used it to build their own versions. You can see the comparison of the leaked version of RevengeRAT and the one we analyzed below.

Compairson leaked malware and modified one

The adversaries changed the source code slightly. They moved the original code into separate functions and changed the execution order a bit plus added other minor changes like additional variables, but overall the code is still very similar to the leaked code. On the other hand, it is modified so that the resulting binary looks different for AVs.

It is interesting to see that both (Client) IDs are pointing to the same name: CORREOS. In the Nuclear_Explosion file, aka RevengeRAT, it is only base64 encode “Q09SUkVPUw==“.

RevengeRAT Atomic class config

Orcus decoded XML config

Conclusion

These malware distribution campaigns are ongoing and will likely continue to be observed targeting various organizations around the world. RevengeRAT and Orcus RAT are two of the most popular RATs in use across the threat landscape and will likely continue to be heavily favored for use during the initial stages of attacks.

Organizations should leverage comprehensive defense-in-depth security controls to ensure that they are not adversely impacted by attacks featuring these malware families. At any given point in time, there are several unrelated attackers distributing these RATs in different ways. Given that the source code of both of these malware families is readily available, we will likely continue to see new variants of each of these RATs for the foreseeable future.

Coverage

Additional ways our customers can detect and block this threat are listed below.

Advanced Malware Protection (AMP) is ideally suited to prevent the execution of the malware used by these threat actors.

Cisco Cloud Web Security (CWS) or Web Security Appliance (WSA) web scanning prevents access to malicious websites and detects malware used in these attacks.

Email Security can block malicious emails sent by threat actors as part of their campaign.

Network Security appliances such as Next-Generation Firewall (NGFW), Next-Generation Intrusion Prevention System (NGIPS), and Meraki MX can detect malicious activity associated with this threat.

AMP Threat Grid helps identify malicious binaries and build protection into all Cisco Security products.

Umbrella, our secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs, and URLs, whether users are on or off the corporate network.

Open Source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.

Indicators of Compromise (IOCs)

The following indicators of compromise (IOCs) have been observed to be associated with malware campaigns.

ZIP Hashes (SHA256):

c66c96c8c7f44d0fd0873ea5dbaaa00ae3c13953847f0ca308d1f56fd28f230c d6c5a75292ac3a6ea089b59c11b3bf2ad418998bee5ee3df808b1ec8955dcf2a

BAT Hashes (SHA256):

20702a8c4c5d74952fe0dc050025b9189bf055fcf6508987c975a96b7e5ad7f5 946372419d28a9687f1d4371f22424c9df945e8a529149ef5e740189359f4c8d

PE32 Hashes (SHA256):

ff3e6d59845b65ad1c26730abd03a38079305363b25224209fe7f7362366c65e 5e4db38933c0e3922f403821a07161623cd3521964e6424e272631c4492b8ade

JS Hashes (SHA256):

4c7d2efc19cde9dc7a1fcf2ac4b30a0e3cdc99d9879c6f5af70ae1b3a846b64b

Domains:

The following domains have been observed to be associated with malware campaigns:

skymast231-001-site1[.]htempurl[.]com qstorm[.]chickenkiller[.]com

IP Addresses:

The following IP addresses have been observed to be associated with malware campaigns:

193[.]161[.]193[.]99 205[.]144[.]171[.]185

Go to Source Author: RAT Ratatouille – Backdooring PCs with leaked RATs Original Post from Talos Security Author: By Edmund Brumaghin and Holger Unterbrink. Executive summary Orcus RAT…

#Talos Security

0 notes