#blue is userspace processes | Explore Tumblr posts and blogs

kracekumar · 8 years ago

Text

Notes from Root Conf Day 1 - 2017

Root Conf is a conference on DevOps and Cloud Infrastructure. 2017 edition’s theme is service reliability. Following is my notes from Day 1.

State of the open source monitoring landscape

The speaker of the session is the co-founder of Icinga monitoring system. I missed first ten minutes of the talk. -The talk is a comparison of all available OSS options for monitoring, visualization.

Auto-discovery is hard.

As per 2015 monitoring tool usage survey, Nagios is widely used.

Nagios is reliable and stable.

Icinga 2 is a fork of Nagios, rewrite in c++. It’s modern, web 2.0 with APIs, extensions and multiple backends.

Sensu has limited features on OSS side and a lot of features on enterprise version. OSS version isn't useful much.

Zabbix is full featured, out of box monitoring system written in C. It provides logging and graphing features. Scaling is hard since all writes are written to single Postgres DB.

Riemann is stream processor and written in Clojure. The DSL stream processing language needs knowledge of Clojure. The system is stateless.

OpenNMS is a network monitoring tool written in Java and good for auto discovery. Using plugins for a non-Java environment is slow.

Graphite is flexible, a popular monitoring tool for time series database.

Prometheus is flexible rule-based alerting and time series database metrics.

Elastic comes with Elastic search, log stash, and kibana. It’s picking up a lot of traction. Elastic Stack is extensible using X-PACK feature.

Grafana is best for visualizing time series database. Easy to get started and combine multiple backends. - - Grafana annotations easy to use and tag the events.

There is no one tool which fits everyone's case. You have to start somewhere. So pick up a monitoring tool, see if it works for you else try the next one til you settle down.

Deployment strategies with Kubernetes

This was talk with a live demo.

Canary deployment: Route a small amount of traffic to a new host to test functioning.

If new hosts don't act normal roll back the deployment.

Blue Green Deployment is a procedure to minimize the downtime of the deployment. The idea is to have two set of machines with identical configuration but one with the latest code, rev 2 and other with rev 1. Once the machines with latest code act correctly, spin down the machines with rev 1 code.

Then a demo of kubectl with adding a new host to the cluster and roll back.

A little bot for big cause

The talk is on a story on developing, push to GitHub, merge and release. And shit hits the fan. Now, what to do?

The problem is developer didn’t get the code reviewed.

How can automation help here?

Enforcing standard like I unreviewed merge is reverted using GitHub API, Slack Bot, Hubot.

As soon as developer opens a PR, alice, the bot adds a comment to the PR with the checklist. When the code is merged, bot verifies the checklist, if items are unchecked, the bot reverts the merge.

The bot can do more work. DM the bot in the slack to issue commands and bot can interact with Jenkins to roll back the deployed code.

The bot can receive commands via slack personal message.

Necessary tooling and monitoring for performance critical applications

The talk is about collecting metrics for German E-commerce company Otto.

The company receives two orders/sec, million visitors per day. On an average, it takes eight clicks/pages to complete an order.

Monitor database, response time, throughput, requests/second, and measure state of the system

Metrics everywhere! We talk about metrics to decide and diagnose the problem.

Metrics is a Clojure library to measure and record the data to the external system.

The library offers various features like Counter, gauges, meters, timers, histogram percentile.

Rather than extracting data from the log file, measure details from the code and write to the data store.

Third party libraries are available for visualization.

The demo used d3.js application for annotation and visualization. In-house solution.

While measuring the metrics, measure from all possible places and store separately. If the web application makes a call to the recommendation engine, collect the metrics from the web application and recommendation for a single task and push to the data store.

What should be PID 1 in a container?

In older version of Docker, Docker doesn’t reap child process correctly. As a result, for every request, docker spawns a new application and never terminated. This is called PID 1 zombie problem.

This will eat all available PIDs in the container.

Use Sysctl-a | grep pid_max to find maximum available PIDs in the container.

In the bare metal machine, PID 1 is systemd or any init program.

If the first process in the container is bash, then is PID 1 zombie process doesn’t occur.

Using bash is to handle all signal handlers is messy.

Yelp came up with Yelp/dumb-init. Now, dumb-init is PID 1 and no more zombie processes.

Docker-1.13, introduced the flag, --init.

Another solution uses system as PID 1

Docker allows running system without privilege mode.

Running system as PID 1 has other useful features like managing logs.

‘Razor’ sharp provision for bare metal servers

I attended only first half of the talk, fifteen minutes.

When you buy physical rack space in a data server how will you install the OS? You’re in Bangalore and server is in Amsterdam.

First OS installation on bare metal is hard.

There comes Network boot!

PXELinux is a syslinux derivative to boot OS from NIC card.

Once the machine comes up, DHCP request is broadcasted, and DHCP server responds.

Cobbler helps in managing all services running the network.

DHCP server, TFTP server, and config are required to complete the installation.

Microkernel in placed in TFTP server.

Razor is a tool to automate provisioning bare metal installation.

Razor philosophy, consume the hardware resource like the virtual resource.

Razor components - Nodes, Tags, Repository, policy, Brokers, Hooks

FreeBSD is not a Linux distribution

FreeBSD is a complete OS, not a distribution

Who uses? NetFlix, WhatsApp, Yahoo!, NetApp and more

Great tools, mature release model, excellent documentation, friendly license.

Now a lot of forks NetBSD, FreeBSD, OpenBSD and few more

Good file system. UFS, and ZFS. UFS high performance and reliable. - If you don’t want to lose data use ZFS!

Jails - GNU/Linux copied this and called containers!

No GCC only llvm/clang.

FreeBSD is forefront in developing next generation tools.

Pluggable TCP stacks - BBR, RACK, CUBIC, NewReno

Firewalls - Ipfw , PF

Dummynet - live network emulation tool

FreeBSD can run Linux binaries in userspace. It maps GNU/Linux system call with FreeBSD.

It can run on 256 cores machine.

Hard Ware - NUMA, ARM64, Secure boot/UEFI

Politics - Democratically elected core team

Join the Mailing list and send patches, you will get a commit bit.

Excellent mentor program - GSoC copied our idea.

FreeBSD uses SVN and Git revision control.

Took a dig at GPLV2 and not a business friendly license.

Read out BSD license on the stage.

#devops #notes #conference

0 notes