#Fluentbit
Logging Frameworks that can communicate directly with Fluent Bit
While the typical norm is for applications to write their logs to a file or to stdout (the console), this isn’t the most efficient way to handle logs (particularly given the I/O performance of storage devices). Many logging frameworks have addressed this by providing direct outputs to commonly used services such as Elasticsearch and OpenSearch. This is fine, but the downside is that there is…
Introducing Manticore Integration with Fluentbit
http://dlvr.it/StVmf4
Fluent Bit | Grafana Loki documentation
https://grafana.com/docs/loki/latest/clients/fluentbit/
Kubernetes must-knows:
First, know that Kubernetes has competitors such as Docker Swarm, HashiCorp Nomad, and Apache Mesos, and that Kubernetes is not the solution for every architecture. Define your requirements and check the alternatives before starting with Kubernetes: it can be complex, and sometimes not beneficial enough to justify that complexity when a simpler orchestrator can do the job.
If you are using a cloud provider and want a managed Kubernetes service, check EKS on AWS, GKE on Google Cloud, or AKS on Azure.
Make sure to have proper monitoring and alerting for your cluster: this gives you visibility and eases the management of containerized infrastructure by tracking the utilization of cluster resources, including memory, CPU, storage, and networking performance. It is also recommended to monitor the pods and applications in the cluster. The most common tools for Kubernetes monitoring are ELK/EFK, Datadog, and Prometheus with Grafana (which will be my topic for the next article).
Please make sure to backup your cluster’s etcd data regularly.
To ensure that your Kubernetes cluster resources are only accessed by the right people, it's recommended to use RBAC to build roles with the right access.
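As a minimal sketch of what such a role looks like (the namespace, user, and role names here are illustrative, not from the original post):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a        # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
  - kind: User
    name: jane             # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Scoping Roles to a namespace (rather than using ClusterRoles everywhere) keeps the blast radius of each binding small.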
Scalability matters, and three autoscaling mechanisms you should know and include in your cluster architecture are the Cluster Autoscaler, the Horizontal Pod Autoscaler (HPA), and the Vertical Pod Autoscaler (VPA).
Resource management is important as well: setting and right-sizing resource requests and limits helps you avoid issues like OOM kills and pod evictions, and saves you money!
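For instance, requests and limits are declared per container in the pod spec (a sketch; the names, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app           # hypothetical pod name
spec:
  containers:
    - name: app
      image: demo-app:1.0  # hypothetical image
      resources:
        requests:          # what the scheduler reserves for the container
          cpu: "250m"
          memory: "256Mi"
        limits:            # hard caps enforced at runtime
          cpu: "500m"
          memory: "512Mi"
```

Requests drive scheduling decisions, while limits are what the kubelet enforces, so right-sizing both is what prevents the OOM kills and evictions mentioned above.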
You may want to check the Kubernetes CIS Benchmark, a set of recommendations for configuring Kubernetes to support a strong security posture.
Try to stay on the latest stable GA Kubernetes version for newer functionality and, if you're in the cloud, to stay within the provider's supported versions.
Scanning containers and clusters for security vulnerabilities is very important as well; here we can talk about tools like Kube Hunter, Kube Bench, etc.
Make use of admission controllers when possible. They intercept and process requests to the Kubernetes API after the request is authenticated and authorized but prior to persistence of the object, which is useful when you have a set of constraints or behaviors to check before a resource is deployed. They can also block vulnerable images from being deployed.
Speaking of admission controllers, you can also enforce policies in Kubernetes using a tool like OPA, which lets you define sets of security and compliance policies as code.
Use a tool like Falco for auditing the cluster; it's a nice way to log and monitor real-time activities and interactions with the API.
Another thing to look at is how to handle logging for applications running in containers (I recommend checking logging agents such as Fluentd/Fluent Bit), and especially how to set up log rotation to limit storage growth and avoid performance issues.
In case you have multiple microservices running in the cluster, you can also implement a service mesh solution in order to have a reliable and secure architecture and other features such as encryption, authentication, authorization, routing between services and versions and load balancing. One of the famous service mesh solutions is Istio. You can take a look at this article for more details about service mesh.
One of the most important production-ready cluster features is a backup and restore solution, and especially a solution for taking snapshots of your cluster’s Persistent Volumes. There are multiple tools for this that you might check and benchmark, like Velero, Portworx, etc.
You can use ResourceQuotas and LimitRanges to control the amount of resources in a namespace for multi-tenancy.
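A sketch of both namespace-level controls (the namespace and numbers are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "4"      # total CPU all pods may request
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:             # applied when a container sets no limits
        cpu: "500m"
        memory: 512Mi
      defaultRequest:      # applied when a container sets no requests
        cpu: "250m"
        memory: 256Mi
```

A LimitRange is a useful companion to a ResourceQuota: once a quota covers requests/limits, pods that omit them are rejected unless the LimitRange fills in defaults.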
For multi-cluster management, you can check Rancher, Weave Flux, Lens, etc.
Accelerating your application modernization with Amazon Aurora Machine Learning
Organizations that store and process data in relational databases are making the shift to the cloud. As part of this shift, they often wish to modernize their application architectures and add new cloud-based capabilities. Chief among these are machine learning (ML)-based predictions such as product recommendations and fraud detection. The rich customer data available in relational databases is a good basis for transforming customer experiences and business operations. Organizations that store and process relational data seek to adopt ML services in the cloud quickly and broadly. In this post, we see how Amazon Aurora Machine Learning, a feature of Amazon Aurora, makes it easy to make ML-based predictions on relational data, using a video game as a use case.

Databases and ML

Incorporating an ML algorithm into a software application has traditionally required a lengthy integration process. It typically involves a data scientist, who selects and trains the model, and in some cases an application developer, who needs to write application code to read data from the database, format it for the ML algorithm, call an ML service such as Amazon SageMaker to run the algorithm, format the output, and retrieve the results back to the application. It can take several days or weeks to build an integration that achieves an application’s scalability requirements, especially if there are low-latency requirements measured in milliseconds, which is typical for product recommendations, fraud detection, and many other applications. And after the integration is built, it requires maintenance when updating the ML algorithm or when the input data deviates from the training data. Aurora Machine Learning simplifies the integration process by making ML algorithms available to run via SQL functions directly in the application.
After you deploy the algorithm in SageMaker, you can run a SQL query in Aurora, and Aurora does the heavy lifting of efficiently transferring the data to SageMaker, managing input and output formatting, and retrieving the results. In our example, we show how to train a fraud detection model, then deploy it to SageMaker so a software developer with little to no ML expertise can add predictions via SQL statements.

Let’s play a game!

To demonstrate how we train a fraud detection algorithm and call it from an application via SQL statements, we use a video game use case. Our goal is to find cheaters—players who write bots to play on their behalf or coordinate with other players to gain an advantage over legitimate players. Let’s explore which ML models can detect these cheats and how to run the analysis from the customer application using SQL.

The scenario that we simulate uses a successful multiplayer game launched in late 2020 on AWS. Although the game is a lot of fun, the customer care team received complaints about players cheating in the game. Our task is to catch these cheats and remove them from the game, so we want to build a cheat detection system that extends the customer care application and provides hints with good efficacy. We assume the customer care application uses an Aurora MySQL database, and we minimize changes to the application by using Aurora MySQL tools.

The game we use is SuperTuxKart, a free and open-source kart racing game. Players take actions like collecting and using cans of Nitro. They also use various power-up items obtained by driving into item boxes laid out on the course. These power-ups include mushrooms to give players a speed boost, Koopa shells to be thrown at opponents, and banana peels and fake item boxes that can be laid on the course as hazards. Player actions are defined as a collection of game actions such as kart steer, brake, drift, look back, and more.
Cheating allows bots to benefit from power-up items while steering or braking the karts. Our main goal is to classify player behavior and distinguish between human and bot actions. After classifying suspicious player actions, we cross-reference these players with other customer records that are already stored in the application’s database. Key data includes customer care history such as in-game microtransactions and customer care events. Therefore, we train two models: the first identifies bots by classifying player moves, and the second detects suspicious in-game microtransactions.

The data pipeline and schema

In our scenario, the game server runs in Amazon Elastic Kubernetes Service (Amazon EKS). The player actions are written to the game server standard output. We used the Fluent Bit project to stream the server stdout to Amazon Kinesis Data Firehose. Kinesis Data Firehose stores the player actions in Amazon Simple Storage Service (Amazon S3), and we load the data using an AWS Lambda function into an Aurora table called actions. To enable rapid response to cheat activities in the game, we need to minimize the ingestion latency. In our example, we ingest the game events as online transactions. The time it takes to get player actions into Aurora from the time the game action took place is a few minutes, and scales horizontally as Kinesis Data Firehose and Lambda scale. The game play actions are defined as the struct Action:

```c
struct Action {
    int p_guid;
    int m_ticks;
    int m_kart_id;
    int m_action;
    int m_value;
    int m_value_l;
    int m_value_r;
};
```

The game server emits player game action logs in near-real time as the game progresses. p_guid is the player’s unique identifier, and m_ticks is a counter that increments upon any player action. m_kart_id is the player’s kart unique ID. The m_value, m_value_l, and m_value_r fields indicate the action’s magnitude; for example, when a player attempts to slow down abruptly, the brake action carries the max integer 32768.
It’s similar for acceleration and kart steering.

```sql
create table if not exists actions (
  id int NOT NULL AUTO_INCREMENT,
  p_guid int,
  m_ticks int,
  m_kart_id int,
  m_action int,
  m_value int,
  m_value_l int,
  m_value_r int,
  class tinyint,
  primary key (id)
);
```

To train the cheat detection model, we facilitated hundreds of legitimate multiplayer game sessions and bot-simulated game sessions. (One of this post’s authors played many rounds of SuperTuxKart with his 9-year-old son—not a bad way to gain a reputation as a cool dad.) We used the class field to manually classify the game sessions into legitimate and bot sessions. Prior to each game session, we captured the last game sequence ID; after the session, we updated the class column for that sequence of player actions to 1 for a bot-simulated session or 0 for a legitimate game session:

```sql
update actions set class=1 where id>Num and class is null;
```

Formulating the ML problem

The next step is to look at legitimate player actions and compare them with non-legitimate player (bot) actions. We used SageMaker Jupyter notebooks to discover trends that distinguish between the two groups. In the following graphs, the X axis is the player ID (id) and the Y axis is the value of the ticks (m_ticks). The red plot shows bot game actions, and the blue plot shows legitimate human player actions. We can see that the bot game action frequency was more consistent than a legitimate human player’s, which gives us a way to differentiate between the two, as we now discuss. The game simulates a kart’s motions, which move at a dynamic acceleration along a non-straight line. We can use fundamental kinematic physics to calculate the average velocity and acceleration changes and train a linear regression-based model that predicts bot or human kart velocity and acceleration. We found that the values of the actions generated by a bot are distributed differently than a human player’s.
We attribute the findings to a naively written bot, and to the behavior of the specific human player level, which tends to generate more hectic action values than a bot that knows the right path to take. In the real world, bot writers improve their bots continuously to avoid detection, and we have to continuously refine our detection capabilities. The good news is that the methodology we propose here is not limited to the specific bot implementation, and can indeed be continuously refined. In the following section, we package the SQL statements with MySQL views that calculate the actions’ velocity and acceleration, for brevity and demonstration purposes. Let’s first calculate the player actions’ velocity, vel, in a session for bots and humans, using prev.class=curr.class, as follows:

```sql
create or replace view v_actions_m_value_velocity as
select id,m_action,m_kart_id,
       c_v,p_v,(c_v-p_v) vel,
       c_v_l,p_v_l,(c_v_l-p_v_l) vel_l,
       c_v_r,p_v_r,(c_v_r-p_v_r) vel_r,
       party_size,session,class
from (
  select curr.id,curr.m_action,curr.m_kart_id,
         curr.m_value c_v,prev.m_value p_v,
         curr.m_value_l c_v_l,prev.m_value_l p_v_l,
         curr.m_value_r c_v_r,prev.m_value_r p_v_r,
         curr.party_size,curr.session,curr.class
  from actions prev, actions curr
  where prev.id=curr.id-1
    and prev.class=curr.class
    and curr.m_kart_id=prev.m_kart_id
    and curr.m_action=prev.m_action
    and curr.party_size=prev.party_size
    and curr.session=prev.session
    and party_size=2
  order by curr.m_kart_id,curr.id
) v
```

In this example, we assume a session (curr.session=prev.session) is tagged (bot or human) during the data ingestion to Aurora. We also include only moves made by a single player (curr.m_kart_id=prev.m_kart_id), with the same party size (curr.party_size=prev.party_size) and the same classification (prev.class=curr.class).
We then use the velocity values and calculate the average acceleration, accel, for bots and humans in a similar way, as follows:

```sql
create or replace view v_actions_m_value_accel as
select id,m_action,m_kart_id,
       c_v,p_v,c_vel,p_vel,(c_vel-p_vel) accel,
       c_v_l,p_v_l,c_vel_l,p_vel_l,(c_vel_l-p_vel_l) accel_l,
       c_v_r,p_v_r,c_vel_r,p_vel_r,(c_vel_r-p_vel_r) accel_r,
       party_size,session,class
from (
  select curr.id,curr.m_action,curr.m_kart_id,
         curr.c_v,curr.p_v,curr.vel c_vel,prev.vel p_vel,
         curr.c_v_l,curr.p_v_l,curr.vel_l c_vel_l,prev.vel_l p_vel_l,
         curr.c_v_r,curr.p_v_r,curr.vel_r c_vel_r,prev.vel_r p_vel_r,
         curr.party_size,curr.session,curr.class
  from v_actions_m_value_velocity prev, v_actions_m_value_velocity curr
  where prev.id=curr.id-1
    and prev.class=curr.class
    and curr.m_kart_id=prev.m_kart_id
    and curr.m_action=prev.m_action
    and curr.party_size=prev.party_size
    and curr.session=prev.session
    and curr.party_size=2
  order by curr.m_kart_id,curr.id
) v
```

To observe the acceleration and velocity patterns, we populated two DataFrames using the following simple queries:

```sql
select id,accel,class from v_actions_ticks_accel where class=0
select id,accel,class from v_actions_ticks_accel where class=1
```

As we discussed earlier, the class column differentiates between bots and humans: class=1 is bot acceleration, class=0 is human acceleration. We can see that the kart average acceleration values accel generated by bots (class=1) scatter across a broader range of values, whereas human game actions (class=0) tend to be extreme. The average acceleration distribution can be used as a logistic function to model the classification binary dependent variable that indicates whether an action was made by a bot or a human. Therefore, we use the SageMaker linear learner built-in algorithm to predict human or bot action, and combine this player move model with a separate, in-game transaction fraud detection model for a fuller picture.
The cheater detection process

We used Aurora as the data source for data exploration in our Jupyter notebook using the MySQL Python client, and also used Aurora to prepare the data for model training. After the model was trained, we hosted it in SageMaker with the endpoint name stk-bot-detect-actions. We defined a function in Aurora that calls the classification model against freshly streamed player data, as in the following code:

```sql
DROP FUNCTION IF EXISTS bot_detect_actions_score;
CREATE FUNCTION bot_detect_actions_score(
  value int, velocity int, accel int,
  value_l int, velocity_l int, accel_l int,
  value_r int, velocity_r int, accel_r int,
  m_action_0 int, m_action_1 int, m_action_2 int,
  m_action_3 int, m_action_4 int, m_action_5 int,
  m_action_6 int
) RETURNS varchar(256)
alias aws_sagemaker_invoke_endpoint
endpoint name 'stk-bot-detect-actions';
```

For more information about calling SageMaker endpoints from Aurora, and how the two services work together to simplify ML integration into your applications, see Using machine learning (ML) capabilities with Amazon Aurora.

Our model endpoint accepts a player action record in a multi-player session. The record includes the action value, the average velocity, and the average acceleration of the player move. The idea is that the call to the model is done via a SQL query triggered by the customer care app. The app queries the MySQL views v_actions_m_value_accel and m_action_encoding. The following query scans unclassified game records (class is null) and assumes that unclassified game events are the latest to be scanned:

```sql
SELECT bot_detect_actions_score(
         c_v,c_vel,accel,
         c_v_l,c_vel_l,accel_l,
         c_v_r,c_vel_r,accel_r,
         t2.i_0,t2.i_1,t2.i_2,t2.i_3,t2.i_4,t2.i_5,t2.i_6) as cls
FROM v_actions_m_value_accel t1, m_action_encoding t2
WHERE t1.m_action=t2.m_action and class is null
```

The model query returns suspicious player moves as classified by our model, when cls>0.
It’s a good starting point for further investigation of these players, but not necessarily the final determination that these are bots. We also use m_action_encoding, which is populated in the notebook after encoding (OneHotEncoding) the m_action values for better model accuracy. A customer care representative could now call other models against these suspicious users to get a more accurate picture. For example, the customer care application might use a player microtransaction classifier or player auth activities using the following MySQL queries:

```sql
SELECT t.timestamp, t.playerGuid
FROM (SELECT timestamp, playerGuid,
             auth_cheat_score(uagent, day, month, hour, minute, src_ip_encoded) cls
      FROM auth) AS t
WHERE cls>0;

SELECT t.timestamp, t.playerGuid
FROM (SELECT playerGuid, timestamp,
             trans_cheat_score(month, day, hour, minute, name_encoded, uagent) cls
      FROM transactions t) AS t
WHERE cls>0;
```

Cheat detection is an ongoing game of cat and mouse between us and the cheaters. As soon as they discover the methods we employ, they’ll surely learn to overcome them. For example, they may write bots that produce less predictable player ticks, so the ML problem morphs and requires continuous data exploration. Detecting bots from players’ actions requires us to look at game session snippets with all their attributes, such as a series of player ticks, activities, and values for a specific player. The supervised algorithms employ a logistic function to model the probability of a bot or a human. We could also explore other model options, such as Naive Bayes or KNN, which are outside the scope of this post.

How a customer care operator can use the model

Our solution implements a stored procedure that, given a player name, compiles the user’s recent game sessions, queries the model, and updates the classification prediction in the players’ session table (ticks_session_sample).
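A minimal sketch of what such a procedure could look like; the procedure name, the predicted_class column, and the join through the views are hypothetical illustrations layered on the schema above, not the post's actual implementation:

```sql
DROP PROCEDURE IF EXISTS classify_player_sessions;
DELIMITER //
CREATE PROCEDURE classify_player_sessions(IN p_player_guid INT)
BEGIN
  -- Score each of the player's still-unclassified recent moves
  -- and persist the prediction in the session table.
  UPDATE ticks_session_sample t
  JOIN v_actions_m_value_accel a ON a.id = t.id
  JOIN m_action_encoding e ON e.m_action = a.m_action
  SET t.predicted_class = bot_detect_actions_score(
        a.c_v, a.c_vel, a.accel,
        a.c_v_l, a.c_vel_l, a.accel_l,
        a.c_v_r, a.c_vel_r, a.accel_r,
        e.i_0, e.i_1, e.i_2, e.i_3, e.i_4, e.i_5, e.i_6)
  WHERE t.p_guid = p_player_guid
    AND t.predicted_class IS NULL;
END //
DELIMITER ;
```

Because the SageMaker call is wrapped in a SQL function, the customer care app only needs a `CALL classify_player_sessions(...)` statement, with no ML-specific client code.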
A customer care application can expose the cheating indications on the player page that a customer service representative views. The representative could trigger calls to other models for detecting potential fraud, such as credit card fraud or suspicious logins. After the representative is satisfied that the determination (human or bot) is correct, we can add the results into the next training run of our ML model.

Try it yourself

You can try this end-to-end solution, but for those who don’t have time to set up the EKS cluster, deploy the game server, and train the model, we offer a sample dataset that we trained. If you choose to use the sample dataset, skip steps 1, 2, 4, and 5. You can load the file into your Aurora MySQL cluster and train the model as instructed in step 6.

1. Create an EKS cluster with a worker node group.
2. Deploy the game server.
3. Create an Aurora MySQL cluster and allow SageMaker calls from the database.
4. Configure the data pipeline: enable player network datagrams, create the Kinesis Data Firehose delivery stream, and deploy Fluent Bit to stream the player actions to Kinesis Data Firehose.
5. Play the game, a lot! Then play it against bots.
6. Train and deploy the model.
7. Play another game with a bot and call the function. Hopefully, you catch the bot!

Conclusion

ML adoption is a complete process that includes integration into data sources, model training, inference, and continuous updating and refinement. As you build or move your applications to the cloud, make sure to take advantage of the ML services and tools AWS built. We encourage you to read recent announcements about these topics, including several at AWS re:Invent 2020. If you’re not ready to build your own models, you can still work with a data scientist or use the many pre-built models available.

About the Authors

Yahav Biran is a Solutions Architect at AWS, focused on game tech at scale. Yahav enjoys contributing to open-source projects and publishes in the AWS blog and academic journals.
He currently contributes to the K8s Helm community, the AWS databases and compute blogs, and the Journal of Systems Engineering. He delivers technical presentations at technology events and works with customers to design their applications in the cloud. He received his PhD in Systems Engineering from Colorado State University.

Yoav Eilat is a Senior Product Manager for Amazon RDS and Amazon Aurora. He joined AWS in 2016 after holding product roles for several years at Oracle and other technology companies. At AWS, he managed the launches of Aurora PostgreSQL, Aurora Serverless, Aurora Global Database, and other major features. Yoav currently focuses on new capabilities for the MySQL and PostgreSQL database engines.

https://aws.amazon.com/blogs/database/accelerating-your-application-modernization-with-amazon-aurora-machine-learning/
It’s been awhile y’all
It's been a hot minute since I've documented some of my work, so I guess in keeping with making a main blog post, I'll make a devblog post today too.
cfn-mode / flycheck-cfn
https://gitlab.com/worr/cfn-mode/
I've been an Emacs user for some time, and at my current job I've been hurting for good support for CloudFormation templates in my editor. I wrote this mode and flychecker to at least add some basic syntax highlighting and linter support. I'm currently in the process of getting them added to MELPA.
imdb-api
I made a bunch of changes fairly recently to imdb-api, most notably adding front-end support, migrating to GitLab, and migrating to ky after node-requests was deprecated. Normally I'd link patches, but there are too many since my last update. Here's the changelog: https://gitlab.com/worr/node-imdb-api/-/blob/master/CHANGELOG.md
fluent-bit
At work, we discovered an issue where our fluent-bits were sticky to the same instance of fluentd when we turned on keepalive and used a load balancer.
To mitigate this, I ended up adding a new option to fluent-bit that recycles keepalive connections after a number of messages have been sent, so clients cycle between backend instances periodically.
https://github.com/fluent/fluent-bit/commit/44190c2a1c4b939dc9ecb2908148d38c82a40831
https://github.com/fluent/fluent-bit-docs/commit/8d43b502123e366a1722a0051918ce7d78a8506b
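In Fluent Bit's config this surfaces as a `net.*` property on the output; a sketch of what using it looks like (the host is hypothetical, and the exact option name should be checked against the current docs):

```ini
[OUTPUT]
    Name                      forward
    Match                     *
    Host                      fluentd.internal   # hypothetical fluentd VIP behind the LB
    Port                      24224
    net.keepalive             on
    # Recycle the keepalive connection after this many uses, so the next
    # connect gets rebalanced to a (potentially) different backend.
    net.keepalive_max_recycle 2000
```

Without the recycle cap, a keepalive connection pinned by the load balancer never gets a chance to land on another fluentd instance.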
fluentd-s3-plugin
Also at work, we found a case where the fluentd plugin for S3 would spend forever trying to upload logs. By default, the naming scheme for the log chunks is something like <time_slice>_<idx>. The time slice is the time when the log was uploaded, and the idx value is a monotonically increasing integer.
The problem is that if you have multiple threads uploading (or multiple fluentd instances, or both), each has to check for the presence of the formulated filename to ensure it doesn't already exist. Additionally, the plugin doesn't track the last-used index, so when doing this check, fluentd starts at 1, checks, increments, checks again, increments again, and so on. This obviously doesn't scale very well when you are outputting a ton of logs.
We fixed this by changing our file format to include a UUID and disabling the behavior that checks for collisions.
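In config terms, the fix looked roughly like this (bucket, path, and match pattern are hypothetical; verify the option names against the fluent-plugin-s3 docs):

```
<match logs.**>
  @type s3
  s3_bucket my-log-bucket        # hypothetical bucket
  s3_region us-east-1
  path logs/
  # A UUID in the object key means concurrent writers can't collide...
  s3_object_key_format "%{path}%{time_slice}_%{uuid_flush}.%{file_extension}"
  # ...so the existence check (and its 1..N probe loop) can be turned off
  check_object false
</match>
```

The key point is that once the key is unique by construction, the per-upload "does this object exist yet?" round-trips to S3 are pure overhead and can be disabled.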
However, since the defaults are dangerous, I've submitted this PR to try to make things less dangerous for new users (not yet accepted at the time of this writing).
https://github.com/fluent/fluent-plugin-s3/pull/355/files
This works by tracking the last used index in an atomic that's shared between threads. As outlined in the PR, it doesn't solve the whole problem, but it does make the defaults considerably safer.
logging-operator
Perhaps you've noticed a theme with my recent, work-driven contributions. :)
logging-operator is a Kubernetes operator that automates administration of logging infrastructure in a Kubernetes cluster. I've been contributing a bit to it lately, since we adopted it fairly early and have needed to add a few features to make it work for us.
This first diff adds support not just for the configurable I added to fluent-bit (mentioned earlier), but exposes all net parameters as configurables.
https://github.com/banzaicloud/logging-operator/commit/3c9e3938590209716918bc7cc197b43b09bb4361
There was a string-conversion bug in how the operator reported on Prometheus configuration.
https://github.com/banzaicloud/logging-operator/commit/86503b52150cf0dcf62d4b636eb247d0807101e7
We needed to configure where in S3 fluentd uploads these logs.
https://github.com/banzaicloud/logging-operator/commit/29fccfc2b8cee6c38c88fb34cf73a112eeb534de
We also needed much more support for configuring certain scheduling attributes in fluentd and fluent-bit.
https://github.com/banzaicloud/logging-operator/commit/45dffe5ebb38a3dbba4ecb217235f45c13f7856e
https://github.com/banzaicloud/logging-operator/commit/961fd788bb90f8f46d188a731aac0a916b30f933
https://github.com/banzaicloud/logging-operator/commit/0ec91f72831e1e63bd560224450454b33084553d
I also had to expose a number of these features in their Helm charts.
https://github.com/banzaicloud/logging-operator/commit/efc74711c5336063a6da72bf39239c57c81c7dff
https://github.com/banzaicloud/logging-operator/commit/f581da2e9daadae9b786362f69d379f8151ad918
https://github.com/banzaicloud/logging-operator/commit/4e74e36dfe7d63212b19401fe645a198734da1fd
wsh
Someone reached out to me privately to report several Solaris 11 compatibility bugs in wsh, my multi-host SSH tool.
Use -std=c11 flag with SunStudio: https://github.com/worr/wsh/commit/b11d2668ef6b85913d1901cfbfe6eb612be69bdc
Don't use __attribute__ with SunStudio, since none of the ones I used were supported: https://github.com/worr/wsh/commit/25ed3fc6fa36a1202e33c8fb36893d03cd5bce8c
Don't unconditionally compile in memset_s (found because Solaris actually has a libc with this function): https://github.com/worr/wsh/commit/3876745a5cc4bce80d5e9fff0ab70b2dc429287f
This also led to a protobuf PR for better SunStudio support, which it looks like I need to follow up on.
https://github.com/protocolbuffers/protobuf/pull/7049/files
python-fido2
Last post, I mentioned I was working on getting my yubikey to work on OpenBSD. Part of that included adding support in ykman, which also required changes in python-fido2.
First, I added general OpenBSD support
https://github.com/Yubico/python-fido2/pull/82/files
This impl is arguably a bit brittle, since I essentially had to build the device probing from scratch in Python, using the primitives from libusbhid to probe every uhid(4) device to see if it was a YubiKey.
However, some time later, fido(4) was rolled into OpenBSD, meaning this code could be greatly simplified. I think someone reached out to me about this directly? I don't really remember, since it was a while ago.
https://github.com/Yubico/python-fido2/pull/87/files
What a year
That's basically been the last year or so for me. Honestly, it's been a weird one, and I haven't been able to really do as much OSS as I've wanted to. A lot of it has been through work, which while nice, doesn't touch the types of projects that I want to be doing.
I am working on a gemini server on OpenBSD, which has been feeling quite rewarding, and I have other projects kicking around in my head that I'm going to be following up on.
Learn how to improve your container observability through better application logging using Fluentd & Fluent Bit on Amazon ECS & Amazon EKS. https://t.co/UoZpmqaASq
Checking your OpenTelemetry pipeline with Telemetrygen
Testing OpenTelemetry configuration pipelines without resorting to instrumented applications, particularly for traces, can be a bit of a pain. Typically, you just want to validate that you can get an exported/generated signal through your pipeline, which may not even be built on the OpenTelemetry Collector (e.g., it could be Fluent Bit or a commercial solution such as Datadog). This led to the creation of Tracegen, and then the…
Mastering FluentD configuration syntax
Getting to grips with FluentD configuration, which describes how it should handle the logging events it processes, can be a little odd (at least in my opinion) until you appreciate a couple of foundation points, at which point things start to click, and then you’ll find it pretty easy to understand.
It would be hugely helpful if the online documentation provided some of the points I’ll highlight…
New Article for SE Daily...
We’ve just had a new article published in Software Engineering Daily that looks at monitoring in multi-cloud and hybrid use cases and highlights some strategies that can help support a single pane of glass by exploiting features in tools such as Fluentd and Fluent Bit that perhaps aren’t fully appreciated. Check it out: Challenges of Multi-Cloud and Hybrid Monitoring
A practical summary of big data collection and storage on Kubernetes
1. Foreword
Recently, the first phase of the e-commerce big data platform built within our department came to a close. Not only did we go from zero to one on a big data platform for the team in a short time, but it also satisfies the needs of multiple business stakeholders. I was fortunate to take part in building it, and besides applauding an excellent team, I took some time to organize my notes. Today I'd like to talk about how we implemented data collection and storage with Kubernetes, covering the solution, the principles, and the process. Here is the big data architecture diagram, borrowed from Alibaba, that we referenced in the early design phase:
This article focuses on the "data collection" part of the diagram above; the "data computation" and "data serving" parts are not covered here. For data collection, a cleansing service running in Kubernetes continuously consumes business data that scheduled jobs feed into Kafka, and log collection tools such as Fluent Bit and Fluentd compress the data that containers print to standard output and store it in AWS S3. If this interests you, let's get started.
2. Basics
2.1 Docker log management
Our application services all run in Docker containers. Docker has two kinds of logs: the engine logs of the dockerd runtime, and the container logs produced by services inside the containers. We don't need to care about the engine logs here. Container logs are the logs that reach standard output (stdout) and standard error (stderr); logs from other sources are not managed by Docker, while Docker takes everything containers write to stdout and …
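As one concrete way to bound the growth of those stdout logs (a sketch; the sizes are illustrative and should be tuned for your workload), Docker's default json-file log driver supports rotation options in /etc/docker/daemon.json:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
```

With this in place, each container keeps at most five 50 MB log files on the node, which protects node disks while collectors such as Fluent Bit tail the current file.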
from 基于Kubernetes实现的大数据采集与存储实践总结 via KKNEWS