#performance metrics pattern microservices example
generativeinai · 25 days ago
How AIOps Platform Development Is Revolutionizing IT Incident Management
In today’s fast-paced digital landscape, businesses are under constant pressure to deliver seamless IT services with minimal downtime. Traditional IT incident management strategies, often reactive and manual, are no longer sufficient to meet the demands of modern enterprises. Enter AIOps (Artificial Intelligence for IT Operations)—a game-changing approach that leverages artificial intelligence, machine learning, and big data analytics to transform the way organizations manage and resolve IT incidents.
In this blog, we delve into how AIOps platform development is revolutionizing IT incident management, improving operational efficiency, and enabling proactive issue resolution.
What Is AIOps?
AIOps is a term coined by Gartner, referring to platforms that combine big data and machine learning to automate and enhance IT operations. By aggregating data from various IT tools and systems, AIOps platforms can:
Detect patterns and anomalies
Predict and prevent incidents
Automate root cause analysis
Recommend or trigger automated responses
How AIOps Is Revolutionizing Incident Management
1. Proactive Issue Detection
AIOps platforms continuously analyze massive streams of log data, metrics, and events to identify anomalies in real time. Using machine learning, they recognize deviations from normal behavior—often before the end-user is affected.
🔍 Example: A retail platform detects abnormal latency in the checkout API and flags it as a potential service degradation—before users start abandoning their carts.
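Under the hood, detection like this often starts with a statistical baseline. Here is a minimal, illustrative sketch in JavaScript: a rolling z-score check on latency samples. The window size and threshold are assumptions for the example, not tuned values.

```js
// Illustrative anomaly detection: flag a latency sample when it deviates
// too far from the rolling baseline of recent samples.
function makeLatencyDetector(windowSize = 60, zThreshold = 3) {
  const window = [];
  return function check(latencyMs) {
    let result = { anomaly: false };
    if (window.length === windowSize) {
      const mean = window.reduce((a, b) => a + b, 0) / window.length;
      const variance =
        window.reduce((a, b) => a + (b - mean) ** 2, 0) / window.length;
      const stdDev = Math.sqrt(variance) || 1; // guard against flat traffic
      const z = (latencyMs - mean) / stdDev;
      result = { anomaly: z > zThreshold, zScore: z, baselineMs: mean };
    }
    window.push(latencyMs);
    if (window.length > windowSize) window.shift();
    return result;
  };
}

// Feed each checkout-API latency sample as it arrives:
const checkCheckout = makeLatencyDetector();
const verdict = checkCheckout(950);
if (verdict.anomaly) console.log('Potential service degradation:', verdict);
```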
2. Noise Reduction Through Intelligent Correlation
Instead of flooding teams with redundant alerts, AIOps platforms correlate related events across systems. This reduces alert fatigue and surfaces high-priority incidents that need attention.
🧠 Example: Multiple alerts from a database, server, and application layer are grouped into a single, actionable incident, pointing to a failing database node as the root cause.
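A simplified view of how such correlation can work: collapse alerts that arrive within a time window and share a suspected component into one incident. The field names below are assumptions for the sketch, not any specific platform's schema.

```js
// Illustrative event correlation: alerts close in time that point at the
// same component are merged into a single incident.
function correlateAlerts(alerts, windowMs = 5 * 60 * 1000) {
  const incidents = new Map();
  for (const alert of alerts) {
    const key = alert.component; // inferred component ties the layers together
    const incident = incidents.get(key);
    if (incident && alert.timestamp - incident.lastSeen <= windowMs) {
      incident.alerts.push(alert);
      incident.lastSeen = alert.timestamp;
    } else {
      incidents.set(key, { component: key, alerts: [alert], lastSeen: alert.timestamp });
    }
  }
  return [...incidents.values()];
}

const incidents = correlateAlerts([
  { component: 'db-node-3', source: 'database', timestamp: 1000 },
  { component: 'db-node-3', source: 'server', timestamp: 61000 },
  { component: 'db-node-3', source: 'application', timestamp: 120000 },
]);
// => one incident with three correlated alerts pointing at db-node-3
```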
3. Accelerated Root Cause Analysis (RCA)
AI algorithms perform contextual analysis to identify the root cause of an issue. By correlating telemetry data with historical patterns, AIOps significantly reduces the Mean Time to Resolution (MTTR).
⏱️ Impact: What used to take hours or days now takes minutes, enabling faster service restoration.
4. Automated Remediation
Advanced AIOps platforms can go beyond detection and diagnosis to automatically resolve common issues using preconfigured workflows or scripts.
⚙️ Example: Upon detecting memory leaks in a microservice, the platform automatically scales up pods or restarts affected services—without human intervention.
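A remediation workflow of this kind can be as simple as mapping known incident types to runbook actions. The sketch below shells out to kubectl for brevity; the incident shape and service names are placeholders, and a real platform would call the Kubernetes API with proper auth, guardrails, and audit logging.

```js
// Sketch of a preconfigured remediation workflow: known issue types map to
// runbook actions executed without human intervention.
const { execFile } = require('node:child_process');

const runbooks = {
  'memory-leak': (service) => ['rollout', 'restart', `deployment/${service}`],
  'cpu-saturation': (service) => ['scale', `deployment/${service}`, '--replicas=6'],
};

function remediate(incident) {
  const buildArgs = runbooks[incident.type];
  if (!buildArgs) return; // unknown issue type: escalate to a human instead
  execFile('kubectl', buildArgs(incident.service), (err, stdout) => {
    if (err) console.error('Remediation failed, paging on-call:', err.message);
    else console.log('Remediation applied:', stdout.trim());
  });
}

// Placeholder incident; field names are assumptions for the sketch.
remediate({ type: 'memory-leak', service: 'checkout-api' });
```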
5. Continuous Learning and Improvement
AIOps systems improve over time. With every incident, the platform learns new patterns, becoming better at prediction, classification, and remediation—forming a virtuous cycle of operational improvement.
Benefits of Implementing an AIOps Platform
Improved Uptime: Proactive incident detection prevents major outages.
Reduced Operational Costs: Fewer incidents and faster resolution reduce the need for large Ops teams.
Enhanced Productivity: IT staff can focus on innovation instead of firefighting.
Better User Experience: Faster resolution leads to fewer service disruptions and happier customers.
Real-World Use Cases
🎯 Financial Services
Banks use AIOps to monitor real-time transaction flows, ensuring uptime and compliance.
📦 E-Commerce
Retailers leverage AIOps to manage peak traffic during sales events, ensuring site reliability.
🏥 Healthcare
Hospitals use AIOps to monitor critical IT infrastructure that supports patient care systems.
Building an AIOps Platform: Key Components
To develop a robust AIOps platform, consider the following foundational elements (a minimal wiring sketch in code follows the list):
Data Ingestion Layer – Collects logs, events, and metrics from diverse sources.
Analytics Engine – Applies machine learning models to detect anomalies and patterns.
Correlation Engine – Groups related events into meaningful insights.
Automation Framework – Executes predefined responses to known issues.
Visualization & Reporting – Offers dashboards for monitoring, alerting, and tracking KPIs.
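To make the flow concrete, here is a minimal sketch of how these five layers might be composed. Each function argument is a stand-in for a real subsystem (stream consumers, ML models, a rules engine, dashboards); only the composition is the point.

```js
// Minimal sketch wiring the five layers above into one event path.
function buildAiopsPipeline({ ingest, analyze, correlate, automate, report }) {
  return async function onEvent(rawEvent) {
    const event = ingest(rawEvent);        // data ingestion layer: normalize
    const findings = analyze(event);       // analytics engine: detect anomalies
    const incidents = correlate(findings); // correlation engine: group events
    for (const incident of incidents) {
      await automate(incident);            // automation framework: run playbooks
      report(incident);                    // visualization & reporting: KPIs
    }
  };
}
```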
The Future of IT Incident Management
As businesses continue to embrace digital transformation, AIOps is becoming indispensable. It represents a shift from reactive to proactive operations, and from manual processes to intelligent automation. In the future, we can expect even deeper integration with DevOps, better NLP capabilities for ticket automation, and more advanced self-healing systems.
Conclusion
AIOps platform development is not just an upgrade—it's a revolution in IT incident management. By leveraging artificial intelligence, organizations can significantly reduce downtime, improve service quality, and empower their IT teams to focus on strategic initiatives.
If your organization hasn’t begun the AIOps journey yet, now is the time to explore how these platforms can transform your IT operations—and keep you ahead of the curve.
atplblog · 2 months ago
Delve into the second edition to master serverless proficiency and explore new chapters on security techniques, multi-regional deployment, and optimizing observability.

Key Features
- Gain insights from a seasoned CTO on best practices for designing enterprise-grade software systems
- Deepen your understanding of system reliability, maintainability, observability, and scalability with real-world examples
- Elevate your skills with software design patterns and architectural concepts, including securing in depth and running in multiple regions

Book Description
Organizations undergoing digital transformation rely on IT professionals to design systems that keep up with the rate of change while maintaining stability. With this edition, enriched with more real-world examples, you'll be perfectly equipped to architect the future for unparalleled innovation.

This book guides you through the architectural patterns that power enterprise-grade software systems while exploring key architectural elements (such as event-driven microservices and micro frontends) and learning how to implement anti-fragile systems.

First, you'll divide up a system and define boundaries so that your teams can work autonomously and accelerate innovation. You'll cover the low-level event and data patterns that support the entire architecture while getting up and running with the different autonomous service design patterns.

This edition adds several new topics on security, observability, and multi-regional deployment. It focuses on best practices for security, reliability, testability, observability, and performance. You'll explore the methodologies of continuous experimentation, deployment, and delivery before delving into some final thoughts on how to start making progress.

By the end of this book, you'll be able to architect your own event-driven, serverless systems that are ready to adapt and change.

What you will learn
- Explore architectural patterns to create anti-fragile systems
- Focus on DevSecOps practices that empower self-sufficient, full-stack teams
- Apply microservices principles to the frontend
- Discover how SOLID principles apply to software and database architecture
- Gain practical skills in deploying, securing, and optimizing serverless architectures
- Deploy a multi-regional system and explore the strangler pattern for migrating legacy systems
- Master techniques for collecting and utilizing metrics, including RUM, synthetics, and anomaly detection

Who this book is for
This book is for software architects who want to learn more about different software design patterns and best practices. This isn't a beginner's manual: you'll need an intermediate level of programming proficiency and software design experience to get started. You'll get the most out of this software design book if you already know the basics of the cloud, but it isn't a prerequisite.

Table of Contents
1. Architecting for Innovation
2. Defining Boundaries and Letting Go
3. Taming the Presentation Tier
4. Trusting Facts and Eventual Consistency
5. Turning the Cloud into the Database
6. A Best Friend for the Frontend
7. Bridging Intersystem Gaps
8. Reacting to Events with More Events
9. Running in Multiple Regions
10. Securing Autonomous Subsystems in Depth
11. Choreographing Deployment and Delivery
12. Optimizing Observability
13. Don't Delay, Start Experimenting

Publisher: Packt Publishing; 2nd edition (27 February 2024)
Language: English
Paperback: 488 pages
ISBN-10: 1803235446
ISBN-13: 978-1803235448
Item Weight: 840 g
Dimensions: 2.79 x 19.05 x 23.5 cm
Country of Origin: India
devopssentinel · 1 year ago
POE AI: Redefining DevOps with Advanced Predictive Operations
Enter POE AI, an advanced tool designed to bring predictive operations to the forefront of DevOps. By leveraging cutting-edge artificial intelligence, it provides powerful predictive insights that help teams proactively manage their infrastructure, streamline workflows, and enhance operational stability.

Predictive Maintenance and Monitoring
One of the core strengths of POE AI lies in its predictive maintenance and monitoring capabilities. This is particularly valuable for DevOps teams responsible for maintaining complex IT infrastructures where unexpected failures can have significant impacts. POE AI continuously analyzes system data, identifying patterns and anomalies that may indicate potential issues.
Imagine you're managing a large-scale distributed system. This tool can monitor the performance of various components in real time, predicting potential failures before they happen. For example, it might detect that a particular server is showing early signs of hardware degradation, allowing you to take preemptive action before a critical failure occurs. This proactive approach minimizes downtime and ensures that your infrastructure remains robust and reliable.

Enhancing Workflow Efficiency
POE AI goes beyond predictive maintenance by also enhancing overall workflow efficiency. The tool integrates seamlessly with existing DevOps pipelines and tools, providing insights that help streamline processes and optimize resource allocation. This integration ensures that DevOps teams can operate more efficiently, focusing on strategic initiatives rather than firefighting issues.
For instance, POE AI can analyze historical deployment data to identify the most efficient deployment strategies and times. By leveraging these insights, you can schedule deployments during periods of low activity, reducing the risk of disruptions and improving overall system performance. This optimization not only enhances workflow efficiency but also ensures that your team can deliver high-quality software more consistently.

AI-Powered Root Cause Analysis
When issues do arise, quickly identifying the root cause is crucial for minimizing their impact. POE AI excels in this area by offering AI-powered root cause analysis. The tool can rapidly sift through vast amounts of data, pinpointing the exact cause of an issue and providing actionable recommendations for resolution.
Consider a scenario where your application experiences a sudden performance drop. Instead of manually combing through logs and metrics, you can rely on the tool to identify the root cause, such as a specific microservice consuming excessive resources. This rapid identification allows you to address the issue promptly, restoring optimal performance and reducing the time spent on troubleshooting.

Integration with DevOps Tools
POE AI's ability to integrate with a wide range of DevOps tools makes it a versatile addition to any tech stack. Whether you're using Jenkins for continuous integration, Kubernetes for container orchestration, or Splunk for log analysis, POE AI can integrate seamlessly to enhance your operational workflows.
For example, integrating it with your monitoring tools can provide real-time predictive insights directly within your dashboards. This enables you to visualize potential issues and take proactive measures without switching between different applications. By consolidating these insights into a single platform, POE AI enhances situational awareness and simplifies operational management.

Security and Compliance
In the realm of DevOps, maintaining security and compliance is paramount. POE AI incorporates robust security measures to protect sensitive data. The tool adheres to major data protection regulations, including GDPR, ensuring that user data is handled securely and responsibly.
For organizations with stringent compliance requirements, POE AI offers on-premises deployment options. This allows organizations to maintain full control over their data, ensuring that it remains within their secure environment. By prioritizing security, POE AI enables DevOps teams to leverage its capabilities without compromising on data protection.

Real-World Applications and Success Stories
To understand the impact of POE AI, let's explore some real-world applications and success stories. Many organizations have integrated POE AI into their workflows, resulting in significant improvements in operational efficiency and stability.
One example is a global financial services company that implemented POE AI to enhance its IT infrastructure management. By using predictive maintenance and root cause analysis, the company significantly reduced downtime and improved system reliability. This proactive approach allowed its IT team to focus on strategic projects rather than constantly addressing issues.
Another success story involves a multinational manufacturing firm that used POE AI to optimize its production workflows. By analyzing historical data and predicting potential bottlenecks, the tool provided actionable insights that improved production efficiency and reduced operational costs. This optimization led to higher output quality and increased overall productivity.

Future Prospects of AI in DevOps
As artificial intelligence continues to advance, the capabilities of tools like POE AI are expected to expand even further. Future advancements in machine learning and natural language processing (NLP) will enhance the tool's ability to provide even more accurate and nuanced predictions.
One exciting prospect is the potential for real-time adaptive learning. Imagine a scenario where POE AI continuously learns from new data, adapting its predictive models in real time to reflect the latest trends and patterns. This capability would enable DevOps teams to stay ahead of emerging issues and continuously optimize their workflows.
Another potential development is the integration of advanced NLP capabilities, allowing POE AI to understand and interpret unstructured data such as textual reports and logs. This would provide deeper insights and recommendations, further enhancing the tool's value in managing complex DevOps environments.

Maximizing the Benefits of POE AI
To fully leverage the benefits of POE AI, DevOps teams should adopt best practices for using the tool effectively. Here are some tips to get started:
- Integrate with Existing Tools: Ensure that POE AI is integrated with your existing DevOps tools and platforms. This integration will streamline predictive analysis and make it easier to access insights.
- Customize Alerts and Notifications: Take advantage of POE AI's customization options to tailor alerts and notifications to your specific needs. Configure the tool to highlight the most critical issues and provide actionable recommendations.
- Review and Act on Insights: Regularly review the insights and recommendations provided by POE AI. Use this information to make data-driven decisions and optimize your workflows for greater efficiency.
- Train Your Team: Provide training and resources to help your team members get the most out of POE AI. Encourage them to explore the tool's features and incorporate it into their daily workflows.
- Monitor Security: Ensure that POE AI's security settings are configured to meet your organization's requirements. Regularly review and update security measures to protect data and maintain compliance.
By following these best practices, DevOps teams can maximize the benefits of POE AI and create a more efficient, predictive operational environment.

Embracing the Future of Predictive Operations
Integrating POE AI into your DevOps processes isn't just about adopting new technology—it's about fundamentally transforming how you anticipate and address operational challenges. By leveraging predictive insights, you can move from a reactive to a proactive approach, minimizing downtime and optimizing performance. POE AI empowers your team to foresee potential issues, streamline workflows, and enhance overall productivity. This tool will not only save you time and resources but also enable you to make smarter, more informed decisions, driving your team's success to new heights.
codeonedigest · 2 years ago
Performance Metrics Design Pattern Tutorial with Examples for Software Programmers
Full video link: https://youtu.be/ciERWgfx7Tk. A new video on the performance metrics design pattern for microservices, with examples for programmers, is published on the CodeOneDigest YouTube channel.
In this video we will learn about the Performance Metrics design pattern for microservices. This is the 2nd design principle in the Observability design patterns category for microservices. Microservice architecture structures an application as a set of loosely coupled microservices, and each service can be developed independently in an agile manner to enable continuous delivery. But how to analyse and…
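The post is cut off above; as a taste of the pattern it introduces, here is a minimal sketch of a microservice exposing performance metrics with Express and the prom-client library. The metric names, labels, and route are illustrative.

```js
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();
client.collectDefaultMetrics({ register }); // CPU, memory, event-loop lag, etc.

// Track request latency per route and status code.
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Latency of HTTP requests',
  labelNames: ['route', 'status'],
  registers: [register],
});

app.get('/orders', (req, res) => {
  const end = httpDuration.startTimer({ route: '/orders' });
  res.json({ orders: [] });
  end({ status: res.statusCode }); // observe the elapsed time with labels
});

// Scrape endpoint for a metrics collector such as Prometheus.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);
```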
dreamtech11 · 2 years ago
Scaling for Success: How Dream11 Uses Predictive Analytics and Real-time Monitoring to Ensure 100% Uptime During Peak Seasons (Part 1)
At Dream11, we strive to offer the best sports engagement experience for our users. As the largest fantasy sports platform with 150 million fans participating in over 10,000 contests, it can be difficult to predict traffic patterns, especially during high-demand events like the IPL and World Cup. To maintain 100% uptime during these matches, we use prediction-based scaling and scale-out infrastructure.
We generate an immense amount of data from our mobile applications and microservices, capturing user action events, system metrics, network metrics, and more. Our machine learning algorithms process this data to predict demand based on factors such as players, tournaments, virality, and user playing patterns. During the T20 ICC World Cup, for example, our platform is capable of managing up to 6.21 million concurrent users at the edge layer.
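For illustration only, here is a toy version of prediction-based scaling: fit a linear trend to recent concurrency samples and provision capacity ahead of the forecast. Dream11's production models use far richer features (players, tournaments, virality), and the per-host capacity figure below is a made-up placeholder.

```js
// Toy demand forecast: least-squares linear trend over recent samples.
function forecastConcurrency(samples, stepsAhead = 5) {
  const n = samples.length;
  const xs = samples.map((_, i) => i);
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = samples.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (samples[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const slope = den ? num / den : 0;
  return meanY + slope * (n - 1 + stepsAhead - meanX);
}

// Scale out ahead of the predicted peak, with 30% headroom:
const predicted = forecastConcurrency([1.2e6, 1.5e6, 1.9e6, 2.4e6]);
const desiredHosts = Math.ceil((predicted * 1.3) / 50_000); // ~50k users/host assumed
```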
At Dream11, our Dreamsters play a crucial role in ensuring an optimal user experience. Our service owners have readiness lists and runbooks in place to quickly resolve any incidents, and our customer service team is equipped to handle incidents efficiently.
Observability of our network through a single dashboard helps us stay efficient, and continuous monitoring is critical for accelerated troubleshooting. Our monitoring tools track the performance of our infrastructure, applications, and DNS, and our status pages and dashboards allow us to quickly identify and address issues. A unified dashboard provides a bird's-eye view of the entire Dream11 infrastructure, reducing Mean Time To Detect (MTTD) and Mean Time To Resolution (MTTR).
Overall, it's evident that Dream11 is using technology to tackle some of the biggest IT industry challenges, and it's great to see the emphasis on delivering an exceptional user experience.
globalmediacampaign · 5 years ago
Deriving real-time insights over petabytes of time series data with Amazon Timestream
Time series data is one of the fastest growing categories across a variety of industry segments, such as application monitoring, DevOps, clickstream analysis, network traffic monitoring, industrial IoT, consumer IoT, manufacturing, and many more. Customers want to track billions of time series monitoring hundreds of millions of devices, industrial equipment, gaming sessions, streaming video sessions, and more in a single system that can reliably and continuously ingest terabytes of data per day, answer fast queries on recent data, and efficiently analyze petabytes of recent and historical data. Many single-node or instance-based systems can't handle this scale and tip over, resulting in an unresponsive or unavailable system.

To address this need, we purpose-built Amazon Timestream from the ground up to be a highly scalable and highly available time series database. Timestream is serverless, so you don't need to provision resources upfront. The ingestion, storage, and query subsystems in Timestream automatically scale independently based on the load. This independent scaling enables a key scenario in time series use cases where you want high throughput data ingestion and concurrent queries run in parallel to ingestion that derive real-time insights. Timestream durably stores all data, seamlessly moves older data (based on user-specified configuration) into cost-optimized storage, and scales resources based on the volume of data a query accesses, allowing a single query to efficiently analyze terabytes of data. These scaling characteristics allow you to store and analyze time series data of any scale with Timestream. After you design your application using Timestream, as the application's data and request volumes grow, Timestream automatically scales the resources. You only pay for what you use without needing to over-provision for peak, or redesign your application as your workload scales. For more information about the key benefits of Timestream and its use cases, see the Timestream documentation.

In this post, we discuss the scale and performance characteristics of Timestream using an example application modeled on a DevOps use case. This example workload is derived from conversations with hundreds of customers with many different types of use cases, such as gaming, clickstream analysis, monitoring streaming applications, monitoring services, industrial telemetry, and IoT scenarios.

Overview of the workload

In this section, we discuss the ingestion and query workload corresponding to an application which is monitored using Timestream.

Application model

For this post, we use a sample application mimicking a DevOps scenario monitoring metrics from a large fleet of servers. Users want to alert on anomalous resource usage, create dashboards on aggregate fleet behavior and utilization, and perform sophisticated analysis on recent and historical data to find correlations. The following diagram provides an illustration of the setup where a set of monitored instances emit metrics to Timestream. Another set of concurrent users issues queries for alerts, dashboards, or ad-hoc analysis, where queries and ingestion run in parallel.

The application being monitored is modeled as a highly scaled-out service that is deployed in several regions across the globe. Each region is further subdivided into a number of scaling units called cells that have a level of isolation in terms of infrastructure within the region. Each cell is further subdivided into silos, which represent a level of software isolation.
Each silo has five microservices that comprise one isolated instance of the service. Each microservice has several servers with different instance types and OS versions, which are deployed across three availability zones. These attributes that identify the servers emitting the metrics are modeled as dimensions in Timestream. In this architecture, we have a hierarchy of dimensions (such as region, cell, silo, and microservice_name) and other dimensions that cut across the hierarchy (such as instance_type and availability_zone). The application emits a variety of metrics (such as cpu_user and memory_free) and events (such as task_completed and gc_reclaimed). Each metric or event is associated with eight dimensions (such as region or cell) that uniquely identify the server emitting it.

Additional details about the data model, schema, and data generation can be found in the open-sourced data generator. In addition to the schema and data distributions, the data generator provides an example of using multiple writers to ingest data in parallel, using the ingestion scaling of Timestream to ingest millions of measurements per second.

Ingestion workload

We vary the following scale factors representing the use cases we observe across many customers:

- Number of time series – We vary the number of hosts being monitored (100,000–4 million), which also controls the number of time series tracked (2.6–104 million).
- Ingestion volume and data scale – We vary the interval at which data is emitted (from once every minute to once every 5 minutes).

The following table summarizes the data ingestion characteristics and corresponding data storage volumes. Depending on the number of hosts and metric interval, the application ingests between 156 million–3.1 billion data points per hour, resulting in approximately 1.1–21.7 TB per day. These data volumes translate to approximately 0.37–7.7 PB of data ingested over a year.

| Data Scale | Data Interval (seconds) | Hosts Monitored (million) | Time Series (million) | Avg Data Points/Second | Avg Data Points/Hour (million) | Avg Ingestion Volume (MB/s) | Data Size/Hour (GB) | Data Size/Day (TB) | Data Size/Year (PB) |
|---|---|---|---|---|---|---|---|---|---|
| Small | 60 | 0.1 | 2.6 | 43,333 | 156 | 13 | 45 | 1 | 0.37 |
| Medium | 300 | 2 | 52 | 173,333 | 624 | 51 | 181 | 4.3 | 1.5 |
| Large | 120 | 4 | 104 | 866,667 | 3,120 | 257 | 904 | 21.7 | 7.7 |

The ingestion and data volumes are for an individual table. Internally, we have tested Timestream at a much larger ingestion scale, upwards of several GB/s ingestion per table, and thousands of databases and tables per AWS account.

Query workload

The query workload is modeled around observability use cases we see across customers.
The queries correspond to three broad classes:

- Alerting – Computes aggregate usage of one or more resources across multiple hosts to identify anomalous resource usage (for example, computes the distribution of CPU utilization, binned by 1 minute, across all hosts within a specified microservice for the past 1 hour; see the SDK sketch after this section)
- Populating dashboards – Computes aggregated utilization and patterns across a larger number of hosts to provide aggregate visibility on overall service behavior (for example, finds hosts with resource utilization higher than the average observed in the fleet for the past 1 hour)
- Analysis and reporting – Analyzes large volumes of data for fleet-wide insights over longer periods of time (for example, obtains the CPU and memory utilization of the top k hosts within a microservice that have the highest GC pause intervals, or finds the hours in a day with the highest CPU utilization within a region for the past 3 days)

Each class has two distinct queries: Q1 and Q2 are alerting, Q3 and Q4 are dashboarding, and Q5 and Q6 are analysis. The alerting queries analyze recently ingested data for a few thousand hosts over the past hour. For instance, Q1 and Q2 query an hour of data, where depending on the scale factor, 156 million–3.12 billion data points have been ingested. The dashboards are populated by analyzing data across tens to a few hundred thousand hosts over 1–3 hours of data. That translates to about 156 million–9.3 billion data points. The analysis queries process metrics across hundreds of thousands to millions of hosts and span several days of data. For instance, Q6 analyzes 3 days of data, which at the largest scale corresponds to about 60 TB of stored time series data (224 billion data points). You can refer to the preceding table to cross-reference the data sizes and data point counts corresponding to the time ranges relevant to the query.

We model many concurrent user sessions running different types of alerts, loading many concurrent dashboards, and multiple users issuing ad-hoc analysis queries or generating periodic reports. We generate concurrent activity by simulating sessions where each session randomly picks a query from the three classes, runs it, and consumes the results. Each session also introduces a randomized think time between two consecutive queries. Each class of query is assigned a weight to resemble what we observe from real workloads that our customers run. For more information, see the open-sourced query workload generator.
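As a concrete illustration of the alerting class, here is a minimal sketch of running a Q1-style query with the AWS SDK for JavaScript v3. The database, table, and dimension names are placeholders drawn from the schema described above; bin() and ago() are functions in Timestream's query language.

```js
const {
  TimestreamQueryClient,
  QueryCommand,
} = require('@aws-sdk/client-timestream-query');

const client = new TimestreamQueryClient({ region: 'us-east-1' });

// Q1-style alert: per-minute CPU distribution for one microservice, past hour.
const queryString = `
  SELECT bin(time, 1m) AS minute,
         avg(measure_value::double) AS avg_cpu,
         approx_percentile(measure_value::double, 0.99) AS p99_cpu
  FROM "devops"."host_metrics"
  WHERE measure_name = 'cpu_user'
    AND microservice_name = 'checkout'
    AND time > ago(1h)
  GROUP BY bin(time, 1m)
  ORDER BY minute`;

async function runAlertQuery() {
  const { Rows } = await client.send(new QueryCommand({ QueryString: queryString }));
  return Rows; // evaluate rows against alert thresholds here
}
```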
Performance and scale

We now present aggregated scale and performance numbers across many different runs. To model typical customer behavior for time series applications, we report all performance numbers where data is continuously being ingested and many concurrent queries are running parallel to ingestion.

Timestream optimizes for good out-of-the-box performance and scale. Timestream automatically identifies table schema, automatically scales resources based on the workload's requirements, automatically partitions the data, uses query predicates to prune out irrelevant partitions, and uses fine-grained indexing within a partition to efficiently run queries. You can ingest and query using thousands of threads and Timestream automatically adapts as the workload changes. For these performance runs, we configured a database for each workload size and one table per database. Each table is configured with 2 hours of memory store retention and 1 year of magnetic store retention.

We ran continuous data ingest for several months while also running concurrent queries. For the query workload, we ran a hundred concurrent sessions for each scale factor. Each session ran 50,000 query instances, randomly picking the query to run, with Q1 and Q2 run with 95% probability, Q3 and Q4 run with 4.9% probability, and Q5 and Q6 run with 0.1% probability. Each session also used a think time, randomly picked between 5–300 seconds, between consecutive query runs. We ran the client sessions on an Amazon Elastic Compute Cloud (Amazon EC2) host running in the same Region where the Timestream database was created.

The following plot reports the latency for each query type, Q1–Q6, across the different scale factors. The primary y-axis is the end-to-end query execution time in seconds (plotted in log scale), as observed by the client, from the time the query was issued to the time the last row of the query result was read. Each session reports the geometric mean of query latency for each query type. The plot reports the latency averaged across the hundred sessions (the average of the geometric mean across the sessions). Each clustered bar corresponds to one scale factor (small, medium, and large; see the table in the preceding section for additional details). Each bar within a cluster corresponds to one of the query types.

The key takeaways from the query performance results are:

- Timestream seamlessly and automatically scales to an ingestion volume of greater than 250 MB/s and tables with petabytes of data. The decoupled ingestion, storage, and query scaling allows the system to scale to time series data of any scale. We internally tested Timestream for several GB/s data ingestion volume. Timestream handles scaling automatically, without any upfront provisioning.
- Even when managing petabytes of time series data and hundreds of millions of time series in a single table, Timestream runs hundreds of concurrent alerting queries, analyzing data across thousands of devices, within hundreds of milliseconds. The latency of these queries remains almost unchanged between the medium- and large-scale factors, where the amount of data ingested increases by more than five times.
- Timestream scales resources depending on the complexity and amount of data accessed by the query. As a result, even as the data volumes increase, the query latency increases by a much smaller factor. For instance, for the dashboarding and analysis queries, we see a data volume increase of approximately 20 times and a 40 times increase in the number of time series monitored between the small- and the larger-scale factors. However, the query latency increase is in the range of two to eight times.
- Timestream seamlessly handles concurrent ingestion and queries at large scale and enables you to easily analyze data across millions of time series, combining data in the memory and magnetic store.

Conclusions

In this post, we covered the performance and scale requirements that we observed across many time series applications and in various use cases spanning different industry segments. We used a sample ingestion and query workload representative of customers using Timestream. We saw how Timestream efficiently processes hundreds of millions of time series, seamlessly scales both ingestion and query, and stores petabytes of data across its memory and magnetic storage tiers. We reported on how Timestream's serverless API runs SQL queries, providing a query response time of hundreds of milliseconds for real-time alerting use cases.
The same query interface can also analyze tens of terabytes of time series data to perform sophisticated analysis and derive insights over recent and historical data. We also measured performance at various scale points, showing how the system scales as the data and request volumes for the application grow, so you can design your application once and Timestream automatically scales without needing to re-architect the application.

To facilitate reproducing the scale and performance numbers reported in this post, we're also making available a sample data ingestion load and query load generator. The workload generators are configurable to enable you to try out different workloads and study the scale and performance characteristics of Timestream. Detailed instructions to use the sample workload generators are included as part of the open-source release. You can explore the getting started experience to understand how Timestream fits your application needs. We also recommend following the best practices guide to optimize the scale and performance of your workload.

About the Authors

Sudipto Das is a Principal Engineer in AWS working on Timestream. He is an engineering and research leader known for his work on scalable database management systems for cloud platforms. https://sudiptodas.github.io/

Tim Rath is a Senior Principal Engineer in AWS working on Timestream. He is a leading expert in scalable and distributed data management and replication systems, having worked on DynamoDB and several other scalable data platforms at AWS.

Source: https://aws.amazon.com/blogs/database/deriving-real-time-insights-over-petabytes-of-time-series-data-with-amazon-timestream/
t-baba · 5 years ago
What's the carbon footprint of your website?
#426 — February 5, 2020
Frontend Focus
CO2 Emissions On The Web — Sites are said to now be four times bigger than they were in 2010. This post looks at the energy costs of data transfer and what you can do to reduce your web carbon footprint.
Danny van Kooten
Old CSS, New CSS — This is a tale of one individual's personal history with CSS and web design, offering a comprehensive, detailed “blend of memory and research”. A great trip down memory lane for any of you who got started on the web in the 90s.
Evelyn Woods
Open-Source Serverless CMS Powered by React, Node & GraphQL — The way we build, deploy and operate the web is evolving. Webiny is a developer-friendly serverless CMS. Use it to build websites, APIs and apps and deploy them as microservices. SEO-friendly pages and fast performance. View on GitHub.
Webiny sponsor
How Smashing Magazine Manages Content: Migration From WordPress To JAMStack — An interesting look at how a big site like Smashing Magazine went about migrating from WordPress to a static infrastructure. This technical case study covers the gains and losses, things the Smashing team had wished they’d known earlier, and what they were surprised by.
Sarah Drasner
▶  What’s New in DevTools in Chrome 80 — New features include improved WebAssembly debugging, network panel updates, support for let and class redeclarations and more.
Google Chrome Developers
A New Technique for Making Responsive, JavaScript-Free Charts — There are countless libraries for generating charts on the web, but they all seem to require JavaScript. Here’s how the New York Times approached JS-free charts with SVG.
Rich Harris
Bringing The Microsoft Edge DevTools to More Languages
Erica Draud (Microsoft)
💻 Jobs
Frontend Developer at X-Team (Remote) — Work with the world's leading brands, from anywhere. Travel the world while being part of the most energizing community of developers.
X-Team
Find a Dev Job Through Vettery — Vettery is completely free for job seekers. Make a profile, name your salary, and connect with hiring managers from top employers.
Vettery
📙 News, Tutorials & Opinion
HTML Attributes to Improve Your Users' Two Factor Authentication Experience — How to use the HTML autocomplete, inputmode and pattern attributes to improve the user experience of logging in.
Phil Nash
▶  Exploring the Frontend Performance of the UK's National Rail Website — Runs through using the layers panel in Chrome DevTools to diagnose performance issues on a high traffic website.
Umar Hansa
Transitioning Hidden Elements — Paul Hebert has written a little JavaScript utility to wrap up all of the intricacies of dealing with transitioning hidden elements - often tricky as they’re not in the document flow.
Cloud Four
How I Recreated A Polaroid Camera with CSS Gradients Only — A high-level tutorial showing how to recreate physical products with just CSS. The end result here is a Polaroid camera made entirely out of gradients.
Sarah L. Fossheim
The React Hooks Guide: In-Depth Tutorial with Examples. Start Learning — Learn all about React Hooks as we comprehensively cover: State and Effects, Context, Reducers, and Custom React Hooks.
Progress KendoReact sponsor
How To Create A Headless WordPress Site On The JAMstack
Sarah Drasner & Geoff Graham
Using the CSS line-height Property to Improve Readability
William Le
Possibly The Easiest Way to Run a Static Site Generator
CSS Tricks
Getting Keyboard-focusable Elements — A quick JavaScript function for managing focus.
Zell Liew
Shopping for Speed On eBay — This case study explains how eBay increased key metrics by optimizing the performance of their web/app experiences.
Addy Osmani
How We Started Treating Frontend Performance as a Feature — A non-technical guide to the decisions HubSpot made (and the tools they now utilise) that helped them start treating frontend performance as a feature.
Adam Markon
🔧 Code, Tools and Resources
LegraJS — Lego Brick Graphics — A small (3.36KB gzipped) JS library that lets you draw using LEGO-like brick shapes on an HTML <canvas> element.
Preet Shihn
massCode: A Free and Open Source Code Snippets Manager
Anton Reshetov
Axe Pro: Free Accessibility Testing Tool Created for Development Teams
Deque sponsor
micro-jaymock: Tiny API Mocking Microservice for Generating Fake JSON Data
Meeshkan
Craft 3.4 is Here — Version 3.4 of this paid CMS brings improvements to user experience, collaboration, GraphQL, and more.
Craft
   🗓 Upcoming Events
Flashback Conference, February 10-11 — Orlando, USA — Looks at cutting-edge web dev, browser APIs and tooling, but adds how they’ve evolved from the past to the web of today.
Frontend Developer Love, February 19-21 — Amsterdam, Netherlands — Three full days of talks from 35+ global JavaScript leaders from around the world.
ConveyUX, March 3-5 — Seattle, USA — This West Coast user experience conference features over 65 sessions across three days.
W3C Workshop on Web & Machine Learning, 24-25 March — Berlin, Germany — Hosted by Microsoft, this free event aims to “bring together providers of Machine Learning tools and frameworks with Web platform practitioners to enrich the Open Web Platform with better foundations for machine learning”.
via Frontend Focus: https://ift.tt/31qlvPZ
faizrashis1995 · 5 years ago
What’s After the MEAN Stack?
Introduction
We reach for software stacks to simplify the endless sea of choices. The MEAN stack is one such simplification that worked very well in its time. Though the MEAN stack was great for the last generation, we need more; in particular, more scalability. The components of the MEAN stack haven’t aged well, and our appetites for cloud-native infrastructure require a more mature approach. We need an updated, cloud-native stack that can boundlessly scale as much as our users expect to deliver superior experiences.
 Stacks
When we look at software, we can easily get overwhelmed by the complexity of architectures or the variety of choices. Should I base my system on Python?  Or is Go a better choice? Should I use the same tools as last time? Or should I experiment with the latest hipster toolchain? These questions and more stymie both seasoned and newbie developers and architects.
 Some patterns emerged early on that help developers quickly provision a web property to get started with known-good tools. One way to do this is to gather technologies that work well together in “stacks.” A “stack” is not a prescriptive validation metric, but rather a guideline for choosing and integrating components of a web property. The stack often identifies the OS, the database, the web server, and the server-side programming language.
 In the earliest days, the famous stacks were the “LAMP-stack” and the “Microsoft-stack”. The LAMP stack represents Linux, Apache, MySQL, and PHP or Python. LAMP is an acronym of these product names. All the components of the LAMP stack are open source (though some of the technologies have commercial versions), so one can use them completely for free. The only direct cost to the developer is the time to build the experiment.
 The “Microsoft stack” includes Windows Server, SQL Server, IIS (Internet Information Services), and ASP (90s) or ASP.NET (2000s+). All these products are tested and sold together.
 Stacks such as these help us get started quickly. They liberate us from decision fatigue, so we can focus instead on the dreams of our start-up, or the business problems before us, or the delivery needs of internal and external stakeholders. We choose a stack, such as LAMP or the Microsoft stack, to save time.
 In each of these two example legacy stacks, we’re producing web properties. So no matter what programming language we choose, the end result of a browser’s web request is HTML, JavaScript, and CSS delivered to the browser. HTML provides the content, CSS makes it pretty, and in the early days, JavaScript was the quick form-validation experience. On the server, we use the programming language to combine HTML templates with business data to produce rendered HTML delivered to the browser.
 We can think of this much like mail merge: take a Word document with replaceable fields like first and last name, add an excel file with columns for each field, and the engine produces a file for each row in the sheet.
 As browsers evolved and JavaScript engines were tuned, JavaScript became powerful enough to make real-time, thick-client interfaces in the browser. Early examples of this kind of web application are Facebook and Google Maps.
 These immersive experiences don’t require navigating to a fresh page on every button click. Instead, we could dynamically update the app as other users created content, or when the user clicks buttons in the browser. With these new capabilities, a new stack was born: the MEAN stack.
 What is the MEAN Stack?
The MEAN stack was the first stack to acknowledge the browser-based thick client. Applications built on the MEAN stack primarily have user experience elements built in JavaScript and running continuously in the browser. We can navigate the experiences by opening and closing items, or by swiping or drilling into things. The old full-page refresh is gone.
 The MEAN stack includes MongoDB, Express.js, Angular.js, and Node.js. MEAN is the acronym of these products. The back-end application uses MongoDB to store its data as binary-encoded JavaScript Object Notation (JSON) documents. Node.js is the JavaScript runtime environment, allowing you to do backend, as well as frontend, programming in JavaScript. Express.js is the back-end web application framework running on top of Node.js. And Angular.js is the front-end web application framework, running your JavaScript code in the user’s browser. This allows your application UI to be fully dynamic.
 Unlike previous stacks, both the programming language and operating system aren’t specified, and for the first time, both the server framework and browser-based client framework are specified.
 In the MEAN stack, MongoDB is the data store. MongoDB is a NoSQL database, making a stark departure from the SQL-based systems in previous stacks. With a document database, there are no joins, no schema, no ACID compliance, and no transactions. What document databases offer is the ability to store data as JSON, which easily serializes from the business objects already used in the application. We no longer have to dissect the JSON objects into third normal form to persist the data, nor collect and rehydrate the objects from disparate tables to reproduce the view.
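A minimal sketch of that difference using the official MongoDB Node.js driver: the business object is stored and retrieved as one document, with nested data in place of join tables. The connection string, database, and collection names are placeholders.

```js
const { MongoClient } = require('mongodb');

async function saveAndReadOrder() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const orders = client.db('shop').collection('orders');

  // The business object persists as-is: nested data instead of join tables.
  await orders.insertOne({
    orderId: 'A-1001',
    customer: { name: 'Ada', email: 'ada@example.com' },
    items: [{ sku: 'book-42', qty: 2, price: 9.99 }],
    placedAt: new Date(),
  });

  // Reading rehydrates the same shape: no reassembly from disparate tables.
  const order = await orders.findOne({ orderId: 'A-1001' });
  await client.close();
  return order;
}
```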
 The MEAN stack webserver is Node.js, a thin wrapper around Chrome’s V8 JavaScript engine that adds TCP sockets and file I/O. Unlike previous generations’ web servers, Node.js was designed in the age of multi-core processors and millions of requests. As a result, Node.js is asynchronous to a fault, easily handling intense, I/O-bound workloads. The programming API is a simple wrapper around a TCP socket.
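That thin wrapper is visible in the API itself. A complete TCP echo server with Node's built-in net module takes only a few lines, and every connection is handled asynchronously on the event loop:

```js
const net = require('node:net');

// Each connection is handled without blocking: no threads to manage.
const server = net.createServer((socket) => {
  socket.on('data', (chunk) => socket.write(`echo: ${chunk}`));
  socket.on('error', (err) => console.error('socket error:', err.message));
});

server.listen(8124, () => console.log('listening on 8124'));
```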
 In the MEAN stack, JavaScript is the name of the game. Express.js is the server-side framework offering an MVC-like experience in JavaScript. Angular (now known as Angular.js or Angular 1) allows for simple data binding to HTML snippets. With JavaScript both on the server and on the client, there is less context switching when building features. Though the specific features of Express.js’s and Angular.js’s frameworks are quite different, one can be productive in each with little cross-training, and there are some ways to share code between the systems.
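A minimal Express route of the kind an Angular.js client would bind to: the server returns JSON, and the browser framework merges it into HTML templates via data binding. The route and payload are illustrative.

```js
const express = require('express');
const app = express();

// JSON payload the client-side framework binds into its templates.
app.get('/api/tasks', (req, res) => {
  res.json([
    { id: 1, title: 'Ship feature', done: false },
    { id: 2, title: 'Fix bug', done: true },
  ]);
});

app.listen(3000, () => console.log('API listening on http://localhost:3000'));
```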
 The MEAN stack rallied a web generation of start-ups and hobbyists. Since all the products are free and open-source, one can get started for only the cost of one’s time. Since everything is based in JavaScript, there are fewer concepts to learn before one is productive. When the MEAN stack was introduced, these thick-client browser apps were fresh and new, and the back-end system was fast enough, for new applications, that database durability and database performance seemed less of a concern.
 The Fall of the MEAN Stack
The MEAN stack was good for its time, but a lot has happened since. Here’s an overly brief history of the fall of the MEAN stack, one component at a time.
 Mongo got a real bad rap for data durability. In one Mongo meme, it was suggested that Mongo might implement the PLEASE keyword to improve the likelihood that data would be persisted correctly and durably. (A quick squint, and you can imagine the XKCD comic about “sudo make me a sandwich.”) Mongo also lacks native SQL support, making data retrieval slower and less efficient.
 Express is aging, but is still the defacto standard for Node web apps and apis. Much of the modern frameworks — both MVC-based and Sinatra-inspired — still build on top of Express. Express could do well to move from callbacks to promises, and better handle async and await, but sadly, Express 5 alpha hasn’t moved in more than a year.
 Angular.js (1.x) was rewritten from scratch as Angular (2+). Arguably, the two products are so dissimilar that they should have been named differently. In the confusion as the Angular reboot was taking shape, there was a very unfortunate presentation at an Angular conference.
 The talk was meant to be funny, but it was not taken that way. It showed headstones for many of the core Angular.js concepts, and sought to highlight how the presenters were designing a much easier system in the new Angular.
Sadly, this message landed really wrong. Much like the Visual Basic community’s backlash against the overhaul they derisively termed Visual Fred, the Angular community was outraged. The core tenets they trusted every day for building highly interactive and profitable apps were getting thrown away, and the new system wouldn’t be ready for a long time. Much of the community moved on to React, and now Angular is struggling to stay relevant. Arguably, Angular’s failure here was the biggest factor in React’s success — much more so than any React initiative or feature.
 Nowadays many languages’ frameworks have caught up to the lean, multi-core experience pioneered in Node and Express. ASP.NET Core brings a similarly light-weight experience, and was built on top of libuv, the OS-agnostic socket framework, the same way Node was. Flask has brought light-weight web apps to Python. Ruby on Rails is one way to get started quickly. Spring Boot brought similar microservices concepts to Java. These back-end frameworks aren’t JavaScript, so there is more context switching, but their performance is no longer a barrier, and strongly-typed languages are becoming more in vogue.
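To make the light-weight claim concrete, here is a minimal Flask service, comparable in footprint to an Express hello-world; the route and port are arbitrary:

```python
from flask import Flask, jsonify  # pip install flask

app = Flask(__name__)

@app.route("/api/health")
def health():
    # A tiny JSON endpoint, much like an Express route handler.
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=5000)
```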
As a further deterioration of the MEAN stack, there are now frameworks named “mean,” including mean.io, meanjs.org, and others. These products seek to capitalize on the popularity of the “mean” term: sometimes they offer more options on top of the original MEAN products, sometimes scaffolding for getting started faster, and sometimes they merely cash in on the SEO value of the term.
 With MEAN losing its edge, many other stacks and methodologies have emerged.
 The JAM Stack
The JAM stack is the next evolution of the MEAN stack. The JAM stack includes JavaScript, APIs, and Markup. In this stack, the back end isn’t specified – neither the web server, the back-end language, nor the database.
In the JAM stack, we use JavaScript to build a thick client in the browser; it calls APIs and mashes the data with Markup — likely the same HTML templates we would build in the MEAN stack. The JavaScript frameworks have evolved as well. The new top contenders are React, Vue.js, and Angular, with additional players in Svelte, Aurelia, Ember, Meteor, and many others.
The frameworks have mostly standardized on common concepts like the virtual DOM, one-way data binding, and web components. Each framework then combines these concepts with the opinions and styles of its author.
The JAM stack focuses exclusively on the thick-client browser environment, merely giving a nod to the APIs, as if magic happens behind them. This has given rise to backend-as-a-service products like Firebase, and to API innovations beyond REST, including gRPC and GraphQL. But, just as legacy stacks ignored the browser thick client, the JAM stack marginalizes the back end, to our detriment.
 Maturing Application Architecture
As the web and the cloud have matured, we as system architects have also matured in how we think about designing web properties.
 As technology has progressed, we’ve gotten much better at building highly scalable systems. Microservices offer a much different application model where simple pieces are arranged into a mesh. Containers offer ephemeral hardware that’s easy to spin up and replace, leading to utility computing.
As consumers and business users of systems, we almost take for granted that a system will be always on and infinitely scalable. We don’t even consider the complexity of geo-replication of data or the latency of trans-continental communication. If we need to wait more than a second or two, we move on to the next product or the next task.
 With these maturing tastes, we now take for granted that an application can handle near infinite load without degradation to users, and that features can be upgraded and replaced without downtime. Imagine the absurdity if Google Maps went down every day at 10 pm so they could upgrade the system, or if Facebook went down if a million people or more posted at the same time.
 We now take for granted that our applications can scale, and the naive LAMP and MEAN stacks are no longer relevant.
 Characteristics of the Modern Stack
What does the modern stack look like? What are the elements of a modern system? I propose that a modern system is cloud-native, utility-billed, and infinitely scalable; that it delivers low latency; that it uses machine learning to surface relevant results; and that it stores and processes disparate data types and sources, delivering personalized results to each user. Let’s dig into these concepts.
A modern system allows boundless scale. As a business user, I can’t accept a system that slows down as we add more users. If the site goes viral, it needs to continue serving requests, and if the site is seasonally slow, we need to turn down the spend to match revenue. Utility billing and cloud-native scale offer this opportunity. Mounds of hardware are available for us to scale into immediately upon request. If we design stateless, distributed systems, additional load doesn’t produce latency issues.
 A modern system processes disparate data types and sources. Our systems produce logs of unstructured system behavior and failures. Events from sensors and user activity flood in as huge amounts of time-series events. Users produce transactions by placing orders or requesting services. And the product catalog or news feed is a library of documents that must be rendered completely and quickly. As users and stakeholders consume the system’s features, they don’t want or need to know how this data is stored or processed. They need only see that it’s available, searchable, and consumable.
 A modern system produces relevant information. In the world of big data, and even bigger compute capacity, it’s our task to give users relevant information from all sources. Machine learning models can identify trends in data, suggesting related activities or purchases, delivering relevant, real-time results to users. Just as easily, these models can detect outlier activities that suggest fraud. As we gain trust in the insights gained from these real-time analytics, we can empower the machines to make decisions that deliver real business value to our organization.
 MemSQL is the Modern Stack’s Database
Whether you choose to build your web properties in Java or C#, in Python or Go, in Ruby or JavaScript, you need a data store that can elastically and boundlessly scale with your application. One that solves the problems that Mongo ran into – that scales effortlessly, and that meets ACID guarantees for data durability.
 We also need a database that supports the SQL standard for data retrieval. This brings two benefits: a SQL database “plays well with others,” supporting the vast number of tools out there that interface to SQL, as well as the vast number of developers and sophisticated end users who know SQL code. The decades of work that have gone into honing the efficiency of SQL implementations is also worth tapping into.
 These requirements have called forth a new class of databases, which go by a variety of names; we will use the term NewSQL here. A NewSQL database is distributed, like Mongo, but meets ACID guarantees, providing durability, along with support for SQL. CockroachDB and Google Spanner are examples of NewSQL databases.
 We believe that MemSQL brings the best SQL, distributed, and cloud-native story to the table. At the core of MemSQL is the distributed database. In the database’s control plane is a master node and other aggregator nodes responsible for splitting the query across leaf nodes, and combining the results into deterministic data sets. ACID-compliant transactions ensure each update is durably committed to the data partitions, and available for subsequent requests. In-memory skiplists speed up seeking and querying data, and completely avoid data locks.
 MemSQL Helios delivers the same boundless scale engine as a managed service in the cloud. No longer do you need to provision additional hardware or carve out VMs. Merely drag a slider up or down to ensure the capacity you need is available.
 MemSQL is able to ingest data from Kafka streams, from S3 buckets of data stored in JSON, CSV, and other formats, and deliver the data into place without interrupting real-time analytical queries. Native transforms allow shelling out into any process to transform or augment the data, such as calling into a Spark ML model.
MemSQL stores relational data and stores document data in JSON columns; it provides time-series windowing functions, and it offers both super-fast in-memory rowstore tables (snapshotted to disk) and disk-based columnstore tables that are heavily cached in memory.
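Because MemSQL speaks the MySQL wire protocol, any standard MySQL driver can exercise these features. Here is a minimal sketch from Python; the host, credentials, schema, and the JSON_EXTRACT_STRING accessor are assumptions to verify against the MemSQL docs:

```python
import pymysql  # pip install pymysql; MemSQL is MySQL wire-compatible

conn = pymysql.connect(host="memsql-host", port=3306, user="app",
                       password="secret", database="demo")
with conn.cursor() as cur:
    # Relational columns and a JSON document column side by side.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id BIGINT PRIMARY KEY,
            ts DATETIME,
            payload JSON
        )
    """)
    cur.execute("""INSERT INTO events VALUES (1, NOW(), '{"user": "ada"}')""")
    cur.execute("SELECT JSON_EXTRACT_STRING(payload, 'user') FROM events WHERE id = 1")
    print(cur.fetchone())  # ('ada',)
conn.commit()
```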
 As we craft the modern app stack, include MemSQL as your durable, boundless cloud-native data store of choice.
 Conclusion
Stacks have allowed us to simplify the sea of choices to a few packages known to work well together. The MEAN stack was one such toolchain that allowed developers to focus less on infrastructure choices and more on developing business value.
 Sadly, the MEAN stack hasn’t aged well. We’ve moved on to the JAM stack, but this ignores the back-end completely.
As our tastes have matured, we assume more from our infrastructure. We need a cloud-native advocate that can boundlessly scale, as our users expect us to deliver superior experiences. Try MemSQL for free today, or contact us for a personalized demo.
Source: https://www.memsql.com/blog/whats-after-the-mean-stack/
62 Hours MEAN Stack Developer Training includes MongoDB, JavaScript, AngularJS, Node.js, and live project development. Demo MEAN stack training is available.
0 notes
artificialintelligencebits · 6 years ago
Text
Five Trends In Machine Learning Ops: Takeaways From The First Operational ML Conference
I recently co-chaired the first conference on Machine Learning Ops, USENIX OpML 2019. It was an energetic gathering of experts, practitioners, and researchers who came together for one day in Santa Clara, CA to talk about the problems, practices, new tools, and cutting-edge research on production machine learning in industries ranging from finance, insurance, healthcare, security, and web scale to manufacturing and others. While there were many great presentations, papers, panels, and posters (too many to talk about individually - check out all the details here), there were several emergent trends and themes. I expect each of these will expand and become even more prominent over the next several years as more organizations push ML into production and use machine learning ops practices to scale ML in production.

Agile Methodologies meet Machine Learning

Many practitioners emphasized the importance of iteration and continuous improvement to successful production ML. Much like software, machine learning improves through iteration and regular production releases. Those who have ML running at scale make it a point to recommend that projects should start with either no ML or simple ML to establish a baseline. As one practitioner put it, you don’t want to spend a year investing in a complex deep learning solution, only to find out after deployment that a simpler non-ML method can outperform it! Bringing agility to ML also requires that the infrastructure be optimized to support agile rollouts. This means that successful production ML infrastructure includes automated deployment, modularity, and use of microservices, while avoiding excessive fine-grained optimization early on.

Recognition that ML bugs differ from software bugs, and ML-specific production diagnostics

Various presentations provided memorable examples of how ML errors not only bypass conventional production checks, they can actually look like better production performance! For example, an ML model that fails and generates a default output can actually cause a performance boost! Detecting ML bugs in production requires specialized techniques like Model Performance Predictors, comparisons with non-ML baselines, visual debugging tools, and metric-driven design of the operational ML infrastructure. Facebook, Uber, and other organizations experienced with large-scale production machine learning emphasized the importance of ML-specific production metrics that range from health checks to ML-specific (such as GPU) resource utilization metrics.

Rich Open Source ecosystem for all aspects of Machine Learning Ops

The rich open source ecosystem for model development (with TensorFlow, scikit-learn, Spark, PyTorch, R, etc.) is well known. OpML showcased how the open source ecosystem for Machine Learning Ops is growing rapidly, with powerful publicly available tooling used by large and small companies alike. Examples include Apache Atlas for governance and compliance, Kubeflow for machine learning ops on Kubernetes, MLflow for lifecycle management, and TensorFlow tracing for monitoring. Classic enterprise vendors are starting to integrate these open source packages to provide full solutions for their customers; an example is Cisco’s support of Kubeflow. Furthermore, web-scale companies are open sourcing the core infrastructure that drives their production ML, such as the ML orchestration tool TonY from LinkedIn.
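Tools like MLflow make the lifecycle-management piece concrete. A minimal tracking sketch follows; the run name, parameters, and metrics are made up for illustration:

```python
import mlflow  # pip install mlflow

with mlflow.start_run(run_name="churn-baseline"):
    # Record what was tried and how it scored, so every
    # candidate model is reproducible and comparable later.
    mlflow.log_param("model", "logistic_regression")
    mlflow.log_param("train_rows", 50000)
    mlflow.log_metric("auc", 0.81)
    mlflow.log_metric("p99_latency_ms", 42)
```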
As these tools become more prominent, practitioners are also documenting end-to-end use cases, creating design patterns that can be used as best practices by others.

Cloud-based Services and SaaS make production ML easier

For a team trying to deploy ML in production for the first few times, the process can be daunting, even with open source tools available for each stage of the process. The cloud offers an alternative. Since the resource management aspects (such as machine provisioning, auto-scaling, elasticity, etc.) are handled by the cloud backend, cloud deployments can be simpler. When accelerators (GPUs, TPUs, etc.) are used, production resource management can be challenging, and using cloud services is a way to get started by leveraging the investments made by cloud providers to optimize accelerator usage. Cloud deployment can also create a ramp-up path for an IT organization to try ML deployment without a large in-house infrastructure rollout. Even on-premises enterprise deployments are moving to a self-service production ML model similar to that of a cloud service, enabling an IT organization to serve the production ML needs of multiple teams and business units.

Expertise Leverage: Web-based At-scale ML Operations to Enterprise

At-scale experts like LinkedIn, Facebook, Google, Airbnb, Uber, and others, who were the first ML adopters, had to build from scratch all of the infrastructure and practices needed to extract monetary value out of ML. These experts are now sharing not only their code but their practice experiences and hard-won learnings, all of which can be adopted for the benefit of enterprises. As the Experts Panel at OpML pointed out, the best practices that these organizations follow for ML infrastructure (from team composition and reliability engineering to resource management) contain powerful insights that enterprises can benefit from as they seek to expand their production ML footprint. Experiences from at-scale ML deployments at Microsoft and others can show enterprises how to deliver performant machine learning in their business applications. Other end-to-end experiences from at-scale companies showed how business metrics can be translated into ML solutions, and the consequent ML solution iteratively improved for business benefit. Finally, organizations facing the unique challenges that edge deployment places on Machine Learning Ops can benefit from learning about scale deployments already in place.

Summary

A great op-ed piece by Michael Jordan on Medium, “Artificial Intelligence: The Revolution Hasn’t Happened Yet,” highlighted the need for an AI engineering practice. OpML 2019, the first Machine Learning Ops conference, illustrated how the ML and AI industry is maturing in this direction, with more and more organizations either struggling with the operational and lifecycle management aspects of production machine learning or pushing to scale ML operations and develop operational best practices. This is great news for the AI industry, since it is a step further towards generating real ROI from AI investments. Trends like those above should help realize the long-awaited potential of AI-generated business value.

Source: www.forbes.com
0 notes
repmywind02199 · 7 years ago
Text
Distributed systems: A quick and simple definition
Get a basic understanding of distributed systems and then go deeper with recommended resources.
The technology landscape has evolved into an always-on environment of mobile, social, and cloud applications where programs can be accessed and used across a multitude of devices.
These always-on and always-available expectations are handled by distributed systems, which manage the inevitable fluctuations and failures of complex computing behind the scenes.
“The increasing criticality of these systems means that it is necessary for these online systems to be built for redundancy, fault tolerance, and high availability,” writes Brendan Burns, distinguished engineer at Microsoft, in Designing Distributed Systems. “The confluence of these requirements has led to an order of magnitude increase in the number of distributed systems that need to be built.”
In Distributed Systems in One Lesson, developer relations leader and teacher Tim Berglund says a simple way to think about distributed systems is that they are a collection of independent computers that appears to its user as a single computer.
Virtually all modern software and applications built today are distributed systems of some sort, says Sam Newman, director at Sam Newman & Associates and author of Building Microservices. Even a monolithic application talking to a database is a distributed system, he says, “just a very simple one.”
While those simple systems can technically be considered distributed, when engineers refer to distributed systems they’re typically talking about massively complex systems made up of many moving parts communicating with one another, with all of it appearing to an end-user as a single product, says Nora Jones, a senior software engineer at Netflix.
Think anything from, well, Netflix, to an online store like Amazon, to an instant messaging platform like WhatsApp, to a customer relationship management application like Salesforce, to Google’s search application. These systems require everything from login functionality and user profiles to recommendation engines, personalization, relational databases, object databases, and content delivery networks, with all of these components served up cohesively to the user.
Benefits of distributed systems
These days, it’s not so much a question of why a team would use a distributed system, but rather when they should shift in that direction and how distributed the system needs to be, experts say. 
Here are three inflection points—the need for scale, a more reliable system, and a more powerful system—when a technology team might consider using a distributed system.
Horizontal Scalability
Computing processes across a distributed system happen independently from one another, notes Berglund in Distributed Systems in One Lesson. This makes it easy to add nodes and functionality as needed. Distributed systems offer “the ability to massively scale computing power relatively inexpensively, enabling organizations to scale up their businesses to a global level in a way that was not possible even a decade ago,” write Chad Carson, cofounder of Pepperdata, and Sean Suchter, director of Istio at Google, in Effective Multi-Tenant Distributed Systems.
Reliability
Distributed systems create a reliable experience for end users because they rely on “hundreds or thousands of relatively inexpensive computers to communicate with one another and work together, creating the outward appearance of a single, high-powered computer,” write Carson and Suchter. In a single-machine environment, if that machine fails then so too does the entire system. When computation is spread across numerous machines, there can be a failure at one node that doesn’t take the whole system down, writes Cindy Sridharan, distributed systems engineer, in Distributed Systems Observability.
Performance
In Designing Distributed Systems, Burns notes that a distributed system can handle tasks efficiently because work loads and requests are broken into pieces and spread over multiple computers. This work is completed in parallel and the results are returned and compiled back to a central location.
The challenges of distributed systems
While the benefits of creating distributed systems can be great for scaling and reliability, distributed systems also introduce complexity when it comes to design, construction, and debugging. Presently, most distributed systems are one-off bespoke solutions, writes Burns in Designing Distributed Systems, making them difficult to troubleshoot when problems do arise.
Here are three of the most common challenges presented by distributed systems.
Scheduling
Because the work loads and jobs in a distributed system do not happen sequentially, there must be prioritization, note Carson and Suchter in Effective Multi-Tenant Distributed Systems:
One of the primary challenges in a distributed system is in scheduling jobs and their component processes. Computing power might be quite large, but it is always finite, and the distributed system must decide which jobs should be scheduled to run where and when, and the relative priority of those jobs. Even sophisticated distributed system schedulers have limitations that can lead to underutilization of cluster hardware, unpredictable job run times, or both.
Take Amazon, for example. Amazon technology teams need to understand which aspects of the online store need to be called upon first to create a smooth user experience. Should the search bar be called before the navigation bar? Think of the many ways both small and large that Amazon makes online shopping as useful as possible for its users.
Latency
With such a complex interchange between hardware computing, software calls, and communication between those pieces over networks, latency can become a problem for users.
“The more widely distributed your system, the more latency between the constituents of your system becomes an issue,” says Newman. “As the volume of calls over the networks increases, the more you’ll start to see transient partitions and potentially have to deal with them.”
Over time, this can lead to technology teams needing to make tradeoffs around availability, consistency, and latency, Newman says.
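One common tactic for absorbing transient partitions is to wrap remote calls in bounded retries with exponential backoff. A minimal sketch, with a hypothetical internal endpoint and illustrative limits:

```python
import time

import requests  # pip install requests

def get_with_retries(url, attempts=4, timeout=2.0):
    """Retry a flaky remote call with exponential backoff."""
    for attempt in range(attempts):
        try:
            return requests.get(url, timeout=timeout)
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...

response = get_with_retries("https://orders.internal/api/v1/status")
print(response.status_code)
```

Bounding the retries matters: unbounded retries during a partition amplify load and can turn a transient blip into a full outage.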
Performance monitoring and observability
Failure is inevitable, says Nora Jones, when it comes to distributed systems. How a technology team manages and plans for failure so a customer hardly notices it is key. When distributed systems become complex, observability into the technology stack to understand those failures is an enormous challenge.
Carson and Suchter illustrate this challenge in Effective Multi-Tenant Distributed Systems:
Truly useful monitoring for multi-tenant distributed systems must track hardware usage metrics at a sufficient level of granularity for each interesting process on each node. Gathering, processing, and presenting this data for large clusters is a significant challenge, in terms of both systems engineering (to process and store the data efficiently and in a scalable fashion) and the presentation-level logic and math (to present it usefully and accurately). Even for limited, node-level metrics, traditional monitoring systems do not scale well on large clusters of hundreds to thousands of nodes.
There are several approaches companies can use to detect those failure points, such as distributed tracing, chaos engineering, incident reviews, and understanding expectations of upstream and downstream dependencies. “There’s a lot of different tactics to achieve high quality and robustness, and they all fit into the category of having as much insight into the system as possible,” Jones says.
Learn more
Ready to go deeper into distributed systems? Check out these recommended resources from O’Reilly’s editors.
Distributed Systems Observability — Cindy Sridharan provides an overview of monitoring challenges and trade-offs that will help you choose the best observability strategy for your distributed system.
Designing Distributed Systems — Brendan Burns demonstrates how you can adapt existing software design patterns for designing and building reliable distributed applications.
The Distributed Systems Video Collection — This 12-video collection dives into best practices and the future of distributed systems.
Effective Multi-Tenant Distributed Systems — Chad Carson and Sean Suchter outline the performance challenges of running multi-tenant distributed computing environments, especially within a Hadoop context.
Distributed Systems in One Lesson — Using a series of examples taken from a fictional coffee shop business, Tim Berglund helps you explore five key areas of distributed systems.
Chaos Engineering — This report introduces you to Chaos Engineering, a method of experimenting on infrastructure that lets you expose weaknesses before they become problems.
Designing Data-Intensive Applications — Martin Kleppmann examines the pros and cons of various technologies for processing and storing data.
0 notes
generativeinai · 3 months ago
Text
AIOps Platform Development in Cloud Computing: Optimizing Performance and Cost Efficiency
Cloud computing has become the backbone of modern IT infrastructure, enabling businesses to scale, innovate, and optimize costs. However, managing cloud environments efficiently is a challenge due to their complexity, dynamic nature, and sheer volume of data. This is where AIOps (Artificial Intelligence for IT Operations) platform development plays a crucial role.
Tumblr media
By leveraging AI and machine learning, AIOps enhances cloud operations by automating performance monitoring, predictive analytics, and cost optimization. In this blog, we will explore the importance of AIOps in cloud computing, the key components of AIOps platform development, and how it helps businesses achieve better performance and cost efficiency.
Understanding AIOps in Cloud Computing
What is AIOps?
AIOps (Artificial Intelligence for IT Operations) is an AI-driven approach that automates IT operations, enhances observability, and optimizes cloud resources. It integrates machine learning, data analytics, and automation to proactively detect and resolve issues before they impact business operations.
Why is AIOps Essential in Cloud Computing?
Cloud environments are complex, with multiple workloads, containers, microservices, and hybrid architectures. Traditional IT operations struggle to manage such complexity efficiently. AIOps enables:
Real-time monitoring of cloud applications and infrastructure
Predictive analytics to detect anomalies and prevent outages
Automated issue resolution to reduce human intervention
Cost optimization through intelligent resource allocation
Key Components of AIOps Platform Development in Cloud Computing
1. Data Collection and Integration
AIOps platforms gather data from multiple sources such as logs, metrics, events, and performance reports. Cloud services like AWS CloudWatch, Azure Monitor, and Google Cloud Operations provide valuable insights for analysis.
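As a concrete illustration, here is a minimal boto3 sketch that pulls five-minute CPU averages from CloudWatch; the instance ID is hypothetical:

```python
from datetime import datetime, timedelta

import boto3  # pip install boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.utcnow()

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,  # five-minute buckets
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```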
2. Big Data Processing and Analytics
AIOps processes massive amounts of structured and unstructured data using:
Machine learning algorithms to identify patterns and anomalies (a toy sketch follows this list)
Natural Language Processing (NLP) to analyze log files
Event correlation to find root causes of issues
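A toy version of the anomaly-flagging idea promised above compares each new datapoint against a trailing window with a z-score; the window size and threshold are illustrative:

```python
import statistics

def zscore_anomalies(series, window=30, threshold=3.0):
    """Flag points that deviate strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1e-9  # avoid divide-by-zero
        z = (series[i] - mean) / stdev
        if abs(z) > threshold:
            anomalies.append((i, series[i], round(z, 2)))
    return anomalies

# A gently oscillating CPU series with one spike at the end.
cpu = [12 + (i % 5) * 0.5 for i in range(40)] + [58.0]
print(zscore_anomalies(cpu))  # only the 58.0 spike is flagged
```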
3. Automation and Remediation
AIOps enables self-healing IT operations by automating responses to incidents. For example:
Scaling up resources during high traffic loads
Restarting failed services automatically (sketched after this list)
Adjusting cloud configurations for optimal performance
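A bare-bones version of the restart rule promised above pairs a health probe with a service restart; the service name, health URL, and the systemd-based restart are assumptions:

```python
import subprocess

import requests  # pip install requests

def ensure_healthy(service, health_url):
    """Probe a health endpoint and restart the unit if the probe fails."""
    try:
        ok = requests.get(health_url, timeout=2).status_code == 200
    except requests.RequestException:
        ok = False
    if not ok:
        # systemd restart; on Kubernetes this might delete the pod instead.
        subprocess.run(["systemctl", "restart", service], check=True)
        return "restarted"
    return "healthy"

print(ensure_healthy("checkout-api", "http://localhost:8080/health"))
```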
4. Predictive and Preventive Analysis
By analyzing historical data, AIOps predicts potential failures and prevents downtime by recommending proactive measures.
5. Cost Optimization and Resource Management
AIOps platforms monitor cloud usage and suggest cost-saving strategies, such as:
Rightsizing instances to avoid over-provisioning
Auto-scaling to adjust resources based on demand
Identifying unused resources and terminating them
How AIOps Optimizes Performance in Cloud Computing
1. Intelligent Monitoring and Observability
AIOps enables real-time visibility into cloud applications, databases, and networks. It continuously analyzes logs and performance data to detect latency issues, slow database queries, and network congestion before they impact users.
2. Anomaly Detection and Root Cause Analysis
Traditional monitoring tools generate alerts for every small deviation, leading to alert fatigue. AIOps uses AI-driven anomaly detection to filter out noise and focus on critical issues.
3. Auto-healing and Self-correction
AIOps automates responses to cloud issues by:
Restarting failing virtual machines (VMs)
Reallocating workloads dynamically
Deploying patches and updates automatically
4. Workload Optimization and Auto-scaling
AIOps helps optimize cloud workloads by:
Predicting workload spikes and scaling resources accordingly
Balancing workloads across multiple cloud environments
Reallocating underutilized resources to prevent wastage
How AIOps Reduces Cloud Costs
1. Intelligent Cost Monitoring
AIOps tracks cloud expenses in real time and provides recommendations on budget optimization.
2. Eliminating Unused Resources
Many businesses forget to terminate idle cloud instances. AIOps detects and shuts down unnecessary resources, saving costs.
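A simplified sweep for idle instances might flag anything whose average CPU stayed below a small threshold for a week; the 2% threshold and the report-only behavior are policy assumptions:

```python
from datetime import datetime, timedelta

import boto3  # pip install boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")
now = datetime.utcnow()

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        points = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2", MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
            StartTime=now - timedelta(days=7), EndTime=now,
            Period=86400, Statistics=["Average"],  # one bucket per day
        )["Datapoints"]
        if points and max(p["Average"] for p in points) < 2.0:
            # Report only; a human (or a policy engine) confirms termination.
            print("idle candidate:", instance["InstanceId"])
```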
3. Rightsizing Virtual Machines and Containers
AIOps analyzes resource consumption and adjusts VM sizes to match workloads, avoiding over-provisioning.
4. Auto-scaling Based on Demand
Instead of running high-cost instances 24/7, AIOps enables dynamic scaling to match real-time demand.
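The scaling decision itself can be a simple proportional rule, the same shape as the one Kubernetes’ Horizontal Pod Autoscaler uses: desired replicas track the ratio of observed to target utilization. The numbers here are illustrative:

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct):
    """Proportional scaling: replica count tracks the load ratio."""
    return max(1, math.ceil(current_replicas * current_cpu_pct / target_cpu_pct))

# 4 replicas running hot at 85% CPU against a 50% target -> 7 replicas.
print(desired_replicas(4, 85, 50))
# Demand drops to 10% -> scale back down to 2 replicas.
print(desired_replicas(7, 10, 50))
```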
5. Cost Forecasting and Budget Optimization
By analyzing historical spending patterns, AIOps helps organizations plan and optimize cloud budgets effectively.
Challenges in AIOps Platform Development for Cloud Computing
1. Data Complexity
Cloud environments generate massive amounts of data from different sources. Developing an AIOps platform requires efficient data processing capabilities.
2. Integration with Existing Systems
AIOps platforms must integrate with existing cloud monitoring tools (e.g., AWS CloudWatch, Azure Monitor) to provide unified insights.
3. Accuracy of AI Models
AI-driven predictions need to be highly accurate to avoid false alerts. Training AI models with high-quality datasets is crucial.
4. Security and Compliance
AIOps platforms must adhere to cloud security best practices to prevent unauthorized access and ensure compliance with GDPR, HIPAA, and other regulations.
Best Practices for AIOps Platform Development in Cloud Computing
✔ Use AI and ML for Continuous Improvement – Continuously train AI models using real-time cloud data for better accuracy.
✔ Implement Automation Gradually – Start with automated monitoring, then gradually implement self-healing capabilities.
✔ Ensure Multi-Cloud Compatibility – AIOps should work seamlessly across AWS, Azure, Google Cloud, and hybrid cloud environments.
✔ Focus on User Experience – The AIOps platform should provide a user-friendly dashboard for easy monitoring and decision-making.
✔ Prioritize Security and Compliance – Implement role-based access control (RBAC), encryption, and compliance auditing.
Conclusion
AIOps platform development in cloud computing is redefining IT operations by enhancing performance, reducing costs, and improving reliability. By leveraging AI-driven analytics, automation, and predictive intelligence, businesses can achieve better cloud efficiency and prevent outages before they happen.
As cloud environments continue to evolve, investing in AIOps is no longer an option but a necessity for enterprises looking to stay competitive in the digital age.
0 notes
atplblog · 2 months ago
Text
Delve into the second edition to master serverless proficiency and explore new chapters on security techniques, multi-regional deployment, and optimizing observability.

Key Features
- Gain insights from a seasoned CTO on best practices for designing enterprise-grade software systems
- Deepen your understanding of system reliability, maintainability, observability, and scalability with real-world examples
- Elevate your skills with software design patterns and architectural concepts, including securing in depth and running in multiple regions

Book Description
Organizations undergoing digital transformation rely on IT professionals to design systems to keep up with the rate of change while maintaining stability. With this edition, enriched with more real-world examples, you'll be perfectly equipped to architect the future for unparalleled innovation.
This book guides you through the architectural patterns that power enterprise-grade software systems while exploring key architectural elements (such as event-driven microservices and micro frontends) and learning how to implement anti-fragile systems.
First, you'll divide up a system and define boundaries so that your teams can work autonomously and accelerate innovation. You'll cover the low-level event and data patterns that support the entire architecture while getting up and running with the different autonomous service design patterns.
This edition is tailored with several new topics on security, observability, and multi-regional deployment. It focuses on best practices for security, reliability, testability, observability, and performance. You'll be exploring the methodologies of continuous experimentation, deployment, and delivery before delving into some final thoughts on how to start making progress.
By the end of this book, you'll be able to architect your own event-driven, serverless systems that are ready to adapt and change.

What you will learn
- Explore architectural patterns to create anti-fragile systems
- Focus on DevSecOps practices that empower self-sufficient, full-stack teams
- Apply microservices principles to the frontend
- Discover how SOLID principles apply to software and database architecture
- Gain practical skills in deploying, securing, and optimizing serverless architectures
- Deploy a multi-regional system and explore the strangler pattern for migrating legacy systems
- Master techniques for collecting and utilizing metrics, including RUM, Synthetics, and anomaly detection

Who this book is for
This book is for software architects who want to learn more about different software design patterns and best practices. This isn't a beginner's manual - you'll need an intermediate level of programming proficiency and software design experience to get started. You'll get the most out of this software design book if you already know the basics of the cloud, but it isn't a prerequisite.

Table of Contents
1. Architecting for Innovations
2. Defining Boundaries and Letting Go
3. Taming the Presentation Tier
4. Trusting Facts and Eventual Consistency
5. Turning the Cloud into the Database
6. A Best Friend for the Frontend
7. Bridging Intersystem Gaps
8. Reacting to Events with More Events
9. Running in Multiple Regions
10. Securing Autonomous Subsystems in Depth
11. Choreographing Deployment and Delivery
12. Optimizing Observability
13. Don't Delay, Start Experimenting

Publisher: Packt Publishing; 2nd ed. edition (27 February 2024)
Language: English
Paperback: 488 pages
ISBN-10: 1803235446
ISBN-13: 978-1803235448
Item Weight: 840 g
Dimensions: 2.79 x 19.05 x 23.5 cm
Country of Origin: India
0 notes
devopssentinel · 1 year ago
Text
POE AI: Redefining DevOps with Advanced Predictive Operations
Tumblr media
Enter POE AI, an advanced tool designed to bring predictive operations to the forefront of DevOps. By leveraging cutting-edge artificial intelligence, it provides powerful predictive insights that help teams proactively manage their infrastructure, streamline workflows, and enhance operational stability.

Predictive Maintenance and Monitoring
One of the core strengths of POE AI lies in its predictive maintenance and monitoring capabilities. This is particularly valuable for DevOps teams responsible for maintaining complex IT infrastructures where unexpected failures can have significant impacts. POE AI continuously analyzes system data, identifying patterns and anomalies that may indicate potential issues.
Imagine you're managing a large-scale distributed system. This tool can monitor the performance of various components in real time, predicting potential failures before they happen. For example, it might detect that a particular server is showing early signs of hardware degradation, allowing you to take preemptive action before a critical failure occurs. This proactive approach minimizes downtime and ensures that your infrastructure remains robust and reliable.

Enhancing Workflow Efficiency
POE AI goes beyond predictive maintenance by also enhancing overall workflow efficiency. The tool integrates seamlessly with existing DevOps pipelines and tools, providing insights that help streamline processes and optimize resource allocation. This integration ensures that DevOps teams can operate more efficiently, focusing on strategic initiatives rather than firefighting issues.
For instance, POE AI can analyze historical deployment data to identify the most efficient deployment strategies and times. By leveraging these insights, you can schedule deployments during periods of low activity, reducing the risk of disruptions and improving overall system performance. This optimization not only enhances workflow efficiency but also ensures that your team can deliver high-quality software more consistently.

AI-Powered Root Cause Analysis
When issues do arise, quickly identifying the root cause is crucial for minimizing their impact. POE AI excels in this area by offering AI-powered root cause analysis. The tool can rapidly sift through vast amounts of data, pinpointing the exact cause of an issue and providing actionable recommendations for resolution.
Consider a scenario where your application experiences a sudden performance drop. Instead of manually combing through logs and metrics, you can rely on it to identify the root cause, such as a specific microservice consuming excessive resources. This rapid identification allows you to address the issue promptly, restoring optimal performance and reducing the time spent on troubleshooting.

Integration with DevOps Tools
POE AI's ability to integrate with a wide range of DevOps tools makes it a versatile addition to any tech stack. Whether you're using Jenkins for continuous integration, Kubernetes for container orchestration, or Splunk for log analysis, POE AI can seamlessly integrate to enhance your operational workflows.
For example, integrating AI with your monitoring tools can provide real-time predictive insights directly within your dashboards. This integration enables you to visualize potential issues and take proactive measures without switching between different applications. By consolidating these insights into a single platform, POE AI enhances situational awareness and simplifies operational management.

Security and Compliance
In the realm of DevOps, maintaining security and compliance is paramount. POE AI understands this and incorporates robust security measures to protect sensitive data. The tool adheres to major data protection regulations, including GDPR, ensuring that user data is handled securely and responsibly.
For organizations with stringent compliance requirements, POE AI offers on-premises deployment options. This feature allows organizations to maintain full control over their data, ensuring that it remains within their secure environment. By prioritizing security, AI enables DevOps teams to leverage its powerful capabilities without compromising on data protection.

Real-World Applications and Success Stories
To understand the impact of POE AI, let's explore some real-world applications and success stories. Many organizations have integrated POE AI into their workflows, resulting in significant improvements in operational efficiency and stability.
One example is a global financial services company that implemented POE AI to enhance their IT infrastructure management. By using predictive maintenance and root cause analysis, the company significantly reduced downtime and improved system reliability. This proactive approach allowed their IT team to focus on strategic projects rather than constantly addressing issues.
Another success story involves a multinational manufacturing firm that used POE AI to optimize their production workflows. By analyzing historical data and predicting potential bottlenecks, AI provided actionable insights that improved production efficiency and reduced operational costs. This optimization led to higher output quality and increased overall productivity.

Future Prospects of AI in DevOps
As artificial intelligence continues to advance, the capabilities of tools like POE AI are expected to expand even further. Future advancements in machine learning and natural language processing (NLP) will enhance the tool's ability to provide even more accurate and nuanced predictions.
One exciting prospect is the potential for real-time adaptive learning. Imagine a scenario where POE AI continuously learns from new data, adapting its predictive models in real time to reflect the latest trends and patterns. This capability would enable DevOps teams to stay ahead of emerging issues and continuously optimize their workflows.
Another potential development is the integration of advanced NLP capabilities, allowing POE AI to understand and interpret unstructured data such as textual reports and logs. This integration would provide deeper insights and recommendations, further enhancing the tool's value in managing complex DevOps environments.

Maximizing the Benefits of POE AI
To fully leverage the benefits of POE AI, DevOps teams should consider incorporating best practices for using the tool effectively. Here are some tips to get started:
- Integrate with Existing Tools: Ensure that POE AI is integrated with your existing DevOps tools and platforms. This integration will streamline predictive analysis and make it easier to access insights.
- Customize Alerts and Notifications: Take advantage of POE AI's customization options to tailor alerts and notifications to your specific needs. Configure the tool to highlight the most critical issues and provide actionable recommendations.
- Review and Act on Insights: Regularly review the insights and recommendations provided by POE AI. Use this information to make data-driven decisions and optimize your workflows for greater efficiency.
- Train Your Team: Provide training and resources to help your team members get the most out of POE AI. Encourage them to explore the tool's features and incorporate it into their daily workflows.
- Monitor Security: Ensure that POE AI's security settings are configured to meet your organization's requirements. Regularly review and update security measures to protect data and maintain compliance.
By following these best practices, DevOps teams can maximize the benefits of POE AI and create a more efficient, predictive operational environment.

Embracing the Future of Predictive Operations
Integrating POE AI into your DevOps processes isn't just about adopting new technology—it's about fundamentally transforming how you anticipate and address operational challenges. By leveraging predictive insights, you can move from a reactive to a proactive approach, minimizing downtime and optimizing performance. POE AI empowers your team to foresee potential issues, streamline workflows, and enhance overall productivity. This tool will not only save you time and resources but also enable you to make smarter, more informed decisions, driving your team's success to new heights.
0 notes
codeonedigest · 2 years ago
Video
youtube
Performance Metrics Microservice Design Pattern Tutorial with Example fo...
Full Video Link: https://youtu.be/ciERWgfx7Tk
Hello friends, new #video on #performancemetrics #designpattern for #microservices #tutorial for #programmers with #examples is published on #codeonedigest #youtube channel. Learn #performance #metrics #pattern  #observability #programming #coding with codeonedigest.
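For readers who want to see the pattern itself in miniature, here is a sketch of a microservice exposing performance metrics with the Prometheus Python client; the metric names, port, and simulated work are arbitrary:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server  # pip install prometheus-client

REQUESTS = Counter("orders_requests_total", "Requests handled by the orders service")
LATENCY = Histogram("orders_request_latency_seconds", "Request latency in seconds")

@LATENCY.time()  # observe how long each call takes
def handle_request():
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    while True:
        handle_request()
```

A scraper such as Prometheus then polls the /metrics endpoint on a schedule, and dashboards and alerts are built on top of the collected series.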
1 note