#tokudb
gslin · 7 years ago
Text
Percona has released its matching version for MySQL 8.0…
Percona released Percona Server for MySQL 8.0 just before the holidays: "Announcing General Availability of Percona Server for MySQL 8.0".
Compared with Oracle's MySQL 8.0 it adds quite a few features, such as NoSQL-leaning functionality and GIS-related features, but what caught my attention is Percona's own news: TokuDB is being abandoned because of MyRocks:
Features Being Deprecated in Percona Server for MySQL 8.0 TokuDB Storage Engine: TokuDB will be supported throughout the Percona Server for MySQL 8.0 release series, but will not be…
williamedwardscoder · 8 years ago
Text
Compressing MySQL databases
TL;DR: use TokuDB.  Always.  It massively reduces storage requirements and is massively faster than InnoDB, even on low end hardware.  My tables were compressed to just 12% original size and that’s fairly typical.
A very simple structure:
CREATE TABLE test (
    date DATE NOT NULL,
    site VARCHAR(8) NOT NULL,
    order_id VARCHAR(64) NOT NULL,
    time TIME NOT NULL,
    kind ENUM(...) NOT NULL,
    account INTEGER,
    user INTEGER,
    client_price BIGINT,
    supplier_id INTEGER,
    supplier_cost BIGINT,
    client_ref VARCHAR(32),
    data JSON,
    PRIMARY KEY (date, site, order_id),
    INDEX (account, user, client_ref)
) DEFAULT CHARSET utf8
PARTITION BY RANGE (TO_DAYS(date)) ( ...
This system is using the RDBMS as a document store; the order items are flattened into a JSON document (the 'data' field).
The ETL is upserting order events as they are processed, using multi-row INSERT INTO ... ON DUPLICATE KEY UPDATE ...
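A minimal sketch of such a batched upsert against the table above (the row values, the 'created' kind, and the two-row batch are illustrative; the real statements carry 1K-2K rows each):

```sql
-- Hypothetical two-row batch against the `test` table defined above.
INSERT INTO test (date, site, order_id, time, kind, client_price, data)
VALUES
  ('2017-03-01', 'uk', 'ord-0001', '12:00:01', 'created', 1000, '{"items": []}'),
  ('2017-03-01', 'uk', 'ord-0002', '12:00:02', 'created', 2500, '{"items": []}')
ON DUPLICATE KEY UPDATE
  kind         = VALUES(kind),          -- re-processing an order event
  client_price = VALUES(client_price),  -- updates the existing row in place
  data         = VALUES(data);
```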
And I’ve been running performance tests with some real data.  27M rows of real data, all on the same day (so all in the same partition).  37M upsert statements inserting or updating those 27M rows, issued in batches of 1K-2K per multi-row statement.  This is enough rows to be able to get a grasp of upsert performance and extrapolate storage requirements.
The test box is an AWS t2.medium instance.  This is a very weak VM with 2 (virtual) CPUs and 4GB RAM.  Not normal DB fare, so performance beware.
The 27M rows take 10.48GB uncompressed.  It takes InnoDB 10 hours to load all the data.
In the old days, MySQL InnoDB offered 'row compression' (ROW_FORMAT=COMPRESSED).  This compresses individual rows and reduces disk requirements to 5.73GB (45% space saved), so it nearly halves the disk space!  Sadly, the test now took 15 hours instead of 10.
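For reference, this older row-compression format is enabled per table roughly like this (the copy-table name and block size are illustrative; the file-format setting is only needed on older MySQL versions):

```sql
-- Older MySQL (5.6 and earlier) needs the Barracuda file format first:
SET GLOBAL innodb_file_format = 'Barracuda';

CREATE TABLE test_rowcomp LIKE test;
ALTER TABLE test_rowcomp
    ROW_FORMAT=COMPRESSED
    KEY_BLOCK_SIZE=8;   -- compressed page size, in KB
```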
More recently, MySQL introduced ‘table compression’.  (MariaDB also added it, calling it ‘page compression’; the CREATE TABLE syntax is infuriatingly slightly different).
Underneath, InnoDB is organized as ‘pages’.  This table compression intercepts these page reads and writes and, as pages are written to disk, they are individually compressed and written as a ‘sparse’ file.
As these InnoDB pages are larger than the file system pages, there is the opportunity for a 'hole' to be 'punched' in each InnoDB page.  Imagine an InnoDB page is 12KB and the file system page size is 4KB.  The page may compress down to 1KB, meaning it only needs the first file system page (4KB), and 8KB would be saved.  This example illustrates how wasteful the approach can be: there can still be a lot of slack in each page.  The total file size as reported by ls is unchanged, but the actual used disk space (as shown by e.g. ls -s) is reduced.  Page compression compresses this table to 4.07GB (61% savings).  The bad news is that performance sinks awfully!  I aborted the run after 32 hours, as upsert speed had dropped to under 100 orders/sec.  I've seen this on MariaDB RDS and on my little t2.medium.  It's atrocious!
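A sketch of how this page compression is switched on, assuming MySQL 5.7+ syntax (MariaDB spells it PAGE_COMPRESSED=1); the partition file name below is illustrative:

```sql
ALTER TABLE test COMPRESSION='zlib';
OPTIMIZE TABLE test;   -- rebuild so existing pages are rewritten compressed

-- The saving only shows up in allocated blocks, not in apparent file size:
--   ls -l  test#P#p20170301.ibd    (apparent size: unchanged)
--   ls -ls test#P#p20170301.ibd    (allocated blocks: reduced)
```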
TokuDB is an alternative storage engine to InnoDB.  It is nicely packaged by Percona, so it's very straightforward to install.
TokuDB compresses databases with zlib by default.  It compresses pages, but doesn’t store them as sparse files; instead, the total file size you see on disk is the total file size.
TokuDB with zlib compresses to 1.45GB (86% savings)!  This is the same compression algorithm as InnoDB page compression and operating at much the same logical page level, but conferring much bigger savings because each ‘page’ is not rounded up to a multiple of the file system page size.
Because InnoDB puts the whole table (or, as in this case, partition) into a single file, you cannot see how much of the storage is for the primary rows and how much is for secondary indexes.  TokuDB, however, has separate files for each index.  So in this case there are two files for the partition, and I can see that the 27M rows take 1.02GB and the by-client index takes 0.44GB.
TokuDB allows you to specify other compression algorithms.  The weakest (thus fastest) is ‘snappy’, which is an LZ77 compressor without entropy coding.  It compresses to 2.31GB (78% savings; same ratio rows to index).
There is also QuickLZ (between snappy and zlib in CPU requirements; 1.95GB; 81% savings) and LZMA (stronger than zlib; 1.23GB; 88% savings!).
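Switching algorithms is a per-table setting; a sketch using Percona's TokuDB row formats (the copy-table name is illustrative):

```sql
CREATE TABLE test_lzma LIKE test;
ALTER TABLE test_lzma
    ENGINE=TokuDB
    ROW_FORMAT=TOKUDB_LZMA;   -- strongest; the other options are
                              -- TOKUDB_ZLIB (the default), TOKUDB_QUICKLZ,
                              -- TOKUDB_SNAPPY and TOKUDB_UNCOMPRESSED
```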
LZMA is a clear winner here.  What's fascinating about TokuDB, though, is not just its space saving but also its performance: it takes just 2 hours to run my ETL, and the choice of compression algorithm has no noticeable effect on that!
Now, the dire uncompressed InnoDB performance is perhaps the classic "InnoDB is not good with databases that don't fit into RAM".  I was using Percona packages, so the defaults were not as mad as they used to be out-of-the-box from other Linux sources, but I did up the buffer pool to 2GB and use a big log.  Configuring InnoDB optimally is like reading tea leaves, and no amount of doing it makes me confident I've hit the sweet spot for any particular workload.  This upsert workload seems to be particularly stressful.  The t2.medium has two (virtual) CPUs, and as upsert performance dropped I saw 'top' reporting a load average over 2.0 even though the ETL was waiting on the DB.  Not good.
However, the dire compressed InnoDB performance I saw repeated on a MariaDB RDS m4.2xlarge with the same data set.  There's something wrong with either InnoDB page compression, or with EBS sparse file performance!
On the other hand, TokuDB shows how to do things properly!  The out-of-the-box defaults are very sane (it doesn't use O_DIRECT; instead, it takes 50% of RAM to store uncompressed pages, and lets the OS cache the compressed pages in the normal file buffer cache.  Inspired!).  And the performance is just so stable!  On the t2.medium, which is a piddling little box, I saw 5000 orders/sec sustained throughout the entire run.
I think I’ve heard that TokuDB is marginally slower than (uncompressed) InnoDB on small tables.  And perhaps most users have small tables.  I don’t think there’s actually that measurable a gap when everything is tiny.  But the moment you go over a million rows or contemplate partitioning then I think TokuDB is the obvious choice.  In fact it would make a very good default.
Oh, and for kicks, try a COUNT(*) on a big InnoDB table.  And then go make a coffee!  TokuDB ... snap!  And it's not based on metadata like MyISAM's count was, either; it's just so much faster.
TokuDB redefines what a ‘small’ database is.  It moves the goalposts.  It means I can build bigger systems without scaling sideways.  I don’t get why everyone still uses InnoDB ;)
globalmediacampaign · 4 years ago
Text
Heads-Up: TokuDB Support Changes and Future Removal from Percona Server for MySQL 8.0
Back in December 2018, when we announced the general availability of Percona Server for MySQL 8.0, we also announced that the TokuDB Storage Engine had been marked as "deprecated" in this release, recommending the MyRocks Storage Engine as an alternative. We believe that MyRocks provides similar benefits for the majority of workloads and is better optimized for modern hardware.

Since then, we have continued maintaining the storage engine in the 8.0 release, e.g. by incorporating bug fixes. However, the ongoing amount of change still occurring in the upstream MySQL 8.0 codebase has been a constant challenge and a cause for concern. Maintaining TokuDB as part of the 8.0 codebase has become increasingly difficult and time-consuming. Based on the number of support requests and conversations on our forums, we have seen very little to no adoption of TokuDB in Percona Server for MySQL 8.0.

We have therefore decided that the TokuDB storage engine will be disabled in future versions of Percona Server for MySQL 8.0. Beginning with Percona Server version 8.0.25, we'll add a respective warning notice to the release notes to inform users about this upcoming change.

Timeline

Starting with Percona Server version 8.0.26 (expected in Q3 2021), the storage engine will still be included in the binary builds and packages, but disabled by default. Users upgrading from previous versions will still be able to re-enable it manually, so they can perform the necessary migration steps.

Starting with Percona Server for MySQL version 8.0.28 (expected to ship in Q1 2022), the TokuDB storage engine will no longer be supported and will be removed from the installation packages. It will still be part of the 8.0 source code, but not enabled in our binary builds. We intend to fully remove it in the next major version of Percona Server for MySQL (9.0 or whatever version that will be).
Note that this change only applies to Percona Server for MySQL version 8.0 – TokuDB remains enabled and supported in Percona Server 5.7 until this release has reached the end of its support period. In case you’re still using TokuDB in Percona Server 8.0, we recommend switching to InnoDB or MyRocks instead as soon as possible. Please consult our documentation on how to migrate and remove the TokuDB storage engine. If you need any assistance with migrating your data or if you have any questions or concerns about this, don’t hesitate to reach out to us – our experts are here to help! https://www.percona.com/blog/2021/05/21/tokudb-support-changes-and-future-removal-from-percona-server-for-mysql-8-0/
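A hedged sketch of that migration (the schema and table names in the ALTER are illustrative): find the affected tables, then convert each one before upgrading to a release where the engine is disabled:

```sql
-- Which tables still use TokuDB?
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'TokuDB';

-- Convert each one (for MyRocks it would be ENGINE=ROCKSDB instead):
ALTER TABLE mydb.orders ENGINE=InnoDB;
```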
programmerandcoder · 6 years ago
Text
Using 7 MySQL Strategies Like The Pros
MySQL lets you host many databases, each named by you. Using MySQL and phpMyAdmin (my favorite management GUI) has enabled me to insource numerous solutions we used to pay for.

MySQL is a database application that is free for small and medium-scale business use, and it is supported on most mainstream systems. In 2009 Oracle bought Sun Microsystems (including MySQL) for about 7.5 billion dollars, causing users and programmers to start debating the fate of the open-source database.

MySQL runs on almost any operating system and has good, consistent speed; I think it is the database manager with the fastest response to operations. Subqueries were one of the significant flaws of MySQL for a very long time; it was notorious for losing its way with a few levels of sub-queries.

With MySQL, on the other hand, the client library is GPL, so you must either pay a commercial fee to Oracle or provide the source code of your application. PostgreSQL additionally stores data about data types, functions, and access methods in its system catalogs, together with the usual information about databases, tables, and columns that relational databases maintain.
There are ways around the MySQL client library's licensing. The route Atlassian chose is to tell you where to get the JDBC connector for MySQL if you want to connect your Atlassian apps to a MySQL database, and where to drop the jar.

If you'd like competent on-call assistance from seasoned staff without paying serious cash (DB2- or Oracle-level paying), Percona (and MySQL) is your friend. Matt Aslett of 451 Research joins ScaleBase to talk about scaling out your MySQL DB, fresh high-availability strategies, and smartly managing a distributed MySQL environment.

Conclusion: scalability is not just a matter of a theoretical number of nodes. It is also about the capacity to provide predictable performance, and to do so without adding management complexity. The proliferation of cloud and geo-dispersed applications adds to that complexity; MySQL has never been under so much strain, and the combination of innovative clustering/load balancing and management technology provides a possible solution. (c) 2013 by The 451 Group.

Flexibility: no need to over-provision; online data redistribution with no downtime. Read/write splitting: optimal for scaling read-intensive software, with replication-lag-based routing. Enhanced data consistency and isolation: read stickiness following writes ensures consistent, distributed database behavior. A 100% compatible MySQL proxy: applications unmodified, standard MySQL interfaces and tools, MySQL databases unmodified; data is stored within MySQL InnoDB/MyISAM/etc.
Database encryption solves that dilemma, but it cannot prevent access once the root account is compromised. Application-level encryption is by far the most flexible and secure, but you lose the power of SQL: it is pretty difficult to use encrypted columns in WHERE or JOIN clauses.

You can integrate with a HashiCorp Vault server through the keyring_vault plugin, matching (and even extending, with binary log encryption) the features available in Oracle's MySQL Enterprise edition. Whichever MySQL flavor you use, as long as it is a current version, you have options to apply data-at-rest encryption through the database server, ensuring your data is secured.
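As a sketch: once a keyring plugin (keyring_vault here, or keyring_file for testing) is loaded, data-at-rest encryption becomes a per-table switch; the table below is illustrative:

```sql
CREATE TABLE secrets (
    id      INT PRIMARY KEY,
    payload VARBINARY(1024)
) ENGINE=InnoDB ENCRYPTION='Y';   -- tablespace encrypted via the keyring

-- MySQL 8.0 can also encrypt the binary log:
SET PERSIST binlog_encryption = ON;
```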
MySQL includes a storage-engine framework that lets system administrators configure the database for performance. Whether your system is Microsoft Windows, Linux, Macintosh, or UNIX, MySQL is a comprehensive solution with self-managing features that automate everything from configuration and space expansion to database administration and data design.

By migrating existing database applications to MySQL, businesses are enjoying substantial cost savings on new projects. MySQL is an open-source, multi-threaded, relational database management system (RDBMS) written in C and C++.

The server is suitable for mission-critical, heavy-load production systems as well as for embedding into mass-deployed applications. MySQL is interactive and straightforward to use compared to other DBMS applications, and it protects data with an encryption layer.

MariaDB is a general-purpose DBMS engineered with an extensible structure to support a wide group of use cases through pluggable storage engines. MySQL users can get tens of thousands of metrics from the database, so this guide concentrates on a small number of important metrics that give you real-time insight into your database's health and performance.

Users have a number of options for monitoring latency, both by taking advantage of MySQL's built-in metrics and by querying the performance schema. MySQL's default storage engine, InnoDB, uses an area of memory known as the buffer pool to cache data for tables and indexes.

Since application databases, and data warehouses, are built on SQL databases, and because MySQL is among the most popular flavors of SQL, we compiled a list of the top MySQL ETL tools to help you transfer data into and out of MySQL database systems. KETL is XML-based and works with MySQL to develop and deploy complex ETL conversion projects that require scheduling.

Blendo's ETL-as-a-service product makes it simple to get data from several data sources, including S3 buckets, CSVs, and a massive selection of third-party sources such as Google Analytics, MailChimp, Salesforce, and many others.
At Seravo we migrated all our databases from MySQL to MariaDB in late 2013, and during 2014 we also migrated our clients' systems to MariaDB. Dynamic column support (MariaDB only) is interesting because it allows NoSQL-style functionality, so a single database interface can offer both SQL and "not only SQL" for varied application project requirements.

MariaDB also excels in the number of storage engines and other plugins it ships with: CONNECT and Cassandra storage engines for NoSQL backends or rolling migrations from legacy databases, Spider for sharding, TokuDB with fractal indexes, and so on.

MySQL is a relational database; its standard information schema is composed of columns, tables, views, procedures, triggers, cursors, etc. MariaDB, being a fork of MySQL, has exactly the same database structure and indexes. Everything, from the data, table definitions, and structures to the APIs, stays identical when upgrading from MySQL to MariaDB.

MariaDB has seen growth in security features such as internal password and security management, PAM and LDAP authentication, Kerberos, user roles, and robust encryption of tablespaces, logs, and tables. MySQL cannot do a hash join or sort-merge join; it can only do the nested-loops method, which demands a lot of index lookups that may be random.

In MySQL a single query runs as a single thread (with the exception of MySQL Cluster), and MySQL issues IO requests one at a time during query execution, so if single-query execution time is your concern, many hard drives and a large number of CPUs won't help.

With careful table layout and application design, you can build applications working with huge data sets on MySQL. OPTIMIZE helps for specific issues: it sorts the indexes themselves and removes row fragmentation (all for MyISAM tables).

Even though it supports tables up to 8TB, MySQL can't operate effectively with a very large database. MySQL has been my favorite database since I started programming: it's simple to install, it's easy to find an application that connects to the database and handles administration graphically, many content managers and e-commerce stores use MySQL by default, and it has let me deliver many projects. I also like that many hosting providers include MySQL support at no extra cost.

MySQL is fast to set up, and its requirements are minimal; it runs with few resources. I've used it on Windows and Linux without difficulty in either, the server operating system has never been a restriction, and I use it in a Linux environment whenever possible.

MySQL provides its code under the GPL and gives the choice of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB also supplies engine-independent table statistics to enhance the optimizer's performance, speeding up query processing and data evaluation of the size and structure of tables.

Space utilization in MySQL is sub-optimal: InnoDB tables become fragmented over time, undermining performance. Shifting from MySQL to MariaDB is relatively simple and is a piece of cake for most systems administrators.

Consider, for example, the application hosts (provided they are okay with reaching MySQL via proxies), the proxy layer, and perhaps a management host. You ought to check all of the logs and settings files and confirm that they're not readable by others.

Data may be moved between MySQL servers, for instance via regular MySQL replication or inside a Galera cluster. Flexibility means incorporating the features your company needs; pushing arbitrary JSON merely seems flexible.

Among the enterprise options, Informix relational databases recently launched a new version (v12.10.xC2) that supports JSON/BSON data natively from within the relational database framework and fully supports all the MongoDB APIs, so any application written to the MongoDB protocol can simply be pointed at the Informix server and it will just work.

On top of that, IBM engineers (Informix is now an IBM product) extended the JSON type to support documents up to 2GB in size (MongoDB limits documents to 16MB). In MySQL and Oracle, working memory is shared across connections because connections are serviced by a single process.
anaymalpani · 8 years ago
Text
Why should you migrate from MySQL to MariaDB?
There are technical reasons to migrate from MySQL to MariaDB: it offers several advantages in terms of performance and features.
First and foremost, MariaDB offers more and better storage engines. NoSQL support, provided by the Cassandra storage engine, allows you to run SQL and NoSQL in a single database system. MariaDB also supports TokuDB, which can handle big data for large organizations and corporate users.
MySQL’…
snakindiaconsultancy-blog · 7 years ago
Text
MySQL and MariaDB. Which is Best?
Database management has come a long way in the past decade. From games to the government sector, databases power many of the services we use daily, and database management is an absolute necessity in today's IT-fuelled business environment. One debate that has been ongoing with no end in sight is whether MySQL is better than MariaDB. In this blog we compare MySQL and MariaDB, break down the most important distinctions, and discuss these two key players in database management. Before going further, let's make sure we have a clear understanding of what MySQL and MariaDB are.
What is My SQL and why is it popular?
MySQL is one of the most versatile and widely used options available, making it a safe choice and especially good for complex queries. MySQL is an extremely established database. As far as scalability is concerned, SQL databases are vertically scalable, which means you can increase the load on a single server by increasing resources like CPU and RAM. MySQL is available for all major platforms, including Linux, Windows, Mac, BSD, and Solaris. It is open source and highly cost-effective, hence good for business. MySQL is a robust selection for any business that can benefit from its pre-defined structure and set schemas. For example, applications that need multi-row transactions, like accounting systems or systems that monitor inventory, or that run on legacy systems, can thrive with the MySQL structure:
Features of MySQL
MySQL is written in C and C++ and has binaries for the following systems: Microsoft Windows, OS X, Linux, AIX, BSDi, FreeBSD, HP-UX, IRIX, NetBSD, Novell Netware, and many more.
MySQL is a relational database system with a client-server architecture and SQL compatibility; it is easy to use and scalable, allows rollback, runs on many operating systems, and offers high performance, flexibility, and productivity.
Organizations that use MySQL include GitHub, the US Navy, NASA, Tesla, Netflix, WeChat, Facebook, Zendesk, Twitter, Zappos, YouTube, and Spotify.
What is MariaDB and why is it popular?
MariaDB is a fork of MySQL; the database structure and indexes of MariaDB are the same as MySQL's. MariaDB plays a vital role as a drop-in replacement for organizations running MySQL. It is fast, scalable, and robust, with a rich ecosystem of storage engines, plugins, and many other tools. MariaDB is more open and vibrant.
Features of MariaDB
MariaDB is written in C, C++, Bash, and Perl and has binaries for the following systems: Microsoft Windows, Linux, OS X, FreeBSD, OpenBSD, Solaris, and many more.
MariaDB comprises an extensive selection of storage engines, including high-performance engines for working with other RDBMS data sources, and offers many operations and commands unavailable in MySQL; it also eliminates or replaces features that impact performance negatively.
MariaDB ships as the default in many Linux distributions. Organizations that use MariaDB include Google, Craigslist, Wikipedia, Arch Linux, Red Hat, CentOS, and Fedora.
Who wins the battle MySQL and MariaDB?
Some of the similarities between these two databases are as follows:
Both read and write the same database files.
Both follow the same internal architecture.
There are also lots of differences between the two databases, and each has its own pros and cons to ponder. Here they are.
Pros of MySQL
Here are the pros of using MySQL:
MySQL is designed with a focus on the Web, Cloud and Big Data. There are a variety of user interfaces that can be implemented.
MySQL is widely adopted and easy to use, and it offers its code as open-source, feature-rich relational database management software; it has been used in a wide variety of environments.
MySQL is quick, reliable, and battle-tested, and has an enormous user base that is still growing.
Cons of MySQL
Here are the cons of using MySQL:
MySQL has a few stability issues.
MySQL does not handle very large data sets well, it is hard to scale, and it is not compliant with standard SQL.
MySQL isn't community-driven and has therefore lagged behind.
Pros of MariaDB
Here are the pros of using MariaDB:
MariaDB has been optimised for performance and is far more powerful than MySQL for large data sets.
Through the use of micro-second precision and extended user statistics, MariaDB provides better monitoring throughout.
Comparatively to MySQL, MariaDB supports considerably more engines (SphinxSE, Aria, FederatedX, TokuDB, Spider, ScaleDB, etc).
Cons of MariaDB
Here are the cons of using MariaDB:
Suitable client libraries for new releases of MariaDB may not be deployed in time, which can force an upgrade to a more recent version because of dependencies.
MariaDB’s cluster version is not very stable.
To conclude…
When versatility, scalability, and cost-effectiveness are the top concerns, MySQL is the most beneficial in the long run; however, if you require specific optimizations such as index condition pushdown or batched key access, and speed is your main concern, then MariaDB is best.
ankaapmo · 7 years ago
Text
This Week in Data With Colin Charles #54: Percona Server for MySQL Is Alpha - #Ankaa
This Week in Data With Colin Charles #54: Percona Server for MySQL Is Alpha I consider this to be the biggest news for the week: Alpha Build of Percona Server for MySQL 8.0. Experiment with it in a Docker container. It is missing column compression with dictionary support, native partitioning for TokuDB and MyRocks (excited to see that this is coming!), and encryption... https://ankaa-pmo.com/this-week-in-data-with-colin-charles-54-percona-server-for-mysql-is-alpha/ #Database #Mariadb #Mysql #Percona_Server
newseveryhourly · 7 years ago
Link
TL;DR RocksDB will probably supplant TokuDB as my go-to backend for bigger-than-RAM datasets A long time ago something amazing happened in database storage engines. Fractal Trees inventors spun off a... http://ift.tt/2H9i9oX
iwillreadthesesomeday · 8 years ago
Link
via Instapaper: Unread
ukd1-blog · 14 years ago
Link
A new-ish storage engine for MySQL - supports some interesting tradeoffs and fractal tree indexing.
williamedwardscoder · 6 years ago
Text
Slow database? It might not be your fault
<rant>
Okay, it usually is your fault. If you logged the SQL your ORM was generating, or saw how you are doing joins in code, or realised what that indexed UUID does to your insert rate etc you’d probably admit it was all your fault. And the fault of your tooling, of course.
In my experience, most databases are tiny.  Tiny tiny.  Tables with a few thousand rows.  If your web app is slow, it's going to be all your fault.  Stop building something webscale with microservices and just get things done right there in your database instead.  Etc.
But, quite often, each company has one or two databases with at least one or two large tables.  Tables with tens of millions of rows.  I work on databases with billions of rows.  They exist.  And that's the kind of database where your database server is underserving you.  There could well be a metric ton of actual performance improvements that your database is leaving on the table.  Areas where your database server hasn't kept up with the past 20 years of regular improvements in how programs can work with the kernel, for example.
Over the years I've read some really promising papers on speeding up databases.  But as far as I can tell, nothing ever happens.  What is going on?
For example, your database might be slow just because it's making a lot of syscalls.  Back in 2010, experiments with syscall batching improved MySQL performance by 40% (and lots of other regular software by similar or better amounts!).  That was long before the Spectre patches made the cost of syscalls even higher.
So where are our batched syscalls?  I can't see a downside to them.  Why isn't Linux offering them and glibc using them, and everyone benefiting from them?  It'd probably speed up your IDE and browser too.
Of course, your database might be slow just because you are using default settings.  The historic defaults for MySQL were horrid.  Pretty much the first thing any InnoDB user had to do was increase the size of buffers and pools, along with various incantations found by googling.  I haven't investigated, but I'd guess that a lot of the performance claims I've heard about InnoDB on MySQL 8 are probably just sensible modern defaults.
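The usual incantations look something like this (values are illustrative; `innodb_buffer_pool_size` has been resizable online since MySQL 5.7, while the log file size is not dynamic):

```sql
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2GB, dynamic on 5.7+

-- innodb_log_file_size goes in my.cnf and needs a restart:
--   [mysqld]
--   innodb_log_file_size = 1G

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- confirm the new value
```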
I would hold tokudb up as being much better at the defaults.  It took over half your RAM, and deliberately left the other half to the operating system buffer cache.
That mention of the buffer cache brings me to another area where your database could improve.  Historically, databases did 'direct' IO with the disks, bypassing the operating system.  These days, that is a metric ton of complexity for very questionable benefit.  Take tokudb again: it used normal buffered reads and writes to the file system and deliberately left the OS half the available RAM so the file system had somewhere to cache those pages.  It didn't try to reimplement and outsmart the kernel.
This paid off handsomely for tokudb because they combined it with absolutely great compression.  It completely blows the two kinds of innodb compression right out of the water.  Well, in my tests, tokudb completely blows innodb right out of the water, but then teams who adopted it had to live with its incomplete implementation, e.g. minimal support for foreign keys.  Things that have nothing to do with the storage, and only with how much integration boilerplate they wrote or didn't write.  (tokudb is being end-of-lifed by percona; don't use it for a new project 😞) 
However, even tokudb didn't take the next step: it didn't go to async IO.  I've poked around with async IO, both for networking and the file system, and found it to be a major improvement.  Think how quickly you could walk some tables by asking for pages breadth-first and digging deeper as soon as the OS gets something back, rather than going depth-first and blocking, waiting for the next page to come back before you can proceed.
I’ve gone on enough about tokudb, which I admit I use extensively.  Tokutek went the patent route (no, it didn’t pay off for them) and Google released leveldb and Facebook adapted leveldb to become the MySQL MyRocks engine.  That’s all history now.
In the actual storage engines themselves there have been lots of advances.  Fractal Trees came along, then there was a SSTable+LSM renaissance, and just this week I heard about a fascinating paper on B+ + LSM beating SSTable+LSM. A user called Jules commented, wondered about B-epsilon trees instead of B+, and that got my brain going too.  There are lots of things you can imagine an LSM tree using instead of SSTable at each level.
But how invested is MyRocks in SSTable?  And will MyRocks ever close the performance gap between it and tokudb on the kind of workloads they are both good at?
Of course, what about Postgres?  TimescaleDB is a really interesting fork based on Postgres that has a ‘hypertable’ approach under the hood, with a table made from a collection of smaller, individually compressed tables.  In so many ways it sounds like tokudb, but with some extra finesse like storing the min/max values for columns in a segment uncompressed so the engine can check some constraints and often skip uncompressing a segment.
TimescaleDB is interesting because it’s kind of merging the classic OLAP column-store with the classic OLTP row-store.  I want to know if TimescaleDB’s hypertable compression works for things that aren’t time-series too?  I’m thinking ‘if we claim our invoice line items are time-series data…’
Compression in Postgres is a sore subject, as is out-of-tree storage engines generally.  Saying the file system should do compression means nobody has big data in Postgres because which stable file system supports decent compression?  Postgres really needs to have built-in compression and really needs to go embrace the storage engines approach rather than keeping all the cool new stuff as second class citizens.
Of course, I fight the query planner all the time.  If, for example, you have a table partitioned by day and your query is for a time span that covers two or more partitions, then you probably get much faster results if you split it into n queries, one per partition, and glue the results together client-side!  There was even a proxy called ShardQuery that did exactly that.  It’s crazy.  When people are making proxies in PHP to rewrite queries like that, it means the database itself is leaving a massive amount of performance on the table.
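The per-partition rewrite is simple to do client-side.  A sketch with a stand-in `run_query` function (a real version would send a single-day `WHERE` clause to MySQL; here it just filters an in-memory list):

```python
from datetime import date, timedelta

# Stand-in data: one row per day, as if each day lived in its own partition.
ROWS = [{"date": date(2018, 1, 1) + timedelta(days=d), "total": d} for d in range(10)]

def run_query(day):
    # Stand-in for one fast single-partition query; real code would send
    # "SELECT ... WHERE date = %s" to the server here.
    return [r for r in ROWS if r["date"] == day]

def query_span(start, end):
    """Split a multi-partition date range into per-partition queries and glue."""
    out, day = [], start
    while day <= end:
        out.extend(run_query(day))  # each call touches exactly one partition
        day += timedelta(days=1)
    return out

rows = query_span(date(2018, 1, 3), date(2018, 1, 5))
print(len(rows))  # 3 rows, one per partition day
```

Each sub-query hits exactly one partition, so the planner never has to reason about the span at all.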
And of course, the client library you use to access the database can come in for a lot of blame too.  For example, when I profile my queries with lots of parameters, I find that the MySQL JDBC drivers generate a metric ton of garbage in their safe-string-split approach to prepared-query interpolation.  It shouldn’t be that my insert rate doubles when I switch to my hand-rolled string concatenation approach.  Oracle, stop generating garbage!
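The hand-rolled approach is roughly this (in Python rather than Java for brevity, and with deliberately naive escaping for illustration only; real code must escape properly or stick with the driver):

```python
def render_multi_row_insert(table, columns, rows):
    """Build one multi-row INSERT by direct string concatenation,
    skipping per-parameter interpolation machinery entirely."""
    def lit(v):
        # Naive SQL literal rendering: illustration only, not injection-safe.
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("\\", "\\\\").replace("'", "\\'") + "'"
    values = ",".join("(" + ",".join(lit(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO {table} ({','.join(columns)}) VALUES {values}"

sql = render_multi_row_insert("test", ["order_id", "client_price"],
                              [("a1", 100), ("a2", None)])
print(sql)
# INSERT INTO test (order_id,client_price) VALUES ('a1',100),('a2',NULL)
```

One statement, one string, no per-parameter object churn; that is where the doubled insert rate comes from.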
This doesn’t begin to touch on the fancy cloud service you are using to host your DB.  You’ll probably find that your laptop outperforms your average cloud DB server.  Between all the Spectre patches (I really don’t want you to forget about the syscall-batching possibilities!) and how you have to mess around buying disk space to get IOPS and all kinds of nonsense, it’s likely that you really would be better off performance-wise by leaving your dev laptop in a cabinet somewhere.
Crikey, what a lot of complaining!  But if you hear about some promising progress in speeding up databases, remember it's not realistic to hope the databases you use will ever see any kind of benefit from it.  The sad truth is, your database is still stuck in the 90s.  Async IO?  Huh no.  Compression?  Yeah right.  Syscalls?  Okay, that’s a Linux failing, but still!
Right now my hopes are on TimescaleDB.  I want to see how it copes with billions of rows of something that aren’t technically time-series.  That hybrid row and column approach just sounds so enticing.
Oh, and hopefully MyRocks2 might find something even better than SSTable for each tier?
But in the meantime, hopefully someone working on the Linux kernel will rediscover the batched syscalls idea…? ;)
globalmediacampaign · 4 years ago
Percona XtraBackup 8 Enables --lock-ddl by Default
Percona XtraBackup 8.0.23 enables the “--lock-ddl” option by default to ensure any DDL events do not corrupt the backups. Any DML events continue to occur, and only DDL events are blocked. A DDL lock protects the definition of tables and views. With the “--lock-ddl” option disabled, Percona XtraBackup allows backups while concurrent DDL events continue to happen. These backups are invalid and fail at the Prepare stage. If DDL events have occurred without the backup user’s knowledge, they may be unaware the backup is corrupt. MySQL 8.0 introduced “Lock Instance for Backup”. This backup lock is lightweight, and the performance for DML operations is almost the same with or without backup locks. This lock is not available in 5.7, where Percona Server for MySQL has its own lightweight backup lock instead. Percona XtraBackup 8 also has “Lock Tables for Backup”. An active “Lock Tables for Backup” metadata lock (MDL) blocks all DDL statements and updates to MyISAM, CSV, MEMORY, ARCHIVE, TokuDB, and MyRocks tables. The PERFORMANCE_SCHEMA or PROCESSLIST displays “Waiting for backup lock” when this lock is active. https://www.percona.com/blog/2021/01/27/percona-xtrabackup-8-enables-lock-ddl-by-default/
nosql · 12 years ago
Vadim Tkachenko:
Provide high insertion rate
Provide a good compression rate to store more data on expensive SSDs
Engine should be SSD friendly (less writes per timeperiod to help with SSD wear)
Provide a reasonable response time (within ~50 ms) on SELECT queries on hot recently inserted data
Looking on these requirements I actually think that TokuDB might be a good fit for this task.
There are solutions in the NoSQL space that are optimized for this scenario: Cassandra or OpenTSDB. Indeed using one of these will have an impact on the application side.
Most of the time when the requirements dictate looking into different solutions, the easiest to estimate is the initial costs: development (nb: this doesn’t include only pure development, but also learning costs, etc.) and hardware costs.
Unfortunately many times we ignore taking into consideration long term costs:
maintenance costs (hardware, operations, enhancements)
opportunity costs (features that the current architecture won’t be able to support as being either impossible or too expensive)
accounting for the risks of failed initial designs (the technical debt costs)
Way too many times we optimize for the initial costs (the general excuse is that familiarity delivers faster—with the more scientific forms: time to market is essential and premature optimization is the root of all evil), while ignoring almost completely the ongoing costs.
Original title and link: Considering TokuDB as an engine for timeseries data... or Cassandra or OpenTSDB (NoSQL database©myNoSQL)
gslin · 10 years ago
Percona buys Tokutek (the company behind TokuDB)
Just saw this; it was apparently announced at the Percona conference: “Tokutek now part of the Percona family” and “Team Tokutek is proud to join Team Percona!”.
This means Percona will be pushing TokuDB hard from now on…
bdarfler · 12 years ago
I hope we see a supported MongoDB fork based on Tokutek.
williamedwardscoder · 7 years ago
Why I am excited about RocksDB
TL;DR RocksDB will probably supplant TokuDB as my go-to backend for bigger-than-RAM datasets
A long time ago something amazing happened in database storage engines: the inventors of Fractal Trees spun off a company called Tokutek to productize them.
The resulting TokuDB storage engine ran rings around all other engines.  Now it did this in part because they smudged the integration a bit; running the Tokutek-packaged TokuDB always massively outperformed a generic TokuDB packaged by Percona or MariaDB, because the tweak wasn’t just to the storage layer but also, I believe, to the way MySQL integrated with storage engines.
And, roughly at the same time, deep inside Google they combined Log Structured Merge (LSM) Trees with Sorted String Tables (SSTables).
Now if you draw FT and LSM on a white-board side-by side and step back and squint they are actually fairly similar.  The dominating thing is that layers are hierarchical - the storage has the same shape and the IO is fairly equivalent.
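Squinting at the LSM side in code form, a minimal sketch (a memtable plus immutable sorted runs with newest-first lookup; real SSTables add blocks, bloom filters, and compaction heuristics, none of which are shown here):

```python
import bisect

class TinyLSM:
    """Toy LSM tree: an in-memory memtable flushes to immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable, self.runs, self.limit = {}, [], memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush to a sorted run: the on-disk analogue of an SSTable.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins on duplicates
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for k in range(10):
    db.put(k, k * k)
db.put(3, -1)                # the overwrite lands in a newer layer
print(db.get(3), db.get(9))  # -1 81
```

Writes are always sequential appends of whole runs, which is the shared shape of the IO in both FT and LSM; the trees differ mainly in how the layers are merged downward.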
There’s this pattern emerging where Google publish some ground-breaking utility code (often after they’ve already started on its replacement internally, it seems) and Facebook iterate and maintain it.  LevelDB became RocksDB.  Snappy gave way to the FB-sponsored LZ4 and ZStd, perhaps?  And so on.
And so RocksDB emerges as a well-supported, stable but also constantly improving LSM+SSTable engine that has excellent compression too.  The thing to underline here is that RocksDB is not about to be abandoned and bitrot; instead, it’s only going to get better!
Tokutek did not get the big success it deserved in the DB world, and eventually got bought by Percona.  And the TokuDB packaged by Percona or MariaDB using the constraining MySQL storage engine interface was not as world-beating as the one originally integrated by Tokutek.  TokuDB was still my preferred storage engine though.  (previously: Compressing MySQL databases).  One thing that is exciting though is that Percona may finally integrate faster UPSERT hints - about time!!
Facebook have already integrated RocksDB into MySQL as myrocks.io.  I haven’t used it.  What has got me excited though is that Percona seem to be packaging it up in Percona-Server too!  Here’s a nice set of slides from a talk I wished I could have watched: TokuDB vs MyRocks
There is another general pattern in the DB world today: swap out your storage engine for RocksDB!  Just yesterday we saw the stunning results Instagram got with Cassandra doing it.  Even MongoDB went that way, and once upon a time TokuMX was vying for that pie.
So TokuDB may still be better than RocksDB for all kinds of technical reasons, but in the end, backing RocksDB is going to be less pain and technical debt longer-term.  It might even support foreign keys and other stuff that TokuDB never got around to.
I am quite sad for Tokutek’s plight and legacy though.
Oh, and I wish someone would bite the bullet and overhaul the MySQL storage engine layer and add NOAR support everywhere and make it so that everyone can have faster MySQL.  Tokutek showed there’s still so much room for improvement.