Monthly Archives: September 2014

Online Truncate of InnoDB UNDO Tablespaces

We have received many requests from our user community for the ability to truncate UNDO tablespaces (‘truncate’ here means that the size of the tablespace is reset back to what it was when it was first created). We are happy to say that we have now been able to implement this much-anticipated feature.

Introduction

The InnoDB UNDO tablespace(s) host rollback segments that hold rollback information related to database changes. This information is used to roll back a transaction and to retrieve the previous version of a record that has been updated or deleted, as required for multi-version concurrency control (MVCC). Once a transaction is committed, InnoDB will discard the related UNDO log records. UNDO log records for updates or deletes are kept around as long as there exists an open transaction that may need to access older versions of the records. Once all such open transactions have been committed, the associated UNDO log records can be discarded.

Currently the InnoDB purge thread(s) will try to free UNDO log pages containing old UNDO log records so that those pages can be re-used. But depending on the retention window (dictated by active transactions that still need those UNDO log records), the UNDO tablespace can keep growing, continuously allocating new pages. Because of this, with multiple active long-running transactions the total UNDO tablespace size can grow considerably over time.

In order to reclaim the space and help reduce the disk footprint of an UNDO tablespace that has grown much larger than necessary, we have introduced a new option to truncate an UNDO tablespace.

Turning This Feature “ON”

Truncation of UNDO tablespaces is controlled by the innodb_undo_log_truncate option, which is turned “OFF” by default. Users can turn it “ON” and “OFF” dynamically, so the feature can be used on an as-needed basis.
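
For example, to enable the feature at runtime and later disable it again:

mysql> SET GLOBAL innodb_undo_log_truncate = ON;
mysql> SET GLOBAL innodb_undo_log_truncate = OFF;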

There are a few important semantics that affect when the actual truncate operation can run:

  • Truncate will happen only if UNDO logs are configured to use UNDO tablespaces. It is not possible to truncate UNDO log space that is stored within the system tablespace.

Condition: --innodb-undo-tablespaces > 0

  • Since the truncate is done online we don’t want to halt server operations, so for the truncate to progress we need at least 2 UNDO tablespaces and at least 2 REDO-enabled UNDO logs configured to use those UNDO tablespaces (a sample configuration follows this list).

Condition: --innodb-undo-tablespaces >= 2 and --innodb-undo-logs >= 35

(The value of 35 comes from how the UNDO logs (rollback segments, or rsegs) are allocated: rseg 0 is a REDO-enabled rseg allocated in the system tablespace, rsegs 1-32 are non-REDO rsegs allocated in the temporary tablespace, and rsegs 33-n are REDO-enabled rsegs allocated in the UNDO tablespaces.)
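
For illustration, a my.cnf fragment that satisfies both conditions might look like the following. The values shown are examples only, and note that the number of UNDO tablespaces can only be set when the instance is first initialized:

[mysqld]
innodb-undo-tablespaces = 4
innodb-undo-logs = 128
innodb-undo-log-truncate = ON
innodb-max-undo-logs-size = 1G
innodb-purge-rseg-truncate-frequency = 128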

How It Works

Provided all the above mentioned conditions are met, then InnoDB will try to do the following:

  1. Select/mark an UNDO tablespace for truncation. This is done in a round-robin fashion to avoid biased selection.
  2. Make all the rollback segments (rsegs) residing in the selected UNDO tablespace inactive. Inactive means that these rollback segments will not be allocated to new transactions. Existing transactions that are using these rollback segments will continue to progress without being affected.
  3. The purge system will continue to free rollback segments that are no longer needed. This marks the pages allocated to rollback segments as being free and reduces the logical size of the rollback segments.
  4. Once all rollback segments residing within the marked UNDO tablespace have been freed, an actual file system truncate operation is carried out on that tablespace. It is then re-initialized to the default size it had when it was first created.
  5. The truncated rollback segments are made active again so that they can be allocated to new transactions.

Accelerating the Freeing of Rollback Segments

As mentioned previously, the purge thread will try to free rollback segments that are no longer needed. An UNDO tablespace cannot be truncated until all of the rollback segments residing in it have been freed.

Previously the purge thread used to invoke the truncation of rollback segments every 128th time a purge was run. This was done to preserve performance, since the truncation activity can be resource intensive.

This static value of 128 is now replaced with the innodb_purge_rseg_truncate_frequency option, giving users more flexibility to control how frequently the freeing of rollback segments should occur.

innodb_purge_rseg_truncate_frequency: Default: 128, Min: 1, Max: 128

For example:
mysql> SET GLOBAL innodb_purge_rseg_truncate_frequency = 64;

Users can continue to use the default setting of 128 during normal operations to maintain overall performance. The freeing/truncation operations can be accelerated as needed by reducing the value of this variable and thus increasing the frequency.

Minimum Size of UNDO Tablespace to Truncate

We don’t want to truncate each and every tablespace, nor should this operation run continuously, with one tablespace after another qualifying for truncation.

To address this, an UNDO tablespace must meet a minimum size criterion before it qualifies for truncation (dictated by innodb_max_undo_logs_size).

innodb_max_undo_logs_size: Threshold after which an UNDO tablespace will qualify for truncation. Default: 1 GB, Min: 10 MB, Max: ULONGLONG_MAX

For example:
mysql> SET GLOBAL innodb_max_undo_logs_size = 100 * 1024 * 1024; -- 100 MB (size suffixes such as 100M are only valid in option files)

Only if the UNDO tablespace size exceeds the configured threshold will it be considered for truncation. Users should tune the value according to their operating environment and needs.

Performance Impacts

With the feature turned “OFF” there should be no performance impact whatsoever. With the feature turned “ON”, however, there can be some impact on user transactions, because the truncate operations are done online while the server continues to accept new requests. Let’s look at where the potential impact comes from.

As stated above, the rollback segments residing in the UNDO tablespace marked for truncation are made inactive. This naturally reduces the pool of UNDO logs available for transaction processing.

For example, if you have 2 UNDO tablespaces and 128 UNDO logs, 95 of which are REDO-enabled rsegs residing in those 2 UNDO tablespaces, then taking 1 of the UNDO tablespaces “offline” in order to truncate it effectively makes 48 UNDO logs unavailable for transaction processing, reducing the UNDO log processing resources by roughly half. This will of course have some impact on the transaction processing rate (TPS) while the truncation operation runs.

The exact performance impact depends on a number of factors:

  • The number of UNDO tablespaces
  • The number of UNDO logs
  • The size of the UNDO tablespace
  • The speed of the I/O subsystem
  • Whether or not any long running transactions exist
  • Whether or not the system is otherwise heavily loaded

Based on our internal testing we found that — with innodb_undo_tablespaces=8, innodb_undo_logs=128, and innodb_max_undo_logs_size=100M — at 128 threads the drop in TPS for the sysbench OLTP_RW workload is less than 5% on a server with an average IO subsystem.

Once the truncate operation completes and all of the rollback segments are made active again, performance will naturally improve as more resources become available. In fact, performance should be slightly better than before the truncation, because UNDO log record allocation is faster when the UNDO tablespace is smaller and less fragmented.

Summary

We now have a server option that allows you to reclaim some of the disk space used for UNDO tablespace management dynamically, without having to stop mysqld. This feature can be turned “ON” and “OFF” dynamically and on an as needed basis, for example when the overall server load is expected to be light (e.g. at 4AM every Saturday).

The related configuration options are:

  1. To enable the feature: --innodb-undo-log-truncate
  2. To configure the number of UNDO tablespaces: --innodb-undo-tablespaces
  3. To configure the number of UNDO logs: --innodb-undo-logs
  4. To accelerate the freeing of rollback segments: --innodb-purge-rseg-truncate-frequency
  5. To set the size threshold: --innodb-max-undo-logs-size

We look forward to your feedback on this new feature! Please let us know if you encounter any bugs, or if you have more general feedback. Thank you!

Optimizer Cost Model Improvements in MySQL 5.7.5 DMR

In a previous blog post we presented some of the issues with the current optimizer cost model and listed several ideas for improvements. The new 5.7.5 DMR contains the result of our initial work on improving the optimizer’s cost model:

  • Cost Model for WHERE Conditions. In previous versions of MySQL, the estimated number of rows from a table that will be joined with the next table only took into account the conditions used by the access method. This often led to row estimates that were far too high, and thus to very inaccurate cost estimates for the join. With inaccurate cost estimates, the join optimizer might not find and choose the best join order. To solve this issue, we have implemented a cost model that includes the entire query condition on the table when calculating the number of rows produced from that table. This model estimates the filtering effect that the table’s conditions have on the number of records read from the table. For more details on how condition filtering works, see two earlier blog posts on this topic: part1, part2. With this feature, the join optimizer should in many cases be able to find a better join order and produce a better query plan.
  • More Accurate Index Statistics. When joining tables, index statistics are used for selecting which index to use. In MySQL the index statistics contain an estimate of the number of records that exist per key value within the index. Because integer values were previously used for this records-per-key estimate, the index statistics had too low a precision for indexes with high selectivity or cardinality. There was, for example, no way to distinguish an index that has an average of 1.9 records per key value from one that has 1.1 records per key value. This problem is now fixed by switching to floating point values for the index statistics within the MySQL server. Since the index statistics are maintained by the storage engines, the more accurate index statistics must be supplied by the storage engine. In the 5.7.5 DMR, InnoDB has been changed to provide the more accurate index statistics using the new floating point format.
  • Configurable Cost Constants. Previously, the basic cost values that the optimizer used when calculating cost estimates for queries were hard coded in the source code. Examples of such cost constants are the cost of reading a disk page, the cost of evaluating a query condition on a record, and the cost of comparing key values. The hard coded cost constants have now been replaced by user configurable costs. Two new tables—server_cost and engine_cost—have been added to the mysql system database. The cost values can now be changed by simply updating the corresponding rows in these two new tables, as illustrated below.
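
As a quick illustration, the cost name below is an example of a row in these tables, and FLUSH OPTIMIZER_COSTS is the reload statement implemented by WL#7316 (described later in this post); the reloaded values take effect for sessions started after the FLUSH:

mysql> UPDATE mysql.server_cost SET cost_value = 0.1 WHERE cost_name = 'row_evaluate_cost';
mysql> FLUSH OPTIMIZER_COSTS;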

We have also done some initial refactoring of the cost model code. The main focus here has been to replace the use of hard coded cost constants with function calls into the new cost model API. As part of this work we also introduced a new Cost_estimate object that is used for keeping track of cost estimates within the new cost model code.

If you are attending MySQL Central @ OpenWorld next week, I will present the MySQL Cost Model talk, where I will go into greater detail about all of the new cost model related features that are implemented in the new 5.7.5 DMR. I look forward to seeing you there!

We value your feedback! Please let us know if you encounter any bugs, or if you have more general feedback.

UPDATE: My OpenWorld presentation is now available here!

About the Data Dictionary Labs Release

For a long time, the MySQL development community and many others have wanted a server that works without FRM files. The motivation behind removing FRM files, and the design goals around the new data dictionary, are explored in more detail in the blog post by Ståle Deraas, “A New Data Dictionary for MySQL”.

And now for the good news! We have a MySQL Labs Release ready with a preview of the new Data Dictionary!

What is in the first MySQL Data Dictionary labs release?

First of all, the FRM files are now gone. The MySQL server no longer creates FRM files, ever. Instead, the server stores table meta-data in data dictionary tables that use the InnoDB storage engine. For more details on the schema definitions of the data dictionary tables, see WL#6379. One advantage of the Data Dictionary being stored within the InnoDB storage engine is that the MySQL server can now use InnoDB indexes to efficiently search for database objects. It also provides crash-safe dictionary operations and paves the way for transactional operations in the future.

Secondly, we have improved the Information Schema implementation. The Information Schema tables are designed to be simple SQL VIEWs against the data dictionary tables. This eliminates the need to create temporary tables for every Information Schema table used within a query, as is done today in MySQL servers without the new Data Dictionary.

The third improvement is the new “Data Dictionary API” framework, which provides object-oriented API layers for the server, storage engines, and plugins to access the data dictionary. We have implemented an internal API layer used by the MySQL server to get and update data dictionary information, and an external API layer designed for plugins and storage engines.

You can also see other improvements — for example, InnoDB is now used for the meta-data tables like the ‘time zones’ table, various help tables, a ‘plugins’ table, and a ‘servers’ table in order to provide crash-safe dictionary operations. Dictionary tables are also now created by mysqld itself, removing the need for separate bootstrap operations.

We are working towards the key design goals outlined in Ståle Deraas’ blog post.

Given that this is our very first Data Dictionary labs release, there are a number of issues worthy of special note:

  • Some of the dynamic meta-data provided in INFORMATION_SCHEMA may not be up to date, for instance columns like CARDINALITY, TABLE_ROWS, AUTO_INCREMENT, MAX_DATA_LENGTH, DATA_LENGTH, and DATA_FREE. Running the ANALYZE TABLE command just before the query ensures that INFORMATION_SCHEMA has the latest values (see the example after this list).
  • Using INFORMATION_SCHEMA queries under the LOCK TABLES command is not recommended in this labs release.
  • Meta-data for columns within VIEWs is not provided through INFORMATION_SCHEMA tables and SHOW statements.
  • Meta-data for temporary tables is not provided through the DESCRIBE command.
  • Meta-data for foreign key constraints is not provided through INFORMATION_SCHEMA.
  • We are working to make the presentation of Unicode strings better through the Information Schema.
  • Collation and character set data is populated only on server install, and is not updated on a later server restart.
  • The preview is not suited for large data sets.
  • Upgrading from any previous MySQL database version is not supported.
  • The only platform supported for the preview is Linux.
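
For the first item above, a quick way to refresh the statistics before querying INFORMATION_SCHEMA (the schema and table names here are hypothetical):

mysql> ANALYZE TABLE test.t1;
mysql> SELECT TABLE_ROWS, DATA_LENGTH FROM INFORMATION_SCHEMA.TABLES
    ->  WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';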

We do not recommend using the labs release in a production environment, but rather to install it on a spare server so that you’re able to preview what’s coming and provide feedback. We look forward to your input! Please let us know if you encounter any bugs or have other general feedback.

For additional details, you can join Alexander Nozdrin at OOW14 where he will talk about “New Data Dictionary: An Internal Server API That Matters”. If you’re unable to make it in person, we will make the presentation available online shortly afterwards.

Thanks for using MySQL!

UPDATE: Alexander’s presentation is now available here!

A New Data Dictionary for MySQL

For a long time, there have been complaints about deficiencies in the data dictionary of MySQL. Many have expressed a lack of love for FRM files; see Morgan’s blog post and Stewart Smith’s post MySQL Architecture.

We are now designing and implementing a new and improved data dictionary for MySQL, and some key design goals are:

  • Store dictionary information in transactional storage. We will first focus on InnoDB, but other storage engines might follow
  • Consolidate distributed dictionary information for the server into a unified dictionary
  • Store all dictionary information in a uniform way, with uniform APIs for all dictionary objects
  • Get rid of filesystem-property induced problems
  • Make APIs cache-friendly, so caches can eventually be hidden behind the API
  • Pave the way for transactional DDL. This is key for improved reliability of replication
  • Implement Information Schema as views on dictionary tables, allowing optimization of I_S queries
  • Provide an API for serializing dictionary information for ease of integration of other storage engines and for redundancy
  • Add versioning of meta-data for assisting upgrades
  • Provide an upgrade path from MySQL versions with “old” dictionary that does not require dump/restore
  • Provide mechanisms for moving larger data sets around

At http://labs.mysql.com/ you can now preview the new data dictionary in action. Gopal Shankar’s blog post explains what the labs release contains.

You can join Alexander Nozdrin at OOW14 where he will talk about New Data Dictionary: An Internal Server API That Matters.

Alexander will explain the new data dictionary architecture.

For further descriptions on the schema definitions see WL#6379. The worklog WL#6380 outlines API principles for unified handling of dictionary objects. Given the uniformity of dictionary objects, there will be a lot of common functions, and those are described in WL#7284. WL#6382 describes the API for Table objects.

Please note that all the descriptions are subject to minor changes, and might be updated.

MySQL Labs Releases @ OpenWorld 2014

In addition to the recently announced 5.7.5 DMR, I am happy to draw your attention to some of our work-in-progress. This work has not yet reached production quality, but we want to share it with you in the form of four Labs releases being made available for MySQL Central @ OpenWorld 2014.

New Data Dictionary

We are excited to demonstrate the new Data Dictionary! With this labs release you will be able to run MySQL with the new Data Dictionary in action. MySQL meta-data has been consolidated in a common database schema and stored within transactional, crash-safe, InnoDB tables. Information Schema is also now implemented as standard SQL VIEWs on these tables, which will greatly improve the overall performance of Information Schema. MySQL can also now bootstrap itself without the need for any external scripts (mysql_install_db, etc.). This is the first milestone for us on the new Data Dictionary project: No more FRM files, improved Information Schema performance, and no more scripts required to start the database! Enjoy, and please stay tuned for even more work on this new project!

For some additional background information, see Morgan Tocker’s blog post “Beyond the FRM: ideas for a native MySQL Data Dictionary“.

MySQL Optimizer/InnoDB/Replication Preview

Optimizer: Computed Virtual Columns

We showcase Virtual Columns, which can be used in several ways. Non-stored virtual columns can be used as a way to simplify and unify queries. Stored (or persisted) virtual columns can be used as a materialized cache for complicated conditions that are costly to calculate on the fly. Another use for stored virtual columns is as a substitute for true functional indexes (stay tuned on that topic!): the user creates a stored virtual column using the required functional expression and then defines a secondary index on it.
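
As a sketch of that last use case, using the GENERATED ALWAYS AS syntax that the 5.7 series ultimately adopted (the table and columns are hypothetical, and the labs syntax may differ in detail):

mysql> CREATE TABLE orders (
    ->   price DOUBLE,
    ->   quantity INT,
    ->   line_total DOUBLE GENERATED ALWAYS AS (price * quantity) STORED,
    ->   KEY (line_total)
    -> );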

Virtual Columns (WL#411) are based on a community contribution by Andrey Zhakov. Thank you, Andrey!

Optimizer: Query Rewrite Plugins

We showcase our Query Rewrite Plugin, as well as a framework (APIs) for people to write their own query rewrite plugins. A query rewrite plugin specifies how certain queries that arrive at the server should be rewritten before they are processed and executed by the server. One usage is to enforce/prevent a certain query plan by adding hints to a query. Our query rewrite plugin is a superb tool for handling problematic queries when users cannot rewrite the query within the application itself, e.g. because it originates from an external tool (like an Object Relational Mapping tool such as Hibernate) or from some other 3rd party application.

InnoDB: Memcached

We showcase support for multiple get and range searches using the Memcached protocol. InnoDB Memcached’s read-only queries in 5.7 already achieved a remarkable 1.1 million QPS record. Now, the read query bottleneck shifts to the memcached client itself, so anything that can batch the queries helps, and this is exactly what multiple get gives you. In addition, “range searches” extend the Memcached protocol to allow users to fetch multiple keys within a range (e.g. get @<100), which gives you additional benefits that standard memcached isn’t able to provide.

InnoDB: Transparent Page Compression

We showcase transparent page level compression within the IO layer. The benefit of this approach over the historical row level compression is that it is faster, along with being much simpler. The idea comes from FusionIO, but the implementation is not limited to their NVMFS implementation (DirectFS). Customers want to reduce their costs by simultaneously compressing data in order to make better use of the available space (fewer devices are needed), while also reducing device write cycles in order to increase the lifespan of the devices. See also Sunny Bains’ blog post “InnoDB Transparent PageIO Compression”.

InnoDB: Support for 32k and 64k Page Sizes

We showcase support for 32k and 64k page sizes (the default remains 16k). Bigger page sizes will help to improve compression ratios (the bigger the page size, the more redundant bits), thus offering much improved data compression in conjunction with the new transparent page compression. Furthermore, bigger page sizes allow more “on page” or “inline” storage of BLOBs and large VARCHAR/TEXT fields, thus improving I/O performance when such fields are used.

InnoDB: Native Partitioning

We showcase InnoDB native partitioning. Native partitioning paves the way for better overall partitioning. That would include things such as parallel query processing, foreign key support, and full-text searches on partitioned tables.

InnoDB: General Tablespaces

We showcase CREATE TABLESPACE for general use. This will give the user freedom to choose the mapping between tables and tablespaces, i.e. which tablespaces to create and what tables they should contain. This allows for doing things such as grouping all tables for a given user or customer within a single tablespace, and thus having all of their data within a single file on the filesystem.
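
A minimal sketch of the intended usage, based on the CREATE TABLESPACE syntax being previewed (the names are hypothetical):

mysql> CREATE TABLESPACE customer_a ADD DATAFILE 'customer_a.ibd' ENGINE=InnoDB;
mysql> CREATE TABLE orders (id INT PRIMARY KEY) TABLESPACE customer_a;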

InnoDB: High Priority Transactions

We showcase high priority transactions within InnoDB, i.e. transactions that shall never be chosen to be aborted in a deadlock scenario. The motivation is to better support MySQL Group Replication, where a transaction cannot abort in one replica and be committed in another.

Replication: Multi-Source Replication

We showcase a refreshed preview of our Multi-Source Replication implementation. It allows a slave to replicate from multiple sources/masters, but without any additional conflict detection and resolution. One use case is to commission a “fan-in slave” for data aggregation or backup. This version takes into account the great feedback we got from the community when we first released this on labs some time ago.

Replication: Improved Multi-Threaded Slave

We showcase an improved Multi Threaded Slave (MTS) implementation, using a more precise algorithm to determine which transactions are non-conflicting. This allows more transactions to execute in parallel, thus improving the overall slave performance.

MySQL Group Replication

All servers in the group act as peers and an application can safely send writes to any one of them. Coping with server failure becomes simple as the application can simply switch to another server without the need for database failover. This marks the beginning of official MySQL support for true multi-master replication clusters, offering easy to setup and administer active/active HA solutions.

HTTP Plugin

The HTTP Plugin allows MySQL to listen for HTTP requests and to communicate in JSON. This early development version features three APIs: SQL, CRUD, and DOCUMENT. The SQL endpoint exists for rich query capabilities, whereas the CRUD and DOCUMENT endpoints follow key-document store semantics. Nested JSON objects can be managed through the DOCUMENT API.

The MySQL 5.7.5 Milestone Release is Available

The MySQL Development team is happy to announce our 5.7.5 development milestone release (DMR), now available for download at dev.mysql.com.  You can find the full list of changes and bug fixes in the 5.7.5 release notes.  Here are the highlights. Enjoy!

Scalability

Improve scalability by not using thr_lock locks for InnoDB tables (WL#6671) : This work by Dmitry Lenev improves InnoDB scalability by not using thr_lock locks for InnoDB tables. For InnoDB tables we now rely on MDL + InnoDB row locks. This patch shows good performance/scalability improvements in the single-table Sysbench OLTP_RO/POINT_SELECT tests for InnoDB on multi-core machines.

Truncation of Undo Log

InnoDB: Truncate UNDO logs/tablespace (WL#6965) : This work by Krunal Bauskar implements automatic truncation of undo logs when separate UNDO tablespaces have been configured. InnoDB operates with several undo tablespaces that are periodically truncated, one at a time. While one UNDO tablespace is being truncated the other UNDO tablespaces will be available to service transaction management to ensure minimal impact on transaction processing. The purpose of this work is to avoid ever growing UNDO log file sizes that could occur in some usage scenarios. See also Bug#1341 reported by Scott Ellsworth.

Partitioning

Allow WITHOUT VALIDATION on EXCHANGE PARTITION (WL#5630) : This work by Mattias Jonsson adds support for the [{WITH|WITHOUT} VALIDATION] clause to the EXCHANGE PARTITION WITH TABLE command in order to skip the default row-by-row validation step. When we implemented EXCHANGE PARTITION WITH TABLE (WL#4445), the option to skip the row-by-row validation step was provided via the IGNORE clause. However, this led to bugs like Bug#55944, and IGNORE was not a good choice of wording since it conflicts with how the reserved IGNORE word is already used in MySQL (e.g. Bug#57708). Therefore, in order to properly allow the DBA to ensure that all rows will fit into the partition, this worklog adds the new [{WITH|WITHOUT} VALIDATION] clause to control whether or not to skip the row-by-row validation step.

Online Buffer Pool Resizing

InnoDB: resize the InnoDB Buffer Pool online (WL#6117) : This work by Yasufumi Kinoshita adds the capability to change the value of innodb_buffer_pool_size dynamically. This provides the ability to tune the buffer pool size—without incurring any downtime—as your database usage patterns evolve over time. Note: this work does not include the possibility of altering the number of buffer pool instances.
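
For example, to grow the buffer pool to 8 GB without restarting the server:

mysql> SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;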

Bulk Data Load Improvements

InnoDB: bulk load for create index (WL#7277) : This work by Shaohua Wang implements sorted index builds, thus making CREATE INDEX operations much faster. Prior to this work InnoDB looped through the base table and created one record in the index table for each record in the base table. After this work InnoDB reads many records from the base table, sorts the records using the index key, and then inserts the chunked set of rows into the index table.

Proper Connection ID Handling

Ensure that 32-bit connection-IDs roll over safely (WL#7293) : This work by Tatjana Nurnberg and Jon Olav Hauglid avoids the reuse of any connection IDs that are still in use. Even for 64-bit builds, the wire-protocol still only uses 32 bits for the connection ID for backward compatibility reasons. Some customers have experienced conflicts with old and new connection IDs when the ID value rollover occurs and new connections are created. See also Bug#44167 reported by Jan Kneschke.

Syslog Support

Add native support for syslog on Unixoid platforms (WL#7793) : This work by Tatjana Nurnberg adds the option to redirect the general log output to the system’s syslog facility. This new option can be set at start-up as well as viewed and set at runtime using the new log_syslog system variable (ON/OFF, defaults to OFF). See also Bug#55370 reported by Kyle Joiner and later enhanced by Simon Mudd and Marc Alff.

Multiple User Level Locks

Allow multiple locks in GET_LOCK() (WL#1159) : User-level locks are often used to organize mutual exclusion when accessing some resource in cases when table or row-level locks are not appropriate. This work by Dmitry Lenev – based on a contribution by Konstantin Osipov (Bug#67806) – allows for multiple user-level locks per connection. The user can now request multiple user locks by issuing a series of GET_LOCK statements; GET_LOCK no longer releases all user-level locks previously acquired. The work replaces the original custom user-level lock implementation with one based on the MDL lock manager. As a result, deadlocks between different connections acquiring user-level locks, metadata locks, and waits for table flushes are detected and reported as errors.
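
For example, a single connection can now hold several named locks at once; RELEASE_ALL_LOCKS() below is the companion function added by the same work:

mysql> SELECT GET_LOCK('lock1', 10);
mysql> SELECT GET_LOCK('lock2', 10); -- 'lock1' remains held
mysql> SELECT RELEASE_ALL_LOCKS();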

DTrace Support

Improve DTrace support on Oracle Linux (WL#7894) : This work by Praveenkumar Hulakund adds full DTrace support to MySQL in the Server 5.6+ packages that we ship for Oracle Linux 6+.

IGNORE Clause

Define and reimplement IGNORE (WL#6614) : This work by Abhishek Ranjan properly defines the meaning and handling of the IGNORE clause. It reimplements IGNORE so that it is consistent across all supported SQL statements while also making it much easier to maintain. The IGNORE clause is a MySQL extension to the SQL standard, the purpose of which is to INSERT as many records as possible, while skipping records that violate various constraints (i.e. taking no action on those records). The same principle applies to DELETE IGNORE, REPLACE IGNORE, and UPDATE IGNORE. However, until now this principle has not been enforced consistently. See also Bug#30191, Bug#49539, Bug#55421, Bug#54543, Bug#54106, Bug#49534, Bug#47788, Bug#46539, and Bug#46425.

STRICT Mode

Define and reimplement STRICT mode (WL#6891) : This work by Raghav Kapoor implements a consistent behaviour of STRICT MODE across all supported SQL statements. This work is similar to what we did for IGNORE, but applied to STRICT MODE. The IGNORE clause downgrades errors to warnings to allow statements to skip row(s) which would otherwise have caused the entire statement to abort. STRICT MODE does just the opposite—it upgrades warnings to errors. Similar to IGNORE, STRICT MODE had not previously been clearly and consistently defined, so the implementation has been the source of many bugs like Bug#42910, Bug#5929, Bug#43880, Bug#48637, Bug#5912, and Bug#5913. These issues have now been addressed by this new work. Note: we have also deprecated the ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE and NO_ZERO_IN_DATE SQL modes as part of this work and made all of that functionality part of STRICT MODE. See WL#7467 for those details.

Make STRICT mode for transactional SEs the default in 5.7 (WL#7764) : This work by Raghav Kapoor makes STRICT MODE the default for all transactional storage engines. This change does not affect non-transactional storage engines such as MyISAM. In MySQL 5.6 we introduced the my.cnf/my.ini files provided during install, which included the STRICT_TRANS_TABLES setting that enables STRICT MODE for transactional storage engines. In MySQL Server 5.7 it now becomes the compiled-in default.

The MySQL Upgrade Tool

Refactor mysql_upgrade (WL#7308) : This work by Marcin Babij is a rewrite of the mysql_upgrade tool which fixes many reported bugs while also making mysql_upgrade more robust and easier to maintain. The mysql_upgrade tool has been rewritten using the C API for queries and mysqlcheck functions are now linked directly into the mysql_upgrade executable. This fixes problems it previously had with handling options, logins, special server startup modes, etc. For example, this work fixes Bug#65288 reported by Nicholas Bamber and Bug#71579 reported by Florian Weimer.

The MySQL Install DB Tool

Deprecate mysql_install_db and create a new program in C++ (WL#7688) : This work by Kristofer Pettersson rewrites the mysql_install_db tool from perl/bash to C/C++. At the same time we redesigned it in order to provide a better user experience, cover more functionality, and improve security (all while still offering all legacy features).

Offline Mode

Method to bring servers off line (WL#3836) : This work by Atanu Ghosh introduces the new offline_mode global server variable which is settable by a user with SUPER privilege. The intended usage is for upgrade purposes. Setting the server to offline mode will gracefully disconnect all connected clients except those with the SUPER privilege. “Super” users are allowed to connect and manage the system while in offline mode; this also allows replication slave threads to continue applying changes to the server.
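
For example:

mysql> SET GLOBAL offline_mode = ON;  -- gracefully disconnects non-SUPER clients
mysql> SET GLOBAL offline_mode = OFF;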

Temporary Tables

Optimizing temporary tables (WL#7682) : This work by Krunal Bauskar introduces a sub-class of temporary tables called “intrinsic temporary tables”. An intrinsic temporary table is like a normal temporary table but with relaxed ACID and MVCC semantics. The purpose is to support internal use cases where internal modules (such as the optimizer) demand light weight and ultra-fast tables for quick intermediate operations.

Support InnoDB as an additional storage engine for tmp tables (WL#6711) : This work by Benny Wang makes it possible to switch between using MyISAM or InnoDB for storage of internal temporary tables created by the optimizer as part of query execution. The switch is in the form of a new global system variable called internal_tmp_disk_storage_engine.
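
For example, to have the optimizer’s on-disk temporary tables use InnoDB:

mysql> SET GLOBAL internal_tmp_disk_storage_engine = 'INNODB';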

Fulltext Search Optimizations

Additional query optimizations for Full Text Search (WL#7123) : This work by Sergey Gluhov implements hints passed on to InnoDB about a query, so that InnoDB may skip parts of the full text search processing, thus improving performance. For example, InnoDB no longer computes the ranking values if they are not needed, as in: SELECT COUNT(*) FROM test.wp WHERE MATCH(text) AGAINST ('+for +the +this +that' IN BOOLEAN MODE); Moreover, this work adds hints to InnoDB that can be used later to further improve the performance of other queries. The hint information added includes the following: 1) No need to sort on ranking: when the ranking is not important for the result set, sorting on ranking can be skipped. 2) No ranking needed: in cases where the ranking values are not needed, InnoDB can skip calculating and holding them.

Optimizer – ONLY_FULL_GROUP_BY mode

Better ONLY_FULL_GROUP_BY mode (WL#2489) : This work by Guilhem Bichot improves the behavior of the ONLY_FULL_GROUP_BY mode and makes it the default in 5.7. This work makes the SQL mode far less strict about expressions in the select list and ORDER BY, because the server now properly recognizes functional dependencies. The user can also use the new ANY_VALUE() function to relax the rules for the HAVING, DISTINCT, and ORDER BY clauses. This work addresses many user complaints, as illustrated by Bug#51058 and by Roland Bouman’s blog Debunking GROUP BY myths.
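
A minimal sketch (the table and columns are hypothetical): with ONLY_FULL_GROUP_BY enabled, a selected column must either be functionally dependent on the GROUP BY columns or be wrapped in ANY_VALUE():

mysql> SELECT name, ANY_VALUE(address) FROM customers GROUP BY name;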

Security

System account auth plugin (WL#7726) : This work by Todd Farmer implements a mysql_no_login authentication plugin. Accounts defined with the mysql_no_login authentication plugin reject all client connections. While rejecting client connections, such accounts may still be assigned privileges like any user, may be used as the DEFINER or INVOKER for stored programs, events, and views, and may be used as a base user for proxy users.
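
A sketch of how such an account might be set up once the plugin is installed (the account name is hypothetical):

mysql> INSTALL PLUGIN mysql_no_login SONAME 'mysql_no_login.so';
mysql> CREATE USER 'report_definer'@'localhost' IDENTIFIED WITH mysql_no_login;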

Remove the deprecated old_password plugin (WL#8006) : This work by Georgi Kodinov removes the mysql_old_password authentication plugin, thus removing support for passwords using the older unsafe pre-4.1 password hashing format. Accounts that use this plugin are now disabled at startup and the server writes an “unknown plugin” message to the error log.

Performance Schema

We are continuing to deliver on our monitoring roadmap. In 5.7.5 we reduce overhead, facilitate progress reporting for long running statements, expose user variables associated with a thread, instrument the newly introduced “SX lock” and instrument memory allocation in InnoDB.

Batch Table IO stats (WL#7802) : This work by Haixiang Li and Marc Alff implements changes in the table IO instrumentation to record statistics in batches of N operations. This work significantly reduces the overhead for table IO instrumentation.

USER VARIABLES (WL#6884) : This work by Marc Alff introduces a new table called performance_schema.user_variables_by_thread. This table exposes user variables and values associated with each thread. Only SELECT statements are supported as this is a read-only table.
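
For example:

mysql> SELECT THREAD_ID, VARIABLE_NAME, VARIABLE_VALUE
    ->   FROM performance_schema.user_variables_by_thread;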

Statement progress (WL#7415) : This work by Marc Alff improves the stage instrumentation by keeping track of a progress indicator. This allows users to monitor the progress of long running statements such as ALTER TABLE.

Instrument SX-lock for rw_lock (WL#7445) :  This work by Marc Alff enhances the performance schema instrumentation for read/write locks to support the new SX-lock operation.

Integrate memory instrumentation with InnoDB (WL#7777) : This work by Vasil Dimov implements memory instrumentation within InnoDB using the interface provided by Performance Schema. To support this a new mechanism for allocating memory in InnoDB has been introduced. This work complements the memory instrumentation previously done at the Server layer in 5.7.3.

InnoDB GIS Support

We have implemented spatial indexing in InnoDB. InnoDB spatial indexes can be used with all of the existing syntax that was developed for MyISAM spatial indexes. In addition, InnoDB spatial indexes support full transactional properties, as well as isolation levels. They employ predicate locking to prevent the phantom scenario. See also Jimmy Yang’s article “InnoDB Spatial Indexes in 5.7.4 LAB release”.

R-tree index support (WL#6968) : This work by Jimmy Yang implements spatial indexes within InnoDB. R-tree search and traversal differ from those of a B-tree in that the search criteria can be met in multiple leaf pages along different search paths. A search/query bounding box can also intersect with or contain multiple leaf and non-leaf bounding boxes. Because of this we also refactored the row_search_for_mysql() and btr_cur_search_to_nth_level() functions to handle this properly.

Support DML operations for InnoDB R-tree indexes (WL#6745) : This work by Zheng Lai adds R-tree index support in InnoDB for the Geometry datatype (R-tree split and shrink operations). This work adds InnoDB support for all MBR (Minimum Bounding Rectangle) manipulation functions and 2 dimensional GIS datatypes that MyISAM previously had.

Support predicate locking for spatial indexes (WL#6609) : This work by Jimmy Yang implements predicate locks to enforce consistent reads for the spatial (R-tree) indexes. For InnoDB spatial indexes we cannot use next key locking to support the REPEATABLE READ or SERIALIZABLE isolation levels as there is no such absolute ordering concept in multi-dimensional data. So it is difficult to define the “next” key. Instead, this work uses predicate locks to enforce consistent reads for spatial indexes. In this case we might simply lock the Minimum Bounding Rectangle (MBR) which was used for the query, and other transactions cannot insert or modify a row that would have matched the search condition. The predicate in this case is limited to the MBR used in the spatial query (e.g. ST_Within or ST_Contains).

Store Geometry POINT datatype as fixed length column rather than BLOB (WL#6942) : This work by Bin Su implements an optimization for POINTs; a commonly used GeoSpatial datatype. The optimization is to avoid storing POINTs as a BLOB and instead use a fixed length datatype. This increases overall performance, while also saving on disk space and CPU time.

GeoSpatial / GIS

Now that InnoDB is the default storage engine for MySQL, our user base is rapidly transitioning to InnoDB. One capability that they have been demanding is a performant and scalable GIS implementation. Along with adding R-tree index support to InnoDB, we decided to replace the original GIS algorithms with a more powerful, reliable, effective and efficient geometric engine.  See also Manyi Lu’s article “Why Boost.Geometry in MySQL?” and David Zhao’s article “Making Use of Boost Geometry in MySQL GIS“.

Spatial relation check functions (WL#7220) : This work by David Zhao implements the MySQL GIS relation check functions using Boost.Geometry. Many type combinations that were not supported before are now supported, e.g. functions like within, intersects, equals, disjoint, cross, and overlaps.

Geometry set operations (WL#7221) : This work by David Zhao implements MySQL GIS geometry set operations using Boost.Geometry. Many type combinations which were not previously supported are now supported, e.g. set operations like intersection, union, and difference.

Spatial analysis functions (WL#7236) : This work by David Zhao refactors the spatial analysis functions, including area(), centroid(), convexhull(), distance() and envelope(), using Boost.Geometry functions. The semantics of these functions are now implemented according to the OGC standard specifications.

WKB geometry container (WL#7280) : This work by David Zhao implements WKB (well known binary) containers which conform to the Boost.Range concept, so that they can be used as adapters between the existing WKB geometry data and the Boost.Geometry algorithms. The aim of this work is to avoid unnecessary conversions between WKB encoded byte strings and Boost.Geometry objects, as such conversions can be expensive.

GeoHash support (WL#7928) : This work by Erik Frøseth adds functions to encode and decode GeoHashes. GeoHash is a system for encoding latitude and longitude coordinates of arbitrary precision into a text string. The added functions will make it possible for applications to import and export data from MySQL using the GeoHash format. It will also make it possible to index and search for geographical data in one-dimensional (B-tree) indexes.
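
A sketch of the added functions (the geohash string below is just an example value):

mysql> SELECT ST_GeoHash(10.75, 59.91, 10);      -- encode (longitude, latitude)
mysql> SELECT ST_LongFromGeoHash('u4pruydqqvj'); -- decode the longitude
mysql> SELECT ST_LatFromGeoHash('u4pruydqqvj');  -- decode the latitude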

GeoJSON support (WL#7444) : This work by Erik Frøseth adds functions for parsing and generating GeoJSON [1,2] documents for GIS data types: ST_AsGeoJSON and ST_GeomFromGeoJSON. GeoJSON is an open standard for encoding geometric/geographical features. GeoJSON supports the same geometric/geographic datatypes that are already supported by MySQL, and it also allows declaring which coordinate reference system is used for a feature (something WKT and WKB lack).
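
For example, a round trip through the two new functions:

mysql> SELECT ST_AsGeoJSON(ST_GeomFromText('POINT(1 2)'));
mysql> SELECT ST_AsText(ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [1, 2]}'));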

Parser Refactoring

We are in the process of refactoring the SQL parser in an incremental way. The old parser had critical limitations because of its grammar complexity and top-down parsing style, which caused poor maintainability and extensibility. See the article by Gleb Shchepa, “SQL parser refactoring in 5.7.4 LAB release”.

True bottom-up server parser—a common framework for the refactoring (WL#7199) : This work by Gleb Shchepa provides a base framework of classes and a methodology to transform the current SQL parser to a pure bottom-up parser in subsequent small, separate, and safe steps.

True bottom-up server parser—refactoring of the SELECT statement (WL#7200) : This work by Gleb Shchepa refactors the SELECT statement related parser grammar rules in a pure bottom-up style to make the grammar context-independent for improved maintainability and extensibility.

True bottom-up server parser—refactoring of the SET statement (WL#7203) : This work by Gleb Shchepa refactors the SET statement related parser grammar rules in a pure bottom-up style to make the grammar context-independent for improved maintainability and extensibility.

Optimizer Refactoring – New Layered Architecture for the Optimizer

Currently, the phases of parsing, optimizing, and execution are intermixed. Almost every module is spread over different optimizer parts. As a consequence, the cost of maintaining the codebase is high and extensibility is poor. We have started an optimizer refactoring project with the goal of a clear separation of these phases. In 5.7.5 we deliver another two increments:

Split JOIN_TAB structure in two parts: optimizer’s and executor’s (WL#6042) : This work by Guilhem Bichot splits the internal JOIN_TAB data structure into one structure used during optimization (“the optimization plan”) and one used for execution (“the execution plan”). It also gets rid of redundant allocations of SQL_SELECTs and JOIN_TABs. This work improves maintainability and reduces memory usage.

Move properties from TABLE to TABLE_LIST: table number and map (WL#7540) : This work by Roy Lyseng represents a step in refactoring and improving prepared statements. The lifetime of a TABLE object is one execution, whereas a TABLE_LIST object lasts as long as the prepared statement it is used for. Hence, it may be reasonable to collect table specific properties that are constant throughout all executions of a query in the TABLE_LIST object. Examples of such properties are table number and table map. This will also focus the TABLE_LIST object as the optimizer’s view of a table throughout query preparation and optimization.

Work towards a new Cost Model

We want to improve the cost based optimizer and replace existing heuristics with cost based decisions. We want to produce better cost estimates which also take into account new hardware architectures (larger buffers, caches, SSDs, etc.). Better cost estimates will lead to better decisions by the optimizer and thus to better performance.  See also Jørgen Løland’s article “The MySQL Optimizer Cost Model Project“.

Use Cost_estimate values within the range Optimizer (WL#5869) : This work by Benny Wang changes the internal cost representations used within the range optimizer from double values to Cost_estimate objects. The old double value was the sum of disk lookup cost, CPU comparison costs, etc. With the new Cost_estimate object the CPU, memory, and disk costs are kept separate. This work does not introduce any functional changes.

Make use of condition filtering in the Optimizer (WL#6635) : This work by Jørgen Løland improves join ordering. It provides a much better prefix rows estimate by taking into account not only conditions that are used by the chosen access method but all other relevant conditions as well. See also “A New Dimension to MySQL Query Optimization” part1 and part2 by Jørgen Løland.

Remove the InnoDB rec_per_keys hack (WL#6068) : This work by Vasil Dimov removes the historical InnoDB hard coded “correction” of index selectivity. At some point in the past, InnoDB judged MySQL to favor table scans too much over index searches, and decided to report index selectivity to be 2 times better than the actual estimate.

Use improved records per key estimate interface in the Optimizer (WL#7339) : This work by Olav Sandstå changes the optimizer to use the new interface for providing “records per key” values using float values (instead of integers). Using this new interface instead of using the previously used rec_per_key values will give more correct record estimates for ref access against an indexed column.

Configuration tables for the Optimizer Cost Model (WL#7276) : This work by Olav Sandstå introduces tables for storing “cost constants”. Until now the Optimizer cost model has been using “cost constants” that were hard-coded in the source code. These are now replaced by “cost constants” that can be changed without making changes to the source code and can be adjusted by the server administrator. This work defines the tables that will be used for storing the “cost constants” to be used by the new Cost Model. These tables will then be read and used by the Optimizer.

Optimizer Cost Model—main memory management of cost constants (WL#7315) : This work by Olav Sandstå implements reading cost constants from the new cost constant tables and maintaining these in a main memory cache, providing the current version of them to the Optimizer’s Cost Model. This “cost constant cache” is shared by all running queries and it supports concurrent access and sharing of the cached cost constants.

Optimizer cost model—command for online updating of cost model constants (WL#7316) : This work by Olav Sandstå implements the statement and methods used for reloading the cost constants in a running server, re-loading the data into the cost constant cache.

Replication

So far in 5.7 the most prominent replication features are semi-synchronous replication and improved multi-threaded slaves. In 5.7.5 we deliver a few improvements to multi-threaded slaves along with some preparatory work for multi-source replication (in Labs).

Support SLAVE_TRANSACTION_RETRIES in multi-threaded slave mode (WL#6964) : This work by Libing Song enables Multi-Threaded Slave (MTS) replication slaves to retry a transaction after a temporary failure. Before this work only non-MTS replication slave servers attempted to retry a transaction after a temporary failure. See Bug#68465 reported by Yoshinori Matsunobu.

Slave performance—waiting for more transactions to enter the Binlog Group Commit (BGC) queue (WL#7742) : This work by Luis Soares adds two new options to introduce an artificial delay in order to increase the binary log group commit size. This gives a chance for more transactions to be flushed and synced together to disk, thus reducing the overall time spent in committing the transactions (the bigger the groups the fewer sync operations). With the correct tuning, this can make the slave perform several times faster without compromising the master’s throughput. These options are named: binlog-group-commit-sync-delay and binlog-group-commit-sync-no-delay-count.
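
For example, to wait up to 100 microseconds for more transactions to join a commit group, unless 10 transactions have already queued up:

mysql> SET GLOBAL binlog_group_commit_sync_delay = 100;
mysql> SET GLOBAL binlog_group_commit_sync_no_delay_count = 10;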

Move status variables to replication Performance Schema tables (WL#7817) : This work by Shivji Jha moves replication status variables to Performance Schema tables so that they can be monitored per source rather than as global variables. The variables affected are: Slave_running, Slave_retried_transactions, Slave_last_heartbeat, Slave_received_heartbeats, and Slave_heartbeat_period. This work is a requirement for multi-source replication.

WAIT_FOR_EXECUTED_GTID_SET (WL#7796) : This work by Manish Kumar adds a new SQL function, WAIT_FOR_EXECUTED_GTID_SET, which makes master-slave synchronization independent of the slave threads. Even if the slave threads are not running, WAIT_FOR_EXECUTED_GTID_SET(GTID_SET [, TIMEOUT]) keeps waiting until the given GTID set has been executed (returning 0) or the timeout is reached (returning 1).
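
For example, to wait up to 60 seconds for a given GTID set to be applied (the source UUID shown is hypothetical):

mysql> SELECT WAIT_FOR_EXECUTED_GTID_SET('3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5', 60);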

Optimize GTIDs for passive slaves—store GTIDs in a table (WL#6559) : This work by Daogang Qu adds the ability to store GTIDs in a table instead of in the binary log. A common use case is a slave that is used only for read scale-out and is never going to become a master, and thus may have no use for the transactions in the binary log but does have a use for the GTIDs, e.g. to allow failover to another new master.

Multi-threaded slaves—ordered commits (sequential consistency) (WL#6813) : This work by Libing Song ensures that the commits by slave applier/worker threads running in parallel are done in the same execution order as on the master. This also means that the slave will never externalize a database state which was never externalized by the master. This is a requirement when the applications reading from the slave must observe the same set of states that existed on the master due to some application enforced constraint. This has become a necessity after WL#6314, implemented in 5.7.2, which enables multiple transactions to be executed in parallel by the slave threads, some of which may be modifying a single database.

Refactoring

We are continuing to refactor the source code to pave the way for new features, improve reliability, improve security, improve maintainability, utilize newer compiler features, and in some cases fix historical flaws. Here are the things we did in 5.7.5:

Refactoring for WL#6835 (WL#8023) : This work by Sunny Bains adds the infrastructure to track what a transaction is doing at any given time. This is pure refactoring, with no functional change.

Repurpose FIL_PAGE_FLUSH_LSN (WL#7990) : This work by Marko Mäkelä stops writing FIL_PAGE_FLUSH_LSN to any files except the first file of the InnoDB system tablespace (page number 0:0).

Simplify tablespace discovery during crash recovery (WL#7142) : This work by Marko Mäkelä eliminates the use of the file system as a “data dictionary” during redo log processing (before applying the redo log). See also “InnoDB Crash Recovery Improvements in MySQL 5.7” by Marko Mäkelä.

Memroot allocator for C++ STL (WL#6074) : This work by Jon Olav Hauglid implements two memory allocators: one that takes memory from a MySQL MEM_ROOT, and one that uses heap (via my_malloc/my_free). The latter is important for Boost.Geometry as it allows for Performance Schema tracking of its memory usage.

Remove custom rwlock windows implementation (WL#7914) : This work by Jon Olav Hauglid removes the custom rwlock implementation on Windows as this function is now provided natively on all of our supported Windows platforms (Windows 7+, Server 2008 R2+).

Remove MY_ATOMIC_MODE_RWLOCKS (WL#7655) : This work by Jon Olav Hauglid removes our homegrown atomics implementation (using mutexes) as it is no longer needed. We are now using either GCC builtins, Windows interlocked operations, or Solaris atomics.

New data dictionary—don’t hold the LOCK_open mutex while reading table definitions (WL#7593) : This work by Sivert Sørumgård makes it possible to read more than one table definition from the new DD concurrently (or also to read it from .FRMs concurrently).

Refactor strings—use of (const char *) in LEX_USER (WL#7582) : This work by Thayumanavar Sachithanantham changes (char *) and LEX_STRING to (const char *) and LEX_CSTRING respectively in the LEX_USER structure and related functions. This work is part of an effort to move to a more reliable string implementation.

Refactor strings—use of (const char *) in THD (WL#7583) : This work by Thayumanavar Sachithanantham changes (char *) and LEX_STRING to (const char *) and LEX_CSTRING respectively in the THD class and related functions. This work is part of an effort to move to a more reliable string implementation.

Move time zone and help tables from MyISAM to transactional storage (WL#7159) : This work by Dmitry Shulga moves time zone and help tables from MyISAM to InnoDB. The purpose of this work is to increase reliability by storing system tables in a transactional engine.

Server support for attachable transactions (WL#8003) : This work by Alexander Nozdrin updates the server code so that it is able to use InnoDB attachable transactions. This work is also a first step to full server support of nested transaction contexts. This work is motivated by the work on the new Data Dictionary.

Attachable transaction support in InnoDB (WL#7828) : This work by Alexander Nozdrin implements attachable transactions in InnoDB. An attachable transaction is always an AC-RO-RC-NL transaction (AutoCommit-ReadOnly-ReadCommitted-NonLocking). This work is motivated by the work on the new Data Dictionary, which needs read-write access to system tables (dictionary tables) stored in InnoDB. Reading data should always be performed at the READ COMMITTED isolation level, no matter which isolation level is active within the user transaction.

Improved Test Coverage

We are continuing to improve the public test suites to ensure that users who make changes and compile from source can do so with the knowledge that test coverage is good. In 5.7.5 we added 229 new MTR tests, and we also harmonized testing-related aspects between MySQL Server and MySQL Cluster by making include/sync_slave_sql_with_master.inc work as a replacement for sync_slave_with_master for NDB in MTR tests (WL#7774).

Deprecation and Removal

We are continuing to deprecate and remove old options and behaviors that are no longer relevant or have been replaced with better alternatives. In 5.7.5 we did the following:

Remove limited YEAR(2) support (WL#6263) : This work by Haixiang Li removes all remnants of YEAR(2) related code in 5.7. We removed general support for YEAR(2) in 5.6, but kept limited support for old YEAR(2) columns to allow for upgrade procedures to MySQL 5.6. That limited support has now been removed.

Deprecate skip-innodb in 5.6 and remove it in 5.7 (WL#7976) : This work by Alexander Nozdrin removes the server options which disable InnoDB. The system tables are being moved from MyISAM to InnoDB, so in 5.7+ it will not be possible to run the MySQL server without InnoDB (which has been the default storage engine since 5.5).

Remove the storage_engine system variable in 5.7 (WL#7148) : This work by Jon Olav Hauglid removes the storage_engine system variable (deprecated in 5.5 and removed in 5.7). You should instead use the default_storage_engine server option.

Deprecate and remove the timed_mutexes system variable (WL#7436) : This work by Jon Olav Hauglid removes the timed_mutexes system variable (deprecated in 5.5/5.6 and removed in 5.7). The timed_mutexes system variable has been disconnected from any underlying InnoDB code, so it no longer serves any purpose.

Deprecate and remove mysqlbug (WL#7689) : This work by Erlend Dahl removes the mysqlbug script because it is obsolete (deprecated in 5.5/5.6 and removed in 5.7).

Deprecate and remove mysql_zap and mysql_waitpid (WL#7826) : This work by Erlend Dahl removes the mysql_zap and mysql_waitpid scripts because they are obsolete (deprecated in 5.6 and removed in 5.7).

Deprecate and remove mysqlhotcopy (WL#7854) : This work by Erlend Dahl removes the mysqlhotcopy perl script because it is obsolete (deprecated in 5.6 and removed in 5.7).

Deprecate the EOF packet (WL#7766) : This work by Bharathy Satish replaces the EOF packet with the new OK packet in the MySQL client-server protocol. The EOF and OK packets serve the same purpose — to mark the end of a query execution result. The protocol is kept backward compatible as clients and servers have one additional capability flag to denote their support for the new OK packet. When this capability flag is in place, any SQL statement which produces a result set will send an OK packet as a marker to indicate the end of data rather than an EOF packet.

Evolving the MySQL Server Architecture @ Oracle Open World

MySQL Central @ Open World is coming up next week. I look forward to presenting “Evolving the MySQL Server Architecture” on Tuesday, Sep 30th, 5:00–5:45 pm, in Moscone room 250.

Join me to discuss the big picture and directions for MySQL Server Architecture evolution.

In this talk I will discuss the driving forces and challenges motivating changes, and outline the vision for the MySQL Server Architecture. Topics to be discussed include the main architectural themes:

  • Server Refactoring / Modularity
  • Extensibility
  • Performance and Scalability
  • Security
  • Integration and Manageability
  • Everything online

I will demonstrate with examples how these principles are currently being applied, and how they may affect the direction of future development. Looking forward to meeting you at Open World!

Nuances of MySQL Fabric User GRANTs

MySQL Fabric is a new product and some folks are running into issues when trying to deploy test instances that grow beyond a single server.

The Fabric documentation has an example GRANT statement illustrating the simple localhost use case:
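
Roughly, it looks like this (the account name and password below are placeholders; the exact statement in the documentation may differ):

    -- Placeholder account and password; adjust for your deployment.
    GRANT ALL ON fabric.* TO 'fabric'@'localhost' IDENTIFIED BY 'secret';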

The simple example shown above, however, hides the more complex way that Fabric uses the fabric login.

The Fabric config file has three sections that deal with user and password values; I will focus only on the [servers] section of the configuration file. The user and password defined in this section are used by the mysqlfabric utility to connect to managed servers and to set up replication between managed servers in HA groups.
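
For reference, a minimal [servers] section might look like this (the user name and password are placeholders):

    [servers]
    user = fabric
    password = secret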

Fabric has two different types of HA group: the global group and shard groups. The global group is used to store tables which are part of the Fabric deployment but are not sharded. These are typically small tables, or tables that are needed for joins on each shard. The global HA group is similar to the shard HA groups: there is a master and one or more slaves. The topology gets a little more complex because the global group is replicated to each shard group. Each shard group contains a complete copy of the data stored in the global group so that joins can be performed locally on each shard. This creates a GRANT dependency whereby the Fabric user must be able to connect as a replication slave to every other MySQL instance in the Fabric deployment.

Here is a simple topology relationship between the global group and shard groups:

[Diagram: simple Fabric topology, with the global group replicating to each shard group]

The diagram has a total of 9 servers, each of which can become a master or a slave through the promote or demote mysqlfabric commands.

To accommodate all possible master/slave combinations in the Fabric deployment, you would need GRANTs along the following lines:
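
For instance, assuming the Fabric user is named fabric (the host names and password below are placeholders):

    -- One GRANT per server host in the deployment (nine in the diagram above).
    -- Fabric needs broad privileges to configure replication; ALL is shown for brevity.
    GRANT ALL PRIVILEGES ON *.* TO 'fabric'@'server1.example.com' IDENTIFIED BY 'secret';
    GRANT ALL PRIVILEGES ON *.* TO 'fabric'@'server2.example.com' IDENTIFIED BY 'secret';
    -- ... and so on, up to 'fabric'@'server9.example.com'.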

The dynamic master/slave relationship between the MySQL instances means that you must have these GRANTs on each server. A simpler approach is to use MySQL’s wildcard capabilities to define a single user GRANT.

Here is an example GRANT statement which would allow all servers on the same subnet to authenticate using the same account:
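
A sketch, assuming all servers sit on the 192.168.1.0/24 subnet (the subnet and password are placeholders):

    -- The '%' wildcard in the host part matches any address in 192.168.1.0/24.
    GRANT ALL PRIVILEGES ON *.* TO 'fabric'@'192.168.1.%' IDENTIFIED BY 'secret';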

Using wildcards can be dangerous without following a few best practices:

  • Never use ‘%’ as the host; this permits any client IP address to connect with that account. MySQL does a good job of extending the typical user/password security model by requiring the host to match too. You should not allow any client IP to connect to a single account; client IP addresses should be whitelisted whenever possible.
  • All of your Fabric hosts should be on a logically isolated VLAN or network. Best performance is achieved when the network path between servers is short and fast. Security and performance both have the same requirement: a dedicated network segment. You can use a VLAN to isolate your MySQL servers from outside traffic and make it easier to manage physical servers. If the IP network the MySQL servers are using is part of a larger routable IP space, rogue users could potentially break into your network and use their access to compromise data on your MySQL servers. Many companies are subject to compliance requirements which favor such isolation.
  • Traffic to and from your MySQL servers should be managed by a router, firewall, unified threat management (UTM) device, or load balancer. This isn’t a strict rule, but it is a good practice to implement. Your MySQL servers contain business-critical data and should be protected by more than one layer of security. Many UTM devices can detect attacks that happen via your user applications. You might have an undiagnosed security vulnerability in your application; a UTM can alert you to it and block attacks before they compromise your data.
  • The skip-name-resolve option should be enabled on production servers to reduce DNS lookup overhead and prevent reverse DNS spoofing; a minimal my.cnf snippet follows this list.
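
Enabling the last option is a one-line change in the [mysqld] section of my.cnf:

    [mysqld]
    skip-name-resolve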

The TL;DR is that you need to ensure each MySQL instance in your Fabric deployment has the necessary permissions to act as a replication slave to every other MySQL instance, and that the Fabric node can access your instances. This requires either individual GRANTs or a wildcard GRANT for the user defined in the [servers] section of your fabric.cfg file.


Building MySQL 5.7

The 5.7.5 DMR is now available, and we’ve made some changes to our build system in this one, so I wanted to spend some time discussing how you would now build MySQL.

When we released our April labs release, I wrote about building MySQL with Boost. Now that the first GIS work using Boost.Geometry has passed all the hurdles and landed in a development milestone release (DMR), it’s time to revisit the topic. From now on (5.7.5 and newer), MySQL needs Boost headers to compile. It’s not optional.

We got a few bug reports on the build process for the labs release. Hartmut Holzgraefe quickly submitted bugs 72172, 72185, 72186 and 72188, which have all now been fixed. Thank you, Hartmut! It’s very encouraging for all of us working on this to get quality feedback on a labs release!

Getting Started

Assuming that you’ve just downloaded the 5.7.5 tarball:
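
A minimal sequence looks like the following (the tarball file name and the ~/my_boost path are illustrative):

    # Unpack the source tarball (the exact file name depends on your download).
    tar xzf mysql-5.7.5.tar.gz
    cd mysql-5.7.5
    # Point CMake at a local Boost copy and build.
    cmake . -DWITH_BOOST=~/my_boost
    make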

This is almost exactly the same way we did it in the labs release. That CMake command line still works; the only change is $HOME to ~, to demonstrate that tilde expansion now works. We now also support relative paths (bug 72185).

We’ve also improved the error messages if anything goes wrong (bug 72172).

This is the recommended way for developers to build MySQL. If you already have the correct Boost version installed in your include path (we only need the header files), you don’t need any CMake options at all.

After the first build, you can drop the build options on later CMake runs. CMake will remember your settings. However, it doesn’t hurt to always use them.

Advanced Build Options

There are two new build options: WITH_BOOST and DOWNLOAD_BOOST. I’ll describe both here, but if you really want to look at all the details and inner workings, have a look at cmake/boost.cmake in the source tree.

WITH_BOOST

This specifies where CMake will find Boost. It can point to:

  • a tarball/zip file
  • a directory containing a tarball/zip file (it has to have the standard Boost tarball/zip file name)
  • a directory where the tarball/zip file was unpacked

You can specify both absolute and relative paths, and the tilde will be expanded. And of course, any path name expansion your shell does for you will work. There’s no default value; if you don’t specify anything, you’ll need to have Boost in your normal include path.

The WITH_BOOST build option can be set on the CMake command line like in the example above, or you can use the WITH_BOOST environment variable. Our CMake scripts will also recognize the BOOST_ROOT environment variable that is commonly used by projects using Boost as an alias for WITH_BOOST.

The order of precedence is (highest to lowest):

  1. WITH_BOOST option on the CMake command line
  2. WITH_BOOST environment variable
  3. BOOST_ROOT environment variable
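
For example, these three invocations all point the build at a Boost copy in ~/my_boost (the path is illustrative), listed in decreasing order of precedence:

    cmake . -DWITH_BOOST=~/my_boost      # 1. CMake command-line option
    WITH_BOOST=~/my_boost cmake .        # 2. WITH_BOOST environment variable
    BOOST_ROOT=~/my_boost cmake .        # 3. BOOST_ROOT environment variable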

If CMake can’t find Boost, it will fail with a friendly error message telling you how to use DOWNLOAD_BOOST.

DOWNLOAD_BOOST

This boolean option tells CMake to download the Boost tarball or zip file from Boost’s standard download location (currently Sourceforge). The default is DOWNLOAD_BOOST=0, which means don’t download. Set it to 1 to enable downloading.

In order to provide a stable code base, MySQL depends on a specific version of Boost. When we change our code to depend on a newer version of Boost, DOWNLOAD_BOOST=1 will make sure the new version will be automatically downloaded for you.
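
A typical invocation combines the two options, so that Boost is downloaded into the given directory if it is not already there (the path is illustrative):

    cmake . -DDOWNLOAD_BOOST=1 -DWITH_BOOST=~/my_boost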

MySQL Central @ OpenWorld: What’s New in MySQL 5.7?

MySQL Central @ OpenWorld is coming up soon. I look forward to presenting What’s New in MySQL 5.7? on Tuesday, Sep 30, 12:00 PM – 12:45 PM, in Moscone South, room 252. Join product manager Mike Frank and me to discuss the big picture! We will cover:

  • InnoDB Performance & Scalability: Over 600 thousand QPS for SQL queries, over 1.1 million QPS for the Memcached API, over 60 thousand connects/disconnects per second, parallel flushing, bulk data load, and more.
  • Optimizer: New optimizations, a new architecture for the parser, a new architecture for the optimizer, and new additions leading to a new cost model designed to handle the latest OS and hardware innovations.
  • Performance Schema: Added instrumentation for metadata locking, transactions, memory usage, stored programs, and prepared statements. Exposing “show slave status” information and “user variables”, reducing overhead, reducing footprint, and more.
  • Temporary Tables: Performance improvements for InnoDB temporary tables, their usage and the way forward.
  • GIS: InnoDB spatial indexes, new GIS algorithms based on Boost.Geometry, and the way forward.
  • Online: Online buffer pool resizing, truncation of undo log, additions to online alter table.
  • Replication: Enhanced multi-threaded slaves, semi-sync replication, and more.
  • Security: Secure by default, new asymmetric encryption, key support, improved password handling, and more.
  • Cleaning up semantics: IGNORE Clause, STRICT Mode, ONLY_FULL_GROUP_BY mode, triggers, and more.
  • Community contributions: Statement timeout, multiple user-level locks, and computed columns.
  • New Data Dictionary and more @ Labs
  • And much more: Fabric support, improved tools, fulltext search, partitioning, buffer pool dump and load enhancements, triggers, error reporting, error logging, …

I am looking forward to meeting you @ OpenWorld!
-geir