Speeding up sort performance in Postgres 15
Let’s explore each of the 4 improvements in PostgreSQL 15 that make sort performance go faster:
Change 1: Improvements sorting a single column
Change 2: Reduce memory consumption by using generation memory context
Change 3: Add specialized sort routines for common datatypes
Change 4: Replace polyphase merge algorithm with k-way merge
How to contact Google SRE: Dropping a shell in cloud SQL
Google Cloud SQL is a fully managed relational database service. Customers can deploy a SQL, PostgreSQL or MySQL server which is secured, monitored and updated by Google. More demanding users can easily scale, replicate or configure high-availability. By doing so users can focus on working with the database, instead of dealing with all the previously mentioned complex tasks. Cloud SQL databases are accessible by using the applicable command line utilities or from any application hosted around the world. This write-up covers vulnerabilities that we have discovered in the MySQL versions 5.6 and 5.7 of Cloud SQL.
Three bugs in the Go MySQL Driver
Adding to this challenge, authzd is deployed to our Kubernetes clusters, where we’ve been experiencing issues with high latencies when opening new TCP connections, something that particularly affects the pooling of connections in the Go MySQL driver. One of the most dangerous lies that programmers tell themselves is that the network is reliable, because, well, most of the time the network is reliable. But when it gets slow or spotty, that’s when things start breaking, and we get to find out the underlying issues in the libraries we take for granted.
Good walkthrough of dealing with some unfriendly bugs.
10 Things I Hate About PostgreSQL
Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, busting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon. While much of this praise is certainly well-deserved, the lack of meaningful dissent left me a bit bothered. No software is perfect, so exactly what are PostgreSQL’s imperfections?
I’ve been hands-on with PostgreSQL in production since 2003 with deployments ranging from small (gigabytes) to modest to very large (~petabyte). My perspective is largely from building and running systems that are at least intended to be continuously available. Needless to say, I have gained first-hand experience with PostgreSQL’s particular idiosyncrasies through some painful production issues over the years.
Elixir and Postgres: A Rarely Mentioned Problem
Last time, we talked about the magic trick to make your full text searches go fast. This time, I’ll tell you about another performance issue I encountered that probably also affects your performance, at least if you are using Ecto and PostgreSQL.
New In PostgreSQL 12: Generated Columns
PostgreSQL 12 comes with a new feature called generated columns. Other popular RDBMSes already support generated columns as “computed columns” or “virtual columns.” With Postgres 12, you can now use it in PostgreSQL as well. Read on to learn more.
Postgres Execution Plans - Field Glossary
There are lots of guides out there to the basics of execution plans, but a lot are quite scarce on the details - how to interpret particular values, what they really mean, and where the pitfalls are.
We’ve spent a lot of time over the last 18 months learning, clarifying, and downright misinterpreting how each of these fields work — and there’s still further for us to go on that.
But we have come a long way, and I’d like to share the guide that I wish had existed when we started out — a glossary of the most common fields you’ll see on the operations in a query plan, and a detailed description of what each one means.
Thoughts on Arbitrary Pagination
Pagination is the act of breaking a data set into multiple pages to limit the amount of data that has to be processed and sent by a server at once. We’re going to be changing how pagination works on crates.io, and I wanted to share some musings about the issues with supporting this as a generic abstraction. While I’m going to be talking about some PostgreSQL internals in this article, the general ideas presented apply to any SQL database.
ORMs are backwards
I think all ORM users have a journey from ‘there should be a way to’ to ‘this is saving me so much work’ to ‘I have to reach into the vending machine to get my change out’.
Keeping CALM: when distributed consistency is easy
When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.
Postgres full-text search is Good Enough!
When you have to build a web application, you are often asked to add search. The magnifying glass is something that we now add to wireframes without even knowing what we are going to search.
The search has become an important feature and we’ve seen a big increase in the popularity of tools like elasticsearch and SOLR which are both based on lucene. They are great tools but before going down the road of Weapons of Mass Destruction Search, maybe what you need is something a bit lighter which is simply good enough!
New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures
Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11
In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.
Are we there yet? Are we there yet?
Columnstore and B+ tree – are hybrid physical designs important?
It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.
Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.
PostgreSQL 11 and Just In Time Compilation of Queries
One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!
In our benchmarking, PostgreSQL 11 JIT is an awesome piece of technology and provides up to 29.31% speed improvements, executing TPC-H Q1 at scale factor 10 in 20.5s instead of 29s when using PostgreSQL 10.
A tale of RPMs
If you don’t want to read the following, feel free to skip it. The TL;DR is that it’s an awful idea to read the RPM DB directly without using the RPM libs and I eventually gave up. I did manage to access it without executing the RPM command, but then again I’ve just used the RPM libs.
You'd Need an Oracle to Understand These Docs
This time, I trace through a few references, discover another footnote which is itself reference to a white paper, which itself contains a footnote. “If cogs_behavior is set,RM_CONS_BY_UT,RM_CONS_IN_FRMLU_FPT,RM_CALC_FRMLUandCOGS_RM_CALC_USAGE may alter data in an unrecoverable fashion.”
How not to structure your database-backed web applications: a study of performance bugs in the wild
This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al., are able to quantify how many lines of code it takes to address the issue, and what performance improvement the fix delivers.
Note that fundamentally a lot of the issues stem from the fact that the ‘O’ in ORM could just as easily stand for ‘Opaque.’
The State of VACUUM
In a recent blog post, I talked about why every system that implements MVCC needs some scheme for removing old row versions, and how VACUUM meets that need for PostgreSQL. In this post, I’d like to examine the history of VACUUM improvements in recent years, the state of VACUUM as it exists in PostgreSQL today, and how it might be improved in the future.
SQL Keys in Depth
The internet is full of dogmatic commandments for choosing and using keys in relational databases. At times it verges on a holy war: should you use natural or artificial keys? Auto-incrementing integers, UUIDs?
Trying to Represent a Tree Structure Using Postgres
This week I’ll publish a series of blog posts on the Postgres LTREE extension. I’ll get started today by trying to insert a tree structure into a Postgres table using standard SQL, and during the rest of the week I’ll take a close look at LTREE: how to install it, how to use it, and how it works.