Elixir and Postgres: A Rarely Mentioned Problem
> Last time, we talked about the magic trick to make your full text searches go fast. This time, I’ll tell you about another performance issue I encountered that probably also affects your performance, at least if you are using Ecto and PostgreSQL.
New In PostgreSQL 12: Generated Columns
> PostgreSQL 12 comes with a new feature called generated columns. Other popular RDBMSes already support generated columns as “computed columns” or “virtual columns.” With Postgres 12, you can now use it in PostgreSQL as well. Read on to learn more.
Postgres Execution Plans - Field Glossary
> There are lots of guides out there to the basics of execution plans, but a lot are quite scarce on the details - how to interpret particular values, what they really mean, and where the pitfalls are.
> We’ve spent a lot of time over the last 18 months learning, clarifying, and downright misinterpreting how each of these fields work — and there’s still further for us to go on that.
> But we have come a long way, and I’d like to share the guide that I wish had existed when we started out — a glossary of the most common fields you’ll see on the operations in a query plan, and a detailed description of what each one means.
Thoughts on Arbitrary Pagination
> Pagination is the act of breaking a data set into multiple pages to limit the amount of data that has to be processed and sent by a server at once. We’re going to be changing how pagination works on crates.io, and I wanted to share some musings about the issues with supporting this as a generic abstraction. While I’m going to be talking about some PostgreSQL internals in this article, the general ideas presented apply to any SQL database.
ORMs are backwards
> I think all ORM users have a journey from ‘there should be a way to’ to ‘this is saving me so much work’ to ‘I have to reach into the vending machine to get my change out’.
Keeping CALM: when distributed consistency is easy
> When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
> So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.
Postgres full-text search is Good Enough!
> When you have to build a web application, you are often asked to add search. The magnifying glass is something that we now add to wireframes without even knowing what we are going to search.
> The search has become an important feature and we’ve seen a big increase in the popularity of tools like elasticsearch and SOLR which are both based on lucene. They are great tools but before going down the road of Weapons of Mass Destruction Search, maybe what you need is something a bit lighter which is simply good enough!
New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures
> Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11
> In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.
Are we there yet? Are we there yet?
Columnstore and B+ tree – are hybrid physical designs important?
> It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.
> Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.
PostgreSQL 11 and Just In Time Compilation of Queries
> One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!
> In our benchmarking, PostgreSQL 11 JIT is an awesome piece of technology and provides up to 29.31% speed improvements, executing TPC-H Q1 at scale factor 10 in 20.5s instead of 29s when using PostgreSQL 10.
A tale of RPMs
> If you don’t want to read the following, feel free to skip it. The TL;DR is that it’s an awful idea to read the RPM DB directly without using the RPM libs and I eventually gave up. I did manage to access it without executing the RPM command, but then again I’ve just used the RPM libs.
You'd Need an Oracle to Understand These Docs
> This time, I trace through a few references, discover another footnote which is itself reference to a white paper, which itself contains a footnote. “If cogs_behavior is set,RM_CONS_BY_UT,RM_CONS_IN_FRMLU_FPT,RM_CALC_FRMLUandCOGS_RM_CALC_USAGE may alter data in an unrecoverable fashion.”
How not to structure your database-backed web applications: a study of performance bugs in the wild
> This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al., are able to quantify how many lines of code it takes to address the issue, and what performance improvement the fix delivers.
> Note that fundamentally a lot of the issues stem from the fact that the ‘O’ in ORM could just as easily stand for ‘Opaque.’
The State of VACUUM
> In a recent blog post, I talked about why every system that implements MVCC needs some scheme for removing old row versions, and how VACUUM meets that need for PostgreSQL. In this post, I’d like to examine the history of VACUUM improvements in recent years, the state of VACUUM as it exists in PostgreSQL today, and how it might be improved in the future.
SQL Keys in Depth
> The internet is full of dogmatic commandments for choosing and using keys in relational databases. At times it verges on a holy war: should you use natural or artificial keys? Auto-incrementing integers, UUIDs?
Trying to Represent a Tree Structure Using Postgres
> This week I’ll publish a series of blog posts on the Postgres LTREE extension. I’ll get started today by trying to insert a tree structure into a Postgres table using standard SQL, and during the rest of the week I’ll take a close look at LTREE: how to install it, how to use it, and how it works.
KV-Direct: High-performance in-memory key-value store with programmable NIC
> What we’re seeing here is a glimpse of how large-scale systems software of the future may well be constructed. As the power ceiling puts a limit on multi-core scaling, people are now turning to domain-specific architectures for better performance.
pspg - Postgres Pager
> Everybody who uses psql uses less pager. It is working well, but there is not any special support for tabular data. I found few projects, but no one was completed for this purpose. I decided to write some small specialized pager for usage as psql pager.
Startup Mistakes: Choice of Datastore
> Relational databases are, despite the surprising misnomer, actually built to handle relations.
> Having to rewrite all your code to use a different data store is a hard task in itself, having to rewrite your code while simultaneously changing your entire data model? Good luck with that.
secure your Apache Solr servers since a zero-day exploit has been reported