B-Trees: More Than I Thought I'd Want to Know
https://benjamincongdon.me/blog/2021/08/17/B-Trees-More-Than-I-Thought-Id-Want-to-Know/ [benjamincongdon.me]
2025-01-04 11:26
tags:
compsci
database
programming
storage
systems
In my college Data Structures and Algorithms course, we covered B-Trees, but I didn’t grok why I’d choose to use one. As presented, B-Trees were essentially “better” Binary Search Trees, with some hand-waving done that they had improved performance when used in database applications. I remember needing to memorize a bunch of equations to determine the carrying capacity of a M-degree B-Tree, and a vague understanding of B-Tree lookup/insertion/deletion, but not much else. Which is a shame! They’re interesting structures.
source: HN
Antithesis of a One-in-a-Million Bug: Taming Demonic Nondeterminism
https://www.cockroachlabs.com/blog/demonic-nondeterminism/ [www.cockroachlabs.com]
2024-03-22 19:46
tags:
database
development
fuzzing
testing
Bugs are compounded by the number of distinct nodes operating in a distributed system, each providing their own sources of nondeterminism with thread timings, network conditions, hardware, and more. Finding and fixing these bugs requires new approaches to testing and debugging.
Like any emerging technology, the Antithesis platform is not without rough edges. Deterministic replay doesn’t immediately get you a reproduction, particularly across distinct code changes as you might see with a unit or integration test. In our experience, a significant amount of effort was invested in instrumenting the logs, as well as reasoning about injected failure states in order to recover the state machine which reproduces the bug. The rinse-and-repeat cycle means that a modified binary (with new instrumentation) may not always hit the same terminal state. Although, in practice determinism between runs is very high assuming the code changes are localized.
source: HN
When Good Correlation is Not Enough
https://hakibenita.com/postgresql-correlation-brin-multi-minmax [hakibenita.com]
2023-07-28 02:39
tags:
database
development
perf
sql
Choosing to use a block range index (BRIN) to query a field with high correlation is a no-brainer for the optimizer. The small size of the index and the field’s correlation makes BRIN an ideal choice. However, a recent event taught us that correlation can be misleading. Under some easily reproducible circumstances, a BRIN index can result in significantly slower execution even when the indexed field has very high correlation.
source: HN
Speeding up sort performance in Postgres 15
https://www.citusdata.com/blog/2022/05/19/speeding-up-sort-performance-in-postgres-15/ [www.citusdata.com]
2022-05-20 23:02
tags:
database
perf
sorting
sql
update
Let’s explore each of the 4 improvements in PostgreSQL 15 that make sort performance go faster:
Change 1: Improvements sorting a single column
Change 2: Reduce memory consumption by using generation memory context
Change 3: Add specialized sort routines for common datatypes
Change 4: Replace polyphase merge algorithm with k-way merge
source: HN
How to contact Google SRE: Dropping a shell in cloud SQL
https://offensi.com/2020/08/18/how-to-contact-google-sre-dropping-a-shell-in-cloud-sql/ [offensi.com]
2020-08-20 16:29
tags:
cloud
database
exploit
programming
security
sql
Google Cloud SQL is a fully managed relational database service. Customers can deploy a SQL, PostgreSQL or MySQL server which is secured, monitored and updated by Google. More demanding users can easily scale, replicate or configure high-availability. By doing so users can focus on working with the database, instead of dealing with all the previously mentioned complex tasks. Cloud SQL databases are accessible by using the applicable command line utilities or from any application hosted around the world. This write-up covers vulnerabilities that we have discovered in the MySQL versions 5.6 and 5.7 of Cloud SQL.
source: L
Three bugs in the Go MySQL Driver
https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/ [github.blog]
2020-05-21 06:03
tags:
bugfix
database
go
networking
programming
Adding to this challenge, authzd is deployed to our Kubernetes clusters, where we’ve been experiencing issues with high latencies when opening new TCP connections, something that particularly affects the pooling of connections in the Go MySQL driver. One of the most dangerous lies that programmers tell themselves is that the network is reliable, because, well, most of the time the network is reliable. But when it gets slow or spotty, that’s when things start breaking, and we get to find out the underlying issues in the libraries we take for granted.
Good walkthrough of dealing with some unfriendly bugs.
source: HN
10 Things I Hate About PostgreSQL
https://medium.com/@rbranson/10-things-i-hate-about-postgresql-20dbab8c2791 [medium.com]
2020-05-17 05:50
tags:
database
development
sql
Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, busting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon. While much of this praise is certainly well-deserved, the lack of meaningful dissent left me a bit bothered. No software is perfect, so exactly what are PostgreSQL’s imperfections?
I’ve been hands-on with PostgreSQL in production since 2003 with deployments ranging from small (gigabytes) to modest to very large (~petabyte). My perspective is largely from building and running systems that are at least intended to be continuously available. Needless to say, I have gained first-hand experience with PostgreSQL’s particular idiosyncrasies through some painful production issues over the years.
Elixir and Postgres: A Rarely Mentioned Problem
https://blog.soykaf.com/post/postgresql-elixir-troubles/ [blog.soykaf.com]
2020-02-19 06:02
tags:
benchmark
database
perf
sql
Last time, we talked about the magic trick to make your full text searches go fast. This time, I’ll tell you about another performance issue I encountered that probably also affects your performance, at least if you are using Ecto and PostgreSQL.
New In PostgreSQL 12: Generated Columns
https://pgdash.io/blog/postgres-12-generated-columns.html [pgdash.io]
2019-10-02 20:05
tags:
database
sql
update
PostgreSQL 12 comes with a new feature called generated columns. Other popular RDBMSes already support generated columns as “computed columns” or “virtual columns.” With Postgres 12, you can now use it in PostgreSQL as well. Read on to learn more.
source: HN
Postgres Execution Plans - Field Glossary
https://www.pgmustard.com/blog/2019/9/17/postgres-execution-plans-field-glossary [www.pgmustard.com]
2019-09-19 19:34
tags:
database
perf
programming
sql
There are lots of guides out there to the basics of execution plans, but a lot are quite scarce on the details - how to interpret particular values, what they really mean, and where the pitfalls are.
We’ve spent a lot of time over the last 18 months learning, clarifying, and downright misinterpreting how each of these fields work — and there’s still further for us to go on that.
But we have come a long way, and I’d like to share the guide that I wish had existed when we started out — a glossary of the most common fields you’ll see on the operations in a query plan, and a detailed description of what each one means.
source: L
Thoughts on Arbitrary Pagination
https://blog.seantheprogrammer.com/thoughts-on-arbitrary-pagination [blog.seantheprogrammer.com]
2019-09-19 03:25
tags:
database
development
programming
sql
Pagination is the act of breaking a data set into multiple pages to limit the amount of data that has to be processed and sent by a server at once. We’re going to be changing how pagination works on crates.io, and I wanted to share some musings about the issues with supporting this as a generic abstraction. While I’m going to be talking about some PostgreSQL internals in this article, the general ideas presented apply to any SQL database.
source: L
ORMs are backwards
https://abe-winter.github.io/2019/09/03/orms-backwards.html [abe-winter.github.io]
2019-09-04 11:35
tags:
database
development
programming
sql
I think all ORM users have a journey from ‘there should be a way to’ to ‘this is saving me so much work’ to ‘I have to reach into the vending machine to get my change out’.
source: L
Keeping CALM: when distributed consistency is easy
https://blog.acolyer.org/2019/03/06/keeping-calm-when-distributed-consistency-is-easy/ [blog.acolyer.org]
2019-03-06 22:26
tags:
compsci
concurrency
database
development
paper
perf
When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.
Also: https://blog.acolyer.org/2019/03/08/a-generalised-solution-to-distributed-consensus/
Postgres full-text search is Good Enough!
http://rachbelaid.com/postgres-full-text-search-is-good-enough/ [rachbelaid.com]
2019-01-18 01:44
tags:
database
development
sql
text
When you have to build a web application, you are often asked to add search. The magnifying glass is something that we now add to wireframes without even knowing what we are going to search.
The search has become an important feature and we’ve seen a big increase in the popularity of tools like elasticsearch and SOLR which are both based on lucene. They are great tools but before going down the road of Weapons of Mass Destruction Search, maybe what you need is something a bit lighter which is simply good enough!
source: L
New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures
https://pganalyze.com/blog/postgres11-jit-compilation-auto-prewarm-sql-stored-procedures [pganalyze.com]
2018-10-05 21:02
tags:
database
jit
perf
sql
Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11
In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.
Are we there yet? Are we there yet?
source: HN
Columnstore and B+ tree – are hybrid physical designs important?
https://blog.acolyer.org/2018/09/28/columnstore-and-b-tree-are-hybrid-physical-designs-important/ [blog.acolyer.org]
2018-09-28 17:40
tags:
compsci
database
paper
perf
storage
It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.
Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.
PostgreSQL 11 and Just In Time Compilation of Queries
https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/ [www.citusdata.com]
2018-09-13 01:29
tags:
benchmark
beta
database
jit
perf
sql
One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!
In our benchmarking, PostgreSQL 11 JIT is an awesome piece of technology and provides up to 29.31% speed improvements, executing TPC-H Q1 at scale factor 10 in 20.5s instead of 29s when using PostgreSQL 10.
source: L
A tale of RPMs
https://medium.com/@claudio.criscione/building-minions-5-a-tale-of-rpms-e65e6646a813 [medium.com]
2018-08-07 03:09
tags:
admin
database
format
linux
swtools
If you don’t want to read the following, feel free to skip it. The TL;DR is that it’s an awful idea to read the RPM DB directly without using the RPM libs and I eventually gave up. I did manage to access it without executing the RPM command, but then again I’ve just used the RPM libs.
source: grugq
You'd Need an Oracle to Understand These Docs
https://thedailywtf.com/articles/you-d-need-an-oracle-to-understand-these-docs [thedailywtf.com]
2018-07-25 17:55
tags:
database
development
docs
This time, I trace through a few references, discover another footnote which is itself reference to a white paper, which itself contains a footnote. “If cogs_behavior is set,RM_CONS_BY_UT,RM_CONS_IN_FRMLU_FPT,RM_CALC_FRMLUandCOGS_RM_CALC_USAGE may alter data in an unrecoverable fashion.”
How not to structure your database-backed web applications: a study of performance bugs in the wild
https://blog.acolyer.org/2018/06/28/how-_not_-to-structure-your-database-backed-web-applications-a-study-of-performance-bugs-in-the-wild/ [blog.acolyer.org]
2018-06-30 00:41
tags:
database
paper
perf
programming
sql
web
This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al., are able to quantify how many lines of code it takes to address the issue, and what performance improvement the fix delivers.
Note that fundamentally a lot of the issues stem from the fact that the ‘O’ in ORM could just as easily stand for ‘Opaque.’
source: L