Links

tag: database

I Went To SQL Injection Court

https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/ [sockpuppet.org]

2025-03-14 23:52

Should public bodies in Illinois, like cities and school districts and sheriff’s departments, be allowed to hide information from Freedom of Information requests by keeping them in databases? That question is before the 104th Illinois General Assembly, thanks to a bill sponsored by Donald P. DeWitte, elected state senator by the wise citizens of Batavia and Elgin (motto: “The City In The Suburbs”; indeed), and prompted in part by my friend Matt Chapman.

B-Trees: More Than I Thought I'd Want to Know

https://benjamincongdon.me/blog/2021/08/17/B-Trees-More-Than-I-Thought-Id-Want-to-Know/ [benjamincongdon.me]

2025-01-04 11:26

tags: compsci database programming storage systems

In my college Data Structures and Algorithms course, we covered B-Trees, but I didn’t grok why I’d choose to use one. As presented, B-Trees were essentially “better” Binary Search Trees, with some hand-waving done that they had improved performance when used in database applications. I remember needing to memorize a bunch of equations to determine the carrying capacity of an M-degree B-Tree, and a vague understanding of B-Tree lookup/insertion/deletion, but not much else. Which is a shame! They’re interesting structures.
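As a rough illustration of the "carrying capacity" equations the excerpt mentions, here is a minimal Python sketch. It assumes the common convention in which a B-Tree of order m allows at most m children, and therefore at most m - 1 keys, per node. The function name `max_keys` is hypothetical and does not come from the linked post.

```python
def max_keys(order: int, levels: int) -> int:
    """Upper bound on keys in a B-Tree with `levels` levels (root is level 1).

    Assumption: every node has at most `order` children and `order - 1` keys.
    """
    if order < 2 or levels < 1:
        raise ValueError("order must be >= 2 and levels >= 1")
    # Level i (0-based) can hold at most order**i nodes.
    max_nodes = sum(order ** i for i in range(levels))
    return max_nodes * (order - 1)


print(max_keys(3, 2))    # 8
print(max_keys(100, 3))  # 999999
```

Under this convention the bound simplifies to order**levels - 1, which is why a wide tree of only three levels can already index about a million keys.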

source: HN

Antithesis of a One-in-a-Million Bug: Taming Demonic Nondeterminism

https://www.cockroachlabs.com/blog/demonic-nondeterminism/ [www.cockroachlabs.com]

2024-03-22 19:46

tags: database development fuzzing testing

Bugs are compounded by the number of distinct nodes operating in a distributed system, each providing their own sources of nondeterminism with thread timings, network conditions, hardware, and more. Finding and fixing these bugs requires new approaches to testing and debugging.

Like any emerging technology, the Antithesis platform is not without rough edges. Deterministic replay doesn’t immediately get you a reproduction, particularly across distinct code changes as you might see with a unit or integration test. In our experience, a significant amount of effort was invested in instrumenting the logs, as well as reasoning about injected failure states in order to recover the state machine which reproduces the bug. The rinse-and-repeat cycle means that a modified binary (with new instrumentation) may not always hit the same terminal state. Although, in practice determinism between runs is very high assuming the code changes are localized.

source: HN

When Good Correlation is Not Enough

https://hakibenita.com/postgresql-correlation-brin-multi-minmax [hakibenita.com]

2023-07-28 02:39

tags: database development perf sql

Choosing to use a block range index (BRIN) to query a field with high correlation is a no-brainer for the optimizer. The small size of the index and the field’s correlation makes BRIN an ideal choice. However, a recent event taught us that correlation can be misleading. Under some easily reproducible circumstances, a BRIN index can result in significantly slower execution even when the indexed field has very high correlation.
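To make the idea concrete, here is a toy Python model of a min/max block range summary. It is only a sketch and not PostgreSQL's implementation. The excerpt does not say which circumstances cause the slowdown, so the scenario below (a few extreme outliers scattered through otherwise sorted data) is an assumption chosen for illustration, and all names are hypothetical.

```python
def build_summaries(values, rows_per_range):
    """Return one (min, max) pair per consecutive chunk of rows."""
    chunks = (
        values[i:i + rows_per_range]
        for i in range(0, len(values), rows_per_range)
    )
    return [(min(chunk), max(chunk)) for chunk in chunks]


def ranges_to_scan(summaries, lo, hi):
    """Count ranges whose [min, max] overlaps the query interval [lo, hi]."""
    return sum(1 for rmin, rmax in summaries if rmin <= hi and rmax >= lo)


# Perfectly ordered data: each range covers a narrow slice of values.
sorted_values = list(range(10_000))

# Same data, but one row in every tenth range holds an extreme value.
with_outliers = list(sorted_values)
for start in range(0, 10_000, 1_000):
    with_outliers[start] = 1_000_000

clean = build_summaries(sorted_values, rows_per_range=100)
dirty = build_summaries(with_outliers, rows_per_range=100)

print(ranges_to_scan(clean, 5_000, 5_099))  # 1
print(ranges_to_scan(dirty, 5_000, 5_099))  # 6
```

Only 10 of 10,000 rows changed, so the data is still almost perfectly ordered, yet every affected range below the query interval now has a maximum that overlaps it and must be scanned.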

source: HN

Speeding up sort performance in Postgres 15

https://www.citusdata.com/blog/2022/05/19/speeding-up-sort-performance-in-postgres-15/ [www.citusdata.com]

2022-05-20 23:02

tags: database perf sorting sql update

Let’s explore each of the 4 improvements in PostgreSQL 15 that make sort performance go faster:
Change 1: Improvements sorting a single column
Change 2: Reduce memory consumption by using generation memory context
Change 3: Add specialized sort routines for common datatypes
Change 4: Replace polyphase merge algorithm with k-way merge
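For readers unfamiliar with the k-way merge named in Change 4, the general technique can be sketched in Python with a min-heap. This is a textbook illustration, not the PostgreSQL code, and the function name `k_way_merge` is hypothetical.

```python
import heapq


def k_way_merge(runs):
    """Merge already-sorted lists into one sorted list using a min-heap."""
    # Seed the heap with the first element of every non-empty run.
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs) if run]
    heapq.heapify(heap)

    merged = []
    while heap:
        value, run_idx, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        # Refill the heap from the run that just gave up its smallest item.
        if nxt < len(runs[run_idx]):
            heapq.heappush(heap, (runs[run_idx][nxt], run_idx, nxt))
    return merged


print(k_way_merge([[1, 4, 9], [2, 3, 10], [], [5]]))
# [1, 2, 3, 4, 5, 9, 10]
```

The heap never holds more than one entry per run, so each output element costs O(log k) work for k runs. Python's standard library offers the same behaviour lazily as `heapq.merge`.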

source: HN

How to contact Google SRE: Dropping a shell in cloud SQL

https://offensi.com/2020/08/18/how-to-contact-google-sre-dropping-a-shell-in-cloud-sql/ [offensi.com]

2020-08-20 16:29

tags: cloud database exploit programming security sql

Google Cloud SQL is a fully managed relational database service. Customers can deploy a SQL, PostgreSQL or MySQL server which is secured, monitored and updated by Google. More demanding users can easily scale, replicate or configure high-availability. By doing so users can focus on working with the database, instead of dealing with all the previously mentioned complex tasks. Cloud SQL databases are accessible by using the applicable command line utilities or from any application hosted around the world. This write-up covers vulnerabilities that we have discovered in the MySQL versions 5.6 and 5.7 of Cloud SQL.

source: L

Three bugs in the Go MySQL Driver

https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/ [github.blog]

2020-05-21 06:03

tags: bugfix database go networking programming

Adding to this challenge, authzd is deployed to our Kubernetes clusters, where we’ve been experiencing issues with high latencies when opening new TCP connections, something that particularly affects the pooling of connections in the Go MySQL driver. One of the most dangerous lies that programmers tell themselves is that the network is reliable, because, well, most of the time the network is reliable. But when it gets slow or spotty, that’s when things start breaking, and we get to find out the underlying issues in the libraries we take for granted.

Good walkthrough of dealing with some unfriendly bugs.

source: HN

10 Things I Hate About PostgreSQL

https://medium.com/@rbranson/10-things-i-hate-about-postgresql-20dbab8c2791 [medium.com]

2020-05-17 05:50

tags: database development sql

Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, busting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon. While much of this praise is certainly well-deserved, the lack of meaningful dissent left me a bit bothered. No software is perfect, so exactly what are PostgreSQL’s imperfections?

I’ve been hands-on with PostgreSQL in production since 2003 with deployments ranging from small (gigabytes) to modest to very large (~petabyte). My perspective is largely from building and running systems that are at least intended to be continuously available. Needless to say, I have gained first-hand experience with PostgreSQL’s particular idiosyncrasies through some painful production issues over the years.

Elixir and Postgres: A Rarely Mentioned Problem

https://blog.soykaf.com/post/postgresql-elixir-troubles/ [blog.soykaf.com]

2020-02-19 06:02

tags: benchmark database perf sql

Last time, we talked about the magic trick to make your full text searches go fast. This time, I’ll tell you about another performance issue I encountered that probably also affects your performance, at least if you are using Ecto and PostgreSQL.

New In PostgreSQL 12: Generated Columns

https://pgdash.io/blog/postgres-12-generated-columns.html [pgdash.io]

2019-10-02 20:05

tags: database sql update

PostgreSQL 12 comes with a new feature called generated columns. Other popular RDBMSes already support generated columns as “computed columns” or “virtual columns.” With Postgres 12, you can now use it in PostgreSQL as well. Read on to learn more.

source: HN

Postgres Execution Plans - Field Glossary

https://www.pgmustard.com/blog/2019/9/17/postgres-execution-plans-field-glossary [www.pgmustard.com]

2019-09-19 19:34

tags: database perf programming sql

There are lots of guides out there to the basics of execution plans, but a lot are quite scarce on the details - how to interpret particular values, what they really mean, and where the pitfalls are.

We’ve spent a lot of time over the last 18 months learning, clarifying, and downright misinterpreting how each of these fields work — and there’s still further for us to go on that.

But we have come a long way, and I’d like to share the guide that I wish had existed when we started out — a glossary of the most common fields you’ll see on the operations in a query plan, and a detailed description of what each one means.

source: L

Thoughts on Arbitrary Pagination

https://blog.seantheprogrammer.com/thoughts-on-arbitrary-pagination [blog.seantheprogrammer.com]

2019-09-19 03:25

tags: database development programming sql

Pagination is the act of breaking a data set into multiple pages to limit the amount of data that has to be processed and sent by a server at once. We’re going to be changing how pagination works on crates.io, and I wanted to share some musings about the issues with supporting this as a generic abstraction. While I’m going to be talking about some PostgreSQL internals in this article, the general ideas presented apply to any SQL database.
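As a small sketch of the definition above, here are two common ways to fetch a page, written in Python against an in-memory SQLite database as a stand-in for any SQL database. The `items` table and all function names are hypothetical. They are not the crates.io schema, and the excerpt does not say which approach the author prefers.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory table with ids 1..n_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n_rows + 1)],
    )
    return conn


def offset_page(conn, page, per_page):
    """Fetch page `page` (1-based) with LIMIT/OFFSET."""
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()


def keyset_page(conn, after_id, per_page):
    """Fetch the rows that follow `after_id`, the last id already seen."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, per_page),
    ).fetchall()


conn = make_db(10)
print(offset_page(conn, 2, 3))
# [(4, 'item-4'), (5, 'item-5'), (6, 'item-6')]
print(keyset_page(conn, 3, 3) == offset_page(conn, 2, 3))  # True
```

The offset form lets a client jump to an arbitrary page number, while the keyset form needs the last row of the previous page but can use the index on `id` to skip directly to the starting point.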

source: L

ORMs are backwards

https://abe-winter.github.io/2019/09/03/orms-backwards.html [abe-winter.github.io]

2019-09-04 11:35

tags: database development programming sql

I think all ORM users have a journey from ‘there should be a way to’ to ‘this is saving me so much work’ to ‘I have to reach into the vending machine to get my change out’.

source: L

Keeping CALM: when distributed consistency is easy

https://blog.acolyer.org/2019/03/06/keeping-calm-when-distributed-consistency-is-easy/ [blog.acolyer.org]

2019-03-06 22:26

tags: compsci concurrency database development paper perf

When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.

So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.

Also: https://blog.acolyer.org/2019/03/08/a-generalised-solution-to-distributed-consensus/

Postgres full-text search is Good Enough!

http://rachbelaid.com/postgres-full-text-search-is-good-enough/ [rachbelaid.com]

2019-01-18 01:44

tags: database development sql text

When you have to build a web application, you are often asked to add search. The magnifying glass is something that we now add to wireframes without even knowing what we are going to search.

The search has become an important feature and we’ve seen a big increase in the popularity of tools like elasticsearch and SOLR which are both based on lucene. They are great tools but before going down the road of Weapons of Mass Destruction Search, maybe what you need is something a bit lighter which is simply good enough!

source: L

New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures

https://pganalyze.com/blog/postgres11-jit-compilation-auto-prewarm-sql-stored-procedures [pganalyze.com]

2018-10-05 21:02

tags: database jit perf sql

Everyone’s favorite database, PostgreSQL, has a new release coming out soon: Postgres 11

In this post we take a look at some of the new features that are part of the release, and in particular review the things you may need to monitor, or can utilize to increase your application and query performance.

Are we there yet? Are we there yet?

source: HN

Columnstore and B+ tree – are hybrid physical designs important?

https://blog.acolyer.org/2018/09/28/columnstore-and-b-tree-are-hybrid-physical-designs-important/ [blog.acolyer.org]

2018-09-28 17:40

tags: compsci database paper perf storage

It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.

Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.

PostgreSQL 11 and Just In Time Compilation of Queries

https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/ [www.citusdata.com]

2018-09-13 01:29

tags: benchmark beta database jit perf sql

One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!

In our benchmarking, PostgreSQL 11 JIT is an awesome piece of technology and provides up to 29.31% speed improvements, executing TPC-H Q1 at scale factor 10 in 20.5s instead of 29s when using PostgreSQL 10.
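The quoted percentage can be checked against the two timings with a few lines of Python. The function name is hypothetical; the numbers are the ones in the excerpt, which are rounded, so the result only needs to match to two decimal places.

```python
def percent_time_saved(before_s: float, after_s: float) -> float:
    """Reduction in run time, as a percentage of the original run time."""
    return (before_s - after_s) / before_s * 100


print(round(percent_time_saved(29.0, 20.5), 2))  # 29.31
print(round(29.0 / 20.5, 2))                     # 1.41
```

So the 29.31% figure is a reduction in execution time, which corresponds to roughly a 1.41x speed-up.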

source: L

A tale of RPMs

https://medium.com/@claudio.criscione/building-minions-5-a-tale-of-rpms-e65e6646a813 [medium.com]

2018-08-07 03:09

tags: admin database format linux swtools

If you don’t want to read the following, feel free to skip it. The TL;DR is that it’s an awful idea to read the RPM DB directly without using the RPM libs and I eventually gave up. I did manage to access it without executing the RPM command, but then again I’ve just used the RPM libs.

source: grugq

You'd Need an Oracle to Understand These Docs

https://thedailywtf.com/articles/you-d-need-an-oracle-to-understand-these-docs [thedailywtf.com]

2018-07-25 17:55

tags: database development docs

This time, I trace through a few references, discover another footnote which is itself a reference to a white paper, which itself contains a footnote. “If cogs_behavior is set, RM_CONS_BY_UT, RM_CONS_IN_FRMLU_FPT, RM_CALC_FRMLU and COGS_RM_CALC_USAGE may alter data in an unrecoverable fashion.”