PyPy's new JSON parser
> In the last year or two I have worked on and off on making PyPy’s JSON faster, particularly when parsing large JSON files. In this post I am going to document those techniques and measure their performance impact.
Python is not context free
> The interesting thing about Python’s syntax is, of course, its use of indentation to indicate program structure.
A nice review of interaction between lexing and parsing.
Beginner Problems With TCP & The socket Module in Python
> Your operating system will deceive you and re-assemble the string you sock.recv(n) differently from the ones you sock.send(data). But here is the deceptive part. It will work sometimes, but not always. These bugs will be difficult to chase. If you have two programs communicating over TCP via the loopback device in your operating system (the virtual network device with IP 127.0.0.1), then the data does not leave your RAM, and packets are never fragmented to fit into the maximum size of an Ethernet frame or 802.11 WLAN transmission. The data arrives immediately because it’s already there, and the other side gets to read via sock.recv(n) exactly the bytestring you sent over sock.send(data). If you connect to localhost via IPv6, the maximum packet size is 64 kB, and all the packets are already there to be reassembled into a bytestream immediately! But when you try to run the same code over the real Internet, with lag and packet loss, or when you are unlucky with the multitasking/scheduling of your OS, you will either get more data than you expected, leftover data from the last sock.send(data), or incomplete data.
Not strictly a python problem, either.
Looking inside the box
> This blog post talks about reverse engineering the Dropbox client, breaking its obfuscation mechanisms, de-compiling it to Python code as well as modifying the client in order to use debug features which are normally hidden from view. If you’re just interested in relevant code and notes please scroll to the end. As of this writing it is up to date with the current versions of Dropbox which are based on the CPython 3.6 interpreter.
Python Project Tooling explained
> For this reason I’ve decided to create a post that lists the most important tools, when and why they are used and what problem they solve. I will try to explain with simple words how you should approach each one of these tools. If a tool is here, it means that, as a Python programmer, you’re supposed to at least know its existence. I will list only tools that can be applied to any project or workflow and that you should consider every time you start a new project. This doesn’t mean you always have to use each one of them on every single project. Too much tooling can be easily be an overkill and become hard to maintain in some cases.
Go Concurrency from the Ground Up
> Sometimes the best way to learn something is to build it. This guide will walk you through how to reproduce Go’s concurrency features in another programming language.
PyPy v7.1 released
> This release, coming fast on the heels of 7.0 in February, finally merges the internal refactoring of unicode representation as UTF-8. Removing the conversions from strings to unicode internally lead to a nice speed bump.
Python Decorators: Syntactic Artificial Sweetener
> Python has a feature called function decorators. With a little bit of syntax, the behavior of a function or class can be modified in useful ways. Python comes with a few decorators, but most of the useful ones are found in third-party libraries.
> The problem is that the Python language reference doesn’t parse an expression after @. It matches a very specific pattern that just so happens to look like a Python expression. It’s not syntactic sugar, it’s syntactic artificial sweetener!
Even a Feature That You Do Not Use Can Bite You
In this case, an obscure bit of python syntax. Not quite what I predicted, but maybe that’s the point.
> The reason is that it is actually valid Python code that uses variable annotations, which is a feature introduced in Python 3.6. In the previous versions of Python, the code would have raised SyntaxError.
Oneliner-izer: An Exercise in Constrained Coding
> We’ll describe the ideas and implementation behind Oneliner-izer, a “”compiler“” which can convert most Python 2 programs into one line of code. As we discuss how to construct each language feature within this unorthodox constraint, we’ll explore the boundaries of what Python permits and encounter some gems of functional programming – lambda calculus, continuations, and the Y combinator.
PyPy for low-latency systems
> Recently I have merged the gc-disable branch, introducing a couple of features which are useful when you need to respond to certain events with the lowest possible latency.
Removing a recursion in Python
> A recent question on Stack Overflow got me thinking about how to turn a recursive algorithm into an iterative one, and it turns out that Python is a pretty decent language for this.
> Of course, the technique that I’m going to show you is not necessarily “Pythonic”. There are probably more Pythonic solutions using generators and so on. What I’d like to show here is that to remove this sort of recursion, you can do so by re-organizing the code using a series of small, careful refactorings until the program is in a form where removing the recursion is easy. Let’s see first how to get the program into that form.
And part 2: https://ericlippert.com/2018/12/17/removing-a-recursion-in-python-part-2/
Real time numbers recognition (MNIST) on an iPhone with CoreML from A to Z
> Learn how to build and train a deep learning network to recognize numbers (MNIST),how to convert it in the CoreML format to then deploy it on your iPhoneX and make it recognize numbers in realtime!
Remote Code Execution on a Facebook server
> If we were able to forge our own session that contains arbitrary pickle content, we could execute commands on the system. However, the SECRET_KEY that is used by Django for signing session cookies is not available in the stacktrace. However, the SENTRY_OPTIONS list contains a key named system.secret-key, that is not snipped. Quoting the Sentry documentation, system.secret-key is “a secret key used for session signing. If this becomes compromised it’s important to regenerate it as otherwise its much easier to hijack user sessions.“; wow, it looks like it’s a sort of Django SECRET-KEY override!
Painting by Prime Number
> A prime portrait is a prime number formatted as a matrix with X digits per line. When we select a color for each digit, we can generate an image.
> Instead of doing this for many prime numbers and color schemes until you find something that resembles a known image, I have turned the process around. I have taken iconic images, such as the Mona Lisa and Starry Night, and converted them to images with only 10 colors. I assigned a digit to each color. Then I generated many similar images with a little bit of ‘noise’ added. The noise changed the colors in the images slightly, and thus the digits. If the digits in the image formed a prime number, I found a prime portrait!
Copy-on-write friendly Python garbage collection
> Enabling GC could alleviate this problem and slow down the memory growth, but undesired Copy-on-write (COW) would still increase the overall memory footprint. So we decided to see if we could make Python GC work without COW, and hence, the memory overhead.
> Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters. Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.
Using Rust in Mercurial
> This page describes the plan and status for leveraging the Rust programming language in Mercurial.
Quite a bit more ambitious than rewriting a few C extensions, with accompanying difficulties.
3x faster Flask apps - Quart as a upgrade to Flask
> Python has evolved since Flask was first released around 8 years ago, particularly with the introduction of asyncio. Asyncio has allowed for the development of libraries such as uvloop and asyncpg that are reported (here, and here) to improve performance far beyond what was previously possible. Sadly Flask is not easily combined with asyncio or these libraries. However the Flask-API can be used with asyncio via the Quart framework.