Virtu CEO Doug Cifu Explains the Future of HFT (Podcast)
When the GameStop and Robinhood story exploded at the end of January, suddenly everyone took an interest in market structure, and things like payment for order flow, and the role that high-frequency trading shops play in enabling free retail trading. This of course gave rise to lots of conspiracy theories about ways retail traders are taken advantage of. On the new Odd Lots, we speak with Doug Cifu, the CEO of Virtu, which is one of the largest HFT shops in the country, to get his perspective on how this part of the market really works.
Hour long, pretty thorough.
Cameras and Lenses
Cameras and the lenses inside them may seem a little mystifying. In this blog post I’d like to explain not only how they work, but also how adjusting a few tunable parameters can produce fairly different results:
This is amazing work.
This Goes to Eleven - Decimating Array.Sort with AVX2
Let’s get in the ring and show what AVX/AVX2 intrinsics can really do for a non-trivial problem, and even discuss potential improvements that future CoreCLR versions could bring to the table.
Everyone needs to sort arrays, once in a while, and many algorithms we take for granted rely on doing so. We think of it as a solved problem and that nothing can be further done about it in 2020, except for waiting for newer, marginally faster machines to pop-up. However, that is not the case, and while I’m not the first to have thoughts about it; or the best at implementing it, if you join me in this rather long journey, we’ll end up with a replacement function for Array.Sort, written in pure C# that outperforms CoreCLR’s C++2 code by a factor north of 10x on most modern Intel CPUs, and north of 11x on my laptop. Sounds interesting? If so, down the rabbit hole we go…
Very well done.
Twelve Million Phones, One Dataset, Zero Privacy
Every minute of every day, everywhere on the planet, dozens of companies — largely unregulated, little scrutinized — are logging the movements of tens of millions of people with mobile phones and storing the information in gigantic data files. The Times Privacy Project obtained one such file, by far the largest and most sensitive ever to be reviewed by journalists. It holds more than 50 billion location pings from the phones of more than 12 million Americans as they moved through several major cities, including Washington, New York, San Francisco and Los Angeles.
Each piece of information in this file represents the precise location of a single smartphone over a period of several months in 2016 and 2017. The data was provided to Times Opinion by sources who asked to remain anonymous because they were not authorized to share it and could face severe penalties for doing so. The sources of the information said they had grown alarmed about how it might be abused and urgently wanted to inform the public and lawmakers.
Hacking GitHub with Unicode's dotless 'i'.
GitHub’s forgot password feature could be compromised because the system lowercased the provided email address and compared it to the email address stored in the user database. If there was a match, GitHub would send the reset password link to the email address provided by the attacker- which was technically speaking, not the same email address.
This is beautiful.
Meet the ZedRipper – a 16-core, 83 MHz Z80 powerhouse as portable as it is impractical. The ZedRipper is my latest attempt to build a fun ‘project’ machine, with a couple of goals in mind:
Text Rendering Hates You
Rendering text, how hard could it be? As it turns out, incredibly hard! To my knowledge, literally no system renders text “perfectly”. It’s all best-effort, although some efforts are more important than others.
I lost it at multicolored ligatures.
Binary symbolic execution with KLEE-Native
KLEE is a symbolic execution tool that intelligently produces high-coverage test cases by emulating LLVM bitcode in a custom runtime environment. Yet, unlike simpler fuzzers, it’s not a go-to tool for automated bug discovery. Despite constant improvements by the academic community, KLEE remains difficult for bug hunters to adopt. We’re working to bridge this gap!
My internship produced KLEE-Native; a version of KLEE that can concretely and symbolically execute binaries, model heap memory, reproduce CVEs, and accurately classify different heap bugs. The project is now positioned to explore applications made possible by KLEE-Native’s unique approaches to symbolic execution. We will also be looking into potential execution time speed-ups from different lifting strategies. As with all articles on symbolic execution, KLEE is both the problem and the solution.
A very deep dive into iOS Exploit chains found in the wild
Earlier this year Google’s Threat Analysis Group (TAG) discovered a small collection of hacked websites. The hacked sites were being used in indiscriminate watering hole attacks against their visitors, using iPhone 0-day.
There was no target discrimination; simply visiting the hacked site was enough for the exploit server to attack your device, and if it was successful, install a monitoring implant. We estimate that these sites receive thousands of visitors per week.
TAG was able to collect five separate, complete and unique iPhone exploit chains, covering almost every version from iOS 10 through to the latest version of iOS 12. This indicated a group making a sustained effort to hack the users of iPhones in certain communities over a period of at least two years.
I’ll investigate what I assess to be the root causes of the vulnerabilities and discuss some insights we can gain into Apple’s software development lifecycle. The root causes I highlight here are not novel and are often overlooked: we’ll see cases of code which seems to have never worked, code that likely skipped QA or likely had little testing or review before being shipped to users.
In Memoriam: J. C. R. Licklider
Two papers. Man-Computer Symbiosis and The Computer as a Communication Device.
The first argues for interactive systems. The computer can’t be an extension of our mind if it’s not responsive.
The second is a vision for networked communications. It sounds a lot like today, but more optimistic. Where did we go wrong?
Let’s talk about files! Most developers seem to think that files are easy.
In this talk, we’re going to look at how file systems differ from each other and other issues we might encounter when writing to files. We’re going to look at the file “stack”, starting at the top with the file API, moving down to the filesystem, and then moving down to disk.
Unraveling the JPEG
JPEG images are everywhere in our digital lives, but behind the veil of familiarity lie algorithms that remove details that are imperceptible to the human eye. This produces the highest visual quality with the smallest file size—but what does that look like? Let’s see what our eyes can’t see!
This article is about how to decode a JPEG image. In other words, it’s about what it takes to convert the compressed data stored on your computer to the image that appears on the screen. It’s worth learning about not just because it’s important to understand the technology we all use everyday, but also because, as we unravel the layers of compression, we learn a bit about perception and vision, and about what details our eyes are most sensitive to. It’s also just a lot of fun to play with images this way.
Reverse-engineering Broadcom wireless chipsets
In this blogpost I provided an account of various activities during my 6 months as an intern at Quarkslab, my project involved understanding the Linux kernel drivers, analyzing Broadcom firmware, reproducing publicly known vulnerabilities, working on an emulator to run portions of firmware, fuzzing and finding 5 vulnerabilities (CVE-2019-8564, CVE-2019-9500, CVE-2019-9501, CVE-2019-9502, CVE-2019-9503). Two of those vulnerabilities are present both in the Linux kernel and firmware of affected Broadcom chips. The most common exploitation scenario leads to a remote denial of service. Although it is technically challenging to achieve, exploitation for remote code execution should not be discarded as the worst case scenario.
Don’t miss the disclosure timeline at the end.
Iconic consoles of the IBM System/360 mainframes, 55 years old
The IBM System/360 was a groundbreaking family of mainframe computers announced on April 7, 1964. Designing the System/360 was an extremely risky “bet-the-company” project for IBM, costing over $5 billion. Although the project ran into severe problems, especially with the software, it was a huge success, one of the top three business accomplishments of all time. System/360 set the direction of the computer industry for decades and popularized features such as the byte, 32-bit words, microcode, and standardized interfaces. The S/360 architecture was so successful that it is still supported by IBM’s latest z/Architecture mainframes, 55 years later.
The lower part of the Model 30 console was used for operator intervention. Note the binary-to-hexadecimal conversion chart below the hexadecimal dials.
While we’re looking: http://www.righto.com/2019/04/a-look-at-ibm-s360-core-memory-in-1960s.html
What has your microcode done for you lately?
Did you ever wonder what is inside those microcode updates that get silently applied to your CPU via Windows update, BIOS upgrades, and various microcode packages on Linux? Well, you are in the wrong place, because this blog post won’t answer that question (you might like this though).
In fact, the overwhelming majority of this this post is about the performance of scattered writes, and not very much at all about the details of CPU microcode. Where the microcode comes in, and what might make this more interesting than usual, is that performance on a purely CPU-bound benchmark can vary dramatically depending on microcode version. In particular, we will show that the most recent Intel microcode version can significantly slow down a store heavy workload when some stores hit in the L1 data cache, and some miss.
Reverse emulating the NES!
CGA in 1024 Colors - a New Mode: the Illustrated Guide
One of our “hey, this hardware shouldn’t be doing that!“-moments was extending the CGA’s color palette by a cool order of magnitude or two. How’d we pull that off? - reenigne has already posted an excellent technical article answering that very question. To complement his writeup, I’ll take a bit of a different approach – here’s my ‘pictorial’ take on how we arrived at this:
The colors between the colors.
Okay, so you’re a CS graduate and you did a hardware course as part of your degree, but perhaps that was a few years ago now and you haven’t really kept up with the details of processor designs since then. In particular, you might not be aware of some key topics that developed rapidly in recent times...
pipelining (superscalar, OOO, VLIW, branch prediction, predication)
multi-core and simultaneous multi-threading (SMT, hyper-threading)
SIMD vector instructions (MMX/SSE/AVX, AltiVec, NEON)
caches and the memory hierarchy
Fear not! This article will get you up to speed fast. In no time, you’ll be discussing the finer points of in-order vs out-of-order, hyper-threading, multi-core and cache organization like a pro. But be prepared – this article is brief and to-the-point.
I would say all of that is accurate except the brief part. It’s quite long, but very dense. Excellent resource.
A Matter of Perspective
What happens if you take the shoreline of a lake, cut it, and unfurl it?
The once-closed shoreline of the lake now becomes linear, providing a new perspective on a familiar feature. Warm up your scrolling finger, because here’s what happened when I linearized Lake Michigan: