LinkedIn's Brooklin diagram

LinkedIn is open sourcing its tool for real-time data streaming at scale. Brooklin is intended for streaming data across multiple data storage and messaging systems.

Brooklin has been in production at LinkedIn since 2016. It is the company’s primary data streaming solution, running thousands of data streams and carrying 2 trillion messages every day.

“Supporting a rapidly increasing variety of data storage and messaging systems has proven to be an equally critical aspect of any viable solution. We built Brooklin to address our growing needs for a system that is capable of scaling both in terms of data volume and systems variance,” Celia Kung, engineering manager at LinkedIn, wrote in a post.

According to LinkedIn, there are two major use cases for the service: streaming bridge and change data capture.

First, it can be used as a bridge to stream data across different environments. For example, it can move data between cloud services, clusters within a data center, or across data centers.
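The bridge idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Brooklin's actual API: the names `read_source` and `bridge`, and the in-memory lists standing in for two clusters, are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical types and names for illustration; none of these are Brooklin APIs.
Record = Dict[str, object]


def read_source(records: Iterable[Record]) -> Iterator[Record]:
    """Stand-in for a source connector: yields records one at a time."""
    for record in records:
        yield record


def bridge(source: Iterator[Record], write: Callable[[Record], None]) -> int:
    """Forward every record from the source to the destination writer.

    Returns the number of records moved.
    """
    moved = 0
    for record in source:
        write(record)
        moved += 1
    return moved


# Two in-memory lists play the role of a source and a destination cluster.
source_cluster: List[Record] = [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
destination_cluster: List[Record] = []

moved = bridge(read_source(source_cluster), destination_cluster.append)
```

Keeping the reader and the writer as separate pieces means the same loop could, in principle, sit between any pair of systems, which is the property a bridge between environments needs.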

Second, it can be used to stream database updates in real time, a pattern known as change data capture. This is useful for LinkedIn because it has several applications that need to know when a new job is posted, a professional connection is made, or someone’s profile is updated. By using Brooklin to stream these updates, LinkedIn doesn’t have to repeatedly query the online database to detect them.
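A rough sketch of the change data capture pattern, again in Python, shows why polling becomes unnecessary. The `ChangeStream` class, the event fields, and the table names below are hypothetical and chosen only for illustration; they do not reflect how Brooklin is implemented.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Hypothetical event shape for illustration: table name, operation, row id.
ChangeEvent = Dict[str, str]
Handler = Callable[[ChangeEvent], None]


class ChangeStream:
    """Toy change stream: pushes each change to the applications subscribed to its table."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, table: str, handler: Handler) -> None:
        """Register a handler that is called for every change to `table`."""
        self._subscribers[table].append(handler)

    def publish(self, event: ChangeEvent) -> None:
        """Deliver one change event to every handler subscribed to its table."""
        for handler in self._subscribers[event["table"]]:
            handler(event)


stream = ChangeStream()
job_alerts: List[ChangeEvent] = []

# An application interested only in new job postings.
stream.subscribe("jobs", job_alerts.append)

# Changes arrive as they happen; the application never queries the database.
stream.publish({"table": "jobs", "op": "insert", "id": "42"})
stream.publish({"table": "profiles", "op": "update", "id": "7"})
```

After these two events, `job_alerts` holds only the job posting, because the profile update goes to no subscriber in this example.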

Going forward, LinkedIn will continue building connectors to support additional data sources and destinations. It will also add optimizations to the service, including auto-scaling based on traffic, skipping decompression and recompression of messages in mirroring scenarios, and further read and write optimizations.