Exchange Integrations — Overview

This blog series provides an insight into the mechanics and principles underlying our system. In this post, we aim to offer a clear and comprehensive explanation of how our standard exchange integrations function.

  • December 13, 2021
  • Vlad Cealicu

The first thing a lot of our developers do when they join CCData is to integrate a new exchange. It’s our bread and butter, and it’s the first thing we ever developed as a company: we needed data before we had an API and a website. The blue processes in the diagram below are the services we will cover in the next blog posts.

As you can imagine, it’s the core part of our software that everything else is built upon, and we’ve kept improving it over the years: from bespoke integrations, to integrations with common code, to templated integrations, to sharded integrations. We’ve moved mapping logic from the base layer to a middle layer and now to the edge layers. We’ve developed internal tools to manage exchanges, instrument mapping, integrations, and data recovery. It’s been our most active area of development throughout our company’s history.

Simplified exchange integration diagram — our V2 system

Why is it so hard to get right, and why does it need so many iterations, you might ask? Well, it’s because we have very stringent requirements from our customers and from the regulators. You can’t be the primary data provider of the cryptocurrency industry and have low standards.

There are two things that historically we have valued above everything else: Reliability and Broad Data Sets.

Reliability — even if everything else fails, an integration that is consuming data will continue to consume data:

  • No single point of failure — two mirrored data centers: Azure and OVH
  • Issues with our internal infrastructure/networking should not impact data consumption — local data is all that is needed to run an integration
  • After the initial setup, an exchange integration should have all the information it needs to work in isolation — state sync service that keeps local data in sync with the central cluster (see the sketch after this list)
  • Higher adoption of our API should not impact data consumption — decoupled databases (only using replicas for reading) and Data Distributor / Router
  • Always have access to data as long as the instance is up, with no rate limits — Proxy Swarm
  • Process up to 50k incoming and 150k outgoing messages per second per exchange (trade, order book, funding rate, etc.) — Exchange Sharding
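To make the isolation point above more concrete, here is a minimal sketch of a state-sync loop that periodically copies integration state from a central cluster into a local Redis instance, so the integration can keep running on local data if the central cluster becomes unreachable. The hostnames, key prefix, and sync interval are assumptions made for illustration; they are not our actual schema.

```python
# Hypothetical state-sync sketch, assuming both the central cluster and the
# local instance expose a Redis interface. Key names are illustrative only.
import time
import redis

central = redis.Redis(host="central-cluster.example.internal", port=6379)
local = redis.Redis(host="localhost", port=6379)

STATE_PREFIX = "exchange:state:"   # assumed key prefix for integration state
SYNC_INTERVAL_SECONDS = 5

def sync_state_once() -> None:
    """Copy every state key from the central cluster to the local instance."""
    for key in central.scan_iter(match=STATE_PREFIX + "*"):
        value = central.get(key)
        if value is not None:
            local.set(key, value)

if __name__ == "__main__":
    while True:
        try:
            sync_state_once()
        except redis.exceptions.ConnectionError:
            # Central cluster unavailable: keep running on the last locally
            # synced state until it recovers (the isolation property above).
            pass
        time.sleep(SYNC_INTERVAL_SECONDS)
```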

Broad data sets — as much data as possible:

  • Split across multiple instances per exchange — Exchange Discovery Service (see the sketch after this list)
  • As many data points as the exchanges allow us to have, the more granular the better — Proxy Swarm
  • As much precision as the exchanges allow us to have — Multiple integrations per data type
  • Orchestrated and balanced by our internal team — Exchange Discovery Dashboard
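To illustrate the idea of splitting one exchange across multiple instances, here is a small, hypothetical sketch of deterministic instrument-to-instance assignment. The real Exchange Discovery Service is orchestrated and balanced through the internal dashboard mentioned above; the hash-based mapping below is only an assumption used to make the concept concrete.

```python
# Illustrative only: deterministically split an exchange's instruments
# across N integration instances.
import hashlib

def assign_instance(instrument: str, instance_count: int) -> int:
    """Map an instrument (e.g. 'BTC-USD') to a stable instance index."""
    digest = hashlib.sha256(instrument.encode("utf-8")).hexdigest()
    return int(digest, 16) % instance_count

instruments = ["BTC-USD", "ETH-USD", "XRP-USDT", "SOL-USDC"]
for symbol in instruments:
    print(symbol, "-> instance", assign_instance(symbol, instance_count=3))
```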

The diagram above is not that far off from what we originally had in 2014 when we started; however, it is now a lot better thought out. In the last 7 years, we’ve come to rely a lot more heavily on Redis (it is one of the best pieces of software ever written). In our original design we saved everything in files, but in the time since we’ve realized it’s almost impossible to crash Redis, and it is a lot faster and easier to work with than files.

Big changes since older versions:

Each individual message type generates its own queue. Data comes in from polling and streaming, is deduplicated using a Redis list, and is then pushed into a second Redis list that behaves like a queue, to be consumed by the output service.
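A rough sketch of that per-message-type pipeline, written with redis-py, might look like the following. The key names, the message shape, the bounded dedup window, and the use of LPOS for the membership check are all assumptions for illustration, not our actual implementation.

```python
# Hedged sketch: deduplicate incoming messages against one Redis list and
# enqueue them onto a second list that the output service consumes.
import json
from typing import Optional
import redis

r = redis.Redis(host="localhost", port=6379)

DEDUP_KEY = "exchange:trades:seen"    # recently seen message IDs (assumed name)
QUEUE_KEY = "exchange:trades:queue"   # FIFO queue for the output service (assumed name)
DEDUP_WINDOW = 10_000                 # how many recent IDs to keep for dedup

def ingest(message: dict) -> bool:
    """Enqueue a message unless its ID was already seen; return False on duplicates."""
    message_id = message["id"]
    if r.lpos(DEDUP_KEY, message_id) is not None:
        return False                              # duplicate, drop it
    pipe = r.pipeline()
    pipe.lpush(DEDUP_KEY, message_id)             # remember the ID
    pipe.ltrim(DEDUP_KEY, 0, DEDUP_WINDOW - 1)    # bound the dedup list
    pipe.rpush(QUEUE_KEY, json.dumps(message))    # enqueue for the output service
    pipe.execute()
    return True

def consume_one() -> Optional[dict]:
    """Output-service side: pop the oldest message, blocking briefly if empty."""
    item = r.blpop(QUEUE_KEY, timeout=1)
    return json.loads(item[1]) if item else None
```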

None of the integrations keep any state data in memory; they rely on reading Redis keys to figure out which instruments they are in charge of, which endpoints they need to poll, and so on. To chain requests, we parse the data we need from the first request and attach it to the metadata of the next request.
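As a sketch of this pattern, the snippet below reads a shard’s instruments and endpoints from Redis rather than from process memory, and chains two HTTP requests by carrying parsed data from the first response into the metadata of the second. The key names, endpoint paths, and response fields are hypothetical.

```python
# Illustrative stateless poller: configuration lives in Redis, and chained
# requests carry forward parsed data as metadata. Names are assumptions.
import json
import redis
import requests

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_assignment(shard_id: str) -> dict:
    """Read this shard's configuration from Redis instead of local state."""
    instruments = r.smembers(f"shard:{shard_id}:instruments")
    endpoints = json.loads(r.get(f"shard:{shard_id}:endpoints") or "{}")
    return {"instruments": instruments, "endpoints": endpoints}

def poll_chained(base_url: str, instrument: str) -> dict:
    """Chain two requests: parse what we need from the first response and
    attach it to the metadata of the next request."""
    first = requests.get(f"{base_url}/markets/{instrument}").json()
    metadata = {"cursor": first.get("next_cursor")}   # carried into the next call
    second = requests.get(
        f"{base_url}/trades/{instrument}",
        params={"cursor": metadata["cursor"]},
    ).json()
    return {"metadata": metadata, "trades": second}
```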

Disclaimer: Please note that the content of this blog post was created prior to our company's rebranding from CryptoCompare to CCData.
