
How Data Sync Works

Enterprise-grade, log-based Change Data Capture (CDC) technology reads database transaction logs in real time, compressing change data by up to 90% for efficient cloud replication.

  • Log-based CDC: real-time capture
  • 90% data compression: network efficiency
  • Zero impact on production databases

Log-Based Change Data Capture

Unlike traditional query-based replication, Data Sync reads the database transaction log files directly—capturing every INSERT, UPDATE, and DELETE in real-time without impacting database performance.

Transaction Log Reading

How Transaction Logs Work

1

Database Writes to Log

Every database transaction (INSERT, UPDATE, DELETE) is first written to a transaction log file before being committed to the database.

2

Data Sync Reads the Log

Our CDC agent continuously monitors and reads these log files, parsing transaction records in real-time without querying the production database.

3

Change Events Created

Each transaction is converted into a structured change event with before/after values, timestamp, and transaction metadata.

4

Compressed & Replicated

Changes are compressed (up to 90% reduction) and streamed to cloud destinations in near real-time, preserving transactional consistency.
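The four steps above can be sketched as a minimal data structure. The field names below are illustrative, not Data Sync's actual event schema:

```python
from dataclasses import dataclass, field
from typing import Any, Optional
import time

@dataclass
class ChangeEvent:
    """One parsed transaction-log record (illustrative fields)."""
    operation: str                      # "INSERT", "UPDATE", or "DELETE"
    table: str
    before: Optional[dict[str, Any]]    # None for INSERTs
    after: Optional[dict[str, Any]]     # None for DELETEs
    commit_scn: int                     # log position (e.g. Oracle SCN)
    timestamp: float = field(default_factory=time.time)

# A log record like "SCN: 12345678 | Operation: UPDATE" becomes:
event = ChangeEvent(
    operation="UPDATE",
    table="users",
    before={"status": "inactive"},
    after={"status": "active"},
    commit_scn=12345678,
)
assert event.operation == "UPDATE" and event.after["status"] == "active"
```

Keeping both before and after images is what allows downstream targets to apply UPDATEs and DELETEs correctly and in commit order.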

Source Database (Oracle, SQL Server, SAP HANA)
  Transaction: UPDATE users SET status='active'
    ↓
Transaction Log File (archived transaction records)
  SCN: 12345678 | LSN: 0x00000034 | Operation: UPDATE
    ↓
Data Sync CDC Agent (reads, compresses & streams)

Zero Database Impact

No queries run against production databases—log reading has minimal CPU/memory overhead.

Real-Time Capture

Sub-second latency from database commit to change event delivery to the cloud.

Complete Data Fidelity

Captures all operations including DELETEs and schema changes, with exact ordering preserved.

90% Data Compression Technology

Our intelligent compression algorithms reduce network bandwidth requirements by up to 90%, enabling cost-effective replication of large enterprise datasets to the cloud.

Compression Pipeline

Raw transaction data: 100 GB uncompressed → 10 GB compressed (90% reduction)

  • 90% bandwidth saved
  • 10x faster transfer
  • 85% cost reduction

Multi-Layer Compression Strategy

Data Sync employs multiple compression techniques optimized for database change data:

Columnar Compression

Similar values in column batches are compressed together using dictionary encoding and run-length encoding.
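A toy sketch of those two techniques, assuming a batch of values from one column (the code is illustrative, not Data Sync's implementation):

```python
def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    table, codes = {}, []
    for v in values:
        codes.append(table.setdefault(v, len(table)))
    return list(table), codes

def run_length_encode(codes):
    """Collapse runs of identical codes into [code, run_length] pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

# A column batch from many change events often repeats the same few values.
statuses = ["active"] * 4 + ["inactive"] * 2 + ["active"]
table, codes = dictionary_encode(statuses)
runs = run_length_encode(codes)
assert table == ["active", "inactive"]
assert runs == [[0, 4], [1, 2], [0, 1]]   # 7 strings shrink to 3 pairs
```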

Delta Encoding

Only the changed columns are transmitted, not entire rows, drastically reducing payload size for UPDATEs.
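Delta encoding for an UPDATE can be sketched as a column-wise diff of the before and after row images (a simplified illustration):

```python
def column_delta(before: dict, after: dict) -> dict:
    """Keep only the columns whose value actually changed."""
    return {col: val for col, val in after.items() if before.get(col) != val}

before = {"id": 42, "status": "inactive", "email": "a@example.com", "score": 17}
after  = {"id": 42, "status": "active",   "email": "a@example.com", "score": 17}

delta = column_delta(before, after)
assert delta == {"status": "active"}   # one column transmitted instead of four
```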

LZ4 Stream Compression

A high-speed algorithm (LZ4, Snappy, or gzip) compresses data streams with minimal CPU overhead and sub-millisecond latency.
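The effect is easy to demonstrate on repetitive change-event payloads. This sketch uses Python's standard-library zlib (gzip-style DEFLATE) as a stand-in for LZ4, with the lowest level favoring speed:

```python
import zlib

# Highly repetitive change-event payload, typical of transaction-log batches.
payload = b'{"op":"UPDATE","table":"users","after":{"status":"active"}}' * 1000

compressed = zlib.compress(payload, level=1)  # level 1 trades ratio for speed
ratio = 1 - len(compressed) / len(payload)

assert zlib.decompress(compressed) == payload   # lossless round trip
assert ratio > 0.9                              # >90% reduction on this data
```

Real-world ratios depend on the data; repetitive structured change records are close to the best case.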

Schema-Aware Optimization

Data types are encoded efficiently—integers use variable-length encoding, strings use dictionary compression.
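Variable-length integer encoding can be sketched as LEB128-style encoding, where small values need fewer bytes (an illustration of the idea, not Data Sync's wire format):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, low bits first.
    The high bit of each byte signals whether more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

assert len(encode_varint(7)) == 1       # vs. 8 bytes for a fixed-width int64
assert len(encode_varint(300)) == 2
assert len(encode_varint(2 ** 40)) == 6
```

Since most IDs, counters, and small numeric columns fit in one or two bytes, this compounds with the stream compression above.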

Result: A typical 1TB SAP HANA database replicates to the cloud using only ~100GB of network bandwidth per full sync.

Data Flow & Type Mapping

Automated schema discovery and intelligent data type mapping ensure seamless replication from enterprise databases to modern cloud platforms.

Source: SAP HANA → CDC Agent: Data Sync (parse log, map types, compress 90%, stream) → Target: Snowflake

  VARCHAR(100)  → VARCHAR(100)
  DECIMAL(18,2) → NUMBER(18,2)
  TIMESTAMP     → TIMESTAMP_NTZ
  BLOB          → BINARY

Automated Type Mapping Examples

Source Database | Source Type  | Target Platform | Target Type
SAP HANA        | VARCHAR(255) | Snowflake       | VARCHAR(255)
Oracle          | NUMBER(18,2) | BigQuery        | NUMERIC(18,2)
SQL Server      | DATETIME2    | Databricks      | TIMESTAMP
PostgreSQL      | JSONB        | Snowflake       | VARIANT
MySQL           | TINYINT(1)   | BigQuery        | BOOL
SAP HANA        | BLOB         | Databricks      | BINARY

Type mappings are detected and configured automatically during initial schema discovery; custom mappings can be defined for special cases.
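A lookup with custom overrides could be sketched as follows. The table below is a subset of the examples above; real discovery also handles precision, scale, and nullability:

```python
# Keys: (source_platform, source_type). Values: per-target type names.
TYPE_MAP = {
    ("SAP HANA", "VARCHAR(255)"):  {"Snowflake": "VARCHAR(255)"},
    ("Oracle", "NUMBER(18,2)"):    {"BigQuery": "NUMERIC(18,2)"},
    ("SQL Server", "DATETIME2"):   {"Databricks": "TIMESTAMP"},
    ("PostgreSQL", "JSONB"):       {"Snowflake": "VARIANT"},
    ("MySQL", "TINYINT(1)"):       {"BigQuery": "BOOL"},
    ("SAP HANA", "BLOB"):          {"Databricks": "BINARY"},
}

def map_type(source_db, source_type, target, overrides=None):
    """Resolve a target column type; custom mappings win over defaults."""
    if overrides and (source_db, source_type) in overrides:
        return overrides[(source_db, source_type)]
    return TYPE_MAP[(source_db, source_type)][target]

assert map_type("PostgreSQL", "JSONB", "Snowflake") == "VARIANT"
assert map_type("MySQL", "TINYINT(1)", "BigQuery",
                overrides={("MySQL", "TINYINT(1)"): "INT64"}) == "INT64"
```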

Technical Architecture

Enterprise-grade architecture designed for reliability, scalability, and security at every layer.

CDC Agent Layer

  • Deployed on-premises or in private cloud near source databases
  • Reads transaction logs using native database APIs
  • Minimal resource footprint (2-4 CPU cores, 4-8GB RAM)
  • High availability with automatic failover

Stream Processing

  • In-memory event buffering and batching
  • Real-time data transformation and enrichment
  • Schema evolution and type conversion
  • Guaranteed exactly-once delivery semantics
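The buffering and batching step can be sketched as a size-triggered micro-batcher (thresholds and the delivery stub are illustrative, not Data Sync's actual settings):

```python
class EventBatcher:
    """Buffer change events in memory and flush them in fixed-size batches."""

    def __init__(self, max_batch=3):
        self.max_batch = max_batch
        self.buffer = []
        self.flushed = []   # stands in for delivery to the cloud target

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        """Hand off the buffered events as one batch and reset the buffer."""
        if self.buffer:
            self.flushed.append(list(self.buffer))
            self.buffer.clear()

b = EventBatcher(max_batch=3)
for scn in range(7):
    b.add({"scn": scn})
b.flush()  # drain the tail batch
assert [len(batch) for batch in b.flushed] == [3, 3, 1]
```

A production batcher would also flush on a time threshold so low-traffic tables still see sub-second latency.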

Cloud Delivery

  • Native connectors for Snowflake, Databricks, BigQuery
  • Optimized bulk loading with COPY/MERGE operations
  • Automatic retry with exponential backoff
  • End-to-end encryption (TLS 1.3)

Security & Compliance

Built with enterprise security standards from the ground up. All data is encrypted in transit and at rest, with comprehensive audit logging.

SOC 2 Type II · GDPR Compliant · HIPAA Ready · ISO 27001

End-to-End Encryption

TLS 1.3 for all data in transit

Role-Based Access

Granular permissions & SSO

Audit Logging

Complete activity tracking

Private Networking

VPC peering & PrivateLink

Ready to Modernize Your Data Infrastructure?

Experience log-based CDC with 90% compression. Get your enterprise data flowing to the cloud in minutes, not months.

Trusted by enterprise data teams worldwide

No Production Impact
Sub-Second Latency
Enterprise Support