
Agoda Unifies Data Pipelines with Apache Spark to Achieve 95.6% Uptime

These articles are AI-generated summaries. Please check the original sources for full details.

Agoda Unified Data Pipelines

Agoda recently consolidated multiple independent financial data pipelines into a centralized Apache Spark-based platform, improving data consistency and achieving 95.6% uptime. The Financial Unified Data Pipeline (FINUDP) processes millions of daily booking transactions, providing hourly updates to downstream teams.
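To put the 95.6% uptime figure in perspective, the remaining 4.4% is the share of time the pipeline was unavailable. This is a minimal sketch of that arithmetic. It assumes a 30-day month and treats uptime as a simple fraction of wall-clock time. The source does not say how Agoda measures uptime or over what window.

```python
def downtime_hours(uptime_pct: float, period_hours: float) -> float:
    """Hours of unavailability implied by an uptime percentage over a period."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    return (1.0 - uptime_pct / 100.0) * period_hours

# Assumption: a 30-day month (720 hours); the actual measurement window is not stated.
month_hours = 30 * 24
print(f"{downtime_hours(95.6, month_hours):.2f} hours")  # 31.68 hours
```

Under these assumptions, 95.6% uptime corresponds to roughly 32 hours of unavailability per month.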

The move addresses a common enterprise problem: siloed data pipelines produce inconsistent metrics and can introduce errors into financial reporting. Without a unified system, such discrepancies can undermine critical business decisions and regulatory compliance, and reconciling them costs organizations significant time and resources.
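The kind of discrepancy that siloed pipelines produce can be illustrated with a simple reconciliation check. This is a hypothetical sketch, not Agoda's implementation. Two pipelines each report an amount per booking ID; the IDs and amounts below are made up. The function then lists IDs that appear on only one side, plus IDs whose amounts differ by more than a tolerance.

```python
from decimal import Decimal

def reconcile(source_a: dict[str, Decimal], source_b: dict[str, Decimal],
              tolerance: Decimal = Decimal("0.01")) -> dict[str, list[str]]:
    """Compare per-booking amounts reported by two independent pipelines."""
    only_a = sorted(source_a.keys() - source_b.keys())
    only_b = sorted(source_b.keys() - source_a.keys())
    mismatched = sorted(
        key for key in source_a.keys() & source_b.keys()
        if abs(source_a[key] - source_b[key]) > tolerance
    )
    return {"only_in_a": only_a, "only_in_b": only_b, "mismatched": mismatched}

# Hypothetical outputs of two siloed pipelines for the same bookings
finance_view = {"B1": Decimal("120.00"), "B2": Decimal("75.50"), "B3": Decimal("40.00")}
reporting_view = {"B1": Decimal("120.00"), "B2": Decimal("75.00"), "B4": Decimal("10.00")}

print(reconcile(finance_view, reporting_view))
# {'only_in_a': ['B3'], 'only_in_b': ['B4'], 'mismatched': ['B2']}
```

`Decimal` is used instead of `float` so that monetary amounts compare exactly, without binary rounding noise.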

Key Insights

  • In 2023, 64% of organizations cited poor data quality as their biggest challenge.
  • According to Gartner, data contracts define schema and quality expectations between data producers and consumers.
  • Apache Spark is widely used for large-scale data processing, including at Netflix; Databricks, founded by Spark's original creators, builds its platform on it.
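A data contract can be as simple as a declared set of fields, types, and quality rules that producers and consumers agree on. Below is a minimal, framework-agnostic sketch in plain Python. The `DataContract` and `FieldRule` classes and the booking fields are hypothetical. They are not taken from FINUDP or from any specific data-contract tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class FieldRule:
    dtype: type                                     # expected Python type of the value
    nullable: bool = False                          # whether None is allowed
    check: Optional[Callable[[Any], bool]] = None   # optional extra quality rule

@dataclass
class DataContract:
    name: str
    fields: dict[str, FieldRule] = field(default_factory=dict)

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable violations for one record (empty list if valid)."""
        problems = []
        for fname, rule in self.fields.items():
            if fname not in record:
                problems.append(f"{fname}: missing")
                continue
            value = record[fname]
            if value is None:
                if not rule.nullable:
                    problems.append(f"{fname}: null not allowed")
                continue
            if not isinstance(value, rule.dtype):
                problems.append(f"{fname}: expected {rule.dtype.__name__}")
            elif rule.check is not None and not rule.check(value):
                problems.append(f"{fname}: failed quality check")
        return problems

# Hypothetical contract for booking records
bookings_contract = DataContract(
    name="bookings",
    fields={
        "booking_id": FieldRule(str),
        "amount": FieldRule(float, check=lambda v: v >= 0),
        "currency": FieldRule(str, check=lambda v: len(v) == 3),
    },
)

print(bookings_contract.violations({"booking_id": "B1", "amount": 99.0, "currency": "USD"}))
# []
print(bookings_contract.violations({"booking_id": "B2", "amount": -5.0}))
# ['amount: failed quality check', 'currency: missing']
```

Keeping the contract as data, rather than scattering checks through pipeline code, lets producers and consumers review the same definition.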

Working Example

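# Example of a basic data validation check in PySpark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

def validate_data(df, column_name, min_value, max_value):
    """
    Return only the rows whose value in column_name lies within
    [min_value, max_value] (inclusive). Rows outside the range,
    or with a null value, are dropped.
    """
    return df.filter(col(column_name).between(min_value, max_value))

spark = SparkSession.builder.appName("validation-example").getOrCreate()

# Illustrative 'sales_df' with an 'amount' column
sales_df = spark.createDataFrame([(1, 250.0), (2, -10.0), (3, 1500.0)], ["id", "amount"])
validated_df = validate_data(sales_df, "amount", 0, 1000)
validated_df.show()  # only the row with id 1 passes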

Practical Applications

  • Financial Institutions: Implementing a unified data pipeline for accurate regulatory reporting and risk management.
  • Pitfall: Over-reliance on automated validations without data contracts can lead to undetected schema drift and data quality issues.
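Schema drift can be caught by comparing the schema a pipeline actually receives against the schema its contract declares. This is a hypothetical sketch in plain Python. Each schema is represented as a mapping from column name to type name. In PySpark, the observed side could be built with `dict(df.dtypes)`, since `DataFrame.dtypes` returns (column, type-string) pairs. The column names and the "hold the load" reaction below are illustrative only.

```python
def schema_drift(expected: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Report columns added, removed, or retyped relative to the expected schema."""
    added = sorted(observed.keys() - expected.keys())
    removed = sorted(expected.keys() - observed.keys())
    retyped = sorted(
        f"{name}: {expected[name]} -> {observed[name]}"
        for name in expected.keys() & observed.keys()
        if expected[name] != observed[name]
    )
    return {"added": added, "removed": removed, "retyped": retyped}

# Hypothetical contract schema vs. what an upstream producer now sends
expected = {"booking_id": "string", "amount": "double", "currency": "string"}
observed = {"booking_id": "string", "amount": "string", "fx_rate": "double"}

drift = schema_drift(expected, observed)
print(drift)
# {'added': ['fx_rate'], 'removed': ['currency'], 'retyped': ['amount: double -> string']}
if any(drift.values()):
    print("Schema drift detected; holding the load for review")
```

A range check like the one in the working example would not notice that `currency` disappeared. An explicit schema comparison against the contract does.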
