Google BigQuery Integrates SQL-Native Managed Inference for Hugging Face Models

BigQuery’s SQL-Native Managed Inference for Hugging Face Models

Google has introduced a significant update to BigQuery that lets data teams deploy and run Hugging Face models using plain SQL, without separately managing Kubernetes or Vertex AI. The capability adds automated resource governance and secure identity-based execution, making it easier to integrate machine learning into existing data workflows.

Why This Matters

SQL-native managed inference for Hugging Face models addresses a long-standing problem for data teams, who previously had to manage complex ML infrastructure, including Kubernetes clusters and multiple tools, just to serve a model. That operational overhead put AI capabilities out of reach for many teams. By collapsing the ML lifecycle into a unified SQL interface, the feature reduces the friction and cost of deploying and running open-source models, potentially saving thousands of dollars in operational costs.

Key Insights

180,000+ Hugging Face models are now available for SQL-native managed inference in BigQuery, offering a wide range of options for data teams.
Automated resource governance via endpoint_idle_ttl releases an endpoint after it has sat idle for a configured period, which keeps resource use and cost in check.
The feature supports customization for production use cases, including setting machine types, replica counts, and endpoint idle times, making it suitable for large-scale deployments.
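
The customization options listed above might be set at model creation roughly as follows. This is a minimal sketch rather than a verified configuration: the dataset and model names are hypothetical, the machine type, replica counts, and idle interval are illustrative values, and the option names other than endpoint_idle_ttl are assumed from the description above, so check them against the current BigQuery documentation before use.

```sql
-- Sketch only: my_dataset.hf_embedder is a hypothetical model name.
CREATE OR REPLACE MODEL my_dataset.hf_embedder
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  machine_type = 'g2-standard-8',       -- assumed option name, illustrative value
  min_replica_count = 1,                -- assumed option name, illustrative value
  max_replica_count = 2,                -- assumed option name, illustrative value
  endpoint_idle_ttl = INTERVAL 4 HOUR   -- release the endpoint after 4 idle hours
);
```

A shorter idle interval should lower the cost of an unused endpoint, likely at the price of a redeployment delay on the next query.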

Working Example

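-- Create a remote model from a Hugging Face model ID
-- (my_dataset is a placeholder dataset name)
CREATE OR REPLACE MODEL my_dataset.my_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2');

-- all-MiniLM-L6-v2 is an embedding model, so run inference with
-- AI.GENERATE_EMBEDDING rather than AI.GENERATE_TEXT
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL my_dataset.my_model,
  (SELECT 'This is a sample input text' AS content));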

Practical Applications

Use Case: Data analysts can experiment with ML models without leaving their SQL environment, enabling faster prototyping and testing.
Pitfall: Leaving endpoint_idle_ttl unset, or setting it too long, can keep an idle endpoint running and incur unnecessary costs.
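
For the analyst use case above, a first experiment might embed a small sample of a text column without leaving SQL. The sketch below assumes a remote embedding model already exists under the hypothetical name my_dataset.hf_embedder, and that a hypothetical table my_dataset.support_tickets has a ticket_text column. It is an illustration under those assumptions, not a tested query.

```sql
-- Sketch only: model, table, and column names are hypothetical.
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL my_dataset.hf_embedder,
  (
    SELECT ticket_text AS content   -- alias the input column to `content`
    FROM my_dataset.support_tickets
    LIMIT 100                       -- keep the experiment small
  )
);
```

Limiting the sample keeps the experiment cheap while the model choice and endpoint settings are still being evaluated.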
