Google BigQuery Integrates SQL-Native Managed Inference for Hugging Face Models

These articles are AI-generated summaries. Please check the original sources for full details.

BigQuery’s SQL-Native Managed Inference for Hugging Face Models

Google has updated BigQuery so that data teams can deploy and run Hugging Face models using plain SQL, without separately managing Kubernetes or Vertex AI infrastructure. The capability adds automated resource governance and secure, identity-based execution, making it easier to bring machine learning into existing data workflows.

Why This Matters

SQL-native managed inference for Hugging Face models addresses a long-standing problem for data teams, who previously had to run their own ML infrastructure, including Kubernetes clusters and several separate tools. That operational overhead put AI capabilities out of reach for many teams. By collapsing the ML lifecycle into a single SQL interface, the feature reduces the friction and cost of deploying and running open-source models, potentially saving thousands of dollars in operational costs.

Key Insights

  • More than 180,000 Hugging Face models are now available for SQL-native managed inference in BigQuery, giving data teams a wide range of options.
  • Automated resource governance is controlled through the endpoint_idle_ttl option, which limits how long an idle endpoint stays deployed and so keeps resource use and cost in check.
  • For production use cases, the feature supports setting machine types, replica counts, and endpoint idle times, making it suitable for large-scale deployments.
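
The customization options listed above might be set at model creation time along the following lines. This is a sketch rather than confirmed syntax: endpoint_idle_ttl is named in the article, while the other option names (hugging_face_model_id, machine_type, min_replica_count, max_replica_count), all values, and the dataset and model names are assumptions to verify against the BigQuery documentation.

```sql
-- Sketch only: option names and values are assumptions, except
-- endpoint_idle_ttl, which the article names explicitly.
-- `my_dataset.my_prod_model` is a hypothetical dataset and model name.
CREATE OR REPLACE MODEL `my_dataset.my_prod_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  machine_type = 'g2-standard-12',       -- illustrative machine type
  min_replica_count = 1,                 -- lower bound for autoscaling
  max_replica_count = 3,                 -- upper bound for autoscaling
  endpoint_idle_ttl = INTERVAL 8 HOUR    -- release the endpoint after 8 idle hours
);
```

Keeping the idle TTL short suits intermittent workloads, while a higher replica ceiling suits sustained batch jobs; the right values depend on the traffic pattern.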

Working Example

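-- Create a remote model from a Hugging Face model ID.
-- `my_dataset` is a placeholder; option names should be checked against
-- the current BigQuery CREATE MODEL documentation for open models.
CREATE OR REPLACE MODEL `my_dataset.my_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2');

-- all-MiniLM-L6-v2 is an embedding model, so run inference with
-- AI.GENERATE_EMBEDDING (AI.GENERATE_TEXT is for text-generation models).
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL `my_dataset.my_model`,
  (SELECT 'This is a sample input text' AS content));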
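
In practice, inference usually runs over a table rather than a single literal. For an embedding model such as all-MiniLM-L6-v2, a batch query might look like the sketch below. The model name `my_dataset.my_model`, the table `my_dataset.reviews`, and its columns `review_id` and `review_text` are hypothetical, and the requirement that the input column be aliased to content is an assumption based on BigQuery's embedding functions.

```sql
-- Sketch: embed every row of a hypothetical reviews table.
-- Table and column names are placeholders, not from the article.
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL `my_dataset.my_model`,
  (
    SELECT
      review_id,
      review_text AS content   -- input text column, aliased to `content`
    FROM `my_dataset.reviews`
  )
);
```

Because the model is addressed like any other BigQuery object, the result can be joined, filtered, or written to a table with ordinary SQL.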

Practical Applications

  • Use Case: Data analysts can experiment with ML models without leaving their SQL environment, enabling faster prototyping and testing.
  • Pitfall: A misconfigured endpoint_idle_ttl, for example one set far longer than the workload needs, can leave idle endpoints deployed and incur unnecessary costs, so resource settings deserve careful review.
