
Datatune

Transform your data with natural language using LLMs

🚀 Fast & Scalable 🤖 Multiple LLM Support 📦 Open Source

What is Datatune?

Datatune is a Python library that lets you transform tabular data using natural language prompts powered by Large Language Models. Unlike traditional tools that rely on schema-based queries, Datatune gives the LLM access to your actual data, enabling intelligent, context-aware transformations at the row level that would be difficult to express in SQL or pandas.

The examples below show Datatune in action.

🏷️ Smart Classification & Extraction

Extract and categorize information from unstructured data

Use Case: Map operations let you add new columns to your dataset by extracting, classifying, or transforming information from existing columns. This is perfect for enriching your data with insights that would be tedious to code manually—like categorizing products, analyzing sentiment, or extracting structured information from text.

Products
Scenario: You have a product catalog with names and descriptions, but no organized categories. Use Datatune to automatically classify products and extract metadata that helps with search, filtering, and analytics.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
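Continuing the session, a minimal sketch of the classification step. The file and column names are hypothetical, and the dt.Map call pattern and dt.finalize are assumptions about Datatune's Map operation; check the docs for the exact signatures.
>>> df = dd.read_csv("products.csv")  # hypothetical catalog with 'name' and 'description' columns
>>> mapped = dt.Map(
...     prompt="Classify each product into a category and extract the brand",
...     output_fields=["category", "brand"],
... )(llm, df)  # assumed call pattern: operation(llm, dataframe)
>>> result = dt.finalize(mapped)
>>> result.compute().to_csv("products_enriched.csv")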
Customer Support
Scenario: Customer support emails flood your inbox daily. Use Datatune to automatically route tickets to the right department, prioritize urgent issues, and analyze customer sentiment, all from the subject line and message content.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
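A sketch of the routing step, again with hypothetical file and column names and the same assumed Map pattern.
>>> df = dd.read_csv("support_tickets.csv")  # hypothetical file with 'subject' and 'message' columns
>>> routed = dt.Map(
...     prompt="Assign each ticket a department, an urgency level, and the customer's sentiment",
...     output_fields=["department", "urgency", "sentiment"],
... )(llm, df)
>>> dt.finalize(routed).compute().head()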
Financial Data
Scenario: Bank transactions come with merchant names and descriptions, but no built-in categorization. Use Datatune to automatically classify expenses, detect recurring payments, and identify which purchases are essential versus discretionary.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
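A sketch of the categorization step under the same assumptions.
>>> df = dd.read_csv("transactions.csv")  # hypothetical file with 'merchant' and 'description' columns
>>> tagged = dt.Map(
...     prompt="Categorize each transaction, flag recurring payments, and mark spending as essential or discretionary",
...     output_fields=["category", "is_recurring", "spend_type"],
... )(llm, df)
>>> dt.finalize(tagged).compute().to_csv("transactions_categorized.csv")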

🔍 Intelligent Filtering

Filter rows based on semantic criteria across multiple columns

Use Case: Filter operations let you remove rows based on complex, semantic criteria that would be difficult to express with traditional code. Instead of writing nested if-statements or regex patterns, describe what you want to keep or remove in natural language. Perfect for data cleanup, quality control, and targeted analysis.

Products
Scenario: You need to create targeted product lists based on semantic understanding rather than simple filters—products that promote health and wellness, items ideal for remote work setups, or thoughtful gifts under $50. These require understanding product purpose and context, not just matching exact values.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
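A sketch of one such semantic filter. The file name is hypothetical, and the dt.Filter call pattern mirrors the assumed Map pattern above; check the docs for the exact signature.
>>> df = dd.read_csv("products.csv")  # hypothetical catalog
>>> wellness = dt.Filter(prompt="Keep only products that promote health and wellness")(llm, df)
>>> dt.finalize(wellness).compute().head()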
Reviews
Scenario: You have thousands of product reviews but need to find specific types: positive testimonials for marketing, detailed feedback from verified buyers, or recent negative reviews that mention bugs. Semantic filtering makes this easy.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
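A sketch of pulling marketing-ready testimonials, with hypothetical file and column names.
>>> df = dd.read_csv("reviews.csv")  # hypothetical file with 'rating', 'text', and 'verified_buyer' columns
>>> testimonials = dt.Filter(prompt="Keep positive, detailed reviews from verified buyers that could be used as marketing testimonials")(llm, df)
>>> dt.finalize(testimonials).compute().head()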
Employee Data
Scenario: HR needs to identify employees for different initiatives: senior engineers for mentorship programs, remote tech workers for team building, or high performers ready for promotion. Filter based on multiple criteria without complex joins.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
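A sketch of building a mentorship shortlist, with hypothetical file and column names.
>>> df = dd.read_csv("employees.csv")  # hypothetical file with 'role', 'level', 'location', and 'performance' columns
>>> mentors = dt.Filter(prompt="Keep senior engineers with strong performance who could mentor junior developers")(llm, df)
>>> dt.finalize(mentors).compute().head()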

🕶️ Data Anonymization & Privacy

Protect sensitive information while preserving data utility

Use Case: When working with sensitive data, you need to protect privacy while maintaining data utility for analysis. Datatune can intelligently redact PII (Personally Identifiable Information), PHI (Protected Health Information), or other sensitive data based on context—much smarter than simple find-and-replace operations.

Customer Data
Scenario: You need to share customer data with a third-party analytics team, but must protect personal information. Use Datatune to apply different levels of anonymization—full anonymization for maximum privacy, or partial anonymization that preserves some context.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
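One way to express this with the assumed Map pattern, writing redacted values into new columns; the file and column names are hypothetical.
>>> df = dd.read_csv("customers.csv")  # hypothetical file with 'name', 'email', and 'phone' columns
>>> anon = dt.Map(
...     prompt="Replace the customer's name, email, and phone with realistic placeholders; leave city and purchase history unchanged",
...     output_fields=["anon_name", "anon_email", "anon_phone"],
... )(llm, df)
>>> dt.finalize(anon).compute().to_csv("customers_anonymized.csv")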
Medical Records
Scenario: Medical researchers need patient data for studies, but HIPAA compliance requires removing all PHI. Use Datatune to create research-safe datasets that preserve diagnostic information while protecting patient identities and ensuring compliance.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
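A sketch of the de-identification step, with a hypothetical file and column names.
>>> df = dd.read_csv("patient_records.csv")  # hypothetical file with a free-text 'notes' column
>>> deid = dt.Map(
...     prompt="Rewrite the notes with all PHI removed (names, birth dates, addresses, record numbers), keeping diagnoses and treatments",
...     output_fields=["deidentified_notes"],
... )(llm, df)
>>> dt.finalize(deid).compute().to_csv("research_safe_records.csv")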
HR Documents
Scenario: HR analytics teams need to analyze salary trends and performance patterns, but must protect employee privacy. Use Datatune to anonymize personal identifiers while keeping role and performance data intact for meaningful analysis.
>>> import dask.dataframe as dd
>>> import datatune as dt
>>> llm = dt.llm.OpenAI('gpt-3.5-turbo')
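A sketch of masking identifiers while keeping analytical fields, again with hypothetical names.
>>> df = dd.read_csv("hr_records.csv")  # hypothetical file with 'employee_name', 'email', 'role', 'salary', 'performance_rating'
>>> masked = dt.Map(
...     prompt="Replace the employee's name and email with a neutral pseudonym; keep role, salary, and performance rating unchanged",
...     output_fields=["pseudonym"],
... )(llm, df)
>>> dt.finalize(masked).compute().to_csv("hr_anonymized.csv")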

🗺️ Map

Transform and extract data with natural language prompts

🔍 Filter

Remove rows based on semantic criteria

🤖 Agents

Let AI plan and execute complex transformations

⚡ Scalable

Process datasets larger than LLM context windows

Ready to Transform Your Data?

⭐ Star on GitHub 📚 Read Docs