
End of Siloed Data: A Guide to HubSpot's Data Hub & Google's Gemini AI File Search 2025

12.11.2025

Business data often sits in disconnected systems, from email to ERP, CRM and the company website. At the same time, nearly every business uses a CRM and interacts with AI such as ChatGPT, yet applying AI to your private business data has remained a complex and expensive challenge.

In fall 2025 HubSpot introduced Data Hub, and in November 2025 Google previewed File Search for the Gemini API. Together, they make unified data and managed RAG (Retrieval-Augmented Generation) practical for mid-size businesses. This post gives you two concise checklists to test each, then shows how to combine them into a single agent workflow.

But before we do that, note that the latter launch generated significant buzz in the industry, a sentiment captured by developer Robin Ebers.




Part 1: Unified Data Foundation with Data Hub

An intelligent system needs reliable data from all your sources, in a form that non-technical users can also work with. This is the problem HubSpot’s new Data Hub aims to solve. It moves beyond simple app connections to making the CRM the foundation for everything else.


Key capabilities:

  • The Data Studio: A visual environment where employees can blend data from the HubSpot portal with external sources, such as your data warehouses or other business apps, to create new unified datasets and reports.

HubSpot Data Studio Create Dataset
Figure 1: Without writing code, a marketing manager can now join deal data with financial data from an external source to create a single, unified view of LTV.


  • AI-Powered Data Cleaning: The system now includes AI recommendations to help you format properties, fix inconsistencies, and maintain data hygiene automatically. It also includes bulk duplicate management and a data health dashboard.

HubSpot Data Hub Smart Columns
Figure 2: Automate data enrichment. The AI can add missing information such as company size or industry to all your contacts, which is useful for segmentation and analysis.


  • Advanced Data Sync and Automation: With features like Reverse ETL, you can not only bring data into HubSpot but also push your clean, unified HubSpot data back out to other critical systems, like your ERP or financial software. Don’t forget the existing HubSpot - Google BigQuery data integration (read more in OPTI’s previous guide here ➛).

HubSpot Data Studio Sources
Figure 3: Bring all your company data into HubSpot CRM, wherever it is.


Such tools used to be built and maintained by internal data teams; they are now available as a HubSpot package (in the cloud, as SaaS).

As for pricing, as of November 2025 it starts at $700/mo with a core seat for the Professional edition, or $2000/mo for the full Enterprise edition (including Custom Objects and data warehouse integrations such as BigQuery).




How-To: First Steps in Data Hub

Here is a four-step approach to begin unifying your data.


Step 1: Audit and Connect Your Key Data Sources

Before you sync anything, map out your data. Ask: Where does our most critical customer data live? This often includes your e-commerce platform (e.g., Shopify), financial software (e.g., NetSuite), and previous CRMs.

  1. Inside HubSpot, navigate to Settings > Integrations > Connected Apps.

  2. Use the App Marketplace to find and install the native integrations for your key systems.

  3. During setup, you will authorize the connection and establish the initial object mappings (e.g., connecting a Shopify Customer to a HubSpot Contact). All these linkages need to be correct to create a data foundation.


Step 2: Perform an Initial Data Clean-Up

Now, focus on the data already in HubSpot. A clean, deduplicated database will be easy to maintain, even by non-technical people.

  1. Navigate to Contacts and use the Actions > Merge Duplicates feature. HubSpot’s AI will suggest potential duplicates based on email, name, and company domain.

  2. Create an "Active List" of contacts with formatting issues, such as First Name contains all-caps letters or Phone Number does not conform to a standard format.

  3. Use HubSpot Workflows to standardize these properties. For example, create a workflow that triggers when a contact is added to your "cleanup list" and uses the "Format Data" action to change text to "Title Case" or properly format a date.
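To make the two formatting rules concrete, here is a minimal Python sketch of the kind of normalization such a workflow performs. It is illustrative only and does not call HubSpot: the function names are hypothetical, and the phone rule assumes 10-digit national numbers with a default country code of 1.

```python
import re
from typing import Optional

def needs_name_cleanup(first_name: str) -> bool:
    """Flag names typed entirely in capitals, e.g. 'MARIA'."""
    return len(first_name) > 1 and first_name.isupper()

def to_title_case(first_name: str) -> str:
    """Convert 'MARIA ELENA' to 'Maria Elena'."""
    return " ".join(part.capitalize() for part in first_name.split())

def normalize_phone(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Return the number as +<country code><digits>, or None if it cannot be parsed.

    Assumption (hypothetical rule): national numbers have exactly 10 digits.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 10:
        return f"+{default_country_code}{digits}"
    if len(digits) == 11 and digits.startswith(default_country_code):
        return f"+{digits}"
    return None  # leave for manual review

print(to_title_case("MARIA ELENA"))
print(normalize_phone("(555) 123-4567"))
```

Returning None for numbers that do not fit the rule keeps uncertain records on the cleanup list rather than silently rewriting them.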


Step 3: Create Your First Two-Way Sync

The power of the new Data Hub is in bi-directional syncing. A good place to start is your sales or support team's primary tool.

  1. Go to the settings for your connected app (e.g., Salesforce or another CRM).

  2. Set up a new sync rule. Define the trigger and the action. For instance: "When a Contact's lifecycle stage is updated to Customer in HubSpot, update the corresponding record in the other system."

  3. Crucially, set up a rule going the other way: "When a deal is marked Closed Won in our sales tool, update the associated Deal record in HubSpot and enroll the contact in our 'New Customer Onboarding' workflow." This creates a living connection, ensuring consistency.


Step 4: Build a New View with the Data Studio

This is where Data Hub stands out: it lets you query datasets without technical knowledge. Let's create a report that joins marketing engagement with sales outcomes.

  1. Navigate to Data Management > Data Studio.

  2. Click "Create dataset."

  3. As your primary source, select Contacts. As a secondary source, select Deals.
    HubSpot Create Data Set
    Figure 4: HubSpot will keep all your primary and secondary datasets and can join them on request.




  4. Join them based on the Associated Deal ID. Now add another source: your advertising platform data, which you've synced into HubSpot, e.g. as custom objects. Join this data on the contact's email address.

  5. You can now build a report that visualizes which ad campaigns brought in leads that eventually turned into the highest-value deals - a single view that was previously impossible without a data analyst.
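Conceptually, the dataset built in these steps is two joins followed by an aggregation. The sketch below reproduces that logic in plain Python on made-up records; the field names (`deal_id`, `stage`, `campaign`) and the sample values are hypothetical and are not HubSpot property names.

```python
from collections import defaultdict

# Hypothetical sample records standing in for the three sources.
contacts = [
    {"email": "ana@example.com", "deal_id": 1},
    {"email": "bob@example.com", "deal_id": 2},
    {"email": "cy@example.com", "deal_id": 3},
]
deals = [
    {"deal_id": 1, "amount": 12000, "stage": "closedwon"},
    {"deal_id": 2, "amount": 3000, "stage": "closedwon"},
    {"deal_id": 3, "amount": 8000, "stage": "closedlost"},
]
ad_leads = [
    {"email": "ana@example.com", "campaign": "spring-search"},
    {"email": "bob@example.com", "campaign": "spring-search"},
    {"email": "cy@example.com", "campaign": "social-video"},
]

def won_revenue_by_campaign(contacts, deals, ad_leads):
    """Join contacts to deals (by deal ID) and to ad data (by email), then sum won amounts."""
    deals_by_id = {d["deal_id"]: d for d in deals}
    campaign_by_email = {a["email"]: a["campaign"] for a in ad_leads}
    totals = defaultdict(int)
    for contact in contacts:
        deal = deals_by_id.get(contact["deal_id"])
        campaign = campaign_by_email.get(contact["email"])
        if deal is None or campaign is None:
            continue  # unmatched rows are dropped, as in an inner join
        if deal["stage"] == "closedwon":
            totals[campaign] += deal["amount"]
    return dict(totals)

report = won_revenue_by_campaign(contacts, deals, ad_leads)
print(report)
```

The sketch also shows why the mappings from Step 1 matter: a contact whose email or deal ID does not match is silently left out of the report.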

As HubSpot themselves put it:

“When we say HubSpot unifies data, we mean all business data. It's designed for teams who are tired of manual exports, broken integrations, and data that never seems to match up.”

Learn more about the new capabilities on the HubSpot official blog ➛




Part 2: Grounded Intelligence with Gemini File Search

Google File Search is a managed Retrieval-Augmented Generation (RAG) system. In plain English, you upload your documents (product manuals, support knowledge bases, case studies, HR policies, or financial reports) to the Gemini API. Then you ask Gemini questions about them in natural language, and the AI grounds its answers in the documents you added.

Gemini File Search
Figure 5: Official announcement visual for File Search from Google.


The only programming involved is the code that uploads files and sends questions to Gemini, as we see below.

Status (November 2025): File Search is a public preview feature of the Gemini API. You upload files via the Files API, then add them to a store (a persistent, searchable container for your documents), and query with a model configured with the File Search tool.


File Search unlocks:

  • Democratized RAG at a low price: Instead of a significant investment in vector databases for RAG, you make simple API calls at a predictable, scalable cost.


  • Internal Expertise: Build chatbots that can answer highly specific questions for your employees, like "What are the protocols of an audit according to our compliance docs?"


  • Accuracy: Every answer the AI generates includes citation metadata that links back to the exact source document and passage, so that it can be verified. You can display these citations in your frontend app with a little programming.


To preview the next section, see this diagram we created of what a business built around File Search could look like:

Diagram - from data to apps using HubSpot and File Search for business
Figure 6: The workflow is simple: (1) Your business documents are ingested (2) via the Files API into a secure File Search Store. (3) This allows Gemini to power applications like internal chatbots and assistants with verifiable answers.


As for pricing, public preview (November 2025) posts note $0.15/1M tokens for the initial indexing of your files in Gemini, with storage and subsequent generation currently free. Remember, a good rule of thumb is that 1 token is about 4 characters.
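As a rough worked example of this arithmetic, the sketch below estimates the one-time indexing cost from a character count. It assumes the preview price of $0.15 per 1M tokens and the 4-characters-per-token rule of thumb quoted above; the function name and the sample corpus size are hypothetical, and real token counts depend on the content.

```python
def estimate_indexing_cost_usd(total_characters: int,
                               chars_per_token: float = 4.0,
                               usd_per_million_tokens: float = 0.15) -> float:
    """Approximate one-time indexing cost for a corpus of the given size."""
    tokens = total_characters / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens

# Example: 1,000 documents of about 20,000 characters each
# = 20M characters, or roughly 5M tokens.
cost = estimate_indexing_cost_usd(1_000 * 20_000)
print(f"Estimated one-time indexing cost: ${cost:.2f}")  # prints $0.75
```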




How-To: First steps with File Search API

The process below is developer-oriented (at least for the moment), but it connects your whole document trove to a powerful AI in just four steps.


Step 1: Get Your API Key and Set Up Environment

  1. Go to Google AI Studio (ai.google.dev).

  2. On the left-hand menu, click Get API Key and then Create API key in new project. Save this key securely.

  3. Ensure you have Python installed on your machine. Then, install the necessary Google library by running this command in your terminal:
    pip install -q google-genai



Step 2: Upload Your First Set of Documents

This is the core of indexing your knowledge. Let's say you have a folder with three PDF documents: product_specs.pdf, support_faq.pdf, and report_draft.pdf.

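Create a Python script and add the following code. It creates a File Search Store, then uploads and imports one of these files, report_draft.pdf; repeat the upload call for the other files. The client reads your API key from the GEMINI_API_KEY environment variable.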


from google import genai
from google.genai import types
import time

client = genai.Client()

# 1) Create a File Search Store (persistent container)
store = client.file_search_stores.create(config={'display_name': 'my-reports-store'})

# 2) Upload + import a file into the store (name will show up in citations)
op = client.file_search_stores.upload_to_file_search_store(
    file="report_draft.pdf",
    file_search_store_name=store.name,
    config={'display_name': 'My Report Draft v1'}
)

Step 3: Wait for Processing


# 3) Wait for indexing to finish (poll)
while not op.done:
    time.sleep(1)
    op = client.operations.get(op)

Advanced customization: The API also supports a custom chunking configuration (how documents are split into passages for indexing) and parallel uploads.


Step 4: Ask a Question and Get the Response and Citations

Now you can query your knowledge base.


# 4) Ask the model via File Search against the created store
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the key claims and list 3 sources to verify.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)

print(resp.text)

# Optional: access citation metadata / grounding data to render visually
print(resp.candidates[0].grounding_metadata)
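To render citations instead of printing the raw metadata, a helper along the following lines can pull out the source title and a short snippet. This is a sketch under assumptions: it expects each entry of `grounding_metadata.grounding_chunks` to carry a `retrieved_context` with `title` and `text` fields, which you should verify against the SDK version you use. The function name `list_citations` is hypothetical.

```python
def list_citations(response, snippet_length: int = 200):
    """Return (title, snippet) pairs from a response's grounding metadata.

    Assumption: grounding_chunks[*].retrieved_context exposes .title and .text.
    Missing fields are tolerated so the helper never raises on partial data.
    """
    citations = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        for chunk in getattr(metadata, "grounding_chunks", None) or []:
            context = getattr(chunk, "retrieved_context", None)
            if context is None:
                continue
            title = getattr(context, "title", None) or "(untitled)"
            text = getattr(context, "text", None) or ""
            citations.append((title, text[:snippet_length]))
    return citations
```

Calling `list_citations(resp)` on the response from Step 4 then yields a list you can show next to the answer, for example as footnotes.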

Security considerations:

  • The API also supports tagging your documents with custom metadata, so that a query searches only the documents tagged in a specific manner.

  • You can use these tags to isolate documents by subject matter or department.

  • You can also use tenant-isolated separate File Search Stores for maximum isolation.

  • Don’t forget to use least-privilege API keys and log everything.
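One way to apply store-level isolation is to decide in your own application which stores a user may query before calling Gemini. The sketch below is a minimal illustration: the department names, the mapping, and the store resource names are hypothetical placeholders, not values returned by the API.

```python
# Hypothetical mapping from department to File Search Store resource name.
STORE_BY_DEPARTMENT = {
    "hr": "fileSearchStores/hr-policies-example",
    "finance": "fileSearchStores/finance-reports-example",
}

def stores_for_user(user_departments):
    """Return only the stores this user is entitled to query."""
    allowed = [
        STORE_BY_DEPARTMENT[dept]
        for dept in user_departments
        if dept in STORE_BY_DEPARTMENT
    ]
    if not allowed:
        # Fail closed: no entitlement means no search at all.
        raise PermissionError("User has no access to any document store")
    return allowed

print(stores_for_user(["hr"]))
```

The returned list would then be passed as `file_search_store_names` in the File Search tool configuration shown in Step 4.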


Practical Outputs of File Search:

  • Natural language questions and answers

  • Answers verifiably grounded in private documents, which you can isolate by tag (for security)

  • Supports PDF, JSON, TXT and DOCX (Microsoft Word)

All at a low cost: that of the initial indexing of documents (as of November 2025).

Explore the official technical details on the Google Developers Blog ➛

Read our in-depth case study on using GenAI for financial analysis ➛




Part 3: The Synergy - Integrate Both via Agents



Why integrate?

A clean HubSpot Data Hub can become your single source of truth. Google's File Search can become your instant expert on company documents. But a large advantage could come from making them talk to each other in real time.

As we've detailed, Gemini's File Search queries a static set of documents. Your HubSpot CRM data is live and dynamic. Building the bridge between them requires a custom AI Agent: an intelligent orchestrator that knows which tool to use for which question.

This is where an integrator such as OPTI comes in. See our case studies for building such systems and revisit this diagram. We create the connections:

Diagram - from company data to HubSpot to Google Gemini File Search to chatbots and quoting for business
Figure 7: Example flow of integrated company knowledge. Explore a similar architecture with our AI Sales ➛ product



Concrete example

Instead of spending an hour digging through spreadsheets, emails, and the CRM, imagine a sales executive preparing for a major client call. They open an internal chatbot custom built for them.

They ask: "Give me a one-page summary on Client XYZ, including their current MRR, the status of their last support ticket, and any relevant case studies we have for their industry."

Here's what happens:

  1. The Sales Executive wants to prepare for a call. They open a chat window.

  2. Multi-Tool AI Agent receives the query. The custom-built AI agent parses the request into three distinct tasks.

  3. Live Data Retrieval. The agent calls a custom HubSpot Connector for MRR and ticket status of the client.

  4. Static Document Retrieval: For the case studies, the agent queries the File Search tool for relevant case studies you've uploaded (with citations).

  5. Intelligent Synthesis. Using Gemini, the agent synthesizes the information and composes a full data-aware brief, with citations pointing back to the source.

Figure 8: Example flow of managed RAG integration for a company
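The five steps can be sketched as a small orchestration loop. This is a simplified illustration, not OPTI's implementation: the task plan is hard-coded where a real agent would let the model produce it, and the two tools are stubs with canned answers so the sketch runs offline. All names (`Task`, `plan_tasks`, `run_agent`, `fake_crm_lookup`, `fake_document_search`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    tool: str   # which tool should handle the task
    query: str  # what to ask that tool

def plan_tasks(client_name: str) -> List[Task]:
    """Stand-in for step 2: split the request into three tasks."""
    return [
        Task("crm", f"MRR for {client_name}"),
        Task("crm", f"Last support ticket status for {client_name}"),
        Task("documents", f"Case studies relevant to the industry of {client_name}"),
    ]

def run_agent(client_name: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Steps 3 and 4: route each task to its tool and collect the findings."""
    findings = []
    for task in plan_tasks(client_name):
        handler = tools.get(task.tool)
        if handler is None:
            findings.append(f"[{task.tool}] no tool available for: {task.query}")
            continue
        findings.append(f"[{task.tool}] {handler(task.query)}")
    # Step 5 (synthesis) is omitted: a real agent would pass these findings to Gemini.
    return "\n".join(findings)

# Stub tools with canned answers, replacing the HubSpot connector and File Search.
def fake_crm_lookup(query: str) -> str:
    return f"(CRM result for: {query})"

def fake_document_search(query: str) -> str:
    return f"(cited passages for: {query})"

brief = run_agent("Client XYZ", {"crm": fake_crm_lookup, "documents": fake_document_search})
print(brief)
```

Keeping the tools behind a plain dictionary of callables means the live CRM connector and the File Search query can be swapped in or tested separately.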




In conclusion, tech is changing for mid-sized businesses:

  • With HubSpot’s Data Hub, you build a reliable foundation: a single source of truth in your CRM cloud.

  • With Google’s File Search, you layer a powerful intelligence engine on top of any static collection of documents.

If you want to use both, connecting these two new worlds in a secure manner via agents will result in a truly intelligent business.

As a HubSpot Solution Partner and Google Cloud Partner (ISO 27001/9001 certified), OPTI designs secure connections and integrations for software, including our AI Sales platform.

Let’s have a chat about your implementation of this guide.



Quick Questions

What is the main problem solved by integrating HubSpot Data Hub with Gemini AI?

The main problem is 'siloed data'. The integration allows a custom AI Agent to combine the live, dynamic data from your CRM (HubSpot) with the static knowledge from your internal company documents (PDFs, DOCX files indexed by Gemini), providing a complete, unified view.

Do I need to be a developer to use HubSpot Data Hub?

No. HubSpot Data Hub is designed for business (non-technical) users. Its tools, like the Data Studio, provide a visual, 'no-code' environment to clean, enrich, and blend datasets.

What is 'RAG' (Retrieval-Augmented Generation)?

RAG is an AI technique in which relevant passages are first retrieved from a set of documents you provide and then passed to the language model (like Gemini), which bases its answer on them. Gemini File Search is a managed RAG system that helps keep answers about your company accurate and verifiable, with citations back to the original source.

Can my marketing team implement Gemini File Search?

In its current public preview stage, implementing Gemini File Search is a technical, developer-oriented process. It requires using the API, managing API keys, and writing scripts (e.g., in Python) to upload files and query the system.

What are the costs involved for HubSpot Data Hub and Gemini File Search?

According to the article (as of November 2025), HubSpot Data Hub starts at $700/mo for the Professional edition and $2000/mo for Enterprise. For Gemini File Search, the cost is $0.15 per 1 million tokens for the initial indexing, with storage and subsequent generation being free during the preview period.

Why should I integrate both? What is the main advantage of an AI Agent?

The main advantage is synergy. A custom AI Agent acts as an 'intelligent orchestrator'. It knows when to query HubSpot for live data (e.g., 'What is the client's MRR?') and when to query Gemini File Search for static data (e.g., 'What does the contract with this client say?'), combining the answers to provide a complete insight.

Can you provide a concrete example of how such an AI Agent is used?

Yes. A sales executive can ask an internal chatbot: 'Give me a summary of Client XYZ, the status of their last support ticket, and relevant case studies for their industry.' The agent will pull the live data (ticket status) from HubSpot, search for the static documents (case studies) in Gemini File Search, and synthesize it all into a single, coherent answer.

What is the role of an integrator like OPTI in this process?

OPTI's role is to design and build the custom 'Multi-Tool AI Agent'. We create the secure connections between HubSpot and the Gemini API, develop the agent's logic that knows which tool to use, and build the interface (e.g., the internal chatbot) through which your employees can interact with this new intelligent system.

What technologies and methodologies are involved?

Technologies: HubSpot Data Hub, HubSpot Data Studio, HubSpot Workflows, Google Gemini API, Gemini File Search, Python, Google AI Studio, API
Methodologies: Data Unification (ETL/Reverse ETL), AI-Powered Data Cleaning, Bi-Directional Sync, Retrieval-Augmented Generation (RAG), Natural Language Querying, Multi-Tool AI Agent Development


Article written by

Marian Călborean

Manager, Software Architect, PhD. in Logic, Fulbright Visiting Scholar (CUNY GC, 2023)
