Bring Your Own AI to CSV and Excel Import

Every tool is adding AI right now. Import tools too. Drop in a model, let it guess the columns, let it rewrite the values, ship it. It looks great in a demo.

But import is not a demo. The files people upload are real: customer lists, payroll exports, patient records, transactions. The moment that file goes through a vendor's AI, you have put a third party in the middle of your most sensitive step. And then you have to answer for it. Where did the data go? How long was it kept? Who else saw it? Did any of it train a model? Those are heavy questions to take on from an import widget.

Updog stays out of that. It needs AI, but it does not need to hold your data while the AI runs.

The model should be yours

AI is good at this work: mapping a column called First Name to your firstName field, sorting messy values into the options a column allows, or turning "lowercase all the email addresses" into a real edit. By hand, this takes a while. A model does it in seconds.

The question was never whether Updog should use AI. It was who holds the data when it runs. The answer is you, since you already have it.

Updog does not ship an AI of its own. Look through the SDK and you will not find a model, an API key, or a line of OpenAI, Anthropic, or Gemini code. When AI runs in an Updog import, it runs because your app called your endpoint with your provider. Updog gives you the context, you make the call, and Updog applies what comes back. This is how the SDK is built, not a setting you can forget to turn on.

So Updog works the same with a hosted model, a self-hosted one behind your firewall, or no AI at all. The choice is yours.

Three places you can plug in a model

Each of the three is a single callback to your own endpoint, and each passes only what the model needs, never the whole dataset.

Column matching

When someone uploads a file, Updog can ask your code to match the file's headers to your schema. Your callback gets the header names and your column definitions. That is all it gets.

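onColumnMatch={async (headers, columns) => {
  const res = await fetch("/api/match-columns", {
    method: "POST",
    // Declare the JSON body so the server parses it correctly.
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ headers, columns }),
  });
  // On failure, return no matches so the built-in matching takes over.
  if (!res.ok) return {};
  return res.json();
}}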

Return a map of { header: columnId }. Anything you skip, or set to null, falls back to Updog's built-in matching: exact match, synonyms, edit distance. The model handles the unclear headers. The obvious ones never needed it. And the point worth repeating: your endpoint sees a list of column names here, never a row of data.
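
Models sometimes invent column ids, so it can help to check the response before returning it. The sketch below is one way to do that on your side; `sanitizeColumnMatch` is a hypothetical helper, and it assumes each column definition carries an `id` field, which may not match your schema's actual shape.

```javascript
// Hypothetical helper: keep only model suggestions that point at real columns.
// Assumes each column definition has an `id` field; adjust to your schema shape.
function sanitizeColumnMatch(headers, columns, suggested) {
  const validIds = new Set(columns.map((c) => c.id));
  const result = {};
  for (const header of headers) {
    const candidate = suggested != null ? suggested[header] : undefined;
    // null hands this header back to the built-in matching.
    result[header] = validIds.has(candidate) ? candidate : null;
  }
  return result;
}

const mapping = sanitizeColumnMatch(
  ["First Name", "E-mail", "Notes"],
  [{ id: "firstName" }, { id: "email" }],
  { "First Name": "firstName", "E-mail": "emailAddress" } // the model invented an id
);
console.log(mapping);
```

Here only `First Name` keeps its suggestion. The invented id and the header the model skipped both come back as null, so the built-in matching handles them.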

Value matching

After the columns are matched, the values can still be off. The file says Eng, engineering, R&D. Your Department column expects Engineering. Updog gives your callback the distinct values it found and the options you allow.

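onValueMatch={async (valuesToMatch) => {
  const res = await fetch("/api/match-values", {
    method: "POST",
    // Declare the JSON body so the server parses it correctly.
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ valuesToMatch }),
  });
  // On failure, return no matches so fuzzy matching takes over.
  if (!res.ok) return {};
  return res.json();
}}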

Return { columnId: { importedValue: option } }. Anything left over falls back to fuzzy matching, and the user checks it before it lands. Again, the data that leaves is small: the distinct values found in a column and the options you already set, never the rows behind them.
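
The same kind of check works for values. This is a minimal sketch, not part of the SDK: `sanitizeValueMatch` is a hypothetical helper, and the shape it assumes for `valuesToMatch` (a `values` list and an `options` list per column id) is a guess made for illustration.

```javascript
// Assumed shape: { [columnId]: { values: string[], options: string[] } }.
// The real shape of valuesToMatch may differ; treat this as illustrative.
function sanitizeValueMatch(valuesToMatch, suggested) {
  const result = {};
  for (const [columnId, { values, options }] of Object.entries(valuesToMatch)) {
    const allowed = new Set(options);
    const fromModel = (suggested && suggested[columnId]) || {};
    const kept = {};
    for (const value of values) {
      const option = fromModel[value];
      // Drop anything that is not an allowed option, leaving it for fuzzy matching.
      if (allowed.has(option)) kept[value] = option;
    }
    result[columnId] = kept;
  }
  return result;
}

const valueMap = sanitizeValueMatch(
  { department: { values: ["Eng", "engineering", "R&D"], options: ["Engineering", "Sales"] } },
  { department: { Eng: "Engineering", engineering: "Engineering", "R&D": "Research" } }
);
console.log(valueMap);
```

In this run the model's "Research" is not one of the allowed options, so "R&D" is left out of the result and goes to the fallback.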

The data-cleaning chat

A chat panel sits next to the grid. The user types a fix in plain words: "remove rows with no email," "normalize the phone numbers," "convert dates to ISO." Updog calls your onMessage with context about the data and streams back whatever your model returns.

What matters is what Updog sends and what it gets back. It does not send the whole table. It sends the column schema, the counts, a summary of the errors, and a small sample of rows. Your model does not need a million rows to write a fix. It needs the shape of the data and a few examples.
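
On your endpoint, that context can become a compact prompt. The sketch below shows one possible way to build it. `buildCleaningPrompt` and the `maxSampleRows` cap are hypothetical, and the field shapes are assumptions rather than the SDK's documented types.

```javascript
// Hypothetical prompt builder for your own cleaning endpoint.
// Field names follow the context described in this post; their shapes are assumed.
function buildCleaningPrompt({ message, columns, errorSummary, sample }, maxSampleRows = 5) {
  // Cap the sample again on the server so the prompt stays small.
  const trimmedSample = (sample ?? []).slice(0, maxSampleRows);
  return [
    "You write small row-level JavaScript transformations for tabular data.",
    "Reply with a function as text; do not return rewritten rows.",
    `Columns: ${JSON.stringify(columns)}`,
    `Error summary: ${JSON.stringify(errorSummary)}`,
    `Sample rows (${trimmedSample.length}): ${JSON.stringify(trimmedSample)}`,
    `User request: ${message}`,
  ].join("\n");
}

const prompt = buildCleaningPrompt({
  message: "lowercase all the email addresses",
  columns: [{ id: "email" }],
  errorSummary: { email: 2 },
  sample: Array.from({ length: 20 }, (_, i) => ({ email: `User${i}@Example.com` })),
});
console.log(prompt);
```

The model sees the schema, the error summary, and a handful of rows. The wording of the instructions is yours to tune.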

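chat={{
  async *onMessage(context) {
    yield { type: "status", content: "Thinking..." };

    const res = await fetch("/api/clean", {
      method: "POST",
      // Declare the JSON body so the server parses it correctly.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        message: context.message,
        columns: context.columns,
        errorSummary: context.errorSummary,
        sample: context.sample,
      }),
    });
    // Tell the user if the request failed instead of yielding undefined ops.
    if (!res.ok) {
      yield { type: "message", content: "The cleaning request failed. Please try again." };
      return;
    }
    const data = await res.json();

    yield { type: "ops", content: data.ops };
    yield { type: "message", content: data.reply };
  },
}}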

And it does not send back rewritten data. It sends back a transformation: a small function, written as text, that Updog runs on each row in the browser.

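{
  action: "edit",
  // Guard against empty cells so rows without an email do not throw.
  fn: "(r) => { if (typeof r.email === 'string') r.email = r.email.trim().toLowerCase(); }",
}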

Updog runs that function across every row in the current view, locally, and groups the whole change into one undo step. One function can clean a million rows without any of them reaching the model. That is faster than sending data back and forth. And it keeps the data private.
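
To make the idea concrete, here is a minimal sketch of how a function sent as text could be applied locally with a single undo. It is not Updog's implementation. `applyEdit` is a hypothetical name, and the snapshot-based undo is a simplification chosen for the example.

```javascript
// A sketch only: compile the function text once, run it over every row,
// and return a single undo that gives back the earlier snapshot.
function applyEdit(rows, fnText) {
  // new Function evaluates model-written code; a real app should sandbox or review it.
  const fn = new Function(`return (${fnText});`)();
  const before = rows.map((row) => ({ ...row })); // shallow snapshot for undo
  const after = rows.map((row) => {
    const copy = { ...row };
    fn(copy); // the function mutates the row it is given
    return copy;
  });
  return { rows: after, undo: () => before };
}

const original = [{ email: "  Ada@Example.COM " }, { email: "bob@example.com" }];
const edit = applyEdit(original, "(r) => { r.email = r.email.trim().toLowerCase(); }");
console.log(edit.rows[0].email);
console.log(edit.undo()[0].email);
```

The rows never leave the page in this sketch. Only the short function text came from the model, and one call to `undo` returns the earlier values.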

The human stays in control

When the model returns a fix, Updog runs it and the result is right there in the grid. The user sees exactly what changed. If it is wrong, one undo takes the whole thing back. And nothing leaves until they submit the import.

So the roles stay clear. You choose the intelligence behind the import: the model, the prompt, the rules it follows. The user owns what happens to their data. Updog is the connection between the two, and it stays out of the data itself.