The viewer has a chat panel. Press Ask in the toolbar, type a question in plain English, and get a filtered table, a bar chart, or a scalar stat back — all computed locally in your browser. The LLM authors a query plan; a deterministic JavaScript executor runs it against your data. No SQL. No Python. No upload. This post walks through the full capability set with real examples for analysts, finance teams, HR, operations, data entry staff, and developers.
Your row data never leaves your browser. The model sees only column names and a small sample of values — enough to understand the schema, not enough to read your data. Every calculation in this post runs locally. Nothing is uploaded.
How it works under the hood
When you type a question, the panel sends the LLM a schema digest — column names, their inferred type (number, text, date, boolean), a null-fraction estimate, and up to five sample values per column. For numeric columns it also includes min, max, and mean. That is the entire payload. A 500,000-row file produces exactly the same schema digest as a 50-row file. Your data stays where it is.
The LLM returns a JSON plan — a small pipeline of typed steps. A plan for "which region has the most orders" might look like: filter where order_date is after 2024-01-01, then group by Region and count rows, then sort descending. The local executor runs those steps against your data using the same deterministic JavaScript that powers the rest of the viewer. If the plan references a column that does not exist, the validator catches it before any execution and returns a readable error instead of silently wrong numbers.
The plan is always visible. Click "Show plan" beneath any result and you see the exact steps the model generated. This matters for regulated environments — you are not trusting a black box, you are trusting a transparent, inspectable pipeline. You can also re-ask with corrections ("use Revenue instead of Sales") and watch the plan change.
Pinpoint the exact column with @ tagging
When a file has multiple columns with similar names — "Sale price", "List price", "Cost price" — a plain question like "what is the average price?" leaves the model guessing. The @ column picker removes that ambiguity. Type @ in the chat input and a fuzzy-search dropdown appears showing every column in your file. Arrow-key through the matches, press Enter or Tab, and the column name is inserted as a highlighted chip. Multi-word names are automatically quoted.
The chip also appears in your conversation history so you can see at a glance which column each question targeted. When you re-ask or follow up, you can @ a different column without retyping the whole question. This makes exploratory analysis faster — iterate through "Sale price" vs "List price" vs "Cost price" in three keystrokes each.
For analysts — aggregates, group-bys, and ranked bar charts
Group-by questions are the bread and butter for anyone doing data analysis. "Which category has the highest average margin?" generates a group_by step with an avg aggregate, followed by a sort. The result renders as a ranked bar chart with the group labels on the left and the computed value on the right. Clicking any bar drills down into the source rows that make it up, applying a live filter to the main table so you can keep investigating from there.
group_by by=["Category"] agg={avg_margin: avg(Margin %)}
→ sort by=[avg_margin desc]Five aggregate functions are supported: count, sum, avg, min, and max. The model picks the right one from context — "how many" produces count, "total" produces sum, "average" produces avg. You can stack multiple aggregates in a single question: "what is the min, max, and average price per category" generates a group_by with three computed columns in one plan. The result renders as a table with each aggregate in its own column, exportable as TSV or CSV.
After each answer the panel surfaces up to three follow-up suggestions computed locally — no extra LLM call. A group-by result suggests sorting the other direction, showing only the top five, or breaking down by a second categorical column. A filter result suggests aggregating the filtered rows or grouping them. Clicking any suggestion fires it as the next question, so a line of investigation feels more like a conversation than a series of disconnected queries.
For finance and accounting — totals, cost analysis, and period filters
Finance teams spend a lot of time answering the same question in different slices: total cost this month, total cost by supplier, cost variance versus last quarter. These map directly onto the aggregate and group-by operations. Scalar questions — "what is the total purchase cost?" — render as a stat card. The card picks up the currency symbol from the source column and formats the result accordingly, so a column full of "€6.26" values produces a Euro-formatted total, not a raw number.
aggregate total_cost=sum(Purchase Price)
Period filters pair with aggregates naturally. "What is the total revenue for orders placed between January and March?" generates a filter on the date column followed by an aggregate. The date comparison uses locale-aware parsing — ISO dates, European dd/mm/yyyy formats, and month-name formats all work. You do not need to know what format your date column uses; the executor figures it out.
For cost variance work, the derive step covers the gap. "Add a column showing the difference between sale price and purchase price" generates a subtract expression. "Add a column showing VAT amount at 21%" generates a multiply. Once added to the table, the new column can be summed, averaged, filtered, and exported exactly like any source column. It is the spreadsheet formula workflow, minus the spreadsheet.
For HR and people analytics — headcount, salary bands, and attrition
HR CSV exports from payroll systems, ATS tools, or HRIS platforms are often the messiest data an analyst encounters — inconsistent department names, salary stored as text with currency symbols, hire dates in four different formats. The Ask panel handles all of these gracefully because the schema digest captures samples, not assumptions. If the model sees "€45,000" as a sample value for the Salary column, it knows the column needs currency stripping before arithmetic.
group_by by=["Department"] agg={headcount: count(*), avg_salary: avg(Salary)}
→ sort by=[headcount desc]Attrition analysis is a filter-then-count workflow: "how many employees left in 2024?" filters on the termination date column (isNotEmpty plus dateAfter), then counts. "What is the attrition rate by department?" adds a group_by on top of that filter. The plan is transparent — you can verify what "left" means before trusting the number.
Salary banding is a derive use case. "Add a column showing whether each employee is above or below the department average salary" requires a two-step question — first ask for the average per department, note the values, then ask for the derive column. The current derive step supports arithmetic expressions, so "is salary above 1.2 times the company average" becomes a single divide and compare. The result can be added to the table and used as a filter for subsequent questions.
For operations and logistics — inventory alerts and stock analysis
Operations teams are usually looking for exceptions — products below reorder level, shipments delayed past a threshold, suppliers missing delivery windows. These are filter questions with numeric or date comparisons. "Find products where stock is below 10" generates a single filter step. "Find products where stock is below 10 and lead time is more than 14 days" generates a two-condition AND filter. Results land in a mini table in the chat panel; you can click the drill-down arrow on any row to jump to that row in the main table.
filter where Stock < 10 AND Lead_Time_Days > 14
| Product | Category | Stock | Lead_Time_Days |
|---|---|---|---|
| COMPO BIO Granuplant 10L | Garden | 9 | 21 |
| Weitech Pest Repeller Ultra | Pest Control | 3 | 18 |
| Catchmaster Pantry Pest Glue | Household | 7 | 16 |
| Green Protect Trap Refill | Pest Control | 2 | 28 |
Inventory valuation is a derive question. "Add a column showing stock value as stock multiplied by purchase price" generates a multiply expression and runs it over every row. Once added to the table you can immediately ask "what is the total stock value by category?" and get a breakdown across the newly added column. Questions chain naturally — the schema digest rebuilds whenever you add a column, so derived columns are immediately available for the next question.
For data entry and QA — finding gaps, fixing formats, and cleaning text
Data entry work is mostly about finding what is wrong before it causes a problem downstream. Missing values, inconsistent capitalisation, barcodes in the wrong format, duplicate rows. The Ask panel handles all of these without requiring anyone to know SQL or write a script. The isEmpty and isNotEmpty operators find blanks; contains and regex_replace handle format normalisation; the derive step with string functions covers cleanup transforms.
filter where Barcode isEmpty → select cols=["Product", "Category", "SKU", "Barcode"]
| Product | Category | SKU | Barcode |
|---|---|---|---|
| COMPO SANA Cactus Soil 5L | Garden | CS-5L | — |
| Luxan ECO Ant Powder 100g | Pest Control | LA-100 | — |
| Pest Stop Rodent Bait Station | Pest Control | PS-RBS | — |
Text cleanup is where the string functions shine for data entry teams. "Add a column with the product name in title case and trimmed of extra spaces" generates a derive step with trim and a case conversion in sequence. "Extract the size from the product name — the part in parentheses" becomes a regex_replace that captures the parenthetical segment. "Combine SKU and barcode into a single identifier separated by a dash" becomes a concat. Each cleaned column can be added to the table and exported, replacing the original messy column in your downstream system.
derive Product_Code = concat(SKU, "-", Barcode)
| Product | SKU | Barcode | Product_CodeNEW |
|---|---|---|---|
| Green Protect Insect Spray 500ml | GP-500 | 5060525035562 | GP-500-5060525035562 |
| Pest Stop Quick-Snap Rat Trap | PS-QS | 5014055040222 | PS-QS-5014055040222 |
| COMPO BIO Granuplant 10L | CB-10L | 4003949000247 | CB-10L-4003949000247 |
For sales and marketing — rankings, top-N, and conversion analysis
Sales data questions tend to fall into two shapes: who is performing best (top-N ranking) and where are deals coming from (group-by breakdown). Top-N is a group-by followed by sort and limit. "Show me the top 10 products by total sales" generates a group_by with sum, a sort descending, and a limit 10. The result renders as a bar chart so the ranking is immediately visual. Hovering a bar shows the exact value and its share of the total.
group_by by=["Product"] agg={total_sales: sum(Sale price)}
→ sort by=[total_sales desc]
→ limit n=5Marketing questions often involve date filtering combined with group-by: "how many leads came in per channel last month?" filters on a date range, then groups by the channel column, then counts. "What is the conversion rate per campaign?" requires a derive step — divide converted by total — followed by a group-by if the campaign column is present. The key insight is that these questions are just combinations of the same small set of operations; the model handles the composition automatically.
For data scientists — distributions, distinct values, and custom derivations
Data scientists tend to use the Ask panel for quick exploration before writing proper analysis code. "What are the distinct values in the Status column?" runs a distinct step and returns a deduplicated list — useful for spotting unexpected categories before you build a pivot table. "How many rows per status?" chains distinct into a group-by count. "What percentage of rows have a null email?" uses an aggregate with an isEmpty filter to get the count, then divides.
Derived columns replace the usual "add a quick formula column in Excel, check the distribution, delete it" loop. "Add a column that flags rows where margin is more than two standard deviations from the mean" is a more complex derive — you would ask for the mean and standard deviation first (two separate aggregate questions), note the values, then ask for the flag column using arithmetic on those thresholds. Each step is transparent and the plan shows the exact expression.
derive Price_Flag = "Purchase Price" / 18.40
| Product | Purchase Price | Price_FlagNEW |
|---|---|---|
| COMPO BIO Granuplant 10L | €62.56 | 3.40 |
| Weitech Pest Repeller Ultra | €38.20 | 2.08 |
| Catchmaster Flying Insect Trap | €5.37 | 0.29 |
| Green Protect Insect Spray 500ml | €6.26 | 0.34 |
The column stats bar integrates automatically with the chat panel. When a question involves an aggregate over a column — "average margin", "total stock" — the stats bar jumps to that column and shows count, distinct, sum, avg, min, max, and median without a separate click. When you click a drill-down arrow on a bar chart row, the stats bar follows to the most relevant column. You can override it manually from the dropdown at any point.
Filters and comparisons — the full operator set
The executor supports fourteen comparison operators across text, numeric, and date columns. Multiple conditions combine with AND by default; ask for OR explicitly. Here is the full list of what you can express in a plain English question:
- –eq / neq — exact match or exclusion ("where status is active", "where region is not EMEA")
- –contains / startsWith / endsWith — substring matching ("where email contains @gmail")
- –in — membership check ("where country is in UK, France, Germany")
- –gt / gte / lt / lte — numeric comparisons ("where price is greater than 100")
- –between — range check on numbers or dates ("where order date is between Jan 1 and Mar 31")
- –isEmpty / isNotEmpty — null and blank detection ("where barcode is missing")
- –dateBefore / dateAfter / dateBetween — date-aware comparisons with locale-tolerant parsing
String functions — the complete list
String functions are used in derive steps to transform text columns. Describe the transformation; the model picks the right function and arguments. Every derived column can be added to the table with Add col and then filtered, sorted, and exported.
- –replace(col, old, new) — exact string substitution ("replace all occurrences of GmbH with Ltd")
- –regex_replace(col, pattern, replacement) — regex-based substitution ("extract the numeric part of the SKU")
- –trim(col) — strip leading and trailing whitespace ("clean up the product name column")
- –upper(col) / lower(col) — case conversion ("convert country codes to uppercase")
- –concat(col1, separator, col2) — join two columns ("combine first name and last name with a space")
- –split(col, delimiter, index) — extract a segment ("take the first part of the email address before the @")
Bring your own API key
The feature requires a Google Gemini key or an OpenRouter key. Paste it into the AI settings panel (the gear icon in the chat header); it is stored only in your browser's local storage and never sent to csvdiff.app. Requests go directly from your browser to Google or OpenRouter. We never see your key, your questions, or your data. This matters for regulated industries — finance, healthcare, legal — where even metadata about what you are querying needs to stay inside your perimeter.
Gemini Flash is the default on the Gemini path — fast and cheap for this workload. On OpenRouter you can pick any model that supports JSON-mode output. Most questions consume fewer than 1,000 input tokens because the schema digest is compact. A session of twenty questions on a fifty-column file typically costs a few cents at current model pricing. The session history is preserved in browser storage so you can close the panel, do other work, and continue the conversation when you come back.
Honest limitations
A few things to set expectations correctly. The feature does not write back to your file — derive results have to be explicitly added with Add col, and filter results do not modify the underlying data. It does not join across multiple files in a single question (open two tabs and use the diff tool for cross-file comparisons). Very ambiguous column names can trip the model; the UI surfaces a "Used X — switch to Y?" chip when it detects an auto-pick among multiple plausible columns, so you can correct a wrong guess with one click. And it will not hallucinate columns it cannot see — if you reference a column that does not exist, the validator rejects the plan before any execution and returns a specific error message.
Complex statistical operations — standard deviation, percentile ranking, rolling averages, cohort analysis — are not supported yet. The operation set is intentionally small: filter, aggregate, group_by, sort, limit, distinct, select, and derive. Everything the executor does is deterministic and inspectable, which is a deliberate trade-off against supporting an unbounded set of operations that would be harder to validate and explain.
To try it: open any CSV in the viewer at csvdiff.app/view, press Ask in the toolbar, add your Gemini or OpenRouter key in the settings panel, and type any question from this post. The schema digest is built from your file locally — nothing is uploaded before you even ask.