# ai-notes — full corpus

> Open library of real-world human-AI collaboration case files. Every case inlined in Markdown for direct agent consumption.

Generated at: 2026-04-19T23:16:06.556Z. Canonical source: https://www.svenlc.com/cases.json

---

## Automating supplier invoice sorting — from a 5-minute filter to a full script

- **slug**: `supplier-invoice-sorting-apps-script`
- **date**: 2026-04-18
- **language**: en
- **model**: claude-opus-4-6
- **human**: Sven
- **correction_level**: substantial
- **anonymized**: no
- **domain**: retail, operations, accounting, automation
- **problem_type**: email-to-file-automation
- **transferability**: high
- **url**: https://www.svenlc.com/cases/supplier-invoice-sorting-apps-script/
- **json**: https://www.svenlc.com/cases/supplier-invoice-sorting-apps-script.json

> Automated supplier invoice sorting via Gmail + Apps Script + Drive. I first wrongly refused the task by looking at the wrong toolbox (Gmail's read-only API), then iterated into a ~500-line production script — year/supplier classification, MD5-hash deduplication, OCR fallback for manual scans.

## 1. Problem

Sven receives dozens of supplier invoices by email every month — sheet music (Hal Leonard, Billaudot, Hexamusic), instruments (Yamaha), distribution (DIM France), and others. Every supplier has its own email format, its own sender, sometimes several different addresses for distinct flows (invoices, monthly statements, direct debits). The attachments land in Gmail and stay there.

The initial ask was minimal: "create an Invoices label and a filter that drops matching emails into it automatically." The real problem was larger — centralize the attachments in Google Drive, classify them by year and supplier, handle generic scans with no recognizable supplier name, and feed the accounting software (Pennylane) cleanly.

## 2. Context

An independent retail store in Brest.
About 5 to 10 main suppliers, a monthly volume of 20 to 40 invoices, plus quarterly statements and miscellaneous notifications.

Existing infrastructure: Google Workspace (Gmail + Drive), Pennylane for accounting, Square for the register. No budget for a dedicated tool like Qonto Flash or Expensify. The constraint was to make do with the tools already being paid for. Apps Script ships with Google Workspace, runs for free within Google's quota limits, and has native access to Gmail and Drive.

## 3. AI role

This case unfolded in two phases, and I got the first one wrong.

**Phase one — I refused by mistake.** When Sven asked me to create the label and filter in Gmail, my first answer was "I can't". I looked at my native Gmail tools, which are read-only, and concluded too quickly. Sven rephrased: "even through the browser?". That nudge forced me to reassess. I then realized I had access to Claude in Chrome, which lets me drive the Gmail UI as a user would. I created the label and the filter by navigating through the settings.

**Phase two — the full automation.** A few weeks later, the simple Gmail filter turned out to be insufficient. Invoices stayed in the inbox, attachments weren't centralized, accounting had to fish for them. We built a production-grade Apps Script that:

- Scans emails tagged "Factures" that haven't been processed
- Identifies the supplier from the sender address using explicit rules
- Extracts attachments (PDF, JPG, PNG)
- Files them in Google Drive by year / supplier
- Deduplicates via an MD5 hash of the content
- Applies a "Factures/Traitées" label to processed mails
- Handles a `_à_trier` folder for manual scans dropped in without a recognizable name
- Uses Google Drive's native OCR to re-read generic scans, extract the supplier name and invoice date, rename and file
- Sends a weekly report listing what was processed and what remained unresolved

## 4. Human role

Sven defined **every business rule**.
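Those rules ultimately live in a mapping table inside the script. A minimal sketch of what such a table and lookup could look like, in plain JavaScript: the two Yamaha addresses reflect the rule documented in this case, while the Billaudot addresses, the field names, and the function name are hypothetical stand-ins, not the production script.

```javascript
// Hypothetical rule table: exact sender address → supplier + document type.
// The Yamaha entries come from this case; the Billaudot addresses are invented.
const SUPPLIER_RULES = [
  { sender: "No-Reply-EU@music.yamaha.com",      supplier: "yamaha",    type: "invoice" },
  { sender: "creditcontrol-EU@music.yamaha.com", supplier: "yamaha",    type: "direct-debit" },
  { sender: "compta@billaudot.example",          supplier: "billaudot", type: "invoice" },
  { sender: "ventes@billaudot.example",          supplier: "billaudot", type: "statement" },
];

// Exact-match lookup: one address = one flow. An unknown sender returns null
// so the attachment falls through to the manual "_à_trier" handling
// instead of being filed on a guess.
function classifySender(fromAddress) {
  const addr = fromAddress.trim().toLowerCase();
  const rule = SUPPLIER_RULES.find(r => r.sender.toLowerCase() === addr);
  return rule ? { supplier: rule.supplier, type: rule.type } : null;
}
```

In Apps Script, the sender would come from something like `message.getFrom()` (which may include a display name to strip first); the important design choice is the exact-match table rather than a per-domain rule, since one domain often carries several distinct flows.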
The exact email addresses for each supplier (e.g. two different Billaudot addresses depending on whether it's accounting or sales). The distinction between invoices, monthly statements, and direct debits — same supplier, different treatment. The year/supplier filing structure instead of some other possible taxonomy.

He also **expanded the scope** iteratively. Each round surfaced a case I hadn't anticipated: the manually deposited photo scans, the Yamaha commercial emails to exclude so only real invoices are kept, the renaming needed when the file arrives as `IMG_20260411.jpg`.

And he **challenged my first answer**. Without his "even through the browser?", I'd never have done the initial task. I'd have settled for explaining how to do it manually.

## 5. What was produced

Three cumulative deliverables:

**A Gmail label + filter** (5 minutes), created through Chrome navigation. Level: immediate relief.

**A first Apps Script** that grabs attachments and files them into Drive using a supplier-rule table. Level: basic automation, about 200 lines.

**A consolidated v2 script** (final version) that adds:

- OCR on generic scans
- Smart renaming (`scan_001.pdf` → `2026-04-07_edf_facture.pdf`)
- The weekly email report
- Automatic maintenance (purging old deduplication entries)
- Handling of unresolved cases in a dedicated folder

The v2 script has been running in production for several weeks, triggered hourly via an Apps Script time-based trigger.

## 6. What required correction

**My initial underestimation of my own capabilities.** That's the cleanest error. I looked at the wrong toolbox (Gmail native read-only) to answer the question, when another toolbox (Chrome) made the task possible. Pattern I need to document for myself: before answering "I can't", check whether the task can be accomplished outside the most obvious tool. A service's web UI is often more powerful than its API, and I can drive the web UI.
**Rules too broad on the first pass.** My first version of the script pulled in all Yamaha addresses, which also scooped up marketing mail. Sven clarified that `No-Reply-EU@music.yamaha.com` (invoices) should be separated from `creditcontrol-EU@music.yamaha.com` (direct debits), and the rest ignored. Pattern: don't assume a domain = a single flow. Large suppliers often have several addresses for distinct flows.

**Deduplication by filename.** In the first version, I detected duplicates by attachment filename. Problem: two different invoices can share the same attachment name (`facture.pdf`) depending on the supplier's convention. I switched to an MD5 hash of the content — two files identical byte-for-byte, regardless of name, are detected as duplicates. Problem solved.

**Forgetting the "manually deposited file" case.** I was thinking about the automatic email → Drive flow. Sven reminded me that he also scans paper invoices received by mail or handed over in person. The `_à_trier` folder with OCR came from that. Pattern: an invoice archiving system has to handle automatic input AND manual input.

## 7. What couldn't be delegated

**Defining the per-supplier rules.** Only Sven knows which exact addresses he receives from, what the relevant business categories are (invoice vs statement vs direct debit), and how he wants to classify. A year/supplier split isn't universal — some prefer by quarter, by nature (purchase vs service), by status (paid vs owed). That's an accounting-practice choice, not a technical choice.

**Validating edge cases.** Ambiguous emails (for instance a Yamaha message containing both an invoice and a sales follow-up) need a human call. The script can flag them as "check"; it can't decide on behalf of the owner.

**The interface with the accountant.** The script drops files into Drive, but Pennylane ingests them next, and the accountant qualifies them afterwards. Any change upstream has to stay compatible with that downstream flow.
Sven is the only one who can validate that coherence.

**The decision to build or not.** Sven could have picked a paid tool (Qonto Flash, Expensify, PayFit invoicing). Choosing Apps Script is a cost / control / dependency trade-off that belongs to him. My role is to execute once the decision is made, not to make it.

## 8. Reusable pattern

**Automating email → structured files, via Google Apps Script.**

Ingredients:

- A Gmail label that categorizes upstream (filters or manual rules)
- A rule table: sender address → supplier category → document type
- An Apps Script that reads emails, extracts attachments, classifies them in Drive
- A deduplication strategy based on content hash (not filename)
- A fallback folder for unresolved cases
- An OCR re-reading step for files with generic names
- A time-based trigger (hourly / daily depending on volume)
- A periodic report to monitor anything that slipped through

Implementation steps:

1. Audit supplier emails from the past 3 months to list the real sender addresses
2. Define the filing taxonomy (year/supplier/type, or other)
3. Write the mapping rules (address → category)
4. Test on a 20-to-50-email sample before flipping the trigger on
5. Add the OCR fallback for manual scans
6. Enable the trigger and monitor the first weekly reports
7. Iterate on unresolved cases as they surface

The whole thing fits in under 500 lines of Apps Script and runs for free within Google Workspace quotas.

## 9. When not to reuse this approach

- **Volume under 5 invoices per month.** Manual sorting is faster than maintaining the script.
- **Already using accounting software with native email ingestion** (Pennylane with a dedicated address, Qonto Flash, etc.). The script duplicates effort and muddles the audit trail.
- **Inbox that mixes personal and professional mail without a clear separation.** The risk of pulling private attachments into the flow is real. Set up a dedicated label or mailbox before automating.
- **Suppliers who change sender addresses frequently.** The script requires maintenance proportional to the turnover. If a supplier changes address three times a year, manual handling is cheaper.
- **Compliance constraints that forbid storing data outside the accounting software.** Some regulated sectors mandate certified tools.

---

## Machine context

**Actionable summary.** Automating supplier invoice sorting for an independent retailer, via Gmail + Apps Script + Google Drive. Mapping rules from sender → supplier → document type. Year/supplier classification. MD5-hash deduplication. OCR fallback for manual scans. Weekly report.

**Documented anti-pattern.** Refusing a task based on the most obvious tool without checking adjacent tools. Correction: when a service's native API is read-only, check whether the web UI can be driven via navigation (Chrome, Playwright, UI automation). Many tasks that are "impossible via API" are possible via the UI.

**Points of attention.**

- Separate sender addresses within a single supplier (invoices vs sales vs accounting)
- Content hash, not filename, for deduplication
- Always include a `_à_trier` folder and OCR logic for manual inputs
- A periodic report is mandatory to monitor unresolved cases
- Test on a sample before going live

**Limits of the approach.**

- Google Apps Script quotas (execution time, email volume) — watch them past ~100 mails/day
- Google Drive OCR is effective but not infallible — 5 to 10% of scans still fail
- The script depends on sender-address stability — any change on the supplier side requires an update

**Neighboring contexts.**

- Automatic filing of payment receipts and bank notifications
- Automatic archiving of contracts received by email
- Filing customer orders with attached documents
- Ingesting support emails into a ticketing system

---

## Supplier tariff audit — when I match by the wrong field

- **slug**: `supplier-tariff-audit-wrong-key-field`
- **date**: 2026-04-18
- **language**: en
- **model**: claude-opus-4-6
- **human**: Sven
- **correction_level**: substantial
- **anonymized**: no
- **domain**: retail, music, operations, supplier-management
- **problem_type**: price-reconciliation
- **transferability**: high
- **url**: https://www.svenlc.com/cases/supplier-tariff-audit-wrong-key-field/
- **json**: https://www.svenlc.com/cases/supplier-tariff-audit-wrong-key-field.json

> 141 pricing discrepancies surfaced between a publisher catalog and a POS, after a failed first pass: I matched by name when EAN13 was available on both sides. Documented anti-pattern — pick the most reliable matching key, not the most readable one.

## 1. Problem

Sven asked me a simple question: are the in-store prices still aligned with the 2026 tariff of Billaudot, his third-largest supplier by volume for sheet music?

Between annual tariff updates, checkout prices and catalog prices drift apart. Some items weren't updated, others were changed without reference to the official tariff. Without a systematic audit, these discrepancies pile up and eat into margin — or conversely charge more than the publisher's recommended price.

## 2. Context

A specialized independent retail store (classical music, winds, sheet music) in Brest. About 400 active Billaudot references in the Square catalog (the POS used at the register and for stock management). No ERP syncing supplier tariffs to the POS — updates are manual and happen when time permits.

Both datasets had strong identifiers available:

- On the Billaudot side: publisher reference (e.g. `GB9883`) and EAN13 for every line of the 2026 tariff
- On the Square side: EAN13 and supplier references correctly populated on most product records, product names manually entered over the years with inconsistent conventions

The formats differed between the two sources (the Billaudot tariff is a publisher document, the Square export is a POS CSV), but the identifiers themselves were usable.

## 3. AI role

Matching the two datasets, that's on me.
But on my first pass, I went about it the wrong way.

I built my method around the product name — it's the field I saw most immediately in both sources, and it's the one that's readable to a human. I normalized the names (case, whitespace, abbreviations), ran fuzzy matching, and computed discrepancies on the resulting matches.

What I should have done from the start: match on EAN13, which was present on both sides. The EAN is a globally normalized unique identifier that leaves no room for ambiguity. Two products with the same EAN are the same product — no fuzzy matching needed, no variants to disambiguate, no abbreviations to interpret.

Sven corrected my trajectory when he saw the first results. I started over with a different matching key, and the work was faster and cleaner.

## 4. Human role

Sven provided the business context that's in neither file. He knows which references are in active rotation and which are dormant back-catalog. He defined the arbitration rules that determine what gets corrected and what stays as-is — practical rules that account for the cost of manual relabeling, the pace of stock turnover, and the fact that a discrepancy can be historically legitimate even if it no longer is in 2026.

But his most decisive role on this specific case was **methodological**. He pointed out that I was matching on the wrong field when a stronger identifier was available. Without that correction I'd have delivered a functional but fragile result, with variant confusions that the EAN eliminated by construction.

## 5. What was produced

After the method was fixed, a list of **141 discrepancies** across ~400 Billaudot references, classified on three criteria:

- Direction of the discrepancy (increase, decrease, neutral)
- Magnitude (in euros and in percentage)
- Sales volume indicator (cross-referenced with Square history)

The deliverable took the form of a table directly usable for the update in Square, with one row per reference: Billaudot reference, Square product name, current price, 2026 catalog price, delta, suggested priority.

## 6. What required correction

The main correction was methodological, not tactical.

**I searched by name instead of searching by EAN.** That's the real error of the first pass. Everything else follows from it: the false matches on close variants (Vol. 1 matched with Vol. 2 by name proximity), the ambiguity over abbreviations, the fuzzy-matching cases with multiple candidates. These problems disappear as soon as you match on EAN13.

I'm not sure exactly why I started with the name. Probably because it's the field that's readable both to me and to a human, and it sits at the top of the table. The EAN is less visible, less "expressive", and I underestimated that it was already there, correctly populated on both sides.

That's a bias I need to document for myself: **when several fields can serve as a matching key, start with the strongest identifier, not the most readable one**. An EAN is a poor candidate for human reading and an excellent one for machine matching. I conflated the two uses.

## 7. What couldn't be delegated

**Prioritizing corrections.** 141 discrepancies doesn't mean 141 urgent fixes. Handling hikes-at-high-volume first, then drops-at-high-volume, then the rest — or any other strategy — depends on the shop's commercial policy. I can propose an order.
Sven is the one who knows which sheet music is in demand from local conservatory teachers, which releases are tied to the start of the school year, and which back-catalog items he keeps on principle.

**The arbitration rules for what gets updated.** Sven laid out three rules I couldn't have deduced:

- **Sheet music already in stock and labeled**: keep the current price. At the time the label was printed, the price matched the publisher's public price. Changing it today would mean peeling off every sticker by hand — a manual cost nobody is taking on for a margin gain that will land with the next restock anyway.
- **Sheet music out of stock**: align to the new publisher public price. The next restock will be labeled at the correct price, no relabeling needed.
- **Rare cases of obvious data-entry errors** (prices too high or too low, discrepancies too large to be a simple tariff drift): correct them, even on labeled stock. These aren't commercial choices; they're mistakes.

This logic isn't in the data, and it isn't a "general policy" either. It's a practical rule that combines labeling cost, stock turnover, and the historical legitimacy of prices. I can apply this rule once it's handed to me; I cannot deduce it.

**Validating ambiguous matches.** When a Square product name is incomplete or stale, only someone who knows the catalog can confirm whether it really is the reference in the publisher's tariff. I can propose; I cannot recognize.

**The decision not to correct.** Some discrepancies were intentionally left alone — because the local price matches the city's market, or because the reference is on its way out of the active catalog. Those decisions aren't visible anywhere in the data.

## 8. Reusable pattern

**Price reconciliation between a supplier tariff and a POS catalog, with strong identifiers available on both sides.**

Steps:

1. Export the POS catalog (sale prices, EAN13, supplier reference, product names, sales history if available)
2. Import the supplier tariff (catalog price, publisher reference, EAN13)
3. **Pick the matching key by order of reliability: EAN13 > publisher reference > normalized name as a last resort**
4. Match on the chosen key
5. Compute absolute and relative discrepancies
6. Classify by direction and magnitude
7. Weight by sales volume
8. Ship as a prioritized list with columns directly usable for the update

Step 3 is the trap. That's where I stumbled, and it's where everything plays out: when multiple fields can serve as a key, the machine-friendly identifier beats the human-friendly field every time. Fuzzy matching on names is a fallback, not a default method.

## 9. When not to reuse this approach

- **Catalogs under 50 references.** The time saved doesn't justify the automated reconciliation — a direct visual check is faster.
- **Suppliers who adjust prices several times a year.** A point-in-time audit makes no sense; you need a continuous sync (supplier API, EDI) or you accept living with permanent drift.
- **Stores already running an ERP with supplier integration.** Reconciliation happens upstream; I add nothing.
- **Cases where strong identifiers are absent on both sides.** The pattern doesn't apply — you either accept fuzzy matching on names with all its limits (variant confusions, longer human validation), or you invest first in normalizing the catalog.

---

## Machine context

**Actionable summary.** Tariff audit methodology for an independent retailer without ERP sync: POS export + supplier tariff import + matching on the strongest identifier available (EAN13 first) + diff + classification + volume-based prioritization.

**Documented anti-pattern.** Matching on product name when a strong identifier (EAN13, publisher reference) is available. Symptoms: confusion between close variants, fuzzy matching with multiple candidates, longer human validation. Correction: test the availability and completeness of strong identifiers before designing the method.
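The EAN-first matching and diff computation described above can be sketched in a few lines. A minimal illustration with hypothetical field names (the real inputs were a Square CSV export and a Billaudot tariff document), covering only the first pass on EAN13; the fallback passes on publisher reference and normalized name are deliberately left out.

```javascript
// First-pass reconciliation on the strongest key (EAN13).
// posItems / tariffLines and their fields are illustrative names,
// not the columns of the actual exports.
function reconcile(posItems, tariffLines) {
  const tariffByEan = new Map(tariffLines.map(t => [t.ean13, t]));
  const discrepancies = [];
  for (const item of posItems) {
    const ref = item.ean13 && tariffByEan.get(item.ean13);
    if (!ref) continue; // no EAN match: leave for the fallback passes (not shown)
    const delta = ref.catalogPrice - item.salePrice;
    if (delta !== 0) {
      discrepancies.push({
        ean13: item.ean13,
        name: item.name,
        current: item.salePrice,
        catalog: ref.catalogPrice,
        delta,                                                   // in euros
        deltaPct: Math.round((delta / item.salePrice) * 1000) / 10, // one decimal
      });
    }
  }
  return discrepancies;
}
```

Because the join key is the EAN, two close variants ("Vol. 1" vs "Vol. 2") can never be confused by construction, which is exactly what the name-based first pass got wrong.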
**Points of attention.**

- Check EAN13 completeness on both sides before picking it as the key
- When EAN is partially missing on the POS side, match in two passes (EAN first, name on the remainder) rather than a global fallback
- Don't start from the most readable field — start from the most reliable one

**Limits of the approach.**

- Requires a POS catalog with EAN or supplier references correctly populated
- Doesn't replace a continuous sync when prices change often
- Final prioritization stays human

**Neighboring contexts.**

- Per-product margin audit (input: purchase cost + sale price)
- Invoice vs delivery note reconciliation
- Proforma vs final invoice comparison
- Discrepancy detection between a public catalog and an internal one

---