Dr J's Binding Protocol — Document 19

Troubleshooting Protocol

Mandatory diagnostic procedures for debugging, fixing, and deploying changes to production systems. Every fix follows this protocol. No exceptions.

Contents

  1. The Prime Directive
  2. Pre-Fix Diagnostic Sequence (Mandatory)
  3. Data Flow Tracing
  4. The Three-Tier Verification
  5. Field Name Reconciliation
  6. When a Fix Doesn't Work
  7. Forbidden Assumptions
  8. Pre-Deploy Verification
  9. Post-Deploy Verification
  10. Escalation Protocol
  11. Case Studies: Real Failures

1. The Prime Directive

NEVER change code based on assumptions about how data flows. READ THE CODE FIRST. Every time. No exceptions.

Every troubleshooting failure in this system has followed the same pattern:

  1. Assumption made about where data comes from or what field names are used
  2. Code changed based on that assumption without reading the actual source
  3. Something breaks because the assumption was wrong
  4. Wrong diagnosis of the breakage (blaming caching, database, etc.) instead of re-reading the code
  5. More changes based on more assumptions, compounding the original error

This protocol exists to break that cycle at step 1. You do not touch code until you have read and understood the complete data flow from source to render.

Companion: Doc 20 — Code Confidence Protocol. Before editing any code, also check the confidence annotations (@confidence GREEN/YELLOW/RED) and the site's CODE-CONFIDENCE.md registry. GREEN code requires full diagnostic proof before touching. Doc 19 tells you how to diagnose. Doc 20 tells you what you're looking at and what's at stake.

2. Pre-Fix Diagnostic Sequence (Mandatory)

Before writing a single line of code to fix any bug, complete all five steps in order. Do not skip steps. Do not combine steps. Each step produces a concrete finding that informs the next.

Step 1: Identify the Symptom Location

What page/endpoint shows the problem? What file renders it? Open and read that file. Not "I think it's this file": actually read it.

Step 2: Trace the Data Source

In the rendering file, find exactly where the data comes from. Is it a fetch() call to an API? A server-rendered PHP variable? A <script src="..."> JS file? Read the code; don't guess.

Step 3: Read the Data Source

Open the API endpoint / JS file / PHP template that produces the data. Read the SELECT query or the object construction. Write down the exact field names it returns.

Step 4: Match Fields to Rendering

Compare the field names from step 3 to the field names used in the rendering code from step 1. They must match exactly. retail_price_cents is NOT priceCents. full_description is NOT description.

Step 5: Identify the Actual Bug

Only now, with full knowledge of the data flow and field names, can you identify what's actually wrong. Write down the bug in one sentence before touching any code.

Gate check: If you cannot state the bug in one sentence that references specific field names and specific files, you have not completed the diagnostic sequence. Go back to step 1.
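The field comparison in step 4 can be partly mechanised. The following is a minimal sketch, assuming the rendering code reads fields through a single accessor variable (here `p`, as in `p.priceCents`). The function name, the regex approach, and the sample snippet are illustrative assumptions, not part of this system's tooling.

```python
import re


def find_field_mismatches(source_fields, rendering_code, accessor="p"):
    """Return field names the rendering code reads that the data source does not return.

    source_fields: the exact field names written down in step 3.
    rendering_code: the text of the rendering file from step 1.
    accessor: the variable the rendering code reads fields from (assumed to be `p`).
    """
    # Match property reads such as p.priceCents, but skip method calls such as p.toString(
    pattern = re.compile(r"\b" + re.escape(accessor) + r"\.([A-Za-z_]\w*)\b(?!\s*\()")
    referenced = set(pattern.findall(rendering_code))
    return sorted(referenced - set(source_fields))


# Hypothetical rendering snippet and API field list, for illustration only
rendering = (
    "price.textContent = p.priceCents ? fmt(p.priceCents) : 'Contact for Price'; "
    "title.textContent = p.name;"
)
api_fields = ["name", "retail_price_cents", "short_description"]

print(find_field_mismatches(api_fields, rendering))  # ['priceCents']
```

A non-empty result is a candidate for the one-sentence bug statement. An empty result does not replace reading the code, because a regex cannot see fields accessed through other variables or bracket notation.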

3. Data Flow Tracing

Every page in this system gets its data from one of these sources. You must identify which one BEFORE making changes.

Data Source | How to Identify | Field Names Come From
API fetch | fetch('...api/endpoint.php') in JS | The PHP file's SELECT query column aliases
Static JS file | <script src="js/products-data.js"> | The PHP generator that builds the JS file
Server-rendered PHP | PHP variables embedded in HTML (<?= $var ?>) | The PHP query at the top of the same file
Inline JSON | JSON.parse() of embedded <script> data | The PHP that generates the JSON block

Critical: Pages Can Use Multiple Sources

A single page may use DIFFERENT data sources for different parts of the page. For example, shop.html fetches from the API, while product.php reads from both the API and products-data.js.

These sources use DIFFERENT field names for the same data. The API returns retail_price_cents. The JS file has priceCents. You cannot use replace_all to swap one for the other without understanding which source each piece of code reads from.
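As a first pass before reading a page in full, its source can be scanned for the identifying markers listed in the table above. This is a rough sketch: the patterns are heuristic assumptions derived from the "How to Identify" column, and the sample page is hypothetical.

```python
import re

# One heuristic marker per data source (assumed patterns, not exhaustive)
SOURCE_PATTERNS = {
    "API fetch": re.compile(r"fetch\(\s*['\"`][^'\"`]*\.php"),
    "Static JS file": re.compile(r"<script[^>]*\bsrc=['\"][^'\"]*\.js"),
    "Server-rendered PHP": re.compile(r"<\?="),
    "Inline JSON": re.compile(r"JSON\.parse\("),
}


def detect_data_sources(page_source):
    """Return the name of every data source whose marker appears in the page."""
    return [name for name, pattern in SOURCE_PATTERNS.items() if pattern.search(page_source)]


# Hypothetical page that loads the static JS file and also calls the API
page = """
<script src="js/products-data.js"></script>
<script>
  fetch('/api/public-products.php?action=list').then(r => r.json());
</script>
"""

print(detect_data_sources(page))  # ['API fetch', 'Static JS file']
```

More than one hit means the page mixes sources, and each block of rendering code has to be traced to its own source. A hit only shows where to start reading; it does not show which code consumes which source.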

4. The Three-Tier Verification

When a product data issue occurs, check all three tiers in order. The bug is wherever the data diverges from what's expected.

Tier | Table | How to Check | What Lives Here
1. Portal | portal_products | Direct DB query or Portal UI | Source of truth: data is born here
2. Hub | thr_products | Hub admin API or DB query | Distribution copy: admin APIs read here
3. Site DB | ths_products, pc_products | Site public API or DB query | Website reads: ALL public pages read here

Tier Verification Steps

  1. Query the product in portal_products — is the data correct?
  2. Query the product in thr_products — does it match portal? If not, the sync is broken.
  3. Query the product in ths_products (or pc_products) — does it match Hub? If not, deploy is broken.
  4. Call the public API — does it return correct data? If not, the API query is broken.
  5. Check the rendered page — does it display the API data correctly? If not, the rendering code is broken.

The bug is at whichever step the data goes wrong. Fix THAT step. Do not change downstream code to compensate for an upstream failure.
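The "first point of divergence" rule can be sketched as a small function. This assumes the value at each stage has already been collected by hand (DB query, curl, page inspection). The stage labels follow the steps above; the function name and the sample readings are hypothetical.

```python
# What a divergence at each stage implies, per the verification steps.
# The first entry is an inference: step 1 only asks whether the portal data is correct.
BROKEN_COMPONENT = {
    "portal_products": "source data in the Portal",
    "thr_products": "sync",
    "ths_products": "deploy",
    "public API": "API query",
    "rendered page": "rendering code",
}


def first_divergence(expected, observations):
    """Return the first stage whose value differs from expected, or None.

    observations: (stage_name, value) pairs in pipeline order.
    """
    for stage, value in observations:
        if value != expected:
            return stage
    return None


# Hypothetical readings of one product's price in cents at each step
observations = [
    ("portal_products", 1999),
    ("thr_products", 1999),
    ("ths_products", None),  # NULL in the site table
    ("public API", None),
    ("rendered page", None),
]

stage = first_divergence(1999, observations)
print(stage, "->", BROKEN_COMPONENT[stage])  # ths_products -> deploy
```

Everything after the first divergent stage is wrong only because its input is wrong, which is why the fix belongs at that stage and not downstream.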

5. Field Name Reconciliation

This system has two parallel naming conventions for product data. They are NOT interchangeable.

API Response (public-products.php) | Static JS (products-data.js) | Notes
retail_price_cents | priceCents | API uses SQL column name; JS uses camelCase
short_description | description | JS file uses short_description with fallback
category_type | categoryType | camelCase in JS
price_type | priceType | camelCase in JS
compare_price_cents | (not included) | Only in API response

NEVER use replace_all to swap field names across an entire file unless you have verified that the file uses exactly one data source. If the file mixes API data and JS data, a blanket replacement will break one of them.
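One quick signal that a file mixes sources is that it references names from both conventions. The sketch below uses only the field names listed in the table above; the function names and sample strings are hypothetical, and the check is a heuristic that supplements reading the file.

```python
import re

# Field names from the reconciliation table; compare_price_cents has no JS counterpart
API_NAMES = {"retail_price_cents", "short_description", "category_type",
             "price_type", "compare_price_cents"}
JS_NAMES = {"priceCents", "description", "categoryType", "priceType"}


def conventions_used(file_text):
    """Report which known field names from each convention appear in the file."""
    identifiers = set(re.findall(r"[A-Za-z_]\w*", file_text))
    return {"api": sorted(identifiers & API_NAMES), "js": sorted(identifiers & JS_NAMES)}


def replace_all_is_risky(file_text):
    """True when both conventions appear, so a blanket swap could break one source."""
    used = conventions_used(file_text)
    return bool(used["api"]) and bool(used["js"])


# Hypothetical file contents
mixed = "const a = p.retail_price_cents; const b = PRODUCTS[0].priceCents;"
single = "const a = p.retail_price_cents; const t = p.category_type;"

print(replace_all_is_risky(mixed))   # True
print(replace_all_is_risky(single))  # False
```

A `False` result is not proof of a single data source: `description` is a common word that can match unrelated identifiers, and a file may read a source without touching any of these fields. The verification still has to come from reading the code.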

Before Any Field Name Change

6. When a Fix Doesn't Work

When the user says "that didn't fix it" — do not guess at the cause. Follow the diagnostic sequence below to determine whether the problem is in code, data, deployment, or caching. Let the evidence lead you.

Mandatory Response Sequence

Work through these steps in order. Each step either identifies the problem or eliminates a possibility. Do not skip ahead.

  1. Re-read the file you changed: verify your edit is actually present and correct in the local file.
  2. Check the served version: curl the production URL to confirm the deployed code matches your edit. If it doesn't match, the deploy didn't land, and that's the problem.
  3. Test the data endpoint: curl the API or JS file to verify the data is correct. If data is wrong, the problem is upstream (API query, deploy, sync).
  4. Compare field names: does the rendering code reference the exact field names the data source returns? If there's a mismatch, that's the bug. Fix the rendering code.
  5. If steps 1–4 all check out (the server is returning correct code with correct data and correct field names), NOW caching is a legitimate suspect. Proceed to the Cache Diagnostic below.
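The "did the deploy land" check from step 2 can be scripted for static files by comparing digests. This is a sketch under assumptions: the helper names are hypothetical, and it applies only to files served verbatim (HTML, JS, CSS), since a PHP file is executed by the server and its output will never match its source.

```python
import hashlib
import urllib.request


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def fetch_served(url: str) -> bytes:
    """Fetch what production currently serves, asking intermediaries not to use a cached copy."""
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def deploy_landed(local_content: bytes, served_content: bytes) -> bool:
    """True when the served bytes are identical to the local file."""
    return digest(local_content) == digest(served_content)


# In-memory illustration; in practice served_content would come from fetch_served(url)
local = b"const price = p.retail_price_cents;"
stale = b"const price = p.priceCents;"

print(deploy_landed(local, local))  # True
print(deploy_landed(local, stale))  # False
```

In use, `local` would be read from the edited file on disk and compared against `fetch_served(...)` for the matching production URL. A mismatch means the deploy did not land, which settles the question before caching is ever considered.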

Cache Diagnostic

Caching is a real issue that has caused real problems in this system. But it can only be diagnosed through evidence, not assumption. If steps 1–4 above all pass (server is correct), work through these questions:

  1. What are the Cache-Control headers? curl -I the resource. Long max-age values (e.g., 604800 = 7 days) on JS/CSS files mean browsers WILL serve stale versions.
  2. Did the filename change? Cache-busting via query string (?v=123) or filename change (products-data-v2.js) forces browsers to fetch fresh. Was this done?
  3. Is there a CDN or proxy layer? Nginx, Cloudflare, or other reverse proxies may cache independently of the browser.
  4. Can you reproduce with a fresh request? curl with a Cache-Control: no-cache header, or test in a private/incognito window.

When caching IS the problem: Confirm it with evidence (server returns correct content, browser shows old content, cache headers explain why), then apply the appropriate fix: cache-busting query string, filename change, or header adjustment. Tell the user what you found and why you're confident it's cache.
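To turn a Cache-Control header into evidence, the max-age value can be extracted and converted to days. A minimal sketch; the function name is hypothetical and the header value is the one quoted in this section.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


header = "public, max-age=604800"
seconds = max_age_seconds(header)

print(seconds, seconds / 86400)  # 604800 7.0
print(max_age_seconds("no-cache"))  # None
```

A value measured in days on a file whose name did not change is the kind of evidence that supports a cache diagnosis, provided the server-side checks have already passed.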

What NOT to Do

Violation: Jumping to Caching Without Evidence

"It must be a browser cache issue. Try clearing your cache." — This skips the entire diagnostic sequence. If the code is actually wrong, telling the user to clear cache wastes their time and erodes trust. Caching is a valid diagnosis only AFTER you've confirmed the server-side code is correct.

Violation: Repeating "Cache" After User Pushes Back

If the user says "it's not caching" or "I checked in three browsers" — that is strong evidence against caching. Go back to steps 1–4 and re-examine the code. The user is closer to the symptom than you are.

Correct: Evidence-Based Diagnosis

"I curl'd the production shop.html and the served code reads p.priceCents, but the API returns retail_price_cents. The field name doesn't match the data source. That's the bug — not caching." — This is diagnosis backed by evidence.

Correct: Evidence-Based Cache Diagnosis

"I curl'd the production JS file and it has the fix. The Cache-Control header is max-age=604800 (7 days). Your browser is serving a cached copy from before the fix. Changing the filename from products-data.js to products-data-v2.js will force a fresh fetch." — This is a cache diagnosis backed by evidence.

7. Forbidden Assumptions

These assumptions have caused production failures. They are explicitly prohibited. When you catch yourself making one, stop and verify.

Forbidden Assumption | Reality | How to Verify
"This page uses products-data.js" | shop.html uses the API; product.php uses both | Read the <script> block and find the fetch() call
"All files use the same field names" | API uses snake_case; JS uses camelCase | Read the API response AND the JS generator
"The column names match across all site tables" | pc_products and ths_products have different schemas | Run SHOW COLUMNS FROM {table}
"It's a caching issue" (without evidence) | Caching is real but requires evidence; complete steps 1–4 of Section 6 first | curl the URL, verify server content is correct, THEN check Cache-Control headers
"replace_all is safe for this change" | It breaks when a file uses multiple data sources | Count data sources in the file BEFORE using replace_all
"This fix only affects one thing" | Changing a field name affects every reference in the file | Search for ALL occurrences before editing

8. Pre-Deploy Verification

Before running devhub-deploy push, complete this checklist:

The cost of one extra minute of verification is zero. The cost of a broken production deploy is the user's trust and revenue.

9. Post-Deploy Verification

After deploying, verify the fix is working on production. Do not rely on "the deploy succeeded" as proof.

Mandatory Post-Deploy Checks

  1. curl the API: Verify the endpoint returns correct data with correct field names
  2. curl the page: Verify the HTML/JS contains the correct rendering code (not the old broken version)
  3. Spot-check a product: Pick a specific product, verify its price/data appears correctly in the API response
  4. Test the rendering logic: Mentally trace one product through the rendering code — would it display the price or "Contact for Price"?

    # Post-deploy verification example

    # 1. Check API returns prices
    curl -s 'https://thehighroadmanufacturing.com/api/public-products.php?action=list' | \
      python3 -c "import json,sys; d=json.load(sys.stdin); \
    print(f'Priced: {sum(1 for p in d[\"products\"] if p.get(\"retail_price_cents\"))}'); \
    print(f'Contact: {sum(1 for p in d[\"products\"] if not p.get(\"retail_price_cents\"))}')"

    # 2. Check page uses correct field name
    curl -s 'https://thehighroadmanufacturing.com/shop.html' | grep -o 'retail_price_cents\|priceCents'

    # 3. Check a specific product
    curl -s 'https://thehighroadmanufacturing.com/api/public-products.php?action=get&slug=test-product' | \
      python3 -c "import json,sys; p=json.load(sys.stdin); print(p.get('name'), p.get('retail_price_cents'))"

10. Escalation Protocol

When a fix has failed twice, stop making more changes. The pattern of "try something, it breaks, try something else" is how small bugs become site-wide outages.

After Two Failed Fix Attempts

  1. Stop editing files. Do not make a third attempt without completing the full diagnostic sequence from Section 2.
  2. Read every file in the data chain from database query to rendered output.
  3. Write down the complete data flow with exact field names at each stage.
  4. Identify the exact point of failure — which field name, in which file, doesn't match its data source?
  5. State the fix in one sentence before writing any code.
  6. If still unclear, ask the user for more information about what they're seeing. Do not guess.
Three consecutive failed fixes on the same issue is a protocol violation. If you reach this point, the diagnostic sequence was not properly completed. Go back to Section 2 and start from step 1 with fresh eyes.

11. Case Studies: Real Failures

These are real incidents from this system. They are documented here so the same mistakes are never repeated.

Case 1: The Price Display Disaster (March 2026)

Symptom

Many products on THR showed "Contact for Price" instead of actual prices.

Root Cause

The ths_products table has both price_cents and retail_price_cents columns. Deploy was writing to price_cents but the API was reading retail_price_cents, which was NULL.

Correct Fix

Add COALESCE(retail_price_cents, price_cents) AS retail_price_cents to the API query. Also fix deploy to populate retail_price_cents.
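The effect of the COALESCE fallback can be seen in isolation. The sketch below uses an in-memory SQLite table as a stand-in for the real database; only the table name and the two price columns come from this case, while the slug column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ths_products (slug TEXT, price_cents INTEGER, retail_price_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO ths_products VALUES (?, ?, ?)",
    [
        ("widget-a", 1999, None),  # deploy wrote price_cents only
        ("widget-b", 2499, 2999),  # both populated: retail_price_cents wins
        ("widget-c", None, None),  # no price at all
    ],
)

# COALESCE returns its first non-NULL argument, so price_cents is used only as a fallback
rows = conn.execute(
    "SELECT slug, COALESCE(retail_price_cents, price_cents) AS retail_price_cents "
    "FROM ths_products ORDER BY slug"
).fetchall()

print(rows)  # [('widget-a', 1999), ('widget-b', 2999), ('widget-c', None)]
```

Because the alias keeps the name retail_price_cents, the field name the API returns does not change, so rendering code that already reads retail_price_cents needs no edit.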

What Went Wrong

After the correct API fix, an incorrect assumption was made that shop.html reads from products-data.js. A replace_all changed retail_price_cents to priceCents in shop.html. But shop.html actually fetches from the API, which returns retail_price_cents. Result: every single product showed "Contact for Price" because p.priceCents was undefined.

Compounding Errors

  1. When the user reported it wasn't fixed, caching was blamed instead of re-reading the code
  2. The JS file was renamed to bust cache — irrelevant since shop.html doesn't use the JS file for rendering
  3. Multiple deploy cycles pushed broken code to production
  4. The user explicitly said "this is not a caching issue" and was ignored

Protocol Violations

Cost

Approximately 2 hours of downtime for pricing on a live e-commerce site. Multiple production deploys of broken code. Complete loss of user trust in the troubleshooting process.

Case 2: The Deploy Column Mismatch (March 2026)

Symptom

Deploy to PC site failed: "Unknown column 'client_id' in field list."

Root Cause

deploy.php had a hardcoded column list matching ths_products schema. pc_products has a different schema (no client_id, no mfg_* columns).

Correct Fix

Dynamic column detection via SHOW COLUMNS FROM {table}. Build field map, filter to columns that exist in the target table.
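The idea of filtering a record to the columns the target table actually has can be sketched as follows. This is not the code in deploy.php: it uses SQLite's PRAGMA table_info as a stand-in for SHOW COLUMNS, and the pc_products schema shown is hypothetical apart from the absence of client_id.

```python
import sqlite3


def table_columns(conn, table):
    """List a table's column names (SQLite stand-in for SHOW COLUMNS FROM {table}).

    The table name is interpolated directly, so it must come from a trusted list.
    """
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def filter_to_table(record, columns):
    """Keep only the fields that exist as columns in the target table."""
    return {key: value for key, value in record.items() if key in columns}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pc_products (slug TEXT, name TEXT, price_cents INTEGER)")

# A source record carrying a column the target table does not have
record = {"slug": "widget-a", "name": "Widget A", "price_cents": 1999, "client_id": 7}

row = filter_to_table(record, table_columns(conn, "pc_products"))
placeholders = ", ".join("?" for _ in row)
conn.execute(
    f"INSERT INTO pc_products ({', '.join(row)}) VALUES ({placeholders})",
    list(row.values()),
)

print(sorted(row))  # ['name', 'price_cents', 'slug']
```

With the column list derived from the target table at deploy time, a schema difference drops the extra field instead of raising an "Unknown column" error. Silently dropped fields are worth logging so that a missing column is noticed.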

What Should Have Been Done First

Before writing the hardcoded column list originally, the schema of ALL target tables should have been compared. The assumption "all site tables have the same columns" should have been verified.