
Spatial Detection Analysis

This page documents the investigation into the combined detection capability of the library's two spatial analysis systems, the false positive characterisation, and the design decisions that resulted from it.


The two detection systems

Spatial pattern classifier (classifyPattern, called by analyzeWaferMap when enablePatternClassification is true) uses pure geometry — connected components, radial distance distributions, eccentricity, and linear scores — to label the whole-wafer failure signature as edge-ring, center, scratch, etc. It is rule-based with no trained model.
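
The connected-component step can be pictured with a short sketch. It assumes dies arrive as integer grid positions with a failing flag and that adjacency is 4-connected. The Die shape and the failingComponents name are illustrative, not the library's types, and classifyPattern may use a different connectivity rule.

```typescript
// Hypothetical die record: integer grid position plus a pass/fail flag.
interface Die {
  x: number;
  y: number;
  failing: boolean;
}

// 4-connected neighbourhood (assumption; 8-connectivity is the other common choice).
const NEIGHBOURS: ReadonlyArray<readonly [number, number]> = [
  [1, 0],
  [-1, 0],
  [0, 1],
  [0, -1],
];

// Groups failing dies into connected components, largest first.
function failingComponents(dies: Die[]): Die[][] {
  const key = (x: number, y: number): string => `${x},${y}`;
  const failing = new Map<string, Die>();
  for (const d of dies) {
    if (d.failing) failing.set(key(d.x, d.y), d);
  }

  const seen = new Set<string>();
  const components: Die[][] = [];
  for (const [startKey, start] of failing) {
    if (seen.has(startKey)) continue;
    // Stack-based flood fill from an unvisited failing die.
    seen.add(startKey);
    const stack: Die[] = [start];
    const component: Die[] = [];
    let d: Die | undefined;
    while ((d = stack.pop()) !== undefined) {
      component.push(d);
      for (const [dx, dy] of NEIGHBOURS) {
        const k = key(d.x + dx, d.y + dy);
        const n = failing.get(k);
        if (n !== undefined && !seen.has(k)) {
          seen.add(k);
          stack.push(n);
        }
      }
    }
    components.push(component);
  }
  return components.sort((a, b) => b.length - a.length);
}
```

Each component can then be measured (size, radial position, elongation) to build geometric features of the kind the classifier description lists.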

Statistical regional analysis (ring, quadrant, sector, cluster, edge-arc findings) compares specific zones of the wafer against the rest using significance tests. It runs independently of the classifier and produces its own findings.
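
As an illustration of a region-versus-rest comparison, here is a pooled two-proportion z-test, the test named later on this page. The one-sided alternative (region worse than the rest), the function names, and the Abramowitz and Stegun erf approximation used for the normal CDF are assumptions of this sketch, not details taken from the library.

```typescript
// Pooled two-proportion z-test: is the region's failure rate higher than the rest's?
// One-sided alternative (assumption); returns the z statistic and its p-value.
function regionVsRestZ(
  regionFails: number,
  regionTotal: number,
  restFails: number,
  restTotal: number,
): { z: number; pValue: number } {
  const p1 = regionFails / regionTotal;
  const p2 = restFails / restTotal;
  const pooled = (regionFails + restFails) / (regionTotal + restTotal);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / regionTotal + 1 / restTotal));
  const z = se === 0 ? 0 : (p1 - p2) / se;
  return { z, pValue: 1 - normalCdf(z) };
}

// Standard normal CDF via the Abramowitz and Stegun 7.1.26 erf approximation (|error| < 1.5e-7).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}
```

With many regions tested per wafer, each p-value still has to pass a multiple-comparison correction before a finding is reported.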

Both systems run together when you call analyzeWaferMap. When the classifier identifies a pattern, correlated regional findings (e.g. the ring finding that supports an edge-ring classification) are downgraded to info severity to avoid visual redundancy — the classifier finding is the headline, the regional findings are supporting evidence.


Benchmark dataset

All benchmarks were run against WM-811K — 25,519 labelled wafers from TSMC 300mm fabrication (Wu et al., IEEE Transactions on Semiconductor Manufacturing, 2015). This is the standard public dataset for wafer map pattern classification.

Scripts are in scripts/:

Script                            Purpose
run-benchmark-npz.mjs             Classifier-only per-class recall/precision
run-benchmark-combined.mjs        Combined classifier + regional detection rates
run-benchmark-fp.mjs              FP rate on WM-811K "Random"/"none" wafers
run-benchmark-synthetic-fp.mjs    FP rate on synthetic i.i.d. random wafers

Dataset source: /home/paul/projects/LSWMD.pkl → converted to tests/fixtures/wm811k-benchmark.npz via scripts/prepare-benchmark.py.


Detection rates

Classifier alone

Pattern     Recall  Notes
Near-full   100%
Edge-ring   74%
Edge-local  65%
Center      60%
Random      59%
Scratch     26%     Misses fragmented scratches
Donut       15%     Geometric overlap with center

Overall exact-match accuracy: 64% · Detection rate (any pattern flagged): 86.2%

Combined (classifier + regional analysis)

Of the 2,915 wafers the classifier missed, the regional analysis recovered:

Measure                       Rate
Any regional finding fired    92.9% of misses
Semantically matched finding  92.0% of misses

Combined detection rate: 99.0% (any) / 98.9% (semantically matched)
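
Because the rescue rates are measured only on wafers the classifier missed, the combined figures follow from the classifier's 86.2% detection rate. Here d_cls is the classifier-only detection rate and r_any, r_match are the rescue rates from the table above:

```latex
\begin{aligned}
d_{\text{any}}   &= d_{\text{cls}} + (1 - d_{\text{cls}})\, r_{\text{any}}   = 0.862 + 0.138 \times 0.929 \approx 0.990 \\
d_{\text{match}} &= d_{\text{cls}} + (1 - d_{\text{cls}})\, r_{\text{match}} = 0.862 + 0.138 \times 0.920 \approx 0.989
\end{aligned}
```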

The any/matched gap is small (92.9% vs 92.0%): when a regional finding fires on a classifier miss, it almost always matches the labelled pattern rather than being noise.

Per-label rescue breakdown (classifier misses only):

Label       Misses  Any rescue  Match rescue
center      690     98.8%       98.8%
donut       238     98.7%       98.3%
edge-local  940     93.1%       93.1%
edge-ring   488     81.1%       79.5%
scratch     559     93.0%       90.2%
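
The per-label rows can be pooled into the headline rescue rates by weighting each label's rate by its miss count. This is a consistency check that uses only the figures in the table; the RescueRow shape and the pooledRate name are illustrative.

```typescript
interface RescueRow {
  label: string;
  misses: number; // classifier misses for this label
  anyRescue: number; // share rescued by any regional finding
  matchRescue: number; // share rescued by a semantically matched finding
}

// Figures copied from the per-label table.
const rows: RescueRow[] = [
  { label: "center", misses: 690, anyRescue: 0.988, matchRescue: 0.988 },
  { label: "donut", misses: 238, anyRescue: 0.987, matchRescue: 0.983 },
  { label: "edge-local", misses: 940, anyRescue: 0.931, matchRescue: 0.931 },
  { label: "edge-ring", misses: 488, anyRescue: 0.811, matchRescue: 0.795 },
  { label: "scratch", misses: 559, anyRescue: 0.93, matchRescue: 0.902 },
];

// Miss-weighted average of a per-label rate.
function pooledRate(table: RescueRow[], rate: (row: RescueRow) => number): number {
  const missCount = table.reduce((sum, row) => sum + row.misses, 0);
  const rescued = table.reduce((sum, row) => sum + row.misses * rate(row), 0);
  return rescued / missCount;
}

const totalMisses = rows.reduce((sum, row) => sum + row.misses, 0); // 2915
const anyRescue = pooledRate(rows, (row) => row.anyRescue); // ≈ 0.929
const matchRescue = pooledRate(rows, (row) => row.matchRescue); // ≈ 0.920
console.log(totalMisses, anyRescue.toFixed(3), matchRescue.toFixed(3));
```

The miss counts sum to 2,915, matching the total quoted above, and the pooled rates reproduce the 92.9% and 92.0% headline figures.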

False positive characterisation

WM-811K "Random"/"none" wafers (N = 4,459)

Method                  FP rate
Classifier only         41.2%
Regional analysis only  87.7%
Combined                91.8%

Important caveat: WM-811K "Random" is a catch-all label; ambiguous, multi-modal, and low-confidence wafers all end up there, and some fraction genuinely have spatial structure. Flagging those wafers is not a true false positive, so this FP rate overstates the real one and is not a clean baseline.

Synthetic random wafers (i.i.d. Bernoulli, N = 500/cell)

To get a clean FP baseline, wafers were generated with purely random die failures at varying failure rates and grid sizes (20×20 to 52×52, matching WM-811K's range):

Fail rate  Regional FP rate
2%         13.9%
5%         15.1%
10%        55.9%  ← known issue
20%        2.6%
40%        2.2%
60%        3.7%
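
A synthetic wafer of this kind can be sketched as follows. The inscribed-circle mask, the seeded mulberry32 generator, and the function names are assumptions made for the illustration; the page does not say how run-benchmark-synthetic-fp.mjs lays out dies or seeds its randomness.

```typescript
interface Die {
  x: number;
  y: number;
  failing: boolean;
}

// Small deterministic PRNG (mulberry32) so benchmark runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// n×n grid masked to its inscribed circle; every die fails independently
// with probability failRate (i.i.d. Bernoulli), so there is no spatial structure.
function syntheticRandomWafer(n: number, failRate: number, rand: () => number): Die[] {
  const dies: Die[] = [];
  const c = (n - 1) / 2;
  const r = n / 2;
  for (let y = 0; y < n; y++) {
    for (let x = 0; x < n; x++) {
      if ((x - c) ** 2 + (y - c) ** 2 > r * r) continue; // outside the wafer edge
      dies.push({ x, y, failing: rand() < failRate });
    }
  }
  return dies;
}
```

On a 52×52 grid this mask keeps roughly π·26² ≈ 2,100 dies, in line with the large-wafer die counts discussed below. Any regional finding raised on such a wafer is, by construction, a false positive.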

The 10% failure-rate spike is a fundamental statistical power problem: with 1,700+ dies on a large wafer, the two-proportion z-test is powerful enough to flag chance clustering in a region as significant. It cannot be fixed with post-hoc count gates, because any gate that reduces the number of candidates fed into the Bonferroni correction also relaxes the correction, cancelling the benefit. The correct fix (replacing Bonferroni with Benjamini-Hochberg FDR control applied globally) is a larger refactor, so this spike is documented as a known limitation.
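
For reference, the two multiple-testing corrections mentioned above can be written in a few lines as they are standardly defined. This is a sketch of the textbook procedures, not the library's implementation, and the function names are hypothetical.

```typescript
// Bonferroni: reject hypothesis i when p_i <= alpha / m (controls the family-wise error rate).
function bonferroni(pValues: readonly number[], alpha: number): boolean[] {
  const m = pValues.length;
  return pValues.map((p) => p <= alpha / m);
}

// Benjamini-Hochberg step-up: find the largest rank k (1-based, ascending p) with
// p_(k) <= (k / m) * alpha, then reject the k smallest p-values (controls the FDR).
function benjaminiHochberg(pValues: readonly number[], alpha: number): boolean[] {
  const m = pValues.length;
  const ranked = pValues
    .map((p, index) => ({ p, index }))
    .sort((a, b) => a.p - b.p);
  let k = 0;
  ranked.forEach(({ p }, i) => {
    if (p <= ((i + 1) / m) * alpha) k = i + 1;
  });
  const reject = new Array<boolean>(m).fill(false);
  for (const { index } of ranked.slice(0, k)) reject[index] = true;
  return reject;
}
```

For example, with p-values 0.005, 0.015, 0.025, 0.035 and 0.2 at α = 0.05, Bonferroni (threshold 0.01) rejects only 0.005, while Benjamini-Hochberg rejects the first four because its threshold (k/5)·0.05 rises with rank k. In both procedures m, the number of candidate tests, sets the thresholds, which is why gating candidates changes the correction.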


Comparison to published methods

Method                          Score on WM-811K                        Notes
Our classifier (exact match)    64%                                     Rule-based, no training
Our system (any detection)      86.2% classifier only / 99.0% combined
Decision tree + Radon features  >98%                                    59 handcrafted features, trained ensemble
CNN-based (various)             96–99.9%                                Trained on balanced/oversampled subsets

Our system is not directly comparable to CNN figures — they solve a different problem: they require labelled training data, operate on pixel images, and typically evaluate on class-balanced subsets. Our system works from die coordinates and bin data with no training, produces fully interpretable named findings, and functions on any wafer regardless of diameter or die pitch.

The most relevant comparison is the Decision Tree + Radon transform approach. The accuracy gap (64% exact match vs >98%) is concentrated in the hard classes (donut at 15% recall, scratch at 26%), which benefit from global projection information: the Radon transform sums failures along lines at every angle, capturing structure that local geometry cannot.

Our distinctive strength is the combination: the 99% combined detection rate is competitive for detection, and because the classifier is integrated with statistical regional analysis (which the CNN approaches cited above do not provide), the library surfaces both a pattern label and statistical evidence for each finding. That gives process engineers more actionable information than a bare classification.


Design decisions

Adaptive thresholds

minimumClusterSize scales with the wafer's die count N as max(5, round(N × 0.003)). A 5-die cluster is meaningful on a small wafer but noise on a 2,500-die wafer. This is the only regional-analysis threshold adapted at runtime. The other regional thresholds (minimumSampleSize, significanceLevel) must not be adapted, because changing the number of tests fed into the Bonferroni correction alters the correction itself and produces unpredictable changes in the FP rate.

The classifier's minimumFailingDies and salienceSize floor also scale by the same formula, for the same reason.
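
The scaling rule fits in one function. The name scaledMinimum is hypothetical, and the sketch assumes round means Math.round (halves round up); the page gives only the formula.

```typescript
// Die-count-scaled floor: max(5, round(N × 0.003)).
// Per the text, used for minimumClusterSize and the classifier's
// minimumFailingDies / salienceSize floor.
function scaledMinimum(dieCount: number): number {
  return Math.max(5, Math.round(dieCount * 0.003));
}
```

The floor stays at 5 up to 1,833 dies, first reaches 6 at 1,834 dies, and grows to 30 on a 10,000-die wafer.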

Removed from public API

The following were removed from AnalyzeWaferMapOptions to prevent users from inadvertently degrading accuracy:

  • minimumSampleSize — interacts with Bonferroni; user adjustment produces unpredictable FP rate changes
  • minimumClusterSize — now auto-scaled; no user knob needed
  • patternThresholds / PatternThresholds / DEFAULT_PATTERN_THRESHOLDS — 17 inter-dependent thresholds calibrated on 25k wafers; adjusting one without understanding the others silently degrades classification accuracy

When the classifier identifies a pattern with high/medium confidence, correlated regional findings are downgraded to info to avoid double-counting. The mapping was updated to include sector and quadrant for edge-local and scratch — both patterns can produce a sector or quadrant finding when the failure is sufficiently localised.
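
The downgrade rule can be sketched as a lookup from classified pattern to correlated regional finding kinds. Only the entries this page confirms are filled in (ring for edge-ring; sector and quadrant for edge-local and scratch); other patterns are left unmapped because the page does not list them. The type names, the non-info severity levels, and the function name are hypothetical.

```typescript
// Hypothetical types; only "info" severity and high/medium confidence come from the text.
type Pattern = "edge-ring" | "edge-local" | "center" | "donut" | "scratch" | "near-full" | "random";
type RegionKind = "ring" | "quadrant" | "sector" | "cluster" | "edge-arc";
type Severity = "info" | "warning" | "critical";
type Confidence = "high" | "medium" | "low";

interface RegionalFinding {
  kind: RegionKind;
  severity: Severity;
}

// Pattern -> regional finding kinds treated as supporting evidence.
const CORRELATED: Partial<Record<Pattern, readonly RegionKind[]>> = {
  "edge-ring": ["ring"],
  "edge-local": ["sector", "quadrant"],
  scratch: ["sector", "quadrant"],
};

// Returns a copy of the findings with correlated ones downgraded to "info"
// when the classifier's confidence is high or medium.
function downgradeCorrelated(
  pattern: Pattern,
  confidence: Confidence,
  findings: RegionalFinding[],
): RegionalFinding[] {
  if (confidence !== "high" && confidence !== "medium") return findings;
  const correlated: readonly RegionKind[] = CORRELATED[pattern] ?? [];
  return findings.map(
    (f): RegionalFinding => (correlated.includes(f.kind) ? { ...f, severity: "info" } : f),
  );
}
```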

Donut improvement — not feasible with current features

innerOuterRatio was investigated as a co-discriminator for center vs donut. In the WM-811K dataset, donuts have higher inner/outer failure-rate ratios than centers. This is counter-intuitive, but it follows from the donut hole geometry: the donut ring sits in the outer half of the wafer, and the outer half contains far more dies than the inner circle, so the outer failure rate is diluted. The feature is therefore not a reliable discriminator, and the 15% donut recall ceiling is intrinsic to the overlap between the two classes in p25DistNorm.
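
Both features can be sketched under assumed definitions, since the page does not define them: normalised radius is distance from the wafer centre divided by the wafer radius, "inner" means a normalised radius below 0.5, and p25DistNorm is the 25th percentile of failing dies' normalised radii (lower index, no interpolation). The RadialFeatures shape and the radialFeatures name are hypothetical.

```typescript
interface Die {
  x: number;
  y: number;
  failing: boolean;
}

interface RadialFeatures {
  innerOuterRatio: number; // inner failure rate / outer failure rate
  p25DistNorm: number; // 25th percentile of failing dies' normalised radius
}

function radialFeatures(
  dies: Die[],
  cx: number,
  cy: number,
  waferRadius: number,
  innerCutoff = 0.5,
): RadialFeatures {
  let innerFail = 0;
  let innerTotal = 0;
  let outerFail = 0;
  let outerTotal = 0;
  const failRadii: number[] = [];

  for (const d of dies) {
    const r = Math.hypot(d.x - cx, d.y - cy) / waferRadius;
    if (r < innerCutoff) {
      innerTotal++;
      if (d.failing) innerFail++;
    } else {
      outerTotal++;
      if (d.failing) outerFail++;
    }
    if (d.failing) failRadii.push(r);
  }

  const innerRate = innerTotal > 0 ? innerFail / innerTotal : 0;
  const outerRate = outerTotal > 0 ? outerFail / outerTotal : 0;
  failRadii.sort((a, b) => a - b);
  const idx = Math.floor(0.25 * (failRadii.length - 1));

  return {
    // Infinity if only inner dies fail; NaN if no die fails.
    innerOuterRatio: outerRate > 0 ? innerRate / outerRate : innerRate > 0 ? Infinity : NaN,
    p25DistNorm: failRadii.length > 0 ? (failRadii[idx] ?? NaN) : NaN,
  };
}
```

With a cutoff of 0.5, the outer annulus covers three quarters of the wafer area and so holds roughly three times as many dies as the inner disc; the same number of failing dies therefore yields an outer rate about a third of the inner one. That is the dilution effect described above.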