Paper 14029-10

Synthetic sampling strategy of real-world datasets for optimization of credit card fraud detection via gated temporal attention networks

27 April 2026 • 2:10 PM - 2:30 PM EDT | National Harbor 7

Abstract

Credit card fraud exceeded $10 billion in U.S. losses in 2023, yet fewer than 5% of transactions carry verified fraud labels. The Gated Temporal Attention Network (GTAN; Xiang et al., AAAI 2023) addresses class imbalance via semi-supervised label propagation over directed temporal transaction graphs. This paper evaluates three sampling strategies for GTAN on IEEE-CIS, benchmarked against S-FFSD: (A) stride-based fraud-preserving downsampling, (B) frontloaded fraud concentration sampling, and (C) pre-context window sampling retaining the rows immediately preceding each fraud event. Data partitioning uses a standard stratified random split that preserves fraud concentration across train, validation, and test sets; temporal ordering is enforced separately during graph construction by sorting transactions within each entity group. These mechanisms are decoupled (the split boundary is agnostic to time), a distinction with material consequences for interpreting sampling strategy results. Full threshold sweeps across all groups reveal distinct precision–recall profiles: Group A maintains a near-flat precision plateau of 95–97% from τ = 0.25 to 0.50 (AUC-ROC = 0.8428, Precision = 0.9755, Recall = 0.5976, F1 = 0.7412 at τ = 0.50), with crossover at τ ≈ 0.18; Groups B and C reach precision ceilings of only 58% and 54% respectively, reflecting the impact of sampling strategy on GTAN score calibration. Epoch depth matters in the transition zone: at τ = 0.20, training to 25 epochs reduces false positives by 48% relative to 15 epochs (689 → 359) with negligible recall loss. Group B requires 25% fraud concentration for its first non-zero recall; Group C reaches its inflection at 20%, five percentage points lower, supporting the hypothesis that authentic temporal predecessors strengthen neighborhood risk propagation.
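As an illustration of strategy (C), the following is a minimal sketch of pre-context window sampling, not the authors' implementation. The abstract states only that the rows immediately preceding each fraud event are retained; the function name `pre_context_indices`, the `window` parameter, and the assumption that rows are already in time order are hypothetical choices made for this example.

```python
from typing import List, Sequence


def pre_context_indices(labels: Sequence[int], window: int) -> List[int]:
    """Return sorted indices of rows to keep: each fraud row (label 1)
    plus up to `window` rows immediately before it.

    Assumes `labels` is already in time order (an assumption of this sketch).
    """
    keep = set()
    for i, y in enumerate(labels):
        if y == 1:
            start = max(0, i - window)  # clip at the start of the data
            keep.update(range(start, i + 1))
    return sorted(keep)


# Toy, time-ordered labels (1 = fraud); illustrative only, not IEEE-CIS data.
toy_labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
kept = pre_context_indices(toy_labels, window=2)

# Fraud concentration of the retained sample.
fraud_share = sum(toy_labels[i] for i in kept) / len(kept)
print(kept, round(fraud_share, 3))
```

In this sketch a smaller `window` raises the fraud concentration of the retained sample, since every fraud row is kept while fewer legitimate predecessors survive.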
A 5-fold cross-validation analysis reveals extreme per-fold variance in recall (0.44–1.00), demonstrating that GTAN’s reported test metrics are fold-order dependent — a structural limitation requiring multi-seed averaging before results can be considered statistically robust. Zero-shot cross-dataset transfer evaluation (S-FFSD → IEEE-CIS) yields highly unstable per-fold recall (0.00–1.00) due to categorical embedding reinitialization. An anomalous full-positive-prediction result in Group C at τ = 0.30 is also documented.
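The multi-seed averaging called for above can be sketched as follows. This is a hedged illustration only: the helper names `recall` and `summarize`, the layout of `runs`, and the toy counts are assumptions of this example, and none of the numbers are results from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); defined as 0.0 when a fold has no positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def summarize(runs: Dict[int, List[Tuple[int, int]]]) -> Tuple[float, float]:
    """`runs` maps a random seed to a list of per-fold (TP, FN) counts.

    Returns the mean and population standard deviation of recall
    over every (seed, fold) pair.
    """
    values = [recall(tp, fn) for folds in runs.values() for tp, fn in folds]
    return mean(values), pstdev(values)


# Toy confusion counts for two seeds with two folds each (illustrative only).
toy_runs = {
    0: [(4, 4), (8, 0)],
    1: [(6, 2), (6, 2)],
}
avg_recall, spread = summarize(toy_runs)
print(f"mean recall = {avg_recall:.3f}, std = {spread:.3f}")
```

Reporting the spread alongside the mean makes fold-order dependence visible rather than hiding it behind a single test-set figure.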

Presenter

Dylan Nicolini

University of Hartford (United States)

Dylan Nicolini is a Director of Software Engineering at a leading Property & Casualty insurance carrier, where he drives the enterprise-wide Claims Transformation initiative. He leads agile, product-aligned teams delivering cloud-native, event-driven, and API-first claims platforms that enhance speed, accuracy, and customer experience. With over two decades in software engineering, Dylan brings deep expertise in architecture, DevOps, and quality engineering, fostering high-performing teams that balance innovation with operational excellence. His leadership emphasizes measurable outcomes—improving delivery velocity, system reliability, and business agility through automation, observability, and data-driven decision making. An adjunct professor of Software Quality Engineering, Dylan is passionate about engineering culture, modern delivery practices, and mentoring the next generation of technologists.

Application tracks: AI/ML

Presenter/Author

Dylan Nicolini

University of Hartford (United States)

Author

Ying Yu

University of Hartford (United States)