Circadify
Fraud Prevention · 12 min read

What Is Synthetic Media Detection? How rPPG Identifies Fakes

A research-grade examination of synthetic media detection methods, explaining how remote photoplethysmography (rPPG) provides a physiological detection layer that identifies AI-generated face media regardless of the generation technique used.

tryfacescan.com Research Team


Synthetic media — images, video, and audio generated or manipulated by artificial intelligence — has moved from a research novelty to an operational threat in identity fraud. Europol's 2024 Internet Organised Crime Threat Assessment reported that AI-generated identity documents and deepfake videos were implicated in fraud cases across all 27 EU member states, with financial services and cryptocurrency exchanges as primary targets. For fraud teams, KYC providers, and security architects, synthetic media detection has become a foundational capability, and rPPG — remote photoplethysmography — is emerging as the most generation-method-agnostic approach to identifying fakes in live video verification contexts.

"Synthetic media detection is not a single technology but a capability requirement. The question for any detection system is whether it tests for something the generator must get right to succeed, or something the generator can ignore. Physiological signals fall into the former category." — Adapted from Mirsky and Lee, "The Creation and Detection of Deepfakes: A Survey," ACM Computing Surveys, Vol. 54, No. 1, 2022.

Analyzing the Synthetic Media Landscape

Synthetic media encompasses a broad taxonomy of AI-generated and AI-manipulated content. Understanding the categories is essential for evaluating which detection methods apply to each.

Face synthesis — entire faces generated from noise or latent vectors using GANs (StyleGAN, StyleGAN3), diffusion models (Stable Diffusion, DALL-E), or flow-based generators. These faces belong to non-existent individuals and are used to create fraudulent identity documents and social engineering profiles.

Face swapping — the identity of one person is transferred onto the face of another in video. Architectures like FaceShifter, SimSwap, and diffusion-based swap models replace the target's facial identity while preserving their expression, pose, and lighting. This is the primary vector for identity verification fraud — the attacker's face is replaced with the victim's identity during a live selfie capture.

Face reenactment — a source actor's expressions and head movements drive a target identity's face in real time. Tools based on first-order motion models or neural head avatars allow an attacker to puppet a victim's face during a video verification session, producing blinks, smiles, and head turns on demand.

Lip-sync manipulation — the mouth region of a target video is modified to match arbitrary audio. While primarily associated with misinformation, lip-sync deepfakes also apply to video-based KYC sessions where an attacker needs the target identity to speak specific verification phrases.

Full neural rendering — emerging approaches use neural radiance fields (NeRF) or Gaussian splatting to render photorealistic 3D face models from a small number of source images. These techniques produce novel viewpoints and expressions with high visual fidelity.

Each category presents different artifacts — and different artifact lifetimes as generators improve. This is precisely why detection methods tied to specific artifact types face structural limitations.

Why Artifact-Based Detection Faces Structural Limits

The following table summarizes the detection landscape across synthetic media categories:

| Synthetic Media Type | Artifact-Based Detection | Frequency-Domain Analysis | Temporal Consistency Checks | rPPG Physiological Detection |
| --- | --- | --- | --- | --- |
| Face synthesis (GAN) | Moderate — detects GAN-specific textures | Moderate — GAN frequency fingerprints present but fading with newer architectures | Not applicable for single images | High — generated faces lack hemodynamic signals |
| Face synthesis (diffusion) | Low — diffusion models produce fewer spatial artifacts | Low — frequency spectra approach real image distributions | Not applicable for single images | High — no cardiovascular pulse regardless of generation method |
| Face swapping (video) | Moderate — boundary artifacts at swap edges | Low to moderate — frequency signatures vary by method | Moderate — temporal flickering at swap boundaries | High — swap region lacks spatially coherent pulse; boundary discontinuities in rPPG maps |
| Face reenactment | Low — high-quality reenactment preserves texture continuity | Low | Moderate — motion artifacts in extreme poses | High — puppeted face does not transmit the source actor's cardiovascular signal |
| Lip-sync manipulation | Low to moderate — artifacts concentrated in mouth region | Low | Moderate — temporal inconsistencies at modification boundary | High — mouth-region rPPG signal incoherent with full-face cardiovascular pattern |
| Neural rendering (NeRF) | Low — designed for photorealistic novel view synthesis | Low — trained to match real frequency distributions | Low to moderate | High — rendered geometry lacks biological tissue properties |

The pattern is consistent: artifact-based and frequency-domain methods degrade as generation quality improves, while rPPG detection remains effective because it tests for a property — cardiovascular blood flow — that no current generation pipeline produces.

Dolhansky et al. (2020) established the DeepFake Detection Challenge (DFDC) dataset to benchmark detection methods, and subsequent evaluations by Zi et al. (2020, ACM Multimedia) and Pu et al. (2022, CVPR Workshop) consistently found that pixel-level classifiers suffer significant accuracy drops when evaluated on generation methods absent from their training data. rPPG-based approaches, by contrast, demonstrated stable performance across generation methods because the detection target — physiological signal presence — is independent of how the fake was produced.

How rPPG Identifies Synthetic Media

The physiological detection process operates through a well-documented signal chain.

Every cardiac contraction pushes oxygenated hemoglobin through the facial vasculature, producing micro-color oscillations in the green channel at approximately 540 nm. These oscillations are periodic (0.7–4.0 Hz, corresponding to 42–240 BPM), spatially coherent across the face (forehead and cheeks exhibit the same fundamental frequency with predictable phase relationships), and contain harmonic structure determined by arterial compliance.
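The cardiac-band periodicity described above can be made concrete with a short sketch. The following Python example is purely illustrative (the function name and thresholds are our own, not from any production system): it synthesizes a 10-second green-channel trace carrying a 72 BPM pulse, then recovers the dominant frequency inside the 0.7–4.0 Hz cardiac band from the power spectrum.

```python
import numpy as np

FPS = 30.0                    # typical mobile front-camera frame rate
CARDIAC_BAND = (0.7, 4.0)     # Hz, corresponding to 42-240 BPM

def dominant_cardiac_frequency(signal: np.ndarray, fps: float = FPS) -> float:
    """Return the dominant frequency (Hz) inside the cardiac band."""
    signal = signal - signal.mean()                # remove the DC component
    power = np.abs(np.fft.rfft(signal)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= CARDIAC_BAND[0]) & (freqs <= CARDIAC_BAND[1])
    return float(freqs[band][np.argmax(power[band])])

# Simulated 10 s green-channel trace: a 1.2 Hz (72 BPM) pulse plus sensor noise.
t = np.arange(0, 10, 1.0 / FPS)
rng = np.random.default_rng(0)
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * rng.standard_normal(t.size)

print(round(dominant_cardiac_frequency(trace), 2))  # prints 1.2
```

A real face yields a clean peak like this; a synthetic face yields a flat or noisy spectrum with no single dominant cardiac frequency.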

rPPG algorithms — CHROM (de Haan and Jeanne, 2013), POS (Wang et al., 2017), and more recently transformer-based architectures like TransrPPG (Yu et al., 2023) — extract this signal from standard RGB video. The extracted waveform is then evaluated against physiological expectations:

  1. Dominant frequency — is there a single dominant peak in the cardiac band of the power spectrum?
  2. Signal-to-noise ratio — does the pulse amplitude exceed the noise floor by a physiologically plausible margin?
  3. Harmonic structure — are second and third harmonics present, consistent with the shape of the arterial pressure waveform?
  4. Spatial coherence — do multiple facial regions (forehead, left cheek, right cheek) yield the same fundamental frequency and consistent phase?
  5. Temporal stability — does the heart rate exhibit the natural beat-to-beat variability governed by autonomic nervous system dynamics?

Synthetic media fails these checks at multiple levels. A GAN-generated face has no dominant cardiac frequency — the power spectrum is flat or noisy. A face-swap video may retain remnants of the source actor's pulse in unswapped regions but shows discontinuities at the swap boundary. A reenacted face driven by motion parameters does not transmit the source performer's blood flow to the target identity's rendered skin. A screen replay introduces display refresh artifacts that corrupt temporal periodicity.
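Checks 2 and 4 from the list above can be sketched in a few lines. This is a toy simulation, not a reference implementation: the region traces, thresholds, and function names are all our own assumptions, and a production system would extract the traces from tracked facial regions rather than synthesize them.

```python
import numpy as np

FPS = 30.0
CARDIAC_BAND = (0.7, 4.0)  # Hz

def band_peak(signal, fps=FPS):
    """Dominant frequency (Hz) and its power inside the cardiac band."""
    signal = signal - signal.mean()
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= CARDIAC_BAND[0]) & (freqs <= CARDIAC_BAND[1])
    i = int(np.argmax(power[band]))
    return float(freqs[band][i]), float(power[band][i])

def snr_ok(signal, fps=FPS, min_ratio=20.0):
    """Check 2: the pulse peak must clearly exceed the in-band noise floor
    (the median in-band power serves as a crude floor estimate here)."""
    signal = signal - signal.mean()
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= CARDIAC_BAND[0]) & (freqs <= CARDIAC_BAND[1])
    return bool(power[band].max() / max(np.median(power[band]), 1e-12) >= min_ratio)

def spatially_coherent(regions, fps=FPS, tol_hz=0.15):
    """Check 4: forehead and cheeks must agree on one fundamental frequency."""
    peaks = [band_peak(r, fps)[0] for r in regions]
    return max(peaks) - min(peaks) <= tol_hz

def looks_live(regions):
    return spatially_coherent(regions) and all(snr_ok(r) for r in regions)

# Live face: three regions (forehead, cheeks) share a 1.2 Hz pulse, with the
# small phase offsets expected from pulse propagation across the face.
t = np.arange(0, 10, 1.0 / FPS)
rng = np.random.default_rng(1)
live = [0.05 * np.sin(2 * np.pi * 1.2 * t + ph) + 0.01 * rng.standard_normal(t.size)
        for ph in (0.0, 0.3, 0.6)]
# Synthetic face: no cardiac component, only uncorrelated noise per region.
fake = [0.02 * rng.standard_normal(t.size) for _ in range(3)]

print(looks_live(live))  # True
print(looks_live(fake))  # False: no coherent cardiac peak above the noise floor
```

The remaining checks (harmonic structure, beat-to-beat variability) follow the same pattern of testing the extracted waveform against a physiological expectation rather than hunting for generator artifacts.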

Applications in Fraud Prevention Pipelines

Identity verification at onboarding — the primary application. When a user submits a selfie video during KYC, rPPG analysis runs passively on the same capture. Any synthetic media — whether a pre-rendered deepfake, a real-time face swap, or a replayed video — triggers a liveness failure because the video lacks authentic cardiovascular signals. This catches attacks that visual inspection and behavioral liveness miss.

Video call verification — regulated industries including banking and legal services increasingly conduct identity verification via live video. rPPG analysis running on the video feed during the call can detect if one participant is using real-time face-swapping or reenactment technology, flagging the session for review without interrupting the interaction.

Media forensics for submitted evidence — insurance claims, legal proceedings, and compliance investigations sometimes involve video evidence. While rPPG is most powerful in live-capture scenarios (where the system controls the camera), forensic analysis of submitted video can still evaluate whether faces in the content exhibit physiological signals consistent with live human subjects.

Platform integrity for user-generated content — social media platforms, dating services, and professional networks face synthetic profile attacks where AI-generated face images and videos create fraudulent accounts at scale. rPPG analysis of profile verification videos — even brief captures — provides a scalable check against synthetic media.

Research Foundations for rPPG-Based Synthetic Media Detection

  • Verkruysse, Svaasand, and Nelson (2008) — foundational demonstration of remote pulse extraction from ambient-light face video using consumer cameras, Optics Express. Established the physical basis for rPPG.
  • Ciftci, Demir, and Yin (2020) — FakeCatcher: the first comprehensive system for deepfake detection using PPG-based biological signal maps, IEEE TPAMI. Demonstrated generation-method-agnostic detection.
  • Mirsky and Lee (2022) — broad survey of deepfake creation and detection methods, identifying physiological signal analysis as a promising detection direction resistant to generator improvements, ACM Computing Surveys.
  • Hernandez-Ortega et al. (2024) — evaluated rPPG features against diffusion-model-generated face synthesis, confirming maintained separability with the latest generation techniques, Pattern Recognition, Vol. 148.
  • Yu, Zhao, et al. (2023) — TransrPPG: transformer architecture for rPPG extraction that captures long-range temporal dependencies, improving robustness under compression and low resolution.
  • Europol (2024) — Internet Organised Crime Threat Assessment documenting the operational deployment of synthetic media in financial fraud across EU member states.

The Future of Synthetic Media Detection

The detection challenge will evolve as generative AI continues to advance. Several research and industry developments are shaping the next generation of defenses.

Physiological adversarial robustness — researchers anticipate that attackers may attempt to inject artificial pulse signals into synthetic video. Preemptive defenses include analyzing pulse waveform morphology (dicrotic notch, systolic peak shape), heart rate variability patterns, and spatial pulse transit time across the face. Hou et al. (2024, ACM Computing Surveys) demonstrated that these second-order physiological features require accurate cardiovascular modeling to forge — a fundamentally harder problem than adding periodic intensity modulation.

Cross-modal consistency verification — future systems will evaluate consistency between multiple signal channels: the visual pulse in rPPG should correlate with speaking-induced respiratory patterns, micro-expression timing should align with autonomic nervous system state, and pupillary responses should match ambient lighting changes. Synthetic media that achieves realism in one channel while failing consistency across channels will be flagged.

Real-time detection at network scale — as video communication becomes more prevalent in financial services, healthcare, and legal proceedings, rPPG analysis will move from per-session processing to always-on network monitoring. Purpose-built inference hardware and optimized neural network architectures will enable simultaneous liveness analysis across thousands of concurrent video streams.

Standardization and certification — the NIST Media Forensics program and ISO/IEC JTC 1/SC 37 (biometrics) are developing evaluation frameworks specifically for synthetic media detection technologies. Standardized testing protocols will enable organizations to compare detection capabilities objectively, accelerating adoption of methods — like rPPG — that demonstrate consistent performance across generation methods.

Generative AI watermarking as a complementary layer — initiatives such as Google's SynthID and the C2PA provenance standard embed imperceptible markers or cryptographic metadata into AI-generated content. When combined with rPPG liveness detection, watermarking creates a two-layer defense: rPPG confirms biological presence in live captures, while watermark detection flags synthetic content in submitted media where live capture was not controlled.

Frequently Asked Questions

What makes rPPG different from other synthetic media detection methods?

Most detection methods look for signs that media is fake — visual artifacts, frequency anomalies, temporal inconsistencies. rPPG looks for evidence that the subject is real — specifically, the presence of cardiovascular blood flow. This inversion means rPPG does not need to know anything about the generation method that produced the synthetic media. It only needs to confirm that a physiological pulse is present and consistent with a living human.

Can rPPG detect synthetic media in pre-recorded video, or only live captures?

rPPG can analyze any video that contains a visible face, whether live or pre-recorded. However, its detection power is strongest in live-capture scenarios where the system controls the camera and can verify sensor integrity. In pre-recorded video, a sophisticated attacker could theoretically process the original video to remove or replace physiological signals, though no practical tool for this exists as of early 2026.

How does video compression affect rPPG-based synthetic media detection?

Lossy compression (H.264, H.265, VP9) attenuates the micro-color variations that carry the pulse signal. Modern rPPG algorithms compensate through spatial averaging over larger regions of interest, temporal filtering tuned to the cardiac band, and learned denoising networks. Yu et al. (2023) demonstrated that TransrPPG maintained detection performance at compression quality factors typical of mobile video capture and common streaming bitrates.
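The spatial-averaging compensation mentioned above can be illustrated numerically. This sketch is our own simulation (not TransrPPG or any published pipeline): it models compression as heavy independent per-pixel noise and shows that averaging a larger region of interest raises the in-band signal-to-noise ratio, since uncorrelated noise shrinks with the number of averaged pixels while the shared pulse does not.

```python
import numpy as np

FPS, SECONDS = 30, 10
t = np.arange(0, SECONDS, 1.0 / FPS)
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * t)   # weak shared pulse, 72 BPM
rng = np.random.default_rng(2)

def roi_trace(n_pixels: int) -> np.ndarray:
    """Green-channel ROI trace: a shared pulse plus independent per-pixel
    compression noise, averaged over the region of interest."""
    noise = 4.0 * rng.standard_normal((n_pixels, t.size))
    return (pulse[None, :] + noise).mean(axis=0)

def band_snr(signal: np.ndarray, fps: float = FPS) -> float:
    """Peak-to-median power ratio inside the 0.7-4.0 Hz cardiac band."""
    signal = signal - signal.mean()
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return float(power[band].max() / np.median(power[band]))

# A 400-pixel ROI recovers the pulse far more cleanly than a 4-pixel ROI.
print(band_snr(roi_trace(4)) < band_snr(roi_trace(400)))  # True
```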

Does rPPG work on all types of synthetic media equally?

rPPG is most effective against synthetic media that depicts faces in video — deepfake face swaps, face reenactment, neural rendering, and video replays. For static synthetic images (AI-generated profile photos, forged document photos), rPPG requires that the image be presented as part of a video capture, at which point the static image fails liveness because no temporal physiological signal is present. rPPG does not address audio-only synthetic media.

What is the minimum video quality needed for reliable rPPG analysis?

rPPG requires a face resolution of approximately 100 by 100 pixels or larger, a frame rate of 15 FPS or higher, and adequate illumination (generally above 50 lux). These specifications are comfortably met by every smartphone front-facing camera manufactured in the past decade and by standard webcams used in video call verification.
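Those thresholds translate directly into a pre-capture quality gate. A minimal illustrative check follows; the function name and the exact cutoffs mirror the figures above but are otherwise our own.

```python
def capture_adequate(face_px: tuple, fps: float, lux: float) -> bool:
    """Reject captures below the rPPG floor described in the text:
    a face region of roughly 100x100 pixels or larger, at least 15 FPS,
    and illumination above about 50 lux."""
    return min(face_px) >= 100 and fps >= 15 and lux >= 50

print(capture_adequate((220, 220), 30.0, 300.0))  # typical phone selfie: True
print(capture_adequate((64, 64), 30.0, 300.0))    # face too small in frame: False
```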


Synthetic media detection is evolving from an artifact-hunting exercise to a physiological verification discipline. As generative models eliminate the visual flaws that first-generation detectors relied on, rPPG provides a detection layer grounded in cardiovascular biology — a signal that synthetic media does not produce because no current generator models the hemodynamic processes that create it. For fraud teams and KYC providers defending against an expanding taxonomy of AI-generated identity attacks, physiological liveness analysis through rPPG offers the most structurally resilient detection path available.

Learn how Circadify applies rPPG-based physiological detection to synthetic media threats in identity verification.

Request Enterprise Demo