WordPress Data Sanitization Guide: sanitize_* Functions

One of the most common vulnerabilities WP HealthKit discovers in WordPress plugins is confused sanitization and escaping. Developers mix up when to sanitize data, what function to use, and the crucial distinction between processing user input (sanitization) and preparing output for display (escaping). This confusion leads to security vulnerabilities, data corruption, and unreliable plugins.

Understanding WordPress data sanitization is foundational to secure plugin development. Sanitization is the process of removing dangerous elements from user input before you process, store, or output it. It's the first line of defense against injection attacks, malicious scripts, and data corruption. However, sanitization alone isn't sufficient: you must also escape data when displaying it. This guide covers the sanitization functions you'll use most often, when to use each one, and how sanitization fits into the broader security picture.

The Difference Between Sanitization and Escaping
Core WordPress Sanitization Functions
Common Sanitization Functions and Their Use Cases
Advanced Sanitization: wp_kses and Custom Sanitization
Sanitizing Different Data Types
Common Data Sanitization Mistakes
Testing Sanitization in Your Plugin
Frequently Asked Questions

The Difference Between Sanitization and Escaping

This is the fundamental concept that trips up most developers. Sanitization and escaping are completely different operations that serve different purposes.

Sanitization is what you do when you receive user input. It's the process of removing or converting potentially dangerous elements from data before you store or process it. Think of sanitization as cleaning dirty water—you're removing contaminants to make it safe to use internally.

// User submits: "Hello <script>alert('xss')</script> World"
$user_input = $_POST['message'];

// Sanitize it
$clean_message = sanitize_text_field( $user_input );
// Result: "Hello World" (HTML tags removed)

// Now it's safe to store
update_option( 'my_plugin_message', $clean_message );

Escaping is what you do when you output data to HTML or other contexts. It's the process of converting characters so they display as text rather than being interpreted as code. Think of escaping as putting quotes around text in a sentence—the quotes indicate this is text data, not executable code.

// Retrieve stored data
$message = get_option( 'my_plugin_message' );

// Escape it for the HTML context
echo esc_html( $message );
// Output: special characters display as literal text, with no HTML interpretation

Many developers think "I'll just escape everything and I don't need to sanitize." This is dangerous because:

Escaping doesn't remove the malicious code—it just prevents interpretation in a specific context
If you later use that data in a different context (JavaScript, SQL, URL), the escaping might not work
You might accidentally display the escaped characters to users, showing them raw HTML entities
Escaped data in your database is messier to work with and harder to migrate

The correct approach is: sanitize on input, escape on output.

Core WordPress Sanitization Functions

WordPress provides a comprehensive set of sanitization functions. Each is designed for a specific type of data.

sanitize_text_field() is the most commonly used sanitization function. It strips HTML tags, line breaks, tabs, extra whitespace, and percent-encoded octets, returning plain single-line text. It does not remove shortcodes:

// Removes all HTML and special formatting
$name = sanitize_text_field( $_POST['user_name'] );
// Input: "John <b>Doe</b>" → Output: "John Doe"

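sanitize_email() strips characters that aren't allowed in an email address and returns an empty string if the input can't be an address at all (for example, if it has no @). It cleans the string but does not guarantee the result is the address the user intended:

$email = sanitize_email( $_POST['email'] );
// Input: "user@example.com<script>" → Output: "user@example.comscript"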

sanitize_url() (also called esc_url_raw()) sanitizes URLs. It's crucial for any URL processing:

$url = sanitize_url( $_POST['redirect_url'] );
// Input: "javascript:alert('xss')" → Output: ""
// Input: "https://example.com" → Output: "https://example.com"

absint() converts a value to a non-negative integer. Perfect for numeric IDs and quantities:

$post_id = absint( $_GET['post_id'] );
$quantity = absint( $_POST['quantity'] );
// Casts to an integer and takes the absolute value; input with no leading digits becomes 0
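
The casting behavior is easy to misread, so here is a minimal sketch of what absint() does with a few inputs. The function_exists() guard defines a stand-in only when WordPress is not loaded, so the snippet can run on its own. The stand-in is an assumption that mirrors core's behavior (the absolute value of an integer cast), not a copy of the core source.

```php
<?php
// Stand-in for running outside WordPress; core ships its own absint().
if ( ! function_exists( 'absint' ) ) {
    function absint( $maybeint ) {
        return abs( (int) $maybeint );
    }
}

var_dump( absint( '42' ) );     // int(42)
var_dump( absint( '12abc' ) );  // int(12): leading digits are kept
var_dump( absint( 'abc123' ) ); // int(0): no leading digits to cast
var_dump( absint( '-5' ) );     // int(5): the sign is dropped
```

Because malformed input silently becomes 0 or a truncated number, absint() is best paired with a check that the result is a value you actually expect.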

intval() converts to an integer but allows negative numbers. Use this when negative values are valid:

$temperature = intval( $_POST['temp'] );
// Input: "-25.6" → Output: -25

floatval() converts to a floating-point number:

$price = floatval( $_POST['amount'] );
// Input: "19.99" → Output: 19.99

sanitize_key() sanitizes option keys, meta keys, and other identifiers. It lowercases the value and removes anything that isn't a lowercase letter, digit, underscore, or hyphen:

$color_scheme = sanitize_key( $_POST['scheme'] );
// Input: "dark-mode-2.0" → Output: "dark-mode-20" (the dot is removed)
// Input: "dark<script>" → Output: "darkscript"

sanitize_file_name() prepares filenames for use. It removes special characters that could cause issues:

$filename = sanitize_file_name( $_FILES['upload']['name'] );
// Input: "my file (1).pdf" → Output: "my-file-1.pdf"

wp_kses_post() is a specialized function that removes HTML tags except those allowed in posts (the list returned by wp_kses_allowed_html( 'post' )):

$content = wp_kses_post( $_POST['post_content'] );
// Allows: <p>, <br>, <strong>, <em>, <a>, <ul>, <ol>, <li>, etc.
// Removes: <script>, <style>, onclick handlers, etc.

wp_kses() provides fine-grained control over which HTML tags are allowed:

$allowed_html = array(
    'b'      => array(),
    'i'      => array(),
    'em'     => array(),
    'strong' => array(),
    'a'      => array( 'href' => array(), 'title' => array() ),
);

$clean_html = wp_kses( $_POST['description'], $allowed_html );
// Only these tags are allowed; everything else is stripped

sanitize_textarea_field() is like sanitize_text_field but preserves line breaks, making it ideal for multi-line text:

$bio = sanitize_textarea_field( $_POST['bio'] );
// Removes HTML but preserves newlines

Common Sanitization Functions and Their Use Cases

Choosing the right sanitization function prevents both security issues and data loss. Here's a practical guide to the most common scenarios.

Names and simple text: Use sanitize_text_field(). It removes HTML, line breaks, and extra whitespace:

$first_name = sanitize_text_field( $_POST['first_name'] );
$last_name = sanitize_text_field( $_POST['last_name'] );
$company = sanitize_text_field( $_POST['company'] );

Emails: Always use sanitize_email() to strip invalid characters, then confirm the result with is_email():

$email = sanitize_email( $_POST['email'] );

if ( ! is_email( $email ) ) {
    wp_die( 'Invalid email address' );
}

URLs and web addresses: Use sanitize_url() or esc_url_raw() (they're aliases):

$website = sanitize_url( $_POST['website'] );
$redirect = esc_url_raw( $_POST['return_url'] );

Numeric IDs and quantities: Use absint() for positive integers or intval() for signed integers:

$post_id = absint( $_GET['id'] );
$page = absint( $_GET['page'] );
$offset = intval( $_POST['offset'] );

Prices and decimal values: Use floatval() but validate the range:

$price = floatval( $_POST['price'] );

if ( $price < 0 || $price > 999999 ) {
    wp_die( 'Invalid price' );
}

Option and meta keys: Use sanitize_key(). Keys should be simple identifiers:

$setting_name = sanitize_key( $_POST['setting'] );
$color_theme = sanitize_key( $_GET['theme'] );

Filenames: Use sanitize_file_name() to prepare filenames for storage:

$original_filename = sanitize_file_name( $_FILES['document']['name'] );
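
Cleaning a file name does not tell you whether the file type is one you want to accept. A minimal sketch of an extension allowlist check follows; my_plugin_has_allowed_extension() is a hypothetical helper written in plain PHP, and it inspects only the name, not the file contents. Inside WordPress, core's wp_check_filetype() is likely the better starting point.

```php
<?php
/**
 * Hypothetical helper: true when the file name's final extension is allowlisted.
 */
function my_plugin_has_allowed_extension( $filename, array $allowed ) {
    // pathinfo() returns '' when the name has no extension.
    $extension = strtolower( pathinfo( (string) $filename, PATHINFO_EXTENSION ) );

    return '' !== $extension && in_array( $extension, $allowed, true );
}

$allowed = array( 'pdf', 'docx' );

var_dump( my_plugin_has_allowed_extension( 'report.PDF', $allowed ) ); // bool(true)
var_dump( my_plugin_has_allowed_extension( 'shell.php', $allowed ) );  // bool(false)
var_dump( my_plugin_has_allowed_extension( 'README', $allowed ) );     // bool(false)
```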

Rich content and formatted text: Use wp_kses_post() for post-like content or wp_kses() with custom allowed HTML:

// For post content (allows standard post HTML)
$post_content = wp_kses_post( $_POST['content'] );

// For limited HTML (paragraphs, links, and basic formatting only)
$description = wp_kses(
    $_POST['description'],
    array(
        'p'      => array(),
        'br'     => array(),
        'strong' => array(),
        'em'     => array(),
        'a'      => array( 'href' => array(), 'title' => array() ),
    )
);

Advanced Sanitization: wp_kses and Custom Sanitization

The wp_kses() function deserves deeper explanation because many developers struggle with it. The name is a recursive acronym for "KSES Strips Evil Scripts," and it's your primary tool for handling HTML.

Here's a comprehensive example:

$allowed_html = array(
    // Simple tags (no attributes)
    'strong' => array(),
    'em'     => array(),
    'b'      => array(),
    'i'      => array(),
    'br'     => array(),
    'hr'     => array(),
    'p'      => array(),
    
    // Tags with attributes
    'a'      => array(
        'href'   => array(),
        'title'  => array(),
        'class'  => array(),
        'target' => array(),
    ),
    
    'img' => array(
        'src'    => array(),
        'alt'    => array(),
        'width'  => array(),
        'height' => array(),
    ),
    
    // Lists
    'ul' => array( 'class' => array() ),
    'ol' => array( 'class' => array() ),
    'li' => array(),
    
    // Structural and table tags
    'div'   => array( 'class' => array(), 'id' => array() ),
    'span'  => array( 'class' => array() ),
    'table' => array(),
    'tr'    => array(),
    'td'    => array(),
    'th'    => array(),
);

$clean_content = wp_kses( $_POST['rich_content'], $allowed_html );

For custom sanitization beyond what WordPress provides, create a dedicated function:

function my_plugin_sanitize_phone( $phone ) {
    // Remove everything except digits and +
    $phone = preg_replace( '/[^0-9+]/', '', $phone );
    
    // Validate format
    if ( ! preg_match( '/^\+?[0-9]{10,15}$/', $phone ) ) {
        return '';
    }
    
    return $phone;
}

// Usage
$phone = my_plugin_sanitize_phone( $_POST['phone'] );
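
When a field may only hold one of a few known values (a select box or radio group, say), an allowlist is often simpler than stripping characters. The sketch below assumes a hypothetical helper, my_plugin_sanitize_choice(), and a made-up list of color schemes. It uses only plain PHP.

```php
<?php
/**
 * Hypothetical helper: return $value only if it is one of the allowed
 * strings, otherwise fall back to $default.
 */
function my_plugin_sanitize_choice( $value, array $allowed, $default ) {
    // Strict comparison avoids loose matches such as 0 == 'light' on older PHP.
    if ( is_string( $value ) && in_array( $value, $allowed, true ) ) {
        return $value;
    }

    return $default;
}

$schemes = array( 'light', 'dark', 'auto' );

echo my_plugin_sanitize_choice( 'dark', $schemes, 'light' ), "\n";         // dark
echo my_plugin_sanitize_choice( 'dark<script>', $schemes, 'light' ), "\n"; // light
```

Falling back to a default rather than trying to repair the input means an unexpected value can never reach storage.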

Sanitizing Different Data Types

Different data types require different sanitization approaches. Here's a comprehensive guide:

Arrays and complex data: Handle each element individually:

$colors = array();
if ( isset( $_POST['colors'] ) && is_array( $_POST['colors'] ) ) {
    foreach ( $_POST['colors'] as $color ) {
        $sanitized = sanitize_hex_color( $color );
        if ( ! empty( $sanitized ) ) {
            $colors[] = $sanitized;
        }
    }
}

JSON data: Decode and sanitize each field:

$json_data = json_decode( $_POST['data'], true );

if ( is_array( $json_data ) ) {
    $clean_data = array(
        'name'  => sanitize_text_field( $json_data['name'] ?? '' ),
        'email' => sanitize_email( $json_data['email'] ?? '' ),
    );
}
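
For payloads with more than a couple of fields, one option is to drive sanitization from a schema that maps each expected key to a sanitizer callback. The sketch below is an illustration under assumptions: my_plugin_sanitize_fields() is a hypothetical helper, and strip_tags and intval stand in for WordPress functions so the snippet runs without WordPress. In a plugin the schema would more likely name sanitize_text_field, sanitize_email, absint, and so on.

```php
<?php
/**
 * Hypothetical helper: keep only the keys named in $schema and pass each
 * value through its sanitizer callback. Keys not in the schema are dropped.
 */
function my_plugin_sanitize_fields( array $data, array $schema ) {
    $clean = array();

    foreach ( $schema as $key => $sanitizer ) {
        $value = $data[ $key ] ?? '';

        // Nested arrays and objects are not expected here; treat them as empty.
        $value = is_scalar( $value ) ? (string) $value : '';

        $clean[ $key ] = call_user_func( $sanitizer, $value );
    }

    return $clean;
}

$decoded = json_decode( '{"name":"<b>Ann</b>","qty":"3x","is_admin":true}', true );
if ( ! is_array( $decoded ) ) {
    $decoded = array(); // Invalid JSON or a non-object payload.
}

$clean = my_plugin_sanitize_fields(
    $decoded,
    array(
        'name' => 'strip_tags', // Stand-in for sanitize_text_field.
        'qty'  => 'intval',
    )
);

print_r( $clean ); // name is "Ann", qty is 3, is_admin is gone
```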

HTML content: Use wp_kses_post() or wp_kses():

// For WordPress post-like content
$content = wp_kses_post( $_POST['content'] );

// For limited HTML
$bio = wp_kses(
    $_POST['bio'],
    array(
        'p' => array(),
        'br' => array(),
        'a' => array( 'href' => array() ),
    )
);

Color values: Use sanitize_hex_color():

$color = sanitize_hex_color( $_POST['accent_color'] );
// Input: "#FF5733" → Output: "#FF5733" (returned unchanged when valid)
// Input: "#GG5733" → Output: null

CSS classes: Use sanitize_html_class():

$class = sanitize_html_class( $_POST['css_class'] );
// Input: "primary-button" → Output: "primary-button"
// Input: "primary<script>" → Output: "primaryscript"

Common Data Sanitization Mistakes

Mistake 1: Sanitizing and escaping output simultaneously

// Wrong: Sanitizing and escaping together
echo sanitize_text_field( wp_kses_post( $content ) );

// Right: Sanitize on input, escape on output
$clean_content = wp_kses_post( $_POST['content'] );
update_option( 'my_content', $clean_content );

// Later, when outputting:
echo wp_kses_post( get_option( 'my_content' ) );

Mistake 2: Using the wrong sanitization function

// Wrong: Using sanitize_text_field for an email
$email = sanitize_text_field( $_POST['email'] );

// Right: Use the specific function
$email = sanitize_email( $_POST['email'] );

Mistake 3: Not sanitizing array elements

// Wrong: Passing the whole array to a function that expects a string
$items = sanitize_text_field( $_POST['items'] ); // Returns '' for arrays

// Right for a flat array of strings:
$items = array_map( 'sanitize_text_field', (array) $_POST['items'] );

// Right for a flat array of IDs:
$item_ids = array_map( 'absint', (array) $_POST['item_ids'] );

// Right for nested arrays: map_deep() applies the callback to every leaf value
$processed = map_deep( $_POST['items'], 'sanitize_text_field' );

Mistake 4: Trusting sanitization instead of validation

// Wrong: Assuming sanitized value is valid
$age = absint( $_POST['age'] );
update_user_meta( $user_id, 'age', $age ); // Could be 0 if invalid

// Right: Validate after sanitizing
$age = absint( $_POST['age'] );
if ( $age < 1 || $age > 150 ) {
    wp_die( 'Invalid age' );
}
update_user_meta( $user_id, 'age', $age );
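
If you would rather reject malformed input outright than cast it, PHP's filter_var() can validate the type and the range in one step. This is a sketch using plain PHP. my_plugin_parse_age() is a hypothetical helper, and the 1 to 150 range is carried over from the example above.

```php
<?php
/**
 * Hypothetical helper: return the age as an int, or null when the input
 * is not a whole number between 1 and 150.
 */
function my_plugin_parse_age( $raw ) {
    $age = filter_var(
        $raw,
        FILTER_VALIDATE_INT,
        array(
            'options' => array(
                'min_range' => 1,
                'max_range' => 150,
            ),
        )
    );

    return false === $age ? null : $age;
}

var_dump( my_plugin_parse_age( '42' ) );    // int(42)
var_dump( my_plugin_parse_age( '42abc' ) ); // NULL: not a clean integer
var_dump( my_plugin_parse_age( '0' ) );     // NULL: below the range
var_dump( my_plugin_parse_age( '200' ) );   // NULL: above the range
```

Unlike a cast, this treats "42abc" as invalid instead of quietly turning it into 42.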

Mistake 5: Applying multiple sanitization functions

// Wrong: Applying two sanitization functions
$email = sanitize_email( sanitize_text_field( $_POST['email'] ) );

// Right: Use one appropriate function
$email = sanitize_email( $_POST['email'] );

Mistake 6: Sanitizing database values

// Wrong: Sanitizing values from your own database
$saved_value = get_option( 'my_option' );
$clean = sanitize_text_field( $saved_value );

// Right: Only sanitize external input
$user_input = $_POST['user_input'];
$clean = sanitize_text_field( $user_input );

Testing Sanitization in Your Plugin

Effective sanitization testing prevents security vulnerabilities from reaching production. Here's how to test comprehensively:

// Test file: tests/test-sanitization.php
class Test_Sanitization extends WP_UnitTestCase {
    
    public function test_sanitize_email() {
        // Characters that are invalid in a domain are stripped; the remaining text is kept
        $this->assertEquals(
            'user@example.comscript',
            sanitize_email( 'user@example.com<script>' )
        );
        
        $this->assertEquals(
            '',
            sanitize_email( 'not-an-email' )
        );
    }
    
    public function test_sanitize_text_field() {
        $this->assertEquals(
            'Hello World',
            sanitize_text_field( 'Hello <script>alert("xss")</script> World' )
        );
    }
    
    public function test_custom_sanitization() {
        $phone = '+1-555-123-4567';
        $clean = my_plugin_sanitize_phone( $phone );
        $this->assertEquals( '+15551234567', $clean );
    }
    
    public function test_wp_kses_allowed_html() {
        $allowed = array(
            'a' => array( 'href' => array() ),
            'strong' => array(),
        );
        
        $html = 'Check <a href="https://example.com">this link</a> and be <strong>careful</strong> with <script>evil</script>';
        $clean = wp_kses( $html, $allowed );
        
        $this->assertStringContainsString( '<a href', $clean );
        $this->assertStringContainsString( '<strong>', $clean );
        $this->assertStringNotContainsString( '<script>', $clean );
    }
}

WP HealthKit automatically scans your plugins for sanitization issues, checking that all user input is properly sanitized before storage or processing.

Data sanitization—cleaning untrusted input—is fundamental to preventing injection attacks. When users, attackers, or external systems provide data to your plugin, you must treat it as potentially malicious. SQL injection, XSS attacks, and other injection vulnerabilities happen because plugins fail to properly sanitize data. The principle is simple: never trust external input, always sanitize it.

WordPress provides functions for each context. For database queries, use $wpdb->prepare(). For HTML output, use esc_html() for plain text or wp_kses_post() for content that may contain markup. For URLs, use esc_url(). For inline JavaScript, use esc_js() or wp_json_encode(). Each context has specific requirements. Using the wrong function or skipping the step entirely creates vulnerabilities.

Implementing sanitization properly requires understanding data flow. Where does data come from? Is it user-provided, from external APIs, from the database? Where does it go? Into the database, displayed as HTML, used in JavaScript? Each transition requires appropriate sanitization. By carefully tracking data flow and applying sanitization at each transition, you prevent injection attacks.

Frequently Asked Questions

What's the difference between sanitize_text_field() and sanitize_textarea_field()?

sanitize_text_field() removes line breaks and extra whitespace, making it suitable for single-line input like names or titles. sanitize_textarea_field() preserves line breaks, making it ideal for multi-line text areas and biographical information.

Should I sanitize data before storing it or when retrieving it?

Sanitize on input before storing. This ensures your database contains clean data. Only escape on output when displaying to users. Sanitizing on retrieval is redundant and wastes resources.

Is wp_kses_post() sufficient for user-submitted content?

wp_kses_post() removes dangerous HTML tags and attributes but allows safe formatting tags. For most WordPress content, yes. However, if you need more restrictive control, use wp_kses() with a custom allowed HTML array.

Can I use sanitize_text_field() for all text input?

No. It's appropriate for names, titles, and simple text, but not for emails (use sanitize_email()), URLs (use sanitize_url()), or HTML content (use wp_kses()). Each function is optimized for its data type.

What if I need to allow specific HTML tags that wp_kses_post() removes?

Use wp_kses() with a custom allowed HTML array specifying exactly which tags and attributes are permitted. This gives you fine-grained control over what HTML is allowed.

Should I sanitize data in AJAX handlers?

Yes, absolutely. AJAX handlers receive user input just like form submissions. Sanitize $_POST, $_GET, and request parameters in all AJAX handlers.

Is escaping on output enough if I don't sanitize on input?

No. Escaping only prevents interpretation in a specific context (HTML, JavaScript, URLs). If data isn't sanitized on input, it might cause issues in other contexts, be harder to work with in your database, or display escaped characters to users.

Sanitization vs Validation vs Escaping

Three concepts are often confused: sanitization, validation, and escaping. They serve different purposes in different contexts.

Sanitization removes dangerous characters from input, making it safe to use. For plain text fields, sanitizing means stripping tags, line breaks, and stray whitespace. For rich content, it means removing script tags and other dangerous markup. Sanitization makes input safe to store and process, but it does not replace prepared statements for SQL or escaping for output.

Validation checks if input matches expected format. A valid email must contain "@" and have proper structure. A valid number must contain only digits. Validation determines if input is acceptable, not whether it's safe.

Escaping prepares output for a specific context. HTML escaping converts the < character to &lt; so that <script> displays as text instead of being executed. URL escaping encodes special characters so they're valid in URLs. JavaScript escaping escapes quotes and newlines so the data cannot break out of a string literal.

Different contexts need different protections. Database queries need prepared statements to prevent SQL injection. Form values need validation to ensure they're in the right format. HTML output needs escaping to prevent XSS. By using the right protection for each context, you prevent attacks.
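
To make the context point concrete, the sketch below escapes one hostile string for three different destinations using only PHP built-ins. It illustrates the idea and is not a substitute for the WordPress helpers: in a plugin you would normally reach for esc_html(), esc_attr(), esc_url(), and wp_json_encode() instead.

```php
<?php
$value = '"><script>alert(1)</script>';

// HTML text or attribute context: special characters become entities.
$for_html = htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );

// URL query-string context: reserved characters are percent-encoded.
$for_url = 'https://example.com/?q=' . rawurlencode( $value );

// Inline JavaScript context: a JSON string literal with <, >, &, and quotes hex-escaped.
$for_js = json_encode( $value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT );

echo $for_html, "\n"; // &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;
echo $for_url, "\n";
echo $for_js, "\n";
```

Each result is safe only in the context it was produced for, which is why escaping belongs at the point of output rather than at the point of storage.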

WordPress Sanitization Functions Reference

WordPress provides many sanitization functions for different contexts. sanitize_text_field() for text. sanitize_email() for emails. sanitize_url() for URLs. sanitize_file_name() for file names. Using the right function for your data type ensures proper sanitization.

For database input, $wpdb->prepare() handles escaping. For HTML output, use esc_html() for text and wp_kses_post() for full HTML. For URLs, use esc_url(). For JavaScript, use esc_js(). For attributes, use esc_attr(). By using these functions consistently, you prevent injection attacks.

Remember that sanitizing is not the same as validating. Sanitize to remove dangerous characters, then validate that the result matches the expected format and range, and reject it if it doesn't. Doing both ensures the data is safe and meaningful.

Context-Aware Sanitization Strategies

Different output contexts require different sanitization approaches, and applying the wrong function can be worse than applying none. Sanitizing HTML content as if it were plain text strips legitimate markup, while sanitizing plain text as if it could contain HTML creates security holes. WordPress provides specialized functions like wp_kses_post() for allowing safe HTML tags within post content, wp_kses_data() for stricter HTML filtering, and wp_kses() for custom tag whitelisting. The security audit process must identify where data is being output and match the sanitization function to that specific context. A string output in an HTML attribute requires different handling than the same string output as JavaScript data.

Database Context and Query Parameter Handling

Data destined for database queries needs protection designed for SQL syntax, not HTML. The $wpdb->prepare() method takes a query with placeholders (%d, %s, %f) and handles quoting and escaping of the values for you. Pass it raw, unescaped values: escaping them yourself first (for example with esc_sql() or addslashes()) leads to double-escaped data. Type-level sanitization such as absint() is still fine before prepare(). WP HealthKit scans for common mistakes such as manually escaping values that are later passed to prepare(), or HTML-escaping data before it is stored instead of when it is displayed.
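
A minimal sketch of a parameterized query follows. my_plugin_get_meta_rows() is a hypothetical helper, and passing $wpdb in as an argument is an assumption made so the function can be exercised with a stand-in object. Inside a plugin you would more commonly write global $wpdb; at the top of the function.

```php
<?php
/**
 * Hypothetical helper: fetch meta rows for one post using placeholders.
 *
 * @param object $wpdb     The WordPress database object (or a test double).
 * @param mixed  $post_id  Untrusted post ID.
 * @param mixed  $meta_key Untrusted meta key.
 */
function my_plugin_get_meta_rows( $wpdb, $post_id, $meta_key ) {
    // %d and %s are filled in by prepare(); the values are passed unescaped.
    $sql = $wpdb->prepare(
        "SELECT meta_key, meta_value FROM {$wpdb->postmeta} WHERE post_id = %d AND meta_key = %s",
        (int) $post_id,
        (string) $meta_key
    );

    return $wpdb->get_results( $sql );
}
```

Note that the placeholders are not wrapped in quotes in the query string; prepare() adds quoting for %s itself.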

Conclusion

Data sanitization is non-negotiable in WordPress plugin development. Understanding the distinctions between different sanitization functions, knowing when to apply each one, and following the "sanitize on input, escape on output" principle protects your plugin from security vulnerabilities and data corruption.

The most critical rule: Sanitize all external input. Every POST variable, GET parameter, file upload, and API response should be validated and sanitized according to its expected data type.

Before publishing your plugin, comprehensive security audits catch sanitization oversights. Upload your plugin to WP HealthKit to scan for improper sanitization, missing escaping, and other security vulnerabilities. Our automated audits check hundreds of security patterns that manual review might miss.

For deeper technical reference, consult the WordPress Data Validation documentation and OWASP's Input Validation Cheat Sheet. You might also find our guides on WordPress XSS and Escaping and SQL Injection Prevention valuable, or explore WP HealthKit's ecosystem of security resources.

Protecting Users

Data sanitization protects your users from attacks. By properly sanitizing input, you prevent attackers from injecting malicious code into the sites that run your plugin. This is one of your most direct opportunities to protect your user community, and the responsibility is worth taking seriously. WP HealthKit scans your plugin code to identify unsanitized inputs and unescaped outputs, so you can catch sanitization issues before they become vulnerabilities.

Proper sanitization is the foundation of plugin security. By systematically verifying that all inputs are sanitized and all outputs are escaped, you prevent entire classes of vulnerabilities.

Upload your plugin to WP HealthKit to verify proper input sanitization and output escaping throughout your codebase. Every unsanitized input is a potential vulnerability. Use prepared statements for database queries, escaping functions for output, and validation functions for input. Applied consistently, these patterns become part of your development habit, so you no longer have to remember to sanitize because it's built into your process.