Understanding URL and HTML Encoding: Why It Happens, How to Spot It, and What to Do With It

If you’ve ever seen a URL filled with strange symbols like %20, or noticed your payloads getting altered in weird ways during testing, you’ve already met encoding — specifically URL encoding and HTML encoding.

To exploit vulnerabilities effectively, you need to recognize when encoding is in play, understand why it’s used, and know how to manipulate or decode it.

Let’s break it all down.

Encoding is the process of converting data into a different format so it can be safely transmitted or rendered.

It’s not encryption. It doesn’t hide or protect information — it just makes it compatible with certain systems.

URLs can only contain a limited set of characters. Certain characters have special meanings:

  • / = path separator
  • ? = start of query string
  • & = separates parameters
  • = = assigns values
  • # = anchor (fragment)

Other characters (like spaces or quotes) can break the URL or confuse the server, so they must be encoded.
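
To see those special characters at work, here is a minimal Python sketch that splits a URL into its parts using the standard library's `urllib.parse`. The URL, including the `page` parameter and the `results` fragment, is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, used only to show how the reserved characters divide it up.
url = "http://example.com/search?query=hello&page=2#results"

parts = urlsplit(url)
print(parts.path)      # everything after the host, up to the "?"
print(parts.query)     # everything between "?" and "#"
print(parts.fragment)  # everything after "#"

# parse_qs splits the query on "&" and "=", giving each parameter a list of values.
params = parse_qs(parts.query)
print(params)
```

If a parameter value contained a raw `&` or `#`, the parser would cut it at that character, which is why such characters must be encoded when they are meant as data.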

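URL encoding (also called percent-encoding) replaces unsafe characters with % followed by the two-digit hexadecimal value of each byte. For ASCII characters that value is the ASCII code; non-ASCII characters are first converted to UTF-8 bytes. A space may also appear as +, but only in form-encoded data such as query strings and POST bodies: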

Character    Encoded
space        %20 or +
"            %22
'            %27
<            %3C
>            %3E
/            %2F

Original:

http://example.com/search?query=hello world

Encoded:

http://example.com/search?query=hello%20world
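
The same encoding can be reproduced in Python with `urllib.parse.quote` and `quote_plus`. This is a small sketch, not the only way to do it; the variable names are arbitrary.

```python
from urllib.parse import quote, quote_plus

value = "hello world"

# quote() percent-encodes the space; quote_plus() uses the form-style "+".
encoded = quote(value)
form_encoded = quote_plus(value)

url = "http://example.com/search?query=" + encoded
print(url)
print(form_encoded)

# By default quote() leaves "/" untouched; pass safe="" to encode it too.
print(quote("a/b"))
print(quote("a/b", safe=""))

# Quotes and angle brackets are always percent-encoded.
special = quote("<\"'>", safe="")
print(special)
```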

You’ll often see encoded values in:

  • URLs
  • Query parameters
  • Form fields
  • API requests
  • Burp Suite (Repeater and Proxy intercept)

Example:

username=admin%27+OR+1%3D1--

This is an encoded SQL injection payload:

admin' OR 1=1--

What to do when you find an encoded value:

  • Decode it to understand what the app is doing
    (using Burp, Python, or URL decoding tools)
  • Encode payloads before sending them manually or via tools
    (this avoids breaking syntax and can bypass filters)
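
Both steps can be sketched in Python with the standard library. The example reuses the `username` parameter from above; everything else is just illustrative naming.

```python
from urllib.parse import parse_qsl, unquote_plus, urlencode

raw = "username=admin%27+OR+1%3D1--"

# Decode the whole string: "+" becomes a space, %XX becomes the matching character.
decoded = unquote_plus(raw)
print(decoded)

# Or split it into (name, value) pairs, decoding each value along the way.
pairs = parse_qsl(raw)
print(pairs)

# Going the other way: urlencode() form-encodes a payload before it is sent.
reencoded = urlencode({"username": "admin' OR 1=1--"})
print(reencoded)
```

Note that `unquote_plus` treats `+` as a space. For values taken from a URL path, where `+` is a literal plus sign, `urllib.parse.unquote` is the safer choice.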

When a web app reflects user input back into the page, unencoded special characters can break the page or even inject scripts.

To prevent this, applications HTML-encode characters like <, >, and " (replacing them with HTML entities) so they’re displayed as text, not interpreted as HTML or JavaScript.

Character    Encoded
<            &lt;
>            &gt;
"            &quot;
'            &#x27;
&            &amp;

<!-- Raw input (dangerous): -->
<p>Welcome, <script>alert(1)</script></p>

<!-- HTML-encoded input (safe): -->
<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;</p>
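
On the server side, this kind of encoding is often a one-line call. As a rough sketch, Python's `html.escape` produces exactly the entities in the table above; the `name` variable stands in for whatever user input the app reflects.

```python
import html

# Stand-in for attacker-controlled input that the app reflects into the page.
name = "<script>alert(1)</script>"

safe = html.escape(name)
page = f"<p>Welcome, {safe}</p>"
print(page)

# With the default quote=True, both quote characters and "&" are encoded too.
quotes = html.escape("\"'&")
print(quotes)

# html.unescape() reverses the process, which helps when reading reflected output.
print(html.unescape(safe))
```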

You’ll see this mostly in reflected input:

<p>Hello, &lt;b&gt;admin&lt;/b&gt;!</p>

This means the app is HTML-encoding its output, most likely to prevent Cross-Site Scripting (XSS). But apps sometimes miss key spots, or apply HTML encoding in JavaScript contexts where it is not enough on its own.
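
One way a JavaScript context can defeat HTML encoding is an inline event handler: the browser decodes entities in an attribute value before handing it to the JavaScript engine. The sketch below only simulates that step with `html.unescape`; the `greet` function and the `onclick` scenario are hypothetical, and real browser behavior should always be confirmed in the target.

```python
import html

# Payload aimed at a handler such as onclick="greet('USER_INPUT')" (hypothetical).
payload = "');alert(1);//"

# The app HTML-encodes the input before placing it in the attribute.
encoded = html.escape(payload)
attr_value = f"greet('{encoded}')"
print(attr_value)

# Browsers decode character references in attribute values before running the
# handler; html.unescape() stands in for that decoding here.
js_seen_by_engine = html.unescape(attr_value)
print(js_seen_by_engine)
```

The encoded quote looks harmless in the page source, yet after decoding it closes the string and lets `alert(1)` run as its own statement.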

  • Decode encoded output to spot reflections, misconfigurations, or filtering
  • Encode your payloads intentionally to:
    • Bypass naive filters
    • Inject into HTML or JavaScript contexts
  • Use tools like:
    • Burp Suite (Decoder tab)
    • Python scripts
    • Online encoding/decoding tools

How encoding plays into common vulnerability classes:

Vulnerability        Encoding Role
XSS                  Encode/decode payloads to test different injection contexts
SQL Injection        Encode special characters to evade filters
Command Injection    Encode spaces, pipes, etc.
Filter Bypasses      Encode characters to slip through sanitization
API Testing          Some APIs encode responses; decode to reveal info

Useful tools:

  • curl --data-urlencode
  • urlencode, urldecode (command-line helpers packaged for some Linux distributions)
  • Burp Suite Decoder
  • Firefox Dev Tools (watch live request data)

Try sending XSS payloads like:

  • <script>alert(1)</script>
  • URL-encoded version: %3Cscript%3Ealert(1)%3C%2Fscript%3E
  • HTML-encoded version: &lt;script&gt;alert(1)&lt;/script&gt;

Then watch how the app reflects or processes it.
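
The three variants can be generated, and a response checked for them, with a few lines of Python. This is a minimal sketch: `classify_reflection` is a hypothetical helper, and the response bodies passed to it are invented samples rather than output from any real application.

```python
import html
from urllib.parse import quote

payload = "<script>alert(1)</script>"

variants = {
    "raw": payload,
    # safe="()" keeps the parentheses literal and encodes "/" as %2F.
    "url": quote(payload, safe="()"),
    "html": html.escape(payload),
}

def classify_reflection(body: str) -> str:
    """Report which form of the payload, if any, appears in a response body."""
    if variants["raw"] in body:
        return "raw"
    if variants["html"] in body:
        return "html-encoded"
    if variants["url"] in body:
        return "url-encoded"
    return "not reflected"

for name, value in variants.items():
    print(f"{name}: {value}")

# Invented sample response, for illustration only.
print(classify_reflection("<p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>"))
```

A "raw" result is the one worth chasing first, since it means the payload reached the page without any encoding.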

Encoding isn’t just a web developer’s safety net — it’s also a tool for us as pentesters.

When you see encoded characters, don’t ignore them. Decode them to understand what’s happening under the hood. Encode your payloads smartly to sneak through filters or bypass poor sanitization.

The more fluent you become with spotting and handling encoding, the more control you’ll have over input, context, and ultimately — the target.
