Understanding URL and HTML Encoding: Why It Happens, How to Spot It, and What to Do With It
If you’ve ever seen a URL filled with strange symbols like %20, or noticed your payloads getting altered in weird ways during testing, you’ve already met encoding — specifically URL encoding and HTML encoding.
To exploit vulnerabilities effectively, you need to recognize when encoding is in play, understand why it’s used, and know how to manipulate or decode it.
Let’s break it all down.
What Is Encoding?
Encoding is the process of converting data into a different format so it can be safely transmitted or rendered.
It’s not encryption. It doesn’t hide or protect information — it just makes it compatible with certain systems.
Part 1: URL Encoding
Why It’s Done
URLs can only contain a limited set of characters. Certain characters have special meanings:
- / = path separator
- ? = start of query string
- & = separates parameters
- = = assigns values
- # = anchor (fragment)
Other characters (like spaces or quotes) can break the URL or confuse the server, so they must be encoded.
How It Works
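URL encoding (also called percent-encoding) replaces each unsafe character with % followed by the two-digit hex value of its byte, which for ASCII characters is simply the ASCII code: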
| Character | Encoded |
|---|---|
| space | %20 (or + in query strings and form data) |
| " | %22 |
| ' | %27 |
| < | %3C |
| > | %3E |
| / | %2F |
Example:
Original:
http://example.com/search?query=hello world
Encoded:
http://example.com/search?query=hello%20world
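To reproduce this encoding yourself, here is a minimal sketch using Python's standard urllib.parse module. The variable names are only illustrative, and the URL is the example one from above.

```python
from urllib.parse import quote, quote_plus

value = "hello world"

# quote() percent-encodes the space as %20
encoded_url = "http://example.com/search?query=" + quote(value)
print(encoded_url)  # http://example.com/search?query=hello%20world

# quote_plus() uses the form-style "+" for spaces instead
form_style = quote_plus(value)
print(form_style)  # hello+world

# By default quote() leaves "/" alone; safe="" encodes it as well
specials = quote("\"'<>/", safe="")
print(specials)  # %22%27%3C%3E%2F
```

Note that quote() treats / as safe unless told otherwise, which matters when the payload itself contains slashes.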
Spotting It During a Pentest
You’ll often see encoded values in:
- URLs
- Query parameters
- Form fields
- API requests
- Burp Suite (Repeater and the Proxy intercept view)
Example:
username=admin%27+OR+1%3D1--
This is an encoded SQL injection payload:
admin' OR 1=1--
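If you want to decode a captured value like this outside Burp, a short Python sketch along these lines should do it. It assumes the value came from a query string or form body, where + stands for a space.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

raw = "username=admin%27+OR+1%3D1--"
name, _, value = raw.partition("=")

# unquote_plus() decodes %XX sequences and turns "+" into a space
decoded = unquote_plus(value)
print(decoded)  # admin' OR 1=1--

# unquote() decodes %XX only, so the "+" characters survive
percent_only = unquote(value)
print(percent_only)  # admin'+OR+1=1--

# parse_qs() splits and decodes a whole query string in one step
params = parse_qs(raw)
print(params)  # {'username': ["admin' OR 1=1--"]}
```

The difference between the two decoders is worth remembering: picking the wrong one leaves stray + signs in the payload you are trying to read.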
What To Do With It
- Decode it to understand what the app is doing (using Burp, Python, or URL decoding tools)
- Encode payloads before sending them manually or via tools (this avoids breaking syntax and can bypass filters)
Part 2: HTML Encoding
Why It’s Done
When a web app reflects user input back into the page, unencoded special characters can break the page or even inject scripts.
To prevent this, the app HTML-encodes characters like <, >, and " so they’re displayed as text rather than interpreted as HTML or JavaScript.
How It Works
| Character | Encoded |
|---|---|
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &#39; |
| & | &amp; |
Example:
<!-- Raw input (dangerous): -->
<p>Welcome, <script>alert(1)</script></p>
<!-- HTML-encoded input (safe): -->
<p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;</p>
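As a rough illustration of what the server side might be doing, here is a sketch using Python's standard html module. The app under test may use a different library or framework; the point is only to show the transformation.

```python
import html

user_input = "<script>alert(1)</script>"

# html.escape() replaces &, <, > and (by default) both quote characters
encoded = html.escape(user_input)
page = f"<p>Welcome, {encoded}</p>"
print(page)  # <p>Welcome, &lt;script&gt;alert(1)&lt;/script&gt;</p>

# Python emits the hex form &#x27; for a single quote; &#39; is equivalent
quotes = html.escape("\"'&")
print(quotes)  # &quot;&#x27;&amp;

# html.unescape() reverses the encoding
round_trip = html.unescape(encoded)
print(round_trip == user_input)  # True
```

Different frameworks pick different entity spellings for the same character, so expect to see both decimal and hex forms in real responses.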
Spotting It During a Pentest
You’ll see this mostly in reflected input:
<p>Hello, &lt;b&gt;admin&lt;/b&gt;!</p>
This means the app is trying to sanitize output — possibly to prevent Cross-Site Scripting (XSS). But sometimes it misses key spots or fails to encode properly in JavaScript contexts.
What To Do With It
- Decode encoded output to spot reflections, misconfigurations, or filtering
- Encode your payloads intentionally to:
  - Bypass naive filters
  - Inject into HTML or JavaScript contexts
- Use tools like:
  - Burp Suite (Decoder tab)
  - Python scripts
  - Online encoding/decoding tools
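One way to put the "Python scripts" option to work is a small helper that reports how a probe string comes back in a response. This is a hypothetical sketch: classify_reflection and the probe value are made up for illustration, and it only recognizes the exact entity spellings that html.escape produces, so an app using other spellings would need extra checks.

```python
import html
from urllib.parse import quote

def classify_reflection(body, probe):
    """Return the forms in which `probe` appears in a response body."""
    forms = []
    if probe in body:
        forms.append("raw")
    if html.escape(probe) in body:
        forms.append("html-encoded")
    if quote(probe, safe="") in body:
        forms.append("url-encoded")
    return forms

# The probe needs special characters, otherwise all forms look identical
probe = 'zq<b>"zq'

encoded_body = "<p>Hello, zq&lt;b&gt;&quot;zq!</p>"
print(classify_reflection(encoded_body, probe))  # ['html-encoded']

# A reflection inside a script block that was left unencoded
script_body = '<script>var u = "zq<b>"zq";</script>'
print(classify_reflection(script_body, probe))  # ['raw']
```

A "raw" result in one part of the page alongside "html-encoded" elsewhere is the kind of inconsistency worth a closer look.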
Pentesting Relevance: Why Encoding Matters
| Vulnerability | Encoding Role |
|---|---|
| XSS | Encode/decode payloads to test different injection contexts |
| SQL Injection | Encode special characters to evade filters |
| Command Injection | Encode spaces, pipes, etc. |
| Filter Bypasses | Encode characters to slip through sanitization |
| API Testing | Some APIs encode responses — decode to reveal info |
Quick Tools and Tips
Tools:
- curl --data-urlencode
- urldecode, urlencode (Linux tools)
- Burp Suite Decoder
- Firefox Dev Tools (watch live request data)
Test Ideas:
Try sending XSS payloads like:
- <script>alert(1)</script>
- URL-encoded version: %3Cscript%3Ealert(1)%3C%2Fscript%3E
- HTML-encoded version: &lt;script&gt;alert(1)&lt;/script&gt;
Then watch how the app reflects or processes it.
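If you would rather generate these variants than type them by hand, a sketch like the following works with the standard library alone. The variants dictionary is just an illustrative structure, and the parentheses are marked safe so the output matches the URL-encoded form of the payload shown above (by default Python would encode them as %28 and %29).

```python
import html
from urllib.parse import quote

payload = "<script>alert(1)</script>"

variants = {
    "raw": payload,
    # safe="()" keeps the parentheses literal and still encodes "/"
    "url": quote(payload, safe="()"),
    "html": html.escape(payload),
}

for name, value in variants.items():
    print(f"{name}: {value}")
# raw: <script>alert(1)</script>
# url: %3Cscript%3Ealert(1)%3C%2Fscript%3E
# html: &lt;script&gt;alert(1)&lt;/script&gt;
```

Send each variant to the same parameter and compare the responses side by side.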
Final Thoughts
Encoding isn’t just a web developer’s safety net — it’s also a tool for us as pentesters.
When you see encoded characters, don’t ignore them. Decode them to understand what’s happening under the hood. Encode your payloads smartly to sneak through filters or bypass poor sanitization.
The more fluent you become with spotting and handling encoding, the more control you’ll have over input, context, and ultimately — the target.
