A JVM heap dump is a snapshot of the Java heap at a point in time, typically captured when ColdFusion throws an OutOfMemoryError or taken on demand for diagnostics. When the ColdFusion application server slows down, crashes, or shows high memory usage, analyzing a heap dump helps pinpoint memory leaks, oversized objects, and misuse of scopes or caching. These problems happen because objects are retained longer than expected (e.g., in session/application scope, caches, or static fields), or because workloads temporarily need more memory than the JVM is configured to provide.
Overview of the Problem
ColdFusion runs on the JVM, so most memory issues show up as Java heap pressure. Symptoms often include:
- Gradual memory growth until a crash with OutOfMemoryError: Java heap space
- Frequent full GCs with high CPU usage and long pauses
- Stalls when generating PDFs, manipulating images, or processing large queries/files
- Unbounded growth of cached or scoped objects (session/application/server scope)
A heap dump captured at or near the time of the incident can reveal what retained memory and why. The challenge is interpreting that dump correctly and correlating it with ColdFusion behavior, frameworks, and configuration.
Possible Causes
Common root causes in ColdFusion environments:
- Excessive data in ColdFusion scopes: session, application, server, or request
- Query caching misuse (cfquery cachedwithin or global query cache)
- Large strings/byte arrays from file uploads, cfhttp responses, or JSON serialization
- PDF/cfdocument, image manipulation (cfimage), or spreadsheet operations holding large buffers
- ORM/Hibernate second-level cache or query cache accumulation
- Classloader leaks due to frequent redeploys, auto-reload of templates, or custom loaders
- Scheduled tasks or CFTHREADs not terminating; executors growing without bounds
- Custom caches (ehcache, Redis clients) misconfigured with no TTL/size limits
- Metaspace/class leak from hot deployments; not strictly heap, but can co-occur
- Off-heap/direct buffer growth (e.g., NIO, PDFg, Tika), causing memory pressure
- Third-party libraries storing large static collections or ThreadLocals
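To illustrate the large-payload cause in the list above: buffering a whole upload or HTTP body in one byte[] makes peak heap use scale with payload size, while copying through a fixed buffer keeps it constant. The following is a minimal plain-Java sketch, not ColdFusion API; the class and method names (StreamingCopy, copyInChunks) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper showing chunked copying instead of whole-payload buffering. */
final class StreamingCopy {
    /** Copies in to out through a fixed 8 KiB buffer and returns the number of bytes copied. */
    static long copyInChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // heap cost stays at 8 KiB regardless of payload size
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

In a heap dump, the buffered variant tends to show up as a few very large byte[] instances, whereas the chunked variant leaves only small, short-lived buffers.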
Quick reference: Causes and Solutions
- Cause: Data stored in session/application scope grows unbounded. Solution: Purge and cap scope data; reduce session timeout; store only IDs.
- Cause: Query caching without size/time limits. Solution: Constrain the cache via TTL, max entries, and targeted usage.
- Cause: Large responses from cfhttp or oversized uploads retained in memory. Solution: Stream files; limit size; process in chunks.
- Cause: PDF/image/spreadsheet generation using large in-memory buffers. Solution: Stream to disk; increase temp storage; reduce resolution.
- Cause: Classloader leaks from continuous template recompilation or redeploys. Solution: Use Trusted Cache in production; schedule clean restarts; fix custom loaders.
- Cause: ORM/Hibernate caches not pruned. Solution: Tune the second-level/query cache; set eviction policies.
- Cause: Off-heap direct buffers consuming memory. Solution: Cap -XX:MaxDirectMemorySize; fix native libs; watch OS memory.
- Cause: Threads/executors never releasing references. Solution: Shut down executors; use try/finally to clear ThreadLocals.
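Several of the solutions above come down to bounding a cache by both entry count and age. The sketch below shows one way to do that in plain Java with a LinkedHashMap in access order. It is an illustration only: BoundedTtlCache is a hypothetical class, not something ColdFusion or Ehcache provides, and a production cache would normally come from a library with configurable eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache: LRU eviction by entry count plus a per-entry TTL. */
final class BoundedTtlCache<K, V> {
    private static final class Item<V> {
        final V value;
        final long expiresAtMillis;
        Item(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Item<V>> map;

    BoundedTtlCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps the least recently used entry first in iteration order.
        this.map = new LinkedHashMap<K, Item<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Item<V>> eldest) {
                return size() > maxEntries; // evict once the cap is exceeded
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Item<>(value, System.currentTimeMillis() + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    synchronized V get(K key) {
        Item<V> item = map.get(key);
        if (item == null) {
            return null;
        }
        if (System.currentTimeMillis() >= item.expiresAtMillis) {
            map.remove(key); // drop expired entries so they stop retaining memory
            return null;
        }
        return item.value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

With both limits in place, the worst-case retained size is roughly maxEntries times the largest cached value, which is the kind of bound a heap dump can then be checked against.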
Step-by-Step Troubleshooting Guide
1) Stabilize the Environment
- If production is down, consider a temporary heap increase and restart to restore service:
  - In ColdFusion’s jvm.config, raise -Xmx (and keep -Xms equal to -Xmx to avoid runtime resizing).
- Temporarily disable or rate-limit memory-heavy features (PDF/image jobs, large exports).
- Communicate the incident timeline and preserve diagnostic artifacts.
2) Capture the Right Artifacts
- Heap dump on OutOfMemoryError:
  - Ensure these JVM options are set:
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/coldfusion/heapdumps
    -XX:ErrorFile=/opt/coldfusion/logs/hs_err_pid%p.log
- On-demand heap dump (preferred when memory is high but before an OOME):
  - Identify the PID:
    jcmd | grep -i coldfusion
    or
    jps -lv
  - Dump the heap (expect a pause while the file is written):
    jcmd <PID> GC.heap_dump /opt/coldfusion/heapdumps/cf-$(date +%F_%H%M).hprof
    Alternative (older JDKs):
    jmap -dump:format=b,file=/opt/coldfusion/heapdumps/cf.hprof <PID>
- Thread dumps to correlate with heap state:
  jcmd <PID> Thread.print > /opt/coldfusion/heapdumps/tdump-$(date +%F_%H%M).txt
- GC logs:
  - Java 11+:
    -Xlog:gc*,safepoint:file=/opt/coldfusion/logs/gc.log:time,uptime,level,tags
  - Java 8:
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/coldfusion/logs/gc.log
- ColdFusion-specific metrics:
  - Adobe CF Performance Monitoring Toolset (PMT) or the Enterprise Server Monitor: capture live heap, slowest calls, and scope sizes if available.
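Besides the command-line tools, a heap dump can also be requested from inside the JVM through JMX. The sketch below assumes a HotSpot-based JVM, where com.sun.management.HotSpotDiagnosticMXBean is available; HeapDumper and its dump method are hypothetical names, and the timestamped file name is only an example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of the current JVM (HotSpot only). */
final class HeapDumper {
    /**
     * Writes an .hprof file into the given directory and returns its path.
     * liveOnly=true dumps only reachable objects, which usually yields a smaller file.
     */
    static Path dump(Path directory, boolean liveOnly) throws IOException {
        Files.createDirectories(directory);
        // The target must not already exist; recent JDKs also require the .hprof suffix.
        Path target = directory.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }
}
```

As with jcmd, the JVM pauses while the file is written, so the same cautions about timing and free disk space apply.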
3) Prepare the Environment for Analysis
- Copy the .hprof file to a workstation with more memory than the dump size (e.g., 1.5–2x).
- Tools:
  - Eclipse MAT (Memory Analyzer Tool): primary tool for leak detection.
  - VisualVM: overview and sampling.
  - Java Flight Recorder / JMC: continuous profiling (for future captures).
- Open the dump in MAT; if parsing runs out of memory, raise MAT’s own heap (e.g., -Xmx8g in MemoryAnalyzer.ini).
4) Run First-Pass Analyses in Eclipse MAT
- Run the built-in Leak Suspects Report.
- Navigate the Dominator Tree to find the largest retained heaps.
- Use the Histogram to see the top object types by shallow/retained size.
- Trace Paths to GC Roots for any suspect object to see why it’s retained.
- Look for common bulky types:
  - char[] / String (huge strings; on Java 9+ string contents are stored in byte[])
  - byte[] (file contents, images, PDF buffers, HTTP bodies)
  - coldfusion.runtime.Struct/Array (scope growth)
  - JDBC result sets, Hibernate caches, Ehcache entries
  - Lucene/Solr caches (if used)
- Quick OQL in MAT to search for large arrays:
  SELECT s FROM char[] s WHERE s.@length > 1000000
  SELECT b FROM byte[] b WHERE b.@length > 5000000
What to conclude:
- If retained by session/application scope roots, inspect your scope usage and cleanups.
- If retained by static fields in custom or third-party libs, investigate those caches or singletons.
- If dominated by PDF/image buffers, revisit content generation patterns.
- If many Class instances and ClassLoaders, suspect classloader leak (redeploy churn).
5) Correlate with ColdFusion Behavior
- Session/Application leaks:
  - Look for keys named after domain-specific objects. Large coldfusion.runtime.StructImpl instances often mean scopes stuffed with DTOs or query results.
  - Inspect CFML patterns that write big arrays into session, e.g., query data cached in session for grids.
- Query caching:
  - Look for large maps held by Ehcache or ColdFusion’s query cache classes.
  - Verify cachedwithin usage: broad usage on heavy queries with many parameter variations can explode the cache.
- PDF/cfdocument:
  - Look for huge byte[] instances in com.adobe.* or iText classes; confirm cfdocument usage and page counts/resolution.
- Hibernate:
  - Look for net.sf.ehcache.* and org.hibernate.cache.* classes with many entries retained.
- Classloader leaks:
  - Multiple instances of coldfusion.bootstrap.* ClassLoaders retaining templates (CFCs) across redeploys.
  - Many identical copies of framework classes.
6) Implement Fixes
- Scope hygiene:
  - Store only identifiers in session; fetch on demand.
  - Reduce session timeout; clear large keys on logout/idle.
  - For application scope, use a bounded cache (size + TTL) rather than unbounded structs.
- Caching:
  - Set a constrained TTL and maximum entry count for the query cache/Ehcache.
  - Cache only idempotent, highly reused results; add invalidation logic.
- PDF/Image/Spreadsheet:
  - Stream outputs to disk; set smaller DPI/quality; process in chunks.
  - Increase working temp disk and ensure it’s on fast storage; avoid holding whole payloads in memory.
- ORM/Hibernate:
  - Tune second-level cache regions; prefer read-only where applicable.
  - Disable the query cache unless there is strong reuse; set eviction policies.
- Classloader:
  - Enable ColdFusion Trusted Cache in production to prevent constant template recompiles.
  - Avoid frequent hot-redeploy cycles; schedule controlled restarts.
  - Remove custom classloader hacks; ensure listeners clean up on undeploy.
- Threads/Executors:
  - Ensure thread pools are bounded; call shutdown() on application stop.
  - Clear ThreadLocal variables in finally blocks.
- Direct memory:
  - Cap with -XX:MaxDirectMemorySize=256m (example) and test; fix native library usage.
- JVM tuning:
  - Use G1GC (the default on Java 11+ and a good general choice) and size the heap appropriately.
  - Keep -Xms equal to -Xmx to eliminate runtime resize overhead.
  - Consider -XX:+UseStringDeduplication (Java 8u20+, with G1) if the dump shows many duplicate strings.
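The thread and executor fixes above can be sketched in Java as follows. This is an illustration for custom Java code or libraries running inside ColdFusion, not a ColdFusion API: WorkerHygiene, newBoundedPool, and render are hypothetical names, and the pool sizes are arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical example of a bounded pool and ThreadLocal cleanup. */
final class WorkerHygiene {
    // Per-thread scratch buffer; pooled threads keep it alive until remove() is called.
    static final ThreadLocal<StringBuilder> SCRATCH = ThreadLocal.withInitial(StringBuilder::new);

    /** Fixed thread count and a capped queue, so pending work cannot grow without limit. */
    static ExecutorService newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Uses the thread-local buffer and always clears it, even if the work throws. */
    static String render(String input) {
        try {
            StringBuilder sb = SCRATCH.get();
            sb.append('[').append(input).append(']');
            return sb.toString();
        } finally {
            SCRATCH.remove(); // without this, the buffer survives on the pooled thread
        }
    }
}
```

The owner of the pool would call shutdown() when the application stops; in a dump, a missing remove() typically appears as ThreadLocalMap entries on long-lived worker threads retaining application objects.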
7) Validate Fixes
- Reproduce under load in staging; capture a new heap dump and compare it with the earlier one using MAT’s “Compare to another heap dump.”
- Monitor GC logs:
  - Verify reduced frequency of full GCs and lower pause times.
- Watch PMT/Server Monitor for a stable heap usage plateau and improved response times.
- Only then deploy to production.
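To compare full GC frequency and pause times before and after a fix, the GC log can be summarized with a few lines of code. The sketch below assumes the unified logging format produced by -Xlog:gc* on Java 9+, where a completed full collection is logged as a line containing "Pause Full" and ending in a duration in milliseconds; that line shape is an assumption to verify against your own log, and FullGcSummary is a hypothetical helper.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts "Pause Full" events and their durations in a unified GC log. */
final class FullGcSummary {
    // Assumed line shape: "... Pause Full (<cause>) <before>-><after>(<total>) <millis>ms"
    private static final Pattern FULL_PAUSE =
            Pattern.compile("Pause Full .* (\\d+(?:\\.\\d+)?)ms\\s*$");

    final int count;
    final double totalMillis;
    final double maxMillis;

    private FullGcSummary(int count, double totalMillis, double maxMillis) {
        this.count = count;
        this.totalMillis = totalMillis;
        this.maxMillis = maxMillis;
    }

    /** Scans the given log lines and aggregates every completed full-GC pause. */
    static FullGcSummary of(List<String> lines) {
        int count = 0;
        double total = 0;
        double max = 0;
        for (String line : lines) {
            Matcher m = FULL_PAUSE.matcher(line);
            if (m.find()) {
                double millis = Double.parseDouble(m.group(1));
                count++;
                total += millis;
                max = Math.max(max, millis);
            }
        }
        return new FullGcSummary(count, total, max);
    }
}
```

One possible use is FullGcSummary.of(Files.readAllLines(Paths.get("/opt/coldfusion/logs/gc.log"))), run against logs from comparable load tests before and after the change.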
Diagnostic Examples
ColdFusion jvm.config snippet (Linux path example)
/opt/coldfusion/cfusion/bin/jvm.config
-Xms8g
-Xmx8g
-XX:+UseG1GC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/coldfusion/heapdumps
-XX:ErrorFile=/opt/coldfusion/logs/hs_err_pid%p.log
-Xlog:gc*,safepoint:file=/opt/coldfusion/logs/gc.log:time,uptime,level,tags
-XX:+UseStringDeduplication
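After editing jvm.config and restarting, it can be worth confirming that the flags actually reached the JVM. A minimal sketch using the standard RuntimeMXBean follows; JvmFlagCheck and hasFlag are hypothetical names, and the prefix match is a deliberately simple assumption about how to compare flags.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper for checking which startup flags the running JVM received. */
final class JvmFlagCheck {
    /** Arguments the running JVM was started with (not the arguments of main). */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if any argument starts with the given text, e.g. "-XX:HeapDumpPath=". */
    static boolean hasFlag(List<String> args, String flagPrefix) {
        for (String arg : args) {
            if (arg.startsWith(flagPrefix)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, hasFlag(inputArguments(), "-XX:+HeapDumpOnOutOfMemoryError") returning false after a restart would suggest the edit went into a jvm.config that this instance does not read.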
Generating a heap dump on demand
Find the ColdFusion PID:
jcmd | grep -i coldfusion
Dump the heap:
jcmd <PID> GC.heap_dump /opt/coldfusion/heapdumps/cf-$(date +%F_%H%M).hprof
Sample OutOfMemoryError log snippet
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/coldfusion/heapdumps/java_pid12345.hprof …
Heap dump file created [1536245898 bytes in 12.345 secs]
Lightweight CFML memory probe (for development only)
<cfscript>
// Development only: report JVM heap usage in MB.
rt = createObject("java", "java.lang.Runtime").getRuntime();
writeOutput("Heap (MB): used=" & (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024 &
    " committed=" & rt.totalMemory() / 1024 / 1024 &
    " max=" & rt.maxMemory() / 1024 / 1024);
</cfscript>
Common Mistakes and How to Avoid Them
- Misreading retained vs shallow size: focus on retained heap (dominator) to see real impact.
- Analyzing the wrong moment: a dump right after restart may not show the leak; capture at high watermark.
- Comparing dumps from different JVMs or vastly different workloads: keep conditions similar for comparisons.
- Ignoring off-heap/native memory: a heap dump shows only the small DirectByteBuffer wrapper objects, not the native memory behind them; check OS-level process memory and MaxDirectMemorySize.
- Restarting too early: don’t lose the OOME-triggered dump and GC logs before copying them.
- Taking a heap dump on a nearly full disk: ensure free space exceeds the expected dump size; otherwise the dump can fail or be left truncated.
- Over-trusting MAT’s first Leak Suspect: use Dominator Tree and Paths to GC Roots to confirm.
Best Practices for Prevention
- Capacity and tuning:
  - Right-size the heap based on load tests; maintain headroom (e.g., 30–40%).
  - Use G1GC and keep -Xms equal to -Xmx.
- Coding standards:
  - Store only minimal data in session; never cache large queries or file data in scopes.
  - Use bounded caches with TTL and max size; centralize caching policy.
  - Stream large I/O (files, HTTP) instead of loading into memory.
  - Clean up ThreadLocals; close streams; release references promptly.
- Deployment hygiene:
  - Enable Trusted Cache in production; disable template auto-reload.
  - Avoid continuous redeploys; schedule maintenance restarts.
- Monitoring:
  - Keep GC logging on; rotate logs.
  - Use PMT/Server Monitor dashboards; alert on heap usage trends and full GC frequency.
  - Track session counts and average session size; set realistic timeouts.
- Tooling and process:
  - Maintain a runbook for heap dump and thread dump collection.
  - Keep MAT and analysis scripts ready; practice in staging.
  - Version control JVM and CF Admin configurations.
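The headroom guideline above can be turned into a simple check. The sketch below assumes headroom means the share of the maximum heap (-Xmx) that is currently unused; HeapHeadroom and its methods are hypothetical names, while MemoryMXBean is standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that expresses heap headroom as a percentage of the maximum heap. */
final class HeapHeadroom {
    /** Percentage of the maximum heap that is still free. */
    static double headroomPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maximum heap size is undefined");
        }
        return 100.0 * (maxBytes - usedBytes) / maxBytes;
    }

    /** Headroom of the running JVM at this instant. */
    static double currentHeadroomPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return headroomPercent(heap.getUsed(), heap.getMax());
    }

    /** True if the measured headroom has dropped below the target. */
    static boolean belowTarget(double headroomPercent, double targetPercent) {
        return headroomPercent < targetPercent;
    }
}
```

A single reading includes garbage that has not been collected yet, so the figure is more meaningful as a trend, or when sampled shortly after a collection, than as a one-off value.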
Key Takeaways / Summary Points
- Heap dumps are the most direct way to find what retains memory in ColdFusion’s JVM.
- Use jcmd or automatic OOME options to capture dumps; analyze with Eclipse MAT’s Dominator Tree and GC roots.
- Most leaks in CF come from oversized scopes, unbounded caches, content generation buffers, or classloader issues.
- Fixes center on bounding data, streaming I/O, tuning caches/ORM, enabling Trusted Cache, and proper JVM tuning.
- Prevent recurrences with steady monitoring, disciplined coding patterns, and controlled deployments.
FAQ
How do I trigger a heap dump from the ColdFusion Administrator or PMT?
Some ColdFusion Enterprise tools (Server Monitor/PMT) expose a “Heap Dump” action that internally calls JMX. If unavailable, use jcmd <PID> GC.heap_dump <file>, run as the same OS user that owns the ColdFusion process. Ensure the runtime user has permission to write to the dump directory.
What if I see OutOfMemoryError: Metaspace instead of Java heap space?
That indicates class metadata exhaustion. Increase -XX:MaxMetaspaceSize, reduce redeploy churn, enable Trusted Cache, and investigate classloader leaks. A standard heap dump won’t show metaspace contents; capture hs_err logs and consider JFR for class load events.
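Because a heap dump says little about Metaspace, live counters are a useful complement. The sketch below reads them through standard JDK management beans; it assumes a HotSpot JVM, where the memory pool is named "Metaspace", and MetaspaceProbe is a hypothetical helper.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Hypothetical helper that reports Metaspace usage and class loading counters. */
final class MetaspaceProbe {
    /** Used bytes of the pool named "Metaspace" (HotSpot naming), or -1 if no such pool exists. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    /** Currently loaded, total loaded, and unloaded class counts since JVM start. */
    static String classLoadingSummary() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + classes.getLoadedClassCount()
                + " totalLoaded=" + classes.getTotalLoadedClassCount()
                + " unloaded=" + classes.getUnloadedClassCount();
    }
}
```

A loaded-class count that keeps climbing across redeploys while the unloaded count barely moves is consistent with a classloader leak, though it does not prove one on its own.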
Can heap dumps impact performance in production?
Generating a heap dump briefly pauses or slows the JVM and writes a large file. Prefer off-peak or pre-OOME high watermark captures. jcmd is generally safer than older tools. Always ensure adequate disk space and I/O throughput.
How do I tell if the leak is off-heap or native?
If the OS shows high process memory but the heap is modest, suspect direct buffers or native libraries. Set -XX:MaxDirectMemorySize, review native components (PDFg, Tika, ImageIO), and use OS tools (pmap, Process Explorer) or JVM Native Memory Tracking (-XX:NativeMemoryTracking=summary, then jcmd <PID> VM.native_memory summary) to confirm.
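For the direct-buffer part of that question, the JVM exposes its NIO buffer pools through a standard management bean. The following is a minimal sketch, with DirectBufferProbe as a hypothetical name; it covers only buffers allocated through java.nio, not memory that native libraries allocate themselves.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reads NIO buffer pool usage of the running JVM. */
final class DirectBufferProbe {
    /** Bytes currently used by the named buffer pool ("direct" or "mapped"), or -1 if absent. */
    static long usedBytes(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }
}
```

If usedBytes("direct") stays small while process memory keeps growing, the growth is more likely in native library allocations than in NIO buffers.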
Does Lucee differ from Adobe ColdFusion in heap analysis?
The JVM techniques are the same. Object types differ (e.g., lucee.runtime.type.StructImpl). Lucee’s admin and caches have their own settings; still apply the same scope/caching hygiene, monitoring, and JVM flags.
