Overview of the Problem
JVM garbage collection (GC) performance issues in ColdFusion manifest as high CPU usage, long or frequent GC pauses, thread stalls, and slow or unresponsive requests. The Java Virtual Machine reclaims memory by collecting unreachable objects, and for at least part of each collection it suspends application threads (stop-the-world pauses). If the heap is undersized or misconfigured, or the application creates objects faster than the GC can reclaim them, ColdFusion spends excessive time in GC, causing timeouts and poor throughput.
GC pressure is common when traffic spikes, when large file/image/PDF operations dominate, or when memory leaks exist in the CFML code or libraries. Modern collectors like G1 (the default since Java 9) reduce pause times, but they still need correct sizing and healthy application behavior to remain efficient.
What GC Performance Problems Look Like in ColdFusion
- High CPU even when requests are few
- Increased request times, timeouts, and queued requests in ColdFusion Administrator
- Frequent “Full GC” or long pauses in GC logs
- OutOfMemoryError (Java heap space or Metaspace)
- Stuck or blocked cfthread threads; cfhttp calls timing out
- JVM monitoring showing Old Gen steadily climbing and not returning
Why It Happens
- The heap is too small or too large relative to workload and collector
- The young generation is poorly sized, causing frequent promotions to Old Gen
- Memory leaks (objects retained indefinitely) or excessive short-lived allocations (object churn)
- Large objects (images, PDFs, byte arrays) cause G1 humongous allocations and fragmentation
- Misconfigured caching, session replication, or ORM settings inflate memory use
- Metaspace growth due to classloader leaks from repeated deployments or dynamic class generation
Possible Causes
Runtime and Workload Causes
- Bursts of traffic generating heavy object churn
- Large payloads: image manipulation (cfimage), PDF generation (cfpdf, PDFg), large JSON/XML
- Session size bloat, many concurrent sessions
- Excessively cached queries or templates
Configuration Causes
- Heap too small or too big for the collector’s ergonomics
- Inappropriate collector choice (e.g., Parallel GC on Java 8 with latency-sensitive workloads)
- Missing or misconfigured GC logging (harder to see the real problem)
- Metaspace limits too small for the app’s class loading patterns
Code/ColdFusion-Specific Causes
- Query caching without TTL or size limits
- ORM caching growth without eviction
- Application cache (CFCACHE, EHCache/Redis) with oversized entries
- Large in-memory operations (images, cfspreadsheet) without streaming/chunking
- cfthread spawning too many concurrent memory-heavy tasks
Quick Reference: Causes and Solutions
- Cause: Heap too small. Solution: Increase -Xms/-Xmx and retest; size to avoid frequent Old Gen promotions.
- Cause: Excessive object churn. Solution: Reduce allocations; reuse buffers; stream large I/O; optimize CFML hot paths.
- Cause: G1 humongous allocations (images/PDF byte arrays). Solution: Reduce object sizes via streaming; consider increasing heap or tuning G1; avoid holding full files in memory.
- Cause: Memory leak. Solution: Capture heap dump; analyze with MAT; fix retaining references; verify caches and sessions.
- Cause: Metaspace growth/classloader leak. Solution: Increase MaxMetaspaceSize; fix redeploy patterns; check custom classloading.
- Cause: Wrong GC or params. Solution: Use G1 on Java 11+; set pause targets; avoid over-tuning; test one change at a time.
Step-by-Step Troubleshooting Guide
Step 1: Confirm It’s a GC Problem
- Check ColdFusion Administrator > Server Monitor (or Enterprise Monitor)
- Look for high CPU with low throughput, erratic response times, and growing request queues.
- Verify with OS metrics and a JVM profiler (FusionReactor, New Relic, AppDynamics) showing GC time > 10–20% over sustained periods.
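The 10–20% threshold above can be checked with simple arithmetic once you have cumulative GC time at two points in time (for example, from your monitoring tool or from the GCT column of jstat). The following is a minimal sketch; the function name and the sample numbers are hypothetical and only illustrate the calculation.

```python
def gc_overhead_percent(samples):
    """Return GC time as a percentage of wall-clock time over a window.

    samples: list of (uptime_seconds, cumulative_gc_seconds), oldest first.
    """
    (t0, gc0), (t1, gc1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must span a positive time window")
    return 100.0 * (gc1 - gc0) / elapsed


# Hypothetical readings taken one minute apart.
samples = [(600.0, 12.0), (660.0, 21.0), (720.0, 33.0)]
overhead = gc_overhead_percent(samples)
print(f"GC overhead over the window: {overhead:.1f}%")
if overhead > 10.0:
    print("Sustained GC time above 10%: worth investigating")
```

A single short window can be misleading, so compare several windows taken during normal and peak load before drawing conclusions.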
Step 2: Enable and Collect GC Logs
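For Java 11/17 (G1 is the default), edit ColdFusion’s jvm.config (e.g., cfusion/bin/jvm.config) and append the following flags to the java.args line:
# Java 11/17 GC logging (unified logging; the Java 8 flags -XX:+PrintGCDetails and -XX:+PrintGCDateStamps are not used here, and -XX:+PrintGCDateStamps prevents a Java 9+ JVM from starting)
-XX:+UseG1GC
-Xlog:gc*:file=/opt/ColdFusion/cfusion/logs/gc-%t.log:time,uptime,level,tags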
For Java 8:
# Java 8 GC logging (G1 recommended for latency)
-XX:+UseG1GC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/opt/ColdFusion/cfusion/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=50M
Restart ColdFusion to apply changes.
What to look for in GC logs:
- High frequency of “Pause Young” or long “Pause Full” events
- “To-space exhausted” (survivor/old space pressure)
- “Humongous allocation” lines with G1
- Increasing Old Gen occupancy after collections (no memory reclaimed)
- Long pauses exceeding your SLO (e.g., >200 ms frequently for APIs)
Example G1 log snippet (Java 11):
[2.345s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 45M->32M(2048M) 45.700ms
[5.210s][info][gc] GC(18) Pause Full (G1 Compaction Pause) 820M->510M(2048M) 1240.000ms
[6.700s][info][gc] GC(19) Pause Young (Concurrent Start) (G1 Humongous Allocation) 534M->512M(2048M) 12.300ms
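Reading pause lines by eye does not scale, so a small script can summarize them. The sketch below assumes lines shaped like the snippet above (unified logging, sizes in M, durations in ms or s); the regular expression, function name, and the 200 ms threshold are illustrative assumptions, and real logs may contain formats it does not cover.

```python
import re

# Matches pause lines such as:
# [5.210s][info][gc] GC(18) Pause Full (G1 Compaction Pause) 820M->510M(2048M) 1.24s
PAUSE_RE = re.compile(
    r"GC\((?P<gc_id>\d+)\) (?P<kind>Pause \w+).*? "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\) "
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>ms|s)\s*$"
)


def parse_pauses(lines):
    """Extract pause kind, duration in ms, and reclaimed MB from log lines."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if not m:
            continue  # not a pause line (or a format this sketch does not handle)
        ms = float(m["value"]) * (1000.0 if m["unit"] == "s" else 1.0)
        pauses.append({
            "gc_id": int(m["gc_id"]),
            "kind": m["kind"],
            "ms": ms,
            "reclaimed_mb": int(m["before"]) - int(m["after"]),
        })
    return pauses


log_lines = [
    "[2.345s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 45M->32M(1024M) 45.7ms",
    "[5.210s][info][gc] GC(18) Pause Full (G1 Compaction Pause) 820M->510M(2048M) 1.24s",
    "[6.700s][info][gc] GC(19) Humongous allocation 24M",
]
pauses = parse_pauses(log_lines)
slo_ms = 200.0  # hypothetical pause budget
for p in pauses:
    flag = "OVER SLO" if p["ms"] > slo_ms else "ok"
    print(f"GC({p['gc_id']}) {p['kind']}: {p['ms']:.1f} ms, reclaimed {p['reclaimed_mb']}M [{flag}]")
```

Feeding it the whole file (for example, `parse_pauses(open(path))`) gives a quick count of Full GCs and of pauses above your SLO.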
Step 3: Baseline with jstat/jcmd and Monitor
Run:
jstat -gc <pid> 1000
Watch S0/S1, E (Eden), O (Old), and YGC/FGC progression. If Old Gen grows steadily and doesn’t fall, suspect leak or insufficient heap.
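To track Old Gen over time rather than watching the terminal, the jstat output can be captured and parsed. The sketch below keys on the header row, so it does not depend on the exact column set of your JDK; OC and OU are the Old Gen capacity and usage columns (in KB) that jstat -gc prints. The sample text is made up for illustration and is not from a real server.

```python
# Illustrative jstat -gc output (values in KB); not captured from a real JVM.
SAMPLE = """\
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
 0.0   8192.0  0.0   8192.0 262144.0 131072.0 1048576.0  524288.0  65536.0 60000.0 8192.0 7000.0   10    0.250   0      0.000    0.250
 0.0   8192.0  0.0   8192.0 262144.0  65536.0 1048576.0  786432.0  65536.0 60000.0 8192.0 7000.0   14    0.400   1      1.200    1.600
"""


def parse_jstat_gc(text):
    """Turn jstat -gc output into a list of {column: value} dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    parsed = []
    for row in rows:
        # jstat prints "-" for columns that do not apply to the collector in use.
        values = [None if cell == "-" else float(cell) for cell in row]
        parsed.append(dict(zip(header, values)))
    return parsed


def old_gen_percent(row):
    """Old Gen usage as a percentage of Old Gen capacity."""
    return 100.0 * row["OU"] / row["OC"]


rows = parse_jstat_gc(SAMPLE)
for row in rows:
    print(f"Old Gen {old_gen_percent(row):.1f}% used, full GCs so far: {int(row['FGC'])}")
```

If the percentage keeps rising across samples even though the FGC count increases, that matches the leak-or-undersized-heap pattern described above.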
For flags:
jcmd <pid> VM.flags
jcmd <pid> VM.command_line
Step 4: Detect Leaks or Excessive Churn
- Enable heap dump on OOM just in case:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/ColdFusion/cfusion/logs/heapdump.hprof
- To manually capture a dump:
jcmd <pid> GC.heap_dump /opt/ColdFusion/cfusion/logs/heapdump.hprof
- Analyze with Eclipse MAT:
- Look at dominator tree and biggest retained sets
- Suspects: large caches (queries, session, EHCache), large byte[] from images/PDF, unbounded collections
- Paths to GC roots revealing references held by Application/Session scope, static fields, or classloaders
Step 5: Choose and Tune the Collector
- For ColdFusion on Java 11/17: G1GC is the safe default.
- Target latency with:
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=30
-XX:+ParallelRefProcEnabled
-XX:+UseStringDeduplication
- If you see “Humongous allocation” frequently:
- Stream large payloads instead of loading completely in memory
- Ensure heap size is adequate; G1 treats objects > 50% region size as humongous
- Avoid excessive transient huge arrays
- Consider ZGC/Shenandoah only if:
- You run on Java 17 where they’re stable
- Your vendor and ColdFusion version support it
- You test thoroughly. Example:
-XX:+UseZGC
-Xms8g -Xmx8g
Note: Not all ColdFusion versions are certified on these collectors; verify Adobe ColdFusion support matrix.
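For the humongous-allocation point in this step, it helps to estimate the region size G1 is likely to pick for a given heap, because objects larger than half a region are treated as humongous. The sketch below approximates HotSpot's default ergonomics (about 2048 regions, rounded down to a power of two, between 1 MB and 32 MB); treat it as an assumption and confirm the actual region size in your GC log or by setting -XX:G1HeapRegionSize explicitly.

```python
MB = 1024 * 1024
GB = 1024 * MB


def g1_region_size_bytes(xms_bytes, xmx_bytes):
    """Approximate the region size G1 chooses by default for a heap."""
    average = (xms_bytes + xmx_bytes) // 2
    target = max(average // 2048, MB)
    region = 1 << (target.bit_length() - 1)  # round down to a power of two
    return min(max(region, MB), 32 * MB)


def is_humongous(object_bytes, region_bytes):
    """G1 treats objects larger than half a region as humongous."""
    return object_bytes > region_bytes // 2


for heap_gb in (6, 8):
    region = g1_region_size_bytes(heap_gb * GB, heap_gb * GB)
    print(f"{heap_gb}g heap: ~{region // MB}M regions, "
          f"humongous above {region // 2 // MB}M")

# A 24M byte array (for example, a full PDF held in memory) on an 8g heap:
print(is_humongous(24 * MB, g1_region_size_bytes(8 * GB, 8 * GB)))
```

Under these assumptions, buffers of only a few megabytes already count as humongous on typical ColdFusion heaps, which is why streaming large payloads matters more than heap size alone.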
Step 6: Right-Size Heap and Metaspace
- Start with:
- -Xms and -Xmx equal to avoid heap resizing
- For mid-size CF instances: 4–8 GB heaps are common; adjust based on live set and traffic
- Metaspace:
-XX:MaxMetaspaceSize=512m
Adjust if you see a Metaspace OutOfMemoryError or heavy class loading. Repeated deployment changes without a restart can cause classloader buildup; schedule restarts or fix classloader leaks.
Step 7: Tune ColdFusion Settings
- Sessions:
- Keep session size small; store only IDs/small data
- If clustering, consider external session store to avoid heap bloat
- Caching:
- Query caching: apply TTL and limits; avoid caching massive result sets
- Template/Function caching: set reasonable timeouts and sizes
- If using EHCache/Redis: set max elements/TTL; avoid caching binary blobs
- ORM:
- Enable second-level cache judiciously; set eviction policies
- PDF/Image:
- Use streaming APIs; avoid loading entire files into memory; delete temp objects quickly
- Threads:
- Limit cfthread concurrency for heavy-memory tasks
- Logging:
- Avoid logging huge payloads in memory; stream to disk
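The caching advice above comes down to two bounds: a time-to-live and a maximum entry count. The sketch below shows that idea in a language-neutral way; it is not ColdFusion's cache implementation, and the class name and parameters are hypothetical. In ColdFusion you would apply the same bounds through the cache settings of whichever engine you use.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Cache bounded by entry count (LRU eviction) and by age (TTL)."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries are dropped on access
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._items)


cache = BoundedTTLCache(max_entries=2, ttl_seconds=300)
cache.put("q:users", ["alice", "bob"])
cache.put("q:orders", [101, 102])
cache.put("q:items", ["widget"])  # evicts "q:users", the least recently used
print(len(cache), cache.get("q:users"))
```

Because the entry count is capped, the cache's contribution to the live set has a ceiling, which keeps Old Gen from growing without limit.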
Step 8: Optimize Application Code (CFML)
- Reduce temporary object creation in hot paths; reuse buffers/structures
- Avoid concatenating large strings repeatedly; use a StringBuilder-style pattern (for example, java.lang.StringBuilder via Java interop)
- Stream I/O (cfhttp, file read/write) instead of building large in-memory arrays
- Paginate queries (for example, with OFFSET/LIMIT-style SQL) instead of loading entire datasets
- Remove unnecessary deep copies (duplicate() in CFML) on big structures
Step 9: Test, Measure, and Roll Out
- Make one change at a time; measure latency, throughput, and GC time
- Use realistic load tests reflecting traffic and payload sizes
- Monitor after deployment for at least one peak cycle; revert if regressions appear
Configuration Examples
ColdFusion jvm.config (Java 11/17 + G1)
java.home=/opt/jdk-17
# The flags below belong on the single java.args= line; they are listed one per line here for readability.
# Heap sizing
-Xms6g
-Xmx6g
# GC
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=30
-XX:+ParallelRefProcEnabled
-XX:+UseStringDeduplication
# Logging
-Xlog:gc*:file=/opt/ColdFusion/cfusion/logs/gc-%t.log:time,uptime,level,tags
# Diagnostics
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/ColdFusion/cfusion/logs
-XX:MaxMetaspaceSize=512m
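When a jvm.config is carried over from Java 8 to Java 11/17, leftover GC logging flags are a common cause of startup failures, because several of them were removed when unified logging (-Xlog) arrived in Java 9. The sketch below scans a java.args string for a short list of such flags. The list is deliberately small and reflects my understanding rather than a complete inventory, so verify it against your JDK's release notes; the function name is hypothetical.

```python
import re

# Java 8 GC logging flags that, to my knowledge, Java 9+ rejects as unrecognized.
# Assumption: this list is illustrative and not exhaustive.
LEGACY_GC_FLAGS = {
    "PrintGCDateStamps",
    "UseGCLogFileRotation",
    "NumberOfGCLogFiles",
    "GCLogFileSize",
}


def find_legacy_gc_flags(java_args):
    """Return the -XX tokens in java_args whose option name is in LEGACY_GC_FLAGS."""
    found = []
    for token in java_args.split():
        m = re.match(r"-XX:[+-]?(\w+)", token)
        if m and m.group(1) in LEGACY_GC_FLAGS:
            found.append(token)
    return found


java_args = (
    "-server -Xms6g -Xmx6g -XX:+UseG1GC "
    "-XX:+PrintGCDateStamps -XX:GCLogFileSize=50M"
)
for flag in find_legacy_gc_flags(java_args):
    print(f"Remove or replace before moving to Java 11/17: {flag}")
```

Running a check like this before a Java upgrade is cheaper than discovering the problem when the ColdFusion service fails to start.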
Sample GC Log Patterns to Recognize
- Frequent young GCs but low pause times: normal under load if Old Gen is stable
- Repeated Full GCs with little memory reclaimed: leak or heap too small
- “To-space exhausted”: survivor/Old Gen pressure; increase heap or reduce promotion rate
- “Humongous allocation”: large arrays; restructure to streaming, ensure adequate heap
Common Mistakes and How to Avoid Them
- Over-tuning with dozens of JVM flags. Avoid: start with G1 defaults + a few well-understood knobs.
- Setting enormous heaps without analysis. Too large heaps can lengthen pauses and mask leaks.
- Caching everything “for speed.” Unbounded caches cause heap growth and GC storms.
- Ignoring Metaspace. If you redeploy often without restart, you may create classloader leaks.
- Substituting GC tweaks for code fixes. If you hold massive objects (images/PDF) in memory, GC won’t save you; stream instead.
- Changing multiple variables at once. You won’t know which change helped or harmed.
Prevention Tips / Best Practices
- Capacity planning:
- Baseline memory footprint and live set size under typical and peak load
- Allocate headroom (30–50%) in heap for bursts
- Choose the right collector:
- Use G1 on Java 11+; set realistic pause targets (100–300 ms for web)
- Logging and observability:
- Keep GC logging on in production with rotation
- Use APM (FusionReactor/New Relic) to track allocation rate, GC time, and hotspots
- Code hygiene:
- Stream large files and responses
- Keep sessions lean; externalize large data
- Bound caches with TTL and size; monitor hit ratios
- Deployment practices:
- Avoid hot deployments that leak classloaders; schedule rolling restarts
- Test upgrades of Java/CF patches in stage with production-like data
- Regular audits:
- Quarterly review of GC logs and heap usage
- Load tests before major feature launches
Key Takeaways
- GC performance issues in ColdFusion are usually a mix of heap sizing, collector behavior, and application memory patterns.
- Start with evidence: enable GC logs, measure GC time, and inspect Old Gen behavior.
- Use G1GC with modest tuning on Java 11/17; aim for steady Old Gen and acceptable pause times.
- Fix root causes: stream large operations, bound caches, keep sessions small, and address leaks with heap analysis.
- Change one variable at a time, measure, and roll back if needed.
FAQ
How do I enable GC logs on ColdFusion?
Add GC logging flags to jvm.config and restart ColdFusion. For Java 11/17:
- -Xlog:gc*:file=/path/to/logs/gc-%t.log:time,uptime,level,tags
For Java 8:
- -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/logs/gc.log
Ensure log rotation is configured.
What heap size should I choose?
Measure the live set under representative load, then set -Xms/-Xmx about 30–50% above it to absorb bursts. For many CF sites, 4–8 GB works well, but your workload (PDF/image-heavy vs. API-only) dictates the size. Validate with GC logs: Old Gen should stabilize and pauses should meet your SLO.
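The sizing rule in this answer is simple enough to write down. The sketch below turns a measured live set into a candidate -Xms/-Xmx range using the 30–50% headroom mentioned above; the function name and the 3000 MB live set are hypothetical, and the result is a starting point to validate with GC logs, not a guarantee.

```python
def suggest_heap_mb(live_set_mb, headroom_percent=(30, 50)):
    """Return (low, high) heap sizes in MB: live set plus 30-50% headroom."""
    low_pct, high_pct = headroom_percent

    def with_headroom(pct):
        # Integer ceiling division avoids floating-point rounding surprises.
        return (live_set_mb * (100 + pct) + 99) // 100

    return with_headroom(low_pct), with_headroom(high_pct)


live_set_mb = 3000  # hypothetical: Old Gen occupancy after a full GC under peak load
low, high = suggest_heap_mb(live_set_mb)
print(f"Live set {live_set_mb} MB -> try -Xms/-Xmx between {low}m and {high}m")
```

With these numbers the range is 3900–4500 MB, which sits at the low end of the 4–8 GB heaps that are common for ColdFusion sites.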
Can I use ZGC or Shenandoah with ColdFusion?
Possibly, if you run Java 17 and your ColdFusion version/vendor supports it. They can reduce pauses significantly, but require validation in staging. G1 is the safer default for most CF deployments; adopt ZGC/Shenandoah only after thorough compatibility and performance testing.
How do I know whether I have a memory leak or just a lot of allocations?
If Old Gen grows and doesn’t drop after Full GC, or the live set increases over time, suspect a leak. Take a heap dump and analyze with MAT to find retained objects and GC roots. If Old Gen stabilizes and GC time is acceptable, you likely have high allocation churn rather than a leak; in that case, optimize hot paths and object creation.
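One way to make "the live set increases over time" concrete is to fit a straight line through Old Gen occupancy measured right after each Full GC (or after each G1 mixed-collection cycle). The sketch below uses a plain least-squares slope; the sample values and the 10 MB/hour threshold are hypothetical, and a rising trend is only a hint that a heap dump is worth taking.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours, old_gen_after_gc_mb) points, in MB/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def classify(samples, threshold_mb_per_hour=10.0):
    """Label the trend; the threshold is an arbitrary illustrative value."""
    if slope_per_hour(samples) > threshold_mb_per_hour:
        return "possible leak"
    return "stable live set"


# Hypothetical post-GC Old Gen occupancy (hours since start, MB).
leaking = [(0, 500), (1, 560), (2, 620), (3, 680)]
steady = [(0, 500), (1, 520), (2, 495), (3, 505)]
print(classify(leaking), f"({slope_per_hour(leaking):.1f} MB/h)")
print(classify(steady), f"({slope_per_hour(steady):.1f} MB/h)")
```

Collect samples over at least one full traffic cycle, since caches warming up after a restart can look like a leak for the first hours.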
