When a ColdFusion server exhibits high CPU usage, requests slow down, timeouts increase, and the system may become unresponsive. High CPU means the JVM hosting ColdFusion is spending excessive cycles executing code or collecting garbage, or is running so many active threads that they compete for cores while slow external resources hold requests open. Common causes include inefficient code (tight loops, heavy regex), slow database operations, runaway background tasks, excessive concurrent requests, connector misconfiguration, and JVM memory/GC pressure. The goal is to identify the hotspot quickly, mitigate the impact, and apply fixes that prevent recurrence.

---

## Overview of the Problem
High CPU utilization on a ColdFusion server typically manifests as:

- Spikes to 90–100% across one or more cores
- Slow responses, request queueing, and timeouts
- Increased error rates (500/503)
- w3wp.exe (IIS), httpd (Apache), or coldfusion.exe/java showing as the top CPU consumer

Why it happens:

- CPU-bound code paths
- Thread contention and lock storms
- Garbage collection thrashing
- Database slowness that holds request threads open (the JVM reports threads blocked on socket reads as RUNNABLE), so requests pile up
- Connector retry loops, health checks, or over-aggressive concurrency
- Background processes (Solr, PDF generation, scheduled tasks) pegging cores
- External calls without timeouts or with massive payloads

Understanding which category you’re in is crucial before tuning blindly.

---

## Possible Causes
Use the following quick-reference list to map symptoms to potential causes and fixes.

- Cause: Unbounded request concurrency or missing request timeouts
  - Solution: Reduce “Simultaneous Requests” in CF Admin; set global and page-level timeouts.
- Cause: Slow database queries or missing indexes
  - Solution: Analyze plans, add indexes, optimize queries, use cfqueryparam, tune JDBC pool settings.
- Cause: Garbage collection pressure (too small heap, wrong GC)
  - Solution: Right-size heap, enable GC logging, tune GC flags, fix leaks.
- Cause: Busy loops or heavy code paths (cfloop, regex, cfdocument/PDFg)
  - Solution: Add guards/limits, optimize algorithms, offload heavy tasks, cache.
- Cause: External HTTP calls with long delays
  - Solution: Add connection/read timeouts, circuit breakers, retries with backoff.
- Cause: Connector misconfigurations (IIS/Apache to CF)
  - Solution: Adjust AJP/connector threads, verify keep-alive, request queue limits, update web connector.
- Cause: Background jobs (scheduler, Solr, report generation)
  - Solution: Throttle schedule, stagger runtimes, move to queues, scale out.
- Cause: Log rotation/AV scans on CF directories
  - Solution: Exclude the CF, JVM, log, and temp directories from AV; schedule maintenance off-peak.

---

## Step-by-Step Troubleshooting Guide
### Step 1: Verify the culprit process and scope of impact

- Use OS tools to identify which process is using CPU:
  - Windows: Task Manager > Details (w3wp.exe, coldfusion.exe/java.exe, pdfg processes, jetty/solr)
  - Linux: top/htop to find java, httpd, nginx worker, etc.
- Check if all cores are pegged or just one. A single-core spike often points to a single hot thread.

Tip: If web server CPU is high but ColdFusion is not, the bottleneck may be in the web tier (compression, SSL, static file IO) or connector retry loops.

---

### Step 2: Capture thread dumps when CPU is high

You need at least 3 thread dumps spaced 5–10 seconds apart.

- If you have FusionReactor or New Relic, use their thread profiler.
- CLI options:
  - Windows service: locate the Java PID in Task Manager > Details, then run jcmd from a JDK bin directory as the same user as the service.
  - Linux: run jcmd with no arguments to list running JVMs, then run jcmd <PID> Thread.print.

Example (replace <PID> with the ColdFusion Java process ID):

```shell
jcmd <PID> Thread.print > dump1.txt
sleep 5
jcmd <PID> Thread.print > dump2.txt
sleep 5
jcmd <PID> Thread.print > dump3.txt
```

Look for:

- Many RUNNABLE threads in the same stack frame
- Repeated patterns pointing to cfloop, coldfusion.tagext.sql, coldfusion.document.* (PDF), HTTP clients, or Solr
- Locks/monitors causing contention (java.util.concurrent locks)

A suspicious snippet might look like:

```text
"ajp-bio-8014-exec-234" #812 prio=5 RUNNABLE
    at coldfusion.tagext.sql.QueryTag.execute(QueryTag.java:…)
    at coldfusion.tagext.lang.LoopTag.doStartTag(LoopTag.java:…)
    at coldfusion.runtime.AppEventInvoker.onRequest(AppEventInvoker.java:…)
```

Seeing this stack repeated across many request threads indicates they are busy in queries or loops.
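
When command-line access is not available, a rough in-process view can complement thread dumps. The following is a minimal sketch, assuming the server permits `createObject("java", ...)` and that the JVM supports thread CPU time measurement; it lists the ten threads with the most accumulated CPU time using the standard `java.lang.management` API. The variable names are illustrative only.

```cfml
<cfscript>
// Sketch: list the JVM threads with the most accumulated CPU time.
// Assumes Java object creation is allowed and thread CPU timing is enabled.
mgmt = createObject("java", "java.lang.management.ManagementFactory");
threadBean = mgmt.getThreadMXBean();
ids = threadBean.getAllThreadIds();
rows = [];

for (i = 1; i <= arrayLen(ids); i++) {
    id = javaCast("long", ids[i]);
    cpuNanos = threadBean.getThreadCpuTime(id); // -1 when unavailable
    info = threadBean.getThreadInfo(id);        // null if the thread has exited
    if (cpuNanos > 0 && !isNull(info)) {
        arrayAppend(rows, { name = info.getThreadName(), cpuMs = int(cpuNanos / 1000000) });
    }
}

// Sort descending by CPU time.
arraySort(rows, function(a, b) {
    if (a.cpuMs < b.cpuMs) { return 1; }
    if (a.cpuMs > b.cpuMs) { return -1; }
    return 0;
});

for (i = 1; i <= min(10, arrayLen(rows)); i++) {
    writeOutput(rows[i].cpuMs & " ms  " & encodeForHTML(rows[i].name) & "<br>");
}
</cfscript>
```

The figures are cumulative since each thread started, so running the page twice a few seconds apart and comparing the values gives a better picture of what is hot right now.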

---

### Step 3: Check ColdFusion logs for errors and slow requests
Review:

- coldfusion-error.log, exception.log, application.log for stack traces
- server.log for startup/config warnings
- Web connector logs (IIS isapi_redirect.log or Apache mod_jk.log)
- CF web access logs for high latency

To find slow queries, enable slow-query logging on the database side, or time each cfquery in CF and log the elapsed time.
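
Timing a query inside CF might look like the sketch below. The datasource name `myDSN`, the `orders` table, the `slowqueries` log file, and the 1000 ms threshold are all assumptions chosen for illustration.

```cfml
<!--- Sketch: time a query and log it when it exceeds a threshold. --->
<!--- "myDSN", the orders table, and the 1000 ms threshold are assumptions. --->
<cfparam name="url.customerId" type="integer" default="0">

<cfset startTick = getTickCount()>
<cfquery name="qOrders" datasource="myDSN" timeout="30">
    SELECT id, total
    FROM orders
    WHERE customerId = <cfqueryparam value="#url.customerId#" cfsqltype="cf_sql_integer">
</cfquery>
<cfset elapsedMs = getTickCount() - startTick>

<!--- Only log the slow cases so the log stays readable. --->
<cfif elapsedMs GT 1000>
    <cflog file="slowqueries" type="warning"
           text="qOrders took #elapsedMs# ms and returned #qOrders.recordCount# rows">
</cfif>
```

Logging only above a threshold keeps the overhead low enough to leave in place while an incident is being investigated.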

---

### Step 4: Inspect request tuning and timeouts
In ColdFusion Administrator:

- Server Settings > Settings:
  - Enable “Timeout Requests” and set a sensible default timeout (e.g., 60–120s).
- Server Settings > Request Tuning:
  - Limit Simultaneous Requests (start with cores × 2–4; e.g., 16–64 for 8–16 cores)
  - Set a Queue Timeout so requests don’t wait indefinitely
  - Cap the number of concurrent cfthread threads if cfthread is used heavily
- Update connectors (IIS/Apache) using wsconfig for your CF version.

Per page, you can also enforce a timeout with the requestTimeout attribute of cfsetting.
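
A page-level override could be as small as the following sketch; the 30-second value is an assumption and should be chosen per template.

```cfml
<!--- Place at the top of the template. 30 seconds is an assumed value. --->
<cfsetting requestTimeout="30">
```

A value set this way applies to the current request only, so a long-running report page can be given more time without loosening the server-wide default.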

---

### Step 5: Validate JVM memory and GC behavior
If GC thrashing is the root cause, CPU will be busy in GC threads and throughput will collapse.

Enable GC logs so you can analyze collector behavior. Add the following to the JVM arguments (java.args in jvm.config); the -Xlog syntax requires JDK 9 or later:

```text
-XX:+UseG1GC
-Xms4g
-Xmx4g
-XX:+UseStringDeduplication
-Xlog:gc*,safepoint:file=/path/to/logs/gc.log:time,uptime:filecount=5,filesize=20m
```

Restart CF and monitor gc.log. Signs of trouble:

- Very frequent young/old GCs that reclaim little memory
- Long stop-the-world (STW) pauses
- Heap often near Xmx

Fixes:

- Increase heap if under-provisioned
- Reduce object churn (large allocations in loops, JSON/XML building)
- Use G1GC (supported on CF 2018, 2021, and 2023 with an appropriate JDK)
- Eliminate leaks (review caches, sessions, cfquery result retention)
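
For a quick check of how close the heap is to Xmx without opening the GC log, a diagnostic page along these lines may help. This is a sketch that assumes Java object creation is permitted; the `heapwatch` log name and the 90% threshold are hypothetical.

```cfml
<cfscript>
// Sketch: report current heap usage relative to the configured maximum (Xmx).
runtime = createObject("java", "java.lang.Runtime").getRuntime();

maxMb   = runtime.maxMemory() / 1024 / 1024;
usedMb  = (runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024;
usedPct = usedMb / maxMb * 100;

writeOutput("Heap used: " & numberFormat(usedMb, "0") & " MB of "
    & numberFormat(maxMb, "0") & " MB (" & numberFormat(usedPct, "0.0") & "%)");

// The 90% threshold is an assumption; tune it to your workload.
if (usedPct > 90) {
    writeLog(file = "heapwatch", type = "warning",
             text = "Heap at #numberFormat(usedPct, '0.0')#% of Xmx");
}
</cfscript>
```

A single reading says little on its own, since the heap normally fills up between collections; repeated readings that stay high right after collections are the more meaningful signal.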

---

### Step 6: Analyze database bottlenecks
CPU spikes often mask DB delays. Threads waiting on a database socket read are reported as RUNNABLE in thread dumps even though they use little CPU, so check the top stack frame to tell waiting threads from CPU-bound ones.

- Enable DB slow query logs; identify top offenders.
- In CF Admin > Data Sources:
  - Set Max Connections appropriately (avoid too high; match DB capacity)
  - Set a validation query and enable “Validate Connections” to avoid stale connections
  - Set timeouts for login and query
- In code:
  - Always use cfqueryparam for parameterized queries.
  - Avoid SELECT *; fetch only required columns.
  - Add missing indexes; verify execution plans.
  - Cache read-heavy, seldom-updated data, such as the result of a query like this:

```sql
SELECT id, title FROM posts WHERE isPopular = 1
```
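
One way to cache such a query in CF is the cachedWithin attribute of cfquery. In the sketch below, the datasource name `myDSN` and the 10-minute window are assumptions.

```cfml
<!--- Sketch: serve this result from the query cache for up to 10 minutes. --->
<!--- "myDSN" and the cache window are assumed values. --->
<cfquery name="qPopularPosts" datasource="myDSN"
         cachedWithin="#createTimeSpan(0, 0, 10, 0)#">
    SELECT id, title FROM posts WHERE isPopular = 1
</cfquery>
```

While the cached result is valid, repeat requests are served without a database round trip, which reduces both DB load and the time request threads are held open.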

---

### Step 7: Look for busy loops and heavy operations
Common offenders:

- Unbounded cfloop over large in-memory datasets
- Query-of-queries on massive result sets
- Complex regex in cfscript without limits
- cfdocument/cfpdf generating large PDFs with images/fonts
- cfspreadsheet on large Excel files

Mitigations:

- Break large jobs into chunks; batch processing
- Add guards; max iterations; streaming where possible
- Offload to asynchronous queues and run off-peak
- Pre-render or cache PDFs where feasible
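
The chunking and guard ideas above can be sketched as follows. `handleItem()` is a hypothetical stand-in for the real per-item work, and the chunk size, chunk cap, and pause length are assumed values.

```cfml
<cfscript>
// Hypothetical per-item work; replace with real logic.
function handleItem(required any item) {
    return len(toString(arguments.item));
}

// Process items in fixed-size batches, with a hard cap on the number of
// batches so unexpected input cannot turn into a runaway loop.
function processInChunks(required array items, numeric chunkSize = 500, numeric maxChunks = 1000) {
    var total = arrayLen(arguments.items);
    var processed = 0;
    var chunks = 0;

    while (processed < total && chunks < arguments.maxChunks) {
        var last = min(processed + arguments.chunkSize, total);
        for (var i = processed + 1; i <= last; i++) {
            handleItem(arguments.items[i]);
        }
        processed = last;
        chunks++;
        // Brief pause between batches to leave CPU for other requests.
        sleep(50);
    }
    return processed;
}

done = processInChunks([ "a", "bb", "ccc" ], 2);
writeOutput("Processed " & done & " items");
</cfscript>
```

The batch cap bounds the total work even if the chunk size is misconfigured, and the pause trades a little latency for smoother CPU usage under load.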

---

### Step 8: External service calls and timeouts
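Slow external HTTP/REST/SOAP calls hold request threads open; as waiting threads accumulate, concurrency and context switching rise and CPU climbs indirectly.
Enforce tight timeouts on every cfhttp call with its timeout attribute (in seconds).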
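
A minimal sketch of a bounded external call with a fallback is shown below. The URL, the 5-second timeout, the `external-calls` log name, and the empty-array fallback are assumptions for illustration.

```cfml
<!--- Sketch: bounded external call with a fallback value on failure. --->
<!--- The URL, timeout, log name, and fallback are assumed values. --->
<cftry>
    <cfhttp url="https://api.example.com/inventory" method="get"
            timeout="5" throwOnError="true" result="apiResult">
    <cfset inventoryJson = apiResult.fileContent>

    <cfcatch type="any">
        <cflog file="external-calls" type="error"
               text="Inventory call failed or timed out: #cfcatch.message#">
        <!--- Fall back to an empty result instead of holding the request open. --->
        <cfset inventoryJson = "[]">
    </cfcatch>
</cftry>
```

Failing fast with a fallback keeps request threads from piling up behind a slow dependency.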

Also:

- Limit concurrency for external calls
- Implement circuit breakers/fallbacks
- Cache responses when possible

---

### Step 9: Scheduler, mail, and indexing

- ColdFusion Scheduler: stagger jobs, reduce frequency, avoid running multiple heavy jobs simultaneously.
- cfmail: large spools or retries can spike CPU; check mail.log.
- Solr indexing (Jetty process) can consume CPU; verify indexing schedules and resource limits.

---

### Step 10: Inspect the web connector and load balancer

- Ensure the CF web connector is up to date for your CF version.
- Right-size connector thread pools; too many threads cause CPU contention.
- Avoid infinite health-check loops or overly aggressive LB retries.
- Align keep-alive and timeout settings across the LB, web server, and CF.

---

### Step 11: Compare against a known-good baseline

- Review recent deployments, CF updates, JDK changes, and driver updates.
- Roll back recent changes if the spike correlates with a deployment.
- A/B test with one server removed from the pool to isolate the problem.

---

## Configuration Examples
### Application-level timeouts and session control
In Application.cfc:

```cfml
component {
    this.name = "MyApp";
    this.sessionManagement = true;
    this.sessionTimeout = createTimeSpan(0, 0, 30, 0); // 30 minutes
    this.applicationTimeout = createTimeSpan(1, 0, 0, 0); // 1 day
    // Optionally tune ORM/caching features if used
    // this.ormEnabled = true; this.ormSettings = { … };
}
```

Page-level: override the request timeout for an individual template with the requestTimeout attribute of cfsetting.

### cfthread usage
Cap the number of concurrent threads, and always join them with a timeout so a stuck thread cannot hold the request open indefinitely.
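
A sketch of the run, join, and terminate pattern follows. The thread name `reportWorker` and the 10-second join timeout are assumptions; note that the join timeout is given in milliseconds.

```cfml
<!--- Sketch: run work in a thread, wait a bounded time, then clean up. --->
<!--- The thread name and the 10000 ms join timeout are assumed values. --->
<cfthread action="run" name="reportWorker">
    <!--- Work goes here. --->
    <cfset thread.result = "done">
</cfthread>

<!--- Wait at most 10 seconds for the worker. --->
<cfthread action="join" name="reportWorker" timeout="10000">

<cfif cfthread.reportWorker.status NEQ "COMPLETED">
    <!--- Still running (or failed): stop it rather than leave it consuming CPU. --->
    <cfthread action="terminate" name="reportWorker">
    <cflog file="threads" type="warning"
           text="reportWorker ended with status #cfthread.reportWorker.status#">
</cfif>
```

Terminating a thread is a last resort, since it can leave work half done; where possible, design the worker to check its own deadline and exit cleanly.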

---

## Common Mistakes and How to Avoid Them

- Mistake: Raising Simultaneous Requests to very high values to “increase throughput”
  - Avoid: Set to a value aligned with CPU cores and workload; too many threads cause context switching and higher CPU.
- Mistake: No request or external call timeouts
  - Avoid: Enforce timeouts universally (CF Admin defaults and per-tag, as with cfhttp and cfquery).
- Mistake: Ignoring GC logs
  - Avoid: Keep GC logging enabled in production; review after incidents.
- Mistake: Using SELECT * and no indexes
  - Avoid: Fetch only needed columns; maintain proper indexing and query plans.
- Mistake: Running heavy scheduled jobs during peak
  - Avoid: Stagger or move heavy jobs to off-peak hours; queue and throttle work.
- Mistake: Antivirus scanning CF temp and log directories
  - Avoid: Exclude the CF install, temp, and log paths from real-time AV scanning, and run backups outside business hours.
- Mistake: Not updating CF hotfixes, JDK, or web connector
  - Avoid: Keep the platform up to date; many performance issues are resolved by vendor patches.

---

## Prevention Tips / Best Practices

- Capacity planning
  - Right-size JVM heap and CPU; monitor trends with APM tools.
- Proactive monitoring
  - Use FusionReactor, New Relic, or JDK Flight Recorder (JFR) to identify hotspots early.
  - Alert on CPU > 85%, request queue depth, GC times, error rates.
- Request governance
  - Reasonable defaults: 60–120s request timeout, connection timeouts, bounded concurrency.
- Code quality
  - Add load tests for new features.
  - Use cfqueryparam, avoid large query-of-queries, paginate results.
- Caching
  - Cache read-heavy outputs (page or fragment caching); use query caching with expirations.
- Robust integrations
  - Timeouts and retries with backoff, circuit breakers, response caching.
- Scheduled work
  - Queue background tasks; use job runners; cap concurrency.
- Platform hygiene
  - Keep CF/Java/DB connectors updated.
  - Regularly rotate and compress logs; exclude from AV.
- Documentation and runbooks
  - Maintain a standard incident playbook: gather thread dumps, GC logs, metrics, and steps to roll back or scale out.

---

## Key Takeaways

- High CPU on ColdFusion is a symptom; identify whether it’s code hotspots, database slowness, GC pressure, or integration delays.
- Always capture multiple thread dumps and review GC logs during the incident to pinpoint the hotspot.
- Enforce strict timeouts, limit concurrency appropriately, and tune the JDBC pool and web connector.
- Optimize queries, add indexes, and cache where appropriate to reduce CPU load.
- Prevent recurrence through monitoring, capacity planning, and disciplined release management.

---

## FAQ
#### How can I tell if garbage collection is the cause of high CPU?
Enable GC logging and observe frequent or long pauses with low memory reclaimed. In thread dumps, GC threads will be active. If heap usage hovers near Xmx and pauses are frequent, tune heap and GC, reduce object churn, and investigate leaks.
#### Should I increase Simultaneous Requests to handle spikes?
Not blindly. Start with 2–4× CPU cores and test. Too many request threads cause context switching and contention, often increasing CPU and latency. Consider scaling out horizontally or adding a queue rather than allowing unbounded concurrency.
#### What’s the fastest way to capture evidence during an incident?
Immediately take 3 thread dumps 5–10 seconds apart, copy current CF logs (error/exception/server), and if enabled, the GC log. Note the top CPU-consuming process and number of cores affected. This snapshot often reveals the root cause.
#### Is high CPU always a ColdFusion problem and not the web server?
No. IIS (w3wp.exe) or Apache can consume CPU due to SSL, compression, or static file handling. Check which process is hot. Connector configuration and load balancer behavior can create loops that spike CPU without CF being the bottleneck.
#### How do I safely test GC or heap changes?
Replicate the workload in a staging environment with production-like data, enable GC logs, and run load tests. Make one change at a time (e.g., switch to G1GC, adjust Xmx) and compare latency, throughput, and GC metrics before deploying to production.
