How to Troubleshoot ColdFusion Server Crashes

Overview of the Problem

A ColdFusion “server crash” typically means the ColdFusion application service stops responding, terminates unexpectedly, or restarts under load. Symptoms include pages timing out, HTTP 500 errors, JVM crash files (hs_err_pid*.log), an unresponsive Administrator, or the Windows/Linux service silently stopping. Causes range from JVM-level failures (OutOfMemoryError, native crashes), application deadlocks, runaway threads, and resource exhaustion (CPU, file handles, ports) to web server connector issues, buggy drivers, and OS limits.

Understanding whether your issue is a hard crash (JVM terminated), a hang (process alive but not responding), or a restart loop is critical. ColdFusion runs on Tomcat inside a JVM, so most “crashes” are Java or OS resource problems triggered by CFML code, third-party libraries, or infrastructure configuration.


Quick Reference: Common Causes and Fixes

  • JVM memory exhaustion (heap or metaspace)
    • Fix: Increase heap, resolve memory leaks, enable heap dumps, optimize code.
  • Long GC pauses or GC thrashing
    • Fix: Tune GC, right-size heap, update JVM, reduce object churn.
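  • Native memory or library crash (e.g., JDBC drivers, image manipulation, PDFG)
    • Fix: Update or remove the offending native module; confirm its architecture matches the JVM (x64); keep OS patches current.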
  • Thread deadlock or request queue exhaustion
    • Fix: Capture thread dumps; remove synchronized bottlenecks; adjust Tomcat maxThreads; correct cfthread usage.
  • Web server connector (IIS/Apache) misconfiguration or AJP issues
    • Fix: Recreate connector; align timeouts/thread limits; update connector.
  • Database connection pool exhaustion or DB-side locks
    • Fix: Increase pool size; ensure connections are closed; index queries; resolve DB blocking.
  • Disk full or permissions on temp/log directories
    • Fix: Free space; assign correct permissions; relocate logs and heap dumps.
  • OS resource limits (ulimit, file descriptors, ephemeral ports)
    • Fix: Raise limits; tune TIME_WAIT reuse; monitor sockets.
  • Bugs fixed by ColdFusion or JVM updates
    • Fix: Apply latest CF updates/hotfixes and supported JVM version.

Step-by-Step Troubleshooting Guide

Step 1: Confirm You’re Dealing with a Crash vs a Hang

  • Is the Windows service or Linux systemd unit stopped? Are there new hs_err_pid files?
  • Does the process show high CPU but no response? That’s more likely a hang.
  • Are requests returning 503 from IIS/Apache while CF is healthy? That points to connector issues.

Key check: If you see a new hs_err_pid_XXXX.log or the process is gone, it’s a crash. If the process is present but unresponsive, it’s a hang or deadlock.
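
The checks above can be written down as a small decision function. This is only a sketch: the four boolean inputs are hypothetical names, and how you collect them (service status, a listing of hs_err_pid files, a direct request to ColdFusion, a request through IIS/Apache) depends on your environment.

```python
from enum import Enum


class Diagnosis(Enum):
    CRASH = "crash"          # JVM terminated
    HANG = "hang"            # process alive but not answering
    CONNECTOR = "connector"  # CF healthy, web server failing in front of it
    HEALTHY = "healthy"


def classify(process_alive: bool, new_hs_err_file: bool,
             cf_responds_directly: bool, web_server_responds: bool) -> Diagnosis:
    """Apply the Step 1 checks, most conclusive evidence first."""
    # A fresh hs_err_pid file or a missing process means the JVM died.
    if new_hs_err_file or not process_alive:
        return Diagnosis.CRASH
    # Process is there but ColdFusion itself does not answer: hang or deadlock.
    if not cf_responds_directly:
        return Diagnosis.HANG
    # ColdFusion answers directly but not through IIS/Apache: connector issue.
    if not web_server_responds:
        return Diagnosis.CONNECTOR
    return Diagnosis.HEALTHY
```

For example, `classify(True, False, True, False)` points at the connector, which suggests skipping the JVM diagnostics and going to Step 6.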


Step 2: Gather Evidence Before Restarting

  • CF logs: cfusion/logs/exception.log, application.log, coldfusion-out.log, server.log, http-out.log, scheduler.log.
  • Tomcat logs: catalina.out (varies by install), coldfusion-launcher logs.
  • OS logs:
    • Windows Event Viewer: System and Application logs.
    • Linux: /var/log/messages or journalctl -u coldfusion (service name may vary).
  • JVM artifacts:
    • hs_err_pid*.log for JVM crashes.
    • Heap dumps (.hprof) if enabled.
    • GC logs if enabled.

Example hs_err_pid snippet:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff9b0a1d2f1, pid=1234, tid=5678
#
# JRE version: OpenJDK Runtime Environment (11.0.22+7)
# Problematic frame:
# C  [sqljdbc_auth.dll+0x1d2f1]

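Interpretation: The crash happened in native code inside a JDBC DLL (sqljdbc_auth.dll, which the Microsoft SQL Server driver uses for integrated authentication). Update the driver and review the integrated authentication configuration.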
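
When several crash files pile up, it can help to pull the problematic frame out of each one automatically. The sketch below assumes the usual layout in which the frame sits on the first non-empty line after "Problematic frame:"; the function name is illustrative.

```python
import re


def problematic_frame(hs_err_text):
    """Return (frame_type, library) from hs_err_pid text, or None if absent.

    frame_type is the leading letter of the frame line (for example C for
    native code); library is the name inside [...] before the '+offset',
    or None when the frame has no bracketed library.
    """
    lines = hs_err_text.splitlines()
    for index, line in enumerate(lines):
        if "Problematic frame:" not in line:
            continue
        for candidate in lines[index + 1:]:
            # Drop the leading '#' decoration and surrounding whitespace.
            frame = candidate.lstrip("# \t").strip()
            if not frame:
                continue
            match = re.match(r"(\w)\s+\[([^+\]]+)", frame)
            library = match.group(2) if match else None
            return frame[0], library
    return None
```

Run against the snippet above, this returns the frame type `C` and the library `sqljdbc_auth.dll`, which is the module to update or remove first.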


Step 3: Enable Diagnostic Logging (if not already)

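Add or confirm JVM options in jvm.config (ColdFusion Administrator > Server Settings > Java and JVM, or cfusion/bin/jvm.config). In the file itself, the options below all go on the single java.args= line, separated by spaces: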

-Xms4g
-Xmx4g
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/coldfusion/cfusion/logs/heapdumps
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/opt/coldfusion/cfusion/logs/gc.log

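  • Adjust heap sizes to match server capacity.
  • Ensure heap dump path exists and is writable.
  • The -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, and -Xloggc options apply to Java 8. On Java 9 and later (including the Java 11 and 17 builds used by CF2021/2023) they are deprecated or removed, and an unrecognized option such as -XX:+PrintGCDateStamps can prevent the JVM from starting. Use unified logging for GC instead: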

-Xlog:gc*:file=/opt/coldfusion/cfusion/logs/gc.log:time,uptime,level,tags

Restart during a maintenance window if you change JVM options; they take effect only after the service restarts.
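
Before restarting on a newer JVM, it may be worth scanning jvm.config for the pre-Java 9 GC logging options. A minimal sketch, assuming the options live on a single java.args= line and contain no spaces inside paths (the function name is illustrative):

```python
# GC logging options from Java 8 that are deprecated or no longer
# recognized by later JVMs.
LEGACY_GC_FLAGS = ("-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps", "-Xloggc:")


def legacy_gc_flags(jvm_config_text):
    """Return the legacy GC logging options found on the java.args line."""
    for line in jvm_config_text.splitlines():
        if line.strip().startswith("java.args="):
            # Everything after the first '=' is the option list.
            options = line.split("=", 1)[1].split()
            return [opt for opt in options if opt.startswith(LEGACY_GC_FLAGS)]
    return []
```

An empty list means none of the three legacy options is present; anything else should be replaced with the -Xlog:gc* form before the upgrade.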


Step 4: Diagnose Memory Issues

Indicators:

  • OutOfMemoryError in logs.
  • GC log shows frequent full GCs with little memory reclaimed.
  • Heap dump generated at crash.
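
The second indicator can be checked mechanically. The sketch below assumes unified GC logging (-Xlog:gc*) with sizes reported in megabytes, for example a line ending in `Pause Full (G1 Evacuation Pause) 4090M->4071M(4096M) 2310.112ms`; the 5% threshold is an arbitrary starting point, not a recommendation.

```python
import re

# Matches the before->after(total) heap sizes on a "Pause Full" line.
FULL_GC = re.compile(r"Pause Full.*?(\d+)M->(\d+)M\((\d+)M\)")


def ineffective_full_gcs(gc_log_text, min_reclaimed_fraction=0.05):
    """Return (before_mb, after_mb, heap_mb) for full GCs that freed little."""
    hits = []
    for line in gc_log_text.splitlines():
        match = FULL_GC.search(line)
        if not match:
            continue
        before, after, heap = (int(group) for group in match.groups())
        # Flag collections that reclaimed less than the given share of heap.
        if (before - after) < min_reclaimed_fraction * heap:
            hits.append((before, after, heap))
    return hits
```

Several consecutive hits close together usually indicate a leak or an undersized heap rather than a one-off spike.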

Actions:

  • Analyze heap dumps with Eclipse MAT, VisualVM, or YourKit to find large dominators and leaks (e.g., Application scope caches).
  • Inspect CFML for unbounded collections stored in Application/Server scope, excessive Session replication, or large binary objects.

Example CFML leak pattern:

<cfset application.reportCache[createUUID()] = getLargeReportData()>

Solution: Introduce size-bounded caches (e.g., Guava cache via JavaLoader, or simple LRU), purge expired entries, or store only keys/paths to files.
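
To make the "simple LRU" option concrete, here is the eviction logic in isolation. It is written in Python purely as an illustration of the idea; a CFML version would wrap an ordered struct in the same way, and the class and method names are made up for this sketch.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache that never holds more than max_entries."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        # Reading an entry marks it as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the least recently used end until within bounds.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

Unlike the leak pattern above, memory use here is capped by `max_entries` no matter how many distinct keys are written.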

If memory is simply undersized (no leak), right-size -Xms/-Xmx and tune GC.


Step 5: Investigate CPU Spikes and Thread Contention

Signs:

  • 100% CPU, slow or no responses, but JVM alive.
  • Thread dumps show many threads in RUNNABLE with same stack (hot loop) or BLOCKED on locks.

Capture several thread dumps 5–10 seconds apart:

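jcmd <pid> Thread.print > /tmp/tdump1.txt

or

jstack -l <pid> > /tmp/tdump1.txt

Replace <pid> with the ColdFusion JVM's process ID, run the command as the user that owns the process, and use a new file name for each dump.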

Review for:

  • Deadlocks (jstack reports “Found one Java-level deadlock”).
  • Long-running synchronized blocks.
  • cfthread jobs that never finish.
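
A quick way to compare successive dumps is to count thread states and look for the deadlock marker. The sketch below relies on the `java.lang.Thread.State:` lines that jstack and jcmd print for each thread; the function name is illustrative.

```python
import re
from collections import Counter

# Each thread in a dump has a line like:
#    java.lang.Thread.State: BLOCKED (on object monitor)
STATE_LINE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_thread_dump(dump_text):
    """Return (Counter of thread states, whether a deadlock was reported)."""
    states = Counter(STATE_LINE.findall(dump_text))
    deadlock = "java-level deadlock" in dump_text.lower()
    return states, deadlock
```

A BLOCKED count that keeps growing from one dump to the next, with the same stack at the top, is the pattern to chase first.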

Fixes:

  • Remove unnecessary synchronization.
  • Use timeouts with cfthread and cfhttp.
  • Break long tasks into smaller chunks or run outside request path (queues).

Step 6: Check Request Queue and Connector Limits

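Tomcat connector settings (server.xml). Keep the port and protocol values already present in your file: 8500 is normally ColdFusion's built-in HTTP port, while the AJP connector used by IIS/Apache listens on its own, version-specific port.

<Connector port="8500" protocol="AJP/1.3"
maxThreads="400"
connectionTimeout="20000"
maxConnections="10000"
acceptCount="200"/>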

  • maxThreads too low can cause backlogs; too high can starve CPU and memory.
  • Align IIS/Apache and Tomcat limits to prevent 503/timeout before CF can respond.
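
To see what each connector is actually configured with, the attributes can be read straight from server.xml. A minimal sketch; the fallback values used when an attribute is missing (HTTP/1.1, 200, 100) are the defaults from the Tomcat documentation, and your ColdFusion build may ship different ones.

```python
import xml.etree.ElementTree as ET


def connector_limits(server_xml_text):
    """List port, protocol, maxThreads and acceptCount for each Connector."""
    root = ET.fromstring(server_xml_text)
    limits = []
    # Commented-out connectors are ignored because the parser skips comments.
    for connector in root.iter("Connector"):
        limits.append({
            "port": connector.get("port"),
            "protocol": connector.get("protocol", "HTTP/1.1"),
            "maxThreads": int(connector.get("maxThreads", 200)),
            "acceptCount": int(connector.get("acceptCount", 100)),
        })
    return limits
```

Comparing these numbers with the web server's own worker and connection limits shows quickly whether IIS/Apache can send more concurrent requests than Tomcat is set up to accept.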

IIS/Apache connector:

  • Recreate connector using wsconfig to ensure current module and settings.
  • Review connector logs for timeouts.

Step 7: Database and External Dependencies

Common problems:

  • Connection pool exhaustion (connections not returned).
  • Slow queries causing thread pile-ups.
  • DB locks and deadlocks, particularly under load.

Actions:

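  • Enable datasource logging in CF Administrator temporarily.
  • Ensure connections are returned to the pool, and review the datasource's “Maintain Connections” and connection-limit settings.
  • Add indexes, review execution plans, and set query timeouts (the query name, datasource name, timeout value, and bound variable below are placeholders):

    <cfquery name="qItems" datasource="myDSN" timeout="30">
    SELECT … WHERE indexed_column = <cfqueryparam value="#url.id#" cfsqltype="cf_sql_integer">
    </cfquery>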

  • For external services (cfhttp, Web services), add reasonable timeouts and retries with backoff.
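
"Retries with backoff" means waiting longer after each failed attempt rather than retrying immediately. The pattern is sketched below in Python for brevity; the function and parameter names are illustrative, and in CFML the same loop would wrap a cfhttp call that has its own timeout set.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    Waits base_delay, then 2x, then 4x ... between attempts and re-raises
    the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In practice, narrow `retry_on` to transient errors such as timeouts, and keep `attempts` small so that a failing dependency does not tie up request threads.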


Step 8: Validate OS-Level Limits and Health

  • Disk: ensure free space on CF logs, temp, and heap dump directories.
  • File descriptors (Linux): raise limits for the CF user.

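Check the current limit as the CF user:

ulimit -n

Raise it in /etc/security/limits.conf (cfuser is a placeholder for the account that runs ColdFusion):

cfuser soft nofile 65536
cfuser hard nofile 65536

If ColdFusion runs as a systemd service, limits.conf is not applied to it; set LimitNOFILE in the unit file instead.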

  • Ephemeral ports: monitor TIME_WAIT; consider net.ipv4.ip_local_port_range and tcp_tw_reuse as appropriate (consult your security guidelines).
  • Antivirus exclusions for CF directories and temp to reduce file lock issues (Windows).
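
The disk and file-descriptor checks can be scripted for a health check. This is a sketch under two assumptions: the 15% free-space threshold is arbitrary, and the limits text comes from /proc/<pid>/limits of the running ColdFusion process on Linux, which reflects what the process actually received.

```python
import shutil


def free_fraction(path):
    """Fraction of the filesystem holding path that is still free (0.0-1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def low_space_paths(paths, min_free_fraction=0.15):
    """Return the paths whose filesystem has less free space than allowed."""
    return [path for path in paths if free_fraction(path) < min_free_fraction]


def max_open_files(proc_limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in proc_limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None
```

Pass the log, temp, and heap dump directories to `low_space_paths`; the values from `max_open_files` are returned as strings because a limit can also read `unlimited`.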

Step 9: Update Stack Components

  • Apply the latest ColdFusion updates/hotfixes (Adobe CF 2018/2021/2023) or Lucee patches.
  • Update to a supported JVM build recommended for your CF version (Java 11 or 17 for CF2021/2023).
  • Update JDBC drivers, PDFG components, imaging libraries, and web server connectors.

Always test in staging first.


Step 10: Stabilize and Monitor

  • Add application performance monitoring:
    • FusionReactor, SeeFusion, New Relic Java agent, or Java Flight Recorder.
  • Create alerts for heap usage, GC time, thread counts, request queue, and response times.
  • Schedule log rotation and archiving to avoid disk pressure.

Detailed Causes With Symptoms and Solutions

  • JVM OutOfMemoryError (heap or metaspace)

    • Symptoms: OOME in logs, heap dump created, long GC pauses.
    • Solution: Increase -Xmx; fix memory leaks; reduce object churn; use caches with limits; enable HeapDumpOnOutOfMemoryError.
  • Native crash (DLL/SO)

    • Symptoms: hs_err_pid references native library; Windows Event Viewer shows faulting module.
    • Solution: Update or remove the offending module (e.g., sqljdbc_auth.dll, image/PDF libraries), ensure its architecture matches the JVM (x64), and keep OS patches current.
  • Deadlock or thread starvation

    • Symptoms: jstack shows deadlock; many threads BLOCKED; CPU may be low.
    • Solution: Remove circular locks; minimize synchronized sections; avoid nesting locks; redesign concurrency.
  • Connector/IIS/Apache issues

    • Symptoms: 502/503 from web server, CF logs quiet, CPU low in CF.
    • Solution: Recreate connector using wsconfig; adjust timeouts; align max threads; confirm firewall/AV isn’t blocking AJP.
  • Database contention

    • Symptoms: Many requests waiting on DB calls; DB shows locks or long-running queries.
    • Solution: Tune SQL; add indexes; set query timeout; increase pool size; ensure connections are returned to pool.
  • File system and permissions

    • Symptoms: Errors writing to logs/temp; CF fails on startup or during heavy logging.
    • Solution: Set correct permissions; rotate logs; relocate temp and dump directories to ample space.

Diagnostics: Logs and Configuration Examples

Sample coldfusion-out.log extract:

[INFO] Server startup in 102345 ms
[WARN] High number of active requests: 380
[ERROR] java.lang.OutOfMemoryError: Java heap space

ColdFusion Administrator JVM settings example (jvm.config lines):

java.args=-Xms4096m -Xmx4096m -Djava.awt.headless=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/coldfusion/cfusion/logs/heapdumps -Xlog:gc*:file=/opt/coldfusion/cfusion/logs/gc.log:time,uptime,level,tags

Tomcat server.xml connector sizing:

<Connector port="8500" protocol="AJP/1.3"
maxThreads="300"
acceptCount="200"
connectionTimeout="20000"
redirectPort="8443" />


Common Mistakes and How to Avoid Them

  • Treating every outage as memory-related without evidence

    • Avoidance: Collect thread dumps, GC logs, and hs_err_pid before changing heap sizes.
  • Setting massive maxThreads to “fix” slowness

    • Avoidance: Size thread pools based on CPU and DB capacity; use load testing.
  • Ignoring web server connector timeouts

    • Avoidance: Align IIS/Apache timeouts with CF request timeouts; recreate connector after CF updates.
  • Keeping unlimited data in Application/Server/Session scopes

    • Avoidance: Implement bounded caches; evict stale entries; store lightweight references.
  • Updating Java without verifying CF compatibility

    • Avoidance: Consult Adobe’s supported JVM matrix; test in staging first.
  • Disabling timeouts in cfhttp or cfquery

    • Avoidance: Always set sane timeouts; add retries with backoff.

Prevention Tips / Best Practices

  • Capacity planning and right-sizing

    • Set -Xms/-Xmx to appropriate values (usually 25–60% of RAM for the JVM, depending on other services).
    • Configure GC logging and review monthly.
  • Robust monitoring

    • Track heap usage, GC time, thread count, request throughput/latency, DB pool usage, disk space.
  • Code hygiene

    • Use cfqueryparam so the database can reuse execution plans and to prevent SQL injection.
    • Limit data stored in scopes; clear caches on deploys.
    • Use cfthread judiciously with timeouts and error handling.
  • Update management

    • Keep ColdFusion, the JVM, connectors, and drivers up to date.
    • Maintain a staging environment and rollback plan.
  • Resilience patterns

    • Implement circuit breakers and bulkheads for external services.
    • Use queues or async jobs for long-running tasks.
    • Consider horizontal scaling with session storage externalized (database/Redis) if needed.
  • Operational readiness

    • Automate log rotation.
    • Keep heap dump and GC log directories on disks with ample space.
    • Document runbooks for crash vs hang escalation paths.

Key Takeaways / Summary Points

  • Determine crash vs hang first; evidence guides the fix.
  • Enable diagnostic artifacts: GC logs, heap dumps, and thread dumps.
  • Memory, thread contention, connectors, and DB bottlenecks are the top culprits.
  • Fix the root cause before increasing resources; tune with data, not guesses.
  • Keep CF/JVM/drivers current and monitor the platform continuously.

FAQ

How do I tell the difference between a ColdFusion crash and a hang?

A crash leaves the CF service stopped or produces hs_err_pid files indicating the JVM terminated. A hang keeps the process alive but unresponsive. Use jstack to capture thread dumps during a hang and check OS logs and CF logs for crash artifacts.

What’s the safest way to capture a heap dump in production?

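Enable -XX:+HeapDumpOnOutOfMemoryError and specify a writable HeapDumpPath. For an on-demand dump without stopping the JVM, use jcmd <pid> GC.heap_dump /path/file.hprof, where <pid> is the ColdFusion JVM's process ID. The application pauses while the dump is written, so expect a brief stall on large heaps. Ensure sufficient disk space and restrict access to protect sensitive data.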

Can IIS or Apache cause “ColdFusion is down” symptoms?

Yes. Misconfigured or outdated web server connectors can return 502/503 errors even when CF is healthy. Recreate the connector with wsconfig, align timeouts and thread limits, and review connector logs. To isolate the issue, request a page directly from ColdFusion’s built-in HTTP port (if it is enabled), bypassing IIS/Apache; a browser cannot talk to the AJP port directly.

How much memory should I allocate to ColdFusion?

Start by profiling under load. A common baseline is 4–8 GB heap for medium apps, ensuring the OS and other services have room. Monitor GC logs and adjust. Bigger isn’t always better; excessive heap can increase GC pause times if not tuned.

Should I upgrade the JVM to fix crashes?

Often yes, but verify compatibility with your ColdFusion version. Adobe documents supported JVM builds for each CF release. Test in staging, update connectors/drivers as needed, and monitor after upgrade.

About the author

Aaron Longnion

Hey there! I'm Aaron Longnion — an Internet technologist, web software engineer, and ColdFusion expert with more than 24 years of experience. Over the years, I've had the privilege of working with some of the most exciting and fast-growing companies out there, including lynda.com, HomeAway, landsofamerica.com (CoStar Group), and Adobe.com.

I'm a full-stack developer at heart, but what really drives me is designing and building internet architectures that are highly scalable, cost-effective, and fault-tolerant — solutions built to handle rapid growth and stay ahead of the curve.