Checking Spark logs is essential for debugging, performance tuning, and understanding the execution flow of your Spark applications. The method depends on whether your Spark job is currently running, has completed, or is running on a managed platform.
1. Using the Spark Web UI (For Live Jobs)
The most common and immediate way to check logs for a currently running Spark application is through its dedicated Spark Web UI.
- Accessing the UI:
  - If you're running Spark locally, the UI is usually accessible at `http://localhost:4040`.
  - On a cluster, the Spark UI address is typically linked from the cluster's resource manager (such as YARN or Mesos) or provided by your cloud provider. It usually follows the format `http://<driver-node-ip>:4040` (or another port if 4040 is in use).
  - Tip: When you launch a Spark application, the console output often includes a line indicating the Spark UI URL.
- Navigating for Logs:
  - Once in the Spark UI, go to the Executors tab.
  - For each executor, you'll find links under the Logs column, typically labeled `stdout` (standard output) and `stderr` (standard error).
  - Clicking these links displays the logs for that specific executor, including print statements, error messages, and internal Spark logging.
  - The driver logs usually appear under the driver row within the Executors tab, or are sometimes linked directly from the Jobs or Stages tab if a specific task failed on the driver.
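As a minimal sketch of finding the UI for a local run, you can pin the UI port explicitly when submitting (`my_app.py` and port 4041 are illustrative; `spark.ui.port` is a standard Spark property):

```shell
# Launch a local Spark job and pin the Web UI to a known port.
# If 4041 is taken, Spark retries successive ports automatically.
spark-submit \
  --master "local[*]" \
  --conf spark.ui.port=4041 \
  my_app.py
# While the job runs, watch the console for a line similar to:
#   INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://<host>:4041
```

Pinning the port is handy on shared machines, where the default 4040 may already be held by another application.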
2. Via Cluster Resource Managers (YARN, Mesos, Kubernetes)
When Spark runs on a cluster, the cluster's resource manager often acts as a gateway to your application's details, including the Spark UI and logs.
- YARN (Yet Another Resource Negotiator):
  - Access the YARN ResourceManager UI (often `http://<ResourceManager_IP>:8088`).
  - Find your Spark application in the list (look for "Spark" in the application type or name).
  - Click the "Tracking UI" link, which usually redirects you to the Spark Web UI for that specific application.
  - Alternatively, YARN's UI provides a "Logs" column for each application. Clicking it may take you directly to the aggregated logs (via YARN Log Aggregation) or to the Application Master's logs.
- Mesos or Kubernetes: Similar patterns exist; their respective UIs provide links to the Spark application's interface or direct access to container logs. On Kubernetes, you'd typically use `kubectl logs <pod-name>` to view container logs.
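The same lookups can be done from the command line; a sketch, where the application ID and pod name are placeholders you'd look up first:

```shell
# YARN: list running Spark applications to find the application ID,
# then fetch its aggregated logs (works after log aggregation has run).
yarn application -list
yarn logs -applicationId application_1700000000000_0001

# Kubernetes: find the driver pod, then stream its logs.
# "spark-pi-driver" is an example pod name.
kubectl get pods
kubectl logs -f spark-pi-driver
```

`yarn logs` is often faster than clicking through the ResourceManager UI when you already know the application ID.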
3. Reviewing Direct Log Files
Spark applications write their logs to specific files on the cluster nodes. This method is useful for deep investigation or when the UI is unavailable.
- Driver Logs: These reside on the node where your Spark driver program runs (e.g., the edge node or the Application Master node). Common locations:
  - `$SPARK_HOME/logs/` on the driver node.
  - `/var/log/spark/` (system-wide logs, if configured).
  - The application's working directory, if `spark-submit` was configured to redirect output there.
  - The container logs, if running on Docker/Kubernetes.
- Executor Logs: These are located on the worker nodes where your Spark executors run. Each executor writes its own `stdout` and `stderr` files. Common locations:
  - The working directory of the Spark application on each worker node (e.g., under `/tmp/spark-<user>/` or a configured `spark.local.dir`).
  - YARN typically aggregates these logs to HDFS after the application completes; they can then be viewed via the YARN UI.
- Event Logs: These are structured logs that record Spark application events, used by the Spark History Server.
  - Location: configured by `spark.eventLog.dir`, usually an HDFS path.

To view these files directly, you'd typically SSH into the respective nodes and use command-line tools:
- `tail -f <log_file>`: view real-time log updates.
- `grep "ERROR" <log_file>`: filter for specific keywords such as errors.
- `less <log_file>`: view the entire file page by page.
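A quick triage session with these tools might look like the following sketch; the log file and its contents are synthetic stand-ins for a real executor `stderr`:

```shell
# Create a small synthetic executor log to demonstrate the commands.
printf '%s\n' \
  '24/01/01 10:00:01 INFO TaskSetManager: Starting task 0.0' \
  '24/01/01 10:00:02 ERROR Executor: Exception in task 0.0' \
  '24/01/01 10:00:02 WARN TaskSetManager: Lost task 0.0' \
  > stderr.log

# How many error lines are there?
grep -c "ERROR" stderr.log    # → 1

# Show each error with one line of trailing context.
grep -A 1 "ERROR" stderr.log
```

On a real node you'd point the same commands at the executor's actual `stderr` file instead of a synthetic one.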
4. Leveraging the Spark History Server (For Completed Jobs)
The Spark History Server is invaluable for reviewing details and logs of completed or terminated Spark applications. It reads event logs generated by Spark applications and presents them in a UI similar to the live Spark UI.
- How it Works:
  - You must set `spark.eventLog.enabled` to `true` and `spark.eventLog.dir` to a persistent storage location (such as HDFS or S3) that the History Server can access.
  - The History Server is a separate daemon that you start yourself.
  - Once running, you can access its UI (often `http://<HistoryServer_IP>:18080`) to see a list of all applications whose event logs it has processed.
- Accessing Logs:
  - Click a specific application ID in the History Server UI.
  - This displays the familiar Spark UI for that past application.
  - Navigate to the Executors tab to view aggregated `stdout` and `stderr` logs, or links to the log files if they are still accessible.
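The setup above can be sketched as follows; the HDFS path is an example, and the properties shown are standard Spark configuration keys:

```shell
# spark-defaults.conf fragment: applications write event logs here,
# and the History Server reads them from the same location.
#   spark.eventLog.enabled         true
#   spark.eventLog.dir             hdfs:///spark-event-logs
#   spark.history.fs.logDirectory  hdfs:///spark-event-logs

# Start the History Server daemon (the script ships with Spark),
# then browse to http://<HistoryServer_IP>:18080.
$SPARK_HOME/sbin/start-history-server.sh
```

Note that the write side (`spark.eventLog.dir`) and the read side (`spark.history.fs.logDirectory`) are separate settings; jobs only appear in the History Server when both point at the same storage.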
5. On Managed Cloud Platforms and Notebook Environments
Cloud providers and managed Spark services (like AWS EMR, Databricks, Google Cloud Dataproc, Azure Synapse, or notebook platforms) centralize log management and provide streamlined interfaces.
- Centralized Interfaces: On managed Spark platforms, reviewing historical job execution and accessing past logs typically goes through the platform's own UI. Navigate to the section for job analysis or execution history, identify your Spark job or command by its unique identifier, and use the platform's search or history feature to list past runs along with their detailed execution information and links to the corresponding logs.
- Cloud Logging Services: Many platforms integrate with their native cloud logging services (e.g., Amazon CloudWatch, Google Cloud Logging, Azure Monitor), where logs are automatically collected, stored, and made searchable.
6. Understanding Different Log Types
Log Type | Description | Location | Primary Use
---|---|---|---
Driver Logs | Output from your main Spark application (driver program), including log4j messages and exceptions. | Driver node's log directory; Application Master logs. | Application errors, job submission issues.
Executor Logs | Output from worker processes (executors) running tasks; includes task-specific errors and stdout/stderr. | Worker node's log directory; aggregated by the resource manager; Spark UI Executors tab. | Task failures, data processing issues.
Event Logs | Structured JSON logs recording Spark application events (e.g., job start/end, task completion). | Configured HDFS/S3 path (`spark.eventLog.dir`). | Spark History Server, detailed post-mortem analysis.
System Logs | Underlying system logs (e.g., YARN NodeManager logs, Kubernetes pod logs). | Respective cluster manager daemon logs or cloud logging services. | Infrastructure issues, resource allocation.
7. Practical Tips for Log Analysis
- Search for Keywords: Use `ERROR`, `Exception`, `WARN`, `Failed`, and `GC` to quickly pinpoint issues.
- Time Correlation: Note timestamps to correlate events across different log files (driver, executor, system).
- Log Levels: Configure `log4j.properties` (or `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions`) to adjust log verbosity. For production, `INFO` or `WARN` is common; for debugging, `DEBUG` can be useful but generates a lot of output.
- Log Aggregation: On clusters, enable log aggregation (e.g., YARN Log Aggregation) to consolidate logs from all nodes into a central location (like HDFS) for easier access after the job completes.
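Adjusting verbosity might look like this sketch; `my-log4j.properties` and `my_app.py` are example names, and note that Spark 3.3+ switched to Log4j 2 (`log4j2.properties` with a slightly different syntax):

```shell
# Ship a custom log4j config to the driver and executors and point
# the JVMs at it. --files localizes the file into each container's
# working directory, so a relative "file:" path works there.
spark-submit \
  --files my-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:my-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:my-log4j.properties" \
  my_app.py

# my-log4j.properties: quiet everything below WARN on the console.
#   log4j.rootCategory=WARN, console
```

Raising the root level to `WARN` keeps production logs readable; drop it back to `DEBUG` only while actively chasing a problem.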
By utilizing these methods, you can effectively monitor and troubleshoot your Spark applications.