What are the steps in exploratory factor analysis?

The steps in Exploratory Factor Analysis (EFA) involve a systematic process to uncover the underlying structure of a set of observed variables. This robust statistical technique simplifies complex data by identifying common factors, but it also involves a certain amount of subjective judgment at key stages.

Here are the essential steps in exploratory factor analysis:

Understanding the EFA Process

Exploratory Factor Analysis (EFA) is primarily used when researchers do not have prior hypotheses about the number of factors or the specific items that will load on each factor. It helps researchers identify latent constructs by examining the patterns of correlations among a set of observed variables.

1. Data Preparation and Assumption Checking

Before diving into factor analysis, it's crucial to prepare your data and ensure it meets the necessary statistical assumptions.

Sample Size: A sufficient sample size is vital. While there's no single rule, a common guideline suggests at least 5-10 participants per item, or a minimum of 100-200 observations.
Variable Measurement: Ensure variables are measured at an appropriate level (typically interval or ratio scales, though ordinal can be used with caution).
Correlation Matrix Suitability:
- Bartlett's Test of Sphericity: Tests the null hypothesis that the correlation matrix is an identity matrix (i.e., that the observed variables are uncorrelated). A significant result (p < .05) indicates that the correlations are large enough for factor analysis to be worthwhile.
- Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: Measures the proportion of variance among variables that might be common variance. KMO values range from 0 to 1, with values above 0.6 generally considered acceptable, and above 0.8 being good.
Multicollinearity and Singularity: Check for highly correlated variables (multicollinearity) or perfectly correlated variables (singularity), which can cause issues.
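
The two suitability checks above can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: the function names bartlett_sphericity and kmo_overall are made up for this example, and the demo data is simulated (six items driven by two uncorrelated factors).

```python
import numpy as np


def bartlett_sphericity(data):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


def kmo_overall(data):
    """Return the overall Kaiser-Meyer-Olkin measure (between 0 and 1)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation of each pair, controlling for all other variables
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Simulated (hypothetical) data: 500 respondents, 6 items, 2 latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.array([[0.8, 0], [0.8, 0], [0.8, 0], [0, 0.8], [0, 0.8], [0, 0.8]])
items = factors @ loadings.T + 0.6 * rng.normal(size=(500, 6))

chi2, dof = bartlett_sphericity(items)
kmo = kmo_overall(items)
# The p-value comes from a chi-square distribution with `dof` degrees of
# freedom, e.g. scipy.stats.chi2.sf(chi2, dof) if SciPy is available.
print(f"Bartlett chi-square = {chi2:.1f} on {dof} df, KMO = {kmo:.2f}")
```

For 15 degrees of freedom the 5% critical value of the chi-square distribution is about 25.0, so a statistic well above that rejects the identity-matrix hypothesis.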

2. Choosing an Extraction Method

A basic decision point in exploratory factor analysis is choosing an extraction method. This step determines how the initial factors are derived from the observed variables. Each method uses a different statistical approach to identify common variance.

Principal Component Analysis (PCA):
- Purpose: A data reduction technique that explains the maximum amount of total variance in the observed variables. It doesn't distinguish between common and unique variance.
- When to Use: Often used when the primary goal is data reduction without necessarily identifying latent constructs, or as a preliminary step.
Principal Axis Factoring (PAF):
- Purpose: Focuses specifically on explaining common variance among variables, aiming to identify underlying latent constructs.
- When to Use: Preferred when the goal is to uncover latent factors rather than just data reduction.
Maximum Likelihood (ML):
- Purpose: Provides parameter estimates that are most likely to have produced the observed correlation matrix, assuming multivariate normality. It allows for statistical significance tests.
- When to Use: Ideal when multivariate normality is met and the objective is statistical inference and model fit assessment.
Other Methods: Unweighted Least Squares (ULS), Generalized Least Squares (GLS), Alpha Factoring.

Example: If you're analyzing survey responses and believe there are underlying psychological traits driving responses, PAF or ML would be more appropriate than PCA.
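
To make the difference between PCA and PAF concrete, here is a rough NumPy sketch. The helper names pca_loadings and paf_loadings are invented for this example, and the correlation matrix is built from hypothetical "true" loadings so the result can be checked. PAF is implemented as the usual iterated version: start from squared multiple correlations, put them on the diagonal, extract, and update until the communalities stabilise.

```python
import numpy as np


def pca_loadings(corr, n_factors):
    """Component loadings: eigenvectors scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def paf_loadings(corr, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix."""
    # Initial communality estimates: squared multiple correlations (SMC)
    communalities = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]
        # Clip tiny negative eigenvalues before taking the square root
        loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
        updated = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(updated - communalities)) < tol
        communalities = updated
        if converged:
            break
    return loadings


# Hypothetical population structure: two factors, three items each
true_loadings = np.array(
    [[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.75], [0, 0.65], [0, 0.55]]
)
corr = true_loadings @ true_loadings.T
np.fill_diagonal(corr, 1.0)

paf_h2 = np.sum(paf_loadings(corr, 2) ** 2, axis=1)
pca_h2 = np.sum(pca_loadings(corr, 2) ** 2, axis=1)
print("True communalities:", np.sum(true_loadings ** 2, axis=1))
print("PAF communalities: ", np.round(paf_h2, 3))
print("PCA communalities: ", np.round(pca_h2, 3))
```

Because PCA analyses total variance (ones on the diagonal), its communalities come out larger than the common variance that PAF targets.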

3. Deciding the Number of Factors

Another basic decision point is deciding how many factors to retain. This is a critical step that significantly impacts the interpretation of the results, and it often involves a blend of statistical criteria and theoretical reasoning.

Kaiser's Criterion (Eigenvalues > 1):
- Description: Retain all factors with an eigenvalue greater than 1. An eigenvalue represents the amount of variance explained by a factor; with standardized variables, an eigenvalue of 1 equals the variance of a single variable.
- Caution: This method tends to over-extract factors, especially with a large number of variables.
Scree Plot:
- Description: A graphical method where eigenvalues are plotted against the factor number. The "elbow" or point where the slope of the line dramatically changes indicates the optimal number of factors to retain before the remaining factors explain only trivial amounts of variance.
- Benefit: Provides a visual aid to guide judgment, often more reliable than Kaiser's criterion alone.
Parallel Analysis:
- Description: Compares the observed eigenvalues with eigenvalues from randomly generated datasets with the same number of observations and variables. A factor is retained if its observed eigenvalue exceeds the corresponding random eigenvalue (typically the mean or 95th percentile across the random datasets).
- Benefit: Considered one of the most accurate methods for determining the number of factors.
Theoretical Justification:
- Description: Prior research, existing theories, or the intended use of the factor structure can inform the number of factors.
- Importance: Statistical criteria should always be balanced with substantive knowledge.
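
Kaiser's criterion and parallel analysis can both be sketched in a few lines of NumPy. The version below is one simple variant of parallel analysis (eigenvalues of the full correlation matrix, normal random data, 95th-percentile threshold); the function names, the number of simulations, and the simulated demo data are all assumptions made for illustration.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(data):
    """Number of eigenvalues greater than 1."""
    return int(np.sum(correlation_eigenvalues(data) > 1.0))


def parallel_analysis_count(data, n_sims=200, percentile=95, seed=0):
    """Number of leading eigenvalues that beat those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = correlation_eigenvalues(data)
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        simulated[i] = correlation_eigenvalues(rng.normal(size=(n, p)))
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Count leading factors only: stop at the first eigenvalue that fails
    return p if exceeds.all() else int(np.argmin(exceeds))


# Simulated (hypothetical) data: 500 respondents, 6 items, 2 latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.array([[0.8, 0], [0.8, 0], [0.8, 0], [0, 0.8], [0, 0.8], [0, 0.8]])
items = factors @ loadings.T + 0.6 * rng.normal(size=(500, 6))

n_kaiser = kaiser_count(items)
n_parallel = parallel_analysis_count(items)
print("Kaiser criterion:", n_kaiser, "| parallel analysis:", n_parallel)
```

On clean simulated data like this the two rules agree; on real data they often differ, which is where the scree plot and theory come in.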

4. Choosing a Rotation Method

Choosing a rotation method is a further basic decision point in exploratory factor analysis. After factors are initially extracted, they are often difficult to interpret because variables may load on multiple factors. Rotation simplifies the factor structure, making it easier to understand which variables belong to which factor. This step also involves subjective judgment.

There are two main types of rotation:

Orthogonal Rotation (Uncorrelated Factors):
- Description: Assumes that the underlying factors are independent (uncorrelated).
- Methods:
  - Varimax: Most common orthogonal rotation. It minimizes the number of variables that have high loadings on a factor, thereby simplifying the columns of the factor matrix.
  - Quartimax: Aims to simplify the rows of the factor matrix, making each variable load highly on only one factor.
  - Equamax: A compromise between Varimax and Quartimax.
- When to Use: When there's a theoretical reason to believe factors are truly independent.
Oblique Rotation (Correlated Factors):
- Description: Allows factors to be correlated with each other, which is often more realistic in social sciences where latent constructs are rarely completely independent.
- Methods:
  - Direct Oblimin: A common oblique rotation method.
  - Promax: Often used for larger datasets and as a faster alternative to Direct Oblimin.
- When to Use: When factors are expected to be related, providing a more accurate representation of their underlying relationships.

Practical Insight: If you suspect your underlying constructs might be related (e.g., "anxiety" and "depression" are often correlated), oblique rotation is generally preferred as it provides both the factor loadings and the correlations between factors.
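
As an illustration of what an orthogonal rotation does, the sketch below implements raw Varimax with pairwise planar rotations in NumPy. It is a simplified teaching version: the function name varimax is chosen for this example, Kaiser normalization (which many packages apply) is omitted, and the loading matrix is hypothetical. Oblique methods such as Direct Oblimin and Promax are not covered here.

```python
import numpy as np


def varimax(loadings, max_sweeps=50, tol=1e-8):
    """Raw varimax via pairwise planar rotations (no Kaiser normalization)."""
    rotated = np.array(loadings, dtype=float)
    p, k = rotated.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = rotated[:, i].copy(), rotated[:, j].copy()
                u, v = x ** 2 - y ** 2, 2 * x * y
                a, b = u.sum(), v.sum()
                numerator = 2 * np.sum(u * v) - 2 * a * b / p
                denominator = np.sum(u ** 2 - v ** 2) - (a ** 2 - b ** 2) / p
                # Angle that maximizes the varimax criterion for this pair
                angle = np.arctan2(numerator, denominator) / 4
                rotated[:, i] = x * np.cos(angle) + y * np.sin(angle)
                rotated[:, j] = -x * np.sin(angle) + y * np.cos(angle)
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:
            break
    return rotated


# Hypothetical simple structure, deliberately "spun" by 30 degrees so that
# every item loads on both factors before rotation
simple = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.75], [0, 0.65], [0, 0.55]])
theta = np.deg2rad(30)
spin = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ spin

rotated = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

The rotation changes how variance is distributed across factors but leaves each item's communality (row sum of squared loadings) unchanged.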

5. Factor Interpretation and Refinement

Once factors are extracted and rotated, the next step is to interpret their meaning.

Examine Factor Loadings:
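- Loadings: After an orthogonal rotation, loadings are the correlations between the original variables and the factors. After an oblique rotation, the pattern matrix contains regression-type weights and the structure matrix contains the correlations. Loadings with an absolute value above roughly 0.3 or 0.4 are commonly treated as meaningful.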
- Simple Structure: Aim for a "simple structure" where each variable loads strongly on only one factor and weakly on others.
Naming Factors: Based on the variables that load highly on each factor, assign a meaningful name that reflects the underlying construct. This often requires careful consideration and theoretical understanding.
Addressing Problematic Items:
- Cross-loading Items: Variables that load substantially (above the chosen cutoff) on more than one factor. These might be removed or re-evaluated.
- Low Loading Items: Variables that don't load strongly on any factor. These might also be removed.
Re-running the Analysis: It's common to iterate through steps 2-5, removing problematic items or trying different rotation methods, until a clear and interpretable factor structure emerges.
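
The screening of problematic items can be sketched as a small helper. The cutoffs (0.40 for a salient loading, 0.30 for a secondary loading), the function name flag_items, and the loading values are all illustrative assumptions; appropriate thresholds depend on the field and sample size.

```python
import numpy as np


def flag_items(loadings, item_names, salient=0.40, cross=0.30):
    """Label each item as 'keep', 'low-loading', or 'cross-loading'."""
    report = {}
    for name, row in zip(item_names, np.abs(loadings)):
        ordered = np.sort(row)[::-1]  # largest absolute loading first
        if ordered[0] < salient:
            report[name] = "low-loading"
        elif len(ordered) > 1 and ordered[1] >= cross:
            report[name] = "cross-loading"
        else:
            report[name] = "keep"
    return report


# Hypothetical rotated loadings for five survey items on two factors
loadings = np.array(
    [[0.72, 0.10], [0.65, -0.05], [0.45, 0.41], [0.20, 0.15], [0.08, -0.70]]
)
names = ["q1", "q2", "q3", "q4", "q5"]

report = flag_items(loadings, names)
for item, label in report.items():
    print(item, label)
```

Absolute values are used so that strong negative loadings (as for q5) count as salient.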

6. Reporting Results

The final step is to clearly report the findings of your EFA.

Methodology: Detail the extraction and rotation methods used, and the criteria for determining the number of factors.
Factor Structure: Present the final factor loadings, typically in a table, highlighting significant loadings.
Explained Variance: Report the percentage of total variance explained by the retained factors.
Factor Correlations: If oblique rotation was used, report the correlations between the factors.
Interpretation: Discuss the theoretical implications of the identified factors and their names.
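
For the explained-variance part of a report, a minimal sketch is shown below. It assumes an orthogonal solution based on a correlation matrix, where each factor's share is its sum of squared loadings divided by the number of variables; with oblique rotation the factors overlap, so the shares no longer add up this cleanly. The function name and loading values are hypothetical.

```python
import numpy as np


def variance_explained(loadings):
    """Sum of squared loadings, proportion, and cumulative proportion per factor."""
    loadings = np.asarray(loadings, dtype=float)
    ss_loadings = np.sum(loadings ** 2, axis=0)
    proportion = ss_loadings / loadings.shape[0]
    return ss_loadings, proportion, np.cumsum(proportion)


# Hypothetical final (orthogonally rotated) loadings: 6 items, 2 factors
final_loadings = np.array(
    [[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.75], [0, 0.65], [0, 0.55]]
)

ss, proportion, cumulative = variance_explained(final_loadings)
for k in range(len(ss)):
    print(
        f"Factor {k + 1}: SS loadings = {ss[k]:.3f}, "
        f"variance = {proportion[k]:.1%}, cumulative = {cumulative[k]:.1%}"
    )
```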
