# Config Composition & Reusable Components with ZenDag

Hydra is powerful for configuration management, especially in its ability to compose configurations from smaller, reusable pieces. ZenDag leverages this: you can define common components (like loggers, trainers, data modules) as separate configurations and then include them in your main stage configs. ZenDag will still discover any `deps_path` or `outs_path` declarations within these composed parts.

## Example: A Reusable File Logger

Let's define a configuration for a simple file logger. This logger will write to a file, and we want DVC to track this log file as an output of any stage that uses this logger.

### Defining the Logger Configuration

Create `configs/loggers_config.py`:

```python
# configs/loggers_config.py
from pathlib import Path

from hydra_zen import builds, store

from zendag.config_utils import outs_path  # The logger's output file is a DVC output


# This is a simplified function. In reality, it would configure the logging system.
# For ZenDag's dvc.yaml generation, we primarily care that it declares an output path.
# The actual logging setup happens when the stage runs and Hydra instantiates this.
def setup_stage_file_logger(log_file_path_str: str, log_level: str = "INFO"):
    """
    (Mock) Sets up a file logger for a stage.

    The actual configuration of the Python logging system would happen here
    when Hydra instantiates this part of the config during stage execution.
    """
    log_file_path = Path(log_file_path_str)
    log_file_path.parent.mkdir(parents=True, exist_ok=True)  # Ensure directory exists
    log_file_path.touch(exist_ok=True)  # Create the file so DVC has an output to track

    # Simulate logger setup for demonstration
    print(f"[LoggerSetup] Configuring file logger at: {log_file_path} with level {log_level}")

    # In a real scenario, you might return a configured logger object or just perform side effects.
    # For ZenDag's config resolution, the important part is that `log_file_path_str` uses `outs_path`.
    return {"log_file": str(log_file_path), "level": log_level}


# Hydra-Zen config for our file logger
FileLoggerConfig = builds(
    setup_stage_file_logger,
    populate_full_signature=True,
    # The log file path is an output of the stage using this logger.
    # It will be relative to the stage's output directory.
    log_file_path_str=outs_path("logs/stage_execution.log"),
    log_level="INFO",  # Default log level for this config
)

# Register it in a 'logger' group
store(FileLoggerConfig, group="logger", name="default_file_logger")

# Another, more verbose variant
VerboseFileLoggerConfig = builds(
    setup_stage_file_logger,
    populate_full_signature=True,
    log_file_path_str=outs_path("logs/verbose_stage_execution.log"),
    log_level="DEBUG",  # DEBUG is the most verbose standard logging level
)
store(VerboseFileLoggerConfig, group="logger", name="verbose_file_logger")
```
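The mock above only prints and touches the file. For reference, here is a minimal sketch of what a non-mock `setup_stage_file_logger` might look like, using only the standard library `logging` module; the handler and format choices are illustrative assumptions, not part of ZenDag's API:

```python
# Illustrative, non-mock variant of setup_stage_file_logger (a sketch, not ZenDag API).
import logging
from pathlib import Path


def setup_stage_file_logger(log_file_path_str: str, log_level: str = "INFO"):
    """Attach a FileHandler to the root logger so stage code logs to the file."""
    log_file_path = Path(log_file_path_str)
    log_file_path.parent.mkdir(parents=True, exist_ok=True)

    handler = logging.FileHandler(log_file_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )

    root = logging.getLogger()  # Configuring the root logger means messages from
    root.addHandler(handler)    # logging.getLogger(__name__) anywhere in the stage
    root.setLevel(log_level)    # propagate into the DVC-tracked file.

    return {"log_file": str(log_file_path), "level": log_level}
```

Because `logging.FileHandler` opens the file immediately, the declared `outs_path` output exists even for a stage that never logs anything.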
### Using the Logger in a Stage

Let's modify the `TransformConfig` from our [Quickstart Notebook](quickstart.md) to include this logger using Hydra's `hydra_defaults`.

Modify `configs/transform_config.py`:

```python
# configs/transform_config.py (modified)
from hydra_zen import builds, store

from zendag.config_utils import deps_path, outs_path

# Assume transform_data is in my_project.stages.simple_transform
from my_project.stages.simple_transform import transform_data  # Or your actual import

# The stage function is unaware of the logger (Hydra instantiates it).
TransformConfigWithLogger = builds(
    transform_data,  # transform_data itself doesn't take a logger argument here
    populate_full_signature=True,
    input_csv_path=deps_path("data/raw/input.csv"),
    output_csv_path=outs_path("data/processed/output_with_logging.csv"),
    scale_factor=2.5,
    # --- Hydra Defaults for Composition ---
    hydra_defaults=[
        "_self_",  # Include _self_ first so the group entries below are merged after it
        {"logger": "default_file_logger"},  # Load 'default_file_logger' from the 'logger' group
        # To use the other logger: {"logger": "verbose_file_logger"}
        # The key 'logger' creates a 'logger' node in the final composed config.
    ],
)

# Ensure the original default_transform (from the quickstart) is also available if
# needed for other examples, or update it to also use a logger if that's the new
# baseline. For this example, we register a new named config.
store(TransformConfigWithLogger, group="transform", name="logged_transform")

# If you had an original default_transform:
# OriginalTransformConfig = builds(
#     transform_data,
#     populate_full_signature=True,
#     input_csv_path=deps_path("data/raw/input.csv"),
#     output_csv_path=outs_path("data/processed/output.csv"),
#     scale_factor=1.5,
# )
# store(OriginalTransformConfig, group="transform", name="default_transform")
```

For simplicity, we focus on the case where the logger is instantiated by Hydra and the stage function `transform_data` doesn't take a `logger` argument directly. `setup_stage_file_logger` would typically configure a global/module logger that `transform_data` then uses via `logging.getLogger(__name__)`.

### How ZenDag Discovers the Logger's Output

1. **Update `configure.py`:**
    * Import `configs.loggers_config`.
    * Ensure the `transform` group (and specifically `logged_transform`) is processed.

    ```python
    # configure.py (snippet)
    import configs.transform_config  # Defines logged_transform
    import configs.loggers_config    # Defines the logger configs
    # ...
    # If you are also running the quickstart's default_transform, keep its dummy input logic:
    # Path("data/raw/input.csv").parent.mkdir(parents=True, exist_ok=True)
    # pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(Path("data/raw/input.csv"), index=False)
    # os.system("dvc add data/raw/input.csv")

    STAGE_GROUPS = ["transform"]  # This picks up all configs in the 'transform' group
    # ...
    ```

2. **Run configuration:**

    ```bash
    python configure.py
    ```

3. **Inspect `dvc.yaml`:** Look at the entry for `transform/logged_transform`:

    ```yaml
    stages:
      transform/logged_transform:
        cmd: python -m my_project.run_hydra_stage -cd artifacts/transform -cn logged_transform hydra.run.dir='artifacts/transform/logged_transform'
        deps:
          - data/raw/input.csv
        outs:
          # Output from transform_data itself
          - artifacts/transform/logged_transform/data/processed/output_with_logging.csv
          # Output from the composed logger!
          - artifacts/transform/logged_transform/logs/stage_execution.log
        params:
          - artifacts/transform/logged_transform.yaml
    ```

ZenDag's `configure_pipeline` calls `OmegaConf.resolve(cfg)` on the *fully composed* configuration for `transform/logged_transform`. This composed config includes the `logger` node (because of `hydra_defaults`), which itself contains `log_file_path_str=outs_path("logs/stage_execution.log")`. The `outs:` resolver is triggered, and the log file path is added to the `outs` of the `transform/logged_transform` DVC stage.
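To see why the log path is discovered, it helps to picture the composed config that ZenDag writes to `artifacts/transform/logged_transform.yaml`. The sketch below is an assumption about its rough shape; the exact field rendering, including the `${deps:...}` and `${outs:...}` interpolation forms, depends on ZenDag's serialization:

```yaml
# Rough sketch of the composed config (rendering is assumed, not exact output)
_target_: my_project.stages.simple_transform.transform_data
input_csv_path: ${deps:data/raw/input.csv}
output_csv_path: ${outs:data/processed/output_with_logging.csv}
scale_factor: 2.5
logger:  # node created by the {"logger": ...} defaults entry
  _target_: configs.loggers_config.setup_stage_file_logger
  log_file_path_str: ${outs:logs/stage_execution.log}
  log_level: INFO
```

When `OmegaConf.resolve(cfg)` walks this tree, every `${outs:...}` it encounters, at any depth, registers a DVC output for the stage, which is how the nested logger path ends up in `dvc.yaml`.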
### Running the Stage

When you run `dvc exp run transform/logged_transform` (or `dvc exp run` if it's the only changed part):

* Hydra instantiates the `logger` part of the config, calling `setup_stage_file_logger`.
* The (mock) `setup_stage_file_logger` prints its message and creates the log file. Because `log_file_path_str` was declared with `outs_path`, DVC tracks this file.
* The `transform_data` function runs. Its own `logging.getLogger(__name__)` messages would land in the log file if `setup_stage_file_logger` configured the root logger or a relevant parent.
* The file `artifacts/transform/logged_transform/logs/stage_execution.log` exists after the run (even if it's empty or has minimal content in this mock, DVC tracks it as an output).

### Benefits

* **Reusability:** Define logger (or trainer, optimizer, etc.) configs once and use them in many stages.
* **Separation of Concerns:** Stage logic isn't cluttered with setup details for common components.
* **Dynamic Outputs:** ZenDag automatically picks up DVC outputs (`outs_path`) declared deep within composed configuration structures.
* **Flexibility:** Swap components by changing `hydra_defaults` (e.g., switch to `verbose_file_logger`).

## Conclusion

Hydra's composition, combined with ZenDag's `deps_path` and `outs_path` discovery, lets you build sophisticated, modular MLOps pipelines in which even common components have their outputs tracked by DVC without manual duplication in `dvc.yaml`. This leads to cleaner, more maintainable, and highly reproducible workflows.