Reader
Recommended: Reader is a class for reading ETL files from the sync-output directory. It is the recommended way to read ETL files because it provides consistent error handling and metadata access.
Installation
Basic Usage
Key Methods
get(stream, default=None, catalog_types=False, **kwargs)
Reads the selected file into a pandas DataFrame.
get_metadata(stream)
Retrieves metadata from parquet files.
get_pk(stream)
Gets the primary key(s) from the parquet file metadata.
get_catalog_schema
Retrieves the Singer schema from the catalog file.
Usage
Returns
Dictionary containing the stream’s schema definition.
Notes
- Requires catalog.json in the root directory
- Raises an exception if the stream is not found in the catalog
- Filters the schema to include only type and properties
- Ensures array types have an items dictionary
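The notes above can be sketched in plain Python to show what the returned dictionary looks like. This is not the library's implementation: load_catalog, schema_from_catalog, and ensure_array_items are hypothetical names, matching a stream on either tap_stream_id or stream is an assumption, ValueError stands in for whatever exception the real method raises, and applying the items rule recursively to nested properties is also an assumption. The only fixed point is the Singer catalog layout, a top-level "streams" list whose entries each carry a JSON "schema".

```python
import json
import os
from typing import Any, Dict


def load_catalog(root_dir: str = ".") -> Dict[str, Any]:
    """Load catalog.json from the root directory (hypothetical helper)."""
    with open(os.path.join(root_dir, "catalog.json"), encoding="utf-8") as f:
        return json.load(f)


def _is_array(schema: Dict[str, Any]) -> bool:
    # JSON Schema "type" may be a single string or a list such as ["null", "array"].
    t = schema.get("type")
    return "array" in t if isinstance(t, list) else t == "array"


def ensure_array_items(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Give every array-typed node an "items" dictionary, recursing into children."""
    if _is_array(schema) and not isinstance(schema.get("items"), dict):
        schema["items"] = {}
    for child in schema.get("properties", {}).values():
        if isinstance(child, dict):
            ensure_array_items(child)
    if isinstance(schema.get("items"), dict):
        ensure_array_items(schema["items"])
    return schema


def schema_from_catalog(catalog: Dict[str, Any], stream: str) -> Dict[str, Any]:
    """Sketch of the documented behaviour, not the library's own code."""
    for entry in catalog.get("streams", []):
        # Assumption: a stream may be identified by tap_stream_id or stream.
        if stream in (entry.get("tap_stream_id"), entry.get("stream")):
            schema = entry.get("schema", {})
            # Keep only the top-level "type" and "properties" keys.
            filtered = {k: schema[k] for k in ("type", "properties") if k in schema}
            return ensure_array_items(filtered)
    # ValueError stands in for whatever exception the real method raises.
    raise ValueError(f"stream {stream!r} not found in catalog")
```

Typical use would be schema_from_catalog(load_catalog(), "orders"), where "orders" is an example stream name. Splitting file loading from lookup keeps the filtering logic testable without a catalog.json on disk.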
Common Patterns
Iterating through multiple streams
etl.py
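A minimal stdlib-only sketch of iterating through multiple streams follows. It does not use the real Reader: CsvDirReader, streams(), and row_counts() are hypothetical, the one-file-per-stream layout (<stream>.csv) is an assumption, and rows come back as lists of dicts rather than pandas DataFrames. Returning default when a stream is missing is likewise an assumption about what the default parameter of get does.

```python
import csv
import os
from typing import Any, Dict, List, Optional


class CsvDirReader:
    """Hypothetical stand-in for Reader: one <stream>.csv file per stream."""

    def __init__(self, input_dir: str) -> None:
        self.input_dir = input_dir

    def streams(self) -> List[str]:
        # Derive stream names from file names, e.g. "orders.csv" -> "orders".
        return sorted(
            os.path.splitext(name)[0]
            for name in os.listdir(self.input_dir)
            if name.endswith(".csv")
        )

    def get(self, stream: str, default: Optional[Any] = None) -> Any:
        # Assumed behaviour: return `default` when the stream has no file.
        path = os.path.join(self.input_dir, f"{stream}.csv")
        if not os.path.exists(path):
            return default
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))


def row_counts(reader: CsvDirReader) -> Dict[str, int]:
    """Iterate through every stream and record how many rows each one holds."""
    counts: Dict[str, int] = {}
    for stream in reader.streams():
        rows = reader.get(stream, default=[])
        counts[stream] = len(rows)
    return counts
```

In a real script the loop body would call get(stream) on the Reader and transform the resulting DataFrame; the shape of the loop (enumerate streams, read each, process, collect) is what carries over.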