app.parsers.csv_parser module
CSV Bank Statement Parser — Full Pipeline
Architecture: 5-stage pipeline, each stage is independently testable.
Stage 1: FileReader → decode bytes, detect encoding
Stage 2: FormatDetector → detect delimiter, find header row, map columns
Stage 3: RowFilter → skip blanks, metadata rows, summary rows
Stage 4: RowParser → parse each row (date, amount, description, type)
Stage 5: PostProcessor → validate balance, compute stats, build result
- Design principles:
Never crash on bad data; collect warnings and continue
Always preserve raw values for debugging
Amount is always positive; direction is explicit (‘C’ / ‘D’)
All warnings attached to the ParsedBankStatement result
- class app.parsers.csv_parser.CSVParser(max_rows: int = 100000)
Bases: BaseParser[ParsedBankStatement]
Orchestrates the 5-stage CSV parsing pipeline. Entry point for all CSV bank statement parsing.
- parse(content: bytes) → ParsedBankStatement
Full pipeline: bytes → ParsedBankStatement.
- Parameters:
content – Raw file bytes.
- Returns:
ParsedBankStatement with transactions and quality metadata.
- Raises:
CSVEncodingError – Cannot decode file.
CSVMissingRequiredColumnsError – Mandatory columns absent.
CSVNoDataRowsError – No valid data rows after filtering.
CSVParseError – Unrecoverable structural error.
- class app.parsers.csv_parser.FileReader
Bases: object
Reads raw bytes → decoded text string. Handles: UTF-8, UTF-8-BOM, Latin-1, Windows-1252, UTF-16.
- SUPPORTED_ENCODINGS = ['utf-8-sig', 'utf-8', 'latin-1', 'windows-1252', 'utf-16']
- read(content: bytes) → tuple[str, str]
Decode bytes to string.
- Returns:
(decoded_text, detected_encoding)
- Raises:
CSVEncodingError – If all known encodings fail.
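The decoding step can be sketched as a simple fallback loop over the candidate encodings. This is a minimal illustration and not the actual FileReader code: the function name `decode_with_fallback` and the strict in-order strategy are assumptions, and `ValueError` stands in for CSVEncodingError. Note that `latin-1` accepts any byte sequence, so a strict in-order loop like this one would never reach the entries listed after it; the real implementation may use extra detection (for example, checking for a byte-order mark) before falling back.

```python
# Hypothetical sketch: not the actual FileReader implementation.
SUPPORTED_ENCODINGS = ["utf-8-sig", "utf-8", "latin-1", "windows-1252", "utf-16"]


def decode_with_fallback(content: bytes) -> tuple[str, str]:
    """Try each encoding in order; return (decoded_text, encoding_used)."""
    for encoding in SUPPORTED_ENCODINGS:
        try:
            return content.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Stand-in for CSVEncodingError, whose constructor is not documented here.
    raise ValueError("could not decode content with any supported encoding")


text, encoding = decode_with_fallback("Date,Amount\n01/02/2024,10.50\n".encode("utf-8"))
```

Trying `utf-8-sig` first means a leading byte-order mark, if present, is stripped from the decoded text rather than ending up in the first header cell.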
- class app.parsers.csv_parser.FormatDetector
Bases: object
Detects CSV format and maps column headers to logical fields.
- CANDIDATE_DELIMITERS = [',', ';', '\t', '|', ':']
- detect(text: str) → tuple[str, list[list[str]]]
Detect delimiter and parse all rows.
- Returns:
(detected_delimiter, all_rows_as_lists)
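One way to pick a delimiter is to parse the text once per candidate and keep the candidate that yields the most consistently shaped rows. The sketch below assumes that heuristic; the scoring rule and the name `detect_delimiter` are illustrative and may differ from what FormatDetector.detect actually does.

```python
import csv
import io
from collections import Counter

CANDIDATE_DELIMITERS = [",", ";", "\t", "|", ":"]


def detect_delimiter(text: str) -> tuple[str, list[list[str]]]:
    """Pick the delimiter whose most common row width (> 1) covers the most rows."""
    best = (0, 0, ",", [])  # (rows at modal width, modal width, delimiter, rows)
    for delimiter in CANDIDATE_DELIMITERS:
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
        widths = Counter(len(row) for row in rows if len(row) > 1)
        if not widths:
            continue  # This candidate never split a row.
        width, count = widths.most_common(1)[0]
        if (count, width) > best[:2]:
            best = (count, width, delimiter, rows)
    if not best[3]:
        # No candidate split any row; fall back to comma so callers still get rows.
        best = (0, 0, ",", list(csv.reader(io.StringIO(text))))
    return best[2], best[3]


sample = "Date;Description;Amount\n01/02/2024;Coffee, large;3,50\n02/02/2024;Rent;900,00\n"
delimiter, rows = detect_delimiter(sample)
```

In the sample, the comma also splits some rows, but only the semicolon gives every row the same width, so it wins.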
- find_header_row(rows: list[list[str]]) → int
Find which row index contains the column headers. Some bank exports have 3-5 metadata rows before the actual table.
Returns the 0-based index of the header row.
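A header row preceded by metadata rows can be located by scanning for the first row that contains several header-like cells. The following is a minimal sketch under that assumption; the keyword set, the `max_scan` limit, and the fallback to index 0 are all hypothetical.

```python
# Hypothetical keyword list; the real detector's vocabulary is not documented here.
HEADER_KEYWORDS = {"date", "description", "amount", "debit", "credit", "balance"}


def find_header_row(rows: list[list[str]], max_scan: int = 20) -> int:
    """Return the 0-based index of the first row with at least two header-like cells."""
    for index, row in enumerate(rows[:max_scan]):
        cells = {cell.strip().lower() for cell in row}
        if len(cells & HEADER_KEYWORDS) >= 2:
            return index
    return 0  # Assumption: fall back to the first row when nothing matches.


rows = [
    ["Account holder", "J. Doe"],
    ["Account number", "12345678"],
    [],
    ["Date", "Description", "Amount", "Balance"],
    ["01/02/2024", "Coffee", "3.50", "96.50"],
]
header_index = find_header_row(rows)
```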
- map_columns(header_row: list[str]) → ColumnMapping
Map header cell values to logical field indices. Returns a ColumnMapping with index positions.
- validate_mapping(mapping: ColumnMapping, headers: list[str]) → None
Raise if mandatory columns are missing.
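Mapping and validation could look roughly like the sketch below. The fields of ColumnMapping are not documented on this page, so `ColumnMappingSketch`, the alias table, and the choice of date and amount as the mandatory columns are all assumptions made for illustration; `ValueError` stands in for CSVMissingRequiredColumnsError.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnMappingSketch:
    """Hypothetical stand-in for ColumnMapping, whose fields are not documented here."""
    date: Optional[int] = None
    description: Optional[int] = None
    amount: Optional[int] = None
    balance: Optional[int] = None


# Hypothetical header aliases per logical field.
ALIASES = {
    "date": {"date", "transaction date", "value date"},
    "description": {"description", "details", "narrative"},
    "amount": {"amount", "value"},
    "balance": {"balance", "running balance"},
}


def map_columns(header_row: list[str]) -> ColumnMappingSketch:
    """Record the index of the first header cell that matches each logical field."""
    mapping = ColumnMappingSketch()
    for index, cell in enumerate(header_row):
        name = cell.strip().lower()
        for field_name, names in ALIASES.items():
            if name in names and getattr(mapping, field_name) is None:
                setattr(mapping, field_name, index)
    return mapping


def validate_mapping(mapping: ColumnMappingSketch) -> None:
    """Raise if a column assumed to be mandatory (date, amount) was not found."""
    missing = [f for f in ("date", "amount") if getattr(mapping, f) is None]
    if missing:
        # Stand-in for CSVMissingRequiredColumnsError.
        raise ValueError(f"missing mandatory columns: {missing}")


mapping = map_columns(["Transaction Date", " Details ", "Amount", "Balance"])
validate_mapping(mapping)
```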
- class app.parsers.csv_parser.PostProcessor
Bases: object
Final validation and result assembly.
- Infers statement date range from transactions
- Detects dominant currency
- Validates running balance continuity
- Computes statistics
- process(transactions: list[ParsedTransaction], raw_headers: dict, encoding: str, delimiter: str, column_mapping: ColumnMapping, rows_skipped: int, format_warnings: list[dict]) → ParsedBankStatement
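Running balance continuity can be checked by confirming that each row's balance equals the previous balance plus or minus the row's amount. The sketch below uses plain dicts with hypothetical keys (`amount`, `direction`, `balance`) instead of ParsedTransaction, and assumes rows are in chronological order, that 'C' increases the balance, and that 'D' decreases it. In line with the design principles, it returns warnings instead of raising.

```python
from decimal import Decimal


def check_balance_continuity(transactions: list[dict]) -> list[dict]:
    """Return one warning dict per row whose balance does not follow from the previous row."""
    warnings = []
    previous = None
    for index, txn in enumerate(transactions):
        balance = txn.get("balance")
        if balance is None:
            previous = None  # Cannot chain through a row without a balance.
            continue
        if previous is not None:
            signed = txn["amount"] if txn["direction"] == "C" else -txn["amount"]
            expected = previous + signed
            if expected != balance:
                warnings.append({"row": index, "expected": expected, "actual": balance})
        previous = balance
    return warnings


transactions = [
    {"amount": Decimal("100.00"), "direction": "C", "balance": Decimal("100.00")},
    {"amount": Decimal("3.50"), "direction": "D", "balance": Decimal("96.50")},
    {"amount": Decimal("10.00"), "direction": "D", "balance": Decimal("80.00")},
]
balance_warnings = check_balance_continuity(transactions)
```

Using `Decimal` rather than `float` keeps the equality check exact for currency values.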
- class app.parsers.csv_parser.RowFilter(expected_col_count: int)
Bases: object
Filters out non-data rows before parsing.
- should_skip(row: list[str], row_index: int) → bool
Returns True if this row should be skipped. Appends to self.skipped for audit purposes.
- skipped: list[dict]
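The filter-with-audit-trail pattern described above might be sketched as follows. `RowFilterSketch` is a hypothetical stand-in: the specific skip rules, the summary markers, and the keys of each `skipped` entry are assumptions, since only the constructor argument and the `skipped` attribute are documented.

```python
class RowFilterSketch:
    """Hypothetical stand-in for RowFilter; the skip rules below are assumptions."""

    SUMMARY_MARKERS = ("total", "closing balance", "opening balance")

    def __init__(self, expected_col_count: int) -> None:
        self.expected_col_count = expected_col_count
        self.skipped: list[dict] = []

    def should_skip(self, row: list[str], row_index: int) -> bool:
        """Return True for non-data rows and record why each one was skipped."""
        reason = None
        if not any(cell.strip() for cell in row):
            reason = "blank"
        elif len(row) < self.expected_col_count:
            reason = "too_few_columns"
        elif row[0].strip().lower().startswith(self.SUMMARY_MARKERS):
            reason = "summary"
        if reason is None:
            return False
        self.skipped.append({"row_index": row_index, "reason": reason})
        return True


row_filter = RowFilterSketch(expected_col_count=3)
rows = [
    ["", "", ""],
    ["Account", "123"],
    ["Total", "", "500.00"],
    ["01/02/2024", "Coffee", "3.50"],
]
kept = [row for i, row in enumerate(rows) if not row_filter.should_skip(row, i)]
```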
- class app.parsers.csv_parser.RowParser(mapping: ColumnMapping, dayfirst: bool = True)
Bases: object
Parses a single CSV data row into a ParsedTransaction.
- parse(row: list[str], row_index: int) → ParsedTransaction | None
Parse one CSV row.
- Returns:
ParsedTransaction on success. None if the row cannot be parsed at all (e.g., date missing).
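The two core conversions in row parsing, date and amount, can be sketched as small helpers that return None instead of raising. These helpers are hypothetical and not part of the documented API. They assume slash-separated dates, a comma used only as a thousands separator, and that a negative or parenthesised amount means a debit ('D'), which matches the principle that the amount is always positive and the direction is explicit.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_date(raw: str, dayfirst: bool = True) -> Optional[date]:
    """Parse a slash-separated date; dayfirst=True reads 01/02/2024 as 1 February."""
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    try:
        return datetime.strptime(raw.strip(), fmt).date()
    except ValueError:
        return None


def parse_amount(raw: str) -> Optional[tuple[Decimal, str]]:
    """Return (positive_amount, direction); 'D' for negative input, else 'C'."""
    cleaned = raw.strip().replace(",", "")
    negative = cleaned.startswith("-") or (cleaned.startswith("(") and cleaned.endswith(")"))
    cleaned = cleaned.strip("-()")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return abs(value), "D" if negative else "C"


parsed_date = parse_date("01/02/2024")
parsed_amount = parse_amount("-1,234.50")
```

A caller following the documented contract would return None for the whole row when `parse_date` fails, and attach a warning rather than raising.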