The cpc_parser package is a Python library designed to extract structured question data from CPC (Certified Professional Coder) practice test PDF files. It converts unstructured PDF content into validated, structured data models that can be used for analysis, benchmarking, and machine learning applications.
cpc_parser/
├── __init__.py # Package exports and version info
├── schema.py # Pydantic data models (Question, QuestionDataset)
└── parse_pdf.py # Main parsing logic (CPCTestParser)
schema.py
- Question: Represents a single CPC test question with validation
- QuestionDataset: Collection of questions with metadata and utilities

parse_pdf.py
- CPCTestParser: Main parsing engine
- parse_cpc_test(): Convenience function for quick parsing

flowchart TD
A["CPC Test PDF"] --> B["Initialize CPCTestParser"]
B --> C["Parse Questions<br/>(Pages 4-35)"]
C --> D["Parse Answer Key<br/>(Answer Key Section)"]
D --> E["Parse Explanations<br/>(Explanations Section)"]
E --> F["Combine Data"]
F --> G["Validate with Pydantic"]
G --> H["QuestionDataset"]
H --> I["Export JSONL"]
H --> J["Generate Statistics"]
style A fill:#e1f5fe
style H fill:#e8f5e8
style I fill:#fff3e0
style J fill:#fff3e0
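
The "Combine Data", "Validate", "Export JSONL", and "Generate Statistics" stages of the flow above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the field names, the helper functions `combine`, `to_jsonl`, and `answer_distribution`, and the sample records are all hypothetical, and a standard-library dataclass stands in for the Pydantic models so the sketch runs without extra dependencies. PDF extraction itself is omitted; the three parsed sections are hard-coded.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


# Stand-in for the Pydantic Question model in schema.py.
# Field names here are assumptions, not the package's real schema.
@dataclass
class Question:
    number: int
    stem: str
    options: dict[str, str]
    answer: str
    explanation: str = ""

    def __post_init__(self) -> None:
        # Mimic the kind of check a Pydantic validator might perform.
        if self.answer not in self.options:
            raise ValueError(
                f"Question {self.number}: answer {self.answer!r} is not among the options"
            )


def combine(questions, answer_key, explanations):
    """Join the three parsed sections on question number and validate each record."""
    combined = []
    for number, (stem, options) in sorted(questions.items()):
        if number not in answer_key:
            raise ValueError(f"Question {number} has no entry in the answer key")
        combined.append(
            Question(number, stem, options, answer_key[number], explanations.get(number, ""))
        )
    return combined


def to_jsonl(dataset):
    """Serialize one question per line (JSON Lines)."""
    return "\n".join(json.dumps(asdict(q)) for q in dataset)


def answer_distribution(dataset):
    """Count how often each option letter is the correct answer."""
    return Counter(q.answer for q in dataset)


# Hypothetical output of the three parsing stages (placeholder text only).
questions = {
    1: ("Placeholder stem one", {"A": "first", "B": "second", "C": "third", "D": "fourth"}),
    2: ("Placeholder stem two", {"A": "first", "B": "second", "C": "third", "D": "fourth"}),
}
answer_key = {1: "B", 2: "D"}
explanations = {1: "Placeholder explanation."}

dataset = combine(questions, answer_key, explanations)
print(to_jsonl(dataset))
print(answer_distribution(dataset))
```

Joining on the question number keeps the three sections independent: a question with no explanation still produces a record, while a question missing from the answer key fails early instead of yielding an unusable entry.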