CLI Parser construction

This file describes how to run the same parser generation pipeline as the parser construction notebook, but from the command line interface. It constructs a parser file for the animal_data.csv test data, and assumes all commands are run from the root of the autoparser package.

Note: As a reminder, you will need an API key for OpenAI or Google. This example uses the OpenAI LLM.
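The commands on this page read the key from the OPENAI_API_KEY environment variable. As a minimal sketch, a check like the following can stop a run early when the variable is missing; the function name require_api_key is hypothetical and is not part of autoparser.

```shell
# Hypothetical helper (not part of autoparser): fail early when the
# OPENAI_API_KEY variable used by the commands on this page is unset or empty.
require_api_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set; export it before running autoparser" >&2
        return 1
    fi
}
```

Calling require_api_key before the first autoparser command returns a non-zero status and prints a message to stderr if the key has not been exported.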

Generate a data dictionary

In this example, we will generate the data dictionary and add descriptions to it in a single step. The CLI command follows this syntax:

autoparser create-dict data language [-d] [-k api_key] [-l llm_choice] [-c config_file] [-o output_name]

so for animal_data.csv we run this command to generate a data dictionary with descriptions:

autoparser create-dict tests/sources/animal_data.csv "fr" -d -k $OPENAI_API_KEY -c tests/test_config.toml -o "animal_dd"

This creates an animal_dd.csv data dictionary to use in the next step.
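Before moving on, it can help to confirm that the data dictionary was actually written and to glance at its first rows. The sketch below assumes the file lands in the current directory under the name given to -o (animal_dd.csv here, as the next step expects); preview_csv is a hypothetical helper, not an autoparser command, and it makes no assumption about the column names.

```shell
# Hypothetical helper: print the first rows of a CSV file, or fail with a
# message if the file is missing or empty.
preview_csv() {
    file="$1"
    rows="${2:-5}"   # number of lines to show, default 5
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    head -n "$rows" "$file"
}
```

For example, preview_csv animal_dd.csv shows the header line and the first four entries.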

Create intermediate mapping file

The next step is to create an intermediate CSV for you to inspect, mapping the fields and values in the raw data to the target schema. This is the CLI syntax:

autoparser create-mapping dictionary schema language api_key [-l llm_choice] [-c config_file] [-o output_name]

so we can run

autoparser create-mapping animal_dd.csv tests/schemas/animals.schema.json "fr" $OPENAI_API_KEY -c tests/test_config.toml -o animal_mapping

to create the intermediate mapping file animal_mapping.csv for you to inspect for any errors.
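Since the mapping file is meant to be checked and possibly edited by hand, one option is to keep an untouched copy and diff against it afterwards. This is only a sketch: the two function names and the .orig suffix are hypothetical choices, and only the standard cp and diff tools are used.

```shell
# Hypothetical helpers for reviewing manual edits to the mapping file.
snapshot_mapping() {
    # Keep an untouched copy next to the original, e.g. animal_mapping.csv.orig
    cp -- "$1" "$1.orig"
}

show_mapping_edits() {
    # diff exits with status 0 when the files are identical, 1 when they differ
    diff -u -- "$1.orig" "$1"
}
```

Running snapshot_mapping animal_mapping.csv before editing and show_mapping_edits animal_mapping.csv afterwards prints a unified diff of the manual corrections.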

Write the parser file

Finally, the parser file for ADTL should be written out based on the contents of animal_mapping.csv. Once you’ve made any changes you want to the mapping, we can use the create-parser command

autoparser create-parser mapping schema_path [-n parser_name] [--description parser_description] [-c config_file]

as

autoparser create-parser animal_mapping.csv tests/schemas -n animal_parser -c tests/test_config.toml

which writes out the TOML parser as animal_parser.toml ready for use in ADTL.
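The three steps can also be chained in a small shell function. The sketch below copies the commands and paths from this page unchanged; the function name build_animal_parser and the pause for manual review are illustrative additions, not part of autoparser.

```shell
# Sketch: run the three steps from this page in order, stopping at the first
# failure. The function name is hypothetical; the commands are the ones above.
build_animal_parser() {
    autoparser create-dict tests/sources/animal_data.csv "fr" -d \
        -k "$OPENAI_API_KEY" -c tests/test_config.toml -o "animal_dd" || return 1

    autoparser create-mapping animal_dd.csv tests/schemas/animals.schema.json "fr" \
        "$OPENAI_API_KEY" -c tests/test_config.toml -o animal_mapping || return 1

    # Pause so the mapping can be inspected and edited by hand.
    printf 'Check animal_mapping.csv, then press Enter to continue: '
    read -r _reply || true

    autoparser create-parser animal_mapping.csv tests/schemas \
        -n animal_parser -c tests/test_config.toml
}
```

Run from the root of the autoparser package, build_animal_parser stops after the mapping step until Enter is pressed, so the mapping can still be corrected before the parser is written.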