
Prompt Processing package organization

This guide describes how the prompt_processing repository is organized at the directory and module level. It assumes familiarity with the Core Concepts.

For developing in parallel with other repositories, see Coordinating development. For how the module organization complicates testing, see Testing Prompt Processing. For the largest upcoming changes to the code, see the MiddlewareInterface refactoring project.

Overview

The Prompt Processing repository largely follows Science Pipelines conventions. It uses EUPS to manage Science Pipelines dependencies (the name is prompt_processing rather than prompt-processing for EUPS compatibility), making it easy to set up on different systems, and uses scons for build and test management. Issues and outstanding work are tracked in Jira, using the prompt_processing component.

However, there are some differences from a typical Science Pipelines package:

Directory and module organization

Prompt Processing’s Python code is divided into four namespaces (none of which are actually packages):

Prompt Processing’s activator modules have evolved organically from what used to be a simple prototype, and are the focus of ongoing refactoring work to separate responsibilities and dependencies (see the Prompt-Processing-refactor label in Jira and the MiddlewareInterface refactoring project). However, they can be roughly divided into primary modules that form the backbone of the service and utility modules that provide a specific class or related set of components.

Core modules

The core modules are:

One of the key priorities in this architecture is separation of dependencies. Only driver_keda.py and driver_gunicorn.py depend on packages like redis or flask, while only middleware_interface.py and local_repo.py depend on Middleware. activator.py should deal with external dependencies only through the abstractions provided by other modules (an ideal that has not yet been met).
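One way to keep this separation honest is to check a module's imports mechanically. The following is a minimal sketch using only the standard library's ast module; it is not part of the repository. The helper names, the SERVICE_PACKAGES set, and the stand-in source strings are all hypothetical and chosen only for illustration.

```python
import ast

# Illustrative assumption: packages that only the driver modules should import.
SERVICE_PACKAGES = {"redis", "flask"}


def top_level_imports(source):
    """Return the top-level package names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" counts as a dependency on "a".
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute "from a.b import c"; relative imports (level > 0) are skipped.
            found.add(node.module.split(".")[0])
    return found


def forbidden_imports(source, forbidden):
    """Return the sorted forbidden packages that the source imports."""
    return sorted(top_level_imports(source) & set(forbidden))


# Stand-in source text (hypothetical, not the real module contents).
DRIVER_SOURCE = "import flask\nfrom redis import Redis\n"
ACTIVATOR_SOURCE = "import logging\nfrom . import helpers\n"

if __name__ == "__main__":
    print(forbidden_imports(DRIVER_SOURCE, SERVICE_PACKAGES))
    print(forbidden_imports(ACTIVATOR_SOURCE, SERVICE_PACKAGES))
```

In a real check, the source text would be read from the module file (for example with pathlib.Path.read_text) rather than taken from a string. Such a check only sees static import statements, so it would miss dependencies pulled in dynamically or indirectly through other modules.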

Important utilities

Prompt Processing has many smaller classes, of which the most important are:

GitHub Scripts

Because Prompt Processing is deployed through Docker containers and is responsible for its own releases, it has an unusually large number of custom GitHub actions and workflows. Reusable components may be implemented as actions or workflows, depending on what the component needs (in particular, actions let the enclosing workflow define environment variables for them).

For running the workflows, see the Playbook.