Assembly Line Pattern: Components
April 14, 2019
The Assembly Line pattern describes a process that transforms a series of similar items. It is very common in functional programming languages. After reading this article, you will be familiar with the components of the pattern, and you will be able to identify them in real-life applications. The article constitutes the first part of a series on how to apply the pattern in code, system design, problem analysis, and individual productivity.
Components and Flow
An assembly line consists of a series of components, or stations, which perform actions that change an item's status: a dispenser, quality control, a refinery, an assembly station, and a dispatcher.
Dispenser: the source of your items
Every assembly line starts with a source, a dispenser which either produces or delivers items (data) of similar shape for the assembly line to process.
Real-life dispensers include:
- a directory of files
- a CSV file with a list of customer email addresses
- a sequence of numbers (1, 2, 3 ... n)
- a queue of buy/sell instructions
- a search API endpoint
- a log file of user interactions with your website
Dispensers exist on the boundary between your system and the "outside world", which means they require special handling that would be unnecessary in the rest of your program. (A receptionist may sit at the office entrance, but usually not at the entrance to each individual meeting room.)
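One way to picture a dispenser in code is a generator that keeps the boundary handling in one place. The sketch below assumes a CSV source with an `email` column; the function name `dispense_emails` and the sample data are made up for illustration.

```python
import csv
import io


def dispense_emails(lines):
    """Dispenser: yield one email address per CSV row.

    Boundary handling (parsing, stray whitespace, missing values) lives
    here, so later stations can assume every item is a plain string.
    The 'email' column name is an assumption of this sketch.
    """
    for row in csv.DictReader(lines):
        yield (row.get("email") or "").strip()


# An in-memory stand-in for a real CSV file, for illustration only.
sample = io.StringIO("name,email\nAda, ada@example.com \nBob,\n")
emails = list(dispense_emails(sample))
```

Here the dispenser delivers every row, including the one with an empty address; deciding whether that item is useful is left to the next station.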
Quality control: which items are useful
In a factory, not every item coming out of a dispenser is of the right shape. The assembly line needs a quality control step. The earlier you can install it, the better, as its goal is to save operational costs by cutting down the number of items processed.
To set up effective quality control, you need to prepare a simple formula which, given an item, will judge "yes, it is useful" or "no, it should be discarded". This is called a filter.
Real-life quality control filters:
- skip write-protected and hidden files
- discard inactive emails
- in a sequence of days (1, 2, ...), skip weekends and this month's holidays
- skip buy/sell instructions that have already been reported
- skip API responses that returned an error
- skip user interaction logs generated by testing tools
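A filter of this kind can be sketched as a function from an item to a boolean. The example below takes the weekend-and-holiday case from the list; the `HOLIDAYS` set and the chosen month are assumptions made for the sketch.

```python
from datetime import date

# Hypothetical holiday list; a real line would load it from configuration.
HOLIDAYS = frozenset({date(2019, 4, 3)})


def is_working_day(day):
    """Filter: True keeps the day, False discards it."""
    return day.weekday() < 5 and day not in HOLIDAYS


# The first week of April 2019 (Monday the 1st to Sunday the 7th).
days = (date(2019, 4, n) for n in range(1, 8))
kept = [d.day for d in days if is_working_day(d)]
# kept == [1, 2, 4, 5]
```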
Refinery: how do you process and improve the items?
An assembly line exists because items generated by the dispenser are not usable in their raw state, without processing. The refinery station is where items are modified. It's useful to think of a refinery station in terms of an action performed on a single object: a transformation.
A real-life refinery station:
- translates a file from one language to another
- inserts a customer's name and email into an email template
- converts the day number into a (day-number, day-of-week) pair and generates an iCalendar event specifying the gym routine for the day
- converts the currency in a buy/sell instruction according to today's exchange rate
- transforms a text file with markdown encoding into a static HTML file
- transforms a line of log file into a JSON object
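As a rough illustration of a transformation, the sketch below turns one log line into one JSON-ready object. The log format (timestamp, user, free-text action, separated by spaces) and the function name `parse_log_line` are assumptions, not a real log standard.

```python
import json


def parse_log_line(line):
    """Transformation: one log line in, one new dict out.

    Assumes the hypothetical format '<timestamp> <user> <action text>'.
    """
    timestamp, user, action = line.strip().split(" ", 2)
    return {"timestamp": timestamp, "user": user, "action": action}


lines = [
    "2019-04-14T10:00:00Z alice clicked signup button\n",
    "2019-04-14T10:00:05Z bob opened pricing page\n",
]
records = [parse_log_line(line) for line in lines]
encoded = [json.dumps(record) for record in records]
```

The function looks at a single item and nothing else, which is what lets a refinery station be applied with `map` or a comprehension over the whole line.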
Assembly: how do you bundle the items?
In an assembly station items are combined or bundled into larger packages in order to:
- assemble the items into a final product (a layer cake), or
- group items into batches (nobody would buy a single piece of pasta), or
- bundle the items together for shipment to save on individual delivery costs.
A real-life assembly station might:
- compress files into a single archive
- wrap iCalendar events into a single iCalendar file
- calculate a day's balance as sum of financial expenses and profits
- wrap JSON objects in a JSON list
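An assembly station can be sketched as a reduction: many items go in, one item comes out. The example below computes a day's balance; the amounts are invented, and `Decimal` is used here only as a common choice for money values.

```python
from decimal import Decimal
from functools import reduce


def add_to_balance(balance, amount):
    """Combine the running balance with one more transaction."""
    return balance + amount


# Made-up transactions: profits are positive, expenses negative.
transactions = [Decimal("120.00"), Decimal("-45.50"), Decimal("-12.25")]
balance = reduce(add_to_balance, transactions, Decimal("0"))
# balance == Decimal("62.25")
```

Unlike a filter or a transformation, the station carries state (the running balance) from one item to the next.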
Dispatcher: who organises delivery?
Finally, items need to be passed to a dispatcher, who will then take care of further delivery. Like a dispenser, a dispatcher is not strictly part of the assembly line. An assembly line may have more than one dispatcher.
A real-life dispatcher might be responsible for:
- displaying translated files on a website
- sending templates to an email server queue
- uploading the calendar to a calendar service or server
- printing a financial report
- uploading files to a VPS via FTP
- sending JSON in response to an API request
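Since a line may have more than one dispatcher, one possible shape is a list of callables that each receive the finished item. In this sketch the two dispatchers are an in-memory stream and a list, standing in for a real file, queue, or HTTP response; the `dispatch` helper is hypothetical.

```python
import io
import json


def dispatch(item, dispatchers):
    """Hand the finished item to every dispatcher in turn."""
    for send in dispatchers:
        send(item)


report = io.StringIO()  # stand-in for a file or an HTTP response body
audit_log = []          # stand-in for a second destination

payload = json.dumps([{"id": 1}, {"id": 2}])
dispatch(payload, [report.write, audit_log.append])
```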
Analysing a system
If a system doesn't look like it conforms to the basic assembly line pattern, classify its components by the function they perform.
Dispenser | Dispatcher | Quality Control | Refinery | Assembly |
---|---|---|---|---|
Input | Output | Tests, checks | Transformations | n items -> 1 item |
Randomness | Logging | Value reads | Writes, updates | Temporary cache |
Timers | -1 item | Stateless | Stateless | Stop condition |
+1 item | | | | Stateful |
Repeating or skipping stations
A process may repeat or skip stages of a certain type. For example, the following imaginary JSON processing workflow contains multiple Quality and Process steps, while skipping Assembly.
- Dispenser: decode a JSON string containing a list of objects
- Quality: if field contents are empty, discard the object
  - where the field is "name"
  - (empty "name" field means malformed data)
- Process: transform object "category" field from ID to label
  - (using an ID-to-label lookup map from process configuration)
- Quality: if field contents are empty, discard the object
  - where the field is "category"
  - (empty "category" field means the lookup failed)
- Process: encode each object as JSON string
- Dispatcher: save each string in a separate file
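The workflow above could be sketched in Python roughly as follows. The lookup map `CATEGORY_LABELS`, the sample data, and the output file names are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical ID-to-label map standing in for the process configuration.
CATEGORY_LABELS = {1: "Books", 2: "Music"}


def has_value(field):
    """Build a Quality filter that keeps objects whose `field` is non-empty."""
    return lambda obj: bool(obj.get(field))


def label_category(obj):
    """Process: return a copy with the category ID replaced by its label.

    A failed lookup leaves the field as None, which the next filter discards.
    """
    return {**obj, "category": CATEGORY_LABELS.get(obj.get("category"))}


def process(raw):
    objects = json.loads(raw)                         # Dispenser
    named = filter(has_value("name"), objects)        # Quality
    labelled = map(label_category, named)             # Process
    valid = filter(has_value("category"), labelled)   # Quality
    return [json.dumps(obj) for obj in valid]         # Process


def save_each(strings, directory):
    """Dispatcher: write each string to its own file (file names are made up)."""
    paths = []
    for index, text in enumerate(strings):
        path = Path(directory) / f"object-{index}.json"
        path.write_text(text)
        paths.append(path)
    return paths


raw = ('[{"name": "Dune", "category": 1},'
       ' {"name": "", "category": 2},'
       ' {"name": "Mystery", "category": 9}]')
encoded = process(raw)

with tempfile.TemporaryDirectory() as tmp:
    saved = [p.name for p in save_each(encoded, tmp)]
```

The same `has_value` filter is reused for both Quality steps, and nothing in the line bundles items together, which matches the skipped Assembly stage.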
Composite Assembly Lines
When examined in greater detail, stages of a process may consist of other assembly lines. Analysing your system in this way helps you find optimisations, spot opportunities for reuse, and switch between bottom-up and top-down design (all of which we will cover when talking about assembly line design).
- Dispenser: get files from a specified directory
  - Dispenser: get a list of files
  - Quality: if a file is read-only or hidden, discard it
  - Quality: if the extension is something other than `clj`, `cljs`, or `cljc`, discard the file
- Transform: for each file, count its lines of code
  - Dispenser: read in the file and split it into lines
  - Filter: if a line is a comment, discard it
  - Assembly: count the remaining lines
  - Dispatcher: return the line count
- Assembly: create an HTML report with LOC statistics
  - Dispenser: calculate the min, max, mean and median and return a sequence of (label, value) pairs
  - Transform: round each value to two digits and convert it to string
  - Assembly: update the HTML report template by copying each value into a location indicated by label
- Dispatcher: send the resulting webpage as a response to a request that triggered the process
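A minimal Python sketch of this composite line is shown below, with each outer station written as a function that is itself a small line. It makes several assumptions: "hidden" is taken to mean a dot-prefixed file name, "read-only" means not writable by the current user, a comment is a line starting with `;`, and `TEMPLATE` is a made-up report template.

```python
import os
import statistics
import tempfile
from pathlib import Path

EXTENSIONS = {".clj", ".cljs", ".cljc"}
# Hypothetical report template; placeholders are named after the labels.
TEMPLATE = "<p>min {min}, max {max}, mean {mean}, median {median}</p>"


def source_files(directory):
    """Outer Dispenser, itself a line: list files, then apply two filters."""
    files = (p for p in Path(directory).iterdir() if p.is_file())
    # Assumption: hidden = dot-prefixed name, read-only = not writable.
    usable = (p for p in files
              if not p.name.startswith(".") and os.access(p, os.W_OK))
    return (p for p in usable if p.suffix in EXTENSIONS)


def count_loc(path):
    """Outer Transform, itself a line: split, drop comment lines, count."""
    lines = path.read_text().splitlines()
    code = (line for line in lines if not line.lstrip().startswith(";"))
    return sum(1 for _ in code)


def loc_report(counts):
    """Outer Assembly, itself a line: stats, formatting, template filling."""
    stats = [("min", min(counts)), ("max", max(counts)),
             ("mean", statistics.mean(counts)),
             ("median", statistics.median(counts))]
    rendered = ((label, f"{value:.2f}") for label, value in stats)
    html = TEMPLATE
    for label, text in rendered:
        html = html.replace("{" + label + "}", text)
    return html


def run(directory):
    counts = [count_loc(path) for path in source_files(directory)]
    return loc_report(counts)  # the caller acts as the Dispatcher


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.clj").write_text("(ns a)\n; a comment\n(def x 1)\n")
    Path(tmp, "b.cljs").write_text("(ns b)\n")
    Path(tmp, "notes.txt").write_text("not Clojure\n")
    Path(tmp, ".hidden.clj").write_text("(ns h)\n(def y 2)\n(def z 3)\n")
    page = run(tmp)
```

The sketch does not handle a directory with no matching files, where `min` and `max` would fail; a real line would need a decision about what an empty report looks like.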
Parallel Assembly Lines
Parallel processing is one of the flagship use cases for functional languages. This is because an assembly line is easily parallelised. Filters and transformations work on a single item at a time, independently of other items, so every node in a processing cluster can run its own small sequence of filters and transformations.
- Dispenser: send each text of a corpus to an available distributed processing node
- Transform-parallel: a node calculates frequency of words in the text it received
- Assembly: collect results from nodes and calculate frequency of words in the entire corpus
- Dispatcher: save the frequency table into a file
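The word-frequency line above can be approximated on a single machine. In this sketch a thread pool stands in for the distributed processing nodes, which is an assumption made to keep the example self-contained; the corpus and the tab-separated output format are also made up.

```python
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def word_frequencies(text):
    """Transform-parallel: depends only on the one text it receives."""
    return Counter(text.lower().split())


def corpus_frequencies(texts, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Dispenser: each text is handed to an available worker.
        partials = pool.map(word_frequencies, texts)
        # Assembly: merge the per-text counts into one table.
        return sum(partials, Counter())


def save_table(table, stream):
    """Dispatcher: write one 'word<TAB>count' line per word."""
    for word, count in table.most_common():
        stream.write(f"{word}\t{count}\n")


corpus = ["the cat sat", "The dog sat", "the end"]
totals = corpus_frequencies(corpus)

out = io.StringIO()  # stand-in for a real file
save_table(totals, out)
```

Because `word_frequencies` never looks beyond its own text, the order in which workers finish does not affect the merged result.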
Summary
An assembly line is a programming and system design pattern especially common in functional programming languages.
An assembly line consists of a dispenser, which adds items to the line, a quality control station, which discards items from the line, a refinery which transforms the items, an assembly station which bundles them, and a dispatcher which sends them out of the system.
Filters are functions which, given an item, return a boolean (indicating whether to keep or discard the item). Transformations are functions which, given an item, return a new item. Both should be stateless, that is, given the same item they will always return the same result. This makes assembly lines especially suited to parallel processing.