Skip to main content

A Brief Introduction to Context Free Grammars

I spent much of my afternoon and evening writing a specification in Extended Backus-Naur Form (one form of a context free grammar). While trying to convey my activities to a friend, I stumbled upon an analogy. I realized the analogy works quite well to explain in a jargon free way, that anyone could understand, what is arguably a fairly simple concept – context free grammars – but that usually gets obfuscated in formal language and technical terms.

Python (and likely many other languages) use Backus-Naur Form to describe the syntax of the language. I used it to describe a format for a specific part of the project I’m working on. What both of these use cases have in common is that context free grammars are a powerful way to compactly describe something with a lot of options.

So imagine you want to open up a design-your-own-taco shop. While thinking about the menu, you realize that there’s a ton of options, what do you do?

One solution that you might have already seen is like this:

  • For tortilla the customer can have either flour or corn
  • For ‘meat’ the customer can have one of these: beef, pork, steak, chicken, tofu, fajita veggies
  • For salsa the customer can have any of the following: pico de gallo, mild, medium, hot, extra hot
  • For toppings the customer can have two of the following: cheddar, lettuce, pepper jack, sour cream

Pretty straight forward right? We’re all used to that style of menu. But if the resturant were to try to list out all the possible tacos a customer could make it would be around 17! or 355,687,428,096,000 possible tacos. That’s a lot of tacos.

So how does this relate to Backus-Naur Form and context free grammars?

What you did when designing a menu like that roughly emulates a context free grammar. Let me show you how:

The first assumption is that a taco is made up of the following: tortilla, ‘meat’, salsa and toppings

taco = tortilla, meat, salsa, toppings

That’s our first introduction into Extended Backus-Naur Form (EBNF), it means that the thing we call taco is actually made up of four things: tortilla, meat, salsa, toppings.

Well what are the options for tortilla. You and I both know, but this is how to write it in EBNF:

tortilla = corn | flour

The vertical bar | character means or. So the customer can pick either corn or flour, like we said above. These lines are called ‘production rules’ as they specify how the thing on the left, can be replaced by the thing or things on the right. So the next rule for the ‘meat’ option is pretty straight forward:

meat = beef | pork | steak | chicken | tofu | fajita veggies

But what happens when we try to create the next menu rule? Previous rules have been of a ‘pick one of the following options’ type. Salsa is different, as the customer can have as many kinds of salsa as they like. In EBNF we describe that like this:

salsa = [pico de gallo] [mild] [medium] [hot] [extra hot]

The brackets ([ ]) around each object means it’s optional. Finally our last rule gets a little more complex, as the customer can have any two of the options. There might be other ways this could be done, but the way I would specify it is as follows:

topping = [choices] [choices]
choices = cheddar | lettuce | pepper jack | sour cream

This says that the customer can choose zero toppings, one topping, or two toppings. Then we specify what counts as a topping.

Context free grammars and Backus-Naur Form have the concept of production rules and terminal symbols. The terminal symbols in our case are the specific ingredients that would go on the menu, as that’s the base unit for our menu. In other venues or usages a terminal symbol might be a literal character, a digit or something else.

As you can see, context free grammars are a powerful way of breaking something down that has a ton of options into something we can reason with and understand. They allow both flexibility and specificity. In our case we specified for our restaurant what counts as a valid or allowable taco. Though there might be 355,687,428,096,000 possible tacos, you know a fish taco isn’t one of them.

The total taco menu specifying grammar is as follows:

taco = tortilla, meat, salsa, toppings
tortilla = corn | flour
meat = beef | pork | steak | chicken | tofu | fajita veggies
salsa = [pico de gallo] [mild] [medium] [hot] [extra hot]
topping = [choices] [choices]
choices = cheddar | lettuce | pepper jack | sour cream