# Recursive Descent Parser for arithmetic expressions with real numbers

In previous post we were building Recursive Descent Parser for Boolean expressions and in the post before that we were parsing simple arithmetic expressions (with only addition and subtraction).

In this third, final post we will build more real world example – we will parse arithmetic expressions that include (besides addition and subtraction) multiplication and division. Note here that we also have to respect the priority of operations (multiplication and division are done before addition and subtraction) so our parser has to be “smart” enough to handle that. And just to make it more fun, this time we will also support real numbers instead of just boring integers.

#### Grammar

As usual, lets start with the grammar – i cannot stress how important this step is for the whole process:

```Expression := [ "-" ] Term { ("+" | "-") Term }
Term       := Factor { ( "*" | "/" ) Factor }
Factor     := RealNumber | "(" Expression ")"
RealNumber := Digit{Digit} | [Digit] "." {Digit}
Digit      := "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"```

Since we need to support priority of operations, we are putting the addition and subtraction (and negation) on top level of our grammar and therefore they will be evaluated at the end after the recursive calls to lover level terminal symbols are parsed and evaluated – because we are building a “descent” parser that goes from top level concepts towards lower ones.

Lets analyze each row of our expression.

In Expression definition row we have introduced non-terminal symbol Term that represents the part of expression that is on top level and has addition or subtraction for operation:

```Expression := [ "-" ] Term { ("+" | "-") Term }
```

Inside of Term definition we are introducing the Factor non-terminal symbol that represents part of expression that has multiplication or division for operation:

` Term       := Factor { ( "*" | "/" ) Factor }`

In the Factor we are already expecting the ending symbol of our expression – the RealNumber, its the first one we can start directly evaluating. If not RealNumber then we expect another Expression in parenthesis:

``` Factor     := RealNumber | "(" Expression ")"
```

After that we define what RealNumber is – its either a digit followed by more digits or its a digit with decimal point followed by more decimal places:

`RealNumber := Digit{Digit} | [Digit] "." {Digit}`

#### The Parser

Since we defined the grammar, writing the parser is almost mechanical process. We just start from the top level concepts in the grammar and create method for each element we have (Term, Factor, RealNumber, Digit etc) and call them one after the other or recursively.

Here is our entry method called Parse that starts the parsing and handles the first row of our grammar, the Expression row:

```        // Expression := [ "-" ] Term { ("+" | "-") Term }
public double Parse()
{
var isNegative = NextIsMinus();
if (isNegative)
GetNext();
var valueOfExpression = TermValue();
if (isNegative)
valueOfExpression = -valueOfExpression;
while (NextIsMinusOrPlus())
{
var op = GetTermOperand();
var nextTermValue = TermValue();
if (op is PlusToken)
valueOfExpression += nextTermValue;
else
valueOfExpression -= nextTermValue;
}
return valueOfExpression;
}
```

You can see that we are just following the grammar: we handle the negation and then call the TermValue method to get the next Term and then while there are plus or minus operators, we continue getting the next term again (again by calling TermValue) and then based on the extracted operator we decide should we add or subtract those two Term values. We repeat this as many as times as needed and accumulate the total result and then return it – that’s all.

Lets drill down to see the TermValue method:

```        // Term       := Factor { ( "*" | "/" ) Factor }
private double TermValue()
{
var totalVal = FactorValue();
while (NextIsMultiplicationOrDivision())
{
var op = GetFactorOperand();
var nextFactor = FactorValue();

if (op is DivideToken)
totalVal /= nextFactor;
else
totalVal *= nextFactor;
}

}
```

Here its again similar process: but we are handling the second row of our grammar. We call the FactorValue method to parse the next Factor and then while we encounter tokens for multiplication or division we again call FactorValue to get the second Factor and based on the operation we divide or multiply. And as many as times this happens we are accumulating the result in the totalVal variable, and returning it at the end.

Final important method we need to implement for parser is the FactorValue method:

```        // Factor     := RealNumber | "(" Expression ")"
private double FactorValue()
{
if (NextIsDigit())
{
var nr = GetNumber();
return nr;
}
if (!NextIsOpeningBracket())
throw new Exception("Expecting Real number or '(' in expression, instead got : " + (PeekNext() != null ? PeekNext().ToString()  : "End of expression"));
GetNext();

var val = Parse();

if (!(NextIs(typeof(ClosedParenthesisToken))))
throw new Exception("Expecting ')' in expression, instead got: " + (PeekNext() != null ? PeekNext().ToString() : "End of expression"));
GetNext();
return val;
}
```

This one is even simpler. Based on our grammar row we know that we can expect either our RealNumber, or Expression in parenthesis.
So we check if next Token in the expression is token with number (NumberConstatToken), and if it is we just extract its value by calling GetNumber method and return it.

If next token is not number, then we know we expect the opening bracket (we throw exception if not) and here we recursively call the main Parse method (from which the execution started in the first place) to parse the next expression and so on, and on, and on until we unwind the expression completely ending up with the calculated result for the whole expression.

I wont go through all the helper methods that are defined and used there since they are trivial and not really interesting – you can always check out the full source on GitHub. Important takeaway is that it is really straightforward to implement the parser once you define the grammar properly – this opens up endless possibilities…

You probably noticed that in the code i’m able to check if next symbol in the expression is of certain type – this is because i implemented simple TokensWalker class which allows me to peek the next token without consuming it:

```    public class TokensWalker
{
private readonly List<Token> _tokens = new List<Token>();
private int _currentIndex = -1;

public bool ThereAreMoreTokens
{
get { return _currentIndex < _tokens.Count - 1; }
}

public TokensWalker(IEnumerable<Token> tokens)
{
_tokens = tokens.ToList();
}

public Token GetNext()
{
MakeSureWeDontGoPastTheEnd();
return _tokens[++_currentIndex];
}

private void MakeSureWeDontGoPastTheEnd()
{
if (!ThereAreMoreTokens)
throw new Exception("Cannot read pass the end of tokens list");
}

public Token PeekNext()
{
MakeSureWeDontPeekPastTheEnd();
return _tokens[_currentIndex + 1];
}

private void MakeSureWeDontPeekPastTheEnd()
{
var weCanPeek = (_currentIndex + 1 < _tokens.Count);
if (!weCanPeek)
throw new Exception("Cannot peek pass the end of tokens list");
}

public bool IsNextOfType(Type type)
{
return PeekNext().GetType() == type;
}
}
```

#### May the source be with you

This concludes our not so long voyage through Recursive Descent Parsers. Hopefully i was able to interest some of you to start playing with these concepts. Now you can go and extend this and build a new calculator application, better than the built-in one that came with your beloved OS.

If you want to see the full source of this parser with all the unit tests, its available in my ArithmeticExpressionParser GitHub repo.

# Introduction to Recursive Descent Parsers with C#

#### Parser? Aren’t parsers utterly boring?

Well no, quite the opposite. Lately i have been solving some of the programming challenges on talentbuddy and bumped into task to create parser and solver for simple arithmetic expressions in string format, something like this: “(2+2)-(3-(6-5))-4″.

On first thought this seems trivial, but only until the moment you start implementing it. If you are doing it the wrong way, for example by using regular expressions, you can bump into some interesting problems, and also develop a solution that works for most cases but then fails on edge cases etc.

#### The proper way

Fortunately there are proper ways to do this, and one of them is building a Recursive Descent Parser. As the name implies, this parser will use a Top-Down approach and start breaking the expression into smaller pieces but in recursive way. In order to be able to build a proper parser we first need to define a grammar of the expressions we want to parse.

For defining grammars we can use Backus-Naur Form or Extended Backus-Naur Form.

If we manage to define the grammar properly, then implementing the parser is almost a mechanical process, and this is what i really like about this approach.

In this post we will build very simple Recursive Descent Parser for basic aritmetic operations of addition and subtraction of integers. Later we will tackle some more interesting scenarios, but for this post we want to keep things simple.

So the examples of expressions we want to parse could be:

• “1”
• “1+100″
• “1+2+3+44″
• “100-99+1″ etc

#### Defining the grammar

So how would our grammar in EBNF would look like?

```Expression := Number {Operator Number}
Operator   := "+" | "-"
Number     := Digit{Digit}
Digit      := "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"```

Grammar above tells us that we are defining expressions that can contain numbers and operators.

Each Expression can contain a single integer Number or this number can be followed by optional set of Operator and then another Number must follow. Note the curly brackets around the {Operator Number} which means that this part can repeat zero or more times.

Then we continue and define what Operator could be (+ or -) and then we define that Number is just a Digit followed by zero or more Digits (again we use curly braces to define optional repetition of digits in integer number).

At the end we define that Digit is number from 0 to 9.

With this we defined our grammar and its vocabulary – and creating a parser should be simple, right? Well almost

#### Tokens and Tokenizers

First there is another small issue we need to handle here. Our input expressions will be in string format, and we need to parse them into something that our Parser will understand in order to abstract the raw character processing – we don’t want to string processing inside of Parser this would be bad practice.

This is where idea of Tokens comes into play. We will define tokens for each part of our grammar. We can have abstract Token class, and other tokens will inherit it.

Then we will have base OperatorToken that will be inherited by PlusToken and MinusToken, and finally we will have NumberConstantToken for our numbers.

Token classes will just be a simple marker classes with almost no functionality inside except for the NumberConstantToken which has to hold the actual value.

``` public abstract class Token
{
}

public class OperatorToken : Token
{

}
public class PlusToken : OperatorToken
{
}

public class MinusToken : OperatorToken
{
}

public class NumberConstantToken : Token
{

public NumberConstantToken(int value)
{
_value = value;
}

public int Value
{
get { return _value; }
}
}
```

Now that we defined our Tokens, lets build a PlusMinusTokenizer class that will transform the raw string of the expression into a sequence of Tokens that our Parser can then elegantly work with:

```    public class PlusMinusTokenizer
{

public IEnumerable<Token> Scan(string expression)
{

var tokens = new List<Token>();
{
if (Char.IsWhiteSpace(c))
{
continue;
}

if (Char.IsDigit(c))
{
var nr = ParseNumber();
}
else if (c == '-')
{
}
else if (c == '+')
{
}
else
throw new Exception("Unknown character in expression: " + c);
}

}

private int ParseNumber()
{
var digits = new List<int>();
{
int i;
if (int.TryParse(Char.ToString(digit), out i))
{
}
else
throw new Exception("Could not parse integer number when parsing digit: " + digit);
}

var nr = 0;
var mul = 1;
digits.Reverse();
digits.ForEach(d =>
{
nr += d*mul;
mul *= 10;
});

return nr;
}
}
```

As you can see code is very simple and for given string expression it returns IEnumerable<Token>. There is also a part there that converts sequence of digits into into integer but we all know that so lets move on.

#### Parsing is Fun!

Now we finally come to the interesting part. Our parser should start from the top of our grammar and parse Expression, then inside of it parse the Number, then Operator then again Number etc.

So lets first examine the ParseExpression method:

```        private int ParseExpression()
{
var number = ParseNumber();
if (!_tokens.MoveNext())
return number;

if (_tokens.Current is OperatorToken)
{
var op = _tokens.Current;
_tokens.MoveNext();

var secondNumber = ParseNumber();
if (op is PlusToken)
{
return number + secondNumber;
}
if (op is MinusToken)
return number - secondNumber;

throw new Exception("Unsupported operator: " + op);
}

throw new Exception("Expecting operator after number, but got " + _tokens.Current);
}
```

Its barely 30 lines of code that first calls our ParseNumber method to get first number of the expression, and then it checks if next Token is OperatorToken, and if so, it calls the ParseNumber again to get second number and then based on the type of the Operator performs the addition or subtraction and returns the resulting integer.

Note that Exception is thrown if some unexpected token is encountered, so that we can know when we have invalid expression. We could have also done this inside of the Tokenizer but i wanted to have this logic just in one place.

Now let’s see the ParseNumber method:

```        private int ParseNumber()
{
if (_tokens.Current is NumberConstantToken)
{
return (_tokens.Current as NumberConstantToken).Value;
}

throw new Exception("Expected a number constant but found " + _tokens.Current);
}
```

As you see this method just returns the value of the number from our Token.

I told you this is gonna be simple

OK lets now see the main method of the Parser that uses the methods explained above:

```        public int Parse()
{
var result = 0;
while (_tokens.MoveNext())
{
result = ParseExpression();

if (_tokens.MoveNext())
{
if (_tokens.Current is OperatorToken)
{
var op = _tokens.Current;

var secondNumber = Parse();
if (op is PlusToken)
{
return result + secondNumber;
}
if (op is MinusToken)
return result - secondNumber;

throw new Exception("Unsupported operator: " + op);

}
throw new Exception("Expecting operator after expression, but got " + _tokens.Current);
}
}

return result;
}
```

This is where it all happens, the Parse method calls ParseExpression to get first Expression, and then – if there are more tokens and first one is OperatorToken, we call recursively again Parse and therefore parse next expression and get its result, and then we do the operation based on the Operator, and then inside of that call if needed we call Parse again recursively, and then the next one, and way we go down the rabbit hole…

I told you this was fun

#### Tests

Whole solution is built with Test-First because the idea of Parsers and Tokenizers fit very well with unit testing. If you are interested in this check out the project PlusMinusParser.Tests where there are tests for both the Tokenizer and the Parser. All of them pass an input expressions and expect certain results – integer as solution of the expression (for Parser) or list of Tokens (for Tokenizer).

Here are some tests for Parser (Note that im using NUnit’s TestCase attribute):

```    [TestFixture]
public class PlusMinusParserUnitTests
{
[TestCase("1", ExpectedResult = 1)]
[TestCase("2", ExpectedResult = 2)]
[TestCase("1+1", ExpectedResult = 2)]
[TestCase("1+1+1", ExpectedResult = 3)]
[TestCase("1-1", ExpectedResult = 0)]
[TestCase("0-0", ExpectedResult = 0)]
[TestCase("0-1", ExpectedResult = -1)]
[TestCase("1+2+3+4+5+6", ExpectedResult = 21)]
[TestCase("100-100", ExpectedResult = 0)]
[TestCase("100-100+1", ExpectedResult = 1)]
[TestCase("100-100+100", ExpectedResult = 100)]
[TestCase("1 2", ExpectedException = typeof(Exception))]
[TestCase("+", ExpectedException = typeof(Exception))]
[TestCase("++", ExpectedException = typeof(Exception))]
[TestCase("--", ExpectedException = typeof(Exception))]
[TestCase("+-", ExpectedException = typeof(Exception))]
[TestCase("-+", ExpectedException = typeof(Exception))]
[TestCase("-", ExpectedException = typeof(Exception))]
[TestCase("1-1 2", ExpectedException = typeof(Exception))]
[TestCase("10-100", ExpectedResult = -90)]
public int CanParseExpressions(string expression)
{
var tokens = new PlusMinusTokenizer().Scan(expression);
var parser = new PlusMinusParser(tokens);
var res = parser.Parse();
return res;
}
```

#### Where next?

Well now that we have a solid basic understanding on how to build simple parsers, we can go wild and invent different grammars and build parsers for them. Having this in your tool-belt can be very handy when you encounter tasks that need to parse text.

In one of the next posts i will implement some more complex parsers so stay tuned…

If you want to see the full source code check out my PlusMinusParser repository on GitHub.

# Using await to build cool UI tutorials

In the last post we discussed how we can build custom awaiters and showed simple example how to await for click on the Button instance.

Maybe it was not obvious how we can expand that idea and create some useful application, so i decided to expand the whole concept in this post.

We are going to build some more complex awaiters (for example TextBox awaiter that awaits for certain text to by typed into it) and use it together with the old Button click-awaiter to build the tutorial on how to use some imaginary app inside of the app itself.

For those not patient enough here is what it will look like at the end:

So lets get started. First we need to build that awaiter for text box to be typed into the TextBox. This is little more complicated then our Button awaiter because in this case we need to pass the text parameter to our awaiter, and await keyword does not support that out of the box.

This is the usage we want to achieve in the end – if we are awaiting for text “123” to be entered into TextBox named tb1 then we want to be able to write something like this in our code:

```    await new TextBoxTextAwaiter(tb1, "123");
```

To do the trick, first we are going to create one intermediary class TextBoxTextAwaiter that will accept two parameters, our TextBox instance and the text that we are expecting to be typed into it:

```
public class TextBoxTextAwaiter
{
public TextBox TextBox { get; set; }
public string TextToAwait { get; set; }

public TextBoxTextAwaiter(TextBox textBox, string textToAwait)
{
TextBox = textBox;
TextToAwait = textToAwait;
}
}
```

So now that we have something to hold on to our parameters we will create an extension method GetAwaiter for that class TextBoxTextAwaiter so that it can be awaited:

```    public static class TextBoxAwaiterExtensions
{
public static TextBoxAwaiter GetAwaiter(this TextBoxTextAwaiter textBoxTextAwaiter)
{
return new TextBoxAwaiter(textBoxTextAwaiter.TextBox, textBoxTextAwaiter.TextToAwait);
}
}
```

As you see in the GetAwaiter we are creating instance of another class called TextBoxAwaiter (details of it below) and pass all the parameters into its constructor and this class can now do the actual await for use:

```    public class TextBoxAwaiter : INotifyCompletion
{

public TextBoxAwaiter(TextBox textBoxToAwait, string textToAwait)
{
_textToAwait = textToAwait;
TextBoxToAwait = textBoxToAwait;
}

public void GetResult() {}

public bool IsCompleted
{
get { return TextBoxToAwait.Text.Equals(_textToAwait); }
}

public TextBox TextBoxToAwait { get; set; }
public void OnCompleted(Action continuation)
{
TextChangedEventHandler h = null;
h = (sender, args) =>
{
TextBoxToAwait.Focus();
if (IsCompleted)
{
TextBoxToAwait.TextChanged -= h;
continuation();
}
};
TextBoxToAwait.TextChanged += h;
}
}
```

As you see this final awaiter is simple and works similarly like our ButtonAwaiter with difference that its subscribing to TextChanged event of the TextBox and checks if our desired text is entered. Once the text is entered, we are notifying awaiter that we are done by calling the continuation Action (and unsubscribing from TextChanged event for the cleanup).

And that’s all, now we can build any in-application tutorial that involves typing certain text in TextBoxes and clicking on the Buttons.  We could off course go further and build all kinds of awaiters for example MouseMovedAwaiter, MouseMovedToRegionAwaiter, UserInactivityAwaiter etc.

Also on mobile platforms we could easily create SwipeAwaiter, ShakeDeviceAwaiter, DoubleTapAwaiter etc.

So here is how we would implement that imaginary “GUI Tutorial” from the image on the top of the post:

```            public MainWindow()
{
InitializeComponent();
Loaded += (a,e) => { Setup(); };
}

private async void Setup()
{
await BtnTutorial;
ShowDemo();
}

private async void ShowDemo()
{
tb1.Text = "";
ShowMsg("Press First button");
await Btn1;
ClearVisualCues(Btn1);

ShowMsg("Press Second button");
await Btn2;
ClearVisualCues(Btn2);

ShowMsg("Press Third button");
await Btn3;
ClearVisualCues(Btn3);

ShowMsg("Press Fourth button");
await Btn4;
ClearVisualCues(Btn4);

ShowMsg("Type the code '123' into the TextBox");
await new TextBoxTextAwaiter(tb1, "123");
ClearVisualCues(tb1);

ShowMsg("Tutorial finished");

Setup();
}
```

I’m not posting the whole source code here because you can download the full zipped solution to see all the details. We simply use our awaiters to avoid boilerplate code for anticipated user interactions and we can concentrate on the actual logic of the tutorial.

Off course this is very simplified example but its very easy to extend it and make it really useful and interesting.

I can imagine this approach being used to create in-app tutorials for mobile applications where user would be expected to touch certain parts of the screen, do swipe actions, then maybe shake the device to clear screen etc.

Or for mobile games for kids where they should “interact” with the device in order to proceed in the game, take a picture of their pet, say some voice commands etc.

i would hear about see some nice usages of this pattern, feel free to post some in the comments.

# Awaiting for that Button click

## Why wait when we can await?

I believe that most of C# developers know about the new language features for asynchronous programming with async and await keywords but how many of us are really exploiting them to full extent?

Recently i was watching the excellent C# Language Internals course on PluralSight.com by Bart De Smet and came across very cool idea of using Await to wait for Button to be clicked – which pushed me to write this post and share it with everyone.

By implementing a custom awaiter for Button (via extension methods) and hooking up to the Click handler we are able to await for that button to be clicked in very elegant way.

So innstead of manually adding Clicked event handler to each Button we hide that functionality in our awaiter and then we are able to write code this:

``` private async void AwaitForButtonClicks()
{
await btn1;
}
```

## How do we implement custom awaiter?

Implementing custom awaiter surprisingly does not require you to inherit some class or implement an interface. Compiler is smart enough to recognize if you are implementing proper methods (either directly on the instance or via extension method) and if you do, then you can use await keyword on that type.

Cool thing about that is that you don’t need to own the class which you want to extend with awaiter,  same as in our case where we want to await for System.Windows.Control.Button which we don’t own (we could off course inherit it but if it would be sealed, we could still do it via extension methods).

Lets write the awaiter:

```    public static class ButtonAwaiterExtensions
{
public static ButtonAwaiter GetAwaiter(this Button button)
{
return new ButtonAwaiter()
{
Button = button
};
}
}
```

As you see we are adding the GetAwaiter method on the Button via extension method and returning the ButtonAwait which is just a class in which we will implement the actual await functionality (it could also be a struct instead of class). Also we pass the instance of the Button instance so that our ButtonAwaiter has something to attach to.

Compiler will recognize that we added this method (and also it will check if the ButtonAwaiter class that we return from it is implementing proper methods and interfaces) and because it will “see” all is implemented properly await keyword will start working for us.

In order for all this to work lets implement the ButtonAwaiter class:

```    public class ButtonAwaiter : INotifyCompletion
{
public bool IsCompleted
{
get { return false; }
}

public void GetResult()
{

}

public Button Button { get; set; }

public void OnCompleted(Action continuation)
{
RoutedEventHandler h = null;
h = (o, e) =>
{
Button.Click -= h;
continuation();
};
Button.Click += h;
}
}
```

Our awaiter first of all returns false in the IsCompleted notifying the runtime that when we await for button, that it cannot return immediately, but only later after button is clicked.
And we take care of that in the OnCompleted method where runtime is subscribing with Action continuation to be called after awaiting is done – in other words after someone has clicked on the Button instance.
So in that method we create Click button handler and subscribe to the Click event and then inside that handler we unsubscribe (so we don’t get restart the await state machine every next time Button is clicked) and then we invoke the continuation action to notify everyone that our await is finished.

Note that GetResults does not return nothing since it would not make sense to return anything in our case – we are just waiting for the button click.

And that’s really all there is to it.

## How to use this thing?

Now in our app we can await for multiple Buttons in which ever order we want and then do some action after Buttons are pressed in that specific order:

```    public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
{
AwaitForButtonClicks();
};
}

private async void AwaitForButtonClicks()
{
await btn1;

await btn2;

await btn3;

MessageBox.Show("Well done!");

AwaitForButtonClicks();
}
}
```

So as you see in the Loaded even of the Window we call our async method that is awaiting for Button clicks and then show some message. And then we recursively start waiting again

Don’t forget that first await is waiting for click on the btn1 instance so even if you click some other button, code will still be waiting there for that specific click. Once first btn1 is clicked, execution moves to second line and awaits for btn2 to be clicked etc.

## Ok we can use it but is it really useful?

I can think of scenarios where this could be really useful.

Besides freeing us from always writing the boilerplate code for subscribing to Click event of the button, there are more advantages.

Imagine a application that has many UI elements and interactions and you want to create a tutorial for your app that will run inside of the app.

Imagine app itself guiding the user through its own tutorial by highlighting certain Buttons or TextBoxes, and waiting for the user to click them or write some text in TextBox, or move some sliders of the app to understand how to setup parameters etc.

We could also use this in some game where user has to press some buttons in certain order etc.

Possibilities are endless and this Button awaiter is just a start – in one of the next posts I will write some more examples on how to await for different actions, how to pass parameters to await, so stay tuned.

I hope you find this useful and that it will push you to start exploring those (not so new) language features.