Tag Archives: Programming

Introduction to Recursive Descent Parsers with C#

Parser? Aren’t parsers utterly boring?

Well no, quite the opposite. Lately i have been solving some of the programming challenges on talentbuddy and bumped into task to create parser and solver for simple arithmetic expressions in string format, something like this: “(2+2)-(3-(6-5))-4″.

On first thought this seems trivial, but only until the moment you start implementing it. If you are doing it the wrong way, for example by using regular expressions, you can bump into some interesting problems, and also develop a solution that works for most cases but then fails on edge cases etc.

The proper way

Fortunately there are proper ways to do this, and one of them is building a Recursive Descent Parser. As the name implies, this parser will use a Top-Down approach and start breaking the expression into smaller pieces but in recursive way. In order to be able to build a proper parser we first need to define a grammar of the expressions we want to parse.

For defining grammars we can use Backus-Naur Form or Extended Backus-Naur Form.

If we manage to define the grammar properly, then implementing the parser is almost a mechanical process, and this is what i really like about this approach.

In this post we will build very simple Recursive Descent Parser for basic aritmetic operations of addition and subtraction of integers. Later we will tackle some more interesting scenarios, but for this post we want to keep things simple.

So the examples of expressions we want to parse could be:

  • “1”
  • “1+100″
  • “1+2+3+44″
  • “100-99+1″ etc

Defining the grammar

So how would our grammar in EBNF would look like?

Expression := Number {Operator Number}
Operator   := "+" | "-"
Number     := Digit{Digit}
Digit      := "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Grammar above tells us that we are defining expressions that can contain numbers and operators.

Each Expression can contain a single integer Number or this number can be followed by optional set of Operator and then another Number must follow. Note the curly brackets around the {Operator Number} which means that this part can repeat zero or more times.

Then we continue and define what Operator could be (+ or -) and then we define that Number is just a Digit followed by zero or more Digits (again we use curly braces to define optional repetition of digits in integer number).

At the end we define that Digit is number from 0 to 9.

With this we defined our grammar and its vocabulary – and creating a parser should be simple, right? Well almost :)

Tokens and Tokenizers

First there is another small issue we need to handle here. Our input expressions will be in string format, and we need to parse them into something that our Parser will understand in order to abstract the raw character processing – we don’t want to string processing inside of Parser this would be bad practice.

This is where idea of Tokens comes into play. We will define tokens for each part of our grammar. We can have abstract Token class, and other tokens will inherit it.

Then we will have base OperatorToken that will be inherited by PlusToken and MinusToken, and finally we will have NumberConstantToken for our numbers.

Token classes will just be a simple marker classes with almost no functionality inside except for the NumberConstantToken which has to hold the actual value.

 public abstract class Token
    {
    }

    public class OperatorToken : Token
    {

    }
    public class PlusToken : OperatorToken
    {
    }

    public class MinusToken : OperatorToken
    {
    }

    public class NumberConstantToken : Token
    {
        private readonly int _value;

        public NumberConstantToken(int value)
        {
            _value = value;
        }

        public int Value
        {
            get { return _value; }
        }
    }

Now that we defined our Tokens, lets build a PlusMinusTokenizer class that will transform the raw string of the expression into a sequence of Tokens that our Parser can then elegantly work with:

    public class PlusMinusTokenizer
    {
        private StringReader _reader;

        public IEnumerable<Token> Scan(string expression)
        {
            _reader = new StringReader(expression);

            var tokens = new List<Token>();
            while (_reader.Peek() != -1)
            {
                var c = (char) _reader.Peek();
                if (Char.IsWhiteSpace(c))
                {
                    _reader.Read();
                    continue;
                }

                if (Char.IsDigit(c))
                {
                    var nr = ParseNumber();
                    tokens.Add(new NumberConstantToken(nr));
                }
                else if (c == '-')
                {
                    tokens.Add(new MinusToken());
                    _reader.Read();
                }
                else if (c == '+')
                {
                    tokens.Add(new PlusToken());
                    _reader.Read();
                }
                else
                    throw new Exception("Unknown character in expression: " + c);
            }

            return tokens;
        }

        private int ParseNumber()
        {
            var digits = new List<int>();
            while (Char.IsDigit((char) _reader.Peek()))
            {
                var digit = (char) _reader.Read();
                int i;
                if (int.TryParse(Char.ToString(digit), out i))
                {
                    digits.Add(i);
                }
                else
                    throw new Exception("Could not parse integer number when parsing digit: " + digit);
            }

            var nr = 0;
            var mul = 1;
            digits.Reverse();
            digits.ForEach(d =>
            {
                nr += d*mul;
                mul *= 10;
            });

            return nr;
        }
    }

As you can see code is very simple and for given string expression it returns IEnumerable<Token>. There is also a part there that converts sequence of digits into into integer but we all know that so lets move on.

Parsing is Fun!

Now we finally come to the interesting part. Our parser should start from the top of our grammar and parse Expression, then inside of it parse the Number, then Operator then again Number etc.

So lets first examine the ParseExpression method:

        private int ParseExpression()
        {
            var number = ParseNumber();
            if (!_tokens.MoveNext())
                return number;

            if (_tokens.Current is OperatorToken)
            {
                var op = _tokens.Current;
                _tokens.MoveNext();

                var secondNumber = ParseNumber();
                if (op is PlusToken)
                {
                    return number + secondNumber;
                }
                if (op is MinusToken)
                    return number - secondNumber;

                throw new Exception("Unsupported operator: " + op);
            }

            throw new Exception("Expecting operator after number, but got " + _tokens.Current);
        }

Its barely 30 lines of code that first calls our ParseNumber method to get first number of the expression, and then it checks if next Token is OperatorToken, and if so, it calls the ParseNumber again to get second number and then based on the type of the Operator performs the addition or subtraction and returns the resulting integer.

Note that Exception is thrown if some unexpected token is encountered, so that we can know when we have invalid expression. We could have also done this inside of the Tokenizer but i wanted to have this logic just in one place.

Now let’s see the ParseNumber method:

        private int ParseNumber()
        {
            if (_tokens.Current is NumberConstantToken)
            {
                return (_tokens.Current as NumberConstantToken).Value;
            }

            throw new Exception("Expected a number constant but found " + _tokens.Current);
        }

As you see this method just returns the value of the number from our Token.

I told you this is gonna be simple :)

OK lets now see the main method of the Parser that uses the methods explained above:

        public int Parse()
        {
            var result = 0;
            while (_tokens.MoveNext())
            {
                result = ParseExpression();

                if (_tokens.MoveNext())
                {
                    if (_tokens.Current is OperatorToken)
                    {
                        var op = _tokens.Current;

                        var secondNumber = Parse();
                        if (op is PlusToken)
                        {
                            return result + secondNumber;
                        }
                        if (op is MinusToken)
                            return result - secondNumber;

                        throw new Exception("Unsupported operator: " + op);

                    }
                    throw new Exception("Expecting operator after expression, but got " + _tokens.Current);
                }
            }

            return result;
        }

This is where it all happens, the Parse method calls ParseExpression to get first Expression, and then – if there are more tokens and first one is OperatorToken, we call recursively again Parse and therefore parse next expression and get its result, and then we do the operation based on the Operator, and then inside of that call if needed we call Parse again recursively, and then the next one, and way we go down the rabbit hole…

I told you this was fun :)

Tests

Whole solution is built with Test-First because the idea of Parsers and Tokenizers fit very well with unit testing. If you are interested in this check out the project PlusMinusParser.Tests where there are tests for both the Tokenizer and the Parser. All of them pass an input expressions and expect certain results – integer as solution of the expression (for Parser) or list of Tokens (for Tokenizer).

Here are some tests for Parser (Note that im using NUnit’s TestCase attribute):

    [TestFixture]
    public class PlusMinusParserUnitTests
    {
        [TestCase("1", ExpectedResult = 1)]
        [TestCase("2", ExpectedResult = 2)]
        [TestCase("1+1", ExpectedResult = 2)]
        [TestCase("1+1+1", ExpectedResult = 3)]
        [TestCase("1-1", ExpectedResult = 0)]
        [TestCase("0-0", ExpectedResult = 0)]
        [TestCase("0-1", ExpectedResult = -1)]
        [TestCase("1+2+3+4+5+6", ExpectedResult = 21)]
        [TestCase("100-100", ExpectedResult = 0)]
        [TestCase("100-100+1", ExpectedResult = 1)]
        [TestCase("100-100+100", ExpectedResult = 100)]
        [TestCase("1 2", ExpectedException = typeof(Exception))]
        [TestCase("+", ExpectedException = typeof(Exception))]
        [TestCase("++", ExpectedException = typeof(Exception))]
        [TestCase("--", ExpectedException = typeof(Exception))]
        [TestCase("+-", ExpectedException = typeof(Exception))]
        [TestCase("-+", ExpectedException = typeof(Exception))]
        [TestCase("-", ExpectedException = typeof(Exception))]
        [TestCase("1-1 2", ExpectedException = typeof(Exception))]
        [TestCase("10-100", ExpectedResult = -90)]
        public int CanParseExpressions(string expression)
        {
            var tokens = new PlusMinusTokenizer().Scan(expression);
            var parser = new PlusMinusParser(tokens);
            var res = parser.Parse();
            return res;
        }

Where next?

Well now that we have a solid basic understanding on how to build simple parsers, we can go wild and invent different grammars and build parsers for them. Having this in your tool-belt can be very handy when you encounter tasks that need to parse text.

In one of the next posts i will implement some more complex parsers so stay tuned…

If you want to see the full source code check out my PlusMinusParser repository on GitHub.

Awaiting for that Button click

Why wait when we can await?

I believe that most of C# developers know about the new language features for asynchronous programming with async and await keywords but how many of us are really exploiting them to full extent?

Recently i was watching the excellent C# Language Internals course on PluralSight.com by Bart De Smet and came across very cool idea of using Await to wait for Button to be clicked – which pushed me to write this post and share it with everyone.

By implementing a custom awaiter for Button (via extension methods) and hooking up to the Click handler we are able to await for that button to be clicked in very elegant way.

So innstead of manually adding Clicked event handler to each Button we hide that functionality in our awaiter and then we are able to write code this:

 private async void AwaitForButtonClicks()
 {
     await btn1;
 }

How do we implement custom awaiter?

Implementing custom awaiter surprisingly does not require you to inherit some class or implement an interface. Compiler is smart enough to recognize if you are implementing proper methods (either directly on the instance or via extension method) and if you do, then you can use await keyword on that type.

Cool thing about that is that you don’t need to own the class which you want to extend with awaiter,  same as in our case where we want to await for System.Windows.Control.Button which we don’t own (we could off course inherit it but if it would be sealed, we could still do it via extension methods).

Lets write the awaiter:

    public static class ButtonAwaiterExtensions
    {
        public static ButtonAwaiter GetAwaiter(this Button button)
        {
            return new ButtonAwaiter()
            {
                Button = button
            };
        }
    }

As you see we are adding the GetAwaiter method on the Button via extension method and returning the ButtonAwait which is just a class in which we will implement the actual await functionality (it could also be a struct instead of class). Also we pass the instance of the Button instance so that our ButtonAwaiter has something to attach to.

Compiler will recognize that we added this method (and also it will check if the ButtonAwaiter class that we return from it is implementing proper methods and interfaces) and because it will “see” all is implemented properly await keyword will start working for us.

In order for all this to work lets implement the ButtonAwaiter class:

    public class ButtonAwaiter : INotifyCompletion
    {
        public bool IsCompleted
        {
            get { return false; }
        }

        public void GetResult()
        {

        }

        public Button Button { get; set; }

        public void OnCompleted(Action continuation)
        {
            RoutedEventHandler h = null;
            h = (o, e) =>
            {
                Button.Click -= h;
                continuation();
            };
            Button.Click += h;
        }
    }

Our awaiter first of all returns false in the IsCompleted notifying the runtime that when we await for button, that it cannot return immediately, but only later after button is clicked.
And we take care of that in the OnCompleted method where runtime is subscribing with Action continuation to be called after awaiting is done – in other words after someone has clicked on the Button instance.
So in that method we create Click button handler and subscribe to the Click event and then inside that handler we unsubscribe (so we don’t get restart the await state machine every next time Button is clicked) and then we invoke the continuation action to notify everyone that our await is finished.

Note that GetResults does not return nothing since it would not make sense to return anything in our case – we are just waiting for the button click.

And that’s really all there is to it.

How to use this thing?

Now in our app we can await for multiple Buttons in which ever order we want and then do some action after Buttons are pressed in that specific order:

    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            Loaded += (s, e) =>
            {
                AwaitForButtonClicks();
            };
        }

        private async void AwaitForButtonClicks()
        {
            await btn1;

            await btn2;

            await btn3;

            MessageBox.Show("Well done!");

            AwaitForButtonClicks();
        }
    }

So as you see in the Loaded even of the Window we call our async method that is awaiting for Button clicks and then show some message. And then we recursively start waiting again :)

Don’t forget that first await is waiting for click on the btn1 instance so even if you click some other button, code will still be waiting there for that specific click. Once first btn1 is clicked, execution moves to second line and awaits for btn2 to be clicked etc.

Button Awaiter in action

Ok we can use it but is it really useful?

I can think of scenarios where this could be really useful.

Besides freeing us from always writing the boilerplate code for subscribing to Click event of the button, there are more advantages.

Imagine a application that has many UI elements and interactions and you want to create a tutorial for your app that will run inside of the app.

Imagine app itself guiding the user through its own tutorial by highlighting certain Buttons or TextBoxes, and waiting for the user to click them or write some text in TextBox, or move some sliders of the app to understand how to setup parameters etc.

We could also use this in some game where user has to press some buttons in certain order etc.

Possibilities are endless and this Button awaiter is just a start – in one of the next posts I will write some more examples on how to await for different actions, how to pass parameters to await, so stay tuned.

I hope you find this useful and that it will push you to start exploring those (not so new) language features.

Download Visual Studio 2013 solution with source code for this post

Marbles

Marbles game for Windows 8

Finally my first Windows 8 game called Marbles is finished and available in the Windows Store!

That partially explains why i did not wrote any posts on this blog for almost a year :)

It’s been emotional!

Seriously, it has been quite a journey.

First i was learning XNA, then Monogame, and then game development in general.

I always wanted to write a game (aren’t we all?) so finally i decided to really do it!

At first i started very enthusiastically and tried building a 2d platformer game, and after solving most of technical problems like character and level rendering and physics i sadly realized that there is no way i will be able to complete it alone without help of level designer and gfx artist, sfx artist etc.

Since i was a one man band that meant this game is not gonna happen…

Just for the record, here is the video of that unfinished game – be warned it was just a early prototype.

So eventually after months of work i got real and abandoned that idea and started from scratch.

I realized that if i want to get anything done i need to start small and build simple game and this is how idea for Marbles was born.

Along the way I discovered that developing games is very much unlike writing other software applications.

When you are running at 60 FPS then every byte counts, every loop is considered not necessary until proven otherwise and every operation needs to be fast or its out!

But writing games is very fun so i enjoyed (almost) every moment of it :)

Monogame is for real!

I must say that guys from Monogame team did excellent job and gave XNA another life.

Once you get past initial problems like how to setup environment or how to make the Content Pipeline working you are already making games and Monogame tries to help without getting in your way – just as i expected it.

Forums on the Monogame Codeplex site are very helpful and if you post your questions there you are almost sure to get some answer.

Forget Class Inheritance!

Another interesting thing i discovered on this bumpy road is the Entity Component System pattern.

After struggling some time with class inheritance for the game, i realized that there has to be better way so i started researching and found out that many people are writing games using Entity Component Systems.

I wrote small Entity Component System framework for my game and it worked much better then deep class inheritance approach so i definitely recommend it to any aspiring game dev out there.

If you want to see how real world Entity Component System for game development looks like check out for example Artemis its quite stable framework.

To the future… And beyond!

So now that i wrote a game, i can proceed to my next task – learn how to play drums!

Ok seriously, whats next?

Probably i will soon write some Monogame tutorials on this blog to help beginners that are struggling with typical problems that every game has to solve.

Especially on how to setup Content Pipeline and how to easily do Resolution Independence – the holy grail of Game Programming :)

So what else can i say then – stay tuned and in the meantime check out my game and send some criticism or at least few bug reports ;)

msdn-logo

Dot Net Gotcha #2 – Loop variables and Closures

This one is my favorite.

Can you guess the output of this simple console application:

    class Program
    {
        static void Main()
        {
            var actions = new List();

            for (var i = 0 ; i < 10; i++)             
            {                 
                var writeToConsoleAction = new Action(() =>
                    {
                        Console.WriteLine(i);
                    });
                actions.Add(writeToConsoleAction);
            }

            foreach (var action in actions)
            {
                action();
            }

            Console.ReadLine();
        }
    }

One would expect to see numbers from 0 to 9 but here is the actual output of the app:



OK that’s strange right?

It turns out its like that by design. What you have there is a Closure over the loop variable. And closures in C# are done around variables and not around specific values of those variables – so that means that lambda expression gets to use the actual reference to the closed variable.

Let me explain in little bit more detail:

In first loop we iterate increasing the counter variable from 0 to 9 and then use it in the lambda expression. That means each lambda gets its access to the counter via closure.

So at the end of this first loop, value of the counter is 10 (its 10 because we used post-increment operator: counter++, so in last iteration of the loop we increase counter to 10 and then check if its <10 and we see its not, so we exit the loop).

Afterwards we start executing those actions we created in first loop, and they all have a Closure over this same variable counter – which has value of 10 and therefore each action prints out its current value 10 to the console.

So it works as expected, once you know what to expect. :)

And this is really common mistake C# developers make.

How to fix it?

Solution is very simple once you know what is going on. All we need to do is this: instead of passing the variable counter to the lambda expression, create a local copy of this variable and pass this copy to the lambda instead of passing the counter.

That way, closure will each time be done around that copy variable that has current value of counter at that moment of execution, and this local copy value stay that way and will not be changed afterwards by the loop.

Later when the actions are executed (in the second loop), each will use their own closure around copy of the counter and therefore each will have its own different, expected value.

Here is the code that works as we expect:

    class Program
    {
        static void Main()
        {
            var actions = new List();

            for (var i = 0 ; i < 10; i++)             
            {                 
                    int counterCopy = i;                 
                    var writeToConsoleAction = new Action(() =>
                    {
                        Console.WriteLine(counterCopy);
                    });
                actions.Add(writeToConsoleAction);
            }

            foreach (var action in actions)
            {
                action();
            }

            Console.ReadLine();
        }
    }

As you see we create a copy of counter called counterCopy and pass this to the lambda expression each time.

Although this seems like old news if you knew it, there are many C# developers that are not aware of this behavior (or tend to forget it from time to time) so make sure to spread the word and always remember it.

By the way, the C# team is changing this in C# version 5 to work as one would expect but until then we just need to copy those loop variables manually :)