Parse Integers from Strings manually in C# without int.Parse
Table of Contents
- Introduction
- Requirements and test cases
- How do we define a valid and invalid integer?
- Testing
- Valid integer test cases
- Invalid integer test cases
- Converting the test cases to C#
- Implementing the integer parser
- Handling null and empty input
- Identifying positive and negative integers
- Confirming there is more after the plus or minus prefix
- Checking each character is a digit
- Table for Unicode decimals and C# integer numbers
- Parsing a character to an integer digit
- Handling bases when building up the integer result
- Checking for integer overflow
- Returning the final result
- Summary
- Full implementation code
- Bonus
Introduction
Have you ever wondered what happens behind the scenes when you call int.Parse or int.TryParse in C#? We are going to build our own integer parser, completely from scratch, without relying on the built-in methods. Along the way, you will get a chance to explore edge cases, learn more about how numbers and strings work in C#, and sharpen your requirements gathering and test-driven development skills.
This article is purely for fun and learning, but it is a great way to see what it takes to handle converting input in C#, and really understand how parsing logic works under the hood.
Requirements and test cases
Defining the requirements for an integer parser that avoids the built-in methods can be difficult. When you build your own parsing logic, you have to think about all sorts of edge cases, such as extra spaces, a stray plus or minus sign, or non-digit characters hidden in the input. There is also the matter of really big or really small numbers that could overflow the limits of a 32-bit integer. It is easy to miss a scenario if your requirements are not clear and detailed, so it is important to define exactly how each type of input should behave. Getting these details right saves you time when you start writing tests and again in the implementation phase, and it gives you a good chance of the parser working exactly as expected the first time.
How do we define a valid and invalid integer?
The requirements for integer parsing are driven by both the valid and invalid input scenarios. The parser needs to handle plain numbers, numbers with optional leading and trailing spaces, tabs or newlines, and numbers with an optional plus or minus sign. Inputs such as "1", " 456", and "-789", and even values at the limits of int.MinValue and int.MaxValue, should all be accepted and parsed correctly. Conversely, the parser must reject null or blank inputs, and any string containing non-digit characters, multiple consecutive sign symbols, or signs and spaces in incorrect positions. It should also return an error if the number is too large or too small to fit in a 32-bit integer.
Testing
Now that we have the requirements, it is relatively straightforward to define the test cases to meet those requirements. We have broken down the requirements below into valid and invalid test cases for parsing an integer.
Valid integer test cases
Below you can see the test cases which will be used to determine whether an integer is valid.
Test case | Sample input | Expected output |
---|---|---|
Single digit | 1 | 1 |
Multiple digits | 123 | 123 |
Leading spaces | " 456" | 456 |
Trailing spaces | "457 " | 457 |
Leading and trailing spaces | " 458 " | 458 |
Negative number | -789 | -789 |
Explicit positive number | +42 | 42 |
Zero | 0 | 0 |
Double zero | 00 | 0 |
Maximum int value | 2147483647 | 2147483647 |
Minimum int value | -2147483648 | -2147483648 |
Tab character before digits | "\t123" | 123 |
Newline before and after digits | "\n456\n" | 456 |
Invalid integer test cases
Below you can see the test cases which will be used to determine whether an integer is invalid.
Test case | Sample input | Expected error |
---|---|---|
Null input | null | Input is null or blank |
Empty string | "" | Input is null or blank |
Whitespace only | " " | Input is null or blank |
Non-digit characters only | abc | Input contains a non-digit character |
Mixed digits and non-digits | 12a3 | Input contains a non-digit character |
Multiple minus signs | --123 | Input contains a non-digit character |
Multiple plus signs | ++123 | Input contains a non-digit character |
Plus sign only | + | Input does not contain any digits |
Minus sign only | - | Input does not contain any digits |
Plus sign with space before digits | + 123 | Input contains a non-digit character |
Minus sign with space before digits | - 123 | Input contains a non-digit character |
Space after digits with plus sign | 123 + | Input contains a non-digit character |
Space after digits with minus sign | 123 - | Input contains a non-digit character |
Space within digits | 1 2 | Input contains a non-digit character |
Space within digits | 12 3 | Input contains a non-digit character |
Underscore in number | 1_000 | Input contains a non-digit character |
Above int.MaxValue | 2147483648 | Integer is higher than max value (2,147,483,647) |
Below int.MinValue | -2147483649 | Integer is lower than min value (-2,147,483,648) |
Converting the test cases to C#
The go-to testing framework for many C# developers is xUnit.net. It is well suited to minimising the amount of duplication required to cover all of the test cases we want to verify. This is accomplished via the TheoryData class and the MemberData attribute.
You can see below that we have two tests, one for valid and one for invalid integer input, each with its expected output. Below that, we have defined two generators to produce our test data. We prefer to use the collection initialiser syntax of TheoryData, as we find it clearer. You may also use the older Add method, or the newer AddRow methods, along with an implementation of the TheoryData class if you prefer.
public sealed class IntegerParserTests
{
    [Theory]
    [MemberData(nameof(ValidIntegers))]
    public void IntegerParser_TryParse_Returns_Success_With_Valid_Integer(string input, int expected)
    {
        ParseIntegerResult result = IntegerParser.TryParse(input);
        Assert.Equal(expected, result);
    }

    [Theory]
    [MemberData(nameof(InvalidIntegers))]
    public void IntegerParser_TryParse_Returns_Failure(string? input, string expectedErrorMessage)
    {
        ParseIntegerResult.Failure expected = new(expectedErrorMessage);
        ParseIntegerResult result = IntegerParser.TryParse(input);
        Assert.Equal(expected, result);
    }

    public static IEnumerable<object[]> ValidIntegers => new TheoryData<string, int>
    {
        { "1", 1 },
        { "123", 123 },
        { " 456", 456 },
        { "457 ", 457 },
        { " 458 ", 458 },
        { "-789", -789 },
        { "+42", 42 },
        { "0", 0 },
        { "00", 0 },
        { "2147483647", 2147483647 },
        { "-2147483648", -2147483648 },
        { "\t123", 123 },
        { "\n456\n", 456 }
    };

    public static IEnumerable<object[]> InvalidIntegers => new TheoryData<string?, string>
    {
        { null, "Input is null or blank" },
        { "", "Input is null or blank" },
        { " ", "Input is null or blank" },
        { "abc", "Input contains a non-digit character" },
        { "12a3", "Input contains a non-digit character" },
        { "--123", "Input contains a non-digit character" },
        { "++123", "Input contains a non-digit character" },
        { "+", "Input does not contain any digits" },
        { "-", "Input does not contain any digits" },
        { "+ 123", "Input contains a non-digit character" },
        { "- 123", "Input contains a non-digit character" },
        { "123 +", "Input contains a non-digit character" },
        { "123 -", "Input contains a non-digit character" },
        { "1 2", "Input contains a non-digit character" },
        { "12 3", "Input contains a non-digit character" },
        { "1_000", "Input contains a non-digit character" },
        { "2147483648", "Integer is higher than max value (2,147,483,647)" },
        { "-2147483649", "Integer is lower than min value (-2,147,483,648)" }
    };
}
The test input and expected output are passed into the tests because they have been marked with the MemberData attribute, which is given the name of the generator member, such as [MemberData(nameof(ValidIntegers))].
All of the test cases we have in Valid test cases and Invalid test cases are present in those two generators, and now that we have the requirements and tests, we can move on to the implementation, so that we may turn all of those tests a lovely shade of passing green.
Implementing the integer parser
Below we will break down the implementation of the integer parser. You can see the full source code of the implementation later on as well.
Handling null and empty input
public static ParseIntegerResult TryParse(string? input)
{
    input = input?.Trim();
    bool integerIsBlankOrNull = string.IsNullOrWhiteSpace(input);
    if (integerIsBlankOrNull)
    {
        return new ParseIntegerResult.Failure("Input is null or blank");
    }
For the implementation of the integer parser we have introduced the TryParse method, with a return type of ParseIntegerResult. This result can either be a success or a failure type. The success type contains the integer value that was parsed, whilst the failure type contains the error message from the failed parse.
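The definition of ParseIntegerResult is not shown in this article. As a rough sketch only, it could be shaped as an abstract record with two nested cases. The property names Value and ErrorMessage and the ResultDescriber helper are assumptions made for illustration, not the article's actual source.

```csharp
// Hypothetical shape for ParseIntegerResult (an assumption, not the article's source).
// Records give value-based equality, which is convenient for comparing results in tests.
public abstract record ParseIntegerResult
{
    // A private constructor keeps the hierarchy closed to the two nested cases.
    private ParseIntegerResult() { }

    public sealed record Success(int Value) : ParseIntegerResult;

    public sealed record Failure(string ErrorMessage) : ParseIntegerResult;
}

// Hypothetical helper showing how a caller might consume either outcome.
public static class ResultDescriber
{
    public static string Describe(ParseIntegerResult result) => result switch
    {
        ParseIntegerResult.Success success => $"Parsed {success.Value}",
        ParseIntegerResult.Failure failure => $"Failed: {failure.ErrorMessage}",
        _ => "Unknown result"
    };
}
```

With a shape like this, the caller is nudged towards handling both outcomes explicitly rather than checking a boolean and reading an out parameter.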
The TryParse method takes a nullable string as its input parameter. We immediately trim the string, which removes white space from the beginning and end. We do this because leading and trailing white space is allowed by the requirements for a valid integer. Additionally, we place the null-conditional operator (?.) before the Trim call, so that the trim only executes on a non-null string.
Next we return a failure if the input is null, empty, or consists only of white space such as spaces, tabs or newlines, using the string.IsNullOrWhiteSpace method.
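The behaviour of the trim and the null check can be seen in isolation in the short sketch below. The variable names are purely illustrative.

```csharp
using System;

string? padded = "\t 123 \n";
string? missing = null;

// Trim() with no arguments removes all leading and trailing white-space characters,
// which includes tabs and newlines as well as spaces.
string? trimmed = padded?.Trim();

// The null-conditional operator short-circuits, so Trim() is never called on null.
string? stillNull = missing?.Trim();

Console.WriteLine(trimmed);                                   // 123
Console.WriteLine(stillNull is null);                         // True
Console.WriteLine(string.IsNullOrWhiteSpace(stillNull));      // True
Console.WriteLine(string.IsNullOrWhiteSpace("   ".Trim()));   // True
```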
Identifying positive and negative integers
Now that we know we have a non-null and non-empty string we need to determine whether the potential integer is a positive or negative number.
int sign = 1;
int index = 0;
bool integerIsNegative = input![0] == '-';
bool integerIsPositive = input[0] == '+';
if (integerIsNegative)
{
    sign = -1;
    index++;
}
else if (integerIsPositive)
{
    index++;
}
We have three scenarios:
- Number has no sign as prefix
- Number has a positive as prefix
- Number has a negative prefix
To accommodate these requirements we firstly declare two integer variables, sign and index. sign is initialised to 1, and it may later be changed to -1 so that it can be used to make the parsed integer negative if needed. This relies on standard primary school mathematics, whereby multiplying a positive number by a negative number results in a negative number.
The index variable tracks the current position within the input string we are looking at. As string indexing in C# is zero-based, the initial position is always zero.
To account for the scenarios where there is a plus or minus symbol preceding the number, we compare the first character of the input against the + and - symbols, and store the results in two boolean variables, integerIsNegative and integerIsPositive.
If either integerIsNegative or integerIsPositive is true, we then increment the index variable by one, as we know the potential number should start after the prefix symbol. Additionally, when the prefix symbol is a minus, we assign -1 to the sign variable, so that later on we can multiply the parsed number by minus one, as mentioned earlier.
Confirming there is more after the plus or minus prefix
Next we need to confirm there is more to the potential integer than just a plus or minus sign.
bool inputIsEmpty = index >= input.Length;
if (inputIsEmpty)
{
    return new ParseIntegerResult.Failure("Input does not contain any digits");
}
The index >= input.Length boolean expression above works to determine whether there is more than just a plus or minus symbol, because in that scenario, the index variable would have been incremented to 1, and the length of the string would also be 1, therefore meaning there are no more digits after the symbol. In that event, we return an error.
In the scenario where there is no plus or minus prefix, the index variable would be at 0 and the length would be at minimum 1, thereby the boolean expression would be false.
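To make the relationship between index and input.Length concrete, the sketch below traces a few already-trimmed, non-empty inputs. ReadSign is a hypothetical helper written only for this illustration. It mirrors the sign handling above but is not part of the article's parser.

```csharp
using System;

// Hypothetical helper: returns the sign multiplier and the index of the first
// character after any sign prefix. Assumes the input is non-empty and trimmed.
static (int Sign, int Index) ReadSign(string trimmedInput)
{
    return trimmedInput[0] switch
    {
        '-' => (-1, 1),
        '+' => (1, 1),
        _ => (1, 0)
    };
}

foreach (string sample in new[] { "+", "-", "-7", "42" })
{
    (int sign, int index) = ReadSign(sample);

    // Same expression as the parser: true when nothing follows the sign.
    bool noDigitsRemain = index >= sample.Length;

    Console.WriteLine(
        $"\"{sample}\": sign={sign}, index={index}, length={sample.Length}, noDigitsRemain={noDigitsRemain}");
}
```

Only the inputs "+" and "-" report that no digits remain, because for them index and length are both 1.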
Checking each character is a digit
Now we know whether we have a positive or negative number, and that there are more characters to check on the input, we will be checking each additional character in the input to confirm whether it is a numerical digit or not.
long result = 0;
for (; index < input.Length; index++)
{
    char character = input[index];
    bool characterIsNotDigit = character is < '0' or > '9';
    if (characterIsNotDigit)
    {
        return new ParseIntegerResult.Failure("Input contains a non-digit character");
    }
Firstly we define a result variable with the type long, which stores the number as it is built up. You will see it later on, but the reason we use a long is that we want to know if there is any overflow of the int type. That essentially means the number is either larger than the maximum value of an int, or lower than the minimum value of an int. To detect that, we need a data type which can store larger numbers than an int.
Next we start a for loop which ends after the last character in the string has been evaluated. The index < input.Length expression is a common way of looping through strings. Because indexing is zero-based, the last character sits at index input.Length - 1, so once that character has been processed and index is incremented, index is no longer smaller than the string length and the loop terminates.
Checking characters are digits via Unicode comparison
The first check we do inside the for loop is to determine whether the character we are looking at is a digit between 0 and 9. The boolean expression used is character is < '0' or > '9', and the reason this works is due to Unicode and how individual characters of a string are evaluated by C#.
Unicode is a character encoding standard used widely by most countries around the world. It is composed of 154,998 characters as of version 16.0. We are talking letters, numbers, special symbols, even emojis, and characters of non-English languages. C# uses characters encoded in UTF-16, as mentioned in our Unicode safe reverse extension method article.
The C# team helpfully implemented the operators for greater than (>) and less than (<) for the char type, among others, so that we can compare individual characters in a string based on the Unicode table values. The table is a reference to look up Unicode values, for which we have included a small slice below. You can see the Unicode decimal notation, along with its corresponding integer number in C#. The reason the integer number 0 starts at the Unicode decimal 48, is that there are more characters preceding that on the Unicode side.
Table for Unicode decimals and C# integer numbers
Unicode decimal | C# integer number |
---|---|
48 | 0 |
49 | 1 |
50 | 2 |
51 | 3 |
52 | 4 |
53 | 5 |
54 | 6 |
55 | 7 |
56 | 8 |
57 | 9 |
With the support in C# for determining whether a Unicode character falls in the set of 0 to 9, it allows us to fail-fast when we iterate over a character that is not within that set. In that event, we return an error message and exit the function.
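The comparison can be tried out on its own, as in the sketch below. IsDigitZeroToNine is a hypothetical helper name used only here. The last two lines show one possible reason to prefer a range comparison over char.IsDigit for this exercise: char.IsDigit also accepts decimal digits from other scripts, which the subtraction-based parsing in this article would not handle.

```csharp
using System;

// A char converts implicitly to the numeric value of its UTF-16 code unit.
int codeOfZero = '0';
int codeOfNine = '9';
Console.WriteLine($"{codeOfZero} to {codeOfNine}");           // 48 to 57

// Hypothetical helper: the positive form of the parser's "is not a digit" check.
static bool IsDigitZeroToNine(char character) => character is >= '0' and <= '9';

Console.WriteLine(IsDigitZeroToNine('5'));                    // True
Console.WriteLine(IsDigitZeroToNine('a'));                    // False

// U+0663 ARABIC-INDIC DIGIT THREE is a Unicode decimal digit outside '0'..'9'.
char arabicIndicThree = '\u0663';
Console.WriteLine(char.IsDigit(arabicIndicThree));            // True
Console.WriteLine(IsDigitZeroToNine(arabicIndicThree));       // False
```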
Parsing a character to an integer digit
We know the character we are looking at is a digit. Now we need to determine which digit it is by parsing. We can do this via some lovely Unicode arithmetic.
int characterUnicodeSubtracted = character - '0';
result = result * 10 + characterUnicodeSubtracted;
The method used to convert a Unicode char to an integer is simple and elegant. Remember from the Unicode table above that the Unicode value for 0 is 48. All we need to do to calculate the integer value of any digit character is to subtract that 48 from the Unicode value of the character.
For example:
- 9 has a Unicode value of 57, so 57 - 48 = 9
- 8 has a Unicode value of 56, so 56 - 48 = 8
- 7 has a Unicode value of 55, so 55 - 48 = 7
- and so on...
This parsed integer digit is then stored in an appropriately named integer variable, characterUnicodeSubtracted.
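The subtraction can be checked for every digit with a short loop. This is only an illustrative sketch, and the variable names are not from the parser.

```csharp
using System;

string digits = "0123456789";

foreach (char character in digits)
{
    // Subtracting two chars produces an int: the distance between their code values.
    int value = character - '0';

    Console.WriteLine($"'{character}' has code {(int)character}, so {(int)character} - 48 = {value}");
}
```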
Handling bases when building up the integer result
We have successfully parsed a digit in the number. Now we need to give it its correct place value (units, tens, hundreds and so on) within the result variable. This can be accomplished by some more simple arithmetic: result * 10 + characterUnicodeSubtracted.
On every iteration of the loop, that expression multiplies the running total by ten, which shifts the digits parsed so far one place to the left, and then adds the digit that was just parsed. It will make more sense in the table below.
Here is an example of the number 1359:
Loop iteration | Result value before | Parsed number | Result value after | Calculation |
---|---|---|---|---|
1 | 0 | 1 | 1 | 0 * 10 + 1 = 1 |
2 | 1 | 3 | 13 | 1 * 10 + 3 = 13 |
3 | 13 | 5 | 135 | 13 * 10 + 5 = 135 |
4 | 135 | 9 | 1359 | 135 * 10 + 9 = 1359 |
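The same trace can be reproduced in code. Accumulate is a hypothetical helper for this illustration, and it assumes the input has already been validated to contain only the characters 0 to 9.

```csharp
using System;

// Hypothetical helper: builds up the number one digit at a time and prints each step.
// Assumes every character is between '0' and '9'.
static long Accumulate(string digits)
{
    long result = 0;

    foreach (char character in digits)
    {
        int digit = character - '0';
        long before = result;

        // Shift the existing digits one decimal place to the left, then add the new digit.
        result = result * 10 + digit;

        Console.WriteLine($"{before} * 10 + {digit} = {result}");
    }

    return result;
}

Console.WriteLine(Accumulate("1359")); // 1359
```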
Checking for integer overflow
Integers (int) in C# are 32 bits in size and have a minimum and maximum value. For positive numbers the maximum limit is 2,147,483,647, and for negative numbers the minimum limit is -2,147,483,648. C# provides some handy constants for these two numbers, in the form of int.MinValue and int.MaxValue. We need to check on each iteration of the loop that we are not overflowing the allowed integer range.
switch (sign)
{
    case 1 when result > int.MaxValue:
    {
        return new ParseIntegerResult.Failure("Integer is higher than max value (2,147,483,647)");
    }
    case -1 when -result < int.MinValue:
    {
        return new ParseIntegerResult.Failure("Integer is lower than min value (-2,147,483,648)");
    }
}
Checking for integer overflow is easy enough using a switch statement on the sign. Because the minimum and maximum values have different magnitudes (2,147,483,648 versus 2,147,483,647), we have a case for each sign (1 and -1), and return an error if the parsed integer matches either of those expressions. As mentioned earlier, this is why it was important to declare the result variable as a long, so that we could check for an integer overflow here.
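The sketch below illustrates why the wider type matters. In an unchecked context, int arithmetic wraps around silently, whereas a long holds the out-of-range value so it can be compared against the limits. The final lines show the checked keyword as an alternative technique, which is not what this parser uses. The variable names are illustrative only.

```csharp
using System;

int max = int.MaxValue;

// In an unchecked context, int arithmetic wraps around without any error.
int wrapped = unchecked(max + 1);
Console.WriteLine(wrapped == int.MinValue);      // True

// A long has room for the value, so it can be compared against the int limits.
long widened = (long)max + 1;
Console.WriteLine(widened > int.MaxValue);       // True

// 2,147,483,648 is too large as a positive int, but valid once negated.
long magnitude = 2147483648L;
Console.WriteLine(magnitude > int.MaxValue);     // True: rejected when sign is 1
Console.WriteLine(-magnitude < int.MinValue);    // False: accepted when sign is -1

// Alternative technique: a checked expression throws instead of wrapping.
try
{
    int overflowed = checked(max + 1);
    Console.WriteLine(overflowed);
}
catch (OverflowException)
{
    Console.WriteLine("OverflowException");
}
```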
Returning the final result
As the for loop finishes, all that remains is to convert the result to a negative number if required, and then return the result.
return new ParseIntegerResult.Success((int)(sign * result));
In the code above, we return the success result with the result multiplied by the sign value. If the sign is negative the result is multiplied by minus 1, otherwise it is multiplied by 1. It is a quick and easy way of flipping the number to a negative if needed, and otherwise leaving it alone.
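As a small worked example of that final line, consider the input "-2147483648". The multiplication happens in long arithmetic, because sign is promoted to long, and only the final in-range value is cast back to int. The variable names below mirror the parser, but the snippet is a standalone sketch.

```csharp
using System;

// State of the parser after the loop for the input "-2147483648".
int sign = -1;
long result = 2147483648L;

// sign * result is evaluated as a long, giving -2147483648, which fits in an int.
int value = (int)(sign * result);

Console.WriteLine(value == int.MinValue); // True
```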
Summary
We do hope you enjoyed this little exercise. In terms of practical use, it is completely useless, as C# has its own Parse and TryParse methods for integers.
It does however make you think about the following:
- How to approach requirements gathering
- How to define test cases and think with a test-first mindset
- How Unicode is used with characters in C#
- How characters can be subtracted from one another to parse a char to an int
Thank you for reading, and please check out more of our Fun Exercises series. We will endeavour to add more regularly.
Full implementation code
See below for a full implementation of the integer parser.
internal static class IntegerParser
{
    public static ParseIntegerResult TryParse(string? input)
    {
        input = input?.Trim();
        bool integerIsBlankOrNull = string.IsNullOrWhiteSpace(input);
        if (integerIsBlankOrNull)
        {
            return new ParseIntegerResult.Failure("Input is null or blank");
        }

        int sign = 1;
        int index = 0;
        bool integerIsNegative = input![0] == '-';
        bool integerIsPositive = input[0] == '+';
        if (integerIsNegative)
        {
            sign = -1;
            index++;
        }
        else if (integerIsPositive)
        {
            index++;
        }

        bool inputIsEmpty = index >= input.Length;
        if (inputIsEmpty)
        {
            return new ParseIntegerResult.Failure("Input does not contain any digits");
        }

        long result = 0;
        for (; index < input.Length; index++)
        {
            char character = input[index];
            bool characterIsNotDigit = character is < '0' or > '9';
            if (characterIsNotDigit)
            {
                return new ParseIntegerResult.Failure("Input contains a non-digit character");
            }

            // 0 is Unicode 48
            // If we subtract the Unicode char for 0 (48) from the Unicode number for the character,
            // we will get the difference between those two Unicode numbers represented as an integer.
            // The answer will also be the actual number we want to parse.
            // For example,
            // 9 is Unicode 57
            // 57-48 = 9
            int characterUnicodeSubtracted = character - '0';
            result = result * 10 + characterUnicodeSubtracted;

            switch (sign)
            {
                case 1 when result > int.MaxValue:
                {
                    return new ParseIntegerResult.Failure("Integer is higher than max value (2,147,483,647)");
                }
                case -1 when -result < int.MinValue:
                {
                    return new ParseIntegerResult.Failure("Integer is lower than min value (-2,147,483,648)");
                }
            }
        }

        return new ParseIntegerResult.Success((int)(sign * result));
    }
}
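As a sanity check on the requirements, the built-in int.TryParse can be run against a handful of the same sample inputs. The sketch below is an aside rather than part of the exercise. BuiltInTryParse is a hypothetical wrapper, and it pins the number style and culture (NumberStyles.Integer with the invariant culture) so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper around the built-in parser with an explicit style and culture.
static bool BuiltInTryParse(string? input, out int value) =>
    int.TryParse(input, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);

string?[] samples = { " 456 ", "-789", "+42", "00", "\t123", "+ 123", "1_000", "2147483648", "", null };

foreach (string? sample in samples)
{
    bool parsed = BuiltInTryParse(sample, out int value);
    string shown = sample is null ? "null" : $"\"{sample.Replace("\t", "\\t")}\"";

    Console.WriteLine(parsed ? $"{shown} -> {value}" : $"{shown} -> rejected");
}
```

For these samples the built-in method accepts the first five inputs and rejects the rest, which matches the valid and invalid test cases defined earlier.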
Bonus
A good friend of Illumonos has written a similar article in the Dart language. Please check out the Int Parser Challenge by Grab a Byte.
View the source code for this article on GitHub