Previously I had written about how I enjoy regular expressions and how I get to use them occasionally. Recently for work I had to revisit some old code. The task was to recreate an old process that parsed a text file where each line had fixed width columns. I had the old program that was using a SQL Server DTS package and I was able to work my way through the steps until I figured out where each column was and what was written to in the database. It was even more fun when I learned that negative numbers were represented with letters but more on that later.
I couldn't use the original DTS package with the new SQL Server and the process would fail when I tried to recreate it with SSIS. Well, of the four files only one would play nice. I created a console application to import the three remaining files.
The program only needed to get kicked off by the SSIS package and the code was dead simple. Loop through a file and break the line into parameters for the stored procedure to insert into the database. Not amazingly efficient but simple with some for loops. The best part was that RegEx came to the rescue. The line length varied in the files. The first two were small, less than 200 characters per line but the last file was 1800 characters per line. I needed to make sure I grabbed the correct substrings. I could have, eventually, used many Substring functions but instead RegEx simplified the process.
Example RegEx:
^(?<Field1>.{25})(?<Field2>.{12})(?<Field3>.{1})(?<Field4>.{12})(?<Field5>.{12})
I didn't actually use Field(1-5). I had about 80 fields that I matched to my stored procedure. The expression is using named groups that capture any character a specific number of times. The end result was that I just needed to set my parameter with the captured value.
string field1 = match.Groups["Field1"].Value;
string Field2 = match.Groups["Field2"].Value;
...
sqlcmd.Parameters.Add("@Field1", System.Data.SqlDbType.Char, 25).Value = field1.Trim();
sqlcmd.Parameters.Add("@Field2", System.Data.SqlDbType.Char, 12).Value = field2.Trim();
...
Once I had all the strings I could convert them to the proper data types as needed.
In my opinion it is much more readable than doing multiple Substrings with different start indexes. The only downside is that if I do need to manually review a line I will need to figure out the start index by summing each captured groups' capture length (the number in brackets) until the field I need. Still a trade-off that is worth it to me.
As I said negative numbers were represented with letters. Below is the actual function I used with my notes in comments as I built it. Sort of fun to see my though process, whether that is good or bad who can say.
public decimal ConvertDec(string s)
{
if (s.Trim().Length == 0)
{
return 0;
}
bool isneg = false;
//check if the last character is a special character that means negative number, }, J-R
//sometimes the letter is caps and sometimes not so replace both
switch (s[s.Length-1].ToString().ToLower()[0])
{
case '}':
isneg = true;
s = s.Replace("}", "0");
break;
case 'j':
isneg = true;
s = s.Replace("j", "1").Replace("J", "1");
break;
case 'k':
isneg = true;
s = s.Replace("k", "2").Replace("K", "2");
break;
case 'l':
isneg = true;
s = s.Replace("l", "3").Replace("L", "3");
break;
case 'm':
isneg = true;
s = s.Replace("m", "4").Replace("M", "4");
break;
case 'n':
isneg = true;
s = s.Replace("n", "5").Replace("N", "5");
break;
case 'o':
isneg = true;
s = s.Replace("o", "6").Replace("O", "6");
break;
case 'p':
isneg = true;
s = s.Replace("p", "7").Replace("P", "7");
break;
case 'q':
isneg = true;
s = s.Replace("q", "8").Replace("Q", "8");
break;
case 'r':
isneg = true;
s = s.Replace("r", "9").Replace("R", "9");
break;
}
//some are 10, some are 12, some are 8 maybe.
decimal d = decimal.Parse(s.Substring(0, s.Length-2) + "." + s.Substring(s.Length-2, 2));
return (isneg ? d * -1 : d);
}