[c#] How to parse a text file with C#

By text formatting I meant something more complicated.

At first I began manually adding the 5000 lines from the text file I'm asking this question for,into my project.

The text file has 5000 lines with different length.For example:

1   1   ITEM_ETC_GOLD_01    ??(?)   xxx xxx xxx_TT_DESC 0   0   3   3   5   0   180000  3   0   1   0   0   255 1   1   0   0   0   0   0   0   0   0   0   0   -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_money_small.bsr    xxx xxx xxx 0   2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1   ??? ??? ?(param1??) -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   4   ITEM_ETC_HP_POTION_01   HP ?? ??    xxx SN_ITEM_ETC_HP_POTION_01    SN_ITEM_ETC_HP_POTION_01_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   60  0   0   0   1   21  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_01.ddj   xxx xxx 50  2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP???   0   HP???(%)    0   MP???   0   MP???(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   5   ITEM_ETC_HP_POTION_02   HP ??? (?)  xxx SN_ITEM_ETC_HP_POTION_02    SN_ITEM_ETC_HP_POTION_02_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   110 0   0   0   2   39  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_02.ddj   xxx xxx 50  2   0   0   2   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP???   0   HP???(%)    0   MP???   0   MP???(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

The text between the first character(1) and the second character(1/4/5) is not a whitespace,it's a tab.There's no whitespaces in that text file.

What I want:

I want to get the second integer(In the three lines I posted above,the second integers are 1,4 and 5) and the string in the middle of each line indicating the path(It starts with "item\" and ends with the file extension ".ddj").

My problem:

When I google "Text formatting C#" - all I get is how to open a text file and how to write a text file in C#.I don't know how to search for text inside a text file.Also I can't search for the first integer,because in case its a small integer like in the three lines I posted above,I wont be able to find the corrent location,because for example "1" might exist in a different location.

My question:

It would be the best If I write a program that would delete anything,but what I need.

The other way in my mind is to directly search inside that file,but as I mentioned above - I might get the wrong location of the second integer if its too low.

Please suggest something,I can't format all this by hand.

This question is related to c# parsing text

The answer is


Try regular expressions. You can find a certain pattern in your text and replace it with something that you want. I can't give you the exact code right now but you can test out your expressions using this.

http://www.radsoftware.com.au/regexdesigner/


You could do something like:

using (TextReader rdr = OpenYourFile()) {
    string line;
    while ((line = rdr.ReadLine()) != null) {
        string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
        int theInt = Convert.ToInt32(fields[1]);
    }
}

The reason you didn't find relevant result when searching for 'formatting' is that the operation you are performing is called 'parsing'.


One way that I've found really useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file to read large tab-delimited files in using ADO.Net. Obviously, this method is really only useful if you know the format of the file to be imported.

public void ImportCsvFile(string filename)
{
    FileInfo file = new FileInfo(filename);

    using (OleDbConnection con = 
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
            file.DirectoryName + "\";
            Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format
                                  ("SELECT * FROM [{0}]", file.Name), con))
        {
            con.Open();

            // Using a DataReader to process the data
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Process the current reader entry...
                }
            }

            // Using a DataTable to process the data
            using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
            {
                DataTable tbl = new DataTable("MyTable");
                adp.Fill(tbl);

                foreach (DataRow row in tbl.Rows)
                {
                    // Process the current row...
                }
            }
        }
    }
} 

Once you have the data in a nice format like a datatable, filtering out the data you need becomes pretty trivial.


Like it's already mentioned, I would highly recommend using regular expression (in System.Text) to get this kind of job done.

In combo with a solid tool like RegexBuddy, you are looking at handling any complex text record parsing situations, as well as getting results quickly. The tool makes it real easy.

Hope that helps.


You could open the file up and use StreamReader.ReadLine to read the file in line-by-line. Then you can use String.Split to break each line into pieces (use a \t delimiter) to extract the second number.

As the number of items is different you would need to search the string for the pattern 'item\*.ddj'.

To delete an item you could (for example) keep all of the file's contents in memory and write out a new file when the user clicks 'Save'.


Another solution, this time making use of regular expressions:

using System.Text.RegularExpressions;

...

Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

StreamReader reader = FileInfo.OpenText("filename.txt");
string line;
while ((line = reader.ReadLine()) != null) {
    Match match = parts.Match(line);
    if (match.Success) {
        int number = int.Parse(match.Group(1).Value);
        string path = match.Group(2).Value;

        // At this point, `number` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like.
    }
}

That expression's a little complex, so here it is broken down:

^        Start of string
\d+      "\d" means "digit" - 0-9. The "+" means "one or more."
         So this means "one or more digits."
\t       This matches a tab.
(\d+)    This also matches one or more digits. This time, though, we capture it
         using brackets. This means we can access it using the Group method.
\t       Another tab.
.+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
         This is to stop it grabbing everything in sight - it'll only grab as much
         as it needs to for the regex to work.
\t       Another tab.

(item\\[^\t]+\.ddj)
    Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable