C++ parser and tokenizer using classes
Design and implement a C++ program to process commands (to parse and tokenize) using classes for that purpose. You will use these classes to extract the passages that form the story from a much larger input file.
You will write code to parse out the different passages in a work of interactive fiction. Reading in and interpreting text is often referred to as parsing. The first step in parsing input is to tokenize the input; that is, break it down into smaller chunks, called tokens, which can be analyzed, and the string of tokens can then be interpreted by the parser. For exceptionally complex input, this tokenization process may even be multi-level, with one tokenizer breaking the initial input into coarse tokens that are then fed into another tokenizer to be broken down into smaller tokens.
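The multi-level idea can be illustrated with a small sketch that is separate from the assignment itself. It assumes a toy input in which coarse tokens are sentences separated by '.' and fine tokens are words separated by spaces; the function names split and tokenizeTwoLevel are made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split text on a single delimiter character; empty pieces are skipped.
std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delimiter)) {
        if (!piece.empty()) {
            tokens.push_back(piece);
        }
    }
    return tokens;
}

// Two-level tokenization (illustrative only): the first tokenizer produces
// coarse tokens (sentences, split on '.'), and each coarse token is fed to
// a second tokenizer that produces fine tokens (words, split on ' ').
std::vector<std::vector<std::string>> tokenizeTwoLevel(const std::string& text) {
    std::vector<std::vector<std::string>> result;
    for (const std::string& sentence : split(text, '.')) {
        result.push_back(split(sentence, ' '));
    }
    return result;
}
```

Calling tokenizeTwoLevel("go north. take lamp.") would give two coarse tokens, each holding two words.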
Your goal is to write a pair of classes to tokenize the passages in interactive fiction (IF) stories. The "main" class, StoryTokenizer, will take in the text of an interactive fiction story (often stored in an HTML file), which it will then break up into PassageToken objects, each of which represents one passage in the IF story (similar to a chapter).
Interactive fiction works are divided into passages, which appear inside the HTML tag <tw-passagedata>. Each passage will start with <tw-passagedata …> and will end with </tw-passagedata>. In addition to starting with <tw-passagedata, the opening tag will specify some attributes, one of which will be the name of the passage, and the body of the passage will be between the opening and closing tags.
<tw-passagedata pid="1" name="start" tags="" location="100,100">
The body of the passage will be here.
</tw-passagedata>
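One possible way to pull an attribute such as the name out of an opening tag is sketched below. The helper attributeValue is hypothetical (it is not part of the required interface), and it assumes attribute values are wrapped in straight double quotes and contain no quote characters themselves.

```cpp
#include <string>

// Hypothetical helper: returns the value of attribute `attr` inside `tag`,
// or an empty string if the attribute is missing.
// Assumption: values are wrapped in straight double quotes ("...") and
// never contain a double quote themselves.
std::string attributeValue(const std::string& tag, const std::string& attr) {
    // The leading space keeps "name" from matching inside a longer attribute name.
    const std::string key = " " + attr + "=\"";
    std::string::size_type start = tag.find(key);
    if (start == std::string::npos) {
        return "";
    }
    start += key.size();  // first character of the value
    const std::string::size_type end = tag.find('"', start);
    if (end == std::string::npos) {
        return "";
    }
    return tag.substr(start, end - start);
}
```

With an opening tag like the one in the example passage, asking for the "name" attribute would yield start, and asking for an attribute that is not present would yield an empty string.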
Your StoryTokenizer should have two member functions: hasNextPassage and nextPassage. As can be inferred from the name, hasNextPassage returns whether the story contains another passage (i.e., one that has not been read in yet), while nextPassage returns a PassageToken object describing the passage. It should also have a constructor that accepts a string containing the story to tokenize.
PassageTokens should have two member functions, getName and getText, as well as an appropriate constructor. The getName member function should return the name of the passage, as specified by the name attribute of the starting <tw-passagedata> tag. The name of the example passage above is "start". The getText member function should return the text of the passage (between the starting tag <tw-passagedata…> and the ending tag </tw-passagedata>). In the example above, this text would contain "The body of the passage will be here.", with newlines before and after. An invalid PassageToken (e.g., the return result of nextPassage when there are no more passages) should return an empty string for its name and text. The arguments of the constructor (and data members of PassageTokens) are up to you, as a PassageToken will only be constructed by the StoryTokenizer.
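The two classes described above could be laid out roughly as follows. This is only a sketch of one possible design, not the required one: the data members, the constants kOpenTag and kCloseTag, and the private helper extractName are all choices made for this illustration. It also assumes the story is well formed, that attribute values use straight double quotes, and that no '>' character appears inside an opening tag.

```cpp
#include <string>
#include <utility>

const std::string kOpenTag = "<tw-passagedata";
const std::string kCloseTag = "</tw-passagedata>";

// One passage of the story. A default-constructed token is the "invalid"
// token: getName() and getText() both return empty strings.
class PassageToken {
public:
    PassageToken() {}
    PassageToken(std::string name, std::string text)
        : name_(std::move(name)), text_(std::move(text)) {}
    std::string getName() const { return name_; }
    std::string getText() const { return text_; }
private:
    std::string name_;
    std::string text_;
};

class StoryTokenizer {
public:
    StoryTokenizer(std::string story) : story_(std::move(story)), pos_(0) {}

    // True if another opening tag exists at or after the current position.
    bool hasNextPassage() const {
        return story_.find(kOpenTag, pos_) != std::string::npos;
    }

    // Returns the next passage and advances past its closing tag.
    // Returns an invalid (empty) token when no complete passage remains.
    PassageToken nextPassage() {
        const std::string::size_type npos = std::string::npos;
        const std::string::size_type open = story_.find(kOpenTag, pos_);
        if (open == npos) {
            return PassageToken();
        }
        const std::string::size_type tagEnd = story_.find('>', open);
        const std::string::size_type close =
            (tagEnd == npos) ? npos : story_.find(kCloseTag, tagEnd + 1);
        if (close == npos) {
            pos_ = story_.size();  // malformed remainder: stop scanning
            return PassageToken();
        }
        const std::string tag = story_.substr(open, tagEnd - open + 1);
        const std::string text = story_.substr(tagEnd + 1, close - tagEnd - 1);
        pos_ = close + kCloseTag.size();
        return PassageToken(extractName(tag), text);
    }

private:
    // Pulls the value of the name attribute out of an opening tag.
    static std::string extractName(const std::string& tag) {
        const std::string key = " name=\"";
        std::string::size_type start = tag.find(key);
        if (start == std::string::npos) {
            return "";
        }
        start += key.size();
        const std::string::size_type end = tag.find('"', start);
        if (end == std::string::npos) {
            return "";
        }
        return tag.substr(start, end - start);
    }

    std::string story_;
    std::string::size_type pos_;  // where the next search begins
};
```

Keeping a single position index inside the tokenizer means any text between passages is skipped automatically, since each search simply jumps to the next opening tag.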
Assembling the Code
You have been provided with a main function that will read in a story from input.txt and use your StoryTokenizer and PassageToken classes to break down that story into its constituent passages. Your tokenizer should appropriately ignore any text in the input file that is not part of a passage. You have also been provided with a couple of example input files you can use to test your tokenizer.
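The provided main function is not reproduced here. If you want to write your own test driver, the sketch below shows one common way to load an entire file into a single string that can then be handed to the StoryTokenizer constructor. The helper name readWholeFile is an assumption made for this illustration and may not match how the provided main reads its input.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the provided code): reads the whole
// file at `path` into one string. Returns an empty string if the file
// cannot be opened.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        return "";
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy every character of the file into the buffer
    return buffer.str();
}
```

A driver could then call readWholeFile("input.txt") and pass the result to your tokenizer.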
Though there is more than one way to implement your tokenizer, you may wish to take advantage of the find, substr, and/or at member functions of the string class when implementing your code. Check the online documentation (www.cplusplus.com) for more information.
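As a small illustration of these member functions, the sketch below uses the two-argument form of find to walk through a string and count matches, and uses at for bounds-checked character access. The function names countOccurrences and charAtIs are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of `needle` in `text`, left to right.
// find(needle, from) returns std::string::npos when no further match exists.
std::size_t countOccurrences(const std::string& text, const std::string& needle) {
    if (needle.empty()) {
        return 0;
    }
    std::size_t count = 0;
    std::string::size_type pos = text.find(needle);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(needle, pos + needle.size());  // resume after this match
    }
    return count;
}

// Returns true if the character at `index` equals `expected`.
// at() throws std::out_of_range for a bad index, so the index is checked first.
bool charAtIs(const std::string& text, std::string::size_type index, char expected) {
    return index < text.size() && text.at(index) == expected;
}
```

Counting occurrences of "<tw-passagedata" in a story is a quick sanity check on how many passages your tokenizer should produce.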
You should submit header and source files for your StoryTokenizer and PassageToken classes as a zip archive. You may combine both of them into a single header and single source file, or you may submit two of each. If you do not combine the headers together, you should #include the PassageToken header at the top of your StoryTokenizer header (storytokenizer.h).
The output should be the same as the other example. Opening this file in a web browser will allow you to play through the story.