University Library, University of Illinois at Urbana-Champaign

Text Mining Tools and Methods

This guide contains resources for researching with text mining

What is SAS Text Miner?

SAS Text Miner is a text mining add-on for SAS Enterprise Miner that provides a range of capabilities for analyzing text.

Tips for using SAS Text Miner

  • Data is easiest to use when it is already in a SAS file. Load the data with the New Data Source command in the File menu.
  • When importing data from Excel, you will need to use the data import filter or a macro from the Sample menu above your diagram. In this guide, we only cover importing SAS data sources.
  • Enterprise Miner works through icons and menus, which is different from Base SAS. However, it still has a log file similar to the one in Base SAS, the Run command icon is the same, and the library and Work folder setup is the same.
  • Text Miner works as a procedural, step-based process: certain procedures must be added in sequential order and run before the next step can be completed.
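The step-based idea in the last tip can be pictured with a small Python sketch. This is only an illustration of "earlier steps must run before later ones", not how Enterprise Miner is implemented; the node names and the `run_path` helper are hypothetical.

```python
# Hypothetical node names; each node maps to the node that must run before it.
FLOW = {
    "data_source": None,
    "text_parsing": "data_source",
    "text_filter": "text_parsing",
}

def run_path(node, flow, completed):
    """Run any unfinished predecessors of `node` first, then `node` itself.

    Returns the list of nodes that were run, in execution order.
    """
    order = []
    current = node
    # Walk backwards until we reach the start or a node that already ran.
    while current is not None and current not in completed:
        order.append(current)
        current = flow[current]
    order.reverse()
    completed.update(order)
    return order

completed = set()
first = run_path("text_parsing", FLOW, completed)
second = run_path("text_filter", FLOW, completed)
print(first)   # the data source runs before text parsing
print(second)  # only the new step runs; earlier steps are already complete
```

In this sketch, asking for a later step pulls in whatever earlier steps have not yet run, which mirrors why the data source must be on the diagram and connected before text parsing can run.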

Get Started with SAS Text Miner

Creating a Project

To start in SAS Text Miner, you need to create a new project.

1. Select Create a New Project and save it in whichever folder you prefer.

2. After this, a workspace will open with different menus.

Creating a Diagram

1. To start, you will need to create a diagram (a blank canvas for the procedures). Go to the File menu and hold your cursor over the word New. A menu should pop up to the right; select the Diagram option.

2. A pop-up window will appear asking you to name the diagram. Type your diagram's name into the box and select OK. A blank diagram will appear on the right side of the screen.

Setting Up the Data

1. Once you have created the diagram, you must read in a data source. To load a SAS data file, go to the File menu and hold your cursor over the word New. A menu should pop up to the right; select the Data Source option.

2. A data source window will pop up and first ask you to select your metadata source. The default source is SAS Table. Use this and select Next.

3. The next page will ask for the location of the SAS table. Select the Browse button and find the folder with your files. Before beginning your SAS project, you should place all your SAS data files in the same location where you created your project, so that they show up in the SAS libraries.

4. Click on the SAS library location on the left. Our data is located in a folder called Data. The menu to the right will list the SAS files in that location.

5. Select the data file you will use and press OK.

6. On the next page, the file you chose will be highlighted in the dialog box. Select the Next button.

7. The next page will show the data properties of the file. Select next.

8. The following page will ask about Metadata Advisor options. For this we will use the default, Basic. Then select Next.

9. The next page will provide information about the column metadata. Select Next.

10. The next page will ask whether you wish to create a sample data set. The default is No. Keep this option and select Next.

11. The next page will ask whether you want to change the name or the role of the data. Change the name here if you wish. The default role is Raw. Select Next.

12. The next page will provide a summary of the metadata, stating that the creation of the metadata is complete. Select the finish button.

13. Once the diagram and data source are created, drag the data file onto the diagram from the Data Sources menu in the upper left-hand corner.

14. You will notice that the new data set appears in the upper left menu under Data Sources.

Running Text Miner

Running the Text Parsing Procedure

1. To run a procedure, you must drag its icon from the toolbar menus above the diagram onto your diagram (see picture below).

Image of SAS toolbar

2. To start, you must first place your data source icon on the diagram. To do this, drag the data source icon from the Data Sources menu onto your blank diagram.

3. Next, we will use the Text Mining menu. Select the Text Mining tab above the diagram. To add the text parsing procedure, select the icon that looks like a page of text with a magnifying glass. Drag this icon onto the diagram and place it to the right of your data source.

4. On your diagram, you are building a train of procedures that are linked to each other in sequential order. You must connect the procedures in the order of the process using arrows. To create a process arrow from the data source to the text parsing procedure, hover your cursor to the right of the data source icon. Your cursor will turn into a pencil icon. Hold the mouse button down and drag the arrow to connect it to the text parsing icon. It will look something like this when you have drawn the arrow.

Image of drawn arrows in SAS

5. The text parsing procedure filters out words on the stop list and performs frequency counts and part-of-speech classification for all text in the data file.

6. To run the text parsing procedure, hover your cursor over the text parsing icon on your diagram and right-click. A menu will pop up; select the Run option. A confirmation window will ask whether you want to run this path, text parsing, on your diagram. Select Yes. How long the procedure runs depends on the size of your data set.

7. When it is done, a run status window will appear indicating that the run was completed. Select the Results button to view the results window.

8. SAS creates multiple interactive graphs for the results. However, exported graphs are no longer interactive because they are saved as .bmp files. To save a specific graph, click on it on the screen and then select Save As from the File menu.

9. The picture below illustrates the interactivity of the graphs. The term "not" is selected in the Terms graph in the middle. It shows that this word appeared 24,885 times in 16,175 documents and was classified as an adverb in the data set. The graph above it, Number of Documents by Frequency, highlights the "not" term with a dark black outline on the blue square. If you hover your cursor over a data point, it will tell you which term it is, the number of documents it appears in, and its frequency. The same information can be found in the Zipf Plot (a plot of each term's document count against its rank) in the upper right-hand corner, where the "not" data point is also outlined in black. In the Role by Freq graph in the lower left-hand corner, a slightly darker underline at the base of the adv (adverb) column marks where the adverb "not" falls in the chart.

image of graphs using SAS
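To make the numbers in the results window more concrete, here is a minimal Python sketch of the kind of counting described in steps 5 and 9: dropping stop-list words, counting how often each term occurs and in how many documents, and ranking terms as a Zipf-style plot would. It is not SAS code and does not reproduce how SAS Text Miner works internally. The stop list, the sample documents, and the function names are made up for illustration, and part-of-speech classification is left out.

```python
import re
from collections import Counter

# Hypothetical stop list and documents, for illustration only.
STOP_LIST = {"the", "a", "is", "of", "and", "to"}

DOCUMENTS = [
    "The service is not good and the food is not fresh",
    "Good food and good service",
    "The food is not bad",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def parse(documents, stop_list):
    """Count terms after removing stop-list words.

    Returns (term_freq, doc_freq): total occurrences of each term, and the
    number of documents each term appears in.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for doc in documents:
        kept = [tok for tok in tokenize(doc) if tok not in stop_list]
        term_freq.update(kept)       # every occurrence counts
        doc_freq.update(set(kept))   # each document counts once per term
    return term_freq, doc_freq

def zipf_ranks(doc_freq):
    """Rank terms by descending document count (ties broken alphabetically)."""
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, n) for rank, (term, n) in enumerate(ordered, start=1)]

term_freq, doc_freq = parse(DOCUMENTS, STOP_LIST)
ranks = zipf_ranks(doc_freq)
for rank, term, n_docs in ranks:
    print(rank, term, n_docs, term_freq[term])
```

In this toy data, "not" occurs three times but in only two documents, which is the same distinction the Terms graph draws between a term's frequency and its number of documents.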

 

Additional resources for SAS