The Corpus Search Window


The Corpus Search window is opened by clicking on the "Corpus Search" button on the Project window. Use this window to find segments in your corpus which match the criteria that you specify.

Specifying a query

You specify a search using the small widget in the upper left of the Search window:

Simple Feature Search: The search widget initially shows the name of the feature at the root of one of your coding schemes. If you press "Show", all segments tagged with that feature will be shown. You can click on this feature to select another feature to use, from any of your defined layers.

Searching for segments with combinations of features: Click on the small "+" next to the feature selector and select 'and', 'or', or 'and not'. A second feature will be presented. This allows you to find all segments which contain both features (with 'and'), either feature (with 'or'), or the first feature but not the second (with 'and not'). You can also click on the '+' again to add more features to the search query.
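The logic of the three connectives can be sketched in a few lines of Python. This is only an illustration of what 'and', 'or' and 'and not' select, not CorpusTool's own code; the Segment class, the helper names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    features: frozenset  # every feature the segment is tagged with


def has(feature):
    """Predicate: the segment is tagged with the given feature."""
    return lambda seg: feature in seg.features


def and_(p, q):
    """Both features must be present."""
    return lambda seg: p(seg) and q(seg)


def or_(p, q):
    """Either feature may be present."""
    return lambda seg: p(seg) or q(seg)


def and_not(p, q):
    """The first feature must be present and the second absent."""
    return lambda seg: p(seg) and not q(seg)


def search(segments, predicate):
    """Return the text of every segment that satisfies the predicate."""
    return [seg.text for seg in segments if predicate(seg)]


# Hypothetical sample data.
segments = [
    Segment("Mary", frozenset({"person", "subject"})),
    Segment("the committee", frozenset({"subject"})),
    Segment("him", frozenset({"person"})),
    Segment("the table", frozenset({"object"})),
]

both = search(segments, and_(has("person"), has("subject")))
either = search(segments, or_(has("person"), has("subject")))
first_only = search(segments, and_not(has("person"), has("subject")))
```

Because each connective returns another predicate, further features can be chained on in the same way that clicking '+' again extends the query.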

Searching for segments WITHIN another segment type: Click on the '+' and select 'in segment'. Another search widget will appear at the end of this one, as in the following:

This type of search allows you to search across layers, specifying that segments should match only if they are contained within segments at the second specified layer. For instance, the search query shown below will find all segments tagged as 'person' in documents tagged as 'editorial'.

Searching for segments CONTAINING another segment type: Click on the '+' and select 'containing segment'. Returns all segments tagged with the first feature which contain segments (possibly at another layer) tagged with the second feature. For instance, one might search for 'finite-clause containing person&subject', to find all finite clauses where the segment boundaries totally include a segment at the participant layer which is coded both person and subject.
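Both cross-layer searches come down to a test of whether one segment's boundaries totally include another's. The sketch below illustrates that test under simplifying assumptions: segments are plain character offsets within a single document, and the class, function names and offsets are hypothetical rather than taken from CorpusTool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int           # offset of the first character
    end: int             # offset just past the last character
    features: frozenset  # features coded on this segment


def encloses(outer, inner):
    """True if outer's boundaries totally include inner."""
    return outer.start <= inner.start and inner.end <= outer.end


def in_segment(segments, feature, container_feature):
    """Segments with `feature` lying inside a segment with `container_feature`."""
    containers = [s for s in segments if container_feature in s.features]
    return [s for s in segments
            if feature in s.features
            and any(encloses(c, s) for c in containers if c is not s)]


def containing_segment(segments, feature, inner_features):
    """Segments with `feature` that enclose a segment coded with all `inner_features`."""
    inners = [s for s in segments if inner_features <= s.features]
    return [s for s in segments
            if feature in s.features
            and any(encloses(s, i) for i in inners if i is not s)]


# Hypothetical annotations over one document of 100 characters.
document = Segment(0, 100, frozenset({"editorial"}))
clause1 = Segment(0, 40, frozenset({"finite-clause"}))
clause2 = Segment(41, 100, frozenset({"finite-clause"}))
mary = Segment(0, 4, frozenset({"person", "subject"}))
him = Segment(60, 63, frozenset({"person"}))
segments = [document, clause1, clause2, mary, him]

people_in_editorials = in_segment(segments, "person", "editorial")
clauses_with_person_subject = containing_segment(
    segments, "finite-clause", frozenset({"person", "subject"}))
```

Here the second clause is not returned by the 'containing' search, because the person segment inside it is not also coded as subject.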

String searches: Click on the '+' and select 'containing string'. This will allow you to find all segments with the nominated feature which contain a given string. Matching is not case sensitive.

NOTE: when searching for either "containing segment" or "containing string", the search query will display "containing [anywhere]". You can click on the "anywhere" to change it to "immediately". This term controls how matches are made.

Combining Complex Searches: One can combine complex searches, e.g., the following complex search:

Concordance Pattern Searching: the 'containing string' search can also be used for concordance searching (searching based on lexical features, wildcard matching, etc.). Lexical features currently work for English only. Add a "containing string" field to your search query, and then specify a lexical pattern instead of a simple string. For example, to find passive clauses, "be% @participle" will match all segments containing any form of 'be' followed by a participle verb (-en verb).

Note that the corpus is NOT tagged in terms of part of speech (POS). Rather, CorpusTool includes a large dictionary of English, and looks up each word in the dictionary. Because of this, a word will match all POS classes to which it belongs. For instance "be%" will match all occurrences of "being", even in the context where the word is not a verb, e.g., "the being".
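The effect of dictionary lookup, as opposed to POS tagging, can be illustrated with a toy lexicon. Everything in this sketch is an assumption made for illustration: the entries, the class names and the helper functions are hypothetical and far smaller than the tool's real dictionary.

```python
# Toy stand-in for the English dictionary. The entries and class names
# are illustrative assumptions, not CorpusTool's real lexicon.
TOY_LEXICON = {
    "the": {"determiner"},
    "being": {"noun", "verb"},
    "was": {"verb"},
    "seen": {"verb", "participle"},
}

# Toy table of inflected forms per root, also an assumption.
TOY_INFLECTIONS = {
    "be": {"be", "am", "is", "are", "was", "were", "been", "being"},
}


def can_be(word, word_class):
    """True if the lexicon lists word_class among the word's possible classes."""
    return word_class in TOY_LEXICON.get(word.lower(), set())


def is_form_of(word, root):
    """True if word is listed as an inflected form of root (the 'root%' case)."""
    return word.lower() in TOY_INFLECTIONS.get(root.lower(), set())


tokens = "the being was seen".split()

# 'be%': every form of 'be' matches, whatever its role in context.
be_forms = [t for t in tokens if is_form_of(t, "be")]

# 'be% @participle': a form of 'be' directly followed by a possible participle.
passive_hits = [
    (tokens[i], tokens[i + 1])
    for i in range(len(tokens) - 1)
    if is_form_of(tokens[i], "be") and can_be(tokens[i + 1], "participle")
]
```

In this sample, "being" is returned as a form of 'be' even though it follows "the" and is used as a noun, because the lookup considers only the word and never its context.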

Matching occurs as follows:

Case Insensitive: all searching is case insensitive. Thus 'Birch' will match 'Birch', 'birch' and 'BIRCH'.

The search string consists of a sequence of search tokens separated by spaces. Each search token takes one of the following forms:

  1. Literal token: a token not containing *, #, @ or % will match the token itself only.
  2. Wildcard token: if the query token includes an '*', the '*' will match any number of characters.
  3. Match any: a '#' by itself matches any single token.

    Match from beginning of segment: if your search string begins with a "^" character (followed by a space), then only matches at the beginning of the containing segment will be returned. E.g.

    clause containing anywhere '^ Bush'

    ... will return all segments tagged with feature 'clause' which start with 'Bush' or 'bush'.

    (The cases above work for any language. Those below currently work only for English.)

  4. Constraining by class: a wildcard form can be followed with '@' and then a lexical feature, and the form will match only tokens which, according to the system's lexicon, can take that lexical class. An asterisk cannot appear by itself; it must have text either before or after it. A full list of the lexical features that can be used is in Appendix II of the manual, or can be seen within the tool by selecting "Show Wordclass Network" from the Misc menu of CorpusTool.
  5. General class matching: If no token string is provided before the '@', then the query form matches all tokens which could represent the specified class. E.g., '@participle', as used in the passive query above, matches any token that can be a participle.
  6. Inflection matching: a '%' at the end of a token indicates that all inflected forms of the token, which should be a root form, should be matched. Thus 'be%' matches any form of 'be'. To constrain the inflection matching to a limited set of inflections, one can add 'noun', 'verb', 'adjective' or 'pronoun' after the '%'. Note that wildcards cannot be used within % forms, nor can the string before the % be blank.
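The language-independent token forms (literal, wildcard, '#' and the '^' anchor) can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. It assumes that tokens are produced by splitting on whitespace and that '*' may match zero or more characters; the function names are hypothetical.

```python
import re


def token_matches(pattern, token):
    """Match one query token against one text token, ignoring case."""
    if pattern == "#":
        return True  # '#' by itself matches any single token
    if pattern == "*":
        # The manual says an asterisk cannot appear by itself.
        raise ValueError("'*' needs text before or after it")
    # Escape the literal parts and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.lower().split("*"))
    return re.fullmatch(regex, token.lower()) is not None


def find_matches(query, tokens):
    """Return the start index of every match of the query in the token list."""
    parts = query.split()
    anchored = bool(parts) and parts[0] == "^"
    if anchored:
        parts = parts[1:]  # '^' only marks "match at the start of the segment"
    if not parts:
        return []
    candidates = [0] if anchored else range(len(tokens) - len(parts) + 1)
    return [
        i for i in candidates
        if i + len(parts) <= len(tokens)
        and all(token_matches(p, t) for p, t in zip(parts, tokens[i:i + len(parts)]))
    ]


# Hypothetical segment text, tokenised on whitespace.
tokens = "Bush said that the bushes were burning".split()

literal_hits = find_matches("bush", tokens)     # literal: the token itself only
wildcard_hits = find_matches("bush*", tokens)   # wildcard: 'Bush' and 'bushes'
any_hits = find_matches("the #", tokens)        # 'the' followed by any token
anchored_hits = find_matches("^ bush", tokens)  # only at the segment start
```

The class ('@') and inflection ('%') forms are left out here because they depend on the tool's English lexicon.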

Running a Query

After entering your query, click the "Show" button. If your cursor is in a text field (such as the 'containing string' field), you can press the Return key instead.

Modifying a Query

To change a feature selection, just click on the feature to change it. To delete any of your search extensions, click on the keyword ("&", "/", "containing", "in") and click on "remove".

The Result Space

The white space below the Query space displays the results. Click on a result and the annotation file containing this segment will be opened at the right place. The three columns at the left indicate the state of each coding:

For each matching segment, a row is shown. The row starts with a magnifying glass icon. Click on this icon to open this segment in the Coding window for that file.

Saving Search Results

Click on the "Save" button to save search results to a file, either in HTML or plain text format.

See also:

  • Autocoding: the Autocode window allows you to perform searches and automatically code all results with a given feature (e.g., search for passive patterns and code as passive).