A CURRENT CORPUS
OF TECHNOLOGY LANGUAGE IN
Blas Curado Fuentes
Universidad Politécnica de Madrid
Alejandro Curado Fuentes
Universidad de Extremadura
Rapid developments taking place in the world of Information Technology greatly influence the way we look at Business and Education today. In English-speaking countries, this certainly proves to be the case with listening to news reports or scanning newspaper headlines, where descriptions of new software or reviews of applications for the office are published regularly. In countries where English is a foreign language, this direct influence is researched in university settings. Such is the case
However, not every type of text on business technology is relevant to, or effective for, learning English. Representative collections of sources, mainly in the form of specific corpora, should be prepared consistently. In this way we can respond to objective criteria set by our context, the current Spanish job market. There are areas where an advanced command of English is a must, and there are spheres where understanding written documents is the priority. In all cases, knowledge of specialized vocabulary is required, chiefly in the form of academic and technical word combinations and collocations.
But which lexical items make up the core of professional vocabulary to be studied in college, the vocabulary that is indispensable in the future careers of computer technicians, business executives or company engineers? The question should be addressed by a learner-centered corpus design that is effective on both the academic and professional planes. From our perspective on ESP (English for Specific Purposes) learning, this is attained through the compilation of recently published English texts in key subject areas. The selection of texts is based on the required reading lists for the subjects on the curriculum at our institution, which cover specific issues and topics. For instance, the core field of Business Technology is part of several subjects studied in the specialist area of Business and Administration: Accounting, Economics, Finance, General Business, Management Information Systems, Marketing, and Statistics. Recent texts (published in the last couple of years) dealing with Business Technology should then be classified according to thematic variables.
The complete corpus should be restricted to a few representative sources in the demarcated area of Business Technology within these academic boundaries. The collection must be renewed and updated yearly to obtain as current a view of specific language as possible. The number of running words (tokens) is limited to 69,301, and there are 6,826 different items or types. Various genres are encompassed depending on the learning stage: for General Business, for example, only textbook material is selected, whereas for Management Information Systems, a subject taken in the third year of studies, both technical reports and research articles (more complex genres) come into focus.
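The distinction between running words (tokens) and different items (types) can be illustrated with a minimal sketch. The tokenizer below is an assumption made for illustration only (lowercased alphabetic strings); concordance packages apply their own tokenization rules, so the counts they report for the same texts may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def corpus_size(texts):
    """Return (tokens, types, type/token ratio) for a list of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    tokens = sum(counts.values())   # running words
    types = len(counts)             # different word forms
    ratio = types / tokens if tokens else 0.0
    return tokens, types, ratio


# Invented sample sentences, not taken from the corpus described here.
sample = ["Tax software prepares tax returns.", "Online software reviews"]
tokens, types, ratio = corpus_size(sample)
print(tokens, types, round(ratio, 2))
```

Applied to the figures reported above, the same ratio would be 6,826 / 69,301, or a little under one type for every ten tokens.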
Wordlists for each of these categories prove highly convenient for drawing results. The items in each subject wordlist are contrasted with those of the overall corpus. Concordance software such as WordSmith Tools (Scott, 2000) enables us to perform this task easily and productively. The keywords thus identified in the subjects reveal the semantic essence of the groups or, in Scott’s words (1997), 'the aboutness of the texts', mainly defined by content words and, in particular, nouns. Figure 1 lists the words with a high degree of keyness, obtained by means of statistical analyses of the subject categories against the reference (overall) corpus. Figure 1 also indicates the genres and the number of sources in which the particular words occur.
Figure 1: Top five keywords in each subject heading
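As a rough illustration of how keyness can be computed, the sketch below scores each word in a subject wordlist against a reference wordlist with the log-likelihood statistic. This is an assumption: the text does not say which statistic or settings were used, and the function names and sample counts are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in c study tokens
    and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, top=5):
    """Return the top positive keywords of `study` relative to `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]


# Invented counts for illustration; they are not the figures behind Figure 1.
study = Counter({"tax": 10, "the": 20, "software": 5})
reference = Counter({"tax": 10, "the": 200, "software": 40, "market": 50})
print(keywords(study, reference))
```

Here the reference wordlist is treated as a separate sample. When the reference is the overall corpus that contains the subject texts, as in the design described above, the two samples overlap, which is a simplification worth keeping in mind when reading the scores.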
The data reveal the influence of contextual factors, recognized mainly through topic- and genre-based occurrences. In Finance and General Business, for instance, the keywords convey the presence of the research article and the textbook environment, respectively. Such traits are also reflected in other cases, where the subject determines the type of pivotal lexis extracted, e.g. in Accounting and Statistics.
Lexicographic material in the form of specialized corpora and dictionaries or vocabularies proves useful as guidance in surveying the results. For a contrastive view of our terminology, a corpus-based lexical analysis such as James and Purchase’s (1996) study of the English of Business and Economics can be fruitful. That study can pinpoint word position in terms of frequency and dispersion; the aim is to check the range of use of our data as measured in a larger collection used as reference (nearly two million tokens). For instance, the item tax is highly frequent in Accounting texts in James and Purchase (1996), with 205 occurrences, surpassed only by its appearance in Economics texts (255). Their Accounting sources include 144,927 running words, which means that tax is also one of the most frequent content items there. In this respect, the result supports our own approach (see Figure 1).
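Raw counts from collections of different sizes are easier to compare once they are normalized. The short sketch below converts the Accounting figure quoted above into a rate per thousand running words; the helper name is hypothetical, and the choice of a per-thousand base is an assumption made for readability.

```python
def per_thousand(occurrences, running_words):
    """Relative frequency expressed per 1,000 running words."""
    return 1000 * occurrences / running_words


# Figures quoted above for tax in the Accounting texts of the reference study.
tax_rate = per_thousand(205, 144_927)
print(round(tax_rate, 2))
```

The same calculation cannot be repeated for the Economics count (255), because the size of that sub-corpus is not given here.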
However, the General Business keyword text (Figure 1) has zero occurrences in James and Purchase (1996). This discrepancy should not be dismissed, since it indicates that our corpus has its own identity as a specific linguistic resource. Our students, learning basic notions of Business Technology related to the general area of their studies, must cope with an emphasis on text as reference. This is often made plain by our own colleagues, who propose that the genre of textbooks is a primary source of information at this educational stage.
Due to the search for specificity in these texts, the language should be analyzed as functional / operative in the ESP arena. This means that we aim to provide learners with particular issues in Business Technology as it relates to the different subjects listed above. We must explore significant word behavior in the form of both specific collocations and common lexical combinations. These are technical and academic elements offering rich input for the learner. Figure 2 displays the most important ones in terms of both frequency and distribution in the subject texts.
Accounting (theme-based items for tax)
Economics (theme-based items for commerce)
Finance (genre-based items for article)
General Business (genre-based items for text)
Management Information Systems (theme-based items for effectiveness)
Marketing (genre-based items for com)
Statistics (theme-based items for percent)
Figure 2: Key collocations within subject boundaries
The majority of the elements in Figure 2 are assessed as technical collocations. This means that they are noun compounds narrowly defined within their subject sources. They may be labelled as either theme- or genre-based, given their tendency to describe the encircling textual environment according to either one. The example provided above with text in General Business is obvious, while article in Finance depicts the research material exploited, and com refers to the internet space of electronic reviews and reports from which lexical data are derived.
The lexical combinations found in a different arrangement of our corpus differ substantially from subject-driven words. This arrangement organizes the sources according to common core patterns. In other words, taking the collection as a whole, we seek to identify widespread lexical behaviour in our specific texts. The results are gathered in detailed consistency lists, which display items according to their frequency and dispersion in the seven genres encompassed (from left to right in Figure 3: Articles, Discussions, Textbooks, News, Reports, Abstracts and Reviews). Thus, there are seven genre columns in Figure 3, illustrating a significant presence of the words in question throughout the corpus. The top five content elements positioned within the first 50 slots are shown in Figure 3.
Figure 3: Top five content items in the detailed consistency list of genres in our corpus. N = Word position on list
RAs = Research articles / Disc = Discussions /
NAs = News articles / RPs = Technical reports /
TXs = Textbooks / Abs = Abstracts / Rev = Reviews
The focus is placed on common content words, as indicated above. Grammar items such as articles, prepositions and the like are discarded. The aim is to exploit semi-technical words, including nouns, verbs (except indexical forms, e.g. have, be, and modals, e.g. can, will), adjectives, and adverbs. The classification is made first by checking that the items occur in all genres and second by frequency. The data are categorized as semi-technical, that is, 'subject-independent words' (Farrell, 1990: 13) appearing throughout the whole corpus. Their typical associations are evaluated against those offered by lexical combination dictionaries such as Benson, Benson and Ilson (1997). In this respect, content items play a decisive role, as grammar is treated as secondary to, and inferred from, lexis.
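The two-step classification just described (presence in every genre first, frequency second) can be sketched as follows. The stoplist is a deliberately tiny, hypothetical stand-in for the grammar items that are discarded, and the genre counts are invented for illustration.

```python
from collections import Counter

# Hypothetical, very short stoplist; a real list of grammar items is far longer.
GRAMMAR_WORDS = {"the", "a", "an", "of", "in", "to", "and",
                 "is", "be", "have", "can", "will"}


def consistency_list(genre_counts, stoplist=GRAMMAR_WORDS):
    """Rank content words that occur in every genre by total frequency.

    genre_counts maps a genre label to a Counter of word frequencies.
    Returns rows of (word, total frequency, per-genre frequencies).
    """
    genres = list(genre_counts)
    if not genres:
        return []
    # Step 1: keep only words attested in all genres, minus grammar items.
    shared = set.intersection(*(set(genre_counts[g]) for g in genres)) - stoplist
    rows = []
    for word in shared:
        per_genre = [genre_counts[g][word] for g in genres]
        rows.append((word, sum(per_genre), per_genre))
    # Step 2: order by total frequency, breaking ties alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Invented counts for three of the seven genres.
genre_counts = {
    "RAs": Counter(the=50, use=4, system=3, tax=2),
    "TXs": Counter(the=40, use=6, system=1),
    "Rev": Counter(the=10, use=1, system=5, com=7),
}
for word, total, per_genre in consistency_list(genre_counts):
    print(word, total, per_genre)
```

Words such as tax and com drop out in this sample because they are missing from at least one genre, which is the behaviour expected of subject- or genre-bound items.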
Figure 4 exemplifies common usage of the semi-technical elements given above (see Figure 3). The numbers in brackets indicate, in this order, the number of sources where these items are located and their frequencies.
Figure 4: Common core word combinations in our corpus
By drawing up charts in this fashion, we distinguish constructions that are less specific and freer in their development through general academic discourse. Similar language is often encountered in academic word lists (e.g. Coxhead, 1998).
We thus underline the distinct lexical spaces of technical collocations (Figure 2) and academic forms (Figure 4). For ESP, we claim that both approaches must be equally applied and exploited. For example, in an academic task-based exercise, learners can build their own frequency lists and deduce lexical priorities from reading material. The common core status of a noun like use can be contrasted with lower-frequency synonyms, e.g. utilization. Different degrees of combination can then be checked: the use of (3 texts, 9 instances) vs. utilization rate (1 source, 4 occurrences). Learners are challenged to develop their own lexical profiles based on their comprehension of English material. In practice, this means coming to terms with detailed consistency lists across genre sub-corpora and instantiating common core constructions as they typically occur in their textual setting.
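A learner-built profile of this kind, giving the number of sources and the number of instances for a combination, could be produced with a sketch like the one below. The tokenizer and the sample texts are assumptions for illustration, so the output does not reproduce the counts quoted above.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_profile(texts, phrase):
    """Return (sources, occurrences) for a word sequence across several texts."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0, 0
    sources = occurrences = 0
    for text in texts:
        words = tokenize(text)
        # Count every position where the full word sequence matches.
        hits = sum(1 for i in range(len(words) - n + 1)
                   if words[i:i + n] == target)
        if hits:
            sources += 1
            occurrences += hits
    return sources, occurrences


# Invented reading material standing in for the learners' own texts.
texts = [
    "The use of software. The use of data.",
    "Utilization rate rose.",
    "Use of the system",
]
print(phrase_profile(texts, "the use of"))
print(phrase_profile(texts, "utilization rate"))
```

Reporting sources alongside occurrences keeps a combination that is frequent in one text from being mistaken for one that is spread across the material.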
Dealing with more specific elements, the objective should be the identification of technical behavior. A collocation such as tax preparation is examined in its typical clauses within the subject texts (Figure 2). Longer stretches of language are thus viewed and assessed, given their high co-occurrence probability in the sources: tax preparation software, online tax preparation software, as Figure 5 displays:
Figure 5: Technical construction span examination
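One way to approximate this kind of span examination is to count the longer word sequences that contain a given collocation. The sketch below does so for sequences of up to four words; the span limit, the tokenizer and the sample sentences are assumptions for illustration and do not reproduce the data behind Figure 5.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def spans_containing(texts, node, max_len=4):
    """Count word sequences longer than `node` (up to max_len words) that contain it."""
    target = tokenize(node)
    k = len(target)
    counts = Counter()
    if k == 0:
        return counts
    for text in texts:
        words = tokenize(text)
        for n in range(k + 1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Keep the sequence only if the node appears somewhere inside it.
                if any(gram[j:j + k] == target for j in range(n - k + 1)):
                    counts[" ".join(gram)] += 1
    return counts


# Invented sentences built around the collocation discussed above.
texts = [
    "Online tax preparation software is popular.",
    "We tested tax preparation software.",
]
counts = spans_containing(texts, "tax preparation")
print(counts.most_common(3))
```

Sequences that recur across sources, such as tax preparation software in this sample, are the candidates a learner would then inspect in context.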
Lexical activities like these should allow for task development in the ESP classroom. They constitute vocabulary exploitation for academic and technical performance in such priority areas as delivering oral presentations on subject matter and writing summaries of specific reading material. The scope of ESP can thus become rather useful for achieving adequate language proficiency at both academic and professional planes.
Benson, M., E. Benson and R. Ilson (1997) The BBI Dictionary of English Word Combinations. Amsterdam: John Benjamins.
Coxhead, A. (1998) An Academic Word List. English Language Institute Occasional Publication No 18. Victoria University of Wellington.
Farrell, P. (1990) A Lexical Analysis of the English of Electronics and a Study of Semi-technical Vocabulary. Dublin: Trinity College.
James, G. and J. Purchase (1996) English in Business Studies and Economics. A Corpus-Based Lexical Analysis. Hong Kong: Longman.
Scott, M. (1997) "PC Analysis of Key Words and Key Key Words". System 25 (1): 1-13.
Scott, M. (2000) WordSmith. Oxford: Oxford University Press. 1st ed: 1996.
About the authors
Blas Curado currently works for a company in
Alejandro Curado teaches English for Specific Purposes at a university in Spain.
ESP World Copyright © 2002