ISSN 1682-3257

English for Specific Purposes World

Web-based Journal

A CURRENT CORPUS OF TECHNOLOGY LANGUAGE IN SPAIN: ENGLISH WORDS THAT MATTER

Blas Curado Fuentes

Universidad Politécnica de Madrid

Alejandro Curado Fuentes

Universidad de Extremadura

Rapid developments in the world of Information Technology greatly influence the way we look at Business and Education today. In English-speaking countries, this is evident when listening to news reports or scanning newspaper headlines, where descriptions of new software or reviews of office applications appear regularly. In countries where English is a foreign language, this direct influence is researched in university settings. Such is the case in Spain, where English for Business and Computer Science is taught. Our experience repeatedly shows how practical the field of technical English can be for developing language proficiency.

However, not every type of text material on business technology is relevant to or effective in learning English. Representative collections of sources, mainly in the form of specific corpora, should be prepared consistently. In this way we can respond to objective criteria set by our context, the current Spanish job market. There are areas where an advanced command of English is a must, and there are spheres where understanding written documents is a priority. In all cases, knowledge of specialized vocabulary is required, chiefly in the form of academic and technical word combinations and collocations.

But which lexical items make up the core of professional vocabulary to be studied in college, the vocabulary that is indispensable in the future careers of computer technicians, business executives or company engineers? The question should be addressed by a learner-centered corpus design that is effective on both the academic and professional planes. From our perspective on ESP (English for Specific Purposes) learning, this is attained through the compilation of recently published English texts in key subject areas. The selection of texts is based on the required reading lists for subjects on the curriculum at our institution, covering specific issues and topics. For instance, the core field of Business Technology is part of several subjects studied in the specialist area of Business and Administration: Accounting, Economics, Finance, General Business, Management Information Systems, Marketing, and Statistics. Recent texts (published in the last couple of years) dealing with Business Technology should then be classified according to thematic variables.

The complete corpus should be restricted to a few representative sources in the demarcated area of Business Technology within these academic boundaries. The collection must be renewed and updated yearly to obtain as current a view of specific language as possible. The number of running words (tokens) is limited to 69,301, and there are 6,826 different items or types. Various genres are encompassed depending on learning stages; for example, for General Business, only textbook material is selected, whereas for Management Information Systems, a subject taken in the third year of studies, both technical reports and research articles (more complex genres) come into focus.
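The distinction between running words (tokens) and different items (types) can be illustrated with a minimal sketch. It assumes a simple regular-expression tokenizer, which will not necessarily count words the way WordSmith Tools does; the function names and the two sample sentences are hypothetical and serve only as illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Hyphenated or apostrophized forms (e.g. tax-efficient) are kept as one token.
    This is an assumed tokenization, not the one used for the corpus itself.
    """
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def corpus_profile(texts):
    """Return (tokens, types, type/token ratio) for a list of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    tokens = sum(counts.values())   # running words
    types = len(counts)             # different word forms
    ratio = types / tokens if tokens else 0.0
    return tokens, types, ratio


# Hypothetical sample texts, for illustration only
sample = [
    "Tax preparation software helps clients.",
    "Clients need tax-efficient investing.",
]
tokens, types, ratio = corpus_profile(sample)
print(tokens, types, round(ratio, 3))
```

Applied to the figures reported for the corpus, 6,826 types over 69,301 tokens corresponds to a type/token ratio of about 0.0985.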

Wordlists in each of these categories prove to be highly convenient for drawing results. The items are contrasted with the overall corpus created. Concordance software such as WordSmith Tools (Scott, 2000) enables us to perform this task easily and productively. The keywords thus identified in the subjects reveal the semantic essence of the groups, or in Scott’s words (1997), 'the aboutness of the texts', mainly defined by content words and, in particular, nouns. Figure 1 lists the words with a high degree of keyness, obtained by means of statistical analyses of the subject categories and the reference (overall) corpus. Figure 1 also indicates the genres and the number of sources in which the particular words occur.

Accounting: Tax, Clients, Financial, Client, Says (3 RAs, 6 Disc)

Economics: Commerce, Electronic, Public, Information, Technology (3 NAs, 2 RPs, 1 TX)

Finance: Article, Finance, Journal, Association, Site (10 NAs, 12 Abs, 3 RAs)

General Business: Text, Chapter, Causal, Trees, Menu (1 TX)

Management Information Systems: Effectiveness, MIS, Information, Research, Dimension (3 RPs, 2 RAs)

Marketing: Com, Domain, Names, Name, Register (3 Rev, 7 RPs)

Statistics: Percent, Personal, Income, Production, Pulse (4 RAs, 1 TX)

Figure 1: Top five keywords in each subject heading

 

RAs = Research articles / Disc = Discussions / NAs = News articles / RPs = Technical reports / TXs = Textbooks / Abs = Abstracts / Rev = Reviews

The data reveal the influence of contextual factors, recognizable mainly in topic- and genre-based occurrences. In Finance and General Business, for instance, the keywords reflect the research article and textbook environments, respectively. Such traits also appear in other cases, where the subject determines the type of pivotal lexis extracted, e.g. in Accounting and Statistics.
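The comparison of a subject wordlist against the reference corpus can be sketched as follows. The article does not state which statistic was applied, so the log-likelihood measure used here is an assumption (it is one common choice in keyword analysis); the function names and the toy frequency counts are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word given its frequency in two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study, reference, top_n=5):
    """Rank the words that are relatively more frequent in the study wordlist."""
    size_study = sum(study.values())
    size_ref = sum(reference.values())
    scored = []
    for word, freq in study.items():
        ref_freq = reference.get(word, 0)
        # Keep only positive keywords: higher relative frequency in the study corpus
        if freq / size_study > ref_freq / size_ref:
            score = log_likelihood(freq, size_study, ref_freq, size_ref)
            scored.append((word, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy counts, invented for illustration (not taken from the corpus)
study = Counter({"tax": 6, "the": 10, "software": 2})
reference = Counter({"tax": 7, "the": 60, "software": 8, "market": 25})

for word, score in keywords(study, reference):
    print(word, round(score, 2))
```

A grammar word such as the, which is frequent everywhere, drops out of the ranking, while a word concentrated in the subject texts rises to the top; this is the sense in which keywords capture what the texts are about.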

Lexicographic material in the form of specialized corpora and dictionaries or vocabularies proves useful as guidance in the survey of the results. To contrast our terminology, a corpus-based lexical analysis such as that of James and Purchase (1996) on the English of Business and Economics can be fruitful. Such a study can pinpoint word position in terms of frequency and dispersion; the aim is to check the range of use of our data as measured in a larger collection used as reference (nearly two million tokens). For instance, the item tax is highly frequent in Accounting texts in James and Purchase (1996), with 205 occurrences, surpassed only by its appearance in Economics texts (255). Their Accounting sources include 144,927 running words, which means that tax is also one of the most frequent content items there. This result supports our own approach in this respect (see Figure 1).

However, the General Business keyword text (Figure 1) has zero occurrences in James and Purchase (1996). This discrepancy should not be dismissed, since it indicates that our corpus has its own identity as a specific linguistic resource. Our students, learning about basic notions in Business Technology related to the general area of their studies, must cope with an emphasis on text as reference. This is often made plain by our own colleagues, who propose that the genre of textbooks is a primary source of information at this educational stage.

Due to the search for specificity in these texts, the language should be analyzed as functional / operative in the ESP arena. This means that we aim to provide learners with particular issues in Business Technology as it relates to the different subjects listed above. We must explore significant word behavior in the form of both specific collocations and common lexical combinations. These are technical and academic elements offering rich input for the learner. Figure 2 displays the most important ones in terms of both frequency and distribution in the subject texts.

Accounting (theme-based items for tax): tax preparation; tax-efficient; tax returns; tax funds; tax preparation software; on-line tax preparation software; tax-efficient investing

Economics (theme-based items for commerce): electronic commerce; global electronic commerce; Department of Commerce; the Commerce Department; facilitate + electronic commerce

Finance (genre-based items for article): article abstract; full text of this article; research article

General Business (genre-based items for text): text panel; text file; text / equation

Management Information Systems (theme-based items for effectiveness): MIS effectiveness; the effectiveness of MIS

Marketing (genre-based items for com): .com; com/img/; dot com; the com domain

Statistics (theme-based items for percent): percent change; percent in #; above # percent; percent of disposable income; percent of personal income

Figure 2: Key collocations within subject boundaries

The majority of the elements in Figure 2 are assessed as technical collocations. This means that they are noun compounds narrowly defined within their subject sources. They may be labelled as either theme- or genre-based, given their tendency to describe the surrounding textual environment in one way or the other. The example of text in General Business, discussed above, is obvious, while article in Finance reflects the research material exploited, and com refers to the internet space of electronic reviews and reports from which lexical data are derived.
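Candidate collocations for a node word such as tax can be gathered, together with their frequency and their distribution over sources, along the lines of the sketch below. It is a simplified illustration that only counts contiguous word pairs containing the node, not the procedure actually followed with WordSmith Tools; the function names and sample sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its word tokens (hyphenated forms stay whole)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def node_ngrams(texts, node, n=2):
    """Map each n-gram containing `node` to (number of texts, total frequency)."""
    frequency = Counter()
    sources = defaultdict(set)
    for index, text in enumerate(texts):
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if node in gram:
                phrase = " ".join(gram)
                frequency[phrase] += 1
                sources[phrase].add(index)   # remember which text it came from
    return {phrase: (len(sources[phrase]), count)
            for phrase, count in frequency.items()}


# Hypothetical sample texts, for illustration only
texts = [
    "Online tax preparation software simplifies tax returns.",
    "Tax preparation is easier with tax preparation software.",
]
result = node_ngrams(texts, "tax")
for phrase, (n_texts, count) in sorted(result.items(), key=lambda kv: -kv[1][1]):
    print(phrase, n_texts, count)
```

Reporting the number of texts alongside the raw count separates a combination that recurs across sources from one that is repeated many times in a single document.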

The lexical combinations found in a different arrangement of our corpus differ substantially from subject-driven words. This arrangement organizes the sources according to common core patterns. In other words, taking the collection as a whole, we seek to identify widespread lexical behaviour in our specific texts. The results are gathered in detailed consistency lists, which display items according to their frequency and dispersion in the seven genres encompassed (from left to right in Figure 3: Articles, Discussions, Textbooks, News, Reports, Abstracts and Reviews). Thus, there are seven genre columns in Figure 3, illustrating a significant presence of the words in question throughout the corpus. The top five content elements positioned within the first 50 slots are shown in Figure 3.

N    Word      Files  Total  RAs  Disc  TXs  NAs  RPs  Abs  Rev
34   New       7      117    28   15    42   1    4    17   10
38   Use       7      88     14   3     52   8    5    2    4
44   Industry  7      59     24   1     20   3    4    2    5
49   State     7      43     14   5     13   2    4    2    3
50   Need      7      42     7    3     19   1    5    2    5

Figure 3: Top five content items in the detailed consistency list of genres in our corpus. N = Word position on list

RAs = Research articles / Disc = Discussions / NAs = News articles / RPs = Technical reports / TXs = Textbooks / Abs = Abstracts / Rev = Reviews

The focus is placed on common content words, as indicated above. Grammar items such as articles, prepositions and the like are discarded. The aim is to exploit semi-technical words, including nouns, verbs (except indexical forms, e.g. have, be, and modals, e.g. can, will), adjectives, and adverbs. The classification is made by first checking that the items occur in all genres and then ranking them by frequency. The data are categorized as semi-technical, that is, 'subject-independent words' (Farrell, 1990: 13) appearing throughout the whole corpus. Their typical associations are evaluated along lines close to those offered by lexical combination dictionaries such as Benson, Benson and Ilson (1997). In this respect, content items play a decisive role, as grammar is secondary in that its status is inferred from lexis.
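The two-step classification described above (presence in all genres first, then frequency) can be sketched as follows. The per-genre counts for new, use, industry, state and need are those of Figure 3; the stop list, the function name and the two extra entries (the, tax) are hypothetical additions made only to show the filtering at work.

```python
from collections import Counter

# Hypothetical stop list of grammar items, indexical verbs and modals
FUNCTION_WORDS = {"the", "of", "a", "an", "and", "to", "in", "for",
                  "have", "be", "can", "will"}


def consistency_list(genre_counts, exclude=FUNCTION_WORDS):
    """Return rows (word, files, total, per-genre counts) for shared content words.

    `genre_counts` maps each genre label to a Counter of word frequencies
    and must contain at least one genre.
    """
    genres = list(genre_counts)
    # Step 1: keep only words that occur in every genre
    shared = set.intersection(
        *({w for w, n in genre_counts[g].items() if n > 0} for g in genres)
    )
    rows = []
    for word in shared - set(exclude):
        per_genre = [genre_counts[g][word] for g in genres]
        rows.append((word, len(genres), sum(per_genre), per_genre))
    # Step 2: rank by total frequency
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


genres = ["RAs", "Disc", "TXs", "NAs", "RPs", "Abs", "Rev"]
figure3 = {  # counts as given in Figure 3
    "new": [28, 15, 42, 1, 4, 17, 10],
    "use": [14, 3, 52, 8, 5, 2, 4],
    "industry": [24, 1, 20, 3, 4, 2, 5],
    "state": [14, 5, 13, 2, 4, 2, 3],
    "need": [7, 3, 19, 1, 5, 2, 5],
}
genre_counts = {g: Counter() for g in genres}
for word, counts in figure3.items():
    for g, n in zip(genres, counts):
        genre_counts[g][word] = n
for g in genres:
    genre_counts[g]["the"] = 100      # invented count for a grammar word
genre_counts["RAs"]["tax"] = 9        # invented count for a one-genre word

rows = consistency_list(genre_counts)
for word, files, total, per_genre in rows:
    print(word, files, total, per_genre)
```

The grammar word is removed by the stop list, and the word found in a single genre is removed by the all-genres condition, leaving the five rows of Figure 3 in their original order.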

Figure 4 exemplifies common usage of the semi-technical elements given above (see Figure 3). The numbers in brackets indicate, in this order, the number of sources where these items are located and their frequencies.

NEW: new business (3 / 8); new businesses (2 / 8); new media (2 / 15); new services (1 / 4); new technologies (1 / 3)

USE: the use of (3 / 9); for use in (2 / 4); to use the (2 / 4); to use them (2 / 4); for use in designing (1 / 3); try to use the (1 / 3)

INDUSTRY: in the industry (2 / 4); industry self-regulation (1 / 6); telecommunications industry (1 / 3); computer industry (1 / 3)

STATE: state and local government (3 / 8); state law (2 / 4); state university (1 / 3)

NEED: you need to (8 / 14); the need for + N (5 / 8); they need to (4 / 5); you need to know (3 / 4); you need to make (2 / 4)

Figure 4: Common core word combinations in our corpus

By drawing up charts in this fashion, we distinguish constructions that are less specific and develop more freely through general academic discourse. Similar language is often encountered in academic word lists (e.g. Coxhead, 1998).

We thus underline the distinct lexical spaces of technical collocations (Figure 2) and academic forms (Figure 4). For ESP, we claim that both approaches must be equally applied and exploited. For example, in an academic task-based exercise, learners can build their own frequency lists and deduce lexical priorities from reading material. The common core status of a noun like use can be contrasted with lower-frequency synonyms, e.g. utilization. Different degrees of combination can then be checked: the use of (3 texts, 9 instances) vs. utilization rate (1 source, 4 occurrences). Learners are challenged to develop their own lexical profiles based on their comprehension of English material. This implies, in fact, coming to terms with detailed consistency lists across genre sub-corpora and identifying instances of common core constructions as they typically occur in their textual setting.

In dealing with more specific elements, the objective should be the identification of technical behavior. A collocation such as tax preparation is reviewed in its typical contexts within the subject texts (Figure 2). Longer stretches of language are thus viewed and assessed, given their high co-occurrence probability in the sources: tax preparation software, online tax preparation software, as Figure 5 displays:

 

Figure 5: Technical construction span examination
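The examination of longer stretches around a known collocation can be approximated with a short sketch. It counts every sequence of up to four words that contains the starting phrase, which is only one possible way of inspecting spans and is not a reproduction of Figure 5; the function names and sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (hyphenated forms stay whole)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def extended_spans(texts, phrase, max_len=4):
    """Count longer word sequences (up to max_len words) that contain `phrase`."""
    target = tuple(tokenize(phrase))
    k = len(target)
    spans = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(k + 1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Does the phrase appear as a contiguous run inside this sequence?
                if any(gram[j:j + k] == target for j in range(n - k + 1)):
                    spans[" ".join(gram)] += 1
    return spans


# Hypothetical sample texts, for illustration only
texts = [
    "Online tax preparation software is popular.",
    "Many clients buy online tax preparation software.",
]
spans = extended_spans(texts, "tax preparation")
for span, count in spans.most_common():
    print(count, span)
```

Sequences that recur, such as tax preparation software, stand out from accidental neighbours that occur only once, which gives learners a simple criterion for deciding how far a technical construction extends.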

Lexical activities like these should allow for task development in the ESP classroom. They constitute vocabulary exploitation for academic and technical performance in such priority areas as delivering oral presentations on subject matter and writing summaries of specific reading material. The scope of ESP can thus prove useful for achieving adequate language proficiency on both the academic and professional planes.

Bibliography

Benson, M., E. Benson and R. Ilson (1997) The BBI Dictionary of English Word Combinations. Amsterdam: John Benjamins.

Coxhead, A. (1998) An Academic Word List. English Language Institute Occasional Publication No 18. Victoria University of Wellington.

Farrell, P. (1990) A Lexical Analysis of the English of Electronics and a Study of Semi-technical Vocabulary. Dublin: Trinity College.

James, G. and J. Purchase (1996) English in Business Studies and Economics. A Corpus-Based Lexical Analysis. Hong Kong: Longman.

Scott, M. (1997) "PC Analysis of Key Words and Key Key Words". System 25 (1): 1-13.

Scott, M. (2000) WordSmith. Oxford: Oxford University Press. 1st ed: 1996.

About the authors

Blas Curado currently works for a company in Spain under a fellowship in Engineering. In addition, his interest in linguistics has led him to the observation and study of specialized terms in context.

Alejandro Curado teaches English for Specific Purposes at a university in Spain.

ESP World Copyright © 2002