(i) converted lexicographic data from 25 sources, including several extensive and complete dictionaries, totaling over 148,000 words (see 1a)
(ii) received from other researchers contributions of lexicographic data from about 140 languages totaling 198,000 words (see 1b)
(iii) provided to researchers an integrated comparative database with cross-platform query tools (see §2.2)
(iv) assisted or advised in the conversion of numerous other data sources
(v) developed (in collaboration with other researchers) cartographic tools geared toward Bantu linguistic research (see §2.3.1)
(vi) aggregated bibliographies from disparate sources into a coherent annotated bibliography containing over 6,800 citations for use by the Bantu research community (see §2.3.2)
(vii) provided data and resources used in the production of two completed dissertations and numerous research articles (see §2.4)
Thus, we have accomplished the first major phase of setting up this common resource. In addition, CBOLD has functioned as:
(i) a clearing house to receive and distribute lexical materials
(ii) a provider and developer of other tools, e.g. MapMaker, CBOLD bibliography, index of Bantu language names, database templates (see §2.2.2-§2.2.3)
(iii) an advice center to a growing number of dictionary-makers at universities and in the field who have contacted us
(iv) a focus and catalyst for future development of Bantu linguistics
Four of Berkeley's Africanist students have also completed their Ph.D.'s and now hold university faculty positions: Josephat Rugemalira (U. Dar es Salaam), Kathleen Hubbard (U.C. San Diego), Cheryl Zoll (M.I.T.), and Joyce Mathangwane (U. Botswana). Others are continuing to work on CBOLD as graduate students: (i) Jeri Moxley, who is publishing on velar palatalization with the PI, will produce a noun class database that will be used for morphological and semantic work for her dissertation; (ii) Armindo Ngunga (from Mozambique) has produced an extensive Yao P.21 lexicon of 7000+ entries, which he has used to prepare several presentations and submitted papers (one on preconsonantal nasality with the PI); (iii) Galen Sibanda (from Zimbabwe) arrived at Berkeley this past academic year to do lexical work on his language, Ndebele S.44. The Berkeley graduate students had exclusive responsibility for organizing a day-long Special Session entitled "Historical issues in African linguistics" during the Berkeley Linguistics Society meeting on February 18, 1994. It is vitally important to the PI that the project contribute to the training of graduate students, and that they be especially encouraged to develop as independent researchers in the fields of general and Bantu linguistics. In addition to working with the Berkeley students, the PI is frequently contacted by students working on Bantu at other universities and provides them with verbal and written comments on their work. As part of the general openness of the project, CBOLD has encouraged both Berkeley and other graduate students to participate and, wherever possible, to use the lexicons and tools that we produce.
Although we do not repeat all of the rationale in the original funded proposal for creating CBOLD, it can be inferred from the following description of our goals and the range of accomplishments and activities sponsored by the project.
(1) Dictionaries converted by CBOLD (by Guthrie Language number; counts are approximate)
Language   ID      Source of Data        Forms
------------------------------------------------
Londo      A.11    Kuperus 1985           1800
Tunen      A.44    Dugast 1967            4190
Tiene      B.81    Ellington 1977          580
Koyo       C.24    Hyman/Ndzambo 1996     1600
Bamwe      C.30    Samarin n.d.            126
Bobangi    C.32    Whitehead 1899         9500
Lingala    C.36d   Dzokange 1979          7200
Holoholo   D.28    Coupez 1955             728
Nande      D.42    Kavutirwaki 1978       2224
Shi        D.53    Polak-Bynon 1978       2391
Kiga       E.13    Taylor 1959           12700
Ganda      E.15    Snoxall 1967          11243
Nyambo     E.21    Rugemalira 1993        1230
Sukuma     F.21    Mann 1966              3253
Swahili    G.42    Rugemalira 1993        1399
Kongo      H.16    Swartenbroeckx        25659
Laadi      H.16f   Jacquot 1982           9300
Yaka       H.31    Ruttenberg 1971        3791
Lozi       K.21    Jalla 1937            11200
Pende      L.11    Gusimana 1972          8700
Tonga      M.64    Turner 1952            1900
Cewa       N.31b   Sc&Heth. 1957          5295
Yao        P.21    Sanderson 1954         7433
Kwanyama   R.21    Turvey 1977            7200
Shona      S.10    Taylor 1967           38000
Kalanga    S.16    Mathangwane 1996       3765

In addition, CBOLD has received about 140 lexicons and dictionaries from outside contributors. A list of these materials is given in (5) in §2.4 below. Many of these lexicons have already been exploited for linguistic research. CBOLD has been successful in obtaining copyright releases for most of the copyrighted material. Oxford University Press has been quite generous in this regard. The machine-readable contributions by other scholars have generally been unrestricted, e.g. the Tanzanian Language Survey (TLS) data provided by Prof. Derek Nurse (Memorial University, Newfoundland). To facilitate the distribution of this material, CBOLD has circulated a data-sharing agreement for signature by Bantu researchers. The agreement, designated "The Bantuists' Manifesto", affirms the consent of members of the Bantuist academic community to the dissemination of their data. To date the document has been signed by 32 current and near-term CBOLD contributors, demonstrating their support for this project.
(i) Several format recognition programs have been written to recognize the structure of dictionary entries in text format, making it possible to extract particular data elements for further processing. These programs take advantage of specific conventions of the dictionary text (e.g. layout and typography) in order to divide the entries into headword, part-of-speech, definition, etc. In the scanned Luganda dictionary (Snoxall 1967), for example, the basic lexicographic data is distinguished (by SGML tags) from the extended dictionary entry (which contains proverbs, example sentences, usage notes, etc.), which, while useful, has not yet found a specific use in the CBOLD research program. It is likely that these example sentence "mini-corpora", a rich source of data for comparative Bantu syntax and semantics, will be useful to other researchers.
(ii) Two database templates in FileMaker Pro have been used by researchers to create dictionaries of Bantu languages. Template I, developed at the Laboratoire Dynamique du Langage (DDL) in Lyon by Joel Brogniart, has been used to create Bantu language dictionaries, several of which are now part of the CBOLD database. It is a highly refined scheme for entering lexicographic data by way of a syllabic template, and for connecting modern forms to preexisting reconstructions. Template II, a simple format for entry of Bantu-specific lexical data, was developed at CBOLD in Berkeley by John Lowe. It has been used to enter data for nine of the dictionaries produced by CBOLD. Both templates are available on the CBOLD FTP site.
(iii) The CBOLD data representation standard is a draft document specifying several varieties of text documents which can be easily loaded into the CBOLD database. The standard, which provides a simple markup language and format specification, is designed to permit contributors to convert existing files into a suitable format, quickly and easily, usually by making a few simple changes with a text editor. Most database export formats are supported (with minor caveats) by the standard. Other tools developed at DDL and CBOLD are designed to facilitate the retranscription of data in different fonts and formats and to insert markup tags into unformatted text. Several of these have been made available to researchers on an individual basis.
(iv) A morphological stemmer. One of the first analyses performed on a machine-readable dictionary (MRD) is the segmentation of headwords into prefix and stem. This is carried out with a simple stemmer which compares a predefined list of prefixes and their common allomorphs with each dictionary headword and inserts a delimiter between the prefix and the stem. Since this program takes a "left-edge in" approach and since only the first and longest possible segmentation is made, there is no ambiguity in the parsing. So, for example, the program can separate prefixes ku- (before consonants) and kw- (before vowels) from the stems of verbs, and is not confused by nouns which have concord prefixes which begin with either o- or omu-/omw-. Of course, this procedure is a heuristic; a certain number of the segmentations are in error, and the entire result must be verified by a linguist familiar with the data. The program can also assign noun class and part-of-speech designations based on the occurrence of prefixes, facilitating the inclusion of this basic categorial information in wordlists.
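The longest-match, left-edge segmentation described above can be illustrated with a minimal Python sketch. It is not the CBOLD program itself: the prefix list, the vowel set, the rule that glide-final allomorphs (kw-, omw-) occur only before vowels, and the function name `segment` are all assumptions made for the illustration.

```python
# Hypothetical prefix inventory; a real list would be prepared per language.
PREFIXES = ["omu", "omw", "ku", "kw", "o"]
VOWELS = set("aeiou")


def segment(headword, prefixes=PREFIXES, delimiter="-"):
    """Insert a delimiter after the longest matching prefix, working from the left edge."""
    for prefix in sorted(prefixes, key=len, reverse=True):
        stem = headword[len(prefix):]
        if not headword.startswith(prefix) or not stem:
            continue
        # Assumed allomorphy: glide-final allomorphs (kw-, omw-) precede vowels,
        # all other prefixes precede consonants.
        before_vowel = stem[0] in VOWELS
        if prefix.endswith("w") != before_vowel:
            continue
        return prefix + delimiter + stem
    return headword  # no prefix recognized; left for manual checking


for word in ["kugenda", "kwita", "omuntu", "omwana"]:
    print(segment(word))
```

Because only the first and longest match is taken, each headword receives at most one segmentation, which is why the output still has to be checked by a linguist familiar with the data.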
(ii) A finite state transducer calibrated for Bantu syllable structure is used to analyze the stem forms into phonological constituents, assigning constituents (e.g. segments) to canonical CV templates. This analysis makes it possible to search the dictionaries using Boolean queries based on phonotactic properties. For example, it is possible to search the database to find all cases where vowels of a certain user-defined class are followed by vowels of another class (i.e. to determine the range of co-occurrence of segments in the V1 and V2 slots in Bantu syllables). This technique is being used by the PI to study vowel harmony properties of specific Bantu languages (cf. §3.4.1).
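As an illustration of this kind of query, the sketch below parses stems into C1, V1, C2, V2 slots and counts V1/V2 co-occurrences. A regular expression stands in for the finite state transducer. The consonant and vowel inventory, the function names `parse_stem` and `cooccurrence`, and the omission of tone marks are assumptions made for the example, not features of the CBOLD software.

```python
import re
from collections import Counter

# Assumed toy inventory: optional (prenasalized) consonant onset, five plain vowels.
SYLLABLE = re.compile(r"(?P<C>(?:mb|nd|ng|[bdgkptvrszmnlwyfhj])?)(?P<V>[aeiou]+)")


def parse_stem(stem):
    """Return a dict of slots C1, V1, C2, V2, ... or None if the stem cannot be parsed."""
    slots, pos, n = {}, 0, 0
    while pos < len(stem):
        match = SYLLABLE.match(stem, pos)
        if match is None:
            return None
        n += 1
        slots[f"C{n}"] = match.group("C")
        slots[f"V{n}"] = match.group("V")
        pos = match.end()
    return slots


def cooccurrence(stems, v1_class, v2_class):
    """Count (V1, V2) pairs where V1 is in v1_class and V2 is in v2_class."""
    counts = Counter()
    for stem in stems:
        slots = parse_stem(stem)
        if slots and slots.get("V1") in v1_class and slots.get("V2") in v2_class:
            counts[(slots["V1"], slots["V2"])] += 1
    return counts


stems = ["biri", "tumbi", "tembo", "tuvi", "tumbu"]
print(parse_stem("tumbi"))
print(cooccurrence(stems, {"e", "o"}, {"o", "u"}))
```

Stems that do not fit the assumed template return `None` and are skipped by the query, so unparsed forms remain visible for manual inspection rather than being silently miscounted.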
(2) (a) Languages for which CBOLD has data (b) 5 vs. 7 Vowels in Bantu Languages
(3) Environments of Velar Palatalization in Bantu
                    Type A  Type B  Type C  Type D  Type E
Across Morphemes      +
Morpheme-Internal     +       +
Root-initial          +       +       +
Prefixes              +       +       +       +
ky/gy > c/j           +       +       +       +       +
The authors provide evidence for a diachronic progression of Type E > D > C > B > A, arguing that palatalization was originally restricted but underwent gradual analogical extensions by morphological context. Particularly relevant are languages of Type C for which electronic dictionaries are now available, e.g. Shi D.53, Bemba M.42, Cewa N.31b, and Kalanga S.16 (the last two developed by graduate students J. Moxley and J. Mathangwane).
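The typology in (3) reads as a cumulative scale: each type adds one environment to those of the type before it in the progression E > D > C > B > A. Assuming that reading, a language can be assigned a type from the set of environments in which palatalization is attested. The sketch below is illustrative only; the environment labels and the function name `classify` are not part of the cited study.

```python
# Environments ordered from most to least widespread, following table (3).
SCALE = [
    "ky/gy > c/j",
    "prefixes",
    "root-initial",
    "morpheme-internal",
    "across morphemes",
]
TYPE_BY_COUNT = {5: "A", 4: "B", 3: "C", 2: "D", 1: "E"}


def classify(environments):
    """Return the type letter, or None if the environments do not fit the scale."""
    observed = set(environments)
    n = len(observed)
    if n == 0 or n > len(SCALE) or observed != set(SCALE[:n]):
        return None
    return TYPE_BY_COUNT[n]


print(classify({"ky/gy > c/j", "prefixes", "root-initial"}))
```

A return value of `None` flags a language whose attested environments skip a step on the scale, which is the kind of case that would need closer examination.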
(4) Development of high vowels in Kalanga
*pi̧ > swi    *ti̧ > tshi    *ki̧ > si
*bi̧ > zwi    *di̧ > dzi     *gi̧ > zi
*pu̧ > fu     *tu̧ > thi     *ku̧ > fu
*bu̧ > vu     *du̧ > du      *gu̧ > vu
Zoll (1995) surveys the phenomenon and provides both cross-linguistic generalizations about these "mutations" and a formal feature-geometric account. Hyman (1996a) is more concerned with the problem that these changes are frequently conditioned differentially by morphological context. As verified via the CBOLD database, languages such as Ganda and Shi mutate *t, *d, *k and *g to [s, z] in all environments before *i̧, while *p and *b mutate to [s, z] only intramorphemically, not across morpheme boundaries. Closely related "Rutara" languages such as Haya, Nyambo, Nkore and Kiga show a similar exceptional labial pattern, but in these languages the frication/non-frication of *k, *g may also be sensitive to morphological environment. Hyman hypothesizes that these changes first occur morpheme-internally and then extend out, hitting consonants differentially according to place of articulation.
In §3.4 we outline additional on-going projects that will be followed up during the second funding period of CBOLD.
Hyman, Larry M. and Joyce Mathangwane. 1997. Tonal domains and depressor consonants in Ikalanga. In L.M. Hyman and C. Kisseberth (eds), Theoretical aspects of Bantu tone. Stanford: C.S.L.I. (about to go to press).
Hyman, Larry M. and Jeri Moxley. 1996. The morpheme in phonological change: velar palatalization in Bantu. Diachronica 13.2 (in press).
Hyman, Larry M. Morphologie et frication diachronique en bantou. Mémoires de la Société de Linguistique de Paris (invited paper to be submitted, summer 1996).
Hyman, Larry M. and Armindo Ngunga. Preconsonantal nasality in Yao (to be submitted to Phonology, summer 1996).
Lowe, John B. 1995. Cross-linguistic lexicographic databases for etymological research, with examples from Sino-Tibetan and Bantu languages. Ph.D. Dissertation, U.C. Berkeley.
Mathangwane, Joyce. 1996. Phonetics and Phonology of Ikalanga: a diachronic and synchronic study. Ph.D. dissertation, University of California, Berkeley.
Ngunga, Armindo. 1996. The role of nasals in Ciyao Segmental Phonology. To appear in Proceedings of the Berkeley Linguistics Society 22.
Zoll, Cheryl. 1995. Consonant mutation in Bantu. Linguistic Inquiry 26.536-545.
In the next phase of the project, CBOLD intends to:
(i) complete the database and improve its availability to researchers, including those in Africa with limited access to computing and telecommunications facilities
(ii) continue the etymological work required to revise and complete the reconstruction of *PB
(iii) continue to convert printed materials into machine-readable form for inclusion in the database and to gather and disseminate lexicographic data on Bantu languages
(iv) continue work being conducted currently by the PI and his students on the phonology and morphology of Bantu languages based on the CBOLD database
(i) We will complete the editing and conversion of the remaining data sources into FoxPro format.
(ii) Also, the entire TLS 1975 (Tanzania Language Survey) corpus, itself nearly a quarter of CBOLD holdings, already in a FoxPro format, will be loaded into the database.
(iii) Certain data sources appear to be rather intractable computationally or linguistically; processing them has been deferred until the major components and data sources are in good shape. It is likely that the conversion of these data will not be completed until sometime during the first year of the requested renewal period.
(iv) Certain features of the query system which have been implemented in a rudimentary way and tested on small-scale datasets may not scale up when applied to the entire 300,000 word corpus. Notably, the query subsystem for searching on phonological features exists only in this "alpha" state; this subsystem has both a data component (a feature specification for the transcription used for each data source and language must be provided) and a processing component (converting feature specifications into segments and searching those segments in the database). A certain amount of conceptual development followed by design and implementation remains before this subsystem (which will certainly be among the most useful for phonological research) can be fully integrated into the database.
(v) We will "wring-out" the query functions already developed and have our beta-testers evaluate the user interface before burning CD-ROMs and releasing the database to the research community.
(vi) At the time of the first proposal, the World Wide Web was virtually unknown. Now, however, it is evident that the WWW will be the dissemination and research environment of choice for many types of projects. CBOLD, along with several other Bantu researchers, has prepared niches on the web (http://bantu.berkeley.edu/CBOLD.html); we have already put a number of resources on the site which have been used by other researchers, not only Bantuists. The CBOLD database will be mounted on the WWW; we are currently negotiating with the managers of the SunSite server cluster at Berkeley and some of the principals of the Digital Libraries Project for access to software and server space to use for the CBOLD database. Of course, for researchers without internet access, the FoxPro version will continue to be available, or a CD-ROM version of the WWW database will be made available.
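The two components of the feature-query subsystem mentioned in (iv) above, a per-source feature specification and a step that expands a feature query into segments before searching, can be sketched as follows. The feature table, its feature names, and the functions `segments_with` and `search` are hypothetical, and the sketch assumes one character per segment, which real transcriptions will not always satisfy.

```python
# Hypothetical feature specification for the transcription of one data source.
FEATURES = {
    "p": {"labial", "stop", "voiceless"},
    "b": {"labial", "stop", "voiced"},
    "t": {"coronal", "stop", "voiceless"},
    "d": {"coronal", "stop", "voiced"},
    "k": {"velar", "stop", "voiceless"},
    "g": {"velar", "stop", "voiced"},
    "s": {"coronal", "fricative", "voiceless"},
    "r": {"coronal", "liquid", "voiced"},
}


def segments_with(required, features=FEATURES):
    """Data component: expand a set of required features into matching segments."""
    required = set(required)
    return {seg for seg, feats in features.items() if required <= feats}


def search(forms, required, features=FEATURES):
    """Processing component: return the forms containing any matching segment."""
    targets = segments_with(required, features)
    return [form for form in forms if any(ch in targets for ch in form)]


print(sorted(segments_with({"velar"})))
print(search(["biri", "tuvi", "kab", "lok"], {"velar", "stop"}))
```

Keeping the feature table separate from the search routine means that a different table can be supplied for each source and language without changing the query code.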
(5) Inventory of CBOLD data sources (by Guthrie Language number; counts are approximate)
A.11 Kwanyama Turvey 1977/7200
A.11 Londo Kuperus 1985/1800
A.25 Ngoli Burssens 1994/787
A.40 Basa Dautrey 1994/1986
A.44 Tunen Dugast 1967/4190
A.70 Lenje Évariste 1995/1023
A.75 *Fang Lyon.Fang/451
A.75 Fang Medjo 1994/447
A.85b Bekwil DDL/1000
B.10 Myene Mouguiama 1994/2625
B.11c Galwa DDL/1000
B.20 Metombolo DDL/1000
B.20 Pouvi DDL/1000
B.22a Kele DDL/1000
B.25 Kota Piron 1990/1000
B.25 Mahongwe DDL/1000
B.25 Shake DDL/1000
B.25 Tumbidi DDL/1000
B.30 Gevove Van d. Veen 1994/1467
B.40 Isangu Idiata 199x/2000
B.41 Shira Mouguiama 1994/848
B.42 Sangu DDL/1000
B.43 Punu Blanchon 1994/4228
B.50 Wanzi Mouele 1994/3020
B.74 Buma Burssens 1994/616
B.75 Teke DDL/1000
B.81 Tiene Ellington 1977/580
B.85 Yansi Burssens 1994/822
B.86 Dinga Burssens 1994/830
C.30 Bamwe Samarin 19xx/126
C.32 Bobangi Whitehead 1899/9500
C.34 Sakata Burssens 1994/825
C.36 Lingala Dzokange 1979/7200
D.25 Lega Botne 1994/1507
D.28 Holoholo Coupez 1955/728
D.42 Nande Kavutirwaki 1978/2224
D.53 Shi Polak-Bynon 1978/2391
D.61 Rwanda TLS Wstighlnds/1052
D.62 Rundi TLS Wstighlnds/1052
D.65 Hangaza TLS Wstighlnds/1052
D.66 Ha TLS Wstighlnds/1052
D.67 Vinza TLS Wstighlnds/1052
E.11 Nyoro TLS Rutara/1052
E.12 Tooro TLS Rutara/1052
E.13 Kiga Taylor 1959/12700
E.13 Nkore TLS Rutara/1052
E.14 Rukiga TLS Rutara/1052
E.15 Ganda Snoxall 1967/11243
E.21 Nyambo Rugemalira 1993/1230
E.21 Nyambo TLS Rutara/1052
E.22 Haya TLS Rutara/1052
E.23 Zinza TLS Rutara/1052
E.24 Kerebe TLS Rutara/1052
E.24 Kerewe Odden 1995/1554
E.25 Jita TLS Suguti/1052
E.25 Mkwaya TLS Suguti/1052
E.31b Kizu TLS ENyaza/1052
E.42 Gusii TLS ENyaza/1052
E.43 Kuria_Mago TLS ENyaza/1052
E.43 Kuria_Tari TLS ENyaza/1052
E.44 Zanaki TLS ENyaza/1052
E.45 Nata TLS ENyaza/1052
E.47 Ngoreme TLS ENyaza/1052
E.53 Meru TLS 1975/1079
E.62a Machame TLS 1975/1079
E.62a Mochi.unn TLS 1975/530
E.62b Vunjo TLS 1975/1079
E.64 Keni TLS 1975/530
E.65 Gweno TLS 1975/1079
E.74a Dawida TLS 1975/1079
F.12 Bende TLS 1975/1079
F.21 Sukuma Mann 1966/3253
F.21 Sukuma TLS 1975/1053
F.21a Shashi_siz TLS ENyaza/1052
F.22 Nyamwezi M&S 1992/1905
F.22 Nyamwezi TLS 1975/1053
F.24 Kimbu TLS 1975/1053
F.33 Langi TLS Langi/1052
G.23 Sambaa TLS 1975/1079
G.42 Swahili Rugemalira 1993/1399
G.42 Swahili TLS 1975/1079
G.51 Pogoro TLS 1975/1079
G.52 Ndamba TLS 1975/1079
G.61 Lori Burssens 1994/809
H.10 Yombe Mabiala 1994/2122
H.16 Kongo Swart 1973/25659
H.16f Laadi Jacquot 1982/9300
H.31 Yaka Ruttenberg 1971/3791
K.15 Mbunda Burssens 1994/814
K.21 Lozi Jalla 1937/11200
L.11 Pende Gusimana 1972/8700
L.13 Pindi Burssens 1994/616
M.11 Pimbwe TLS 1975/1079
M.13 Fipa TLS 1975/1079
M.14 Rungu TLS 1975/1079
M.15 Mambwe Halemba 1996/5000
M.15 Mambwe TLS 1975/1079
M.21 Ndali TLS 1975/1079
M.21 Wanda TLS 1975/1079
M.22 Namwanga TLS 1975/1079
M.23 Nyiha TLS 1975/1079
M.24 Malila TLS 1975/1079
M.25 Safwa TLS 1975/1079
M.28 Lambya TLS 1975/1079
M.31 Nyakyusa TLS 1975/1079
M.42 Bemba Mann 1995/7200
M.64 Tonga Turner 1952/1900
N.11 Manda TLS 1975/1079
N.12 Ngoni TLS 1975/1079
N.13 Matengo TLS 1975/1079
N.14 Mpoto TLS 1975/1079
N.31b Cewa S&H 1957/5295
P.11 Ndengeleko Ndegeleko/1052
P.12 Rufiji TLS BantuLg/1052
P.13 Matumbi TLS BantuLg/1052
P.14 Ngindo TLS BantuLg/1052
P.15 Mbunga TLS 1975/1079
P.21 Yao BantuLg/1052
P.21 Yao Sanderson 1954/7433
P.22 Mwera TLS 1975/1079
P.23 Makonde TLS BantuLg/1052
P.25 Mabia TLS BantuLg/1052
S.10 Shona Taylor 1967/38000
S.16 Kalanga Mathangw. 1994/3765
S.31 Tswana Creissels 1995/6500
(NB: sets of historical reconstructions and a few small data sets are not
listed here)
The linking of the reconstructions will be accomplished in two ways: (i) manual "tagging" of modern forms, and (ii) computer-aided detection and verification of correspondences and reconstructions. These methods are outlined below.
(6) Prototype etymological tagger (using the FoxPro database management program)
As shown by the numerous variant reconstructions for bínà
`dance' (numbered 537 in (6) above), the reconciliation of the existing
Bantu reconstructions with each other is a required first step to effective
tagging. Therefore a new numbering scheme for Bantu reconstructions will be
devised and a consensus of Bantu researchers sought. This scheme will bring
together variant reconstructions (or "allofams", to use Matisoff's term)
under a single numeric rubric, thus avoiding the complications of using a
numbering scheme like that of Guthrie in which reconstructed forms were
inserted into the sequence after it was completed (resulting in numbers such as
1988 1/2 and 236c) and in which many obviously related forms were
not so marked, e.g.
(7) Variant reconstructions (after Guthrie 1967/71, vol. 2, p.130)
ProtoLg Reconstruction Gloss Source ID #
CB kímb wander about G1967.CB 1060
CB kímbid hurry G1967.CB 1062
(8) Near synonyms for `body'/`corpse' in 8 Bantu languages
omu   tuúmbi     N sg 3-4   corpse                        Kerewe    64900
óomu  biri       3          corps                         Shi       81588
omu   biri                  body                          Nyambo    88435
omu   tûmbi                 corpse                        Nyambo    88572
omu   biri       n          body; substance; fortune...   Kiga      98830
omu   tûmbi      n          corpse                        Kiga      99836
en    túmbi      n          corpse(s) of cow(s)           Kiga     100957
      -biri      n. 3/4     corps humain                  Nande    102620
m     tembo      n. 3       a corpse...                   Chewa    112115
n-    tembo      n. 3       corpse.                       Yao      114062
n-    tuvi       n. 3       corpse.                       Yao      116829
ci-   vidividi   n. 7       trunk of the body; torso.     Yao      120620
n     tumbú      n 3        a corpse                      Kalanga  124486

Once the phonological parsing (carried out by the finite state transducer described in §2.2.2 above) has provided a syllabic analysis, it is possible to compare constituents across syllable slots and propose these as correspondences:
(9) Above data (8) sorted after phonological parsing
C1  V11  V12  C2  V2  Tone (T11 T12 T2)  Language
b   i         r   i                      Kiga
b   i         r   i                      Nande
b   i         r   i                      Nyambo
b   i         r   i                      Shi
v   i         d   i                      Yao
t   e         mb  o                      Chewa
t   e         mb  o                      Yao
t   u         mb  i   H                  Kiga
t   u         mb  i   HL                 Kiga
t   u         mb  i   HL                 Nyambo
t   u         mb  u   H                  Kalanga
t   u    u    mb  i   H                  Kerewe
t   u         v   i                      Yao

(10) A few correspondences extracted from analyzed syllables with *PB reconstruction
Slot *PB Kerewe Shi Nyambo Kiga Nande Chewa Yao Kalanga
C1 *t t t t t t t t
C1 *b b b b v
C2 *mb mb mb mb mb mb mb
C2 *d r r r d
V1 *u uu u u e e u
V1 i i i i o o u
V2 *e i i i i

Using the modern forms and these correspondences (and others not shown), it is now a simple matter to generate actual protoforms for the reconstructed sets. Two reconstructions may be generated (or "confirmed") by the Reconstruction Engine on the basis of these correspondences and the cognate forms. One (-bèdè 3/4 body #112) may be found in Guthrie 1967. The other (-túmbi) is not in the list of reconciled reconstructions.
There are a number of details of implementation and heuristics not described here. The method as outlined here has a number of limitations which should be clear to anyone who has attempted to reconstruct a proto-lexicon. The technique has been applied successfully to Tibeto-Burman, and early trials with Bantu languages indicate that it is suitable for application in this family as well.
Related to this important aspect of the project, CBOLD, at the urging of collaborating researchers, will convert the entire list of Guthrie's supporting forms (vols. 3 and 4) used to set up his comparative series (and hence the sound correspondences between *PB and the individual languages). This would make these data, the results of twenty years of research, easily accessible to researchers.
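The core idea of confirming a reconstruction from correspondences can be sketched as follows: each parsed modern form is back-projected slot by slot through a correspondence table, and a proto-form is accepted only if all cognates agree. This is a simplified illustration, not the Reconstruction Engine itself. Tone is ignored, the correspondences shown cover only the `corpse' set, the *i value for V2 is taken from the reconstruction -túmbi cited above rather than from table (10), and the names `project` and `confirm` are hypothetical.

```python
# Illustrative correspondences: (slot, language, modern reflex) -> proto-segment.
CORRESPONDENCES = {
    ("C1", "Kerewe", "t"): "t", ("C1", "Nyambo", "t"): "t", ("C1", "Kiga", "t"): "t",
    ("V1", "Kerewe", "uu"): "u", ("V1", "Nyambo", "u"): "u", ("V1", "Kiga", "u"): "u",
    ("C2", "Kerewe", "mb"): "mb", ("C2", "Nyambo", "mb"): "mb", ("C2", "Kiga", "mb"): "mb",
    # Assumed from the cited reconstruction -tumbi; not listed in table (10).
    ("V2", "Kerewe", "i"): "i", ("V2", "Nyambo", "i"): "i", ("V2", "Kiga", "i"): "i",
}
SLOTS = ("C1", "V1", "C2", "V2")


def project(language, parsed):
    """Back-project one parsed form to a proto-form; None if any slot is unmatched."""
    proto = []
    for slot in SLOTS:
        segment = CORRESPONDENCES.get((slot, language, parsed[slot]))
        if segment is None:
            return None
        proto.append(segment)
    return "*-" + "".join(proto)


def confirm(cognates):
    """Return the shared proto-form if every cognate projects to the same one."""
    projections = {project(language, parsed) for language, parsed in cognates}
    if len(projections) == 1 and None not in projections:
        return projections.pop()
    return None


cognates = [
    ("Kerewe", {"C1": "t", "V1": "uu", "C2": "mb", "V2": "i"}),
    ("Nyambo", {"C1": "t", "V1": "u", "C2": "mb", "V2": "i"}),
    ("Kiga", {"C1": "t", "V1": "u", "C2": "mb", "V2": "i"}),
]
print(confirm(cognates))
```

A form with a reflex missing from the table makes `confirm` return `None`, which marks the set for manual review instead of forcing a reconstruction.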
(11) a. Front height harmony (FHH) : *i > e / { e, o } C __
b. Back height harmony (BHH) : *u > o / o C __
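The asymmetry between the two rules in (11) can be made concrete with a small sketch: *i lowers after either mid vowel, while *u lowers only after o. The code assumes a five-vowel transcription, treats every non-vowel character as a consonant, and applies the rules iteratively from left to right so that a lowered vowel can itself act as a trigger. These choices, and the function name `apply_height_harmony`, are assumptions of the illustration; the input shapes are schematic.

```python
VOWELS = set("aeiou")


def apply_height_harmony(form):
    """Apply FHH (i > e after e/o) and BHH (u > o after o) across one consonant span."""
    out = []
    prev_vowel = None        # most recent vowel, after any lowering
    seen_consonant = False   # True once a consonant follows prev_vowel
    for ch in form:
        if ch in VOWELS:
            if seen_consonant:
                if ch == "i" and prev_vowel in ("e", "o"):
                    ch = "e"   # front height harmony
                elif ch == "u" and prev_vowel == "o":
                    ch = "o"   # back height harmony
            prev_vowel, seen_consonant = ch, False
        else:
            seen_consonant = True
        out.append(ch)
    return "".join(out)


for shape in ["CeCiC", "CoCiC", "CoCuC", "CeCuC"]:
    print(shape, "->", apply_height_harmony(shape))
```

The function takes whatever string it is given, so a caller would pass only the span within which harmony is expected to hold.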
As a first step towards explaining the widespread asymmetry, almost unknown outside Bantu, the PI has created a preliminary vowel harmony database which now includes 130 languages and which he plans to extend to at least twice that number. Already a number of properties clearly cluster in languages which have what can be referred to as "canonical Bantu vowel harmony":
(12) a. VHH has the above asymmetry: i.e. independence of FHH and BHH
b. VHH is not conditioned by /a/ (i.e. it patterns with high vowels)
c. VHH does not apply to /a/
d. VHH does not apply to final vowel (FV) morpheme
e. VHH does not apply to prefix vowels
The database reveals exceptions to each of the above properties, however, particularly in the NW:
(13) a. Some languages have no VHH, e.g. Punu B.43, Lengola D.12, Enya D.14, N. Binja D.26, Chaga E.62, Suku H.32, Mbala H.41, Ruund L.53.
b. Prefixes harmonize in Londo A.11, Bakweri A.22, Nen A.44, Gunu A.62, Bobangi C.32, Mongo C.61, Tetela C.71, Kela C.72, Ombo C.76, Budu D.35, Logooli E.41, Gusii E.42.
c. Final vowel harmonizes in Bobe, Bia, Pinzi (etc.) B.30, Boma B.82, B.74b, Leke C.14, and in the perfective only, Kongo H.10, Yaka H.31.
d. Asymmetry is not found in zones A-B-C and Mituku D.13, Gusii E.42, Kuria E.43, Beembe H.11, Vili H.16d, Laadi H.16f, Mbundu H.21a + perfective in other Kongo H.10 and Yaka H.31.
e. /a/ conditions VHH in Boma B.74b (B.82), Mbundu H.21a, Mbunda K.15, Kwangali K.33, Kwezo K.35, Dciriku K.62, Pende L.11 (K.52), Mbundu R.11, Kwanyama R.21, Ndongo R.22, and Herero R.31.
f. /a/ undergoes VHH in Londo A.11, Bakweri A.22, Nen A.44, Gunu A.62, Kota B.25, Nzebi B.52, Tiene B.81, Boma B.74b (B.82), Leke C.14, Koyo C.24, Mboshi C.25, Doko C.31, Lingala C.36d, Ngombe C.41, Leku C.60, Bembe H.11, Lwalwa L.00 (cf. also Leitch 1996).
While the properties can vary as indicated above, interestingly, no language was found which has prefixal (i.e. right-to-left) VHH cooccurring with the front-back asymmetry. There are numerous questions for which (convincing) answers need to be sought: (i) Was there vowel harmony in *PB? (ii) Where was asymmetric VHH innovated and why do so many languages retain this typological anomaly? (iii) What is the relation between internal VHH, e.g. in verbs, and final VHH, e.g. in disyllabic nouns? Nande D/J.42, for instance, has the asymmetry cited above, such that we obtain -CeCeC-, -CoCeC-, -CoCoC- vs. -CeCuk-, but shows symmetric VHH of the second vowel of CVCV noun stems:
(14) V + FV in Nande CVCV Noun Stems
V/FV    i̧     i     e     u̧     u     o     a
i̧      31   ---     8     4   ---    25    35
i     ---    25   ---   ---     5    29    28
e      14   ---    70     4   ---    37    28
u̧      29   ---     4     7   ---    18    32
u     ---    15    10   ---    43    16    42
o      18   ---    16     5   ---    46    28
a      21    21    12    17    10    38   113

Thus, there are 37 disyllabic noun stems of the shape CeCo and none with the shape *CeCu. (Another asymmetry not found in -CVCVC- forms is the non-occurrence of noun stems of the shape *CiCe vs. the presence of CuCe.) One hypothesis is that internal and final harmonies are independent either in origin or, at least, in their being subjected differentially to the three pressures of vowel assimilation, vowel reduction, and vowel peripheralization (to /i, u, a/ in various positions). Hyman (1996b) represents a progress report on only part of a much more extensive study that is projected for the second funding period during which the database will be significantly expanded and improved.
(15) a. lók-a 'vomit' lósek-E 'cause to vomit' PB *-es- 'causative'
b. yók-a 'hear' yólek-E 'listen to' PB *-ed- 'applicative'
c. kab-a 'divide' kalab-a 'be divided' PB *-ad-? 'extensive'
d. sook-E 'put in' solek-E 'take out' PB *-ud- 'reversive'
Students of Bantu will immediately recognize the semantic and phonetic relatedness of the apparently infixed material to the reconstructed *PB derivational suffixes in the right column. Had the present-day Tiene affixal material been simply suffixed to the verb roots, as in almost every other Bantu language, we would have expected the derived forms to be lók-es-, yók-el-, kab-el-, and sok-el-, respectively. In fact, as seen in the following derivations, Ellington proposes exactly these underlying forms and derives the surface realizations by a synchronic consonant metathesis rule:
(16)  /lók-es-/   /yók-el-/   /kab-el-/   /sook-el-/   Underlying Reps.
                                          sok-el-      Vowel shortening
      lós-ek-     yól-ek-     kal-eb-     sol-ek-      Metathesis
While other more modern analyses are possible, what is important for us is to understand what is motivating this phenomenon. The PI, using a lexical database of Tiene based on Ellington, discovered (i) that verb stems are maximally trisyllabic, i.e. C1VC2VC3V, and (ii) that in such trisyllabic stems, C2 must be coronal and C3 must be non-coronal. Since the external suffixation of -es- or -el- to a C1VC2- root whose C2 is a non-coronal would violate this pattern, the C2 and the [l] or [s] are metathesized. While similar but less severe constraints are found in the nearby Teke languages, this is the only case known where maintaining the distribution of stem consonants by place of articulation forces a metathesis. (Other processes are at work when the normal derivation would produce coronal C2+C3 or non-coronal C2+C3.) This analysis explains the diachronic metathesis of reconstructed stem forms as well, e.g. PB *-kúkut- > kótoka `gnaw', *-túbud- > tólebE `pierce', *dí̧mid- > dínema `get lost' (the last case involving also the nasalization of *d to [n] as if it came after the following [m]). As is well known, many of the NW Bantu languages also have severe constraints on suffixation, e.g. losing certain derivational suffixes, restricting their cooccurrence, etc. The PI hypothesizes that prosodic constraints may also be at work in languages such as Koyo C.24, where one cannot combine a causative and a reciprocal suffix on a CVC- root, e.g. kór-a `tie', kór-is-a `cause to tie', kór-in-a `tie each other', but not *kór-is-in-a, *kór-in-is-a `cause to tie e.o./cause e.o. to tie'. However, double suffixation is possible with a CV- root, e.g. -tá-a `see', -tá-s-a `show' (i.e. `make see'), -tá-n-a `see each other', tá-s-an-a `show each other'. If we extend the Tiene facts to say that Koyo derived stems are maximally trisyllabic, then causative + reciprocal cannot occur on CVC- roots because they would produce too many syllables (four) with the final vowel -a.
(Of course some causativized CV roots may be "lexicalized", e.g. dz-és-a `cause to eat' > `feed', but this too is at least in part due to the fact that the forms are short.) With the expansion and refinement of the CBOLD database over this second period, the PI and other researchers will be able to study prosodic constraints on stems in NW Bantu that are hypothesized to play an important role in the gradual dissolution of the extensive Bantu suffix system as one goes from East to West.
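The Tiene metathesis pattern discussed above (C2 coronal, C3 non-coronal) can be sketched as a small function that attaches a -VC extension to a C1VC2- root and swaps the root-final consonant with the extension consonant when the root ends in a non-coronal. The coronal set and the function name `attach_extension` are assumptions of the illustration. Roots ending in a coronal are deliberately left unmodeled, since the text notes that other processes apply there.

```python
CORONALS = set("tdsznlr")  # assumed coronal consonants for this illustration


def attach_extension(root, extension):
    """Attach e.g. 'es' or 'el' to a C1VC2 root, metathesizing if C2 is non-coronal.

    Returns None for cases the sketch does not model (coronal C2 or
    non-coronal extension consonant).
    """
    c2 = root[-1]
    vowel, consonant = extension[:-1], extension[-1]
    if c2 not in CORONALS and consonant in CORONALS:
        # The coronal moves into C2 position; the original C2 becomes C3.
        return root[:-1] + consonant + "-" + vowel + c2
    return None


for root, extension in [("lók", "es"), ("yók", "el"), ("kab", "el"), ("sok", "el")]:
    print(root, "+", extension, "->", attach_extension(root, extension))
```

The four printed forms correspond to the Metathesis line of (16), with the shortened root sok- used for the fourth example.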