Our information on training is available either as a dummy variable (for whether the individual experienced any training event during the reference period) or as a continuous variable (the average number of hours of training per month) with left censoring at zero hours. Assuming that the individual effect $\mu$ and the errors $\varepsilon_F$ and $\varepsilon_O$ are normally distributed, we use either a probit or a tobit model to study training incidence and intensity. In the former case we explicitly take into account the possibility of contemporaneous correlation between errors by using a bivariate probit.
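As an illustration of the two likelihoods involved, the following minimal sketch writes out the probit probability of a training event and the log-likelihood of a tobit with left censoring at zero hours. It is a simplification under stated assumptions: the individual effect and the cross-equation correlation handled by the bivariate probit are ignored, and the function names and the linear index values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def probit_prob(index):
    """Probability of any training event given the linear index x'b."""
    return norm_cdf(index)

def tobit_loglik(hours, index, sigma):
    """Log-likelihood of a tobit with left censoring at zero.

    hours: observed monthly training hours (0 means censored)
    index: linear index x'b for each observation (hypothetical values)
    sigma: standard deviation of the normal error
    """
    total = 0.0
    for h, xb in zip(hours, index):
        if h <= 0.0:
            # Censored: P(latent hours <= 0) = Phi(-x'b / sigma)
            total += math.log(norm_cdf(-xb / sigma))
        else:
            # Uncensored: normal density of the scaled residual
            total += norm_logpdf((h - xb) / sigma) - math.log(sigma)
    return total

# Hypothetical example: one censored and one uncensored observation.
print(tobit_loglik([0.0, 3.0], [0.5, 2.0], 1.5))
```

Maximizing `tobit_loglik` over the coefficients and `sigma` would give the tobit estimates; the censored branch is what distinguishes it from a linear regression on hours.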
A feature of equations (1)-(3) is that they include unmeasured individual talent $\mu$, which is correlated with educational attainment if the more talented are also more likely to be better educated, a plausible assumption. If education affects the returns to training, but unmeasured talent does not, then a fixed-effects estimator will remove the time-invariant individual effect from (3) and produce unbiased estimates. However, if talent affects the private returns to training, fixed-effects estimation will produce a biased estimate of the impact of education on these returns. To avoid this bias, we have to instrument education in the fixed-effects estimate of the earnings equation. [FN4]
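The first case above, in which talent enters only as a time-invariant level effect, can be illustrated with a minimal sketch of the within (fixed-effects) transformation. The data are invented: two workers, a talent level that is correlated with training, and a true return of 0.5. The sketch does not cover the second case, in which talent also shifts the return to training.

```python
from collections import defaultdict

def pooled_slope(x, y):
    """OLS slope of y on x, ignoring the individual effect."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def within_slope(ids, x, y):
    """Fixed-effects slope: demean x and y within each individual first."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, xi, yi in zip(ids, x, y):
        sums[i][0] += xi
        sums[i][1] += yi
        sums[i][2] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        sx, sy, n = sums[i]
        xd, yd = xi - sx / n, yi - sy / n  # the individual effect drops out here
        num += xd * yd
        den += xd * xd
    return num / den

# Hypothetical panel: the worker with more talent also trains more.
ids = [1, 1, 1, 2, 2, 2]
talent = {1: 0.0, 2: 5.0}
training = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
log_wage = [talent[i] + 0.5 * t for i, t in zip(ids, training)]

print(pooled_slope(training, log_wage))        # overstates the return
print(within_slope(ids, training, log_wage))   # 0.5
```

Because talent is constant within each worker, subtracting worker means removes it, and the within slope recovers the assumed return while the pooled slope is inflated by the correlation between talent and training.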
The training equations (1) and (2) can be treated as limited information simultaneous limited dependent variable models, as in Smith and Blundell (1986), and the correlation between education and unmeasured ability can generate a simultaneous equation bias. We deal with the potential endogeneity of education as follows. First, we assume that unobserved ability is partly the consequence of the genetic and environmental contributions of the family (see Willis 1986; Plug and Vijverberg 2003) and include in the training equations the father's, mother's, and oldest sibling's education [FN5] and the number of siblings. The underlying idea is that cognitive development in relatively poor economies is affected both by parental education and by nutritional status--see, for instance, Behrman et al. (2003) and Martorell (1997)--and that the latter is determined in part by the resources devoted to each child, which are related in turn to the number of siblings. We also add province of birth dummies, because the local environment matters in the development of individual talent. [FN6] Even after the analysis conditions on family background and the province of birth, however, residual ability could be correlated with educational attainment. Therefore, we need instrumental variables. Let educational attainment E be given by
(4) $E = \Pi Z + \varepsilon_E$,
where Z is a vector of exogenous variables that includes individual characteristics such as gender and a third-order polynomial in age, family background variables, and province of birth dummies, plus at least one variable omitted from equations (1)-(3).
Our instruments for educational attainment--which we omit from the training and earnings equations--are birth order (a dummy taking the value 1 if the individual is the oldest son or daughter and 0 otherwise), the mother's age at the time of the individual's birth, and the interaction of those two variables. These variables capture household preferences in the decision to provide education to the offspring, and are not related in any obvious way to unobserved ability once we have conditioned on parental education, the number of siblings, and province of birth. For instance, the oldest son or daughter may have priority in the allocation of the resources devoted by the household to education. Moreover, very young mothers may value the education of their offspring less than more mature mothers do. We focus on the mother rather than the father because the mother's age at the time of the interviewed individual's birth is less likely than the father's to be correlated with available household resources--and nutrition--due to the lower labor force participation of women, the less accentuated life-cycle pattern of female earnings, or both.
We fit years of education on the variables included in the vector Z and use the Bound F-test to verify whether the selected instruments are jointly statistically significant in the first stage regression. Following Smith and Blundell (1986), we also compute residuals and add them to the explanatory variables in (1) and (2), in order to test whether education can be treated as weakly exogenous with respect to training.
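The first-stage check described above can be sketched as a joint F-test that compares the regression of education on the included controls with the regression that adds the excluded instruments. This is a minimal illustration and not the authors' code: the data are simulated, every coefficient is invented, the control set is reduced to a constant and father's education, and the helper names are hypothetical. The second step, adding the first-stage residuals to the training equations, is not shown.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_rss(rows, y):
    """Residual sum of squares from OLS of y on the regressors in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def first_stage_f(controls, instruments, educ):
    """F statistic for the joint significance of the excluded instruments."""
    full = [c + z for c, z in zip(controls, instruments)]
    rss_restricted = ols_rss(controls, educ)   # instruments left out
    rss_unrestricted = ols_rss(full, educ)     # instruments included
    q = len(instruments[0])                    # number of restrictions
    dof = len(educ) - len(full[0])             # residual degrees of freedom
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / dof)

# Simulated data; all numbers below are assumptions for illustration only.
rng = random.Random(0)
controls, instruments, educ = [], [], []
for _ in range(400):
    father_educ = float(rng.randint(0, 16))
    firstborn = 1.0 if rng.random() < 0.4 else 0.0
    mother_age = rng.uniform(16, 40)
    controls.append([1.0, father_educ])
    instruments.append([firstborn, mother_age, firstborn * mother_age])
    educ.append(4.0 + 0.3 * father_educ + 1.0 * firstborn
                + 0.1 * mother_age + rng.gauss(0.0, 1.0))

f_stat = first_stage_f(controls, instruments, educ)
print(f"first-stage F on excluded instruments: {f_stat:.1f}")
```

A large statistic indicates that birth order, mother's age, and their interaction jointly help predict education once the controls are included; a small one would signal weak instruments.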
Data
The employee survey on which our empirical investigation is based covers firms belonging to four manufacturing sectors: food processing, auto parts, hard disk drives (HDD), and computer components. The latter two industries are high-tech and dominated by subsidiaries of foreign manufacturers. Thailand is one of the largest production locations for hard disk drives and related components, and this industry is one of the country's major exporters (see Doner and Brimble 1998). The first two industries use more labor-intensive production technologies and include a substantial share of domestic firms. Despite being high-tech, HDD and computer firms are also fairly labor-intensive, because production is outsourced to Thailand from abroad to take advantage of the favorable price of labor.
Although we do not claim that this selection of industries yields a statistically representative sample, we believe it provides reasonable coverage of Thai industry. Due to research budget constraints, we restricted our attention to firms with plants located in the Greater Bangkok area and with more than 100 employees. Firms in the four industries were approached and asked to participate in the survey. Overall, twenty firms agreed to participate--five in food processing, five in auto parts, six in personal computers, and four in the HDD industry. Each of the firms in the sample had more than 100 employees (in the HDD industry, more than 1,000). After restricting our sample to production workers, technicians, and engineers, we stratified employment in each firm by age and education and randomly sampled employees within each cell, using larger weights for smaller firms.
Each selected employee was interviewed in the summer of 2001 by trained personnel hired by the Thailand Development Research Institute (TDRI), which cooperated in the project. Because the questionnaire was rather lengthy (121 questions), individual interviews lasted, on average, 40 minutes. The questionnaire asked for detailed information on family background, education, previous job experience, current job or position, training, and monthly labor income net of bonuses but including overtime.
The questions on wages and training were asked not only for the reference period of the survey (year 2001) but also for the years 1998-2000. The time framing of some of the retrospective questions was designed to generate predetermined variables. For example, monthly wages were asked with reference to January of each year, and questions on training incidence referred to the calendar year. Therefore, training in 1999 can be considered predetermined with respect to wages in 2000, which are measured in January 2000. Our empirical results are based on the sample covering all available years. Since recall data are affected by different types of measurement error (see Beckett et al. 2001 for a review), we also check whether restricting attention to the subsample covering only the last year in the sample (2001) makes an appreciable difference.
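The timing argument above amounts to pairing each January wage with training received in the previous calendar year. A minimal sketch of that pairing follows; the record layout and the field names (`worker`, `year`, `wage_jan`, `trained`) are hypothetical and the wage figures are invented.

```python
def pair_lagged_training(records):
    """Pair the January wage of year t with training incidence in year t-1.

    records: list of dicts with keys worker, year, wage_jan, trained
             (hypothetical field names, one dict per worker-year).
    """
    by_key = {(r["worker"], r["year"]): r for r in records}
    rows = []
    for (worker, year) in sorted(by_key):
        previous = by_key.get((worker, year - 1))
        if previous is None:
            continue  # first observed year has no predetermined training value
        rows.append({
            "worker": worker,
            "year": year,
            "wage_jan": by_key[(worker, year)]["wage_jan"],
            "trained_lag": previous["trained"],
        })
    return rows

# Invented records for one worker over the survey years.
records = [
    {"worker": 1, "year": 1998, "wage_jan": 6000, "trained": 0},
    {"worker": 1, "year": 1999, "wage_jan": 6200, "trained": 1},
    {"worker": 1, "year": 2000, "wage_jan": 6900, "trained": 0},
    {"worker": 1, "year": 2001, "wage_jan": 7000, "trained": 1},
]
paired = pair_lagged_training(records)
for row in paired:
    print(row)
```

In this layout the wage measured in January 2000 is matched with training in 1999, so the training variable is fixed before the wage is observed; the first year of each worker's history is dropped because it has no lagged value.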