Dear colleagues,
below my signature you will find the program of the 9th German Stata Users Group Meeting in Bamberg. I would particularly like to draw your attention to the workshop "Survey Analysis with Stata", which takes place on the day _before_ the meeting. The workshop covers methods for analyzing data from complex samples and will be led by Jeffrey Pitblado, who is responsible for implementing these methods at StataCorp.
Best regards
Ulrich Kohler
2011 German Stata Users Group Meeting
-------------------------------------
The ninth German Stata Users Group Meeting will be held at the Otto-Friedrich-University Bamberg on Friday, July 1, 2011. Everybody, from anywhere, who is interested in using Stata is invited to attend this meeting. The meeting will include presentations about causal models, general statistics, and data management, given both by researchers and by StataCorp staff. It will also include a "wishes and grumbles" session, during which you may air your thoughts to Stata developers (see the program below and http://www.stata.com/meeting/germany11/).
On the day before the conference, Jeffrey Pitblado, Associate Director of StataCorp, will present a workshop on "Survey data analysis with Stata". The workshop will begin by reviewing the sampling methods used to collect survey data and how they affect the estimation of totals, ratios, and regression coefficients. It will then cover the three variance estimators implemented in Stata's survey estimation commands, as well as several more specific topics (see the description below and http://www.stata.com/meeting/germany11/workshop.html).
At additional cost, there is the option of an informal dinner at a restaurant in Bamberg on Friday evening. Details about accommodations and fees are given below the program and at http://www.stata.com/meeting/germany11/#registration.
Program of the Users Group Meeting (July 1, 2011)
-------------------------------------------------
Date:   July 1, 2011
Time:   8:45 AM to 6:00 PM
Venue:  Otto-Friedrich-University
        Aula (Dominikanerbau)
        Dominikanerstr. 2a
        96052 Bamberg
Fees:   Meeting only: 40 Euro (students 20 Euro)
        Workshop and meeting: 100 Euro
        Optional dinner: TBA
(Abstracts are given at the end of this email)
8:45–9:15    Registration

9:15–10:15   Structural equation modeling using gllamm, confa, and gmm
             Stas Kolenikov, University of Missouri–Columbia

10:15–11:15  Evaluating one-way and two-way cluster-robust covariance matrix estimates
             Mark E. Schaffer, Heriot-Watt University–Edinburgh

11:15–11:30  Coffee

11:30–12:00  Implementation of a multinomial logit model with fixed effects
             Klaus Pforr, Mannheim Center for European Social Research (MZES)

12:00–12:30  Plagiarism in student papers and cheating on exams: Results from surveys using special techniques for sensitive questions
             Ben Jann, University of Bern

12:30–1:30   Lunch

1:30–2:00    orderalpha: Nonparametric order-alpha efficiency analysis for Stata
             Harald Tauchmann, RWI

2:00–2:45    Investigating the effects of factor variables
             Jeff Pitblado, StataCorp

2:45–3:00    Coffee

3:00–3:30    Correlation metric
             Kristian B. Karlson, Danish National Center for Social Research and Center for Research in Compulsory Schooling

3:30–4:00    Comparing coefficients between nested nonlinear probability models
             Ulrich Kohler, WZB

4:00–4:30    SOEPlong: How to restructure complex longitudinal survey data (An application for the German Socio-Economic Panel study)
             Arno Simons, Innovation in Governance Research Group, Technische Universität Berlin
             Katja Möhring, Research Training Group SOCLIFE, University of Cologne
             Peter Krause, German Socio-Economic Panel Study (SOEP), DIW

4:30–4:45    Coffee

4:45–5:15    Report to users
             Bill Rising, StataCorp

5:15–6:00    Wishes and grumbles
             StataCorp
Program of the Workshop (June 30, 2011)
---------------------------------------
Date:       June 30, 2011
Time:       9:00 AM to 5:00 PM
Venue:      Otto-Friedrich-University
            Rechenzentrum (computer center), Room RZ 0.07
            Feldkirchenstr. 21
            96052 Bamberg
Presenter:  Jeff Pitblado, Associate Director, Statistical Software, at StataCorp
Fees:       €90 (workshop and meeting, €100)
Register:   anke.mrosek@dpc.de
The workshop covers how to use Stata for survey data analysis, assuming a fixed population. Knowledge of Stata is not required, but attendees are assumed to have some statistical knowledge, such as that typically gained in an introductory statistics course. We will begin by reviewing the sampling methods used to collect survey data and how they affect the estimation of totals, ratios, and regression coefficients. We will then cover the three variance estimators implemented in Stata’s survey estimation commands. Strata with a single sampling unit, certainty sampling units, subpopulation estimation, and poststratification also will be covered in some detail. Each of the following topics will be illustrated with an example in a Stata session:
1. Sampling design characteristics
   * cluster sampling
   * stratified sampling
   * sampling without replacement
2. Variance estimation
   * linearization
   * balanced repeated replication (BRR)
   * jackknife
3. Special types of sampling units
   * strata with a single sampling unit
   * certainty units
4. Restricted sample and subpopulation estimation
5. Poststratification
Registration
------------
Participants are asked to travel at their own expense. There will be a small registration fee (40 Euro, students 20 Euro) to cover the costs of coffee, tea, and lunch. There will also be an optional informal dinner at a restaurant in Bamberg on Friday evening at additional cost.
You can enroll by emailing Anke Mrosek (anke.mrosek@dpc.de) or by contacting her by post, phone, or fax:
Anke Mrosek
Dittrich & Partner Consulting GmbH
Prinzenstrasse 2
42697 Solingen
GERMANY
Telephone: +49 (0) 212-260660
Fax: +49 (0) 212 260 6666
Organizers
----------
(a) Scientific organizers
Johannes Giesecke University of Bamberg (johannes.giesecke@uni-bamberg.de)
Ulrich Kohler WZB Social Science Research Center, Berlin (kohler@wzb.eu)
(b) Logistics organizer
Dittrich & Partner (www.dpc.de), the distributor of Stata in several countries, including Germany, the Netherlands, Austria, the Czech Republic, and Hungary
Abstracts
---------
(1) Structural equation modeling using gllamm, confa, and gmm
    Stas Kolenikov
In this talk, I introduce the main ideas of structural equation models (SEMs) with latent variables and the Stata tools that can be used for such models. The two approaches most often used in applied work are numerical integration over the latent variables and covariance structure modeling. The first approach is implemented in Stata via gllamm, which was developed by Sophia Rabe-Hesketh. The second approach is currently implemented in confa for confirmatory factor analysis models. In addition, the generalized method of moments (GMM) estimation and testing framework introduced in Stata 11 makes it possible to estimate SEMs with moderately complex parameter and matrix manipulations. I provide working examples with some popular datasets (the Holzinger–Swineford factor analysis model and Bollen’s industrialization and political democracy model).
(2) Evaluating one-way and two-way cluster-robust covariance matrix estimates
    Mark E. Schaffer
Although cluster-robust standard errors are now recognized as essential in a panel-data context, official Stata supports only clusters that are nested within panels. This rules out defining clusters in the time dimension and thus modeling contemporaneous dependence among panel units' error processes. We build on recent analytical developments that define 2-way (and, conceptually, n-way) clustering, and on the 2010 implementation of 2-way clustering in the widely used ivreg2 and xtivreg2 packages. We use Monte Carlo techniques to illustrate the utility of 1-way and 2-way clustering, compare these with alternative approaches to modeling error dependence, and consider tests for clustering of errors.
(3) Implementation of a multinomial logit model with fixed effects
    Klaus Pforr
Fixed-effects models have become increasingly popular in sociology. Their ability to control for unobserved heterogeneity makes them a prime tool for causal analysis.
To date, fixed-effects models have been derived and implemented in many statistical software packages for continuous, dichotomous, and count-data dependent variables. However, for many important and popular statistical models, such as models for multinomial categorical dependent variables, only population-averaged estimators are available. A fixed-effects model for this case was derived in a seminal paper by Chamberlain (1980). Possible applications include analyses of effects on employment status with special consideration of part-time or irregular employment, and analyses of effects on voting behavior that implicitly control for long-term party identification rather than having to measure it directly. This model has not yet been implemented in any statistical software package.
In this presentation, I show a first version of an ado-file that closes this gap. The implementation draws on Stata's native multinomial logit and conditional logit implementations. The ml evaluator itself uses Mata functions to implement the conditional likelihood function. To demonstrate the numerical stability and computational speed of the implementation, I show comparisons with the built-in clogit command, as well as some basic results with simulated data.
Plagiarism in student papers and cheating on exams: Results from surveys using special techniques for sensitive questions
    Ben Jann
Eliciting truthful answers to sensitive questions is an age-old problem in survey research. Respondents tend to underreport socially undesirable or illegal behaviors and to overreport socially desirable ones. To combat such response bias, various techniques have been developed that aim to give respondents greater anonymity and to minimize their feelings of jeopardy. Examples are the Randomized Response Technique, the Item Count Technique, and the Crosswise Model. I will present results from several surveys among university students that employ such techniques to measure the prevalence of plagiarism and cheating on exams. User-written Stata programs for analyzing data collected with these techniques will also be presented.
(4) orderalpha: Nonparametric order-alpha efficiency analysis for Stata
    Harald Tauchmann
Despite their frequent use in applied work, the nonparametric approaches to efficiency analysis, namely Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH), have a bad reputation among econometricians. This is mainly because DEA and FDH are deterministic approaches that are highly sensitive to outliers and measurement error. Recently, however, so-called partial frontier approaches, namely order-m (Cazals et al., 2002) and order-alpha (Aragon et al., 2005), have been developed. They generalize FDH by allowing super-efficient observations to lie beyond the estimated production-possibility frontier. Although these approaches are also purely nonparametric, they are substantially less sensitive to outliers because they envelop only a subsample of the observations. We present the new Stata command orderalpha, which implements order-alpha efficiency analysis in Stata. The command offers several options, such as statistical inference based on subsampling bootstrap. In addition, we present the accompanying Stata command oaoutlier, an exploratory tool that uses orderalpha to detect potential outliers in data intended for subsequent efficiency analysis using DEA.
(5) Investigating the effects of factor variables
    Jeff Pitblado, StataCorp
Stata has a rich set of operators for specifying factor variables in linear and nonlinear regression models. I will show how to test for the effects of factor variables in these models. I will also show how to compare and contrast these effects using linear combinations of the model coefficients.
(6) Correlation metric
    Kristian B. Karlson
The logit model is a widely used regression technique in social research. However, the use and interpretation of coefficients from these models have proven contentious. These problems arise because the mean and the variance of discrete variables cannot be separated. Logit coefficients are identified relative to an arbitrary scale, which makes them difficult both to interpret and to compare across groups or samples: Do differences in coefficients reflect true differences or differences in scale? This cross-sample comparison problem raises concerns for comparative research. We suggest a new correlation metric, derived from logit models, that gives a new interpretation to the estimates of logit models (log odds-ratios). The metric paves the way for a reorientation in the use of logit models, because it helps to clarify what logit coefficients are and how and when they can (or cannot) be used in comparative research. The metric recovers the correlation between a predictor variable x and a continuous latent outcome variable y* assumed to underlie a binary observed outcome y. This metric is truly invariant to differences in the marginal distributions of x and y* across groups or samples, making it suitable for real applications in comparative research. Our derivations also extend to the probit model and to ordered and multinomial models. The new metric is implemented in the Stata command -nlcorr-.
(7) Comparing coefficients between nested nonlinear probability models
    Ulrich Kohler
In a series of recent articles, Karlson, Holm, and Breen have developed a method for comparing the estimated coefficients of nested nonlinear probability models. The KHB method is a general decomposition method that is unaffected by the rescaling or attenuation bias that arises in cross-model comparisons in nonlinear models. It recovers the degree to which a control variable, Z, mediates or explains the relationship between X and a latent outcome variable, Y*, underlying the nonlinear probability model. It also decomposes effects of both discrete and continuous variables, applies to average partial effects, and provides analytically derived statistical tests. The method can be extended to other models in the GLM family. This presentation describes the method and the user-written program -khb- that implements it.
(8) SOEPlong: How to restructure complex longitudinal survey data (An application for the German Socio-Economic Panel Study)
    Arno Simons, Katja Möhring, Peter Krause
The social and behavioral sciences currently show an increasing demand for complex longitudinal household survey data for national and cross-national analyses. The state of the art (for national as well as international comparative data collections) offers two types of solutions: either the full presentation of all original wave-specific variables over time, or the creation of fixed variables according to common time-consistent standards. The first type of solution leaves it to researchers to decide how to harmonize categories that differ over time and is thus rather time-consuming. The second type is very easy to use; however, it does not inform users about extensions or modifications that may have been necessary in specific years. In both cases, the researcher has no further information on potential changes of variables over time. This paper addresses how complex representative longitudinal data can be disseminated for analyses in the social (and behavioral) sciences such that the time needed for data preparation is reduced to a minimum while information on the consistency and changes of variables over time remains fully available. It turns out that, if we want to monitor changes in living conditions through continuous, regular observation in panel surveys, adaptations of variables seem to be the rule rather than the exception. Our solution for restructuring longitudinal data therefore accommodates ongoing adaptations of variables, which reflect measures adapted to new social conditions, new theoretical backgrounds, or improved concepts for monitoring changes in living conditions over time. Using Stata, we provide a conceptual and technical solution for restructuring the full set of SOEP variables, with complete documentation of all adaptations over time.
Our Stata programs generate two output files: one containing the restructured data and another containing the full documentation on the consistency of the variables across all waves. SOEPlong was first released in 2010 as a beta version, together with the regular DVD distribution of the full set of SOEP variables for 26 waves of data. Although the paper specifically addresses the German Socio-Economic Panel (SOEP) study, our general approach to dealing with complex household panel data could well be applied to other national and cross-national longitudinal household surveys.