Stata Users Group Meeting

17 May 2011


Dear colleagues,
below my signature you will find the program of the 9th German Stata
Users Group Meeting in Bamberg. I would like to draw particular attention
to the workshop "Survey Analysis with Stata", which takes place on the day
_before_ the meeting. The workshop covers methods for analyzing data from
complex samples and is led by Jeffrey Pitblado, who is responsible at
Stata for the implementation of these methods.
Kind regards,
Ulrich Kohler
2011 German Stata Users Group Meeting
--------------------------------------
The ninth German Stata Users Group Meeting will be held at the
Otto-Friedrich-University Bamberg on Friday, July 1, 2011. Everybody from
anywhere who is interested in using Stata is invited to attend this
meeting. The meeting will include presentations about causal models,
general statistics, and data management, both by researchers and by
StataCorp staff. The meeting will also include a "wishes and grumbles"
session, during which you may air your thoughts to Stata developers (see
program below and on http://www.stata.com/meeting/germany11/).
On the day before the conference, Jeffrey Pitblado, Associate Director
at StataCorp, will present a workshop on "Survey data analysis with
Stata". The workshop will begin by reviewing the sampling methods used
to collect survey data and how they affect the estimation of totals,
ratios, and regression coefficients. It will then cover the
three variance estimators implemented in Stata's survey estimation
commands, as well as several more specific topics (see description below
and http://www.stata.com/meeting/germany11/workshop.html).
There is (at additional cost) the option of an informal meal at a
restaurant in Bamberg on Friday evening. Details about accommodations
and fees are given below the program and at
http://www.stata.com/meeting/germany11/#registration.
Program of the User Meeting (July 1st 2011)
-------------------------------------------
Date:      July 1, 2011
Time:      8:45 AM to 6:00 PM 
Venue:     Otto-Friedrich-University
           Aula (Dominikanerbau) 
           Dominikanerstr. 2a 
           96052 Bamberg
Fees:      Meeting only: 40 Euro (students 20 Euro) 
           Workshop and meeting: 100 Euro
           Optional Dinner: TBA
(Abstracts are given at the end of this email)
8:45–9:15 
Registration
9:15–10:15 
Structural equation modeling using gllamm, confa, and gmm
Stas Kolenikov
University of Missouri–Columbia
10:15–11:15 
Evaluating one-way and two-way cluster-robust covariance matrix
estimates
Mark E. Schaffer
Heriot-Watt University–Edinburgh
11:15–11:30 
Coffee
11:30–12:00 
Implementation of a multinomial logit model with fixed effects
Klaus Pforr
Mannheim Center for European Social Research (MZES)
12:00–12:30 
Plagiarism in student papers and cheating on exams: Results from surveys
using special techniques for sensitive questions
Ben Jann
University of Bern
12:30–1:30 
Lunch
1:30–2:00 
orderalpha: Nonparametric order-alpha efficiency analysis for Stata
Harald Tauchmann
RWI
2:00–2:45 
Investigating the effects of factor variables
Jeff Pitblado
StataCorp
2:45–3:00 
Coffee
3:00–3:30 
Correlation metric
Kristian B. Karlson
Danish National Center for Social Research and the Center for Research
in Compulsory Schooling
3:30–4:00 
Comparing coefficients between nested nonlinear probability models
Ulrich Kohler
WZB
4:00–4:30 
SOEPlong: How to restructure complex longitudinal survey data (An
application for the German Socio-Economic Panel study)
Arno Simons
Innovation in Governance Research Group, Technische Universität Berlin
Katja Möhring
Research Training Group SOCLIFE, University of Cologne
Peter Krause
German Socio-Economic Panel Study (SOEP), DIW
4:30–4:45 
Coffee
4:45–5:15 
Report to users
Bill Rising
StataCorp
5:15–6:00 
Wishes and grumbles
StataCorp
Program of the Workshop (June 30th 2011)
----------------------------------------
Date:      June 30, 2011
Time:      9:00 AM to 5:00 PM 
Venue:     Otto-Friedrich-University 
           Rechenzentrum (computer center)
           Room RZ 0.07 
           Feldkirchenstr. 21
           96052 Bamberg
Presenter: Jeff Pitblado
           Associate Director, Statistical Software, at StataCorp. 
Fees:      €90 (workshop and meeting, €100) 
Register:  anke.mrosek@dpc.de
The workshop covers how to use Stata for survey data analysis, assuming
a fixed population. Knowledge of Stata is not required, but attendees
are assumed to have some statistical knowledge, such as that typically
gained in an introductory statistics course. We will begin by reviewing
the sampling methods used to collect survey data and how they affect the
estimation of totals, ratios, and regression coefficients. We will then
cover the three variance estimators implemented in Stata’s survey
estimation commands. Strata with a single sampling unit, certainty
sampling units, subpopulation estimation, and poststratification also
will be covered in some detail. Each of the following topics will be
illustrated with an example in a Stata session:
1. Sampling design characteristics
     * Cluster sampling
     * Stratified sampling
     * Sampling without replacement
2. Variance estimation
     * Linearization
     * Balanced repeated replication (BRR)
     * Jackknife
3. Special types of sampling units
     * Strata with a single sampling unit
     * Certainty units
4. Restricted sample and subpopulation estimation
5. Poststratification
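As a rough illustration of the jackknife item in the outline above, the
following Python sketch computes a delete-one-cluster jackknife variance
for a weighted mean. It is not taken from the workshop materials and is
not Stata's implementation: the function names, the data layout (a list
of clusters, each a list of (value, weight) pairs), and the restriction
to a single stratum without finite population correction are all
assumptions made for this example.

```python
def weighted_mean(clusters):
    """Weighted mean over all (value, weight) pairs in all clusters."""
    total = sum(w * y for cluster in clusters for y, w in cluster)
    weight = sum(w for cluster in clusters for _, w in cluster)
    return total / weight


def jackknife_variance(clusters):
    """Delete-one-cluster jackknife variance of the weighted mean.

    Assumes a single stratum, at least two clusters, and no finite
    population correction (simplifications chosen for this sketch).
    """
    n = len(clusters)
    if n < 2:
        raise ValueError("need at least two clusters")
    # Re-estimate the mean n times, leaving out one cluster each time.
    replicates = [
        weighted_mean(clusters[:i] + clusters[i + 1:]) for i in range(n)
    ]
    center = sum(replicates) / n
    # Scale the squared deviations of the replicates by (n - 1) / n.
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)


# Hypothetical data: four clusters, one equally weighted observation each.
example = [[(1.0, 1.0)], [(2.0, 1.0)], [(3.0, 1.0)], [(4.0, 1.0)]]
print(weighted_mean(example))
print(jackknife_variance(example))
```

With one equally weighted observation per cluster, as in the example
data, the result coincides with the textbook s^2/n variance of a sample
mean, which makes the sketch easy to check by hand.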
Registration
-------------
Participants are asked to travel at their own expense. There will be a
small registration fee (40 Euro, Students 20 Euro) to cover the costs
for coffee, tea, and luncheon. There will also be an optional informal
meal at a restaurant in Bamberg on Friday evening for an additional
cost.
You can enroll by emailing Anke Mrosek (anke.mrosek@dpc.de) or by
writing, phoning, or faxing to
Anke Mrosek 
Dittrich & Partner Consulting GmbH
Prinzenstrasse 2
42697 Solingen
GERMANY
Telephone: +49 (0) 212-260660 
Fax: +49 (0) 212 260 6666
Organizers
-----------
(a) Scientific organizers
Johannes Giesecke
University of Bamberg
(johannes.giesecke@uni-bamberg.de)
Ulrich Kohler
WZB Social Science Research Center, Berlin
(kohler@wzb.eu)
(b) Logistics organizer
Dittrich & Partner (www.dpc.de), the distributor of Stata in several
countries, including Germany, The Netherlands, Austria, Czech Republic,
and Hungary
Abstracts
----------
(1) Structural equation modeling using gllamm, confa and gmm
Stas Kolenikov
In this talk, I introduce the main ideas of structural equation models
(SEMs) with latent variables and Stata tools that can be used for such
models. The two approaches most often used in applied work are numeric
integration of the latent variables and covariance structure
modeling. The first approach is implemented in Stata via gllamm
(developed by Sophia Rabe-Hesketh). The second approach is currently
implemented in confa for confirmatory factor analysis models. Also, the
introduction of the generalized method of moments (GMM) estimation and
testing framework in Stata 11 made it possible to estimate SEMs by using
moderately complex parameter and matrix manipulations. I provide working
examples with some popular datasets (Holzinger–Swineford factor analysis
model and Bollen’s industrialization and political democracy model).
(2) Evaluating one-way and two-way cluster-robust covariance matrix
estimates
Mark E. Schaffer
Although cluster-robust standard errors are now recognized as
essential in a panel data context, official Stata only supports
clusters that are nested within panels. This rules out the
possibility of defining clusters in the time dimension, and
modeling contemporaneous dependence of panel units' error
processes. We build upon recent analytical developments that
define 2-way (and conceptually n-way) clustering, and the
implementation in 2010 of 2-way clustering in the widely used
ivreg2 and xtivreg2 packages. We illustrate the utility
of 1-way and 2-way clustering using Monte Carlo techniques,
compare it with alternative approaches to modeling error
dependence, and consider tests for clustering of errors.
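To make the two-way idea concrete: in the usual formulation of two-way
clustering, the variance estimate is the one clustered on the first
dimension, plus the one clustered on the second, minus the one clustered
on their intersection. The Python sketch below applies this to a
one-regressor model without an intercept. It is an illustration under
stated assumptions, not the ivreg2 or xtivreg2 code: the function names
and the toy data are hypothetical, and no finite-sample correction is
applied.

```python
from collections import defaultdict


def slope(x, y):
    """OLS slope for the model y = b * x + e (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cluster_variance(x, y, groups):
    """One-way cluster-robust variance of the slope (no correction)."""
    b = slope(x, y)
    # Sum the scores x_i * e_i within each cluster.
    score_sums = defaultdict(float)
    for xi, yi, g in zip(x, y, groups):
        score_sums[g] += xi * (yi - b * xi)
    meat = sum(s * s for s in score_sums.values())
    bread = sum(a * a for a in x)
    return meat / bread ** 2


def two_way_variance(x, y, g1, g2):
    """Two-way variance: V(g1) + V(g2) - V(intersection of g1 and g2)."""
    both = list(zip(g1, g2))
    return (cluster_variance(x, y, g1)
            + cluster_variance(x, y, g2)
            - cluster_variance(x, y, both))


# Hypothetical toy panel: two firms observed in two years.
x = [1.0, 1.0, 1.0, 1.0]
y = [1.0, 2.0, 3.0, 4.0]
firm = ["a", "a", "b", "b"]
year = [1, 2, 1, 2]
print(cluster_variance(x, y, firm))
print(two_way_variance(x, y, firm, year))
```

Because one term is subtracted, this combination is not guaranteed to be
positive in small samples, which is one reason evaluating such estimates
by simulation is useful.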
(3) Implementation of a multinomial logit model with fixed
effects 
Klaus Pforr
Fixed-effects models have become increasingly popular in the field
of sociology. The possibility to control for unobserved
heterogeneity makes these models a prime tool for causal
analysis.
As of today, fixed-effects models have been derived and
implemented in many statistical software packages for
continuous, dichotomous, and count-data dependent variables. There
are still many important and popular statistical models, however,
for which only population-average estimators are available, such
as models for multinomial categorical dependent variables. Such a
model was derived in a seminal paper by Chamberlain (1980).
Possible applications include analyses of effects on
employment status with special consideration of part-time or
irregular employment, and analyses of effects on voting
behavior that implicitly control for long-term party
identification rather than having to measure it directly. This
model has not yet been implemented in any statistical software
package.
In this presentation, I show a first version of an ado-file that
closes this gap. The implementation draws on the native Stata
multinomial logit and conditional logit model
implementations. The actual ml evaluator uses Mata functions
to implement the conditional likelihood function. To show the
numerical stability and computational speed of the
implementation, I present comparisons with the built-in clogit
and some basic results with simulated data.
Plagiarism in student papers and cheating on exams: Results from
surveys using special techniques for sensitive questions
Ben Jann
Eliciting truthful answers to sensitive questions is an age-old
problem in survey research. Respondents tend to underreport
socially undesired or illegal behaviors, while overreporting
socially desirable ones. To combat such response bias, various
techniques have been developed that are geared toward providing
the respondent greater anonymity and minimizing the respondent’s
feelings of jeopardy. Examples of such techniques are the
Randomized Response Technique, the Item Count Technique, and the
Crosswise Model. I will present results from several surveys,
conducted among university students, that employ such techniques
to measure the prevalence of plagiarism and cheating on
exams. User-written Stata programs for analyzing data from such
techniques are also presented.
(4) orderalpha: Nonparametric order-alpha efficiency analysis for
Stata
Harald Tauchmann
Despite their frequent use in applied work, nonparametric
approaches to efficiency analysis, namely Data Envelopment
Analysis (DEA) and Free Disposal Hull (FDH), have a bad reputation
among econometricians. This is mainly because DEA and FDH
are deterministic approaches that are highly sensitive
to outliers and measurement errors. Recently, however, so-called
partial frontier approaches -- namely order-m (Cazals et al.,
2002) and order-alpha (Aragon et al., 2005) -- have been
developed, which generalize FDH by allowing for super-efficient
observations located beyond the estimated
production-possibility frontier. Although these methods are also
purely nonparametric, their sensitivity to outliers is substantially
reduced because they envelop just a subsample of observations. We
present the new Stata command orderalpha, which implements
order-alpha efficiency analysis in Stata. The command allows for
several options, such as statistical inference based on subsampling
bootstrap. In addition, we present the accompanying Stata command
oaoutlier, an exploratory tool that employs orderalpha
to detect potential outliers in data intended for subsequent
efficiency analysis using DEA.
(5) Investigating the effects of factor variables
Jeff Pitblado, StataCorp
Stata has a rich set of operators for specifying factor variables
in linear and nonlinear regression models. I will show how to
test for the effects of factor variables in these models. I will
also show how to compare and contrast these effects using linear
combinations of the model coefficients.
(6) Correlation metric
Kristian B. Karlson
The logit model is a widely used regression technique in social
research. However, the use and interpretation of coefficients
from these models have proven contentious. These problems arise
because the mean and the variance of discrete variables cannot be
separated. Logit coefficients are identified relative to an
arbitrary scale, which makes the coefficients difficult both to
interpret and to compare across groups or samples: Do differences
in coefficients reflect true differences or differences in
scales? This cross-sample comparison problem raises concerns for
comparative research. However, we suggest a new correlation
metric, derived from logit models, which gives new interpretation
to the estimates of logit models (log odds-ratios). The metric
leads the way to a reorientation of the use of logit models,
because it helps to clarify what logit coefficients are and how
and when logit coefficients can (or cannot) be used in
comparative research. The metric recovers the correlation between
a predictor variable x and a continuous latent outcome variable
y* assumed to underlie a binary observed outcome y. This metric
is truly invariant to differences in the marginal distributions
of x and y* across groups or samples, making it suitable for
situations met in real applications in comparative research. Our
derivations also extend to the probit and to ordered and
multinomial models. The new metric is implemented in the Stata
command -nlcorr-.
(7) Comparing coefficients between nested nonlinear probability models
Ulrich Kohler
In a series of recent articles, Karlson, Holm, and Breen have
developed a method for comparing estimated coefficients of
nested nonlinear probability models. The KHB method is a general
decomposition method that is unaffected by the rescaling or
attenuation bias that arises in cross-model comparisons in
nonlinear models. It recovers the degree to which a control
variable, Z, mediates or explains the relationship between X and
a latent outcome variable, Y*, underlying the nonlinear
probability model. It also decomposes effects of both discrete
and continuous variables, applies to average partial effects, and
provides analytically derived statistical tests. The method can
be extended to other models in the GLM-family. This presentation
describes this method and the user-written program -khb- that
implements the method.
(8) SOEPlong -- How to restructure complex longitudinal survey data.
An Application for the German Socio-Economic Panel Study
Arno Simons, Katja Möhring, Peter Krause
The social and behavioral sciences currently show an
increasing demand for complex longitudinal household survey data
for national and cross-national analyses. The state of the
art (for national as well as international comparative data
collections) provides two types of solutions: either the full
presentation of all original wave-specific variables over time,
or the creation of fixed variables according to common
time-consistent standards. The first type of solution leaves it
to the researcher to choose how to encapsulate differing
categories over time, and thus, is rather time-consuming. The
second type of solution is very easy to use; however, it doesn’t
provide the user with information on perhaps necessary annual
extensions or modifications for specific years. In both cases the
researcher has no further information on potential changes of
variables over time. This paper addresses the topic of how
complex representative longitudinal data can be disseminated for
analyses in the social (and behavioral) sciences such that the
amount of time for data preparation is reduced to a minimum while
information on consistency and changes of variables over time
remains fully available. It turns out that, if we want to monitor
changes in living conditions by permanent regular observations
using panel surveys, adaptations in variables seem to be the rule
rather than the exception. Therefore, our solution for the
restructuring of longitudinal data fulfils the requirements of
permanently ongoing adaptations in variables as a reflection of
adapted measures according to new social conditions, new
theoretical backgrounds, or improved conceptual measures when
monitoring changes in living conditions directly over time. Using
Stata, we provide a conceptual and technical solution for
restructuring the full set of SOEP variables with complete
documentation of all adaptations over time. Our Stata programs
generate two output files: one containing the restructured data and
another with the full documentation on the consistency of the
variables over all waves. SOEPlong was first released in 2010
as a beta version, together with the usual data
dissemination on DVD for the full set of SOEP variables for 26
waves of data. While the paper specifically addresses the
German Socio-Economic Panel (SOEP) study, our general approach on
how to deal with complex household panel data might well be
applied to other national and cross-national longitudinal
household surveys.