Multivariable Model-building
A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables
Patrick Royston and Willi Sauerbrei, Wiley Series in Probability and Statistics, Wiley, 2008
Additional material including datasets, programs and teaching material:
Book Description:
Multivariable regression models are widely used in all areas of science in which empirical data are analysed. Using the multivariable fractional polynomials (MFP) approach, this book focuses on the selection of important variables and the determination of functional form for continuous predictors. Despite being relatively simple, the selected models often extract most of the important information from the data. The authors have chosen to concentrate on examples drawn from medical statistics, although the MFP method has applications in many other subject-matter areas as well.
Multivariable Model-Building:
- Focuses on normal-error models for continuous outcomes, logistic regression for binary outcomes and Cox regression for censored time-to-event data.
- Concentrates on fractional polynomial models and illustrates new approaches to model criticism and stability.
- Provides comparisons with and discussion of other techniques such as spline models.
- Features new strategies for modelling interactions with continuous covariates, which are important in the context of randomized trials and observational studies.
- Does not consider high-dimensional data, such as gene expression data.
- Is illustrated throughout with working examples from 23 substantial real datasets; most datasets and Stata programs are available on a website, enabling the reader to apply the techniques directly.
- Is written in an accessible and informal style, making it suitable for researchers from a range of disciplines with minimal mathematical background.
This book provides a readable text giving the rationale of, and practical advice on, a unified approach to multivariable modelling. It aims to make multivariable model building simpler, more transparent and more effective. This book is aimed at graduate students studying regression modelling and professionals in statistics as well as researchers from medical, physical, social and many other sciences where regression models play a central role.
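As a rough illustration of the idea behind a first-degree fractional polynomial (FP1) for a single continuous predictor, the sketch below tries each power from the conventional FP power set and keeps the best-fitting one. It is a minimal Python sketch for a normal-error model only, not the authors' Stata programs: the function names fp_transform and fit_fp1 are hypothetical, the data are simulated, and the residual sum of squares stands in for the deviance. The full MFP procedure, with its function-selection tests and variable selection, is not reproduced here.

```python
import numpy as np

# Conventional FP power set; by convention, power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return x**p, with p == 0 meaning the natural log of x (x must be > 0)."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x ** p


def fit_fp1(x, y, powers=FP_POWERS):
    """Fit y = b0 + b1 * x^(p) by least squares for each candidate power p.

    Returns (power, coefficients, rss) for the power with the smallest
    residual sum of squares.
    """
    y = np.asarray(y, dtype=float)
    best = None
    for p in powers:
        design = np.column_stack([np.ones(len(y)), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[2]:
            best = (p, coef, rss)
    return best


# Simulated example (hypothetical data): a logarithmic relationship plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 + 3.0 * np.log(x) + rng.normal(0.0, 0.1, size=200)

best_power, best_coef, best_rss = fit_fp1(x, y)
print("selected power:", best_power)
print("coefficients:", best_coef)
```

The predictor must be strictly positive for the logarithm and the negative powers to be defined, which is why the simulated x values start at 1.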
Table of Contents:
1. Introduction
2. Selection of variables
3. Handling categorical and continuous predictors
4. Fractional polynomials for one variable
5. Some issues with univariate FP models
6. MFP: multivariable model-building with fractional polynomials
7. Interactions
8. Model stability
9. Some comparisons of MFP with splines
10. How to work with MFP
11. Special topics involving fractional polynomials
12. Epilogue
Appendix A: Data and software resources
Appendix B: Glossary of Abbreviations
References
Index
Datasets:
For more details about the data, see Appendix A of the book.
Datasets used once in our book:
No. | Name | Outcome | Obs | Events | Vars |
01 | Myeloma | Survival | 65 | 48 | 16 |
02 | Freiburg DNA breast cancer | Survival | 109 | 56 | 1 |
03 | Cervix cancer | Binary | 899 | 141 | 21 |
04 | Nerve conduction | Cont. | 406 | N/A | 1 |
05 | Triceps skinfold thickness | Cont. | 892 | N/A | 1 |
06 | Diabetes | Cont. | 42 | N/A | 2 |
07 | Advanced prostate cancer | Survival | 475 | 338 | 13 |
08 | Quit smoking study | Cont. | 250 | N/A | 3 |
09 | Breast cancer diagnosis | Binary | 458 | 133 | 6 |
10 | Boston housing | Cont. | 506 | N/A | 13 |
11 | Pima Indians | Binary | 768 | 268 | 8 |
12 | Rotterdam breast cancer | Survival | 2982 | 1518 | 11 |
13 | Fetal growth | Cont. | 574 | N/A | 1 |
14 | Cholesterol (not available) | Cont. | 553 | N/A | 1 |
Datasets used more than once in our book:
No. | Name | Outcome | Obs | Events | Vars |
15 | Research body fat | Cont. | 326 | N/A | 1 |
16 | GBSG breast cancer | Survival | 686 | 299 | 9 |
17 | Educational body fat | Cont. | 252 | N/A | 13 |
18 | Glioma | Survival | 411 | 274 | 15 |
19 | Prostate cancer | Cont. | 97 | N/A | 7 |
20 | Whitehall 1 | Survival | 17260 | 2576 | 10 |
20 | Whitehall 1 | Binary | 17260 | 1670 | 10 |
21 | PBC | Survival | 418 | 161 | 17 |
22 | Oral cancer | Binary | 397 | 194 | 1 |
23 | Kidney cancer | Survival | 347 | 322 | 10 |
Simulated data set from chapter 10:
ART Study | Cont. | 250 | N/A | 10 |
Extended to 10 replicates of 500 observations, altogether 5000 observations.
Dataset references, background or analyses:
1. Myeloma
Krall, J. M., Uthoff, V. A. and Harley, J. B. (1975). A step-up procedure for selecting variables
associated with survival, Biometrics 31: 49-57.
2. Freiburg DNA breast cancer
Pfisterer, J., Kommoss, F., Sauerbrei, W., Menzel, D., Kiechle, M., Giese, E., Hilgarth, M. and
Pfleiderer, A. (1995). DNA flow cytometry in node positive breast cancer: Prognostic value
and correlation to morphological and clinical factors, Analytical and Quantitative Cytology and
Histology 17: 406-412.
3. Cervix cancer
Collett, D. (2003). Modelling binary data, second edn, Chapman & Hall/CRC, Boca Raton.
4. Nerve conduction (no reference)
5. Triceps skinfold thickness
Cole, T. J. and Green, P. J. (1992). Smoothing reference centile curves: the LMS method and penalized
likelihood, Statistics in Medicine 11: 1305-1319.
6. Diabetes
Sockett, E. B., Daneman, D., Clarson, C. and Ehrich, R. M. (1987). Factors affecting and patterns
of residual insulin secretion during first year of Type I (insulin-dependent) diabetes mellitus in
children, Diabetologia 30: 453–459.
7. Advanced prostate cancer
Byar, D. P. and Green, S. B. (1980). The choice of treatment for cancer patients based on covariate information:
application to prostate cancer, Bulletin du Cancer 67: 477–490.
8. Quit smoking study
Cohen, J., Cohen, P., West, S. G. and Aiken, L. S. (2003). Applied Multiple Regression/Correlation
Analysis for the Behavioral Sciences, third edn, Lawrence Erlbaum Associates, New Jersey.
9. Breast cancer diagnosis
Sauerbrei, W., Madjar, H. and Prömpeler, H. J. (1998). Differentiation of benign and malignant breast
tumors by logistic regression and a classification tree using Doppler flow signals, Methods of
Information in Medicine 37: 226–234.
10. Boston housing
Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air, Journal
of Environmental Economics and Management 5: 81-102.
11. Pima Indians
Royston, P. (2005). Multiple imputation of missing values: update of ICE, Stata Journal 5: 527-536.
12. Rotterdam breast cancer
Sauerbrei, W., Royston, P. and Look, M. (2007). A new proposal for multivariable modelling
of time-varying effects in survival data based on fractional polynomial time-transformation,
Biometrical Journal 49: 453-473.
13. Fetal growth
Altman, D. G. and Chitty, L. S. (1993). Design and analysis of studies to derive charts of fetal size,
Ultrasound in Obstetrics and Gynecology 3: 378-384.
14. Cholesterol dataset (not available)
Mann, J. I., Lewis, B., Shepherd, J., Winder, A. F., Fenster, S., Rose, L. and Morgan, B. (1988). Blood
lipid concentrations and other cardiovascular risk factors: distribution, prevalence and detection in
Britain, British Medical Journal 296: 1702–1706.
15. Research body fat
Luke, A., Durazo-Arvizu, R. and others (1997). Relation between body mass index and body fat in
black population samples from Nigeria, Jamaica, and the United States, American Journal of
Epidemiology 145: 620-628.
16. GBSG breast cancer
Sauerbrei, W. and Royston, P. (1999). Building multivariable prognostic and diagnostic models:
transformation of the predictors using fractional polynomials, Journal of the Royal Statistical
Society, Series A 162: 71-94.
17. Educational body fat
Johnson, R. W. (1996). Fitting percentage of body fat to simple body measurements, Journal of
Statistics Education 4(1).
18. Glioma
Sauerbrei, W. and Schumacher, M. (1992). A bootstrap resampling procedure for model building:
application to the Cox regression model, Statistics in Medicine 11: 2093–2109.
19. Prostate cancer
Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A. and Yang, N.
(1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the
prostate. II. Radical prostatectomy treated patients, Journal of Urology 141: 1076–1083.
20. Whitehall 1
Royston, P., Ambler, G. and Sauerbrei, W. (1999). The use of fractional polynomials to model
continuous risk variables in epidemiology, International Journal of Epidemiology 28: 964-974.
21. PBC
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis, John Wiley &
Sons, Ltd/Inc., New York.
22. Oral cancer
Rosenberg, P. S., Katki, H., Swanson, C. A., Brown, L. M., Wacholder, S. and Hoover, R. N. (2003).
Quantifying epidemiologic risk factors using nonparametric regression: model selection remains the
greatest challenge, Statistics in Medicine 22: 3369-3381.
23. Kidney cancer
Royston, P., Sauerbrei, W. and Ritchie, A. W. S. (2004). Is treatment with interferon-α effective in
all patients with metastatic renal carcinoma? A new approach to the investigation of interactions,
British Journal of Cancer 90: 794–799.
Programs (only Stata programs are available):
This website was last updated 2011-02-28.
In 2016 we released the MFP website http://mfp.imbi.uni-freiburg.de/.