
Derek-Jones/Software-estimation-datasets


Software estimation data

The aim of this repository is to collect all publicly available software estimation datasets that also include the corresponding actual implementation effort.

Entries are listed by order of number of rows in the dataset, largest first.

Note: it's common for paper X to cite Y as the source of the data, without checking whether Y in turn cites Z as the source (occasionally the citation chain is longer).

Files in arff format contain embedded information about the data (e.g., attribute names and types).
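ARFF files start with a header of @relation and @attribute lines, followed by @data and comma-separated rows; lines beginning with % are comments. The following is a minimal standard-library sketch for reading such a file. It assumes dense rows only, attribute names without spaces, and ? as the missing-value marker. The function name parse_arff is hypothetical, and for full ARFF support a library reader such as scipy.io.arff.loadarff is the safer choice.

```python
import csv
import io

NUMERIC_TYPES = {"numeric", "real", "integer"}


def parse_arff(text):
    """Return (relation, attributes, rows) parsed from ARFF text.

    Minimal sketch: dense @data rows only, attribute names without spaces,
    '?' read as a missing value (None), numeric attributes converted to float.
    """
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            data_lines.append(line)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, typ = line.split(None, 2)
            attributes.append((name, typ.strip()))
        elif keyword == "@data":
            in_data = True
    rows = []
    reader = csv.reader(io.StringIO("\n".join(data_lines)), skipinitialspace=True)
    for fields in reader:
        row = []
        for (_, typ), value in zip(attributes, fields):
            value = value.strip()
            if value == "?":
                row.append(None)
            elif typ.lower() in NUMERIC_TYPES:
                row.append(float(value))
            else:
                row.append(value)
        rows.append(row)
    return relation, attributes, rows


# Made-up toy input, not taken from any dataset listed here.
sample = """% toy example
@relation effort
@attribute size numeric
@attribute effort numeric
@data
10, 120
20, ?
"""
relation, attributes, rows = parse_arff(sample)
print(relation, attributes, rows)
```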

If you know of any datasets that are missing, and you have the data, please send me a copy. Also, if you spot any mistakes, please let me know.

Valdes-Souto.csv

F. Valdés-Souto and J. Valeriano-Assem, "Merging Distinct Sources Databases to Improve Software Estimation Models", Programming and Computer Software, 2024, Vol. 50(8), pp. 786–795.

57 rows

Project totals of COSMIC function points and person hours. Data from a Mexican company, the International Software Benchmarking Standards Group (ISBSG), and the Mexican Software Metrics Association (AMMS).
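In the COSMIC method, functional size is obtained by counting data movements (Entry, Exit, Read, Write), each sized at 1 COSMIC function point (CFP). A minimal sketch of that counting rule, assuming the data movements of each functional process have already been identified; the function name and the process names below are hypothetical:

```python
# The four COSMIC data-movement types; each movement is sized at 1 CFP.
MOVEMENT_TYPES = {"Entry", "Exit", "Read", "Write"}


def cosmic_size(processes):
    """Total CFP for a dict mapping functional process name -> list of movements."""
    total = 0
    for name, movements in processes.items():
        for movement in movements:
            if movement not in MOVEMENT_TYPES:
                raise ValueError(f"{name}: unknown data movement {movement!r}")
        total += len(movements)  # 1 CFP per data movement
    return total


# Hypothetical functional processes, for illustration only.
processes = {
    "create order": ["Entry", "Read", "Write", "Exit"],
    "list orders": ["Entry", "Read", "Exit"],
}
print(cosmic_size(processes))  # 7 CFP
```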

CESAW.tgz

Derek M. Jones, William R. Nichols, "The CESAW dataset: a conversation", Jun 2021, arXiv:cs.SE/2106.03679.

203,621 rows

Estimate/actual in person hours for small tasks.
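Task-level estimate/actual pairs, as in this and the following datasets, are often summarised by the ratio actual/estimate per task. A minimal sketch, assuming the pairs have already been extracted into two equal-length lists (column names differ between datasets and are not assumed here; the function name accuracy_summary is hypothetical):

```python
import statistics


def accuracy_summary(estimates, actuals):
    """Summarise actual/estimate ratios for paired task estimates and actuals.

    Tasks with a non-positive estimate are skipped, since the ratio is undefined.
    """
    if len(estimates) != len(actuals):
        raise ValueError("estimates and actuals must have the same length")
    ratios = [a / e for e, a in zip(estimates, actuals) if e > 0]
    if not ratios:
        raise ValueError("no tasks with a positive estimate")
    return {
        "tasks": len(ratios),
        "median_ratio": statistics.median(ratios),
        "underestimated": sum(r > 1 for r in ratios),  # took longer than estimated
        "overestimated": sum(r < 1 for r in ratios),   # took less than estimated
    }


# Made-up values in person hours, for illustration only.
summary = accuracy_summary([2, 4, 1, 8], [3, 4, 0.5, 10])
print(summary)
```

The median is used rather than the mean because a few badly underestimated tasks can dominate a mean of ratios.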

renzo-pomodoro.csv

Derek M. Jones, "The Renzo Pomodoro dataset", Dec 2019, The Shape of Code

17,764 rows

Estimate/actual in Pomodoros for small tasks.

Subbiah.csv

C. Subbiah, "Task-Based Estimation and Planning for Application Development Projects and Resources: Models, Methods and Applications", PhD thesis Dec 2019, University of Missouri – St. Louis.

72 rows

Function points.

SiP

Derek M. Jones, Stephen Cullum, "A conversation around the analysis of the SiP effort estimation dataset", Jan 2019, arXiv:cs.SE/1901.01621.

10,100 rows

Estimate/actual in person hours for small tasks.

Project-22

Derek M. Jones, "Small team estimating in story points; a project dataset", Feb 2023, blog post.

630 rows

Estimates in Story points and actuals in hours for small tasks.

china.arff

Fang Hon Yun, "China: Effort Estimation Dataset", Apr 2010, Zenodo.

499 rows

Estimate in Function points and actual in person hours for large tasks (i.e., lasting years).
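When a dataset pairs a size measure such as function points with actual effort, a common first look is effort divided by size (person hours per function point, sometimes called the project delivery rate). A minimal sketch; the function name and input values are hypothetical:

```python
import statistics


def delivery_rates(function_points, person_hours):
    """Person hours per function point for each project with a positive size."""
    return [h / fp for fp, h in zip(function_points, person_hours) if fp > 0]


# Made-up project values, for illustration only.
rates = delivery_rates([100, 250, 40], [1500, 2000, 800])
print(rates, statistics.median(rates))
```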

Huijgens492.zip

Hennie Huijgens, Arie van Deursen, Rini van Solingen, "The effects of perceived value and stakeholder satisfaction on software project impact", Sep 2017, Information and Software Technology volume 89, pp 19-36.

492 rows

Estimates in Function points and actuals in Euros/person hours for medium size projects.

kitchenham.arff

Barbara Kitchenham, Shari Lawrence Pfleeger, Beth McColl and Suzanne Eagan, "An empirical study of maintenance and development estimation accuracy", Oct 2002, Journal of Systems and Software, volume 64(1), pp. 57-77.

145 rows

Estimates in Function points/hours and actuals in person hours for large projects.

nasa93.arff

T. Menzies, D. Port, Z. Chen, J. Hihn and S. Stukes, "Validation Methods for Calibrating Software Effort Models", May 2005, 27th International Conference on Software Engineering, pp 587-595.

93 rows

Estimates in thousands of lines of code and actuals in person months for large projects.

Desharnais.csv

Jean-Marc Desharnais, "Analyse Statistique de la Productivité des Projets de Développement en Informatique à Partir de la Technique des Points de Fonction" (Statistical Analysis of the Productivity of Software Development Projects Using the Function Point Technique), Dec 1988, MSc thesis, Université du Québec à Montréal.

80 rows

Estimate in Function points and actual in person hours for large tasks (i.e., lasting years).

UCP_Dataset.csv

Radek Silhavy, Petr Silhavy and Zdenka Prokopova, "Analysis and selection of a regression model for the Use Case Points method using a stepwise approach", Mar 2017, The Journal of Systems and Software, volume 125(C), pp 1-14.

71 rows

Estimates in Use case points and actuals in person hours for large projects.
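For reference, Karner's original Use Case Points calculation weights use cases (5/10/15) and actors (1/2/3) by complexity, then scales by a technical complexity factor and an environmental factor; effort is UCP times a productivity factor, with 20 person hours per UCP as Karner's suggested default. The sketch below follows that original formulation, which is an assumption here: Silhavy et al. study regression models for the method and may use different coefficients. Function names are hypothetical.

```python
# Karner's weights: use cases by complexity and actors by type.
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}


def use_case_points(use_cases, actors, t_factor, e_factor):
    """Use Case Points from counts per complexity class.

    t_factor: weighted sum of the 13 technical complexity factors.
    e_factor: weighted sum of the 8 environmental factors.
    """
    uucw = sum(USE_CASE_WEIGHTS[k] * n for k, n in use_cases.items())
    uaw = sum(ACTOR_WEIGHTS[k] * n for k, n in actors.items())
    tcf = 0.6 + 0.01 * t_factor  # technical complexity factor
    ecf = 1.4 - 0.03 * e_factor  # environmental complexity factor
    return (uucw + uaw) * tcf * ecf


def ucp_effort_hours(ucp, hours_per_ucp=20.0):
    """Effort in person hours; 20 hours per UCP is Karner's suggested default."""
    return ucp * hours_per_ucp


# Hypothetical counts and factor sums, for illustration only.
ucp = use_case_points(
    {"simple": 2, "average": 3, "complex": 1},
    {"simple": 1, "average": 1, "complex": 2},
    t_factor=40,
    e_factor=20,
)
print(round(ucp, 1), round(ucp_effort_hours(ucp)))  # roughly 51.2 UCP, 1024 hours
```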

COCOMO-81.csv

Barry W. Boehm, "Software Engineering Economics", 1981, Prentice-Hall, Inc.

63 rows

Estimate in lines of code and actual in person months for large projects.
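Boehm's book introduces the Basic COCOMO model, effort = a × KLOC^b person months, with coefficients depending on the development mode. A minimal sketch of the Basic model only; the Intermediate model, which also multiplies by cost-driver ratings, is not shown. The function name is hypothetical.

```python
# Basic COCOMO effort coefficients (a, b) by development mode (Boehm, 1981).
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, mode="organic"):
    """Estimated effort in person months: a * KLOC ** b."""
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


for mode in BASIC_COCOMO:
    print(mode, round(basic_cocomo_effort(32, mode), 1))
```

Comparing these model outputs with the actual person months recorded in the dataset gives a quick check on how well the uncalibrated model fits.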

Maxwell.arff

K. D. Maxwell, "Applied Statistics for Software Managers", 2002, Prentice-Hall.

62 rows

Estimates in Function points and actuals in person hours for large projects.

miyazaki94.csv

Y. Miyazaki, M. Terakado, K. Ozaki, H. Nozaki, "Robust regression for developing software estimation models", Oct 1994, Journal of Systems and Software, volume 27(1), pp. 3-16.

48 rows

Estimates in lines of code and actuals in person months for large projects.
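Size/effort data of this kind is often modelled as effort = a × size^b, fitted as a straight line on log-log scales. The sketch below uses ordinary least squares. It is not the robust regression technique studied by Miyazaki et al., and the function name is hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size ** b by ordinary least squares on log-log data."""
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two (size, effort) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space = size exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed to a multiplier
    return a, b


# Synthetic data generated from effort = 3 * size ** 1.1 (not real project data).
sizes = [10, 20, 50, 100]
efforts = [3 * s ** 1.1 for s in sizes]
a, b = fit_power_law(sizes, efforts)
print(round(a, 3), round(b, 3))
```

Least squares on logs is sensitive to outlying projects, which is the motivation for robust alternatives.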

Finnish.arff

B. Sigweni and M. Shepperd "Using Blind Analysis for Software Engineering Experiments", 2015, Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering, pp. 1-6.

38 rows

Estimates in Function points and actuals in person months for large projects.

Albrecht.arff

A.J. Albrecht and J.E. Gaffney, "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation", Nov 1983, IEEE Transactions on Software Engineering volume SE-9(6), pp 639-648.

24 rows

Estimates in Function points and actuals in thousands of person hours for large projects.
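Function points sum weighted counts of external inputs, outputs, inquiries, internal logical files and external interface files, optionally scaled by a value adjustment factor derived from 14 general system characteristics. The weights below are the IFPUG standard ones. Albrecht and Gaffney's 1983 data may have been counted with Albrecht's earlier weighting, so treat this as an illustration of the method, not of how this dataset was counted. Function names are hypothetical.

```python
# IFPUG unadjusted weights per function type: (low, average, high) complexity.
FP_WEIGHTS = {
    "EI": (3, 4, 6),     # external inputs
    "EO": (4, 5, 7),     # external outputs
    "EQ": (3, 4, 6),     # external inquiries
    "ILF": (7, 10, 15),  # internal logical files
    "EIF": (5, 7, 10),   # external interface files
}
COMPLEXITY_INDEX = {"low": 0, "average": 1, "high": 2}


def unadjusted_fp(counts):
    """counts maps (function type, complexity) -> number of occurrences."""
    return sum(
        n * FP_WEIGHTS[ftype][COMPLEXITY_INDEX[complexity]]
        for (ftype, complexity), n in counts.items()
    )


def adjusted_fp(ufp, gsc_ratings):
    """Apply the value adjustment factor from 14 general system characteristics (0-5 each)."""
    if len(gsc_ratings) != 14 or not all(0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings, each in the range 0-5")
    vaf = 0.65 + 0.01 * sum(gsc_ratings)
    return ufp * vaf


# Hypothetical counts, for illustration only.
ufp = unadjusted_fp({("EI", "average"): 5, ("EO", "high"): 2, ("ILF", "low"): 3})
print(ufp, round(adjusted_fp(ufp, [3] * 14), 2))
```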
