
# 4.10. Number of Bins in Distribution Fitting

Applies to: @RISK 5.x, Professional and Industrial Editions
(The fitting methods were changed beginning with @RISK 6.0.)

How does your software automatically determine the number of chi-squared bins to use when fitting distributions against sample data? What degrees of freedom does it use?  Is this the same method used for the "Auto" option when specifying the number of bins in a histogram?

χ² (chi-squared) binning and histogram binning are very different, and the number and position of bars on a histogram chart are almost never the same as the arrangement of the χ² bins. For starters, χ² bins are equally probable and therefore typically not all the same width, while (at least in all Palisade products) histogram bars always have equal width.
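To illustrate why equally probable bins usually differ in width, here is a minimal Python sketch. It is not @RISK's code: the function name `equiprobable_edges` and the standard normal used as the "fitted" distribution are assumptions made purely for illustration. The interior bin edges are simply the quantiles at probabilities 1/k, 2/k, and so on.

```python
from statistics import NormalDist

def equiprobable_edges(dist, k):
    """Interior edges that split `dist` into k bins of probability 1/k each.

    Hypothetical helper for illustration; the outermost bins extend to
    minus and plus infinity, so only k - 1 interior edges are returned.
    """
    return [dist.inv_cdf(i / k) for i in range(1, k)]

# Assumed example: a standard normal standing in for a fitted distribution.
fitted = NormalDist(mu=0.0, sigma=1.0)
edges = equiprobable_edges(fitted, 5)

# Widths of the three finite bins. They are not all equal, even though
# each bin holds exactly 20% of the probability.
widths = [b - a for a, b in zip(edges, edges[1:])]
print(edges)
print(widths)
```

The middle bin, where the density is highest, comes out narrower than its neighbors. A histogram with equal-width bars would instead hold unequal probabilities per bar.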

For histogram binning, see Number of Bins in a Histogram.

For χ² (chi-squared) binning with n data points:

• If n < 35, bins = nearest integer to [n/5]
• If n >= 35, bins = largest integer below [1.88 n ^ (2/5)]

The small-n part is a rule of thumb that says you should have on average at least five data points per bin (a rule that is not always followed in practice). The large-n part has a real basis in statistical theory. A reference for it is in Goodness-of-Fit Techniques, edited by Ralph D'Agostino and Michael Stephens (Dekker 1986), page 70.
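The two rules for the number of χ² bins can be sketched in Python as follows. This is a reading of the formulas as stated here, not @RISK's actual code. The function name `chi2_bin_count` is hypothetical, and "largest integer below" is taken to mean the floor. Behavior for very small n (where the result would be 0) is not specified in this article, so it is left unhandled.

```python
import math

def chi2_bin_count(n: int) -> int:
    """Number of chi-squared bins for n data points, per the two rules above."""
    if n < 35:
        # Nearest integer to n/5. Integer arithmetic avoids float rounding;
        # n/5 can never end in exactly .5, so no tie-breaking rule is needed.
        return (2 * n + 5) // 10
    # Largest integer below 1.88 * n^(2/5), taken here as the floor.
    return math.floor(1.88 * n ** 0.4)

for n in (20, 34, 35, 100, 1000):
    print(n, chi2_bin_count(n))
```

Both rules give 7 bins at n = 35, so the bin count does not jump at the changeover.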

After a fit, you can find how many bins @RISK used for computing the chi-squared statistic by clicking the "Statistical Summary" icon at the bottom of the "Fit Results" graph.

The degrees of freedom for the χ² statistic is (number of bins) minus 1, without regard to the number of parameters in the particular distribution. You can see this by examining the critical values in that same statistical summary. Law and Kelton, in Simulation Modeling and Analysis (2000), pages 359–360, say that some authors do vary degrees of freedom according to the number of parameters in the fitted distribution, but the conservative procedure is to use (number of bins) minus 1, as @RISK does.

It's important to remember that the χ² binning has no effect on which fit is actually presented to the user. In other words, when @RISK is trying to fit to (say) a triangular distribution, it chooses the parameters that make the triangle as close as possible to your data as measured by maximum likelihood estimators (MLEs). (There is no Levenberg-Marquardt (L-M) optimizer in current versions of @RISK.) Changing the binning may change the statistics that purport to measure the goodness of a fit, but will have no effect on the parameters of the fitted distribution.
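The separation between fitting and binning can be shown with a short Python sketch. It is an illustration under stated assumptions, not @RISK's implementation: it uses a normal distribution rather than a triangular one because the normal MLE has a closed form (sample mean and population standard deviation). The data values and the helper `chi2_statistic` are made up for the example.

```python
import bisect
from statistics import NormalDist, fmean, pstdev

def chi2_statistic(data, dist, k):
    """Chi-squared statistic of `data` against `dist` over k equiprobable bins."""
    edges = [dist.inv_cdf(i / k) for i in range(1, k)]  # k - 1 interior edges
    observed = [0] * k
    for x in data:
        observed[bisect.bisect_right(edges, x)] += 1    # bin index for x
    expected = len(data) / k                            # same for every bin
    return sum((o - expected) ** 2 / expected for o in observed)

# Made-up sample data, for illustration only.
data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.9, 6.1, 6.4, 6.6, 7.0,
        7.3, 7.7, 8.2, 8.8, 9.5, 10.1, 10.4, 11.0, 11.9, 13.2]

# Step 1: fit by maximum likelihood. No binning is involved here.
fitted = NormalDist(fmean(data), pstdev(data))

# Step 2: measure goodness of fit under two different binnings.
stat_4 = chi2_statistic(data, fitted, 4)
stat_5 = chi2_statistic(data, fitted, 5)
print(fitted.mean, fitted.stdev, stat_4, stat_5)
```

The fitted parameters are computed once, before any bins exist. Only the goodness-of-fit statistic depends on the choice of k.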

For most data sets, a glance at the overlay plot of the fitted distribution against your data should make it obvious which fit is best. If the binning is important to you, you can click the chi-squared binning tab of the Fit dialog before performing the fit, and adjust the binning to your preference.

Last edited: 2014-01-14