World Congress on Neural Networks - 1996

ISBN # 0-8058-2608-4

This paper is copyrighted by INNS Press and Lawrence Erlbaum Associates. For reprints, please contact INNS or Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, NJ 07430, USA.

Pruning a Softmax Neural Network Using Principled Optimal Brain Damage

Richard M. Golden
RMG Consulting, 2000 Fresno Road, Plano, Texas 75074
Steven S. Henley, Harvey L. Bodine, Robert L. Dawes
Martingale Research Corporation
T. Michael Kashner
UT Southwestern Medical Center at Dallas, 8267 Elmbrook, Suite 250, Dallas, TX 75247-9141

A quasi-linear feedforward softmax artificial neural network architecture, the CCR (Constrained Categorical Regression) model, is introduced. The model provides an explicit mechanism for representing prior knowledge in the form of deterministic heuristic rules for mapping a given input stimulus activation pattern s into a set of m mutually exclusive outcomes. Introducing prior knowledge in a principled way into the probability model building process is a key feature of the CCR modeling approach because it yields probability models with interpretable parameters.
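
As a rough illustration of the kind of mapping described above, the following minimal sketch computes softmax probabilities over m mutually exclusive outcomes from an input pattern s. It is not the CCR model itself: the function names, the 0/1 `mask` used to pin selected weights to zero (one simple, assumed way of encoding a constraint), and all numeric values are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def outcome_probabilities(s, W, mask):
    """Return probabilities over m mutually exclusive outcomes.

    s    : input activation pattern, shape (d,)
    W    : weight matrix, shape (m, d)
    mask : hypothetical 0/1 matrix, shape (m, d); a 0 pins that weight
           to zero (an assumed stand-in for a prior-knowledge constraint)
    """
    return softmax((W * mask) @ s)


# Illustrative values only; none of these numbers come from the paper.
s = np.array([1.0, 0.0, 1.0])
W = np.array([[0.5, -1.0, 2.0],
              [1.5, 0.3, -0.5],
              [0.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [1, 1, 1]])

p = outcome_probabilities(s, W, mask)
print(p)  # one probability per outcome; the entries sum to 1
```

In this sketch each unmasked weight ties one input feature to one outcome, so a fitted weight can be read as that feature's contribution to the corresponding outcome's log-odds relative to the all-zero reference row.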



NOTE: This material was based on work sponsored by the National Institute on Alcohol Abuse and Alcoholism. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Institute on Alcohol Abuse and Alcoholism.
