

HIGH RISK CRASH ANALYSIS

Final Report 558

Prepared by:
Simon Washington and Wen Cheng
Department of Civil Engineering & Engineering Mechanics
University of Arizona
Tucson, AZ 85721

December 2005

Prepared for:
Arizona Department of Transportation
206 South 17th Avenue
Phoenix, Arizona 85007

in cooperation with
U.S. Department of Transportation
Federal Highway Administration

DISCLAIMER

The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the Arizona Department of Transportation or the Federal Highway Administration. This report does not constitute a standard, specification, or regulation. Trade or manufacturers' names which may appear herein are cited only because they are considered essential to the objectives of the report. The U.S. Government and the State of Arizona do not endorse products or manufacturers.

Technical Report Documentation Page

1. Report No.: FHWA-AZ-05-558
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: High Risk Crash Analysis
5. Report Date: December 2005
6. Performing Organization Code:
7. Authors: Dr. Simon Washington and Wen Cheng
8. Performing Organization Report No.:
9. Performing Organization Name and Address: University of Arizona, Tucson, AZ 85721
10. Work Unit No.:
11. Contract or Grant No.: SPR-PL-1(63) 558
12. Sponsoring Agency Name and Address: Arizona Department of Transportation, 206 S. 17th Avenue, Phoenix, Arizona 85007
13. Type of Report & Period Covered: Final Report
14. Sponsoring Agency Code:
15. Supplementary Notes: Prepared in cooperation with the U.S. Department of Transportation, Federal Highway Administration

16. Abstract: In agencies with jurisdiction over extensive road infrastructure, it is common practice to select and rectify hazardous locations.
Improving hazardous locations may arise during safety management activities, during maintenance activities, or as a result of political pressures and/or public attention. Commonly a two-stage process is used. In the first stage, the past accident history of all sites is reviewed to screen a limited number of high risk locations for further examination. In the second stage, the selected sites are studied in greater detail to devise cost-effective remedial actions or countermeasures for a subset of correctable sites. Due to limited time and resource constraints and the extensive number of candidate sites typically considered in such endeavors, it is impractical for agencies to examine all sites in detail. The current Arizona Local Government Safety Project Analysis Model (ALGSP) is intended to facilitate these procedures by providing an automated method for analysis and evaluation of motor vehicle crashes and subsequent remediation of ‘hot spot’ or ‘high risk’ locations. The software is user friendly and can save substantial time for local jurisdictions and governments such as Metropolitan Planning Organizations (MPOs), counties, cities, and towns. Some analytical improvements are possible, however. The objective of this study was to provide recommendations that will improve the accuracy and reliability of the ALGSP software in identifying true ‘hot spots’ within the Arizona transportation network, be they road segments, ramps, or intersections.
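The first-stage screen described above can be sketched as a simple ranking of sites by a crash statistic, which is essentially how the current ALGSP model operates. The sites, counts, and exposure values below are hypothetical, chosen only for illustration:

```python
# First-stage hot spot screen: rank sites by a crash statistic and keep the
# top few for detailed (second-stage) engineering study.
# All site data here are hypothetical illustration values, not from the report.

def screen_hot_spots(sites, statistic="frequency", top_n=3):
    """Rank sites by crash frequency or crash rate and return the top_n.

    Each site is a dict with 'id', 'crashes' (count over the study period),
    and 'mvmt' (exposure in million vehicle-miles traveled).
    """
    if statistic == "frequency":
        key = lambda s: s["crashes"]
    elif statistic == "rate":  # crashes per million vehicle-miles
        key = lambda s: s["crashes"] / s["mvmt"]
    else:
        raise ValueError("statistic must be 'frequency' or 'rate'")
    return sorted(sites, key=key, reverse=True)[:top_n]

sites = [
    {"id": "A", "crashes": 12, "mvmt": 4.0},
    {"id": "B", "crashes": 30, "mvmt": 25.0},
    {"id": "C", "crashes": 9,  "mvmt": 1.5},
    {"id": "D", "crashes": 21, "mvmt": 10.0},
]

by_freq = [s["id"] for s in screen_hot_spots(sites, "frequency")]
by_rate = [s["id"] for s in screen_hot_spots(sites, "rate")]
```

Note that the two statistics disagree here (site B leads on frequency, site C on rate): high-volume sites tend to dominate frequency rankings while low-volume sites dominate rate rankings, which is one reason the report examines more robust HSID methods.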
The research resulted in: 1) a survey of past and current hot spot identification (HSID) approaches; 2) evaluation of HSID methods and exploration of the optimum duration of before-period crash data under simulated scenarios; 3) development of safety performance functions (SPFs) for various functional road sections within Arizona; 4) extended comparisons of alternative HSID methods based on SPFs using real crash data; and 5) recommendations for improving the identification ability of the current ALGSP model.

17. Key Words: Hot Spot Identification, High Risk Sites, Sites with Promise, Safety, Motor Vehicle Crashes
18. Distribution Statement: Document is available to the U.S. public through the National Technical Information Service, Springfield, Virginia, 22161
19. Security Classification: Unclassified
20. Security Classification: Unclassified
21. No. of Pages: 154
22. Price:
23. Registrant's Seal:

SI* (MODERN METRIC) CONVERSION FACTORS

APPROXIMATE CONVERSIONS TO SI UNITS | APPROXIMATE CONVERSIONS FROM SI UNITS
(Symbol, When You Know, Multiply By, To Find, Symbol)

LENGTH
in   inches   25.4    millimeters   mm  |  mm  millimeters   0.039   inches   in
ft   feet     0.305   meters        m   |  m   meters        3.28    feet     ft
yd   yards    0.914   meters        m   |  m   meters        1.09    yards    yd
mi   miles    1.61    kilometers    km  |  km  kilometers    0.621   miles    mi

AREA
in2  square inches   645.2   square millimeters   mm2  |  mm2  square millimeters   0.0016   square inches   in2
ft2  square feet     0.093   square meters        m2   |  m2   square meters        10.764   square feet     ft2
yd2  square yards    0.836   square meters        m2   |  m2   square meters        1.195    square yards    yd2
ac   acres           0.405   hectares             ha   |  ha   hectares             2.47     acres           ac
mi2  square miles    2.59    square kilometers    km2  |  km2  square kilometers    0.386    square miles    mi2

VOLUME
fl oz  fluid ounces   29.57   milliliters    mL  |  mL  milliliters    0.034    fluid ounces   fl oz
gal    gallons        3.785   liters         L   |  L   liters         0.264    gallons        gal
ft3    cubic feet     0.028   cubic meters   m3  |  m3  cubic meters   35.315   cubic feet     ft3
yd3    cubic yards    0.765   cubic meters   m3  |  m3  cubic meters   1.308    cubic yards    yd3
NOTE: Volumes greater than 1000 L shall be shown in m3.

MASS
oz   ounces                  28.35   grams       g   |  g    grams       0.035   ounces                  oz
lb   pounds                  0.454   kilograms   kg  |  kg   kilograms   2.205   pounds                  lb
T    short tons (2000 lb)    0.907   megagrams (or “metric ton”)   Mg (or “t”)  |  Mg   megagrams (or “metric ton”)   1.102   short tons (2000 lb)   T

TEMPERATURE (exact)
ºF   Fahrenheit temperature   5(F-32)/9 or (F-32)/1.8   Celsius temperature   ºC  |  ºC   Celsius temperature   1.8C + 32   Fahrenheit temperature   ºF

ILLUMINATION
fc   foot-candles    10.76   lux          lx     |  lx     lux          0.0929   foot-candles    fc
fl   foot-Lamberts   3.426   candela/m2   cd/m2  |  cd/m2  candela/m2   0.2919   foot-Lamberts   fl

FORCE AND PRESSURE OR STRESS
lbf      poundforce                   4.45   newtons       N    |  N    newtons       0.225   poundforce                   lbf
lbf/in2  poundforce per square inch   6.89   kilopascals   kPa  |  kPa  kilopascals   0.145   poundforce per square inch   lbf/in2

*SI is the symbol for the International System of Units. Appropriate rounding should be made to comply with Section 4 of ASTM E380.

TABLE OF CONTENTS

EXECUTIVE SUMMARY ................................................................ 1
CHAPTER I - INTRODUCTION ......................................................... 3
CHAPTER II - LITERATURE REVIEW OF HSID METHODS ................................... 5
  HOT SPOT IDENTIFICATION PROBLEM BACKGROUND .................................... 5
  BAYESIAN TECHNIQUES TO IDENTIFY HAZARDOUS LOCATIONS ........................... 11
    Bayesian Techniques Based on Accident Frequencies ............................ 11
    Bayesian Techniques Based on Accident Rates .................................. 13
CHAPTER III - EXPERIMENT DESIGN FOR EVALUATION OF HSID METHODS AND EXPLORATION OF ACCIDENT HISTORY ................................................ 17
  EXPERIMENT FOR EVALUATING HSID METHOD PERFORMANCE ............................
17 Hot Spot Identification Methods....................................................................... 17 Ground Rules for Simulation Experiment ........................................................ 19 Generating Mean Crash Frequencies from Real Data ...................................... 20 Generation of Random Poisson Samples from TPMs ...................................... 21 Performance Evaluation Results for HSID Methods ........................................ 26 EXPERIMENT FOR OPTIMIZING DURATION OF CRASH HISTORY ........... 30 RESULTS ................................................................................................................. 32 CONCLUSIONS AND RECOMMENDATIONS ................................................... 38 CHAPTER IV  SAFETY PERFORMANCE FUNCTIONS FOR ARIZONA ROAD SEGMENTS ..................................................................................................................... 39 DATA DESCRIPTION ............................................................................................ 39 HOW TO CREATE SPFS? ...................................................................................... 40 RESULTS OF SPFS ................................................................................................. 41 CONCLUSIONS....................................................................................................... 42 CHAPTER V  COMPARISON OF HSID METHODS BASED ON REAL CRASH DATA OF ARIZONA ROAD SEGMENTS.................................................................... 43 HSID METHODS BASED ON SPFS ...................................................................... 43 The EB Approach Based on SPFs .................................................................... 43 Accident Reduction Potential Method Based on SPFs ..................................... 44 Numerical Examples to Show the HSID Methods Based on SPFs .................. 
44 DATA DESCRIPTION ............................................................................................ 46 TESTS FOR COMPARISON OF HSID METHODS.............................................. 46 Site Consistency Test........................................................................................ 47 Method Consistency Test.................................................................................. 48 Total Ranking Differences Test ........................................................................ 48 False Identification Test.................................................................................... 49 COMPARISON RESULTS...................................................................................... 51 Site Consistency Test Result............................................................................. 51 Method Consistency Test Result ...................................................................... 52 Total Ranking Differences Test Result............................................................. 53 False Identification Test Result ........................................................................ 54 False True Poisson Means Differences Test Result.......................................... 55 Result of Similarity of Alternative HSID Identification Methods.................... 56 CONCLUSIONS AND RECOMMENDATIONS ................................................... 57 CHAPTER VI  HSID IN CURRENT ALGSP MODEL AND RECOMMENDED SOFTWARE CHANGES ................................................................................................. 59 HSID IN CURRENT ALGSP MODEL ................................................................... 59 RECOMMENDED SOFTWARE CHANGES......................................................... 61 Incorporating the Functional Classification as an Additional User Selection Parameter .......................................................................................................... 
61 Data Interface Improvement ............................................................................. 61 Exploring the Relationship between Exposure and Safety as Employed in the ALGSP.............................................................................................................. 62 Incorporation of the EB Techniques to Calculate the Expected Crash Number62 Incorporation of Accident Reduction Potential Method................................... 63 Incorporation of the EB Techniques to Calculate the Expected Crash Costs... 64 Recommended Period of Analysis for Software Users..................................... 64 REFERENCES ................................................................................................................. 67 APPENDIX A: REAL ARIZONA CRASH DATA USED FOR THE DEVELOPMENT OF SIMULATED CRASH DATA................................................................................... 71 APPENDIX B: THE IDENTIFICATION ERROR RATES ASSOCIATED WITH VARIOUS HSID METHODS, CONFIDENCE LEVELS, AND GROUPS ................... 80 APPENDIX C: SAFETY PERFORMANCE FUNCTIONS OF VARIOUS FUNCTIONAL CLASSIFICATIONS OF ARIZONA ROAD SEGMENTS................ 107 APPENDIX D: COMPARISON TESTS RESULTS AND SIMILARITY OF ALTERNATIVE HSID METHODS FOR VARIOUS CLASSIFICATIONS OF HIGHWAY SECTIONS................................................................................................. 117 LIST OF TABLES Table 1: Summary of Gamma Fittings of Six Datasets .................................................... 24 Table 2: Simulated Data for 30 Sites and 16 Observation Periods................................... 25 Table 3: Percent Errors for Low Heterogeneity in Crash Counts..................................... 29 Table 4: Percent Errors for High Heterogeneity in Crash Counts .................................... 29 Table 5: Snapshot of the Simulated Data.......................................................................... 
31 Table 6: The Number of t year Which is the “ Knee” of the Curve for Group 1 .............. 33 Table 7: The Number of t year Which is the “ Knee” of the Curve for Group 2 .............. 33 Table 8: The Number of t year Which is the “ Knee” of the Curve for Group 3 .............. 33 Table 9: Percent Errors for Low Heterogeneity in Crash Counts ( 3 Years Data) ............ 37 Table 10: Percent Errors for High Heterogeneity in Crash Counts ( 3 Years Data).......... 37 Table 11: Functional Classification Codes ...................................................................... 39 Table 12: Statistics for Roads of Various Functional Classifications............................... 40 Table 13: Crash Information of a Sample of 20 Principle Arterial Road Sections........... 47 Table 14: Results of Site Consistency Test of Various Methods for All Classifications of Highways: Accumulated Crashes for Hot Spot Sites for Various Methods ............. 51 Table 15: Results of Method Consistency Test of Various Methods for All Classifications of Highways: Number of Sites Commonly Identified across Periods ...................... 52 Table 16: Results of Total Ranking Differences Test of Various Methods for All Classifications of Highways: Cumulative Ranking Differences of Hot Spot Sites .. 53 Table 17: Results of False Identification Test of Various Methods for All Classifications of Highways: Frequency of Errors............................................................................ 54 Table 18: Results of False True Poisson Means Differences Test of Various Methods for All Classifications of Highways: Cumulative Difference in TPMs.......................... 55 Table 19: Accumulated Similarity of Various Methods for All Classifications of Highways ( δ = 0.90).................................................................................................. 
56 Table 20: Accumulated Similarity of Various Methods for All Classifications of Highways ( δ = 0.95).................................................................................................. 56 Table 21: Observed Data from Apache ( E1) .................................................................... 71 Table 22: Observed Data from Gila ( E2).......................................................................... 71 Table 23: Observed Data from Graham ( L1).................................................................... 72 Table 24: Observed Data from Lapaz ( L2)....................................................................... 72 Table 25: Observed Data from Pima ( S1)......................................................................... 72 Table 26: Observed Data from Santacruz ( S2) ................................................................. 73 Table 27: The Identification Error Rates of SR Method for Group 1 ( δ = 0.90).............. 80 Table 28: The Identification Error Rates of ER Method for Group 1 ( δ = 0.90).............. 81 Table 29: The Identification Error Rates of CI Method for Group 1 ( δ = 0.90)............... 82 Table 30: The Identification Error Rates of SR Method for Group 1 ( δ = 0.95).............. 83 Table 31: The Identification Error Rates of EB Method for Group 1 ( δ = 0.95).............. 84 Table 32: The Identification Error Rates of CI Method for Group 1 ( δ = 0.95)............... 85 Table 33: The Identification Error Rates of SR Method for Group 1 ( δ = 0.99).............. 86 Table 34: The Identification Error Rates of EB Method for Group 1 ( δ = 0.99).............. 87 Table 35: The Identification Error Rates of CI Method for Group 1 ( δ = 0.99)............... 88 Table 36: The Identification Error Rates of SR Method for Group 2 ( δ = 0.90).............. 89 Table 37: The Identification Error Rates of EB Method for Group 2 ( δ = 0.90).............. 
90 Table 38: The Identification Error Rates of CI Method for Group 2 ( δ = 0.90)............... 91 Table 39: The Identification Error Rates of SR Method for Group 2 ( δ = 0.95).............. 92 Table 40: The Identification Error Rates of EB Method for Group 2 ( δ = 0.95).............. 93 Table 41: The Identification Error Rates of CI Method for Group 2 ( δ = 0.95)............... 94 Table 42: The Identification Error Rates of SR Method for Group 2 ( δ = 0.99).............. 95 Table 43: The Identification Error Rates of EB Method for Group 2 ( δ = 0.99).............. 96 Table 44: The Identification Error Rates of CI Method for Group 2 ( δ = 0.99)............... 97 Table 45: The Identification Error Rates of SR Method for Group 3 ( δ = 0.90).............. 98 Table 46: The Identification Error Rates of EB Method for Group 3 ( δ = 0.90).............. 99 Table 47: The Identification Error Rates of CI Method for Group 3 ( δ = 0.90)............. 100 Table 48: The Identification Error Rates of SR Method for Group 3 ( δ = 0.95)............ 101 Table 49: The Identification Error Rates of EB Method for Group 3 ( δ = 0.95)............ 102 Table 50: The Identification Error Rates of CI Method for Group 3 ( δ = 0.95)............. 103 Table 51: The Identification Error Rates of SR Method for Group 3 ( δ = 0.99)............ 104 Table 52: The Identification Error Rates of EB Method for Group 3 ( δ = 0.99)............ 105 Table 53: The Identification Error Rates of CI Method for Group 3 ( δ = 0.99)............. 106 Table 54: Estimation Results for SPF of Rural Interstate Principle Arterials ( Functional Code: 1)................................................................................................................... 108 Table 55: Estimation Results for SPF of Rural Other Principle Arterials ...................... 109 Table 56: Estimation Results for SPF of Rural Minor Arterials..................................... 
110 Table 57: Estimation Results for SPF of Rural Major Collectors ( Functional Code: 7) 111 Table 58: Estimation Results for SPF of Rural Minor Collectors ( Functional Code: 8) 112 Table 59: Estimation Results for SPF of Urban Interstate Principle Arterials ( Functional Code: 11)................................................................................................................. 113 Table 60: Estimation Results for SPF of Urban Freeways ............................................. 114 Table 61: Estimation Results for SPF of Urban Other Principle Arterials ( Functional Code: 14)................................................................................................................. 115 Table 62: Estimation Results for SPF of Urban Minor Arterials.................................... 116 Table 63: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 1)................................................................................................................... 117 Table 64: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 1)................................................................................................................... 117 Table 65: Results of Site Consistency Test of Various Methods.................................... 118 Table 66: Results of Method Consistency Test of Various Methods ............................. 118 Table 67: Results of Total Ranking Differences Test of Various Methods.................... 118 Table 68: Results of False Identification Test of Various Methods ............................... 119 Table 69: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 1) ............................................................................................... 
119 Table 70: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 2)................................................................................................................... 120 Table 71: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 2)................................................................................................................... 120 Table 72: Results of Site Consistency Test of Various Methods.................................... 120 Table 73: Results of Method Consistency Test of Various Methods ............................. 121 Table 74: Results of Total Ranking Differences Test of Various Methods.................... 121 Table 75: Results of False Identification Test of Various Methods ............................... 121 Table 76: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 2) ............................................................................................... 122 Table 77: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 6)................................................................................................................... 123 Table 78: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 6)................................................................................................................... 123 Table 79: Results of Site Consistency Test of Various Methods.................................... 123 Table 80: Results of Method Consistency Test of Various Methods ............................. 124 Table 81: Results of Total Ranking Differences Test of Various Methods.................... 124 Table 82: Results of False Identification Test of Various Methods ............................... 
124 Table 83: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 6) ............................................................................................... 125 Table 84: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 7)................................................................................................................... 126 Table 85: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 7)................................................................................................................... 126 Table 86: Results of Site Consistency Test of Various Methods.................................... 126 Table 87: Results of Method Consistency Test of Various Methods ............................. 127 Table 88: Results of Total Ranking Differences Test of Various Methods.................... 127 Table 89: Results of False Identification Test of Various Methods ............................... 127 Table 90: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 7) ............................................................................................... 128 Table 91: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 8)................................................................................................................... 129 Table 92: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 8)................................................................................................................... 129 Table 93: Results of Site Consistency Test of Various Methods.................................... 129 Table 94: Results of Method Consistency Test of Various Methods ............................. 130 Table 95: Results of Total Ranking Differences Test of Various Methods.................... 
130 Table 96: Results of False Identification Test of Various Methods ............................... 130 Table 97: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 8) ............................................................................................... 131 Table 98: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 11)................................................................................................................. 132 Table 99: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 11)................................................................................................................. 132 Table 100: Results of Site Consistency Test of Various Methods.................................. 132 Table 101: Results of Method Consistency Test of Various Methods ........................... 133 Table 102: Results of Total Ranking Differences Test of Various Methods.................. 133 Table 103: Results of False Identification Test of Various Methods ............................. 133 Table 104: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 11) ............................................................................................. 134 Table 105: Similarity of Identification Results ( δ = 0.90) of Various Methods............. 135 Table 106: Similarity of Identification Results ( δ = 0.95) of Various Methods............. 135 Table 107: Results of Site Consistency Test of Various Methods.................................. 135 Table 108: Results of Method Consistency Test of Various Methods ........................... 136 Table 109: Results of Total Ranking Differences Test of Various Methods ( Functional Code: 12)................................................................................................................. 
136 Table 110: Results of False Identification Test of Various Methods ............................. 136 Table 111: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 12) ............................................................................................. 137 Table 112: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 14)................................................................................................................. 138 Table 113: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 14)................................................................................................................. 138 Table 114: Results of Site Consistency Test of Various Methods.................................. 138 Table 115: Results of Method Consistency Test of Various Methods ........................... 139 Table 116: Results of Total Ranking Differences Test of Various Methods ( Functional Code: 14)................................................................................................................. 139 Table 117: Results of False Identification Test of Various Methods ............................. 139 Table 118: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 14) ............................................................................................. 140 Table 119: Similarity of Identification Results ( δ = 0.90) of Various Methods ( Functional Code: 16)................................................................................................................. 141 Table 120: Similarity of Identification Results ( δ = 0.95) of Various Methods ( Functional Code: 16)................................................................................................................. 141 Table 121: Results of Site Consistency Test of Various Methods.................................. 
141 Table 122: Results of Method Consistency Test of Various Methods ........................... 142 Table 123: Results of Total Ranking Differences Test of Various Methods ( Functional Code: 16)................................................................................................................. 142 Table 124: Results of False Identification Test of Various Methods ............................. 142 Table 125: Results of False True Poisson Means Differences Test of Various Methods ( Functional Code: 16) ............................................................................................. 143 LIST OF FIGURES Figure 1: Observed and Fitted PDF of E1 Crash Data and Fit Summary Statistics......... 23 Figure 2: Fitted and Empirical CDF of E1........................................................................ 24 Figure 3: Moving Averages vs. Original Statistic............................................................. 32 Figure 4: The Number of t year Which is the “ Knee” of the Curve for 90% Confidence Level ......................................................................................................................... 34 Figure 5: The Number of t year Which is the “ Knee” of the Curve for 95% Confidence Level ......................................................................................................................... 34 Figure 6: The Number of t year Which is the “ Knee” of the Curve for 99% Confidence Level ......................................................................................................................... 35 Figure 7: The Number of t year Which is the “ Knee” of the Curve for All Confidence Levels........................................................................................................................ 35 Figure 8: The Cumulative Percent Distribution of Various t years.................................. 
36 Figure 9: Key Steps of ALGSP Model ............................................................................. 59 Figure 10: The Flowchart of Conducting EB Analysis .................................................... 63 Figure 11: The Flowchart of Computing Accident Reduction Potential .......................... 64 Figure 12: Empirical Cumulative Distribution of Dataset One ( E1) ................................ 74 Figure 13: Empirical Cumulative Distribution of Dataset Two ( E2) ............................... 75 Figure 14: Empirical Cumulative Distribution of Dataset Three ( L1) ............................. 76 Figure 15: Empirical Cumulative Distribution of Dataset Four ( L2) ............................... 77 Figure 16: Empirical Cumulative Distribution of Dataset Five ( S1)................................ 78 Figure 17: Empirical Cumulative Distribution of Dataset Six ( S2).................................. 79 Figure 18: Relation of AADT and Crashes/ year km for Rural Interstate Principle Arterials ( Functional Code: 1, year: 2000) ............................................................. 108 Figure 19: Relation of AADT and Crashes/ year km for Rural Other Principle Arterials ( Functional Code: 2, year: 2000) ............................................................................ 109 Figure 20: Relation of AADT and Crashes/ year km for Rural Minor Arterials ( Functional Code: 6, year: 2000)................................................................................................ 110 Figure 21: Relation of AADT and Crashes/ year km for Rural Major Collectors ( Functional Code: 7, year: 2000) ............................................................................ 111 Figure 22: Relation of AADT and Crashes/ year km for Rural Minor Collectors ( Functional Code: 8, year: 2000) ............................................................................ 
112 Figure 23: Relation of AADT and Crashes/ year km for Urban Interstate Principle Arterials ( Functional Code: 11, year: 2000) ........................................................... 113 Figure 24: Relation of AADT and Crashes/ year km for Urban Freeways ..................... 114 Figure 25: Relation of AADT and Crashes/ year km for Urban Other Principle Arterials ( Functional Code: 14, year: 2000) .......................................................................... 115 Figure 26: Relation of AADT and Crashes/ year km for Urban Minor Arterials ( Functional Code: 16, year: 2000) .......................................................................... 116

EXECUTIVE SUMMARY

In many agencies with jurisdiction over extensive road infrastructure, it is common practice to select and rectify hazardous locations. The need to improve hazardous locations may arise during safety management activities, during maintenance activities, or as a result of political pressures and/or public attention. Commonly a two-stage process is used. In the first stage, the past accident history of all sites is reviewed to screen a limited number of high-risk locations for further examination. In the second stage, the selected sites are studied in greater detail to devise cost-effective remedial actions or countermeasures for a subset of correctable sites. Due to limited time and resource constraints and the extensive number of candidate sites typically considered in such endeavors, it is impractical for agencies to examine all sites in detail. The current Arizona Local Government Safety Project (ALGSP) Analysis Model, which was developed by Carey (2001) with funding from the Arizona Department of Transportation (ADOT), is intended to facilitate these procedures by providing an automated method for the analysis and evaluation of motor vehicle crashes and the subsequent remediation of 'hot spot' or 'high risk' locations.
The software is user friendly and can save large amounts of time for local jurisdictions and governments such as Metropolitan Planning Organizations ( MPOs), counties, cities, and towns. However, its analytical core is based on the simple ranking of crash statistics, where the user is offered choices of crash frequency, crash rate, crash severity, or crash cost ( severities associated with average costs per crash severity type). Although this method has the benefit of straightforwardness, the efficiency of identifying truly high risk sites leaves some room for improvement. This research, funded by ADOT, aims to justify and recommend improvements to the analytical algorithms within the ALGSP model, thus enhancing its ability to accurately identify high risk sites. Included in the results of this research are a survey of past and current hot spot identification ( HSID) approaches; evaluation of HSID methods, and exploration of optimum duration of before period crash data under simulated scenarios; development of safety performance functions ( SPFs) for various functional road sections within Arizona; extended comparisons of alternative HSID methods based on SPFs by using real crash data; and recommendations for improving the identification ability of the current ALGSP model. These results are divided into the following sections: • Literature review of HSID methods ( chapter II): Through tracing the historical and conceptual development of various HSID techniques, the strengths and weaknesses associated with alternative approaches are assessed and appropriate directions of future research on HSID methods are explored and proposed. A detailed description of Bayesian approaches is also provided. • Experimental design for evaluation of HSID methods and exploration of accident history ( chapter III): In this experiment, “ sites with promise” are known a priori. 
Real intersection crash data from six counties within Arizona are used to simulate crash frequency distributions at hypothetical sites. A range of real conditions is manipulated to quantify their effects, and various confidence levels are explored. False positives (labeling a safe site as high risk) and false negatives (labeling a high-risk site as safe) are compared across three methods: the simple ranking method, the confidence interval method, and the Empirical Bayes (EB) method. Finally, the effect of crash history duration in these approaches is quantified.
• Safety performance functions for Arizona road segments ( chapter IV): The SPFs for nine functional classifications of road sections in Arizona are created based on year 2000 crash data provided by ADOT. Because crash counts are overdispersed, Negative Binomial models are used to develop these SPFs.
• Comparison of HSID methods based on real crash data of Arizona road segments ( chapter V): On the basis of the SPFs for Arizona road sections, five tests are implemented to evaluate the performance of the EB, accident reduction potential, accident frequency, and accident rate methods. Two confidence levels are explored under each test. In addition, the similarity of the identification results of the alternative HSID methods is explored as well.
• HSID in the current ALGSP model and recommended software changes ( chapter VI): The algorithms for conducting HSID in the current ALGSP model are first reviewed, and software changes are then recommended.
These recommendations include incorporating functional classification as an additional selection parameter, data interface improvements, accident history requirements, embedding the relationships between exposure and safety for the various roadway functional classes, incorporation of EB techniques to compute expected crash counts, incorporation of accident reduction potential as an additional weighting method, and incorporation of EB techniques to calculate expected crash costs. Based on both real and simulated data, the results in this report show significant advantages of the EB methods over other HSID methods across various confidence levels and different statistical tests. Specifically, the research found that:
• A higher percentage of truly high-risk sites are identified as 'high risk.'
• A higher percentage of truly safe sites are identified as 'safe.'
• Overall misclassifications are reduced using a Bayesian approach compared to alternative methodologies.
• The Bayesian approach shows the best site consistency and method consistency among the alternative methodologies.
Although it is shown that incorporation of Bayesian techniques into the ALGSP will provide model users with more accurate prediction of hot spots, the improvements are contingent upon accurate safety performance functions, which are currently unavailable in the ALGSP. Safety performance functions, which relate traffic volumes, road section lengths, and crashes, are provided in Appendix C for various roadway functional classifications in the state of Arizona. These safety performance functions enable the software enhancements needed to improve the ALGSP and accommodate Empirical Bayes procedures.

CHAPTER I  INTRODUCTION

Hot spot identification is a critical contemporary transportation issue.
The Intermodal Surface Transportation Efficiency Act (ISTEA) of 1991, along with the subsequent Transportation Equity Act for the 21st Century (TEA-21), brought HSID squarely into transportation planning activities. In particular, ISTEA requires each state to develop a work plan outlining strategies to implement Safety Management Systems (NCHRP, 2003). The objectives outlined in this management system require that several activities be undertaken by MPOs and/or DOTs:
1) The development and maintenance of a regional safety database so that safety investments can be evaluated regionally and forward in time.
2) The adoption of a defensible (i.e., state-of-practice) methodology for identifying safety deficiencies within a region.
3) A maintained and updated record of 'sites with promise,' including intersections, segments, interchanges, ramps, curves, etc.
4) A defensible methodology for evaluating the effectiveness of safety countermeasures.
Besides this mandate to spend safety funds wisely, there is professional pressure to conduct rigorous analyses and be held accountable for 'good number crunching.' Due to both public and professional pressures and the importance associated with motor vehicle injuries and fatalities, transportation safety professionals desire analytical tools to cope with HSID. As a powerful tool for local governments and jurisdictions, the current ALGSP model can be used to facilitate the selection of hazardous roadway locations in local jurisdictions and to aid in the evaluation of potential spot treatments of safety hazards. Its identification method simply ranks the crash statistics in descending order; the top-ranked sites are then selected according to the available budget. Because of random upward fluctuations in crash counts during the observation period, this simple ranking method is subject to regression-to-the-mean bias, which decreases identification accuracy.
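The simple ranking procedure just described can be sketched in a few lines; the site records, budget, and per-site remediation cost below are hypothetical values for illustration, not figures from the ALGSP model.

```python
# Sketch of a simple ranking HSID method: rank sites by observed crash
# count in descending order, then select top-ranked sites until the
# remediation budget is exhausted (all numbers are hypothetical).

def rank_and_select(sites, budget, cost_per_site):
    """sites: list of (site_id, crash_count); returns selected site ids."""
    ranked = sorted(sites, key=lambda s: s[1], reverse=True)
    n_affordable = int(budget // cost_per_site)
    return [site_id for site_id, _ in ranked[:n_affordable]]

sites = [("A", 12), ("B", 3), ("C", 25), ("D", 8), ("E", 17)]
print(rank_and_select(sites, budget=300_000, cost_per_site=100_000))
# → ['C', 'E', 'A']
```

Because the selection looks only at observed counts, a site whose count spiked by chance is selected even if its long-run mean is unremarkable, which is exactly the regression-to-the-mean vulnerability noted above.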
By contrast, Bayesian methods have been proposed for obviating this bias, and a considerable body of literature has shown them to be superior for accurately identifying 'sites with promise.' However, much of that research was conducted on real crash data (where hazardous sites are not truly known), and comparisons across various scenarios have not been conducted. In addition, real crash data specific to Arizona regions have not been used to examine the performance of Bayesian analyses. By designing an experiment that simulates various scenarios and by using real crash data from Arizona, this research effort evaluates and compares alternative HSID methods. All of the results show the consistent superiority of Bayesian techniques for accurately identifying 'sites with promise.' This lays a solid foundation for the future incorporation of Bayesian approaches into the current ALGSP model. Moreover, safety performance functions for various classifications of road sections within Arizona are also provided in this report to facilitate the integration procedure. This report is divided into five primary sections. In the second section of this report, Literature Review of HSID Methods, the historical and conceptual development of HSID procedures is reviewed chronologically, and, to aid understanding of the more complicated computational procedures, a detailed description of two types of Bayesian techniques is provided. In the third section, an experimental approach is taken to evaluate the performance of simple ranking, classical confidence intervals, and the EB techniques in terms of the percentages of false negatives and false positives. Several practical empirical crash distributions from the state of Arizona are selected to represent a realistic range of 'base' crash data, and several degrees of crash heterogeneity are examined in the simulation.
The results demonstrate that the EB methods in general outperform the other two, relatively conventional, methods, especially in low heterogeneity situations. In addition, the effect of the crash history duration employed in the three HSID methods is also explored in this experiment. The moving average method is used to smooth the trend of the various-duration data and to find the "knee" of the curve. Using 3 years of crash history data results in significant improvements in error rates for all three methods, and durations of 3 through 6 years account for almost 90% of all optimum durations. The major focus of the fourth section is on developing safety performance functions for road sections. Since design criteria and level of service vary according to the function of the highway facility, a safety performance function is created for each of nine types of road sections within Arizona. The data used for modeling include accident counts, Annual Average Daily Traffic (AADT), and road section length. Graphs showing the relationships among the variables, the model forms, and measures of goodness of fit are provided as well. These alternative SPFs are expected to facilitate the incorporation of Bayesian techniques into the future ALGSP. The fifth section contains a comprehensive comparison of the identification performance of the EB, accident reduction potential, accident frequency, and accident rate methods using crash data from Arizona and the SPFs developed in the previous section. Five evaluation tests are conducted: the site consistency test, the method consistency test, the total ranking differences test, the false identification test, and the false/true Poisson mean differences test. Both the top 10% and top 5% of locations (in terms of accident frequency) are considered as hot spots. The results across the nine types of road sections show the consistent advantage of the EB method, and the disadvantage of the accident rate method, for conducting HSID.
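To make the flavor of such comparisons concrete, the sketch below measures the overlap between the "hot spot" sets chosen by two methods, one ranking by crash frequency and one by crash rate. This is only an illustrative overlap measure with made-up numbers, not the report's exact method consistency statistic.

```python
# Illustrative sketch of a method consistency comparison: the proportion
# of sites that two HSID methods place in their common top-k set.

def top_k(scores, k):
    """Return the set of k site ids with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def method_consistency(scores_a, scores_b, k):
    """Overlap proportion of the top-k sets produced by two methods."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k

freq = {"s1": 20, "s2": 15, "s3": 9, "s4": 7, "s5": 2}          # crash counts
rate = {"s1": 1.1, "s2": 4.0, "s3": 0.3, "s4": 2.5, "s5": 0.9}  # crash rates
print(method_consistency(freq, rate, k=2))  # top-2 sets {s1,s2} vs {s2,s4} → 0.5
```

Two methods that agree perfectly would score 1.0; complete disagreement on the top-k set would score 0.0.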
The final section provides recommended software changes to improve the ALGSP model's ability to select truly hazardous locations from the road network. It is proposed that traffic volume information be incorporated into the software: as one of the factors significantly affecting road safety, traffic volume should be included in the safety performance function, which is the basis for conducting the EB analysis. Both the experimental design results based on the simulated data and the results of the evaluation tests based on Arizona crash data support the incorporation of Bayesian techniques into the software. The accident reduction potential method is also recommended for inclusion as an additional weighting method. Finally, a recommendation on the length of the crash analysis period is provided.

CHAPTER II  LITERATURE REVIEW OF HSID METHODS

Identifying 'sites with promise,' also known as black spots, hot spots, or high-risk locations, has received considerable attention in the literature. This is not surprising, since there is public and professional pressure to allocate safety investment resources efficiently across the transportation system and to invest in sites that will yield safety benefits for relatively modest cost. In addition, US federal legislation requires the practice of remediating high-risk locations. It is intended that this identification stage act as an effective sieve that allows sites that do not require remedial action to pass through, while retaining sites that require remediation. This is difficult to accomplish, however, because an individual site's safety performance (i.e., number of crashes) varies from year to year as a result of natural variation, giving rise to two potential errors: false positives and false negatives. False positives are sites identified as needing remediation when in fact they are safe, while false negatives are sites identified as being safe when in fact they require remediation.
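When the true status of each site is known, as in the simulation experiments described later in this report, both error types can be counted directly. The sketch below uses hypothetical site labels.

```python
# Count false positives (safe sites flagged as high risk) and false
# negatives (high-risk sites passed through as safe) for a screening
# method, given the true set of high-risk sites.

def screening_errors(true_hot, flagged):
    """true_hot, flagged: sets of site ids; returns (#FP, #FN)."""
    false_positives = flagged - true_hot   # flagged but actually safe
    false_negatives = true_hot - flagged   # hot but not flagged
    return len(false_positives), len(false_negatives)

true_hot = {"s2", "s5", "s7"}   # hypothetical ground truth
flagged  = {"s2", "s3", "s7"}   # sites a method labeled high risk
print(screening_errors(true_hot, flagged))
# → (1, 1): s3 is a false positive, s5 a false negative
```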
The following literature review comprehensively examines hot spot identification methods. It is intended to support ongoing work for the Arizona Department of Transportation aimed at improving the current ALGSP Model. It is the first of several steps toward ultimately improving the software that enables jurisdictions in the state of Arizona to identify sites for potential improvement, such as road segments, intersections, ramps, etc. This literature review is divided into two sections: the historical and conceptual development of hot spot identification methods, and a detailed description of Bayesian techniques, the current state of the art.

HOT SPOT IDENTIFICATION PROBLEM BACKGROUND

Due to the significant importance of identifying sites with promise, a large number of techniques have been employed to improve detection accuracy. The historical and conceptual development of such procedures is reviewed chronologically in this section to help familiarize the reader with the hot spot identification problem. The following notation will be useful in the discussions that follow:
X = observed accident count for a road section/site and period;
λ = expected accident count (E{X}) for the road section/site and period;
E{λ} = mean of the λ's for similar road sections/sites;
D = length of the road section;
Q = number of vehicles passing the road section/site during the period to which X pertains;
R = observed accident rate (e.g., crashes/vehicle-kilometer or crashes/million entering vehicles);
R_EB = accident rate estimated by the EB method;
R̄ = average value of R for similar road sections and sites;
UCL_X = upper control limit for observed accident counts (X);
UCL_R = upper control limit for observed accident rates (R);
t = number of years of accident data to be analyzed;
α, β = parameters.
Perhaps the simplest way to identify sites with promise is by ranking them in descending order of their accident frequencies and/or accident rates.
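Using the notation above, the observed accident rate R for a segment is the count X divided by travel exposure. A common convention, assumed here rather than quoted from the report, expresses R in crashes per million vehicle-kilometers, with the vehicle count Q derived from AADT over t years.

```python
# Observed accident rate R in crashes per million vehicle-km for a road
# section, from the crash count X, AADT, section length D (km), and the
# analysis period t (years). The 10**6 scaling to "per million vehicle-km"
# is a common convention assumed here, not a formula from the report.

def accident_rate(X, aadt, D_km, t_years):
    vehicle_km = aadt * 365 * t_years * D_km   # exposure: Q times D
    return X * 1e6 / vehicle_km

R = accident_rate(X=30, aadt=10_000, D_km=2.0, t_years=3)
print(round(R, 3))  # → 1.37 crashes per million vehicle-km
```

Ranking sites by R instead of X rewards low-exposure sites, a point several of the studies reviewed below address directly.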
Although this method has the benefit of straightforwardness, the efficiency of identifying truly high-risk sites leaves considerable room for improvement. To overcome this deficiency, a substantial body of research has been devoted to providing more efficient and justifiable site identification techniques. For example, Norden et al. (1956) proposed a method to analyze accident data for highway sections based on statistical quality control techniques. Using an approximation of the Poisson distribution for crash counts and a 0.5 percent probability level, they developed the equations for UCL_X and UCL_R used to identify critical thresholds. When X exceeded UCL_X (or R exceeded UCL_R), a site was identified as deviant with regard to safety. This approach drew much attention at the time, and some similar methods (with relatively minor differences) based on this procedure were proposed in subsequent years. Researchers then began to ponder the issue of how many years (t) of accident data are necessary to conduct a defensible analysis. Having found that a 13-year average could be adequately estimated from 3 years of accident counts, May (1964) was the first to conclude, "There is little to be gained by using a longer study period than three years." It is reasonable to use current data instead of old data that no longer reflect the current situation. However, considering that a sensible choice of t must depend on the magnitude of the average being estimated and on some knowledge of what makes past accident counts obsolete, this influential practice seems somewhat arbitrary. Crash severity became the next issue of importance for HSID methods. Common sense suggested that a site with more severe crashes (all else being equal) should receive higher priority in remediation efforts. The safety index was first introduced by Tamburri and Smith (1970) and later incorporated into the practice of HSID.
In essence, they observed that each road type (for example, rural two-lane roads, urban freeways, etc.) has a characteristic mix (distribution) of accident severities among fatal, injury, and property damage only (PDO) crashes. On the basis of accident severity and road type, accident costs were used to weight crashes. They also suggested that all crashes be expressed in terms of PDO-equivalent accidents (for example, a certain injury crash may be equivalent to 5 PDO crashes). Deacon et al. (1975) considered the difference between identifying hazardous spots and hazardous sections and explored how long analysis sections should be. They also presented an analysis of a sensible t, in comparison to that provided earlier by May (1964). Their conclusions suggested that a balance should be sought between the reliability of the crash data (longer periods being more reliable) and the need to detect adverse change quickly (shorter periods being better able to reveal adverse safety changes), and that a single t should be determined on this basis. They also recommended 9.5 as the weight for fatal and A-injury crashes, and 3.5 for B- and C-injury crashes, when using a safety index. Laughland et al. (1975) first described the ranking procedure using both the number and rate methods. The proposed method identifies hazardous locations when X exceeds some predetermined value UCL_X and R exceeds UCL_R. The claimed advantage of this procedure is that it excludes so-called hazardous locations whose R is large only as a result of low exposure. Renshaw et al. (1980) argued that questions about the length of sections, the duration of accident history, the amount of traffic, and detection accuracy must all be considered jointly, and that reliable detection is often not practical. Hakkert and Mahalel (1978) first proposed that black spots be defined as those sites whose accident frequency is significantly higher than expected at some prescribed level of significance.
This point was then favored by McGuigan (1981; 1982), who put forward the concept of potential accident reduction (PAR), defined as the difference between the observed accident count and the expected number for similar sites. He stated, with some justification, that PAR should be a better basis on which to rank sites than annual accident totals (AAT), which tend to identify high-flow sites that do not necessarily have the potential for accident reduction. This method is similar to the quality control method to some extent: the former represents the magnitude of the problem, that is, how many accidents could be avoided under normal conditions, while the latter represents the probability that the site is abnormal at a given level of confidence. Estimating E{λ} using a multivariate model was suggested by Mahal et al. (1982). Using E{λ} as the mean, they deemed a location deviant if the probability of observing X or more accidents was smaller than some predetermined value. Flak et al. (1982) recommended that crashes be categorized according to specific road conditions (weather, pavement material, etc.) and by accident type (turning, sideswipe, rear-end, etc.). This concept differed from previous ones in that it seeks to identify deviant locations with regard to very specific conditions. Although appealing from an experimental design point of view, this concept is likely to produce sample sizes too small to detect significant differences for all but the largest of databases. Hauer and Persaud (1984) proposed a concept of sieve efficiency in which the number of sites to be inspected and the expected numbers of correct positives, false positives, and false negatives serve as measures of performance. They examined the performance of various HSID techniques on the basis of performance measures that are easy to understand.
They argued that the quality control approach to HSID does not give the analyst clues about how well or how poorly the sieve is working. They also suggested that numerical methods are needed to free the procedure from reliance on the assumption that λ obeys the gamma distribution. Regression-to-the-mean (RTM) bias associated with typical methods of site selection has been identified in the literature, and some research dealing with RTM has been developed. Persaud and Hauer (1984) compared and evaluated the performance of an EB method and a nonparametric method for debiasing before-and-after analyses. The results for several data sets show that the Bayesian methods in most cases yield better estimates than the nonparametric method. Wright et al. (1988) surveyed previous research dealing with the RTM effect. They examined the validity of the assumptions associated with those methods, evaluated the robustness of the results based on those assumptions, and provided some suggestions for improving the quality of the results. Mak et al. (1985) developed a procedure to conduct an automated analysis of hazardous locations. The procedure consists of (a) a mainframe computer program to identify and rank black spots, (b) a microcomputer program to identify factors overrepresented in accident occurrence at these locations relative to the average for similar highways in the area, (c) a multidisciplinary approach to identify accident causative factors and to devise appropriate remedial measures, and (d) evaluation of the remedial measures actually implemented. The procedure is based on accident rate (number of injury and fatal accidents per 100 million vehicle miles of travel). Higle and Witkowski (1988) developed a Bayesian model for HSID using accident rate data rather than accident counts, which is shown to have identification criteria analogous to those used in the classical identification scheme.
The comparisons between the Bayesian analysis and classical statistical analyses suggest that there is an appreciable difference among the various identification techniques in terms of HSID performance, and that some classically based statistical techniques are prone to err in the direction of excess false negatives. Based on data from 145 intersections in Metropolitan Toronto, Hauer et al. (1988) provided Bayesian models to estimate the safety of signalized intersections on the basis of information about their traffic flows and accident history. For each of 15 accident patterns (categorized by the movements of the vehicles), an equation is given to estimate the expected number of accidents and its variance using the relevant traffic flows. When data about past accidents are available, the estimates based on traffic flow are revised with a simple equation. By applying these Bayesian models, one can estimate safety when both flows and accident history are given and, on this basis, judge whether an intersection is unusually hazardous. This method of estimation is also recommended for accident warrants in the Manual on Uniform Traffic Control Devices. Through a simulation experiment, Higle and Hecht (1989) evaluated and compared various techniques for the identification of hazardous locations, based on classical and Bayesian statistical analyses respectively, in terms of their ability to identify hazardous locations correctly. The results reveal that the two classically based techniques suffer from some shortcomings, while the Bayesian method based on accident rates tends to perform well, producing fewer false negative and false positive errors. By 1990 it was becoming generally accepted in academic circles that the Empirical Bayes approach to unsafety estimation was superior to previous HSID methods.
The Bayesian approach generally makes use of two kinds of clues about an entity: its traits (such as traffic, geometry, age, or gender) and its historical crash record. It requires information about the mean and the variance of the unsafety in a "reference population" of similar entities. This method suffers from several shortcomings: first, a very large reference population is required; second, the choice of reference population is to some extent arbitrary; and third, entities in the reference population usually cannot match the traits of the entity for which the unsafety is estimated. Hauer (1992) alleviated these shortcomings by offering the multivariate regression method for estimating the mean and the variance of unsafety in the reference population. By describing its logical foundations and illustrating some numerical examples, Hauer shows how the multivariate method makes the Empirical Bayes approach to unsafety estimation applicable to a wider range of circumstances and yields better estimates of unsafety than previous methods. Persaud (1991) presented a method for estimating the underlying accident potential of Ontario road sections using accident and road-related data. The comparative results indicate that the EB estimates are superior to those based on the accident counts or the regression predictions by themselves, particularly for sections that might be of interest in a program to identify and treat unsafe road locations. Brown et al. (1992) presented the convergence of HSID results from police-reported data, highway inventory, and community reporting. Weighted injury frequencies per unit distance and weighted injury rates per 100 million vehicle-km are presented for all sites and for all numbered highway segments. Priority sites are then ranked considering injury frequencies and injury rates. Hauer et al.
(1993) explored the probabilistic properties of the process of identifying entities, such as drivers or intersections, for some form of remedial action when they experience N crashes within D units of time (the "N in D" trigger). On the basis of the probability distribution of the "time to trigger," it is concluded that in road safety the problem of false positives is severe, and therefore entities identified on the basis of accident or conviction counts should be subjected to further safety diagnosis. Moreover, they found that the longer the "N in D" trigger is applied to a population, the less useful it becomes. Tarko et al. (1996) presented a methodology of area-wide safety analysis to detect those areas (states, counties, townships, etc.) that should be considered for safety treatment. The method is implemented for Indiana at the county level and uses regression models to estimate the normal number of crashes in individual counties. The counties are priority-ranked using a combined criterion that includes both the above-norm number of crashes and the confidence level. This combined criterion helps select counties where the excessive number of crashes is not caused solely by the randomness of the process. This application differs from previous ones in that the HSID was conducted at the planning or county level, instead of at the intersection or road segment level. Stokes and Mutabazi (1996) traced the evolution of the formulas used in the rate quality control method from their origin in the late 1950s to their present form, and they also presented and discussed the derivation of the basic formulas used in the method. They suggested that, contrary to assertions in the literature, the accuracy of the equations used in the rate quality method is not improved by eliminating the normal approximation correction factor from the original equations, and that the need for a correction factor is particularly apparent at higher probability levels.
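For reference, the critical rate in the rate quality control lineage that Stokes and Mutabazi trace is commonly written as UCL_R = R̄ + k·sqrt(R̄/m) + 1/(2m), where m is the site's exposure in millions of vehicles, k is the normal deviate for the chosen probability level, and the final term is a commonly included continuity correction. The sketch below assumes this standard textbook form, not a formula reproduced from the report.

```python
import math

# Critical (upper control limit) accident rate in the rate quality control
# method, assuming the standard form R_c = R_bar + k*sqrt(R_bar/m) + 1/(2m):
#   R_bar : average rate for similar sites (crashes per million vehicles)
#   m     : site exposure in millions of vehicles
#   k     : normal deviate for the chosen probability level (e.g., 2.576
#           for the 0.5 percent level used by Norden et al.)

def critical_rate(R_bar, m, k=2.576):
    return R_bar + k * math.sqrt(R_bar / m) + 1.0 / (2.0 * m)

R_c = critical_rate(R_bar=1.2, m=4.0)   # hypothetical values
print(R_c > 1.2)  # True: the control limit always exceeds the average rate
```

A site whose observed rate R exceeds this limit would be flagged as deviant, exactly as in the quality control procedure described earlier.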
On the basis of a review of previous procedures for black spot identification, Hauer (1996) attempted to create some order in the thinking and offered suggestions to improve identification. He pointed out that, compared with the identification stage, the stage of site safety diagnosis and remediation is somewhat underdeveloped. Persaud et al. (1999) put forward a concept similar to potential accident reduction, termed the potential for safety improvement (PSI). To correct for RTM bias, they replaced the observed accident count in the PAR described earlier with the long-term mean of accident counts. Davis and Yang (2001) used hierarchical Bayes methods combined with an induced exposure model to identify intersections where the crash risk for a given driver subgroup is relatively higher than that for other groups. They carried out the necessary computations using Gibbs sampling, producing point and interval estimates of relative crash risk for the specified driver group at each site in a sample. The methods can also be extended to identify hazardous locations for a specified accident type. This method of HSID requires sophisticated modeling skill and software, and is currently beyond the level of most DOT staff expertise. Kononov et al. (2002) presented the direct diagnostics method to conduct HSID and develop appropriate countermeasures. The underlying principle is that a site should be identified for further examination if there is overrepresentation of specific accidents relative to similar sites. With Empirical Bayes gradually becoming the standard and staple of professional practice, Hauer et al. (2002) presented a tutorial on safety estimation using the EB method. This tutorial contains a comprehensive illustration of the EB procedures and can be viewed as a bridge between theory and practice for EB application.
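The core computation such a tutorial walks through can be sketched generically: with a gamma-distributed λ whose mean and variance come from a reference population (or an SPF), the EB estimate of a site's expected crash count is a weighted average of that mean and the site's observed count X. The weight form below is the standard textbook result for a single observation period, not the report's specific implementation.

```python
# Minimal EB estimate of the expected crash count at a site, assuming a
# gamma prior over lambda: the estimate is a weighted average of the
# reference-population mean E{lambda} and the observed count X, with
# weight w = 1 / (1 + Var{lambda} / E{lambda}).

def eb_estimate(X, mean_lambda, var_lambda):
    w = 1.0 / (1.0 + var_lambda / mean_lambda)
    return w * mean_lambda + (1.0 - w) * X

# Hypothetical site: 9 observed crashes, while similar sites average 4
# crashes with variance 2. The EB estimate is pulled toward the
# reference mean, correcting for the likely random "up" fluctuation.
est = eb_estimate(X=9, mean_lambda=4.0, var_lambda=2.0)
print(est)  # between 4 and 9, closer to the reference mean than to X
```

The smaller the variance in the reference population (i.e., the more informative the prior), the more the estimate is pulled toward E{λ}; a large variance leaves the estimate near the observed count.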
The above-mentioned research represents only a small portion of the extensive past and current HSID research. In summary, the large body of techniques for HSID generally includes simple ranking of accident frequencies and/or accident rates, rate quality control methods, site identification using the notion of a safety index, number and rate methods, the accident pattern recognition method, and various applications of Bayesian approaches to both crash frequencies and crash rates. In comparison with other techniques, Bayesian techniques have been shown to offer improved ability to identify black spots by accounting for both crash history and expected crashes for similar sites, which can obviate the “regression to the mean” problem that simpler methods fail to correct. This literature review summary clearly indicates that opportunities exist for possible enhancements leading to improved HSID within the recently released ALGSP model, which currently performs a simple ranking based on accident frequencies. However, as one might expect, the incorporation of Bayesian methods will increase the data collection burden: additional information about site crash histories and reference populations will need to be collected. The following section is devoted to describing the Bayesian techniques in greater detail.
BAYESIAN TECHNIQUES TO IDENTIFY HAZARDOUS LOCATIONS
An underlying characteristic of crash occurrence is the random fluctuation from year to year of crash counts under constant and unchanging traffic, weather, and roadside conditions (which of course in reality does not occur). This characteristic significantly reduces the ability to detect truly hazardous locations in the sense that a crash site may appear to represent a relatively high risk in a given year when in fact the site’s underlying, inherent risk level is average or low (Hauer, 1997).
A site that reveals a high observed risk in one year is on average followed by a crash count in the subsequent year that is closer to the mean— a phenomenon known as regression to the mean (RTM). However, it was shown in the previous section that Bayesian approaches, by utilizing two kinds of clues about an entity (its traits and its historical accident record), involve corrections for RTM and can significantly improve the efficiency of site identification. Incorporation of such techniques into the ALGSP model will offer improvements in the performance of HSID. Unfortunately, in contrast to other approaches, which are relatively straightforward, the Bayesian techniques require a greater quantity of information associated with the locations inspected and also involve relatively more complicated computations – albeit trivial for a computer. Noting that a large portion of this research is to test the performance of various HSID methods (including the more typical methods and the Bayesian techniques), this section describes in detail the analytical aspects of various Bayesian techniques generally accepted as ‘state of the art.’ The reviews are divided into two groups: Bayesian techniques based on accident frequencies and Bayesian techniques based on accident rates.
Bayesian Techniques Based on Accident Frequencies
To alleviate the RTM bias associated with other site identification techniques, Hauer et al. (1984; 1988; 1992) discussed numerous aspects of HSID to derive what is known as the EB method. EB methods differ technically from Bayes’ methods in that the former rely on empirical data in the role of “subjective” prior information while the latter rely on truly subjective information (e.g. expert opinions, judgment, etc.). The EB method rests on the following logic. Two assumptions are first needed, which can be traced back to those of Morin (1967) and Norden et al. (1956): Assumption 1: At any given location, accident occurrence obeys the Poisson probability law.
That is, P(x|λ) denotes the probability of recording x accidents on a site where their expected number is λ, where P(x|λ) = λ^x e^(−λ) / x!. (1) Assumption 2: The probability distribution of λ across the population of sites is gamma distributed, where g(λ) denotes the gamma probability density function. Estimation of the long-term safety of an entity is obtained using both kinds of clues, that is, the traits (such as gender, age, traffic, or geometry) of an entity and the historical accident record of the entity. If the count of crashes (x) obeys the Poisson probability law and the distribution of the λ’s in the reference population is approximated by a gamma probability density function, a good estimator of the λ for a specific entity is: αE{λ} + (1 − α)x, with α = E{λ}/[E{λ} + VAR{λ}]. (2) From the above equation, we know that estimates of E{λ} and VAR{λ}, which pertain to the λ’s of the reference population, are needed. There are two methods to estimate E{λ} and VAR{λ}: one is the method of sample moments; the other is the multivariate regression method. To describe the method of sample moments, let us first consider a reference population of n entities of which n(x) entities have recorded x = 0, 1, 2, … accidents during a specified period. With this notation, the sample mean and the sample variance are, respectively: μ = Σ x·n(x) / Σ n(x) (3) s² = [Σ (x − μ)²·n(x)] / Σ n(x) (4) In the method of sample moments, the estimators of E{λ} and VAR{λ} are equal to μ and s² − μ respectively. The larger the reference population, the more accurate these estimates are. The primary attraction of the method is that its validity rests on a single assumption: that if λ remained constant, the occurrence of accidents would be well described by the Poisson probability law.
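The method of sample moments and the EB estimator of Equation 2 can be sketched in a few lines of Python; the frequency table below is illustrative, not data from the report.

```python
# Sketch of the method of sample moments (Eqs. 3-4) and the EB estimator
# (Eq. 2). `counts` maps an observed crash count x to the number of
# reference-population sites n(x); the numbers are hypothetical.

def moment_estimates(counts):
    """Sample mean and variance from a {x: n(x)} frequency table."""
    n = sum(counts.values())
    mu = sum(x * nx for x, nx in counts.items()) / n
    s2 = sum((x - mu) ** 2 * nx for x, nx in counts.items()) / n
    return mu, s2

def eb_estimate(x, mu, s2):
    """alpha*E{lambda} + (1 - alpha)*x, with E{lambda} = mu and
    VAR{lambda} = s2 - mu (since VAR{x} = VAR{lambda} + E{lambda})."""
    var_lam = max(s2 - mu, 1e-9)   # guard against under-dispersed samples
    alpha = mu / (mu + var_lam)
    return alpha * mu + (1 - alpha) * x

counts = {0: 120, 1: 95, 2: 60, 3: 30, 4: 12, 5: 5}
mu, s2 = moment_estimates(counts)
print(eb_estimate(5, mu, s2))   # shrinks the observed count of 5 toward mu
```

Note that the EB estimate always lies between the observed count and the reference-population mean, which is precisely the correction for regression to the mean.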
However, there remain two practical difficulties: (1) It is rare that a sufficiently large data set can be found to allow for adequately accurate estimation of E{λ} and VAR{λ}; (2) Even with very large data sets, one cannot find adequate reference populations when entities are described by several traits (e.g. geometric conditions, etc.). In order to obviate these difficulties, Hauer (1992) provided the multivariate regression method. With this correction, a multivariate model is fitted to accident data to estimate E{λ} as a function of independent variables, and the residuals (i.e., the difference between an accident count on some specific entity that served as a “datum” for model fitting and the estimate of E{λ} calculated from the fitted model equation) are viewed as coming from a family of compound Poisson distributions: VAR{x} = VAR{λ} + E{λ} (5) The E{λ} of the reference population is estimated using the model equation; VAR{x} is estimated using the squared residuals. Therefore, based on equation (5), the difference [squared residual − estimate of E{λ}] can be used to estimate VAR{λ} for the imaginary reference population to which this datum point belongs. As mentioned previously, it is easy to note that the primary difference between the method of sample moments and the multivariate regression method is that the estimates of E{λ} and VAR{λ} are obtained using different analytical procedures. The method of sample moments is straightforward, while the latter yields more precise results. Once the estimates of E{λ} and VAR{λ} are obtained, the expected safety of an entity is obtained using Equation 2. However, the truly hazardous locations cannot be screened based solely on the long-term safety associated with each entity; a model of the entire distribution function of λ given x, i.e. λ|x, is required.
On the basis of the assumptions stated previously, the probability that a randomly selected site has x accidents is approximated by the negative binomial (NB) probability distribution. Thus, the parameters of g(λ) are estimated using EB logic according to the following sequence of steps: Step 1: The sample mean and variance are computed across sites. The notation n(x) is used to denote the number of sites that had x crashes. The estimated mean and variance are computed using: μ = Σ x·n(x) / Σ n(x) (6) s² = [Σ (x − μ)²·n(x)] / Σ n(x) (7) Step 2: The EB weighting parameters α and β are then obtained using: α = μ / (s² − μ) (8) β = μ·α (9) Step 3: With the two weighting parameters obtained, the parameters of the gamma distribution are obtained such that: g(λ) = α^β λ^(β−1) e^(−αλ) / Γ(β). (10) The subpopulation of sites that had x accidents also follows a gamma probability distribution, and its gamma probability density function is given by: g(λ|x) = (1 + α)^(β+x) λ^(β+x−1) e^(−(1+α)λ) / Γ(β + x). (11) With the probability density functions defined, the selection of hazardous locations is now straightforward. Suppose that λ* is the “acceptable” upper limit of the expected accident count; then a site i is identified as hazardous if the probability that λ exceeds λ* is sufficiently large. Specifically, if: P(λ > λ*|x) > δ (12) where δ is the tolerance level that is contingent upon the choice of safety specialists (i.e. level of acceptable risk) and takes into account conditions in the local jurisdiction, then site i is identified as a truly hazardous location.
Bayesian Techniques Based on Accident Rates
In contrast to earlier papers regarding EB techniques, which were concerned with predicting the number of crashes that will occur at a particular location, Higle and Witkowski (1988) investigated using Bayesian analysis of crashes for the identification of hazardous locations based on accident rates rather than frequencies.
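The three steps and the hazard test of Equation 12 can be sketched as follows. The posterior tail probability P(λ > λ*|x) is obtained here by simple numerical integration of the gamma density of Equation 11 (a statistics library would serve equally well), and the inputs are hypothetical.

```python
import math

# Sketch of Steps 1-3 and the hazard test of Eq. 12. The posterior
# lambda|x ~ Gamma(shape = beta + x, rate = 1 + alpha), per Eq. 11.

def gamma_posterior_exceedance(x, alpha, beta, lam_star, upper=200.0, steps=20000):
    """P(lambda > lam_star | x) by trapezoidal integration of the tail."""
    shape, rate = beta + x, 1.0 + alpha
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    def pdf(lam):
        return math.exp(log_norm + (shape - 1) * math.log(lam) - rate * lam)
    h = (upper - lam_star) / steps
    total = 0.5 * (pdf(lam_star) + pdf(upper))
    total += sum(pdf(lam_star + i * h) for i in range(1, steps))
    return total * h

# Steps 1-2: weighting parameters from hypothetical sample moments (Eqs. 8-9)
mu, s2 = 4.0, 10.0
alpha = mu / (s2 - mu)   # Eq. 8
beta = mu * alpha        # Eq. 9

p = gamma_posterior_exceedance(x=15, alpha=alpha, beta=beta, lam_star=8.0)
print(p)   # flag the site as hazardous if p exceeds the tolerance level delta
```

For a site with 15 observed crashes against a reference mean of 4, the posterior mass lies well above λ* = 8, so p is large and the site would be flagged at typical values of δ.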
It should be noted that the use of rates has been strongly discouraged by some researchers, with a growing body of literature recommending against it (Hauer, 1997). Due to the similar assumptions and procedures, the research can be viewed as a complement to the previous research relying on EB approaches. Using empirical comparisons of performance between Bayesian and classical statistical analyses, Higle and Witkowski found that there is an appreciable difference among the various identification techniques, and that some classically based statistical techniques may be prone to err in the direction of excessive false negatives. Higle and Witkowski divided the Bayesian analysis into two steps. In the first step, crash histories are aggregated across a number of sites to obtain a gross estimate of the probability distribution of the accident rates across the region. In the second step, the regional distribution and the accident history at a particular site are used to obtain a refined estimate of the probability distribution associated with the accident rate at that particular site. In performing the analysis, Higle and Witkowski made two assumptions that are similar to those made by previous researchers: Assumption 1: At any given location, when the accident rate is known (i.e., if R̃_i = R, noting that R̃_i is treated as a random variable), the actual number of accidents follows a Poisson distribution with expected value R(DQ)_i. That is: P{X_i = x | R̃_i = R} = [R(DQ)_i]^x e^(−R(DQ)_i) / x! (13) Assumption 2: The probability distribution of the regional accident rate, f_R(R), is the gamma distribution. That is: f_R(R) = β^α R^(α−1) e^(−βR) / Γ(α) (14) Higle and Witkowski recommended that for each computation, it may be preferable to use the MME (method of moments estimates) values rather than the MLE (maximum likelihood estimates) values of α and β.
Within the framework of Bayesian analysis, the site-specific parameters are: α_i = α + X_i, β_i = β + (DQ)_i. Based on α_i and β_i, the site-specific probability density functions are then obtained. The steps to identify the truly hazardous locations are as follows: Step 1: Estimate the sample mean and variance of the observed accident rates of the population of locations: μ = (1/m) Σ_{i=1}^{m} X_i/(DQ)_i (15) s² = [1/(m − 1)] Σ_{i=1}^{m} [X_i/(DQ)_i − μ]² (16) Step 2: Estimate parameters α and β, where: β = μ / s² (17) α = μ·β (18) With the two parameters, f_R(R) = β^α R^(α−1) e^(−βR) / Γ(α) (19) Step 3: Obtain f(R_i | X_i, (DQ)_i). The subpopulation of sites that had X_i accidents also follows a gamma distribution, and its gamma probability density function is as follows: f(R_i | X_i, (DQ)_i) = β_i^(α_i) R_i^(α_i−1) e^(−β_i R_i) / Γ(α_i). (20) where: α_i = α + X_i (21) β_i = β + (DQ)_i (22) With these probability density functions, the selection of hazardous locations is now straightforward. Suppose that R* is the “acceptable” upper limit of the accident rate; then a site i can be deemed hazardous if the probability that R_i exceeds R* is sufficiently large, that is, if: P(R_i > R* | X_i) > δ (23) where δ is the tolerance level, which is contingent upon the choice of safety specialists and the actual situation of the local jurisdiction. Sites above the critical threshold are then identified as truly hazardous locations. To summarize, Bayesian techniques, by accounting for both crash history and expected crashes for similar sites, have been shown to offer improved ability to identify truly hazardous locations. The next section quantifies the differences between Bayesian techniques and other typical approaches.
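A minimal sketch of the rate-based procedure (Equations 15 through 22), assuming hypothetical counts X_i and exposures (DQ)_i:

```python
# Sketch of the rate-based Bayesian procedure (Eqs. 15-22). Each site i has
# an observed crash count X_i and an exposure (DQ)_i (e.g., million
# vehicle-miles of travel); the numbers below are hypothetical.

sites = [(12, 2.0), (5, 1.5), (20, 2.5), (3, 1.0), (9, 2.0)]  # (X_i, DQ_i)

# Step 1: sample mean and variance of observed rates X_i/(DQ)_i (Eqs. 15-16)
m = len(sites)
rates = [x / dq for x, dq in sites]
mu = sum(rates) / m
s2 = sum((r - mu) ** 2 for r in rates) / (m - 1)

# Step 2: regional gamma parameters (Eqs. 17-18)
beta = mu / s2
alpha = mu * beta

# Step 3: site-specific posterior parameters (Eqs. 21-22)
for i, (x, dq) in enumerate(sites):
    a_i, b_i = alpha + x, beta + dq
    post_mean = a_i / b_i   # posterior mean accident rate for site i
    print(i, round(post_mean, 2))
```

The posterior mean a_i/b_i shrinks each site's raw rate X_i/(DQ)_i toward the regional mean μ, with less shrinkage for sites with greater exposure.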
CHAPTER III EXPERIMENT DESIGN FOR EVALUATION OF HSID METHODS AND EXPLORATION OF ACCIDENT HISTORY
On the basis of the previous literature review of HSID methods, Bayesian methods revealed themselves as superior for accurately identifying sites with promise. However, much of the research was conducted on real crash data (where hazardous sites are not truly known), and comparisons across various Bayesian methods have not been conducted. This chapter is focused on examining the performance of the EB and alternative typical methods within various environments and exploring the duration of accident history that minimizes false identifications. The chapter is divided into sections as follows. Section 1, “Experiment for Evaluating HSID Method Performance,” discusses the steps of an experiment designed to evaluate the performance of HSID methods. Section 2, “Experiment for Optimizing Duration of Crash History,” presents the steps with regard to the optimum duration of before-period crash data. Both real data and simulated crash data are utilized in the experiments. The real data were obtained from current ALGSP users in Arizona. The simulated data correspond with a designed experiment that varies factors such as the degree (or percentage) of difference between “correctable” and “average” sites, the variability in the data, and the crash distributions. The final section provides the conclusions and recommendations that arise from the two experiments performed to evaluate HSID methods for use in the ALGSP, and translates the analytical results into practical recommendations.
EXPERIMENT FOR EVALUATING HSID METHOD PERFORMANCE
The main objective of this first experiment is to quantify and assess the predictive performance of various HSID methods, such as the simple ranking method, the method based on classical statistical confidence intervals, and the EB method, in order to identify the best one for inclusion in the ALGSP model.
Of course there are many aspects of the simulation experiment that deserve careful attention, such as sample sizes, the nature of the crash data, the reliability of tests, etc. Prior to describing the detailed aspects of the experiment, the HSID methods are first reviewed.
Hot Spot Identification Methods
A site (or series of sites, etc.) may experience relatively high numbers of crashes due to: 1) an underlying safety problem; or 2) a random “up” fluctuation in crash counts during the observation period. Simply observing unusually high crash counts does not indicate which of the two conditions prevails at the site. It is possible to articulate the objective of HSID as follows: The objective of hot spot identification is to identify transportation system locations (road segments, intersections, interchanges, ramps, etc.) that possess underlying correctable safety problems, and whose effect will be revealed through elevated crash frequencies relative to similar locations. Two aspects of the previous statement are noteworthy. First, it is possible to have truly unsafe sites that do not reveal elevated crash frequencies— these are termed ‘false negatives.’ It is also possible to have elevated crash frequencies that do not result from underlying safety problems— these are termed ‘false positives.’ False positives, if acted upon, lead to investment of public funds with few safety benefits. False negatives lead to missed opportunities for effective safety investments. As one might expect, correct determinations include identifying a safe site as “safe” and an unsafe site as “high risk.” When considering the seriousness of errors (false positives and false negatives) with respect to safety management, one generally concludes that false negatives are the least desirable result, since a jurisdiction will fail to make wise investments and reduce fatalities, injuries (serious and minor), and property damage crashes.
For evaluation purposes, an HSID method is sought that produces the smallest proportion of false negatives and false positives. Hence, the percentages of false negatives, false positives, and overall misidentifications (false positives plus false negatives) are used to compare the performance of three commonly implemented HSID methods: 1) simple ranking of sites; 2) classically based confidence intervals; and 3) the EB method. These three methods are now described. The simple ranking method (denoted SR in experiments) is the most straightforward HSID method. Applying this method, a set of locations (e.g. all 4-lane signalized intersections in a jurisdiction) is ranked in descending order of crash frequencies (or counts, X), and then the top sites are identified as high risk locations for further inspection. Typically, resources are invested to improve correctable sites from the top down until allocated funds are expended. This method, for example, is one analysis option available in the current version of the ALGSP model. A second method for HSID is based on classical statistical confidence intervals (denoted CI in experiments) (1975). Location i is identified as unsafe if the observed accident count Xi exceeds the observed average of counts of comparison (similar) locations, μ, with level of confidence equal to δ; that is, Xi > μ + Kδ·S, where S denotes the standard deviation of the counts at the comparison locations, and Kδ is the corresponding critical value. In practice δ is typically 0.90, 0.95, or 0.99, and depends upon the actual situation and considerations such as the number of sites, the amount of safety investment resources, etc. These values serve as approximations, since they are borrowed from the normal distribution function and thus have no special meaning in terms of the distribution of true accident counts, which typically follow Poisson or negative binomial distributions.
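The CI flagging rule can be sketched as follows; the critical values are the one-sided normal approximations described above, and the counts are illustrative.

```python
# Sketch of the confidence-interval (CI) rule X_i > mu + K_delta * S,
# using one-sided normal critical values (approximations, per the text).
# The crash counts below are hypothetical.

K = {0.90: 1.282, 0.95: 1.645, 0.99: 2.326}

def ci_flag(counts, delta=0.95):
    """Return indices of sites whose count exceeds mu + K[delta] * S."""
    n = len(counts)
    mu = sum(counts) / n
    s = (sum((x - mu) ** 2 for x in counts) / (n - 1)) ** 0.5
    cutoff = mu + K[delta] * s
    return [i for i, x in enumerate(counts) if x > cutoff]

counts = [3, 5, 4, 6, 2, 4, 5, 3, 18, 4]
print(ci_flag(counts, 0.95))   # -> [8], the one clearly elevated site
```

Note the caveat raised in the text: because the comparison-group counts are not normally distributed, the cutoff is only an approximation, and a single extreme site inflates S and thereby raises the cutoff for every other site.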
This method is commonly used in the sense that it follows from classical statistics and can be performed conveniently. Critical in the SR and CI methods is the notion of ‘comparison sites.’ Comparison sites are used to obtain an estimate of ‘expected crashes’ for similar sites. When sites are ranked using simple ranking, it is assumed that the sites being ranked have similar geometric and traffic conditions. Geometrics and traffic play a significant role in crash potential and thus must be treated carefully. Often jurisdictions will group ‘similar’ sites together in the ranking to the extent possible; however, it is often the case that sites with different geometric and traffic conditions (i.e. exposure) are compared in the ranking method. In the confidence interval method, it is assumed that the group or set of comparison sites is similar to the site being compared. Critical to the outcome of any HSID method is the level of sophistication employed to identify comparison sites. For the EB technique, the former section has given a detailed description. It is noteworthy that only the EB method based on accident counts is used herein. Equation 24 is used to compute the long-term accidents of each site: λ_i = αE{λ} + (1 − α)x_i, with α = E{λ}/[E{λ} + VAR{λ}]. (24) The weighting parameter α is obtained by using the method of sample moments, in which the estimators of E{λ} and VAR{λ} are equal to μ and s² − μ respectively (μ denotes the sample mean and s² the sample variance). From the above expressions, it is known that the second of the two clues, crash history, significantly affects the estimate of λ, since longer crash histories tend to be more stable (in crashes per year) than shorter crash histories. Thus, different historical accident records yield different estimators of E{λ} and VAR{λ}, and subsequently different identification error rates (false positives and false negatives).
Similarly, different identification error rates are also expected under the simple ranking and confidence interval methods when utilizing various historical accident records. Because of its importance, the optimum crash history is examined in an experiment described later in this chapter.
Ground Rules for Simulation Experiment
To accomplish the evaluation of HSID methods, a simulation experiment was designed to test a variety of conditions. The simulation experiment consists of the following specific steps: 1) Generate mean crash frequencies from real data. Crash datasets from Arizona (and users of the ALGSP) which represent a range of in situ crash conditions (i.e., intersections, road segments, etc.) are first obtained. These data are used to determine various shapes of distributions of crash means (λ’s). Gamma distributions are fit to the observed data to reflect heterogeneity in site crash means. These gamma distributed means are meant to reflect TRUTH, that is, the true state of underlying safety at various locations on a transportation network (note that in practice we do not know TRUTH— and herein lies the power of simulation). The gamma distributed means are denoted true Poisson means (TPMs), and represent the means of crashes across sites. 2) From TPMs, generate random Poisson samples. Thirty independent random numbers for each simulated site are generated. For each of the 1,000 sites, the TPM is used to generate 30 crash counts that represent OBSERVED data for 30 different observation periods, which are assumed to represent years in the analysis. 3) Evaluate HSID performance. By knowing the true state of safety for sites (the TPMs), and having observed data (the randomly generated Poisson numbers), the performance of HSID methods can be tested. The following steps are used to set up the evaluation: a) SR, CI, and EB are applied in separate simulation runs to rank sites for improvement.
These are applied by columns (a single observation period, which represents what an analyst might see in reality). b) For the Bayesian runs, it is assumed that rows (data across observation periods for the same site) can also be used to represent the comparison group in order to calculate E(x) and VAR(x). This implies that the analyst has accommodated for covariates and is able to estimate an expected value for a site that accounts for things such as exposure, geometrics, etc. c) For the various hot spot thresholds, false positives, false negatives, and total misidentifications in percent are computed. The percent of false positives will always be larger than the percent of false negatives, since the latter represent hazardous sites that are identified as non-hazardous, which fall within a much larger candidate pool of sites than the hazardous sites; recall that false positives are safe sites that are identified as hazardous, a relatively small pool of sites. 4) Evaluate the effect of length of history. In the SR, CI, and EB methods the analyst must decide how long a history to use for calculations. In this experiment the effect of various accident histories (1 to 10 years of data) on performance is evaluated based on the corresponding identification rate. 5) Make practical recommendations. The results of the previous steps are discussed and translated into practical recommendations for improving the ALGSP software. Various aspects of the simulation experiment previously listed need to be discussed, as the quality and design of the simulated data directly impact the quality and generalizability of the analysis results.
Generating Mean Crash Frequencies from Real Data
To support the development of simulated crash data, 6 years (January 1995 through December 2000) of crash counts from intersections in Apache, Gila, Graham, La Paz, Pima, and Santa Cruz counties in the state of Arizona are used.
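Steps 1 and 2 of the simulation can be sketched as below; the gamma parameters (shape 2.27, scale 13.4) loosely echo one of the fitted distributions reported later, but are used here only as an illustrative assumption.

```python
import random, math

# Sketch of simulation steps 1-2: draw true Poisson means (TPMs) from a
# gamma distribution, then generate observed Poisson crash counts for each
# observation period. Parameters are illustrative assumptions.

random.seed(42)

def poisson(lam):
    """Knuth's algorithm for a single Poisson draw with mean lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n_sites, n_periods = 1000, 30
tpms = [random.gammavariate(2.27, 13.4) for _ in range(n_sites)]  # shape, scale
observed = [[poisson(lam) for _ in range(n_periods)] for lam in tpms]
print(len(observed), len(observed[0]))   # 1000 sites x 30 periods
```

Each row of `observed` is one simulated site's 30-period crash history; each column plays the role of a single observation period an analyst would see in practice.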
These data and their corresponding cumulative distributions are shown in Appendix A. Three types of characteristically different underlying cumulative distributions of TPMs were observed in the Arizona crash data: an exponential shape (denoted E), a linear shape (denoted L), and an S-shape (denoted S). In addition, two levels of heterogeneity in crash counts were observed: low heterogeneity (denoted 1), where the range in observed crash counts is less than 20 crashes, and high heterogeneity (denoted 2), where the range is in excess of 50 crashes. Recall that the empirical distributions will be used to generate TRUTH, or the means of Poisson counts of sites with varying underlying means. In this simulation study these are denoted as TPMs. Since the data represent the true underlying safety of a site, crash counts are Poisson distributed at an individual site, and the TPM is the mean of that Poisson process. The cumulative distributions used to represent the TPMs are labeled E1, E2, L1, L2, S1, and S2, respectively. For example, E2 represents an exponential-shaped distribution with high heterogeneity in TPMs. These six data sets were selected from various jurisdictions within Arizona to try to represent the range of underlying characteristics related to true accident count distributions, with the intent of making the results gained from this experiment applicable across a variety of typical situations. As stated previously, the observed data are used to inform the simulation of the TPMs. In this experiment three reasonable assumptions are required to establish the foundation for a successful simulation of crash data: Assumption 1: The empirical cumulative distributions shown in Figures 12 through 17 (see Appendix A) represent the TPMs of the underlying crash process— thus the true safety of all sites in the collection of sites is known.
These data in reality are unknowable, since it is not known a priori which sites are “hazardous.” Assumption 2: The theoretical distribution of the TPMs of the population of sites follows a gamma distribution, and the probability that a randomly selected site has a given number of accidents is approximated by the negative binomial distribution. Assumption 3: The TPMs provide the basis for generating observed crash count data. Thus, for example, the median-ranked site in Figure 12 (E1), which has an underlying Poisson mean of around six crashes (per observation period), is used to randomly generate a crash outcome, which could be 0, 1, 2, 3, etc. in any given observation period. The result of Assumptions 1 and 2 is that for each simulated site the underlying TPM (expected crash count) is known, which is then used to randomly generate the observed crash count.
Generation of Random Poisson Samples from TPMs
The empirical cumulative TPMs shown in Figures 12 through 17 (see Appendix A) represent the data required to meet Assumption 1 discussed previously. Using these data, observed crash counts are generated to represent observed data for a given observation period. However, due to the relatively small observed sample sizes (less than 200 sites in all six datasets) and the corresponding dispersion of crash counts, no sites would be identified as hazardous in some cases when using the three HSID methods stated previously. For example, if the top 1% of sites are identified as high risk (δ = 0.99), all the sites in the datasets labeled L1, S1, and L2 would be identified as safe when utilizing the classical confidence interval method and the Bayesian method, thus leading to zero false negatives in these scenarios and compromising the regularity of the results to some degree. To solve this problem and provide sufficient sample sizes for statistical comparisons, theoretical distributions of TPMs are fitted to the six datasets.
Then the sample sizes are enlarged by randomly generating the required number of sites under these gamma distributions (site-specific crash means are gamma distributed whereas within-site crashes are Poisson distributed). In this experiment, 1,000 sites are simulated. Fitting specific gamma distributions to a given sequence of data can be implemented through various software packages, such as MINITAB, SAS 8.1 (1998), and Arena 7.0 (Kelton, 2003). Herein, Arena 7.0 is employed. Within the context of Arena, the curve fitting is based on the use of maximum likelihood estimators, and the quality of a curve fit is based primarily on the square error criterion. The fitting of the probability density function (PDF) of a gamma distribution to the observed data is based on the histogram plot of these data. The distribution summary report also presents the expression of the fitted probability density function, the corresponding p-value of the Chi-Square test, the square error, etc. Figure 1 shows one example of fitting a gamma distribution to a dataset. To show the quality of the fit, the corresponding theoretical cumulative distribution function (CDF) is also plotted in the same graph as the empirical CDF (Figure 2 shows the distributions for dataset E1). The figures show that the gamma distribution fits the observed data well. A summary of all six fittings is shown in Table 1.
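Arena performs the fit by maximum likelihood; as a rough Python analogue (a sketch, not the report's procedure), a method-of-moments gamma fit can be written as:

```python
# Method-of-moments gamma fit: shape k = mean^2 / var, scale theta = var / mean.
# This is a simplified stand-in for Arena's MLE-based fitting; the crash
# counts below are hypothetical.

def gamma_moment_fit(data):
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

data = [4, 12, 25, 33, 40, 18, 55, 29, 61, 37, 22, 48]   # hypothetical counts
shape, scale = gamma_moment_fit(data)
print(round(shape, 2), round(scale, 2))
```

By construction the fitted distribution reproduces the sample mean (shape × scale) and sample variance (shape × scale²), which is usually adequate for generating simulated TPMs even when it differs slightly from the MLE fit.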
Distribution Summary (for the Figure 1 fit): Distribution: Gamma; Expression: 3.5 + GAMM(13.4, 2.27); Square Error: 0.020052. Chi-Square Test Results: Number of intervals = 8; Degrees of freedom = 5; Test Statistic = 8.2; Corresponding p-value = 0.16. Data Summary: Number of Data Points = 94; Min Data Value = 4; Max Data Value = 70; Sample Mean = 33.8; Sample Std Dev = 16.7. Histogram Summary: Histogram Range = 3.5 to 70.5; Number of Intervals = 67.
Figure 1: Observed and Fitted PDF of E1 Crash Data and Fit Summary Statistics
Figure 2: Fitted and Empirical CDF of E1
Table 1: Summary of Gamma Fittings of Six Datasets
Data set | Fitting Expression | Square Error | Test Statistic | p-value
E1 | 0.5 + Gamm(3.79, 1.75) | 0.022344 | 26 | < 0.005
E2 | 1.5 + Gamm(15.9, 1.7) | 0.011836 | 13.4 | 0.0385
L1 | 0.5 + Gamm(4.31, 1.71) | 0.038173 | 11.1 | 0.0119
L2 | 3.5 + Gamm(13.4, 2.27) | 0.020052 | 8.2 | 0.16
S1 | 0.5 + Gamm(2, 4.3) | 0.014903 | 33.5 | < 0.005
S2 | 0.5 + Gamm(9.06, 2.57) | 0.013211 | 23 | < 0.005
Note: E— Exponential shape; L— Linear shape; S— Sigmoidal shape; 1— Low heterogeneity of crash counts; 2— High heterogeneity of crash counts.
After TPMs have been simulated (the crash means across sites, which reflect the true and typically unknown state of nature), the next step is to generate observed crash counts for the sites. These counts will represent the observed crash counts across observation periods for a particular site (whose true safety is known). It is well established that crash counts fluctuate across observation periods as a result of the randomness inherent in the underlying crash process, which is well approximated by a Poisson process. In other words, the count of crashes changes from one period to another even if driver demography, traffic flow, road, weather, and the like remain unchanged. To represent this natural fluctuation, a random sample of 30 observation periods (which could be months, years, etc.)
associated with each location is randomly generated using a random number generator and underlying TPMs defined by the fitted distributions in Figures 12 through 17 (see Appendix A). A small snapshot of the data obtained from this simulation is shown in Table 2.
Table 2: Simulated Data for 30 Sites and 16 Observation Periods
SITE | TPM | PERIODS 1-16
1 | 4 | 5 1 4 1 2 7 4 3 4 4 2 1 1 5 5 6
2 | 8 | 5 9 8 6 8 4 9 9 5 4 8 8 9 9 13 8
3 | 8 | 12 7 10 5 5 7 11 8 8 8 11 6 6 7 8 7
4 | 9 | 12 9 10 16 8 12 7 9 11 8 10 8 16 11 6 8
5 | 9 | 10 13 12 8 9 6 12 10 9 9 4 5 12 11 11 4
6 | 10 | 15 4 6 10 4 17 6 11 12 7 10 10 15 6 17 10
7 | 10 | 8 5 10 8 13 10 11 7 12 10 8 9 9 6 9 10
8 | 10 | 7 8 11 14 10 12 7 11 12 11 12 13 7 7 7 11
9 | 12 | 13 17 8 14 12 10 16 10 7 15 17 9 11 15 14 15
10 | 12 | 10 9 13 13 6 12 18 11 15 12 12 12 13 12 13 9
11 | 12 | 9 10 10 14 15 12 7 14 6 12 11 19 9 17 10 18
12 | 12 | 11 14 14 9 16 7 15 3 10 13 9 11 7 2 12 14
13 | 12 | 15 15 16 13 8 12 13 16 16 12 15 11 15 12 14 9
14 | 12 | 14 10 10 11 15 15 12 13 14 15 13 14 11 13 17 19
15 | 12 | 11 12 12 8 12 13 12 7 9 11 9 9 9 12 4 9
16 | 13 | 8 17 13 8 12 11 17 15 16 13 12 15 16 12 14 19
17 | 13 | 9 13 16 16 11 8 6 18 12 8 7 11 12 12 17 15
18 | 13 | 10 18 15 16 10 15 10 16 17 10 6 8 8 10 13 6
19 | 13 | 14 13 17 11 6 11 18 15 11 17 16 19 13 11 15 14
20 | 13 | 7 4 13 11 12 10 17 19 6 7 12 15 7 15 14 12
21 | 14 | 16 17 12 18 13 17 12 11 7 13 15 10 18 14 17 19
22 | 15 | 15 18 21 15 15 14 13 21 14 13 20 13 12 19 16 16
23 | 15 | 11 13 16 12 12 16 10 16 19 20 21 16 13 19 11 16
24 | 15 | 9 16 16 11 14 12 15 18 11 16 14 29 11 12 19 14
25 | 16 | 18 12 15 9 19 18 14 11 19 15 18 14 18 18 14 20
26 | 17 | 22 10 19 12 15 19 18 10 11 17 20 16 15 11 10 15
27 | 18 | 14 21 9 19 16 17 19 18 18 14 16 28 19 18 19 10
28 | 18 | 8 20 19 5 16 18 20 28 16 17 19 14 15 14 18 15
29 | 19 | 26 19 18 21 17 29 12 22 25 15 23 11 19 20 15 24
30 | 20 | 22 18 23 21 23 19 26 22 16 20 19 15 14 19 13 15
Note: SITE = number of site, e.g.
intersection, road segment, etc.; TPM = true underlying safety of the site, or Poisson mean; SIMULATED DATA = observed crash count in each observation period; shaded cells represent the 'truly hazardous' locations (sites 29 and 30, whose TPMs are 19 and 20). Table 2 shows 16 simulated observation periods for 30 sites, with TPMs given in the second column from the left. The two sites with TPMs of 19 or more crashes per observation period may be identified a priori as hazardous, since the TPMs reflect the true underlying state of nature. The two shaded sites are hot spots, whereas the 28 sites above the shaded rows are 'safe.' In any given observation period, such as observation period 5, two of the 30 sites recorded 19 or more observed crashes: one was a truly hazardous site (site 30, with a TPM of 20) and one was not (site 25, with a TPM of 16, a false positive). Observation period 5 also produced a false negative, since truly hazardous site 29 (TPM of 19) revealed only 17 crashes. Thus, by simulating a large number of observation periods (30) characterized by different TPM cumulative distribution shapes, with a large number of sites (1,000) for each of the six observed crash distributions, the numbers of false negatives and false positives (whose sum is called false identifications) can be counted for each of the three HSID methods described previously.

Performance Evaluation Results for HSID Methods

Given the three HSID methods, the ground rules for the simulation experiment, and an explanation of how the data were simulated, the three HSID methods were applied to the simulated data to evaluate their relative effectiveness at identifying hot spots. Establishing fair comparisons among the different HSID methods is paramount: to compare the performance of the HSID methods objectively, equivalent evaluation criteria must be used.
One consideration in this regard is the use of δ, the cutoff level used to establish hazardous locations. Three values of δ are employed in the evaluations: 0.90, 0.95, and 0.99, corresponding to the top 10%, 5%, and 1% of all sites, respectively. In practice, δ corresponds to the amount of resources available for remediation and the number of similar sites being compared. For example, a local government wanting to remediate hot spot signalized intersections (where 75 such intersections exist) might fix 7 intersections, or about 10% (δ = 0.90). All parameters of the simulation experiment have now been described. They include the shapes of the TPMs (E, S, and L), the levels of heterogeneity in the TPMs (1 and 2), and the levels of δ (0.90, 0.95, and 0.99). Three HSID methods are assessed: SR, CI, and EB. Evaluation criteria include the percent of false positives (FP), the percent of false negatives (FN), and the sum of the two, called false identifications (FI). For all of the simulations, sample sizes were 1,000 for TPMs and 30 for observation periods. To conduct the simulation experiment with these parameters, the following steps were undertaken: 1. All the TPM cumulative distributions are divided into truly hazardous and non-hazardous locations, using thresholds of 0.90, 0.95, and 0.99 to represent different data separation levels. This step results in three "critical" crash count threshold values, CC0.90, CC0.95, and CC0.99, for each combination of cumulative TPM shape and heterogeneity level; these thresholds distinguish the known truly hazardous locations from the safe ones. 2. The three different HSID methods are used to identify hot spots using the simulated data.
Specifically, the SR method simply ranks observed frequencies as shown in Table 2; the CI method uses the entire sample mean and standard deviation to determine confidence intervals for ranking; and the EB method ranks sites using a weighted average of crash history and observed frequency based on Gamma distribution parameters. 3. The simulated crash data are then compared to the values CC0.90, CC0.95, and CC0.99. For the truly hazardous sites, FNs are produced whenever the randomly generated crash counts fall below CC0.90, CC0.95, or CC0.99; that is, truly hazardous sites generated observed crash counts lower than the critical values. Similarly, for the collection of non-hazardous locations, FPs are generated whenever the simulated data exceed CC0.90, CC0.95, or CC0.99. FPs and FNs are simply counted for each simulation run, and the number of FIs is the sum of the numbers of false negatives and false positives. 4. To make the three performance metrics comparable across simulations, the percentages of FNs, FPs, and FIs are calculated. Because the FNs are truly hazardous locations mistaken for "safe" sites, their percentage is the number of simulated FNs divided by the number of simulated truly safe sites; similarly, the percentage of FPs is the number of FPs divided by the number of truly hazardous locations. Finally, the percentage of FIs is obtained by dividing the sum of FNs and FPs by the total number of randomly generated data locations. For example, suppose there are 20 sites under inspection, of which the top five are identified as hot spots according to the corresponding TPM information. With 30 simulated observations for each site, the total number of truly hazardous site-periods would be 150, and the number of truly safe ones 450. If 45 of the 150 truly hazardous site-periods are wrongly viewed as safe, the percent of FN would be 45/450 × 100% = 10%. 5.
Finally, the percentages of FPs, FNs, and FIs across simulation conditions are tallied and reported. Tables 3 and 4 summarize the errors (FNs, FPs, and FIs) produced under the variety of simulation conditions. Table 3 presents the results when the heterogeneity of crash counts is relatively low, while Table 4 presents the results when heterogeneity is relatively high. Critical crash count threshold values increase from left to right in both tables. The runs labeled CI, SR, and EB refer to the classical confidence interval, simple ranking, and Bayesian methods of HSID, respectively. Finally, L, S, and E refer to the underlying characteristic shapes of the cumulative distributions of the TPMs: linear, s-shaped, and exponential, respectively. For the low and high heterogeneity simulations, the trends in percent errors with increasing δ agree with each other; however, the percent errors for low heterogeneity are much higher than those for high heterogeneity. The likely reason is that the low heterogeneity datasets have relatively small standard deviations compared with the other datasets: a small range of crash counts in a dataset makes it more difficult to identify hazardous locations. By contrast, it is easy to identify hot spots when the corresponding crash counts are widely dispersed, particularly when the dispersion is large in the uppermost crash count deciles. Another prominent characteristic of both tables is that the percentage of false negatives decreases as δ increases for all three HSID methods, and in most cases the percentage of false negatives is substantially reduced by the EB method. The somewhat involved explanation is as follows. The threshold value divides the top 'outlying' crash counts from the remainder of the data: either the top 10%, 5%, or 1% of observed counts.
By definition, these counts are more likely to suffer from regression to the mean in a subsequent observation period than counts near the TPM. Thus the crash history of the top x% of crash counts acts to temper the effect of the current crash count when ranking these sites. As a result, sites that suffer less from regression to the mean get ranked higher in the list; these are sites that ordinarily would have been ranked as false negatives. Conversely, the decrease in the percentage of false negatives is accompanied by an increase in the percentage of false positives (except at δ = 0.95 for L1 and L2; in these two cases the FP percent error under the confidence interval method is the smallest among the three threshold values). This shows that stricter identification criteria select fewer non-hazardous sites for remediation, although they may leave a larger number of truly hazardous locations undetected. Somewhat surprisingly, the false identifications move in the same direction as the false negatives as δ increases. Probably the best explanation for this is that a relatively small number of false negatives can come at the price of more false positives, which in turn reduce the efficiency of local governments' investment. In conclusion, the percent of false positives increases as the threshold rises, whereas the percents of false negatives and false identifications decrease; results in almost all simulation scenarios share these trends.
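Steps 1 through 4 above can be sketched end to end for a single scenario. This is a minimal illustration, not the report's actual code: it assumes the L2 fit from Table 1, reads Arena's GAMM(13.4, 2.27) as scale 13.4 and shape 2.27 (which reproduces the reported sample mean of about 33.8), and applies only the simple-ranking rule; names such as `tpm` and `cc` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shifted-gamma TPMs following the L2 fit in Table 1: 3.5 + GAMM(13.4, 2.27),
# interpreted as scale 13.4, shape 2.27 (mean 3.5 + 13.4 * 2.27 ~ 33.9).
n_sites, n_periods, delta = 1000, 30, 0.90
tpm = 3.5 + rng.gamma(shape=2.27, scale=13.4, size=n_sites)

# Step 1: the critical crash-count threshold CC_0.90; sites whose TPM lies
# in the top 10% are the truly hazardous locations.
cc = np.quantile(tpm, delta)
truly_hazardous = tpm > cc

# Observed counts fluctuate around each site's TPM as a Poisson process.
counts = rng.poisson(tpm[:, None], size=(n_sites, n_periods))

# Step 3, simple-ranking rule: flag site-periods whose observed count
# exceeds the critical threshold.
flagged = counts > cc

# Step 4: count errors and normalize as the report does, i.e. FN over safe
# site-periods, FP over hazardous site-periods, FI over all site-periods.
fn = int((~flagged[truly_hazardous]).sum())   # hazardous but not flagged
fp = int(flagged[~truly_hazardous].sum())     # safe but flagged
n_haz = int(truly_hazardous.sum()) * n_periods
n_safe = (n_sites - int(truly_hazardous.sum())) * n_periods
pct_fn = 100 * fn / n_safe
pct_fp = 100 * fp / n_haz
pct_fi = 100 * (fn + fp) / (n_sites * n_periods)
```

Repeating this over the six TPM shapes, the three δ levels, and the three HSID ranking rules yields tallies of the kind summarized in Tables 3 and 4.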
Table 3: Percent Errors for Low Heterogeneity in Crash Counts

               δ = 0.90                  δ = 0.95                  δ = 0.99
          CI      SR      EB        CI      SR      EB        CI      SR      EB
E1  FN    2.49    3.55    2.40      1.54    2.09    1.41      0.63    0.55    0.38
    FP   62.76   31.97   21.63     82.47   39.73   26.87    114.32   54.00   37.67
    FI    7.17    6.39    4.33      5.31    3.97    2.69      2.46    1.08    0.75
L1  FN    2.21    4.44    2.91      1.39    2.40    1.73      0.15    0.62    0.45
    FP  106.14   39.97   26.20     65.24   45.67   32.80    431.62   61.00   45.00
    FI    8.75    7.99    5.24      3.62    4.57    3.28      2.10    1.22    0.90
S1  FN    0.54    6.53    5.28      0.21    3.48    2.90      0.00    0.81    0.73
    FP  753.44   58.73   47.50   1251.33   66.20   55.13        NA   80.33   72.33
    FI   10.03   11.75    9.50      6.46    6.62    5.51      1.91    1.61    1.45

Note: 1. FN = false negatives; FP = false positives; FI = false identifications. 2. Some FPs exceed 100% because of the non-normality of the distribution and the setting of the threshold; in these cases the CI method identifies more hazardous locations than truly exist. For the same reason, "NA" reflects zero truly hazardous locations identified by confidence analysis. 3. The shaded cells show the lowest identification error rate.

Table 4: Percent Errors for High Heterogeneity in Crash Counts

               δ = 0.90                  δ = 0.95                  δ = 0.99
          CI      SR      EB        CI      SR      EB        CI      SR      EB
E2  FN    1.78    2.09    1.13      1.33    1.33    0.86      0.39    0.26    0.17
    FP   24.37   18.77   10.13     32.56   25.33   16.40     57.07   26.00   16.67
    FI    4.13    3.75    2.03      3.34    2.53    1.64      1.54    0.52    0.33
L2  FN    1.89    2.55    1.57      1.50    1.43    0.91      0.44    0.37    0.23
    FP   36.33   22.93   14.13     32.20   27.20   17.33     45.22   36.67   22.67
    FI    5.14    4.59    2.83      3.40    2.72    1.73      1.29    0.73    0.45
S2  FN    2.16    2.73    1.74      1.17    1.31    0.71      0.47    0.26    0.12
    FP   34.80   24.53   15.67     41.08   24.87   13.47     38.37   25.33   12.33
    FI    5.16    4.91    3.13      3.31    2.49    1.35      1.32    0.51    0.25

Note: 1. FN = false negatives; FP = false positives; FI = false identifications. 2. The shaded cells show the lowest identification error rate.

There is also some difference among the percent errors produced by the three identification methods.
Compared with the two traditional methods, the Bayesian technique yields fewer false negatives in most cases in both tables; that is, the Bayesian technique is more efficient at flagging the sites that require further analysis. Unfortunately, this higher efficiency comes at the cost of a substantial number of false positives, which reduce the efficiency of local governments' investment: unless budgetary constraints intervene, the false positives result in unneeded repairs at locations that are not truly hazardous. As for the confidence interval method and the simple ranking method, there is little difference between them; both generally produce higher identification error rates than the Bayesian method, indicating relatively worse performance in identifying hazardous locations.

EXPERIMENT FOR OPTIMIZING DURATION OF CRASH HISTORY

May (1964) first addressed the question of how many years of accident data should be analyzed when determining accident-prone locations. He explored the differences among averages of accident counts as "t" increased up to 13 years. The results showed that the differences diminish as "t" increases and that the marginal benefit of increasing "t" declines, with the "knee" of the curve said to occur at t = 3 years. On that basis, May concluded that "there is little to be gained by using a longer study period than three years." In this experiment, a different logic is employed to determine the best study duration for accident data analysis. Instead of using simple accident counts as in May's method, this experiment uses the identification error rate as the indicator: the identification error rates associated with various "t" years are compared to obtain the optimum study period. When conducting the history analysis, the same three identification methods are employed, and the corresponding processes remain the same.
The only difference lies in how the different periods of data are used. To show the logic clearly, another small snapshot is used (Table 5). First, the ith column of data is assumed to represent the accident data of the ith current year. For example, for site 9, the first four data points represent the accident counts during the four current years, and the remaining data in the first four columns can be viewed as the accident counts of other similar sites during the same period. Consider conducting the Bayesian analysis. For a given t-year period, Equation 24 is used for each site to compute the corresponding expected accident count. However, since the TPM represents the long-term number of accidents per year, the average accident count per year over the t-year period should be used in this equation. At the end of the fourth year, x for site 10 is 14 accidents (the average of the first four data points), E{λ} = 12.88 accidents (the row-average accident count), VAR{λ} = 5.18 accidents² (the row variance), and α = 0.713; thus the expected accident count for site 10 using the first 4 years of data is 13.2 accidents. For the 16 observation periods, 13 such Bayesian expected counts can be generated for site 10 from sliding 4-year history records. Based on these Bayesian expected accident counts for the various sites, the previously stated Bayesian procedure can then be used to compute the percentages of false negatives, false positives, and false identifications for different values of "t." The same history analysis logic also applies to the other two identification methods. Because of the large amount of iterative computation involved in this experiment, special-purpose computer code was written to calculate the identification error rates associated with the different periods of accident data.
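The worked numbers for site 10 can be reproduced with a short sketch. Equation 24 itself is not shown in this excerpt, so the weight used below, α = E{λ} / (E{λ} + VAR{λ}), is inferred from the quoted values (it yields α = 0.713 and an expected count of 13.2) and should be checked against the report's Equation 24.

```python
def eb_expected(x_bar, mean_lam, var_lam):
    # Weighted average of the reference-population mean E{lambda} and the
    # site's own t-year average x_bar; the weight alpha is the assumed
    # form alpha = E{lambda} / (E{lambda} + VAR{lambda}).
    alpha = mean_lam / (mean_lam + var_lam)
    return alpha * mean_lam + (1 - alpha) * x_bar

# Site 10 at the end of the fourth year: x = 14 (4-year average),
# E{lambda} = 12.88 (row average), VAR{lambda} = 5.18 (row variance).
alpha = 12.88 / (12.88 + 5.18)           # ~0.713, matching the text
expected = eb_expected(14, 12.88, 5.18)  # ~13.2 accidents, matching the text
```

Sliding this computation across the 16 periods, a 4-year window starting at each of years 1 through 13 gives the 13 Bayesian expected counts mentioned in the text.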
Table 5: Snapshot of the Simulated Data

Site TPM Simulated data 1 3 7 3 4 3 2 1 2 3 3 4 2 3 3 4 3 2 2 3 3 5 5 2 3 1 1 4 2 2 1 2 2 7 4 5 3 5 5 7 6 5 5 6 4 4 3 4 7 4 2 4 7 2 4 7 4 6 5 9 4 6 7 4 8 10 13 6 9 7 7 3 5 8 8 6 8 6 9 9 12 7 2 3 8 11 7 5 7 7 6 9 15 10 16 12 12 8 8 6 9 12 18 15 9 7 12 8 7 9 9 10 12 8 11 5 8 9 13 9 10 12 7 7 8 5 8 12 12 5 11 18 12 12 16 12 7 10 13 10 9 11 9 13 9 13 13 13 12 10 12 12 13 14 11 7 14 13 7 16 18 7 10 14 16 14 15 11 10 12 15 9 15 15 13 11 11 16 12 11 11 15 17 15 13 15 13 13 16 16 13 11 18 14 9 12 22 18 12 16 18 19 20 11 7 14 12 10 16 18 14 17 9 15 19 18

In theory, as "t" increases, the expected accident count of each site, computed from the simulated data, converges to its TPM (because in the experiment each row of simulated data strictly follows a Poisson distribution), and the corresponding identification error rate converges to zero. In a real situation, however, as "t" increases each site is subject to more influential factors, so a long period of data generally cannot represent the current situation. On the other hand, if a short period of data is used, much information is missing and it is difficult to obtain the true long-term accident counts. Consequently, a trade-off must be made to find a study period short enough to represent current conditions and long enough to capture the true expected accident counts. In this experiment, the various identification rates are plotted against different "t" years, and the "knee" of such a curve is taken as the optimum study period. Data older than 10 years are considered no longer to reflect current conditions; accordingly, the 30 simulated observations are divided evenly into three groups: the first 10 columns of data belong to group 1, the 11th through 20th columns to group 2, and the last 10 columns to group 3.
The common characteristics shared by the three groups are assumed to reflect the true relation between identification error rate and "t" years. For each group, the three common confidence levels, 90%, 95%, and 99%, are used in the three analyses. In a plot of identification error rate versus "t" years there are still some fluctuations along the curve, although the identification error rate generally decreases as "t" increases. To quickly determine and eliminate the initial "warm-up" period (i.e., the period before the knee of the curve), Welch's moving average method (Kelton, 2003) is utilized. Through the moving average, this method filters out the statistical fluctuations in the observations (y_i) and clearly reveals the "warm-up" period. As shown in Figure 3, series 1 represents the original FN rates associated with different "t." Because of two outliers (the points at t = 4 and t = 6), it is difficult to locate the "knee" of the original curve; from series 2 (the curve of moving averages), however, it is easy to see that a 5-year period is the best study period.

Figure 3: Moving Averages vs. Original Statistic (FN (%) versus t, series 1 and series 2)

The moving average \bar{Y}_i(w), where w is the window size, of the random observations is defined as follows:

\bar{Y}_i(w) =
\begin{cases}
\dfrac{y_{i-w} + \cdots + y_{i+w}}{2w + 1}, & i = w+1, \ldots, m-w \\[1ex]
\dfrac{y_1 + \cdots + y_{2i-1}}{2i - 1}, & i = 1, \ldots, w
\end{cases}
\qquad (25)

In this experiment, the window size is selected as 1.

RESULTS

As in the previous experiment, the three HSID methods were applied to explore the optimal duration of accident history. The frequencies of the various optimal "t" values across the three confidence levels and three groups are shown in Tables 6 through 8.
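Welch's moving average of Equation 25 can be sketched directly; this is a minimal stand-alone implementation, not the report's code, with the shrunken windows near the start of the series handled explicitly.

```python
def welch_moving_average(y, w):
    """Welch's moving average (Equation 25), 1-indexed i = 1, ..., m - w:
    for i <= w the window shrinks to the first 2i - 1 observations;
    otherwise it is the centered window y_(i-w), ..., y_(i+w)."""
    m = len(y)
    out = []
    for i in range(1, m - w + 1):
        if i <= w:
            window = y[:2 * i - 1]          # y_1, ..., y_(2i-1)
        else:
            window = y[i - w - 1:i + w]     # y_(i-w), ..., y_(i+w)
        out.append(sum(window) / len(window))
    return out

# With the window size w = 1 used in the experiment, each smoothed point
# (after the first) is the mean of an observation and its two neighbours.
smoothed = welch_moving_average([16, 10, 13, 10, 11, 9], 1)
```

With w = 1, the first smoothed point equals the first observation, which is why the moving-average curve in Figure 3 starts at the same value as the original series.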
For convenience of viewing, the frequencies of the various "t" periods for the different confidence levels and groups are plotted in Figures 4 through 6, and the cumulative results across all confidence levels and groups are shown in Figures 7 and 8. Readers interested in the details of the identification error rates associated with the various HSID methods, confidence levels, and groups are referred to Appendix B.

Table 6: The Number of t-year Which is the "Knee" of the Curve for Group 1

Year    1    2    3    4    5    6    7    8    9   10
90%     0    1   22   13    6    8    2    2    0    0
95%     1    1   23   10    8    7    2    2    0    0
99%     0    2   20    8   10    6    4    3    1    0
SUM     1    4   65   31   24   21    8    7    1    0

Note: In this group there are 162 scenarios (3 identification methods, 3 kinds of shapes, low and high heterogeneity of crash counts, 3 threshold values for truly hazardous locations, and 3 kinds of false identifications: FN, FP, and FI).

Table 7: The Number of t-year Which is the "Knee" of the Curve for Group 2

Year    1    2    3    4    5    6    7    8    9   10
90%     2    0   28   10    4    5    3    1    1    0
95%     0    3   21   11    7    6    4    2    0    0
99%     0    1   27    9    5    7    2    3    0    0
SUM     2    4   76   30   16   18    9    6    1    0

Note: As for Table 6, there are 162 scenarios in this group.

Table 8: The Number of t-year Which is the "Knee" of the Curve for Group 3

Year    1    2    3    4    5    6    7    8    9   10
90%     0    1   22   14    6    5    2    1    1    0
95%     2    2   20    7    7    8    3    4    1    0
99%     0    3   27   11    5    5    4    1    0    0
SUM     2    6   69   32   18   18    9    6    2    0

Note: As for Table 6, there are 162 scenarios in this group.
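The tallying that produces Tables 6 through 8 can be sketched as follows. The report does not state its exact knee-detection rule, so the tolerance criterion below (the first "t" whose marginal one-year improvement drops under a threshold) is only an illustrative stand-in, and the example curves are hypothetical.

```python
from collections import Counter

def knee_year(error_rates, tol=0.5):
    """Return the 'knee' t (1-indexed year) of an error-rate-vs-t curve:
    the first t at which adding one more year of history improves the
    error rate by less than `tol` percentage points. This criterion is
    an assumption, not the report's stated rule."""
    for t in range(1, len(error_rates)):
        if error_rates[t - 1] - error_rates[t] < tol:
            return t
    return len(error_rates)

# Tally knee years over a few hypothetical smoothed scenario curves,
# mimicking how each group's 162 scenarios are counted per t-year.
curves = [
    [17, 12, 9, 8.7, 8.5, 8.4],   # knee at t = 3
    [15, 11, 8, 7.9, 7.8, 7.7],   # knee at t = 3
    [14, 10, 8, 6.5, 6.3, 6.2],   # knee at t = 4
]
tally = Counter(knee_year(c) for c in curves)
```

Applied to all 162 scenario curves of a group, a tally of this kind yields one row of the corresponding table.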
Figure 4: The Number of t-year Which is the "Knee" of the Curve for the 90% Confidence Level (frequency versus t-year, Groups 1 through 3)

Figure 5: The Number of t-year Which is the "Knee" of the Curve for the 95% Confidence Level (frequency versus t-year, Groups 1 through 3)

Figure 6: The Number of t-year Which is the "Knee" of the Curve for the 99% Confidence Level (frequency versus t-year, Groups 1 through 3)

Figure 7: The Number of t-year Which is the "Knee" of the Curve for All Confidence Levels (frequency versus t-year, Groups 1 through 3)

Figure 8: The Cumulative Percent Distribution of Various t-years

Figures 7 and 8 show that, across all the simulation scenarios, a 3-year crash history most often constituted the "best" study period, and 3 through 6 years make up almost 90% of all the optimum "t" years. Hence, as the trade-off between long and short history records, if there has been no significant physical change at the location under scrutiny and a long history record can be obtained, the most recent 6 years of crash records are suggested as sufficient to capture the majority of the beneficial effect of crash history. In contrast, 3 years of crash history data represents the shortest period that should be used while still achieving a significant benefit from crash history (under most general conditions). Crash histories of 1 and 2 years provide relatively little benefit for the methods and under the range of conditions assessed. To illustrate the improvement in identification performance that results from using 3-year history data, Tables 9 and 10 are provided (in contrast to Tables 3 and 4).
The difference is that Tables 3 and 4 use 1 year of crash data, with the identification rates computed on the basis of the last 30 years of data, whereas Tables 9 and 10 use 3-year data, with the corresponding identification rates computed on the basis of the current 10 years of data.

Table 9: Percent Errors for Low Heterogeneity in Crash Counts (3 Years of Data)

              δ = 0.90                  δ = 0.95                  δ = 0.99
         CI      SR      EB        CI      SR      EB        CI      SR      EB
E  FN    2.02    2.32    1.53      1.36    1.34    0.82      0.89    0.40    0.25
   FP   28.06   20.88   13.75     38.60   25.50   15.50     48.56   40.00   25.00
   FI    4.68    4.18    2.75      3.69    2.55    1.55      2.13    0.80    0.50
L  FN    2.56    2.75    2.13      1.69    1.72    1.25      0.47    0.51    0.40
   FP   33.16   24.75   19.13     50.00   32.75   23.75     91.07   50.00   40.00
   FI    5.56    4.95    3.83      4.33    3.28    2.54      0.14    0.67    0.53
S  FN    1.10    4.88    4.33      0.68    2.88    2.54      0.14    0.67    0.53
   FP  228.21   43.88   39.00    239.38   54.75   48.25    362.16   66.25   52.50
   FI    9.05    8.78    7.80      5.45    5.48    4.83      1.81    1.33    1.05

Note: 1. FN = false negatives; FP = false positives; FI = false identifications; CI = confidence interval; SR = simple ranking; EB = empirical Bayesian; E = exponential shape; L = linear shape; S = sigmoidal shape. 2. Some FPs exceed 100% because of the non-normality of the distribution and the setting of the threshold; in these cases the CI method identifies more hazardous locations than truly exist. For the same reason, an "NA" would indicate zero truly hazardous locations identified by confidence analysis. 3. The shaded cells show the lowest identification error rate.
Table 10: Percent Errors for High Heterogeneity in Crash Counts (3 Years of Data)

              δ = 0.90                  δ = 0.95                  δ = 0.99
         CI      SR      EB        CI      SR      EB        CI      SR      EB
E  FN    1.08    1.28    0.67      0.96    0.95    0.71      0.24    0.14    0.10
   FP   13.96   11.50    6.00     15.32   18.00   13.50     34.66   13.75   10.00
   FI    2.51    2.30    1.20      1.98    1.80    1.35      1.00    0.28    0.20
L  FN    1.72    1.63    1.36      1.19    0.96    0.87      0.41    0.21    0.20
   FP   14.37   14.63   12.25     15.07   18.25   16.50     20.11   21.25   18.25
   FI    3.08    2.93    2.45      2.14    1.83    1.65      0.86    0.43    0.38
S  FN    2.10    2.04    1.65      0.70    0.66    0.55      0.40    0.15    0.10
   FP   18.01   18.38   14.88     20.83   12.50   10.50     21.03   15.00   10.00
   FI    3.73    3.68    2.98      1.85    1.25    1.05      0.90    0.30    0.20

Note: 1. FN = false negatives; FP = false positives; FI = false identifications; CI = confidence interval; SR = simple ranking; EB = empirical Bayesian; E = exponential shape; L = linear shape; S = sigmoidal shape. 2. The shaded cells show the lowest identification error rate.

Comparing these tables shows that using 3 years of crash history data results in significant improvements in error rates for all three methods, CI, SR,
The research resulted in: 1) a survey of past and current hot spot identification (HSID) approaches; 2) evaluation of HSID methods and exploration of the optimum duration of before-period crash data under simulated scenarios; 3) development of safety performance functions (SPFs) for various functional road sections within Arizona; 4) extended comparisons of alternative HSID methods based on SPFs using real crash data; and 5) recommendations for improving the identification ability of the current ALGSP model. 17. Key Words: Hot Spot Identification, High Risk Sites, Sites with Promise, Safety, Motor Vehicle Crashes 18. Distribution Statement: Document is available to the U.S. public through the National Technical Information Service, Springfield, Virginia, 22161 19. Security Classification: Unclassified 20. Security Classification: Unclassified 21. No. of Pages: 154 22. Price 23. Registrant's Seal

TABLE OF CONTENTS
EXECUTIVE SUMMARY ..... 1
CHAPTER I - INTRODUCTION ..... 3
CHAPTER II - LITERATURE REVIEW OF HSID METHODS ..... 5
HOT SPOT IDENTIFICATION PROBLEM BACKGROUND ..... 5
BAYESIAN TECHNIQUES TO IDENTIFY HAZARDOUS LOCATIONS ..... 11
Bayesian Techniques Based on Accident Frequencies ..... 11
Bayesian Techniques Based on Accident Rates ..... 13
CHAPTER III - EXPERIMENT DESIGN FOR EVALUATION OF HSID METHODS AND EXPLORATION OF ACCIDENT HISTORY ..... 17
EXPERIMENT FOR EVALUATING HSID METHOD PERFORMANCE ..... 17
17
Hot Spot Identification Methods .......... 17
Ground Rules for Simulation Experiment .......... 19
Generating Mean Crash Frequencies from Real Data .......... 20
Generation of Random Poisson Samples from TPMs .......... 21
Performance Evaluation Results for HSID Methods .......... 26
EXPERIMENT FOR OPTIMIZING DURATION OF CRASH HISTORY .......... 30
RESULTS .......... 32
CONCLUSIONS AND RECOMMENDATIONS .......... 38
CHAPTER IV - SAFETY PERFORMANCE FUNCTIONS FOR ARIZONA ROAD SEGMENTS .......... 39
DATA DESCRIPTION .......... 39
HOW TO CREATE SPFS? .......... 40
RESULTS OF SPFS .......... 41
CONCLUSIONS .......... 42
CHAPTER V - COMPARISON OF HSID METHODS BASED ON REAL CRASH DATA OF ARIZONA ROAD SEGMENTS .......... 43
HSID METHODS BASED ON SPFS .......... 43
The EB Approach Based on SPFs .......... 43
Accident Reduction Potential Method Based on SPFs .......... 44
Numerical Examples to Show the HSID Methods Based on SPFs ..........
44
DATA DESCRIPTION .......... 46
TESTS FOR COMPARISON OF HSID METHODS .......... 46
Site Consistency Test .......... 47
Method Consistency Test .......... 48
Total Ranking Differences Test .......... 48
False Identification Test .......... 49
COMPARISON RESULTS .......... 51
Site Consistency Test Result .......... 51
Method Consistency Test Result .......... 52
Total Ranking Differences Test Result .......... 53
False Identification Test Result .......... 54
False True Poisson Means Differences Test Result .......... 55
Result of Similarity of Alternative HSID Identification Methods .......... 56
CONCLUSIONS AND RECOMMENDATIONS .......... 57
CHAPTER VI - HSID IN CURRENT ALGSP MODEL AND RECOMMENDED SOFTWARE CHANGES .......... 59
HSID IN CURRENT ALGSP MODEL .......... 59
RECOMMENDED SOFTWARE CHANGES .......... 61
Incorporating the Functional Classification as an Additional User Selection Parameter ..........
61
Data Interface Improvement .......... 61
Exploring the Relationship between Exposure and Safety as Employed in the ALGSP .......... 62
Incorporation of the EB Techniques to Calculate the Expected Crash Number .......... 62
Incorporation of Accident Reduction Potential Method .......... 63
Incorporation of the EB Techniques to Calculate the Expected Crash Costs .......... 64
Recommended Period of Analysis for Software Users .......... 64
REFERENCES .......... 67
APPENDIX A: REAL ARIZONA CRASH DATA USED FOR THE DEVELOPMENT OF SIMULATED CRASH DATA .......... 71
APPENDIX B: THE IDENTIFICATION ERROR RATES ASSOCIATED WITH VARIOUS HSID METHODS, CONFIDENCE LEVELS, AND GROUPS .......... 80
APPENDIX C: SAFETY PERFORMANCE FUNCTIONS OF VARIOUS FUNCTIONAL CLASSIFICATIONS OF ARIZONA ROAD SEGMENTS .......... 107
APPENDIX D: COMPARISON TEST RESULTS AND SIMILARITY OF ALTERNATIVE HSID METHODS FOR VARIOUS CLASSIFICATIONS OF HIGHWAY SECTIONS .......... 117

LIST OF TABLES
Table 1: Summary of Gamma Fittings of Six Datasets .......... 24
Table 2: Simulated Data for 30 Sites and 16 Observation Periods .......... 25
Table 3: Percent Errors for Low Heterogeneity in Crash Counts .......... 29
Table 4: Percent Errors for High Heterogeneity in Crash Counts .......... 29
Table 5: Snapshot of the Simulated Data ..........
31
Table 6: The Number of t year Which is the "Knee" of the Curve for Group 1 .......... 33
Table 7: The Number of t year Which is the "Knee" of the Curve for Group 2 .......... 33
Table 8: The Number of t year Which is the "Knee" of the Curve for Group 3 .......... 33
Table 9: Percent Errors for Low Heterogeneity in Crash Counts (3 Years Data) .......... 37
Table 10: Percent Errors for High Heterogeneity in Crash Counts (3 Years Data) .......... 37
Table 11: Functional Classification Codes .......... 39
Table 12: Statistics for Roads of Various Functional Classifications .......... 40
Table 13: Crash Information of a Sample of 20 Principal Arterial Road Sections .......... 47
Table 14: Results of Site Consistency Test of Various Methods for All Classifications of Highways: Accumulated Crashes for Hot Spot Sites for Various Methods .......... 51
Table 15: Results of Method Consistency Test of Various Methods for All Classifications of Highways: Number of Sites Commonly Identified across Periods .......... 52
Table 16: Results of Total Ranking Differences Test of Various Methods for All Classifications of Highways: Cumulative Ranking Differences of Hot Spot Sites .......... 53
Table 17: Results of False Identification Test of Various Methods for All Classifications of Highways: Frequency of Errors .......... 54
Table 18: Results of False True Poisson Means Differences Test of Various Methods for All Classifications of Highways: Cumulative Difference in TPMs .......... 55
Table 19: Accumulated Similarity of Various Methods for All Classifications of Highways (δ = 0.90) ..........
56
Table 20: Accumulated Similarity of Various Methods for All Classifications of Highways (δ = 0.95) .......... 56
Table 21: Observed Data from Apache (E1) .......... 71
Table 22: Observed Data from Gila (E2) .......... 71
Table 23: Observed Data from Graham (L1) .......... 72
Table 24: Observed Data from La Paz (L2) .......... 72
Table 25: Observed Data from Pima (S1) .......... 72
Table 26: Observed Data from Santa Cruz (S2) .......... 73
Table 27: The Identification Error Rates of SR Method for Group 1 (δ = 0.90) .......... 80
Table 28: The Identification Error Rates of EB Method for Group 1 (δ = 0.90) .......... 81
Table 29: The Identification Error Rates of CI Method for Group 1 (δ = 0.90) .......... 82
Table 30: The Identification Error Rates of SR Method for Group 1 (δ = 0.95) .......... 83
Table 31: The Identification Error Rates of EB Method for Group 1 (δ = 0.95) .......... 84
Table 32: The Identification Error Rates of CI Method for Group 1 (δ = 0.95) .......... 85
Table 33: The Identification Error Rates of SR Method for Group 1 (δ = 0.99) .......... 86
Table 34: The Identification Error Rates of EB Method for Group 1 (δ = 0.99) .......... 87
Table 35: The Identification Error Rates of CI Method for Group 1 (δ = 0.99) .......... 88
Table 36: The Identification Error Rates of SR Method for Group 2 (δ = 0.90) .......... 89
Table 37: The Identification Error Rates of EB Method for Group 2 (δ = 0.90) ..........
90
Table 38: The Identification Error Rates of CI Method for Group 2 (δ = 0.90) .......... 91
Table 39: The Identification Error Rates of SR Method for Group 2 (δ = 0.95) .......... 92
Table 40: The Identification Error Rates of EB Method for Group 2 (δ = 0.95) .......... 93
Table 41: The Identification Error Rates of CI Method for Group 2 (δ = 0.95) .......... 94
Table 42: The Identification Error Rates of SR Method for Group 2 (δ = 0.99) .......... 95
Table 43: The Identification Error Rates of EB Method for Group 2 (δ = 0.99) .......... 96
Table 44: The Identification Error Rates of CI Method for Group 2 (δ = 0.99) .......... 97
Table 45: The Identification Error Rates of SR Method for Group 3 (δ = 0.90) .......... 98
Table 46: The Identification Error Rates of EB Method for Group 3 (δ = 0.90) .......... 99
Table 47: The Identification Error Rates of CI Method for Group 3 (δ = 0.90) .......... 100
Table 48: The Identification Error Rates of SR Method for Group 3 (δ = 0.95) .......... 101
Table 49: The Identification Error Rates of EB Method for Group 3 (δ = 0.95) .......... 102
Table 50: The Identification Error Rates of CI Method for Group 3 (δ = 0.95) .......... 103
Table 51: The Identification Error Rates of SR Method for Group 3 (δ = 0.99) .......... 104
Table 52: The Identification Error Rates of EB Method for Group 3 (δ = 0.99) .......... 105
Table 53: The Identification Error Rates of CI Method for Group 3 (δ = 0.99) .......... 106
Table 54: Estimation Results for SPF of Rural Interstate Principal Arterials (Functional Code: 1) .......... 108
Table 55: Estimation Results for SPF of Rural Other Principal Arterials .......... 109
Table 56: Estimation Results for SPF of Rural Minor Arterials ..........
110
Table 57: Estimation Results for SPF of Rural Major Collectors (Functional Code: 7) .......... 111
Table 58: Estimation Results for SPF of Rural Minor Collectors (Functional Code: 8) .......... 112
Table 59: Estimation Results for SPF of Urban Interstate Principal Arterials (Functional Code: 11) .......... 113
Table 60: Estimation Results for SPF of Urban Freeways .......... 114
Table 61: Estimation Results for SPF of Urban Other Principal Arterials (Functional Code: 14) .......... 115
Table 62: Estimation Results for SPF of Urban Minor Arterials .......... 116
Table 63: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 1) .......... 117
Table 64: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 1) .......... 117
Table 65: Results of Site Consistency Test of Various Methods .......... 118
Table 66: Results of Method Consistency Test of Various Methods .......... 118
Table 67: Results of Total Ranking Differences Test of Various Methods .......... 118
Table 68: Results of False Identification Test of Various Methods .......... 119
Table 69: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 1) ..........
119
Table 70: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 2) .......... 120
Table 71: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 2) .......... 120
Table 72: Results of Site Consistency Test of Various Methods .......... 120
Table 73: Results of Method Consistency Test of Various Methods .......... 121
Table 74: Results of Total Ranking Differences Test of Various Methods .......... 121
Table 75: Results of False Identification Test of Various Methods .......... 121
Table 76: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 2) .......... 122
Table 77: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 6) .......... 123
Table 78: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 6) .......... 123
Table 79: Results of Site Consistency Test of Various Methods .......... 123
Table 80: Results of Method Consistency Test of Various Methods .......... 124
Table 81: Results of Total Ranking Differences Test of Various Methods .......... 124
Table 82: Results of False Identification Test of Various Methods ..........
124
Table 83: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 6) .......... 125
Table 84: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 7) .......... 126
Table 85: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 7) .......... 126
Table 86: Results of Site Consistency Test of Various Methods .......... 126
Table 87: Results of Method Consistency Test of Various Methods .......... 127
Table 88: Results of Total Ranking Differences Test of Various Methods .......... 127
Table 89: Results of False Identification Test of Various Methods .......... 127
Table 90: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 7) .......... 128
Table 91: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 8) .......... 129
Table 92: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 8) .......... 129
Table 93: Results of Site Consistency Test of Various Methods .......... 129
Table 94: Results of Method Consistency Test of Various Methods .......... 130
Table 95: Results of Total Ranking Differences Test of Various Methods ..........
130
Table 96: Results of False Identification Test of Various Methods .......... 130
Table 97: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 8) .......... 131
Table 98: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 11) .......... 132
Table 99: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 11) .......... 132
Table 100: Results of Site Consistency Test of Various Methods .......... 132
Table 101: Results of Method Consistency Test of Various Methods .......... 133
Table 102: Results of Total Ranking Differences Test of Various Methods .......... 133
Table 103: Results of False Identification Test of Various Methods .......... 133
Table 104: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 11) .......... 134
Table 105: Similarity of Identification Results (δ = 0.90) of Various Methods .......... 135
Table 106: Similarity of Identification Results (δ = 0.95) of Various Methods .......... 135
Table 107: Results of Site Consistency Test of Various Methods .......... 135
Table 108: Results of Method Consistency Test of Various Methods .......... 136
Table 109: Results of Total Ranking Differences Test of Various Methods (Functional Code: 12) ..........
136
Table 110: Results of False Identification Test of Various Methods .......... 136
Table 111: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 12) .......... 137
Table 112: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 14) .......... 138
Table 113: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 14) .......... 138
Table 114: Results of Site Consistency Test of Various Methods .......... 138
Table 115: Results of Method Consistency Test of Various Methods .......... 139
Table 116: Results of Total Ranking Differences Test of Various Methods (Functional Code: 14) .......... 139
Table 117: Results of False Identification Test of Various Methods .......... 139
Table 118: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 14) .......... 140
Table 119: Similarity of Identification Results (δ = 0.90) of Various Methods (Functional Code: 16) .......... 141
Table 120: Similarity of Identification Results (δ = 0.95) of Various Methods (Functional Code: 16) .......... 141
Table 121: Results of Site Consistency Test of Various Methods ..........
141
Table 122: Results of Method Consistency Test of Various Methods .......... 142
Table 123: Results of Total Ranking Differences Test of Various Methods (Functional Code: 16) .......... 142
Table 124: Results of False Identification Test of Various Methods .......... 142
Table 125: Results of False True Poisson Means Differences Test of Various Methods (Functional Code: 16) .......... 143

LIST OF FIGURES
Figure 1: Observed and Fitted PDF of E1 Crash Data and Fit Summary Statistics .......... 23
Figure 2: Fitted and Empirical CDF of E1 .......... 24
Figure 3: Moving Averages vs. Original Statistic .......... 32
Figure 4: The Number of t year Which is the "Knee" of the Curve for 90% Confidence Level .......... 34
Figure 5: The Number of t year Which is the "Knee" of the Curve for 95% Confidence Level .......... 34
Figure 6: The Number of t year Which is the "Knee" of the Curve for 99% Confidence Level .......... 35
Figure 7: The Number of t year Which is the "Knee" of the Curve for All Confidence Levels .......... 35
Figure 8: The Cumulative Percent Distribution of Various t years ..........
36
Figure 9: Key Steps of ALGSP Model .......... 59
Figure 10: The Flowchart of Conducting EB Analysis .......... 63
Figure 11: The Flowchart of Computing Accident Reduction Potential .......... 64
Figure 12: Empirical Cumulative Distribution of Dataset One (E1) .......... 74
Figure 13: Empirical Cumulative Distribution of Dataset Two (E2) .......... 75
Figure 14: Empirical Cumulative Distribution of Dataset Three (L1) .......... 76
Figure 15: Empirical Cumulative Distribution of Dataset Four (L2) .......... 77
Figure 16: Empirical Cumulative Distribution of Dataset Five (S1) .......... 78
Figure 17: Empirical Cumulative Distribution of Dataset Six (S2) .......... 79
Figure 18: Relation of AADT and Crashes/year-km for Rural Interstate Principal Arterials (Functional Code: 1, year: 2000) .......... 108
Figure 19: Relation of AADT and Crashes/year-km for Rural Other Principal Arterials (Functional Code: 2, year: 2000) .......... 109
Figure 20: Relation of AADT and Crashes/year-km for Rural Minor Arterials (Functional Code: 6, year: 2000) .......... 110
Figure 21: Relation of AADT and Crashes/year-km for Rural Major Collectors (Functional Code: 7, year: 2000) .......... 111
Figure 22: Relation of AADT and Crashes/year-km for Rural Minor Collectors (Functional Code: 8, year: 2000) ..........
112
Figure 23: Relation of AADT and Crashes/year-km for Urban Interstate Principal Arterials (Functional Code: 11, year: 2000) .......... 113
Figure 24: Relation of AADT and Crashes/year-km for Urban Freeways .......... 114
Figure 25: Relation of AADT and Crashes/year-km for Urban Other Principal Arterials (Functional Code: 14, year: 2000) .......... 115
Figure 26: Relation of AADT and Crashes/year-km for Urban Minor Arterials (Functional Code: 16, year: 2000) .......... 116

EXECUTIVE SUMMARY
In many agencies with jurisdiction over extensive road infrastructure, it is common practice to select and rectify hazardous locations. Improving hazardous locations may arise during safety management activities, during maintenance activities, or as a result of political pressures and/or public attention. Commonly a two-stage process is used. In the first stage, the past accident history of all sites is reviewed to screen a limited number of high-risk locations for further examination. In the second stage, the selected sites are studied in greater detail to devise cost-effective remedial actions or countermeasures for a subset of correctable sites. Due to limited time and resource constraints and the extensive number of candidate sites typically considered in such endeavors, it is impractical for agencies to examine all sites in detail. The current Arizona Local Government Safety Project (ALGSP) Analysis Model, which was developed by Carey (2001) with funding from the Arizona Department of Transportation (ADOT), is intended to facilitate these procedures by providing an automated method for the analysis and evaluation of motor vehicle crashes and the subsequent remediation of 'hot spot' or 'high-risk' locations.
The software is user-friendly and can save large amounts of time for local jurisdictions and governments such as Metropolitan Planning Organizations (MPOs), counties, cities, and towns. However, its analytical core is based on simple ranking of crash statistics, where the user is offered a choice of crash frequency, crash rate, crash severity, or crash cost (severities associated with average costs per crash severity type). Although this method has the benefit of straightforwardness, its efficiency in identifying truly high-risk sites leaves room for improvement. This research, funded by ADOT, aims to justify and recommend improvements to the analytical algorithms within the ALGSP model, thus enhancing its ability to accurately identify high-risk sites. The results of this research include a survey of past and current hot spot identification (HSID) approaches; an evaluation of HSID methods and an exploration of the optimum duration of before-period crash data under simulated scenarios; the development of safety performance functions (SPFs) for various functional road sections within Arizona; extended comparisons of alternative HSID methods based on SPFs using real crash data; and recommendations for improving the identification ability of the current ALGSP model. These results are divided into the following sections:
• Literature review of HSID methods (Chapter II): By tracing the historical and conceptual development of various HSID techniques, the strengths and weaknesses associated with alternative approaches are assessed, and appropriate directions for future research on HSID methods are explored and proposed. A detailed description of Bayesian approaches is also provided.
• Experimental design for evaluation of HSID methods and exploration of accident history (Chapter III): In this experiment, 'sites with promise' are known a priori.
Real intersection crash data from six counties within Arizona are used to simulate crash frequency distributions at hypothetical sites. A range of real conditions is manipulated to quantify their effects, and various confidence levels are explored. False positives (labeling a safe site as high risk) and false negatives (labeling a high-risk site as safe) are compared across three methods: the simple ranking method, the confidence interval method, and the Empirical Bayes (EB) method. Finally, the effect of crash history duration on these approaches is quantified.
• Safety performance functions for Arizona road segments (Chapter IV): SPFs for nine functional classifications of road sections in Arizona are created based on year 2000 crash data provided by ADOT. Because the accident counts are overdispersed, Negative Binomial models are used to develop these SPFs.
• Comparison of HSID methods based on real crash data of Arizona road segments (Chapter V): On the basis of the SPFs for Arizona road sections, five tests are implemented to evaluate the performance of the EB, accident reduction potential, accident frequency, and accident rate methods. Two confidence levels are explored under each test. In addition, the similarity of the identification results of the alternative HSID methods is explored.
• HSID in the current ALGSP model and recommended software changes (Chapter VI): The algorithms for conducting HSID in the current ALGSP model are first reviewed, and changes to the software are then recommended.
These recommendations include incorporating functional classification as an additional selection parameter, data interface improvements, accident history requirements, embedding the relationships between exposure and safety for the various roadway functional classes, incorporation of EB techniques to compute the expected crash count, incorporation of accident reduction potential as an additional weighting method, and incorporation of EB techniques to calculate expected crash costs. Based on both real and simulated data, the results in this report show significant advantages of the EB methods over other HSID methods across various confidence levels and different statistical tests. Specifically, the research found that:
• A higher percentage of truly high-risk sites are identified as 'high risk.'
• A higher percentage of truly safe sites are identified as 'safe.'
• Overall misclassifications are reduced using a Bayesian approach compared to alternative methodologies.
• The Bayesian approach shows the best site consistency and method consistency among the alternative methodologies.
Although it is shown that incorporating Bayesian techniques into the ALGSP will provide model users with more accurate prediction of hot spots, the improvements are contingent upon accurate safety performance functions, which are currently unavailable in the ALGSP. Safety performance functions, which relate traffic volumes and road section lengths to crashes, are provided in Appendix C for various roadway functional classifications in the state of Arizona. These safety performance functions enable the software enhancements needed to improve the ALGSP and accommodate Empirical Bayes procedures.

CHAPTER I - INTRODUCTION
Hot spot identification is a critical contemporary transportation issue.
The Intermodal Surface Transportation Efficiency Act (ISTEA) of 1991, along with the subsequent Transportation Equity Act for the 21st Century (TEA-21), brought HSID squarely into transportation planning activities. In particular, ISTEA requires each state to develop a work plan outlining strategies to implement Safety Management Systems (NCHRP, 2003). The objectives outlined in this management system require that several activities be undertaken by MPOs and/or DOTs:
1) The development and maintenance of a regional safety database so that safety investments can be evaluated regionally and forward in time.
2) The adoption of a defensible (i.e., state of the practice) methodology for identifying safety deficiencies within a region.
3) A maintained and updated record of 'sites with promise,' including intersections, segments, interchanges, ramps, curves, etc.
4) A defensible methodology for evaluating the effectiveness of safety countermeasures.
Besides this mandate to spend safety funds wisely, there is professional pressure to conduct rigorous analyses and be held accountable for 'good number crunching.' Due to both public and professional pressures and the importance associated with motor vehicle injuries and fatalities, transportation safety professionals desire analytical tools to cope with HSID. As a powerful tool for local governments and jurisdictions, the current ALGSP model can be used to facilitate the selection of hazardous roadway locations in local jurisdictions and to aid in the evaluation of potential spot treatments of safety hazards. Its identification method is simply to rank crash statistics in descending order and select the top sites that the available budget allows. Because of random upward fluctuations in crash counts during the observation period, this simple ranking method is subject to regression to the mean bias, which decreases identification accuracy.
By contrast, Bayesian methods have been proposed to obviate this bias, and a considerable literature has shown them to be superior for accurately identifying 'sites with promise.' However, much of that research was conducted on real crash data (where the truly hazardous sites are not known), and comparisons across various scenarios have not been conducted. In addition, real crash data specific to Arizona regions have not been used to examine the performance of Bayesian analyses. By designing an experiment that simulates various scenarios and by using real crash data from Arizona, this research effort evaluates and compares alternative HSID methods. The results consistently show the superiority of Bayesian techniques for accurately identifying 'sites with promise,' laying a solid foundation for the future incorporation of Bayesian approaches into the current ALGSP model. Moreover, safety performance functions for various classifications of road sections within Arizona are provided in this report to facilitate that integration.
This report is divided into five primary sections. The second section, Literature Review of HSID Methods, reviews the historical and conceptual development of HSID procedures chronologically and, to aid understanding of the more complicated computations, provides a detailed description of two types of Bayesian techniques. In the third section, an experimental approach is taken to evaluate the performance of simple ranking, classical confidence intervals, and the EB techniques in terms of the percentage of false negatives and false positives. Several empirical crash distributions from the state of Arizona are selected to represent a realistic range of 'base' crash data, and several degrees of crash heterogeneity are examined in the simulation.
The results demonstrate that the EB methods generally outperform the other two, more conventional methods, especially in low heterogeneity situations. In addition, the effect of the crash history duration employed in the three HSID methods is explored in this experiment. The moving average method is used to smooth the trend across the various durations and to find the "knee" of the curve. Using 3 years of crash history data results in significant improvements in error rates for all three methods, and durations of 3 through 6 years account for almost 90% of the optimum durations.
The major focus of the fourth section is on developing the safety performance functions of road sections. Since design criteria and level of service vary according to the function of the highway facility, a safety performance function is created for each of nine types of road sections within Arizona. The data for modeling include accident counts, Annual Average Daily Traffic (AADT), and road section length. Graphs showing the relationships among the variables, the model forms, and measures of goodness of fit are provided as well. The alternate SPFs are expected to facilitate the incorporation of Bayesian techniques into the future ALGSP.
The fifth section contains a comprehensive comparison of the identification performance of the EB, accident reduction potential, accident frequency, and accident rate methods using crash data from Arizona and the SPFs developed in the previous section. Five evaluation tests are conducted: the site consistency test, the method consistency test, the total ranking differences test, the false identification test, and the false/true Poisson mean differences test. Both the top 10% and top 5% of locations (in terms of accident frequency) are considered as hot spots. The results across the nine types of road sections show a consistent advantage for the EB method, and a consistent disadvantage for the accident rate method, in conducting HSID.
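The moving average "knee" search mentioned in the third section's summary can be sketched as follows. The error rates and the 0.01 cutoff below are invented for illustration and are not results from this study:

```python
def moving_average(values, window=3):
    """Smooth a series with a simple trailing moving average."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical false-identification error rates for 1..8 years of history.
error_rates = [0.40, 0.28, 0.20, 0.18, 0.17, 0.165, 0.163, 0.162]
smoothed = moving_average(error_rates)

# Marginal improvement gained by each additional year of data.
gains = [smoothed[i] - smoothed[i + 1] for i in range(len(smoothed) - 1)]

# The "knee": the first duration whose marginal gain falls below a cutoff.
knee = next(i for i, g in enumerate(gains) if g < 0.01)
print(knee)
```

The knee marks the point where lengthening the crash history stops paying for itself, which is the trade-off the study resolves in favor of roughly 3 to 6 years.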
The final section provides recommended software changes to improve the ALGSP's ability to select truly hazardous locations from the road network. Traffic volume information is proposed to be incorporated in the software; as one of the factors significantly affecting road safety, it should be included in the safety performance function, which is the basis for conducting the EB analysis. Both the experimental results based on simulated data and the results of the evaluation tests based on Arizona crash data support the incorporation of Bayesian techniques in the software. The accident reduction potential method is also recommended for inclusion as an additional weighting method. Finally, a recommendation on the length of the crash analysis period is provided.

CHAPTER II  LITERATURE REVIEW OF HSID METHODS

Identifying 'sites with promise,' also known as black spots, hot spots, or high risk locations, has received considerable attention in the literature. This is not surprising, since there is public and professional pressure to allocate safety investment resources efficiently across the transportation system and to invest in sites that will yield safety benefits for relatively modest cost. In addition, US federal legislation requires the practice of remediating high risk locations. It is intended that this identification stage act as an effective sieve that allows sites that do not require remedial action to pass through, while retaining sites that require remediation. This is difficult to accomplish, however, because an individual site's safety performance (i.e., number of crashes) varies from year to year as a result of natural variation, giving rise to two potential errors: false positives and false negatives. False positives are sites identified as needing remediation when in fact they are safe, while false negatives are sites identified as being safe when in fact they require remediation.
The following literature review comprehensively examines hot spot identification methods. It is intended to support ongoing work for the Arizona Department of Transportation aimed at improving the current ALGSP model. It is the first of several steps toward ultimately improving the software that enables jurisdictions in the state of Arizona to identify sites for potential improvement, such as road segments, intersections, and ramps. This literature review is divided into two sections: the historical and conceptual development of hot spot identification methods, and a detailed description of Bayesian techniques, the current state of the art.

HOT SPOT IDENTIFICATION PROBLEM BACKGROUND

Given the importance of identifying sites with promise, a large number of techniques have been employed to improve detection accuracy. The historical and conceptual development of these procedures is reviewed chronologically in this section. The following notation will be useful in the discussions that follow:
X = observed accident count for a road section/site and period;
λ = expected accident count (E{X}) for the road section/site and period;
E{λ} = mean of the λ's for similar road sections/sites;
D = length of the road section;
Q = number of vehicles passing the road section/site during the period to which X pertains;
R = observed accident rate (e.g., crashes/vehicle kilometer or crashes/million entering vehicles);
R_EB = accident rate estimated by the EB method;
R̄ = average value of R for similar road sections and sites;
UCL_X = upper control limit for observed accident counts (X);
UCL_R = upper control limit for observed accident rates (R);
t = number of years of accident data to be analyzed;
α, β = parameters.
Perhaps the simplest way to identify sites with promise is to rank them in descending order of their accident frequencies and/or accident rates.
Although this method has the benefit of straightforwardness, its efficiency at identifying truly high risk sites leaves considerable room for improvement. To overcome this deficiency, a substantial body of research has been devoted to providing more efficient and justifiable site identification techniques. For example, Norden et al. (1956) proposed a method to analyze accident data for highway sections based on statistical quality control techniques. Using an approximation of the Poisson distribution for crash counts and a 0.5 percent probability, they developed the equations for UCL_X and UCL_R used to identify critical thresholds. When X exceeds UCL_X (or R exceeds UCL_R), a site is identified as deviant with regard to safety. This approach drew much attention at the time, and some similar methods (with relatively minor differences) based on this procedure were proposed in subsequent years.
Researchers then began to ponder how many years (t) of accident data are necessary to conduct a defensible analysis. Finding that a 13 year average could be adequately estimated from 3 years of accident counts, May (1964) first concluded, "There is little to be gained by using a longer study period than three years." It is reasonable to use current data instead of old data that no longer reflect the current situation. However, considering that a sensible choice of t must depend on the magnitude of the average being estimated and on some knowledge of what makes past accident counts obsolete, this influential practice seems somewhat arbitrary.
Crash severity became the next issue of importance in HSID methods. Common sense suggested that a site with more severe crashes (all else being equal) should receive higher priority in remediation efforts. The safety index was first introduced by Tamburri and Smith (1970) and later incorporated into the practice of HSID.
In essence, they observed that each road type (for example, rural two lane roads or urban freeways) has a characteristic mix (distribution) of accident severities among fatal, injury, and property damage only (PDO) crashes. On the basis of accident severity and road type, accident costs were used to weight crashes. They also suggested that all crashes be expressed in terms of PDO equivalent accidents (for example, a certain injury crash may be equivalent to 5 PDO crashes). Deacon et al. (1975) considered the difference between identifying hot spots and identifying hazardous sections, and explored how long analysis sections should be. They also presented an analysis of a sensible t, in comparison to that provided earlier by May (1964). Their conclusions suggested that a balance must be sought between the reliability of the crash data (longer periods being more reliable) and the need to detect adverse change quickly (shorter periods being better able to reveal adverse safety changes), and that a single t should be determined on this basis. They also recommended weights of 9.5 for fatal and A injury crashes and 3.5 for B and C injury crashes when using a safety index.
Laughland et al. (1975) first described a ranking procedure using both the number and rate methods. The proposed method identifies hazardous locations when X exceeds some predetermined value UCL_X and R exceeds UCL_R. The claimed advantage of this procedure is that it excludes so called hazardous locations identified only because R is inflated by low exposure. Renshaw et al. (1980) argued that questions about the length of sections, duration of accident history, amount of traffic, and detection accuracy must all be considered jointly, and that reliable detection is often not practical. Hakkert and Mahalel (1978) first proposed that black spots be defined as those sites whose accident frequency is significantly higher than expected at some prescribed level of significance.
This point was then taken up by McGuigan (1981; 1982), who put forward the concept of potential accident reduction (PAR), defined as the difference between the observed accident count and the expected number for similar sites. He stated, with some justification, that PAR should be a better basis on which to rank sites than annual accident totals (AAT), which tend to identify high flow sites that do not necessarily have potential for accident reduction. This method is similar to the quality control method to some extent: the former represents the magnitude of the problem, that is, how many accidents could be avoided under normal conditions, while the latter represents how large the probability is that the site is abnormal at the given level of confidence.
Estimating E{λ} using a multivariate model was suggested by Mahalel et al. (1982). Using E{λ} as the mean, they deemed a location deviant if the probability of observing X or more accidents was smaller than some predetermined value. Flak et al. (1982) recommended that crashes be categorized according to specific road conditions (weather, pavement material, etc.) and by accident type (turning, sideswipe, rear end, etc.). This concept differed from previous ones in that it seeks to identify deviant locations with regard to very specific conditions. Although appealing from an experimental design point of view, this concept is likely to produce sample sizes too small to detect significant differences for all but the largest of databases.
Hauer and Persaud (1984) proposed a concept of sieve efficiency in which the number of sites to be inspected and the expected numbers of correct positives, false positives, and false negatives serve as measures of performance. They examined the performance of various HSID techniques on the basis of performance measures that are easy to understand.
They argued that the quality control approach to HSID does not give the analyst clues about how well or how poorly the sieve is working. They also suggested that numerical methods are needed to free the procedure from reliance on the assumption that λ obeys the gamma distribution.
The regression to the mean (RTM) bias associated with typical methods of site selection has been identified in the literature, and some research dealing with RTM has been developed. Persaud and Hauer (1984) compared and evaluated the performance of an EB method and a nonparametric method for debiasing before and after analyses. Results on several data sets show that the Bayesian methods in most cases yield better estimates than the nonparametric one. Wright et al. (1988) surveyed previous research dealing with the RTM effect. They examined the validity of the assumptions associated with those methods, evaluated the robustness of the results based on those assumptions, and provided some suggestions for improving the quality of the results.
Mak et al. (1985) developed a procedure to conduct automated analysis of hazardous locations. The procedure consists of (a) a mainframe computer program to identify and rank black spots, (b) a microcomputer program to identify factors overrepresented in accident occurrence at these locations relative to the average for similar highways in the area, (c) a multidisciplinary approach to identify accident causative factors and to devise appropriate remedial measures, and (d) evaluation of remedial measures actually implemented. The procedure is based on accident rate (number of injury and fatal accidents per 100 million vehicle miles of travel).
Higle and Witkowski (1988) developed a Bayesian model for HSID using accident rate data rather than accident counts, with identification criteria analogous to those used in the classical identification scheme.
Their comparisons between the Bayesian analysis and classical statistical analyses suggest that there is an appreciable difference among the various identification techniques in terms of HSID performance, and that some classically based statistical techniques are prone to err in the direction of excess false negatives.
Based on data from 145 intersections in Metropolitan Toronto, Hauer et al. (1988) provided Bayesian models to estimate the safety of a signalized intersection on the basis of information about its traffic flow and accident history. For each of 15 accident patterns (categorized by the movements of the vehicles), an equation is given to estimate the expected number of accidents and its variance from the relevant traffic flows. When data about past accidents are available, the flow based estimates are revised with a simple equation. By applying these Bayesian models, one can estimate safety when both flows and accident history are given and, on this basis, judge whether an intersection is unusually hazardous. This method of estimation is also recommended for accident warrants in the Manual on Uniform Traffic Control Devices.
Through a simulation experiment, Higle and Hecht (1989) evaluated and compared various techniques for the identification of hazardous locations, based on classical and Bayesian statistical analyses respectively, in terms of their ability to identify hazardous locations correctly. The results reveal that the two classically based techniques suffer from some shortcomings, and that the Bayesian method based on accident rates tends to perform well, producing lower numbers of both false negative and false positive errors. By 1990 it was generally accepted in academic circles that the Empirical Bayes approach to unsafety estimation was superior to previous HSID methods.
The Bayesian approach generally makes use of two kinds of clues about an entity: its traits (such as traffic, geometry, age, or gender) and its historical crash record. It requires information about the mean and the variance of unsafety in a "reference population" of similar entities. This method suffers from several shortcomings: first, a very large reference population is required; second, the choice of reference population is to some extent arbitrary; and third, entities in the reference population usually cannot match the traits of the entity for which unsafety is estimated. Hauer (1992) alleviated these shortcomings by offering a multivariate regression method for estimating the mean and the variance of unsafety in the reference population. By describing its logical foundations and illustrating numerical examples, Hauer showed how the multivariate method makes the Empirical Bayes approach to unsafety estimation applicable to a wider range of circumstances and yields better estimates of unsafety than previous methods.
Persaud (1991) presented a method for estimating the underlying accident potential of Ontario road sections using accident and road related data. The comparative results indicate that the EB estimates are superior to those based on the accident counts or the regression predictions by themselves, particularly for sections that might be of interest in a program to identify and treat unsafe road locations. Brown et al. (1992) presented the convergence of HSID by police reported data, by highway inventory, and by community reporting. Weighted injury frequencies per unit distance and weighted injury rates per 100 million vehicle km are presented for all sites and for all numbered highway segments. Priority sites are then ranked considering both injury frequencies and injury rates. Hauer et al.
(1993) explored the probabilistic properties of the process of identifying entities, such as drivers or intersections, for some form of remedial action when they experience N crashes within D units of time, the "N/D trigger." On the basis of the probability distribution of the "time to trigger," they concluded that in road safety the problem of false positives is severe, and therefore entities identified on the basis of accident or conviction counts should be subjected to further safety diagnosis. Moreover, they found that the longer an N/D trigger is applied to a population, the less useful it becomes.
Tarko et al. (1996) presented a methodology for area wide safety analyses to detect those areas (states, counties, townships, etc.) that should be considered for safety treatment. The method is implemented for Indiana at the county level and uses regression models to estimate the normal number of crashes in individual counties. The counties are priority ranked using a combined criterion including both the above norm number of crashes and the confidence level. This combined criterion helps select counties where the excessive number of crashes is not caused solely by the randomness of the process. This application differs from previous ones in that the HSID was conducted at the planning or county level, instead of at the intersection or road segment level.
Stokes and Mutabazi (1996) traced the evolution of the formulas used in the rate quality control method from their origin in the late 1950s to their present form, and presented and discussed the derivation of the basic formulas used in the method. They suggest that, contrary to assertions in the literature, the accuracy of the equations used in the rate quality control method is not improved by eliminating the normal approximation correction factor from the original equations, and that the need for a correction factor is particularly apparent at higher probability levels.
On the basis of a review of previous black spot identification procedures, Hauer (1996) attempted to create some order in the thinking and made suggestions to improve identification. He pointed out that, compared with the identification stage, the stage of site safety diagnosis and remediation is somewhat underdeveloped. Persaud et al. (1999) put forward a concept similar to potential accident reduction, the potential for safety improvement (PSI). To correct for the RTM bias, the observed accident count in the PAR described earlier is replaced with the long term mean of the accident counts.
Davis and Yang (2001) made use of hierarchical Bayes methods combined with an induced exposure model to identify intersections where the crash risk for a given driver subgroup is relatively higher than that for other groups. They carried out the necessary computations using Gibbs sampling, producing point and interval estimates of relative crash risk for the specified driver group at each site in a sample. The methods can also be extended to identify hazardous locations for a specified accident type. This method of HSID requires sophisticated modeling skill and software, and is currently beyond the level of most DOT staff expertise. Kononov et al. (2002) presented the direct diagnostics method to conduct HSID and develop appropriate countermeasures. The underlying principle is that a site should be identified for further examination if specific accident types are overrepresented relative to similar sites. With Empirical Bayes gradually becoming the standard and staple of professional practice, Hauer et al. (2002) presented a tutorial on safety estimation using the EB method. The tutorial comprehensively illustrates the EB procedures and can be viewed as a bridge between theory and practice for EB application.
The research mentioned above represents only a small portion of the extensive past and current HSID research. In summary, the large body of HSID techniques includes simple ranking of accident frequencies and/or accident rates, rate quality control methods, site identification using the notion of a safety index, number and rate methods, accident pattern recognition methods, and various applications of Bayesian approaches to both crash frequencies and crash rates. In comparison with other techniques, Bayesian techniques have been shown to offer improved ability to identify black spots by accounting for both a site's history and the expected crashes of similar sites, which obviates the regression to the mean problem that simpler methods fail to correct. This literature review clearly indicates that opportunities exist for enhancements leading to improved HSID within the recently released ALGSP model, which currently performs a simple ranking based on accident frequencies. However, as one might expect, the incorporation of Bayesian methods will increase the data collection burden: additional information about site crash histories and reference populations will need to be collected. The following section describes the Bayesian techniques in greater detail.

BAYESIAN TECHNIQUES TO IDENTIFY HAZARDOUS LOCATIONS

An underlying characteristic of crash occurrence is the random fluctuation of crash counts from year to year under constant and unchanging traffic, weather, and roadside conditions (which of course do not occur in reality). This characteristic significantly reduces the ability to detect truly hazardous locations, in the sense that a site may appear to represent a relatively high risk in a given year when in fact its underlying, inherent risk level is average or low (Hauer, 1997).
A site that reveals a high observed risk in one year is on average followed by a crash count in the subsequent year that is closer to the mean, a phenomenon known as regression to the mean. It was shown in the previous section, however, that Bayesian approaches, by utilizing two kinds of clues about an entity (its traits and its historical accident record), correct for RTM and can significantly improve the efficiency of site identification. Incorporation of such techniques into the ALGSP model will improve the performance of HSID. In contrast to other, relatively straightforward approaches, however, the Bayesian techniques require a greater quantity of information about the locations inspected and also involve relatively more complicated computations, albeit trivial for a computer. Because a large portion of this research tests the performance of various HSID methods (including the more typical methods and the Bayesian techniques), this section describes in detail the analytical aspects of the Bayesian techniques generally accepted as state of the art. The review is divided into two groups: Bayesian techniques based on accident frequencies and Bayesian techniques based on accident rates.

Bayesian Techniques Based on Accident Frequencies

To alleviate the RTM bias associated with other site identification techniques, Hauer et al. (1984; 1988; 1992) discussed numerous aspects of HSID to derive what is known as the EB method. EB methods differ technically from fully Bayesian methods in that the former rely on empirical data in place of "subjective" information, while the latter rely on truly subjective information (e.g., expert opinions, judgment). The EB method rests on the following logic. Two assumptions are first needed, which can be traced back to those of Morin (1967) and Norden et al. (1956):
Assumption 1: At any given location, accident occurrence obeys the Poisson probability law.
That is, P(x|λ) denotes the probability of recording x accidents at a site where their expected number is λ, where

P(x|λ) = λ^x e^(−λ) / x!  (1)

Assumption 2: The probability distribution of λ over the population of sites is a gamma distribution, with g(λ) denoting the gamma probability density function.
Estimation of the long term safety of an entity uses both kinds of clues, that is, the traits of the entity (such as gender, age, traffic, or geometry) and its historical accident record. If the count of crashes x obeys the Poisson probability law and the distribution of the λ's in the reference population is approximated by a gamma probability density function, a good estimator of the λ for a specific entity is:

αE{λ} + (1 − α)x, with α = E{λ} / [E{λ} + VAR{λ}]  (2)

From this equation it is clear that estimates of E{λ} and VAR{λ}, which pertain to the λ's of the reference population, are needed. There are two methods for estimating E{λ} and VAR{λ}: the method of sample moments and the multivariate regression method.
To describe the method of sample moments, first consider a reference population of n entities of which n(x) entities have recorded x = 0, 1, 2, … accidents during a specified period. With this notation, the sample mean and the sample variance are, respectively:

μ = Σ x·n(x) / Σ n(x)  (3)
s² = Σ (x − μ)²·n(x) / Σ n(x)  (4)

In the method of sample moments, the estimators of E{λ} and VAR{λ} are μ and s² − μ, respectively. The larger the reference population, the more accurate these estimates are. The primary attraction of the method is that its validity rests on a single assumption: that if λi remained constant, the occurrence of accidents would be well described by the Poisson probability law.
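As a minimal sketch of Equations 2 through 4 (the reference population of crash counts below is made up for illustration), the sample moments estimates and the EB shrinkage can be coded as:

```python
def sample_moments(counts):
    """Method of sample moments (Eqs. 3-4): returns (mu, VAR{lambda}) for a
    reference population of counts, with VAR{lambda} = s^2 - mu."""
    n = len(counts)
    mu = sum(counts) / n
    s2 = sum((x - mu) ** 2 for x in counts) / n
    return mu, max(s2 - mu, 0.0)  # a variance cannot be negative

def eb_estimate(x, e_lambda, var_lambda):
    """EB long-term safety estimate (Eq. 2):
    alpha*E{lambda} + (1 - alpha)*x, alpha = E{lambda}/(E{lambda}+VAR{lambda})."""
    alpha = e_lambda / (e_lambda + var_lambda)
    return alpha * e_lambda + (1 - alpha) * x

# Hypothetical yearly crash counts at similar sites (the reference population).
reference = [0, 1, 1, 2, 2, 3, 3, 4, 5, 9]
mu, var_l = sample_moments(reference)

# A site that recorded 9 crashes is pulled back toward the population mean.
print(eb_estimate(9, mu, var_l))  # 6.0: halfway between 9 and mu = 3.0
```

Here μ = 3 and s² = 6, so α = 0.5 and the estimate splits the difference between the observed count and the population mean, which is exactly the RTM correction the text describes.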
However, there remain two practical difficulties: (1) it is rare that a sufficiently large data set can be found to allow adequately accurate estimation of E{λ} and VAR{λ}; and (2) even with very large data sets, one cannot find adequate reference populations when entities are described by several traits (e.g., geometric conditions). To obviate these difficulties, Hauer (1992) provided the multivariate regression method. Here, a multivariate model is fitted to accident data to estimate E{λ} as a function of independent variables, and the residuals (i.e., the differences between the accident counts on the entities that served as data for model fitting and the estimates of E{λ} calculated from the fitted model equation) are viewed as coming from a family of compound Poisson distributions:

VAR{x} = VAR{λ} + E{λ}  (5)

The E{λ} of the reference population is estimated using the model equation, and VAR{x} is estimated using the squared residuals. Therefore, based on Equation 5, the difference [squared residual − estimate of E{λ}] can be used to estimate VAR{λ} for the imaginary reference population to which each datum point belongs. The primary difference between the method of sample moments and the multivariate regression method is thus the analytical procedure by which the estimates of E{λ} and VAR{λ} are obtained: the method of sample moments is more straightforward, while the multivariate regression method yields more precise results. Once the estimates of E{λ} and VAR{λ} are obtained, the expected safety of an entity follows from Equation 2. However, truly hazardous locations cannot be screened based solely on the long term safety associated with each entity; a model of the entire distribution function of λ given X is required.
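Equation 5 suggests a simple back-out of VAR{λ} from fitted values and squared residuals. The sketch below averages across sections for brevity, whereas Hauer's method works per datum point; the counts and model-fitted E{λ} values are invented for illustration:

```python
def var_lambda_estimate(observed, fitted):
    """Back out VAR{lambda} from Eq. 5, VAR{x} = VAR{lambda} + E{lambda}:
    mean squared residual (estimating VAR{x}) minus the mean fitted E{lambda},
    floored at zero because a variance cannot be negative."""
    n = len(observed)
    mean_sq_resid = sum((x, m) and (x - m) ** 2 for x, m in zip(observed, fitted)) / n
    mean_e_lambda = sum(fitted) / n
    return max(mean_sq_resid - mean_e_lambda, 0.0)

# Invented crash counts and model-fitted E{lambda} for five road sections.
observed = [2, 8, 1, 9, 3]
fitted = [3.0, 4.0, 2.0, 5.0, 3.0]
print(var_lambda_estimate(observed, fitted))
```

With these numbers the mean squared residual is 6.8 against a mean fitted value of 3.4, leaving roughly 3.4 as the estimate of VAR{λ}, the between-site variation that the Poisson noise alone cannot explain.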
On the basis of the assumptions stated previously, the probability that a randomly selected site has x accidents is approximated by the negative binomial (NB) probability distribution. The parameters of g(λ) are estimated using EB logic according to the following sequence of steps:

Step 1: The sample mean and variance are computed across sites, where n(x) denotes the number of sites that had x crashes:

μ = Σ x n(x) / Σ n(x)  (6)

s² = [Σ (x − μ)² n(x)] / Σ n(x)  (7)

Step 2: The EB weighting parameters α and β are then obtained using:

α = μ / (s² − μ)  (8)

β = μ α  (9)

Step 3: With the two weighting parameters obtained, the parameters of the gamma distribution are obtained such that:

g(λ) = α^β λ^(β−1) e^(−αλ) / Γ(β)  (10)

The subpopulation of sites that had x accidents also follows a gamma probability distribution, with gamma probability density function:

g(λ|x) = (1 + α)^(β+x) λ^(β+x−1) e^(−(1+α)λ) / Γ(β + x)  (11)

With the probability density functions defined, the selection of hazardous locations is straightforward. Suppose that λ* is the "acceptable" upper limit on the expected accident count; then a site i is identified as a truly hazardous location if the probability that its λ exceeds λ* is sufficiently large, specifically, if:

P(λ > λ* | x) > δ  (12)

where δ is a tolerance level that is contingent upon the judgment of safety specialists (i.e., the level of acceptable risk) and takes into account conditions in the local jurisdiction.

Bayesian Techniques Based on Accident Rates

In contrast to earlier papers on EB techniques, which were concerned with predicting the number of crashes that will occur at a particular location, Higle and Witkowski (1988) investigated Bayesian analysis of crashes for the identification of hazardous locations based on accident rates rather than frequencies.
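Before turning to the rate-based variant, the count-based screening of Equations 6 through 12 can be sketched with Python's standard library. This is a minimal illustration under the stated assumptions; the posterior tail probability is obtained by numerically integrating the gamma density of Equation 11, and all names are ours:

```python
import math

def eb_parameters(counts):
    """Steps 1-2 (Equations 6-9): sample moments and EB weighting
    parameters.  Requires overdispersed counts (s^2 > mu)."""
    n = len(counts)
    mu = sum(counts) / n
    s2 = sum((c - mu) ** 2 for c in counts) / n
    alpha = mu / (s2 - mu)    # Eq. 8
    beta = mu * alpha         # Eq. 9
    return alpha, beta

def prob_lambda_exceeds(lam_star, x, alpha, beta, steps=20000):
    """P(lambda > lambda* | x) under the posterior of Equation 11, a
    gamma density with shape beta + x and rate 1 + alpha, computed as
    1 - CDF via trapezoidal integration on [0, lambda*]."""
    shape, rate = beta + x, 1.0 + alpha
    log_c = shape * math.log(rate) - math.lgamma(shape)

    def pdf(lam):
        if lam <= 0.0:
            return 0.0
        return math.exp(log_c + (shape - 1.0) * math.log(lam) - rate * lam)

    h = lam_star / steps
    interior = sum(pdf(i * h) for i in range(1, steps))
    cdf = h * (0.5 * pdf(0.0) + interior + 0.5 * pdf(lam_star))
    return 1.0 - cdf

def is_hazardous(lam_star, x, alpha, beta, delta):
    """Equation 12: flag the site when P(lambda > lambda* | x) > delta."""
    return prob_lambda_exceeds(lam_star, x, alpha, beta) > delta
```

With α = 1 and β = 2, a site with one recorded crash has posterior shape 3 and rate 2, giving a tail probability near 0.68 above λ* = 1, so the site is flagged at δ = 0.5 but not at δ = 0.9.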
It should be noted that a growing body of literature strongly discourages the use of accident rates (Hauer, 1997). Given its similar assumptions and procedures, this research can nevertheless be viewed as a complement to the previous research relying on EB approaches. Using empirical comparisons of performance between Bayesian and classical statistical analyses, Higle and Witkowski found appreciable differences among the various identification techniques, and found that some classically based statistical techniques may be prone to err in the direction of excessive false negatives.

Higle and Witkowski divided the Bayesian analysis into two steps. In the first step, crash histories are aggregated across a number of sites to obtain a gross estimate of the probability distribution of accident rates across the region. In the second step, the regional distribution and the accident history at a particular site are combined to obtain a refined estimate of the probability distribution of the accident rate at that site. In performing the analysis, Higle and Witkowski made two assumptions similar to those made by previous researchers:

Assumption 1: At any given location i, when the accident rate is known (i.e., R̃ᵢ = R, noting that R̃ᵢ is treated as a random variable), the actual number of accidents follows a Poisson distribution with expected value R(DQ)ᵢ, where (DQ)ᵢ denotes the exposure at site i. That is:

P{Xᵢ = X | R̃ᵢ = R} = [R(DQ)ᵢ]^X e^(−R(DQ)ᵢ) / X!  (13)

Assumption 2: The probability distribution of the regional accident rate, f_R(R), is the gamma distribution. That is:

f_R(R) = β^α R^(α−1) e^(−βR) / Γ(α)  (14)

Higle and Witkowski recommended that it may be preferable to use the MME (method of moments estimate) values rather than the MLE (maximum likelihood estimate) values of α and β.
Within the framework of Bayesian analysis, the site-specific parameters are αᵢ = α + Xᵢ and βᵢ = β + (DQ)ᵢ. Based on αᵢ and βᵢ, the site-specific probability density functions are then obtained. The steps to identify the truly hazardous locations are as follows:

Step 1: Estimate the sample mean and variance of the observed accident rates across the population of m locations:

μ = (1/m) Σᵢ Xᵢ / (DQ)ᵢ  (15)

s² = [1/(m − 1)] Σᵢ [Xᵢ / (DQ)ᵢ − μ]²  (16)

Step 2: Estimate the parameters α and β, where:

β = μ / s²  (17)

α = μ β  (18)

With these two parameters:

f_R(R) = β^α R^(α−1) e^(−βR) / Γ(α)  (19)

Step 3: Obtain f(Rᵢ | Xᵢ, (DQ)ᵢ). The subpopulation of sites that had Xᵢ accidents also follows a gamma distribution, with probability density function:

f(Rᵢ | Xᵢ, (DQ)ᵢ) = βᵢ^(αᵢ) Rᵢ^(αᵢ−1) e^(−βᵢRᵢ) / Γ(αᵢ)  (20)

where:

αᵢ = α + Xᵢ  (21)

βᵢ = β + (DQ)ᵢ  (22)

With these probability density functions, the selection of hazardous locations is straightforward. Suppose that R* is the "acceptable" upper limit on the accident rate; then a site i is deemed hazardous if the probability that its rate exceeds R* is sufficiently large, that is, if:

P(R > R* | Xᵢ, (DQ)ᵢ) > δ  (23)

where δ is a tolerance level contingent upon the judgment of safety specialists and the actual situation of the local jurisdiction. Sites above the critical threshold are then identified as truly hazardous locations.

To summarize, Bayesian techniques, by accounting for both a site's crash history and the expected crashes for similar sites, have been shown to offer improved ability to identify truly hazardous locations. The next section quantifies the differences between Bayesian techniques and other typical approaches.
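The rate-based computations of Equations 15 through 18 and 21 through 22 likewise reduce to a few lines. In this illustrative sketch (names are ours), X[i] is the accident count and DQ[i] the exposure at site i:

```python
def regional_parameters(X, DQ):
    """Steps 1-2 (Equations 15-18): method-of-moments estimates of the
    regional gamma parameters from the site accident rates X[i]/DQ[i]."""
    m = len(X)
    rates = [x / d for x, d in zip(X, DQ)]
    mu = sum(rates) / m                               # Eq. 15
    s2 = sum((r - mu) ** 2 for r in rates) / (m - 1)  # Eq. 16
    beta = mu / s2                                    # Eq. 17
    alpha = mu * beta                                 # Eq. 18
    return alpha, beta

def site_parameters(alpha, beta, X_i, DQ_i):
    """Equations 21-22: site-specific posterior gamma parameters."""
    return alpha + X_i, beta + DQ_i
```

The regional gamma has mean α/β = μ and variance α/β² = s², so the Bayesian update simply adds the site's crash count to the shape parameter and its exposure to the rate parameter.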
CHAPTER III. EXPERIMENT DESIGN FOR EVALUATION OF HSID METHODS AND EXPLORATION OF ACCIDENT HISTORY

On the basis of the preceding literature review of HSID methods, Bayesian methods revealed themselves as superior for accurately identifying sites with promise. However, much of that research was conducted on real crash data (where hazardous sites are not truly known), and comparisons across the various Bayesian methods have not been conducted. This chapter examines the performance of the EB and alternative typical methods under various conditions, and explores the duration of accident history that minimizes false identifications.

The chapter is divided into sections as follows. Section 1, "Experiment for Evaluating HSID Method Performance," discusses the steps of an experiment designed to evaluate the performance of HSID methods. Section 2, "Experiment for Optimizing Duration of Crash History," presents the steps for determining the optimum duration of before-period crash data. Both real and simulated crash data are utilized in the experiments. The real data were obtained from current ALGSP users in Arizona. The simulated data correspond with a designed experiment that varies factors such as the degree (or percentage) of difference between "correctable" and "average" sites, the variability in the data, and the crash distributions. The final section provides the conclusions and recommendations arising from the two experiments performed to evaluate HSID methods for use in the ALGSP, and translates the analytical results into practical recommendations.

EXPERIMENT FOR EVALUATING HSID METHOD PERFORMANCE

The main objective of this first experiment is to quantify and assess the predictive performance of various HSID methods, including the simple ranking method, the method based on classical statistical confidence intervals, and the EB method, in order to identify the best one for inclusion in the ALGSP model.
Many aspects of the simulation experiment deserve careful attention, such as sample sizes, the nature of the crash data, and the reliability of the tests. Prior to describing these detailed aspects of the experiment, the HSID methods are first reviewed.

Hot Spot Identification Methods

A site (or series of sites) may experience relatively high numbers of crashes due to: 1) an underlying safety problem; or 2) a random "up" fluctuation in crash counts during the observation period. Simply observing unusually high crash counts does not indicate which of the two conditions prevails at the site. The objective of HSID can be articulated as follows: to identify transportation system locations (road segments, intersections, interchanges, ramps, etc.) that possess underlying correctable safety problems, whose effect will be revealed through elevated crash frequencies relative to similar locations.

Two aspects of this statement are noteworthy. First, it is possible to have truly unsafe sites that do not reveal elevated crash frequencies; these are termed 'false negatives.' It is also possible to have elevated crash frequencies that do not result from underlying safety problems; these are termed 'false positives.' False positives, if acted upon, lead to investment of public funds with few safety benefits. False negatives lead to missed opportunities for effective safety investments. Correct determinations include identifying a safe site as "safe" and an unsafe site as "high risk." When considering the seriousness of the two errors with respect to safety management, one generally concludes that false negatives are the least desirable result, since a jurisdiction will fail to make wise investments that reduce fatalities, injuries (serious and minor), and property damage crashes.
For evaluation purposes, an HSID method is sought that produces the smallest proportion of false negatives and false positives. Hence, the percentages of false negatives, false positives, and overall misidentifications (false positives plus false negatives) are used to compare the performance of three commonly implemented HSID methods: 1) simple ranking of sites; 2) classically based confidence intervals; and 3) the EB method. These three methods are now described.

The simple ranking method (denoted SR in the experiments) is the most straightforward HSID method. Under this method, a set of locations (e.g., all 4-lane signalized intersections in a jurisdiction) is ranked in descending order of crash frequency (or count, X), and the top sites are identified as high risk locations for further inspection. Typically, resources are invested to improve correctable sites from the top down until allocated funds are expended. This method is one analysis option available in the current version of the ALGSP model.

A second method for HSID is based on classical statistical confidence intervals (denoted CI in the experiments) (1975). Location i is identified as unsafe if its observed accident count Xᵢ exceeds the observed average count μ of comparison (similar) locations with level of confidence δ, that is, if Xᵢ > μ + K_δ S, where S denotes the standard deviation of counts at the comparison locations and K_δ is the corresponding critical value. In practice δ is typically 0.90, 0.95, or 0.99, depending upon the actual situation and considerations such as the number of sites and the amount of safety investment resources. These values serve as approximations, since they are borrowed from the normal distribution function and thus have no special meaning for the distribution of true accident counts, which typically follow Poisson or negative binomial distributions.
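The CI rule just described can be sketched as follows (a minimal illustration; here the comparison group is simply taken to be all sites in the list, and the critical value K_δ is supplied by the analyst):

```python
import math

def ci_flag_sites(counts, k_delta):
    """Flag location i as unsafe when X_i > mu + K_delta * S, where mu
    and S are the mean and standard deviation of the comparison group
    (here, all sites in `counts`).  Returns the flagged indices."""
    n = len(counts)
    mu = sum(counts) / n
    s = math.sqrt(sum((c - mu) ** 2 for c in counts) / n)
    cutoff = mu + k_delta * s
    return [i for i, c in enumerate(counts) if c > cutoff]
```

For counts [5, 5, 5, 5, 20] the group mean is 8 and the standard deviation 6, so with K_δ = 1.645 only the last site exceeds the cutoff of 17.87, while with K_δ = 2.33 no site is flagged.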
This method is commonly used because it follows from classical statistics and can be performed conveniently.

Critical to both the SR and CI methods is the notion of 'comparison sites.' Comparison sites are used to obtain an estimate of 'expected crashes' for similar sites. When sites are ranked using simple ranking, it is assumed that the sites being ranked have similar geometric and traffic conditions. Geometrics and traffic play a significant role in crash potential and thus must be treated carefully. Jurisdictions will often group 'similar' sites together in the ranking to the extent possible; however, it is often the case that sites with different geometric and traffic conditions (i.e., exposure) are compared in the ranking method. In the confidence interval method, it is assumed that the group of comparison sites is similar to the site being compared. Critical to the outcome of any HSID method is the level of sophistication employed to identify comparison sites.

The EB technique was described in detail in the previous section. It is noteworthy that only the EB method based on accident counts is used herein. Equation 24 is used to compute the long-term expected accidents at each site:

λ̂ᵢ = αE{λ} + (1 − α)xᵢ,  with α = E{λ} / [E{λ} + VAR{λ}]  (24)

The weight parameter α is obtained using the method of sample moments, in which the estimators of E{λ} and VAR{λ} are μ and s² − μ respectively (μ denotes the sample mean and s² the sample variance). From these expressions it is clear that the second of the two clues, crash history, significantly affects the estimate of λ, since longer crash histories tend to be more stable (in crashes per year) than shorter crash histories. Thus, different historical accident records yield different estimators of E{λ} and VAR{λ}, and subsequently different identification error rates (false positives and false negatives).
Similarly, different identification error rates are also expected under the simple ranking and confidence interval methods when various historical accident records are utilized. Because of its importance, the optimum crash history is examined in the second experiment described in this chapter.

Ground Rules for Simulation Experiment

To accomplish the evaluation of HSID methods, a simulation experiment was designed to test a variety of conditions. The simulation experiment consists of the following specific steps:

1) Generate mean crash frequencies from real data. Crash datasets from Arizona (and users of the ALGSP) representing a range of in situ crash conditions (i.e., intersections, road segments, etc.) are first obtained. These data are used to determine various shapes of distributions of crash means (λ's). Gamma distributions are fit to the observed data to reflect heterogeneity in site crash means. These gamma distributed means are meant to reflect TRUTH, that is, the true state of underlying safety at various locations on a transportation network (note that in practice TRUTH is not known; herein lies the power of simulation). The gamma distributed means are denoted true Poisson means (TPMs), and represent the mean crash frequencies across sites.

2) From TPMs, generate random Poisson samples. Thirty independent random counts are generated for each simulated site. For each of the 1,000 sites, the TPM is used to generate 30 crash counts that represent OBSERVED data for 30 different observation periods, which are assumed to represent years in the analysis.

3) Evaluate HSID performance. By knowing the true state of safety for sites (the TPMs) and having observed data (the randomly generated Poisson counts), the performance of the HSID methods can be tested. The following steps are used to set up the evaluation:

a) SR, CI, and EB are applied in separate simulation runs to rank sites for improvement.
These methods are applied by columns (a single observation period, which represents what an analyst might see in reality).

b) For the Bayesian runs, it is assumed that rows (data across observation periods for the same site) can also be used to represent the comparison group in order to calculate E(x) and VAR(x). This implies that the analyst has accounted for covariates and is able to estimate an expected value for a site that reflects factors such as exposure and geometrics.

c) For the various hot spot thresholds, the percentages of false positives, false negatives, and total misidentifications are computed. The percentage of false positives will always be larger than the percentage of false negatives, because false negatives (hazardous sites identified as non-hazardous) are expressed relative to the much larger pool of non-hazardous sites, whereas false positives (safe sites identified as hazardous) are expressed relative to the relatively small pool of truly hazardous sites.

4) Evaluate the effect of length of history. In the SR, CI, and EB methods the analyst must decide how long a crash history to use in the calculations. In this experiment the effect of various accident histories (1 through 10 years of data) on performance is evaluated based on the corresponding identification rates.

5) Make practical recommendations. The results of the previous steps are discussed and translated into practical recommendations for improving the ALGSP software.

Various aspects of the simulation experiment listed above need to be discussed, as the quality and design of the simulated data directly impact the quality and generalizability of the analysis results.

Generating Mean Crash Frequencies from Real Data

To support the development of simulated crash data, 6 years (January 1995 through December 2000) of crash counts from intersections in Apache, Gila, Graham, La Paz, Pima, and Santa Cruz counties in the state of Arizona are used.
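The core of steps 1 and 2 above can be sketched with Python's standard library (the report's actual tooling is not specified; the gamma parameters below are placeholders, and the Poisson draws use Knuth's multiplication algorithm, which is adequate for the small crash means involved):

```python
import math
import random

def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_sites(shape, scale, n_sites=1000, n_periods=30, seed=42):
    """Step 1: draw gamma-distributed true Poisson means (TPMs);
    Step 2: draw n_periods observed Poisson crash counts per site."""
    rng = random.Random(seed)
    tpms = [rng.gammavariate(shape, scale) for _ in range(n_sites)]
    observed = [[poisson_draw(t, rng) for _ in range(n_periods)] for t in tpms]
    return tpms, observed
```

Each row of `observed` is one site's counts across observation periods (a row of a table such as Table 2), while each column plays the role of what an analyst would see in a single period.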
These data and their corresponding cumulative distributions are shown in Appendix A. Three characteristically different underlying cumulative distributions of TPMs were observed in the Arizona crash data: an exponential shape (denoted E), a linear shape (denoted L), and an s-shape (denoted S). In addition, two levels of heterogeneity in crash counts were observed: low heterogeneity (denoted 1), where the range in observed crash counts is less than 20 crashes, and high heterogeneity (denoted 2), where the range is in excess of 50 crashes.

Recall that the empirical distributions are used to generate TRUTH, that is, the means of Poisson counts of sites with varying underlying means, denoted in this simulation study as TPMs. Since the data represent the true underlying safety of a site, crash counts are Poisson distributed at an individual site, and the statistic of interest is the mean. The cumulative distributions used to represent the TPMs are labeled E1, E2, L1, L2, S1, and S2, respectively. For example, E2 represents an exponential-shaped distribution with high heterogeneity in TPMs. These six data sets were selected from various jurisdictions within Arizona to represent the range of underlying characteristics of true accident count distributions, with the intent of making the results of this experiment applicable across a variety of typical situations.

As stated previously, the observed data are used to inform the simulation of the TPMs. In this experiment three reasonable assumptions are required to establish the foundation for a successful simulation of crash data:

Assumption 1: The empirical cumulative distributions shown in Figures 12 through 17 (see Appendix A) represent the TPMs of the underlying crash process; thus the true safety of all sites in the collection of sites is known.
These data are in reality unknowable, since it is not known a priori which sites are "hazardous."

Assumption 2: The theoretical distribution of the TPMs across the population of sites follows the gamma distribution, and the probability that a randomly selected site has a given number of accidents is approximated by the negative binomial distribution.

Assumption 3: The TPMs provide the basis for generating observed crash count data. Thus, for example, the median-ranked site in Figure 12 (E1), which has an underlying Poisson mean of around six crashes per observation period, is used to randomly generate a crash outcome, which could be 0, 1, 2, 3, … in any given observation period.

The result of Assumptions 1 and 2 is that for each simulated site the underlying TPM (expected crash count) is known, and this TPM is then used to randomly generate the observed crash counts.

Generation of Random Poisson Samples from TPMs

The empirical cumulative TPMs shown in Figures 12 through 17 (see Appendix A) represent the data required to meet Assumption 1 discussed previously. Using these data, observed crash counts are generated to represent observed data for a given observation period. However, due to the relatively small observed sample sizes (fewer than 200 sites in all six datasets) and the corresponding dispersion of crash counts, in some cases no sites would be identified as hazardous by the three HSID methods stated previously. For example, if the top 1% of sites are identified as high risk (δ = 0.99), all the sites in the datasets labeled L1, S1, and L2 would be identified as safe under the classical confidence interval method and the Bayesian method, leading to zero false negatives in these scenarios and distorting the regularity of the results to some degree.

To solve this problem and provide sufficient sample sizes for statistical comparisons, theoretical distributions of TPMs are fitted to the six datasets.
The sample sizes are then enlarged by randomly generating the required number of sites from these gamma distributions (site-specific crash means are gamma distributed, whereas within-site crashes are Poisson distributed). In this experiment, 1,000 sites are simulated.

Fitting specific gamma distributions to a given sequence of data can be accomplished with various software packages, such as MINITAB, SAS 8.1 (1998), and Arena 7.0 (Kelton, 2003); herein Arena 7.0 is employed. Within Arena, the curve fitting is based on maximum likelihood estimators, and the quality of a curve fit is judged primarily by the square error criterion. The fitting of the probability density function (PDF) of a gamma distribution to the observed data is based on the histogram of these data. The distribution summary report also presents the expression of the fitted probability density function, the corresponding p-value of the Chi-Square test, the square error, and related statistics. Figure 1 shows one example of fitting a gamma distribution to a dataset. To show the quality of the fit, the corresponding theoretical cumulative distribution function (CDF) is plotted in the same graph as the empirical CDF (Figure 2 shows the distributions for dataset E1). The figures show that the gamma distribution fits the observed data well. The summary of all six fittings is shown in Table 1.
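Arena fits by maximum likelihood; as a rough cross-check on such fits (and not the report's procedure), a method-of-moments fit recovers the gamma shape and scale from the sample mean and variance after removing the location offset:

```python
def gamma_moments_fit(data, offset=0.0):
    """Method-of-moments gamma fit.  For Gamma(shape k, scale theta),
    mean = k * theta and variance = k * theta^2, so theta = var / mean
    and k = mean / theta, computed after shifting off the offset."""
    shifted = [x - offset for x in data]
    n = len(shifted)
    mean = sum(shifted) / n
    var = sum((x - mean) ** 2 for x in shifted) / (n - 1)
    theta = var / mean
    return mean / theta, theta   # (shape, scale)
```

For the E1 expression in Table 1, for example, the offset would be 0.5 and the resulting shape and scale could be compared against the fitted values 3.79 and 1.75.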
Distribution Summary
  Distribution:   Gamma
  Expression:     3.5 + GAMM(13.4, 2.27)
  Square Error:   0.020052

Chi-Square Test Results
  Number of intervals:    8
  Degrees of freedom:     5
  Test statistic:         8.2
  Corresponding p-value:  0.16

Data Summary
  Number of data points:  94
  Min data value:         4
  Max data value:         70
  Sample mean:            33.8
  Sample std dev:         16.7

Histogram Summary
  Histogram range:        3.5 to 70.5
  Number of intervals:    67

Figure 1: Observed and Fitted PDF of E1 Crash Data and Fit Summary Statistics

Figure 2: Fitted and Empirical CDF of E1 (axes: accident counts, 0 to 70, versus cumulative distribution)

Table 1: Summary of Gamma Fittings of Six Datasets

Data set  Fitting Expression      Square Error  Test Statistic  p-value
E1        0.5 + Gamm(3.79, 1.75)  0.022344      26              < 0.005
E2        1.5 + Gamm(15.9, 1.7)   0.011836      13.4            0.0385
L1        0.5 + Gamm(4.31, 1.71)  0.038173      11.1            0.0119
L2        3.5 + Gamm(13.4, 2.27)  0.020052      8.2             0.16
S1        0.5 + Gamm(2, 4.3)      0.014903      33.5            < 0.005
S2        0.5 + Gamm(9.06, 2.57)  0.013211      23              < 0.005

Note: E = exponential shape; L = linear shape; S = sigmoidal shape; 1 = low heterogeneity of crash counts; 2 = high heterogeneity of crash counts.

After the TPMs have been simulated (the crash means across sites, which reflect the true and typically unknown state of nature), the next step is to generate observed crash counts for the sites. These counts represent the observed crash counts across observation periods for a particular site (whose true safety is known). It is well established that crash counts fluctuate across observation periods as a result of the randomness inherent in the underlying crash process, which is well approximated by a Poisson process. In other words, the count of crashes changes from one period to another even if driver demography, traffic flow, road, weather, and the like remain unchanged. To represent this natural fluctuation, a random sample of 30 observation periods (which could be months, years, etc.)
associated with each location is randomly generated using a random number generator and the underlying TPMs defined by the fitted distributions in Figures 12 through 17 (see Appendix A). A small snapshot of the data obtained from this simulation is shown in Table 2.

Table 2: Simulated Data for 30 Sites and 16 Observation Periods

SITE  TPM   Observation period 1 through 16
  1     4    5  1  4  1  2  7  4  3  4  4  2  1  1  5  5  6
  2     8    5  9  8  6  8  4  9  9  5  4  8  8  9  9 13  8
  3     8   12  7 10  5  5  7 11  8  8  8 11  6  6  7  8  7
  4     9   12  9 10 16  8 12  7  9 11  8 10  8 16 11  6  8
  5     9   10 13 12  8  9  6 12 10  9  9  4  5 12 11 11  4
  6    10   15  4  6 10  4 17  6 11 12  7 10 10 15  6 17 10
  7    10    8  5 10  8 13 10 11  7 12 10  8  9  9  6  9 10
  8    10    7  8 11 14 10 12  7 11 12 11 12 13  7  7  7 11
  9    12   13 17  8 14 12 10 16 10  7 15 17  9 11 15 14 15
 10    12   10  9 13 13  6 12 18 11 15 12 12 12 13 12 13  9
 11    12    9 10 10 14 15 12  7 14  6 12 11 19  9 17 10 18
 12    12   11 14 14  9 16  7 15  3 10 13  9 11  7  2 12 14
 13    12   15 15 16 13  8 12 13 16 16 12 15 11 15 12 14  9
 14    12   14 10 10 11 15 15 12 13 14 15 13 14 11 13 17 19
 15    12   11 12 12  8 12 13 12  7  9 11  9  9  9 12  4  9
 16    13    8 17 13  8 12 11 17 15 16 13 12 15 16 12 14 19
 17    13    9 13 16 16 11  8  6 18 12  8  7 11 12 12 17 15
 18    13   10 18 15 16 10 15 10 16 17 10  6  8  8 10 13  6
 19    13   14 13 17 11  6 11 18 15 11 17 16 19 13 11 15 14
 20    13    7  4 13 11 12 10 17 19  6  7 12 15  7 15 14 12
 21    14   16 17 12 18 13 17 12 11  7 13 15 10 18 14 17 19
 22    15   15 18 21 15 15 14 13 21 14 13 20 13 12 19 16 16
 23    15   11 13 16 12 12 16 10 16 19 20 21 16 13 19 11 16
 24    15    9 16 16 11 14 12 15 18 11 16 14 29 11 12 19 14
 25    16   18 12 15  9 19 18 14 11 19 15 18 14 18 18 14 20
 26    17   22 10 19 12 15 19 18 10 11 17 20 16 15 11 10 15
 27    18   14 21  9 19 16 17 19 18 18 14 16 28 19 18 19 10
 28    18    8 20 19  5 16 18 20 28 16 17 19 14 15 14 18 15
 29    19   26 19 18 21 17 29 12 22 25 15 23 11 19 20 15 24
 30    20   22 18 23 21 23 19 26 22 16 20 19 15 14 19 13 15

Note: SITE = number of site, e.g.
intersection, road segment, etc.; TPM = true underlying safety of the site (Poisson mean); the body of the table gives the observed crash count in each observation period; the rows for sites 29 and 30 represent the 'truly hazardous' locations.

Table 2 shows 16 simulated observation periods for 30 sites, with TPMs given in the second column from the left. The two sites with TPMs of 19 or more crashes per observation period (sites 29 and 30) may be identified a priori as hazardous, since the TPMs reflect the true underlying state of nature; the 28 sites above them are 'safe.' In any given observation period, such as observation period 5, two of the 30 sites recorded 19 or more crashes: one was a truly hazardous site (site 30) and one was not (site 25, a false positive). Observation period 5 also produced a false negative, since truly hazardous site 29 revealed only 17 crashes. Thus, by simulating a large number of observation periods (30) characterized by different TPM cumulative distribution shapes, with a large number of sites (1,000) for each of the six observed crash distributions, the numbers of false negatives and false positives (whose sum total is called false identifications) can be counted for each of the three HSID methods described previously.

Performance Evaluation Results for HSID Methods

Given knowledge of the three HSID methods, the ground rules for the simulation experiment, and an explanation of how the data were simulated, the three HSID methods were applied to the simulated data to evaluate their relative effectiveness at identifying hot spots. Establishing fair comparisons among the different HSID methods is paramount: in order to objectively compare their performance, equivalent evaluation criteria must be used.
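The error bookkeeping illustrated with Table 2 can be sketched as follows (an illustration only, using a single observation period and a single critical crash count; names are ours):

```python
def count_errors(tpms, observed, crit):
    """For one observation period, count false positives (safe site
    flagged) and false negatives (truly hazardous site not flagged).
    A site is truly hazardous if its TPM >= crit, and is flagged if
    its observed crash count >= crit."""
    fp = fn = 0
    for tpm, x in zip(tpms, observed):
        hazardous, flagged = tpm >= crit, x >= crit
        if flagged and not hazardous:
            fp += 1
        elif hazardous and not flagged:
            fn += 1
    return fp, fn
```

For example, with TPMs [13, 16, 19, 20], period counts [12, 19, 17, 23], and a critical count of 19, the second site is a false positive and the third a false negative, mirroring the period-5 discussion above.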
One consideration in this regard is the choice of δ, the cutoff level used to establish hazardous locations. Three values of δ are employed in the evaluations: 0.90, 0.95, and 0.99, corresponding to the top 10%, 5%, and 1% of all sites respectively. In practice, this choice corresponds with the amount of resources available for remediation and the number of similar sites being compared. For example, a local government wanting to remediate hot spot signalized intersections (where 75 such intersections exist) might fix 7 intersections, or roughly 10% (δ = 0.90).

All parameters of the simulation experiment have now been described. They include the shapes of the TPM distributions (E, S, and L), the levels of heterogeneity in the TPMs (1 and 2), and the levels of δ (0.90, 0.95, and 0.99). Three HSID methods are assessed: SR, CI, and EB. Evaluation criteria include the percentage of false positives (FP), the percentage of false negatives (FN), and the sum total percentage of FP and FN, called false identifications (FI). For all of the simulations, sample sizes were 1,000 sites (TPMs) and 30 observation periods. To conduct the simulation experiment with these parameters, the following steps were undertaken:

1. Each TPM cumulative distribution is divided into truly hazardous locations and non-hazardous locations, using thresholds of 0.90, 0.95, and 0.99 to represent different data separation thresholds. This step results in three "critical" crash count threshold values, CC0.90, CC0.95, and CC0.99, for each combination of cumulative TPM shape and heterogeneity level. These values distinguish the known truly hazardous locations from the safe locations.

2. The three different HSID methods are used to identify hot spots using the simulated data.
Specifically, the SR method simply ranks observed frequencies as shown in Table 2; the CI method uses the entire sample mean and standard deviation to determine confidence intervals for ranking; and the EB method ranks sites using a weighted average of crash history and observed frequency based on gamma distribution parameters.

3. The simulated crash data are then compared to the values CC0.90, CC0.95, and CC0.99. For the truly hazardous sites, FNs are produced whenever the randomly generated crash counts fall below CC0.90, CC0.95, or CC0.99; that is, truly hazardous sites generated observed crash counts lower than the critical crash count values. Similarly, for the collection of non-hazardous locations, FPs are generated whenever the simulated data exceed CC0.90, CC0.95, or CC0.99. FPs and FNs are simply counted for each simulation run, and the number of FIs is the sum of the false negatives and false positives.

4. To make the three performance metrics comparable across simulations, the percentages of FNs, FPs, and FIs are calculated. Because the FNs are truly hazardous locations mistaken for "safe" sites, their percentage is computed as the number of simulated FNs divided by the number of simulated truly safe sites; similarly, the percentage of FPs is the number of FPs divided by the number of truly hazardous locations. Finally, the percentage of FIs is obtained by dividing the sum of FNs and FPs by the total number of randomly generated data locations. For example, suppose there are 20 sites under inspection, with the top five identified as hot spots according to their TPMs. With 30 simulated observations for each site, the total number of truly hazardous site-periods is 150 and the number of truly safe ones is 450. If 45 of the 150 truly hazardous site-periods are wrongly viewed as safe, the percentage of FN would be 45/450 × 100% = 10%.

5.
Finally, the percentages of FPs, FNs, and FIs across simulation conditions are tallied and reported.

Tables 3 and 4 summarize the errors (FNs, FPs, and FIs) produced under the variety of simulation conditions. Table 3 presents the results when heterogeneity of crash counts is relatively low, while Table 4 presents the results when heterogeneity is relatively high. Critical crash count threshold values increase from left to right in both tables. The columns labeled CI, SR, and EB refer to the classical confidence interval, simple ranking, and Bayesian methods of HSID respectively. Finally, L, S, and E refer to the underlying shapes of the cumulative distributions of TPMs: linear, s-shaped, and exponential respectively.

For the low heterogeneity and high heterogeneity simulations, the trends in percent errors with increasing δ agree with each other; however, the percent errors for low heterogeneity are much higher than those for high heterogeneity. The major reason is likely that the low heterogeneity datasets have relatively small standard deviations compared with the other datasets: a small range of crash counts makes it more difficult to identify hazardous locations. Conversely, it is easy to identify hot spots when the crash counts are greatly dispersed, particularly when the dispersion is large in the uppermost crash count deciles.

Another prominent characteristic of both tables is that the percentage of false negatives decreases as δ increases for all three HSID methods, and in most cases the percentage of false negatives is substantially reduced under the EB method. The explanation is as follows. The threshold value divides the top 'outlying' crash counts from the remainder of the data, whether the top 10%, 5%, or 1% of observed counts.
By definition these counts are more likely to suffer from regression to the mean in a subsequent observation period than counts near the TPM. Thus the crash history of the top x% of sites acts to reduce the effect of the current crash count when ranking these sites. As a result, sites that suffer less from regression to the mean are ranked higher in the list, including sites that would otherwise have become false negatives. Conversely, the decrease in the percentage of false negatives is accompanied by an increase in the percentage of false positives (except at δ = 0.95 for the L1 runs in Tables 3 and 4, where the FP percent error under the confidence interval method is the smallest among the three threshold values). This shows that stricter identification criteria select fewer non-hazardous sites for remedy, although they may leave a larger number of truly hazardous locations undetected. The false identifications also move in the same direction as the false negatives as δ increases. The most likely explanation is that, because truly safe observations greatly outnumber truly hazardous ones, the count of false negatives dominates the combined FI count, so the reduction in false negatives at higher thresholds outweighs the accompanying growth in false positives. In conclusion, the percentage of false positives increases with rising thresholds, whereas the percentages of false negatives and false identifications decrease with rising thresholds. Results in almost all simulation scenarios share the same trends.
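The FN/FP/FI accounting described in steps 3 through 5 can be sketched as follows. This is a minimal sketch, not ALGSP code; the function name and interface are illustrative. It follows the report's denominator convention (FN percentage over the truly safe observations, FP percentage over the truly hazardous ones), which is also why FP percentages in the tables below can exceed 100%.

```python
import numpy as np

def error_rates(counts, is_hazardous, cc):
    """Percent FN, FP, and FI for one critical crash count threshold cc.

    FN% is taken over the truly safe observations and FP% over the truly
    hazardous ones, per the report's convention.
    """
    counts = np.asarray(counts)
    hot = np.asarray(is_hazardous, dtype=bool)
    fn = np.sum(hot & (counts < cc))      # hazardous but looks safe
    fp = np.sum(~hot & (counts >= cc))    # safe but looks hazardous
    pct_fn = 100.0 * fn / np.sum(~hot)
    pct_fp = 100.0 * fp / np.sum(hot)
    pct_fi = 100.0 * (fn + fp) / counts.size
    return pct_fn, pct_fp, pct_fi
```

For instance, with 150 truly hazardous and 450 truly safe observations, 45 of the hazardous ones falling below cc gives an FN percentage of 45/450 = 10%.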
Table 3: Percent Errors for Low Heterogeneity in Crash Counts

             δ = 0.90                  δ = 0.95                  δ = 0.99
           CI      SR      EB       CI      SR      EB       CI      SR      EB
E1  FN    2.49    3.55    2.40     1.54    2.09    1.41     0.63    0.55    0.38
    FP   62.76   31.97   21.63    82.47   39.73   26.87   114.32   54.00   37.67
    FI    7.17    6.39    4.33     5.31    3.97    2.69     2.46    1.08    0.75
L1  FN    2.21    4.44    2.91     1.39    2.40    1.73     0.15    0.62    0.45
    FP  106.14   39.97   26.20    65.24   45.67   32.80   431.62   61.00   45.00
    FI    8.75    7.99    5.24     3.62    4.57    3.28     2.10    1.22    0.90
S1  FN    0.54    6.53    5.28     0.21    3.48    2.90     0.00    0.81    0.73
    FP  753.44   58.73   47.50  1251.33   66.20   55.13       NA   80.33   72.33
    FI   10.03   11.75    9.50     6.46    6.62    5.51     1.91    1.61    1.45

Note: 1. FN = False Negatives; FP = False Positives; FI = False Identifications. 2. Some FPs can exceed 100% because of the non-normality of the distribution and the setting of the threshold; in these cases, the CI method identifies more hazardous locations than truly exist. For the same reason, the "NA" entry reflects zero truly hazardous locations identified by confidence analysis. 3. The shaded cells show the lowest identification error rate.

Table 4: Percent Errors for High Heterogeneity in Crash Counts

             δ = 0.90                  δ = 0.95                  δ = 0.99
           CI      SR      EB       CI      SR      EB       CI      SR      EB
E1  FN    1.78    2.09    1.13     1.33    1.33    0.86     0.39    0.26    0.17
    FP   24.37   18.77   10.13    32.56   25.33   16.40    57.07   26.00   16.67
    FI    4.13    3.75    2.03     3.34    2.53    1.64     1.54    0.52    0.33
L1  FN    1.89    2.55    1.57     1.50    1.43    0.91     0.44    0.37    0.23
    FP   36.33   22.93   14.13    32.20   27.20   17.33    45.22   36.67   22.67
    FI    5.14    4.59    2.83     3.40    2.72    1.73     1.29    0.73    0.45
S1  FN    2.16    2.73    1.74     1.17    1.31    0.71     0.47    0.26    0.12
    FP   34.80   24.53   15.67    41.08   24.87   13.47    38.37   25.33   12.33
    FI    5.16    4.91    3.13     3.31    2.49    1.35     1.32    0.51    0.25

Note: 1. FN = False Negatives; FP = False Positives; FI = False Identifications. 2. The shaded cells show the lowest identification error rate.

There is also some difference among the percent errors produced by the three identification methods.
Compared with the two traditional methods, the Bayesian technique yields fewer false negatives in most cases in both tables. That is, the Bayesian technique is more efficient in flagging the sites that require further analysis. Unfortunately, this higher efficiency comes at the cost of a substantial number of false positives, which reduce the efficiency of local governments' investment. Only when budgetary constraints limit the number of sites treated might these false positives not result in unneeded repairs at locations that are not truly hazardous. As for the confidence interval method and the simple ranking method, there is little difference between them. Both methods generally produce higher identification error rates than the Bayesian method does, indicating relatively poorer performance in identifying hazardous locations.

EXPERIMENT FOR OPTIMIZING DURATION OF CRASH HISTORY

May (1964) first discussed the question of how many years of accident data should be analyzed when determining accident-prone locations. He explored the differences among averages of accident counts as "t" increased, up to 13 years. The results showed that the differences diminish as "t" increases and that the marginal benefit of increasing "t" declines. The "knee" of the curve is said to occur at t = 3 years. On that basis, May concluded that "there is little to be gained by using a longer study period than three years." In this experiment, a different logic is employed to explore the best study duration for accident data analysis. Instead of using the simple accident counts of May's method, this experiment uses the identification error rate as the indicator: the identification error rates associated with various "t"-year periods are compared to obtain the optimum study period. When conducting the history analysis, the three identification methods are again employed, and the corresponding processes remain the same.
The only difference lies in how the different periods of data are used. To show the logic clearly, another small snapshot is used (Table 5). First, the ith column of data is assumed to represent the accident data for the ith current year. For example, for site 9, the first four values represent the accident counts during the four current years, and the remaining values in the first four columns can be viewed as the accident counts of other similar sites during the same period. Consider conducting the Bayesian analysis. For a given t-year period, Equation 24 is used for each site to compute the corresponding expected accident count. However, since the TPM represents the long-term number of accidents per year, the average accident count per year over the t-year period should be used in this equation. At the end of the fourth year, x for site 10 is 14 accidents (the average of the first 4 values), E{λ} = 12.88 accidents (the row average), VAR{λ} = 5.18 accidents² (the row variance), and α = 0.713; thus the expected accident count for site 10 using the first 4 years of data is 13.2 accidents. With 16 observation periods, 13 Bayesian expected values can be generated for site 10 using a 4-year history window. Based on these Bayesian expected accident counts for the various sites, the previously described Bayesian procedure can then be employed to compute the percentages of false negatives, false positives, and false identifications for different values of "t". The same history analysis logic also applies to the other two identification methods. Because of the large number of iterative computations in this experiment, special computer code was written to calculate the identification error rates associated with the different periods of accident data.
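Assuming Equation 24 is the usual method-of-moments empirical Bayes form, α = E{λ}/(E{λ} + VAR{λ}) with the estimate α·E{λ} + (1 − α)·x (an assumption on my part; the report does not restate the equation here), the site 10 numbers can be reproduced directly from its row in Table 5:

```python
# Site 10's simulated counts, as read from Table 5.
site10 = [16, 14, 15, 11, 10, 12, 15, 9, 15, 15, 13, 11, 11, 16, 12, 11]

x = sum(site10[:4]) / 4                # 4-year average: 14.0
mean = sum(site10) / len(site10)       # E{lambda}, about 12.88
var = sum((y - mean) ** 2 for y in site10) / (len(site10) - 1)  # about 5.18
alpha = mean / (mean + var)            # weight on the long-term mean, ~0.713
expected = alpha * mean + (1 - alpha) * x  # EB estimate, ~13.2 accidents
print(round(alpha, 3), round(expected, 1))  # -> 0.713 13.2
```

This matches the worked values in the text (E{λ} = 12.88, VAR{λ} = 5.18, α = 0.713, estimate 13.2).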
Table 5: Snapshot of the Simulated Data

Site  TPM   Simulated data
  1    3     7  3  4  3  2  1  2  3  3  4  2  3  3  4  3  2
  2    3     3  5  5  2  3  1  1  4  2  2  1  2  2  7  4  5
  3    5     5  7  6  5  5  6  4  4  3  4  7  4  2  4  7  2
  4    7     4  6  5  9  4  6  7  4  8 10 13  6  9  7  7  3
  5    8     8  6  8  6  9  9 12  7  2  3  8 11  7  5  7  7
  6    9    15 10 16 12 12  8  8  6  9 12 18 15  9  7 12  8
  7    9     9 10 12  8 11  5  8  9 13  9 10 12  7  7  8  5
  8   12    12  5 11 18 12 12 16 12  7 10 13 10  9 11  9 13
  9   13    13 13 12 10 12 12 13 14 11  7 14 13  7 16 18  7
 10   14    16 14 15 11 10 12 15  9 15 15 13 11 11 16 12 11
 11   15    17 15 13 15 13 13 16 16 13 11 18 14  9 12 22 18
 12   16    18 19 20 11  7 14 12 10 16 18 14 17  9 15 19 18

In theory, as "t" increases, the expected accident count of each site, computed from the simulated data, converges to its TPM (because in the experiment each row of simulated data strictly follows the Poisson distribution), and the corresponding identification error rate converges to zero. In a real situation, however, as "t" increases each site is subject to more influential factors, and a long period of data generally cannot represent the current situation. On the other hand, if a short period of data is used, much information is lost and it is difficult to obtain the true long-term accident counts. Consequently, a trade-off must be made to find a study period that is short enough to represent the current condition and long enough to recover the true expected accident counts. In this experiment, the various identification error rates are plotted against the different values of "t", and the "knee" of such a curve is taken as the optimum study period. Data older than 10 years is considered to no longer reflect the current situation, so the 30 columns of simulated data are divided evenly into 3 groups: the first 10 columns form group 1, the eleventh through twentieth columns form group 2, and the last 10 columns form group 3.
The common characteristic shared by the three groups is assumed to reflect the true relation between the identification error rate and "t". For each group, the three common confidence levels, 90%, 95%, and 99%, are used in the three analyses. In the plot of identification error rate versus "t", some fluctuation remains along the curve, although in general the identification error rate decreases as "t" increases. To determine and eliminate the initial "warm-up" period (i.e., the period before the knee of the curve), Welch's moving average method (Kelton, 2003) is utilized. By averaging over a moving window, this method filters out the statistical fluctuations in the observations (yi) and clearly reveals the "warm-up" period. As shown in Figure 3, Series 1 represents the original FN rates associated with the different values of "t". Because of two outliers (the points at t = 4 and t = 6), it is difficult to locate the knee of the curve from Series 1; from Series 2 (the curve of moving averages), however, it is easy to see that 5 years is the best study period.

[Figure 3: Moving Averages vs. Original Statistic. FN (%) plotted against t = 1, ..., 10; Series 1 = original FN rates, Series 2 = moving averages.]

The moving average Y_i(w) of the random observations, where w is the window size, is defined as follows:

    Y_i(w) = [ Y_(i-w) + ... + Y_(i+w) ] / (2w + 1),   for i = w+1, ..., m-w
    Y_i(w) = [ Y_1 + ... + Y_(2i-1) ] / (2i - 1),      for i = 1, ..., w        (25)

In this experiment, the window size is selected as 1.

RESULTS

Similar to the previous experiment, the three HSID methods are performed here to explore the optimal duration of accident history. The frequencies of the various optimal values of "t" across the three confidence levels and three groups are shown in Tables 6 through 8.
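The moving-average definition in Equation 25 can be implemented as a short routine (a sketch; the function name is mine, not from the report):

```python
def welch_moving_average(y, w):
    """Welch's moving average (Equation 25): a full (2w+1)-point symmetric
    window once i exceeds w, and a shorter (2i-1)-point symmetric window
    near the start of the series, so early points are not discarded."""
    m = len(y)
    averages = []
    for i in range(1, m - w + 1):        # 1-based index i = 1, ..., m-w
        if i <= w:
            window = y[0:2 * i - 1]      # Y_1, ..., Y_(2i-1)
        else:
            window = y[i - 1 - w:i + w]  # Y_(i-w), ..., Y_(i+w)
        averages.append(sum(window) / len(window))
    return averages

# With window size w = 1, as used in the experiment:
print(welch_moving_average([8, 4, 6, 10, 2], 1))
```

With w = 1 the first point is kept as-is and every later point (up to m−1) is replaced by a 3-point centered average, which is what smooths out the outliers at t = 4 and t = 6 in Figure 3.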
For convenience of viewing, the frequencies of the various optimal "t" periods for the different confidence levels and groups are plotted in Figures 4 through 6, and the cumulative results across all confidence levels and groups are shown in Figures 7 and 8. Readers interested in the detailed identification error rates associated with the various HSID methods, confidence levels, and groups are referred to Appendix B.

Table 6: The Number of t-years Which Is the "Knee" of the Curve for Group 1

Year    1    2    3    4    5    6    7    8    9   10
90%     0    1   22   13    6    8    2    2    0    0
95%     1    1   23   10    8    7    2    2    0    0
99%     0    2   20    8   10    6    4    3    1    0
SUM     1    4   65   31   24   21    8    7    1    0

Note: In this group there are 162 scenarios (3 identification methods, 3 kinds of shapes, low and high heterogeneity of crash counts, 3 threshold values for truly hazardous locations, and 3 kinds of identification errors: FN, FP, FI).

Table 7: The Number of t-years Which Is the "Knee" of the Curve for Group 2

Year    1    2    3    4    5    6    7    8    9   10
90%     2    0   28   10    4    5    3    1    1    0
95%     0    3   21   11    7    6    4    2    0    0
99%     0    1   27    9    5    7    2    3    0    0
SUM     2    4   76   30   16   18    9    6    1    0

Note: In this group there are 162 scenarios (3 identification methods, 3 kinds of shapes, low and high heterogeneity of crash counts, 3 threshold values for truly hazardous locations, and 3 kinds of identification errors: FN, FP, FI).

Table 8: The Number of t-years Which Is the "Knee" of the Curve for Group 3

Year    1    2    3    4    5    6    7    8    9   10
90%     0    1   22   14    6    5    2    1    1    0
95%     2    2   20    7    7    8    3    4    1    0
99%     0    3   27   11    5    5    4    1    0    0
SUM     2    6   69   32   18   18    9    6    2    0

Note: In this group there are 162 scenarios (3 identification methods, 3 kinds of shapes, low and high heterogeneity of crash counts, 3 threshold values for truly hazardous locations, and 3 kinds of identification errors: FN, FP, FI).
[Figure 4: The Number of t-years Which Is the "Knee" of the Curve for the 90% Confidence Level, Groups 1-3]

[Figure 5: The Number of t-years Which Is the "Knee" of the Curve for the 95% Confidence Level, Groups 1-3]

[Figure 6: The Number of t-years Which Is the "Knee" of the Curve for the 99% Confidence Level, Groups 1-3]

[Figure 7: The Number of t-years Which Is the "Knee" of the Curve for All Confidence Levels, Groups 1-3]

[Figure 8: The Cumulative Percent Distribution of the Various t-years]

Figures 7 and 8 show that, across all the simulation scenarios, a 3-year crash history accounted for the largest share of "best" study periods, and periods of 3 through 6 years make up almost 90% of all the optimum values of "t". Hence, as a trade-off between long and short history records, if there has been no significant physical change at the location under scrutiny and a long history record can be obtained, the most recent 6 years of crash records are sufficient to capture the majority of the beneficial effect of crash history. In contrast, 3 years of crash history represents the shortest period that should be used while still achieving a significant benefit from crash history (under most general conditions). Crash histories of 1 and 2 years provide relatively little benefit for the methods and under the range of conditions assessed. To illustrate the improvement in identification performance that results from using 3 years of history data, Tables 9 and 10 are provided (in contrast to Tables 3 and 4).
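The cumulative figures quoted above can be checked directly from the SUM rows of Tables 6 through 8 (a quick sketch using the counts as read from those tables, pooled over all three groups):

```python
# "Optimal t" frequencies from the SUM rows of Tables 6-8 (years 1-9;
# year 10 has zero counts in every group).
group1 = [1, 4, 65, 31, 24, 21, 8, 7, 1]
group2 = [2, 4, 76, 30, 16, 18, 9, 6, 1]
group3 = [2, 6, 69, 32, 18, 18, 9, 6, 2]

pooled = [a + b + c for a, b, c in zip(group1, group2, group3)]
n = sum(pooled)                           # 486 = 3 groups x 162 scenarios
pct_3_to_6 = 100 * sum(pooled[2:6]) / n   # optimal t between 3 and 6 years
pct_up_to_6 = 100 * sum(pooled[:6]) / n   # optimal t of at most 6 years
print(round(pct_3_to_6, 1), round(pct_up_to_6, 1))  # -> 86.0 89.9
```

This is consistent with the conclusions drawn from Figures 7 and 8: a 3-year history is the single most common optimum, and periods of up to 6 years cover roughly 90% of all scenarios.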
The difference is that Tables 3 and 4 use 1 year of crash data, with the identification error percentages computed on the basis of the last 30 years of data, whereas Tables 9 and 10 use 3 years of data, with the corresponding percentages calculated on the basis of the current 10 years of data.

Table 9: Percent Errors for Low Heterogeneity in Crash Counts (3 Years of Data)

             δ = 0.90                  δ = 0.95                  δ = 0.99
           CI      SR      EB       CI      SR      EB       CI      SR      EB
E   FN    2.02    2.32    1.53     1.36    1.34    0.82     0.89    0.40    0.25
    FP   28.06   20.88   13.75    38.60   25.50   15.50    48.56   40.00   25.00
    FI    4.68    4.18    2.75     3.69    2.55    1.55     2.13    0.80    0.50
L   FN    2.56    2.75    2.13     1.69    1.72    1.25     0.47    0.51    0.40
    FP   33.16   24.75   19.13    50.00   32.75   23.75    91.07   50.00   40.00
    FI    5.56    4.95    3.83     4.33    3.28    2.54     0.14    0.67    0.53
S   FN    1.10    4.88    4.33     0.68    2.88    2.54     0.14    0.67    0.53
    FP  228.21   43.88   39.00   239.38   54.75   48.25   362.16   66.25   52.50
    FI    9.05    8.78    7.80     5.45    5.48    4.83     1.81    1.33    1.05

Note: 1. FN = False Negatives; FP = False Positives; FI = False Identifications; CI = Confidence Interval; SR = Simple Ranking; EB = Empirical Bayes; E = Exponential Shape; L = Linear Shape; S = Sigmoidal Shape. 2. Some FPs can exceed 100% because of the non-normality of the distribution and the setting of the threshold; in these cases, the CI method identifies more hazardous locations than truly exist. For the same reason, an "NA" entry would indicate zero truly hazardous locations identified by confidence analysis. 3. The shaded cells show the lowest identification error rate.
Table 10: Percent Errors for High Heterogeneity in Crash Counts (3 Years of Data)

             δ = 0.90                  δ = 0.95                  δ = 0.99
           CI      SR      EB       CI      SR      EB       CI      SR      EB
E   FN    1.08    1.28    0.67     0.96    0.95    0.71     0.24    0.14    0.10
    FP   13.96   11.50    6.00    15.32   18.00   13.50    34.66   13.75   10.00
    FI    2.51    2.30    1.20     1.98    1.80    1.35     1.00    0.28    0.20
L   FN    1.72    1.63    1.36     1.19    0.96    0.87     0.41    0.21    0.20
    FP   14.37   14.63   12.25    15.07   18.25   16.50    20.11   21.25   18.25
    FI    3.08    2.93    2.45     2.14    1.83    1.65     0.86    0.43    0.38
S   FN    2.10    2.04    1.65     0.70    0.66    0.55     0.40    0.15    0.10
    FP   18.01   18.38   14.88    20.83   12.50   10.50    21.03   15.00   10.00
    FI    3.73    3.68    2.98     1.85    1.25    1.05     0.90    0.30    0.20

Note: 1. FN = False Negatives; FP = False Positives; FI = False Identifications; CI = Confidence Interval; SR = Simple Ranking; EB = Empirical Bayes; E = Exponential Shape; L = Linear Shape; S = Sigmoidal Shape. 2. The shaded cells show the lowest identification error rate.

By comparing these tables with Tables 3 and 4, it can be seen that using 3 years of crash history data results in significant improvements in error rates for all three methods: CI, SR, and EB.
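As a rough check of this improvement, the EB method's FI error rates can be compared cell by cell between Table 3 (1 year of data) and Table 9 (3 years of data) for the low heterogeneity case, with the values as read from those tables:

```python
# EB false-identification percentages, low heterogeneity, for shapes
# E, L, S at delta = 0.90, 0.95, 0.99 (as read from Tables 3 and 9).
fi_1yr = [4.33, 2.69, 0.75, 5.24, 3.28, 0.90, 9.50, 5.51, 1.45]  # Table 3
fi_3yr = [2.75, 1.55, 0.50, 3.83, 2.54, 0.53, 7.80, 4.83, 1.05]  # Table 9

reductions = [100 * (a - b) / a for a, b in zip(fi_1yr, fi_3yr)]
print(all(r > 0 for r in reductions))            # every cell improves
print(round(sum(reductions) / len(reductions)))  # average percent reduction
```

Every cell improves when 3 years of history are used, with an average FI reduction of roughly 29 percent across these scenarios.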