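Contents
1. Experiment Description
2. Experiment Steps
3. Python Implementation of SVM-Based Classification
4. What I Learned
5. My Reflections

1. Experiment Description

Experiment topic: classification prediction based on SVM.

Experiment requirements: using the given data, implement classification prediction with the support vector machine (SVM) algorithm. Specifically:
Select variables (such as trip distance, fare, and time) and preprocess the data (for example, handle missing values and outliers, and normalize or standardize the data).
Because the dataset is very large, a subset may be used, but it must contain at least 100,000 records, and the basis for the selection must be explained.
Use SVM to predict the payment type (cash or credit card) of Chicago taxi trips. Note: the given dataset must be used, and the data must be processed in Python.
Split the data 80% for training and 20% for prediction, and give a clear procedure for verifying the accuracy of the experiment.
In addition, use the other variables in the dataset for further analysis, explore how strongly different factors influence the payment type, look for new findings, and complete the report.

Report contents:
Experiment problem.
Experiment goal.
Data introduction: a written description supported by charts of the temporal, spatial, and multi-factor distributions.
Experiment method: SVM is required for everyone; the algorithm flow and formulas, the data-processing flow, and the validation flow must all be written out in full in the method section.
Experiment results: state the parameter settings, the prediction accuracy, and so on.
Analysis of results: analysis supported by multiple charts, the importance of different factors to the results, and so on.
Interesting findings: extra credit.

Dataset description

Dataset: Chicago Taxi Trips Dataset (2023), containing Chicago taxi trip records.

Trip ID: trip identifier
Taxi ID: taxi identifier
Trip Start/End Timestamp: pickup and drop-off time
Trip Seconds: trip duration in seconds
Trip Miles: trip distance in miles
Pickup/Dropoff Census Tract: pickup and drop-off census tract number (the U.S. census tract in which the pickup or drop-off location falls)
Pickup/Dropoff Community Area: pickup and drop-off community area number (one of the 77 community areas defined by Chicago city planning)
Fare: base fare
Tips: tip amount
Tolls: road and bridge tolls
Extras: additional charges
Trip Total: total cost
Payment Type: payment method
Company: taxi company
Pickup Centroid Latitude/Longitude: latitude and longitude of the pickup point
Pickup Centroid Location: geographic coordinates of the pickup location (the original description gives the format as latitude, longitude; in the sample rows below the value is written as POINT (longitude latitude))
Dropoff Centroid Latitude/Longitude: latitude and longitude of the drop-off point
Dropoff Centroid Location: geographic coordinates of the drop-off location (same format as Pickup Centroid Location)

The dataset can be found at https://download.csdn.net/download/2401_84149564/90962954?spm1011.2124.3001.6210. Because the dataset is very large, only the first 108 rows are listed here; the data can be placed in the files linear.csv and spiral.csv.

Trip ID,Taxi ID,Trip Start Timestamp,Trip End Timestamp,Trip Seconds,Trip Miles,Pickup Census Tract,Dropoff Census Tract,Pickup Community Area,Dropoff Community Area,Fare,Tips,Tolls,Extras,Trip Total,Payment Type,Company,Pickup Centroid Latitude,Pickup Centroid Longitude,Pickup Centroid Location,Dropoff Centroid Latitude,Dropoff Centroid Longitude,Dropoff Centroid Location
011106b6114f83af0c17aace3867a464a7fc742b,4628ef9dfa973bdfe877c5aa9d9738f9dc1204e54f2f1a4cc18141f37e2e66d080533f82510a96d1525b28eee833696f7e1337e9999a38f2fd5babf71585a344,12/31/2023 11:45:00 PM,01/01/2024 12:00:00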
AM,982,0.81,17031081800,17031081500,8,8,8.75,3,0,1,13.25,Credit Card,Chicago Independents,41.89321636,-87.63784421,POINT (-87.6378442095 41.8932163595),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809) e9a66ddcc78cfd79f419165314cbe5ee380f16c3,8efe74ab61de459003dcedd85c637ce11bba19bac633cde9559a4895c98d7185ce0f7742dbd8b1938151fc3cdb89a3b4234bf80bf6368654831c79cf9685b3a9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,390,0.09,,,,,34.86,0,0,0,34.86,Cash,Flash Cab,,,,,, e765192268db3480b5d9bd0443f7ce7fd5ba047d,6f45c05aa231c9dd389ebdb65ca751cd82ef7634766017a1240d6554bf91840a924cf3cd16564a1ca643c9b1880db706d977ab5164f4b0bd030e0fff21cb3934,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1271,4.18,,,8,6,15.5,0,0,0,15.5,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) c6510d4f82541cfacf8c20cab44fbb7c0b2c5efe,89fc6b1f0628f328ccd1021fcf4e7318bb2f9962da9259b522bde63ca44f9f201a016291bbd2801fad04845cfd5b30b954afedccb22f6c49fadda05821804a06,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1280,8.19,,,,,22.5,3,0,0,26,Credit Card,Flash Cab,,,,,, f9445eed26da9a0eff247350df942616cb51e764,14275cab8bb64007379de40be92944817231f63047033992a1964ce85a4a5405085b87a804736c24ecd8a334baa315255c066e271d7f73dfb756a19a24a44d25,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,585,5.02,,,28,6,35,2,0,0,37.5,Credit Card,Sun Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) f74fa03a6cf8cdcf668d0726efa1671d398b4450,2fea69c8a6e08471bc4339a05e9ee7955bef68d791f77a202bd54f3ae41c805907d7ac13a89f86fac4494c976ca87883157baa32ea41f59056661884135f6bba,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1473,9.94,,,56,,26.75,0,0,18,44.75,Cash,Sun Taxi,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),,, 
f40c2cda1cea33c2265a34b2ce1eb454067ad8d2,3618045f9110d4d88482266ade23659c1a50d32ac37f205c15614b1ada9d4ca14b171329afed5dfd81c7525bd5a05fe614cb63b2aa48d920626b519e20d9e146,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,24,24,3.25,0,0,11.25,14.5,Credit Card,Taxi Affiliation Services,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941) f3a139c0df3513324ff3f699bf40db2e84291e3a,9b48ad5744e86450fb4db78e7095a6827bafc43a6a9d9a8f656aac46cc0e429d129471cdad31f8a5a97b3a45c8af5fcbc80d003c1c4839075733900786e1a5a9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1228,4.29,,,6,21,15.25,0,0,0,15.25,Cash,Sun Taxi,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.9386662,-87.71121059,POINT (-87.7112105933 41.9386661962) ee7cce18a4b24e080366930ab5ec72d1aeb6556c,adb1cb74113851b651b474182fbe95a9663783779db8cca0fdc3ff7ac82cc8fe5d864b7f86a995c64e405fa6c89cd87d00b458108ebe0c62bc23c4c79e61da46,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1310,18.23,,,28,76,60,0,0,0,60.5,Credit Card,5 Star Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146) ed445ada05f17c5f359892eda3c329e1445b5e7b,4b034948aceedd53262ae713f864b0364953a1852b6b24669f192cad26c5014f1af0b6c87b941abb1fa93e1abbe09f70d7f02d48e5371d2c55534b68565a3060,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,4,0,17031320100,17031320100,32,32,20,5.12,0,0,25.62,Credit Card,Sun Taxi,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918) ec183abaa7ff142f17ebcdafa1f3d4e611a9f494,f6d1b6c930d62f6d8cbbd8f86a593ff057408c82f764744a7a38ee63957a74b84eaeea80224ea3a0021ba1572323a282530b0659b5e1ce48d04939eacb504060,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,786,2.02,17031081500,17031330100,8,33,10,0,0,0,10,Cash,Chicago Independents,41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809),41.85934972,-87.61735801,POINT 
(-87.6173580061 41.859349715) e5c03bc6d864518431ce24706a4a9055221dc333,99ec13d5d806f5f5fa7a57910f8e38d84f90630529f2f8766d65b47caae8cb7cacf3d4bb9ca6576dfa49bc45b9f0f615e79577ace618514c9c59dc52ffbf40b6,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1064,9.58,,,76,12,25,0,0,5,30,Cash,5 Star Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.99393013,-87.75835359,POINT (-87.7583535876 41.9939301285) e2b8bea5dbc60464ff88ba8dc8b66836513101e8,884655d853cbe41e1cdf747969f0dc5b55ed2d5f76c09ae207083297c948a813b2dd57912bfb4a6f4230556ab61363257e586827846cf2a89ff35c7c3b1c08e5,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,279,1.51,,,49,49,6.5,0,0,0,6.5,Cash,Flash Cab,41.70658788,-87.62336651,POINT (-87.6233665115 41.7065878819),41.70658788,-87.62336651,POINT (-87.6233665115 41.7065878819) dcfbaa5d01e81e18637185fc5e822d6a08456f59,2659a61c08f91c6efd9e7d7947a00006a7bc26aa518241786d51cb05c853cdd86fdd1adc4010867706a2daa9f0da856cb2d7a705c111d3c89e53f5499741247e,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,300,1.8,,,76,7,7.5,4,0,1,13,Credit Card,Globe Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843) d51430f93404726b121d82d42efc29a2062895a8,24d4c5e51d147aecbb7c4a1ad70c38dbc05c7b4485f6de4b41fee5ea270228859c19c4160b101df87b889acc5cc7ab49b5e0491676026518677d34ea81f76591,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,379,1.25,,,14,13,6.75,3,0,0,10.25,Credit Card,Taxicab Insurance Agency Llc,41.968069,-87.72155906,POINT (-87.7215590627 41.968069),41.98363631,-87.72358319,POINT (-87.7235831853 41.9836363072) ce61e0f21271970f7bf3006d489638d1c320a62f,f7782da531b08c6ce5a1e16a8c2998f6f4f7943f29ab53713949ec17f3b4d7f8b4cd8a84da2fe7c4b0a17f1fc3439e376d3bfec25d83652dfa4468342e25f6b1,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,3845,4.98,,,32,6,31,0,0,0,31,Cash,Taxicab Insurance Agency Llc,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.9442266,-87.65599818,POINT (-87.6559981815 
41.9442266014) cde7d22932829a7b19fb43bfd9a1d635c1e3f04e,52e8915b8a7b8851b341adb6797c3652a198b772561e9e9888d9963de61b796f60653ebdd06f44aa2fad304efb0a53160d6210b2a26fcc64b21932db0d658e32,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1852,22.97,17031980000,,76,,55.5,10,0,35,101,Credit Card,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),,, cb50d6951086242beccb8fe7d248cfad3fab3dd7,179f1a051e9e6d3fc0726628962faff68506086ee8df14091e91399452fada055453e421df026fcc4449a43ba54a357c364482e62a20420f836389bd2269b5ee,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1383,17.95,,,28,,43.75,11.06,0,0,55.31,Credit Card,Blue Ribbon Taxi Association,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),,, c6cb3aad561e0c407239333d535a4922540f9adc,bca79085da78d157007711d04c6e06f655ee5eafb1e5b654033c2f34fcea1d1fd230b48cff6ab67c598d1baa406d0689fd207404711267f2d56fdd93e3a0e6ec,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,839,7.17,,,76,10,19.75,5.05,0,5,30.3,Credit Card,Flash Cab,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9850151,-87.80453201,POINT (-87.8045320063 41.9850151008) c5ad9b572bff9d5cb9a8cf0da173789f7910a835,c7a8a53874bbcdb11e70a488485e8bdd0bb8cc0de8f5d98d3ef4d9c3223d7b4024a6d8fda9c00e8e55f3985e5a728ddce0175359c31e0bc5cd3c008d89b23ed1,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,7,0,,,28,28,30,0,0,0,30.5,Credit Card,Flash Cab,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) c23cbb41a952defb103a40ca767a32c387532614,32f1eebe57165cc17acba84eb8bb85d69a063ed0e5e15e108a68bc4548403834ab9c888ed3c3d474d10a0f95d498a5a9a0ef801331a61db60adec230c48e61b0,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1163,4.5,,,8,6,16.25,0,0,1,17.25,Cash,Chicago Independents,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) 
b3f31e1c4e3673813abc423c9b2e415bdbd1b3a8,acad560bbc140c4015f4685c6559c93b61ccaf0f7d80143fa408d25169652c7b861a3be42e8fbcb91cac8da7e691489626884bc2bab1ae6015710cde1eab4e3f,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,2305,12.63,,,8,47,37,0,0,0,37,Cash,Taxicab Insurance Agency Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.72818206,-87.5964756,POINT (-87.5964755956 41.728182061) ae20c529b8423608f3f0bfcfa243100219f1241e,d3e38cf4471f5b65aa0c41a155252c395b7c8593ac6fb5741e0cac5f68831f4f418eebc0f19ea0acd5b398d6b376ec7cbb2dfa9b846af33a9d27146e2a009b4b,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,1,0.04,,,27,27,40,0,0,0,40,Cash,5 Star Taxi,41.8789145,-87.70589713,POINT (-87.7058971305 41.8789144956),41.8789145,-87.70589713,POINT (-87.7058971305 41.8789144956) ac9c9cd082dbdbfb67fc062dcb74ed713820e47f,75cf3a53aae5e5858361a7ca64f75d3407dc0a44d7bc42843fd566a614cf1adcb57d543a15db44103c4801f879ceb236261b336079807dbfd2cd7a7775f166dc,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,439,0.98,,,32,32,6.5,4,0,1,12,Credit Card,City Service,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841) a5a598958fc186bb7c09f6eb61fc57ce7c11f898,a31d2ea87ea4f5a4793c30f84f000b0c6aad4cd956f6ea73b5628ebd509d47a7d051a0b7ccd201a550c236c42ed6e19ce5efbd4caa4bf2d1fcd77827b64b39f9,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,3121,7.63,,,8,39,32,0,0,1,33,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.80891628,-87.59618334,POINT (-87.5961833442 41.8089162826) a55c9af7b91b2239e4e432131062cc342f3cd2fe,dd16496faf01009b70959e7c0d5b86f9bb7f432a1771c5737f75542f886f1c16d56afc454b02a96737372836e47c77ffc172b0bdf80e13640ff9203b7d0d6dbc,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,502,0.38,,,8,8,6.25,2,0,0,8.75,Credit Card,Medallion Leasin,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 
9f2747ed96c9f1465b93ce1f5114c907464c5d76,c26dee3edb5d4bce731d586ef40b399162a1c3a05cb5bb035e148b89a986a90612bb81bfe1745453c85fe7ae4a859e566611067e2fb730da1c1e5ce92674dd2d,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,8,8,3.25,0,0,1,4.25,Cash,Taxicab Insurance Agency, LLC,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 9937207e717533bf0a3e76621f06857138d6c2df,171ec426eaf8f54c5acbb7e3fde8e0683bfa6042af0b00e428e650cd9bc909011a2517de5b32358b9cb9d9ad3d5a5b26bbb14a09f8b7f5c1c9a37fa22f26f7a7,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,12,0.04,,,33,33,3.25,0,0,0,3.25,Cash,Flash Cab,41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585),41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585) 8bcad727d56e9761517e7129cc94ede7274f60fa,eed4cbab8d3be11fc5fcff8f92b3ba140f63602f2760446756572fc4262d89af90bd665c89604918253358364f34ae32113be46209e794e28390e6cca1a87768,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,361,1.16,,,28,28,20,4.1,0,0,24.6,Credit Card,Taxicab Insurance Agency Llc,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) 8142f4b1547f4e4a683a80b5a6c7d0325ce09559,f75191fdf728d7ed7f4277ee1e39372c16658b87abc26a057a7e74b79dd5457cb375f859ea318a2aa47f19d24142bc3563cd5b8c0bfa633161570ec9b3686897,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,600,1.1,17031320100,17031320400,32,32,7.5,0,0,2,9.5,Cash,Chicago Independents,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.87740612,-87.62197165,POINT (-87.6219716519 41.8774061234) 7de7d6b1667cea33735670f88c50e9631e719f04,756721b3418247472431e2bd1022cc8ce0806af1b6b7dfeb3927318f86819fc67bf385b5829a0f13e006d05aed02020e7c1801204414d5eadf17b3e2ce71ce14,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1138,12.08,,,56,24,31.25,5,0,7,43.75,Credit Card,Medallion Leasin,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.90120699,-87.67635599,POINT 
(-87.6763559892 41.9012069941) 7d27d382cf9f0c93d7d0bf60c14cf7ec523624a0,be7e1462a37397809dadade8e174ef3ccbc3073294df4a0c1786610d3fbbc2cd18543a46938764af03553b89897d23ee9f88307fd485162626822cd306c2fed5,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1236,2.83,,,28,8,13,0,0,0,13,Cash,Blue Ribbon Taxi Association,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 716d7a0a2a097facc3f0f63e326830ecdf923d0a,2d72c5e6313ad93f663008a55045cad0c76164b057dcb756f23448dcfc082f616d8626020794704f296e6ad06f65837ff799930aa961729096167c5ef8612663,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,649,2.57,17031833000,17031330100,28,33,10,1,0,1,12.5,Credit Card,City Service,41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201),41.85934972,-87.61735801,POINT (-87.6173580061 41.859349715) 6f9899fa6b248a960572d5442018da559c192adb,90a7cf3946c408e70e8d64b08f2bc6819ae5de6159ecef3460c5287031148a66c4c0d4b6b6c53f13919fcb785db502dcd99c94fb60daa9ac6b338f01ed8c3a2b,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,940,5.02,,,1,14,15.75,0,0,0,15.75,Cash,Flash Cab,42.00962288,-87.67016686,POINT (-87.6701668569 42.0096228806),41.968069,-87.72155906,POINT (-87.7215590627 41.968069) 6b1b21ca32da77c68ee5d8816194ac27d9206082,38f6145c9a2b848dc1baa16fd91087e606b12bcb8757a9eb003dfab2c031fcaeb931c1ae6b486fab5f1c21037f33a187d1cb97080f4334a63f7ce0713d0f47b4,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1260,2.8,,,8,28,13,0,0,0,13,Credit Card,Taxi Affiliation Services,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) 6722a6b9bd13d58f95c3914e12e1ef8b6b48a507,fc9af5a263f70826b274b29067232130b35f23b91479bb66a0655224a22b586ae2c4f88090c3de82a4f428726dd5018b74b0c84627b9e2cf57ee329c5d794575,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1143,1.12,17031320400,17031081500,32,8,10,0,0,0,10,Cash,City Service,41.87740612,-87.62197165,POINT (-87.6219716519 
41.8774061234),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809) 6138f0b3dff7fc33cd748727eb6714535a747657,35467b44491f6f51eaa0f4fb1cd65e4c23117aa268d9dd52d88a484194323088fc5a0d30455d37c6bced24218c2ed42b421d7260c119161f29b8381bd11b1784,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,3259,1.97,,,28,8,16,3.7,0,2,22.2,Credit Card,Sun Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 6132f6fb53c7329ed2e12fef54749d2ffc3d4d2f,b71c6761efe32829e7e453b0c6fcb78a456a7d83c720c746ac0575025dd8c5e3cd6b554288cf71419c89931c34166201ab5f47fe928d5d18e377bad66b8750fc,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1308,4.87,,,8,23,16.25,0,0,0,16.25,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9000696,-87.72091824,POINT (-87.7209182385 41.9000696026) 5f54dc81353c871c63b217a7d117c478dadc3a4b,083b7260314e48be5e10a9191da36fb2c0974b91499a5445d8a895ce901d4458b2a95e4fda48ae1ab55dfac3302268fbf967709c30ef58135041ce5d7d844065,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1440,4,,,28,7,15,5,0,0,20,Credit Card,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843) 5e3f05fac03791828973a9d5e273d756478e76a5,ae61536025042a43c682f2450eaa073da8c7a7f736aec5de1dde1d7e0e2c6be21402ea0d779c9b079b91c58fdccc9091ce99dbf01dbf8de1a81648a34c1f267b,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,1980,0,,,8,,43.75,2,0,7,52.75,Credit Card,Choice Taxi Association,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),,, 5d78f62496278dfb2b96a1c2c6ac428f09ea1ffc,6c6606251e8d2b1609f34d755bf884c4d972ab44b47bd75faa7e533a102e1cc2eed88f9d8d25cda28aaf89c678a3b2d17640c226cbb20375fd3f79685b719945,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,65,0.04,,,32,32,3.5,0,0,0,3.5,Cash,Flash Cab,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841) 
5bf80daf02fc3a3bab75740a1f72ebc09d4b0fe2,4cebb9edbffeb3a0eace8cccef967730a62f5a978869e216e7855270c48891c6ba7575d0ffe0fc7e5347a411f4b4149a76bb8d65812d6c0607ff975eb7c7f566,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1323,12.94,,,76,,33.25,5.89,0,5.5,45.14,Credit Card,5 Star Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,, 5beed53a8f8bf37104223411fef93b1aa2df46e8,644680ecf5bbb5af6329b0c9d4595c39344cd6c50fababc6e2e17811d9cfe0d67ac4a8b828340bf260428913ffd4b8b82dfaa8e1e83da0e939723ee47ae034c9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,552,0.56,,,32,32,6.5,2,0,1,10,Credit Card,Sun Taxi,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841) 5bcabb19b28a7d07c1e114244476cba232dbfe78,171ec426eaf8f54c5acbb7e3fde8e0683bfa6042af0b00e428e650cd9bc909011a2517de5b32358b9cb9d9ad3d5a5b26bbb14a09f8b7f5c1c9a37fa22f26f7a7,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,463,3.29,,,32,35,11,3,0,0.5,15,Credit Card,Flash Cab,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.83511799,-87.61867777,POINT (-87.6186777673 41.8351179863) 54b2e6aa52ea342d65be8a7ac93a82650e781319,4ae32e2eb244ce143800e0c40055e537cc50e3358a07ce1e877bf9f91aa6c10db986c727b9d4674705f8d124a18b05a68d07d1bc8d70e95e173f77c2c0437c22,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,4044,1.66,,,32,8,26.25,0,0,0,26.25,Cash,City Service,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 4f2165874756524b46ba42d39db0f5c59e1c159d,f9f12d79733b1fa7934f8d9bd17ca1927f3c99ded1640bbd1c77ef4f0e8a5992897a445315545bedf550405a9c5bc5a9f4b8b03c7d06a1f36050a32c9733164d,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,9,0,,,33,33,3.25,0,0,47,50.75,Credit Card,Medallion Leasin,41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585),41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585) 
4e366fa290c59b3d3c6ced770bc8b6b1d3519a0c,071d031c64f608418d27905c9ffe95bf52695615683d5f4e7072ed77fe2757fe623e369ce677a96e4535360841f5f1ad3f1d6de25ecb0e47e8848ec83bce4da3,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,709,1.21,17031081202,17031081700,8,8,8,0,0,0,8,Cash,5 Star Taxi,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365) 45b165d46f064d1c685e5fa0ff222437970114f8,c1ffe6edab518145aedcfc816682cbfdcab6ecab156dc3d5b230407ef441db82ba5ad37bcea8436642d6a67a8f92e482dc4005c529ce8d2814d31a9681001bac,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,2,0,17031833000,17031833000,28,28,20,0,0,0,20.5,Credit Card,Flash Cab,41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201) 43dd2fec7bbaa6808d3f6ada656f5969b517c9ae,42560393a9c9b9ae28339f4b5aec77fd89bd49916ad54175d9ee679d69939f973c177065f2816d7990a6663a07270335a4a852190c3258497ba7978edced68c8,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,639,1.41,,,24,8,8.25,4,0,1,13.75,Credit Card,Patriot Taxi Dba Peace Taxi Associat,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 41bb85ba82698b51f96cebd8915b62767fd0698d,f9448164dcb56f4f31c2b2ad562f31443a01885c2bda20d6325a0747c9857007a024d358c98ed634fba5791a9dcbaf7252302b7dad34fa6bea7e625d2d185d4e,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1183,11.51,,,56,38,30.25,0,0,4,34.25,Cash,Patriot Taxi Dba Peace Taxi Associat,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.81294894,-87.61785968,POINT (-87.6178596758 41.8129489392) 408dbbbbb5efe8825b9802e9e47b73bde2cad640,f29ed34900f8b339ab279eda0189ecae3312801dab967e2c71b537bcc8c744c8a8691d428541cc969b2eceae6fc36a8c6bfde2f469eba49c78ac96fa96665d9d,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1303,10.65,,,56,28,28.75,0,0,5.5,34.25,Cash,Sun Taxi,41.79259236,-87.76961545,POINT (-87.7696154528 
41.7925923603),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) 3e119576753a4a807e8b8702c2caa589a92c153c,f78d14baa2d1f80febaa17d73381c2eadb406cf4537522e111615ca2ccc9854f515cf1bb9dc9a9f48c4fdbf3e4e3adac120d6c6cc00dcd2aa715276b28bed0ad,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,960,2.7,,,7,8,11.5,3,0,1,15.5,Credit Card,Choice Taxi Association,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 3d4ad7f2659a6f86fecfee4d4f3a8559716ca894,083b7260314e48be5e10a9191da36fb2c0974b91499a5445d8a895ce901d4458b2a95e4fda48ae1ab55dfac3302268fbf967709c30ef58135041ce5d7d844065,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,28,28,3.25,0,0,0,3.25,Cash,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) 36fbb628f333cf2a39d450485ea41df93d5b2554,2eda36427e0a5394e90d77488294cd75e2fd87f04acb02c2db58dcfcf473ee221e5404b47fad3df4874a934c12a36244bbadb66f265cbab0c3ff00aa25ac3ed0,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1698,14.76,,,76,6,37.75,8.45,0,4,50.7,Credit Card,Taxicab Insurance Agency Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) 363810b6cfd667eace3ef3266ec553a546729ff5,847cf962bd6f62040673e6c24c24940aeb2d7fdaa54677eed6a0aaa4aeef61984916b32d763b4baa6c32476531543bb77e2346cd64f505618f6b9d562243f950,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,420,0.9,17031320100,17031320400,32,32,6.25,4,0,3,13.25,Credit Card,Taxi Affiliation Services,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.87740612,-87.62197165,POINT (-87.6219716519 41.8774061234) 36323a8a14400312e7cee05020326b7bf8dc301e,624e8f2a6af3b7f032d3c40d6f925f6fc5f0bf6a358ecc7d01503a55553689cf6b36732e7d972a664d2d55e928baadb5d5d387884a6b0cf2701453f45c14a7cf,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1440,15.4,,,76,8,39,10.85,0,4,53.85,Credit Card,Taxicab 
Insurance Agency, LLC,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 327fa02e9cb1cc29e7898cf98830f6ade295f9e9,15ddbeeb791d41c7683b885617281c0b548544f189ee3630ea6205078abf793173f13acb37440222ebe2a7a3b701fdfca26b4a2d5d75921a3218ad63ab23aa3a,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1080,8.1,,,56,,22,7.8,0,16.5,46.3,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),,, 32747746b9fd2ac09476eabaac05c588f4f4cb83,e0e1f19080d131afa810280c286bc1f57e78b48fe55e992ac816f33602973ff890905d82df0bec55f791ff66f6f3d7d366281cdb4271de2f2d2acb047b743f32,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,358,0.61,17031081202,17031081500,8,8,5.75,0,0,1,6.75,Cash,Star North Taxi Management Llc,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809) 2ab8133db10a059ff43e2abddaa7e19c20352451,c19109878e8ba25e09c0e464f8972f146c9d07502de920483fdbf2ef6686a35003a397903f3c330b3ee8f14feb74f29cb8a294bd950e9af6fed94b9dbc267aae,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1003,10.53,,,76,21,27.5,0,0,5.5,33,Cash,Flash Cab,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9386662,-87.71121059,POINT (-87.7112105933 41.9386661962) 23a321e48c465182b749d4e3d6fb901b39a28c36,c09f5ee2dc22a2a3c342dd27432eb0fe98506ef3698f5b2e066d6c56fd7da673e58d85db53894ac092f9eeae70bac42cbb470ffff8cedadee60cf639c49fc711,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,6,0,,,3,3,40,0,0,0,40.5,Credit Card,Sun Taxi,41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197),41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197) 21ca9b2d87a053138fe98c5ce8a3152ae752c945,e7f8c9242fc38babca76de5c34b1e59b9b7ae3ff40812cb34a7374980b9cfb20213b8cce120d9bf339e7974754eec9bd823adab5f83852410c0af0b1c0a7b6ec,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,11,0,,,8,8,10,2,0,0,12.5,Credit Card,Taxicab Insurance Agency 
Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 1f99d4a620dd942bfe2e98dc274214751258bc9c,8eca35a570101ad24c638f1f43eecce9d0cb7843e13a75f0af0c911c3e31ddec549c4808e216bcf31634542025c1e7de2442b92d5d7d73463c4e05fd959e47b4,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1004,5.31,,,8,5,17,4.4,0,0.1,22,Credit Card,Sun Taxi,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.94779159,-87.68383494,POINT (-87.6838349425 41.9477915865) 1d518052b3bcea69bdfec3508886d7551406202d,ea8e6df913a36562d8eddf662abe7722f4c0dc08527e9819364aed7a595eb61abb3728e49185db0616deb070840f410f6672275961076b6fcacc6bdfcf9edcf1,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1080,8.9,,,8,40,24.5,0,0,0,24.5,Cash,Taxi Affiliation Services,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.79235722,-87.61793138,POINT (-87.6179313803 41.7923572233) 1adedef3b9733f6a1859137ce37d8c685ad36cea,7d2e7cdd59237335e96b9b1a897a5e48cb4df467e6c09242b1e9461256f36aa4f9ab9649279f19ab5c8ad32ad7eb683800a0fd566f3f4fbd48323bab81908955,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,121,0.45,,,8,8,4.25,2,0,0,6.75,Credit Card,City Service,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 1686a96446c079079e6b53574c3d4f78da6fcfdc,64b71bd4e488e9c5571cdfcca045e7cb7a4abb0931f17b9689b4049c2a71df3161c216d73fd2cb53ba148afc4bb00df5bcdc9790cd679e9000319c12e37217e3,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,954,0.41,17031081800,17031081700,8,8,8.5,0,0,1,9.5,Cash,Flash Cab,41.89321636,-87.63784421,POINT (-87.6378442095 41.8932163595),41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365) 0353da5e93e2f5f973dc685d76fcd15f6bc0256e,ad4b1730fcbfdb84e41313179a688924012db322823f487d70ffcdbf1fa0e9ec11c35045af7e7cf561db41f5a46939ab7ea0565dc6fa26a0d14f68f6f568b92e,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,12,0.57,,,24,24,4.5,0,0,17,21.5,Cash,5 Star 
Taxi,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941) ef9aabfd57aa87f78421e37dcc6225790e777cb6,093e9e4c05ea53bf75c51763839d5f5bd5d1785c11ee5ec5e805c14bcb833c9fbfd81ab2b7874a85cba14046e54062335b2221738a0bb0bf1ddcfe83a7efa382,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1920,3.76,17031081201,17031330100,8,33,18,0,0,0,18,Cash,City Service,41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134),41.85934972,-87.61735801,POINT (-87.6173580061 41.859349715) 6ede890339b9db28cce204e37f36a312c2f073d1,3f46ef398d3308fb9794b8c5de450a88439d16c47b77b79398f0e84b804e7aad4789cb5ee08f74c8b7f89a444653706802a31cfdc8b99d3867a16794641fb759,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,56,0.03,17031980000,,76,,3.5,0,0,0,3.5,Cash,Sun Taxi,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),,, 6be0efd956926d6600d23f45470d638f3c5c01c3,c797f1560410b9db343567ea7c8e4095f66ceb65800fa466623d4695efdf3151679fb9bfe88ee18d47096e518c23d9c517e741de11df233e4c4bc11da8c3d8b1,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,578,2.63,,,4,77,9.75,4,0,0,14.25,Credit Card,Flash Cab,41.97517094,-87.68751552,POINT (-87.6875155152 41.9751709433),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999) fc955fb2be6161f771c45ab35fd37b08e13dc1f6,d461dc72b7a599bfba3f33fae867f5530e0c5aa5c200d89b4a5cbd270da1eba6488b76e3ce70a8371b8242f3529dd35a1230f16dc7e7bf2626243840f6261b97,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1046,0.61,17031320100,17031839100,32,32,9.25,2,0,0,11.75,Credit Card,Taxicab Insurance Agency Llc,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707) ed59a2b85933b8086d71aa55c04f85bbfa3f37c6,698ec513d27602fcd211bb62440a555a3f23bebbe3b2a1ec9ba6466a63bab46628c6dc1622de7c3dcbe4b0f98a7048c9fb07ae9c8a1572f432db4df68b1a4803,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1504,13.51,,,76,,34.5,8.4,0,7,50.4,Credit Card,Taxicab Insurance Agency 
Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,, e9dfed1215cf9b95d528aabfdd3cab775b255913,0bea3de3c36237d68b009b24ee3db86c78e9e618a73a3b5776e5f4bba06775f91b3520db910d24b97d577e57c4372f5d9d2eb58d338f3a0add0e37c0f71f6701,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,660,0.9,17031081700,17031081201,8,8,7.25,0,0,0,7.25,Cash,Taxi Affiliation Services,41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365),41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134) e9b1cfd8bc49629663f84f697badf17a88b7ab1f,bb4e75d3065311c33024a434640731c43fd2cf9e4482eb9e17cbf9f0ff0ed005455ffe22797df66b7467489a738e7be52c5983e16615b31c7c1d6af3ee0eb965,12/31/2023 11:30:00 PM,01/01/2024 12:15:00 AM,1920,1.2,,,24,74,50,0,0,0,50,Cash,Taxi Affiliation Services,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.69487897,-87.7131925,POINT (-87.7131924966 41.6948789661) e92cbef122dee337d7502c7177916016d36e964b,847cf962bd6f62040673e6c24c24940aeb2d7fdaa54677eed6a0aaa4aeef61984916b32d763b4baa6c32476531543bb77e2346cd64f505618f6b9d562243f950,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,720,1.4,17031081700,17031320100,8,32,8.25,4,0,1,13.25,Credit Card,Taxi Affiliation Services,41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365),41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918) e8473ad9fa9148bc0feaccfe68caaeef0ca1f648,259d38cfdbc9ac6f9bb01f0df740e0ddf4a631a70bbdd6525862b20b7ed0e0554dbbde64c2955b8b6f41468c8970e5507490db36348f884461783472621bda08,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,480,1.5,,,8,8,7.5,0,0,1,8.5,Cash,Taxicab Insurance Agency, LLC,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) e1968da25611d8cee47d5088502f9ab1c76877c8,56a1119c6ca57e39525cf06829f9ecff553cf4b5ac24821259d086c8ab30406ec45ae77335c646417897d2f4916479c3ed8b6313c2ccb9fb3fc248a4c3387800,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,4,0,17031081403,17031081403,8,8,15,0,0,0,15.5,Credit 
Card,Medallion Leasin,41.89092203,-87.61886836,POINT (-87.6188683546 41.8909220259),41.89092203,-87.61886836,POINT (-87.6188683546 41.8909220259) e1739faf183448c03ab821871c96de984ace8697,552720f76dd5338d0cf254f8eb4045839a5501e095a0d34fee849df1633dce909ed9b7e001b6e904f64f5b235fc56377ef450dd8c29f16fcbd7a7c2116386654,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1048,0.88,17031081202,17031081500,8,8,9.5,0,0,1,10.5,Cash,Choice Taxi Association Inc,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809) e151cf39ae70e33ac5df78ac76ca2c3706216321,913c95ba782fa447b7c55fbfc38d040907d13e7ddf7282a75fe448d2a25082dcdab927cd930805ea14e62bc534d5288669b15f73751bd43a746cc9e3bbddb2f4,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,830,2.66,,,7,28,10.5,0,0,0,11,Credit Card,Flash Cab,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) de5e1fc2ebc09d6c56a83685e61b15d582059d2b,8be2c5887fd81a4918e0464359436d6fc5ed1dbbe4f5b0317403a4ead72c95b37979d3534c5ede73a29ceace53bdc820f692587ba4a88f651de93b329e4cf2f8,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1320,16.3,,,76,28,40,8.9,0,4,52.9,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) dd01d2cbf95e044799a49c5988de327fd0b4ed2f,b875e9e053d893ee490e723c96773ed5f81c0a2339545f941b006167253d2bd537c68266a6c87ecda89948c234e7ae93ae51869767a4e8345492b407ab62e424,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,637,3.96,,,8,6,13,4.35,0,1,18.85,Credit Card,Taxicab Insurance Agency Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) daed3f1e3c0cd866cca1f50fe0513d250ea80eff,f9bc93a0ba6b1f18c9709a96c99bb9c5a99054b1711f80ddfe986a0f78a02470f146c48fd7d66bf74fe374f53d032676cb2fb7871afce4314dfdac97d4f22d32,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 
AM,1674,8.3,,,28,77,25.25,6.44,0,0,32.19,Credit Card,City Service,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999) d949547c3f9f0bbce64bce18b50cca6df60f88f2,b52493d43f7de565ab5eaaa0b1238709ac2073a9cdd626a411f99151188aa290435bada1d7c0119f6423891cbc9c3ce5c9ddd4ed068bca8e8aaf75cedacb9f0d,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,900,7.9,,,76,12,22,5.3,0,4,31.3,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.99393013,-87.75835359,POINT (-87.7583535876 41.9939301285) d45e012bbd6fffa5ffa8aba12b6d61961c89e9e0,73052f4ccaf4e0fa9178722e491f8e5eda869f56e08aa4d659ef38139d36bd69df925ac00c96564af9fc30db0c616fb4c9312cb3ba36fe8c8cdd470750d4681d,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1320,0,,,56,28,25.5,5.02,0,4,34.52,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) d12be01c208e273389ccc1f306cbd9cc98bfc73d,f6aac57dbd69c58200d6fb22bbffe1343ca6ea5eece073d452f97009803408626e6357e405c35bffde1495f078c83419ca0378e26d8dca04c6b81644638cefc9,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1260,0.2,,,6,8,16.5,0,0,0,16.5,Cash,Taxi Affiliation Services,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) d115aa399492b5b2bd4faed5d6c4fd36122918aa,24d4c5e51d147aecbb7c4a1ad70c38dbc05c7b4485f6de4b41fee5ea270228859c19c4160b101df87b889acc5cc7ab49b5e0491676026518677d34ea81f76591,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1038,10.24,,,76,14,27,9.75,0,5,42.25,Credit Card,Taxicab Insurance Agency Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.968069,-87.72155906,POINT (-87.7215590627 41.968069) cee99899eff1d446626df83a97b5b5f0571c7ec4,7d8179131ea9952793af4cda8635e94b56c2b92d3c376cd92517f7319ec4a3031207af4d7b8165367e1f8a185275814ab89c26ace551ac3bf96a04ea174371c1,12/31/2023 
11:30:00 PM,12/31/2023 11:45:00 PM,696,1.22,17031081500,17031833000,8,28,8,2,0,0,10.5,Credit Card,5 Star Taxi,41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201) ce3d8fe7f1f0906a2502c728358705f6f547d872,6898e40854937399e0ef25dad63740d21b20593439090721b2f747b039ab24c5e74ebda726b98515a4b4c6b7dd9f87717cfc7c1b52e3a8b4d52245214f09d229,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,16,0.02,,,28,28,35,5.32,0,0,40.82,Credit Card,Flash Cab,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383) ce0f3501cd03c9e1795241db4da2a1285559f906,1de191ccc486f8d0e0e6b25a03d592e58ed4511cfed79e912c895802cb808ea9a2d609cc77a536d7ad6431160a92d39dc6b79ec88381059d540f95215364582e,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1260,11.2,,,76,2,29,5,0,4,38,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),42.00157103,-87.69501259,POINT (-87.6950125892 42.001571027) cc3f3f6214a8b4ad9f15f472e0ea734b441728fa,599e7935e8f7321862152296420d8552c36d7fe97517f0bec1048c18ed7f2a434e06c55f5e16e5ec5d1da15f02eb49079258d610b196bd9eb0cf4183166878e4,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,776,4.3,,,56,30,13.75,4.81,0,5,24.06,Credit Card,Flash Cab,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.83908691,-87.71400381,POINT (-87.714003807 41.8390869059) cb026ca7cca9c89c7bf96c3efcd12376b2800fc3,4477f5eda3c0c9379d7526db1b5029184a7d75a2adcad3b338b20c83f351865360b02546bc50125c663edc0ed86b206261dc50f7f002199e4d0880802c51311d,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1028,5.45,,,8,6,17,4.38,0,0,21.88,Credit Card,Medallion Leasin,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014) 
cab8410b2d60210e11fa09bc929a1a6ba0696084,e533bfdc483206f9c02c1c879a118d88f0a3ca1cd2703f3cf88e318716bbbb0c71d5f1c5f86b042b4ee1a06dbc750fa840acec0ebaf5fc1d90edbdc215114a1d,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1736,17.38,17031980000,17031081201,76,8,53.94,0,0,0,53.94,Cash,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134) c38ed8ec5467492783aed71363709b87a31ac8d9,a62df4e9bfec5f3babb7922b1346263cce5c3116fa5fa3465e4845b94774ef86b30bb243f80f396cf2211d7e3309028193f159567ee341b44d39dc3a1f5495f9,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1204,4.79,,,6,15,15.75,0,0,0,15.75,Cash,Medallion Leasin,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.95402765,-87.76339903,POINT (-87.7633990316 41.9540276487) bfbc39a914248481c0b5c4899d1d0ca4f54f9851,d2a9362483decbe7b2d28d38ff371f05fafd542a60e8c9e4d5e3150e2c0b41b9e2b69e561807859b9ccb8c0f27922c183460b9f858df03f53b9656b8b829ceb1,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,2168,23.1,17031980000,17031839100,76,32,57.5,0,0,5,62.5,Cash,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707) bdb3a185915415969458eeaf805d9a0012252754,3b95cedc13d4a99243e1974616a6a25267c25878336faa586d4372370c847c3618753718dca887e3713efb476a5af11bc5f5a86c9785c9f749ca45f3f8be4764,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,480,0,17031839100,17031833000,32,28,7.5,0,0,0,7.5,Cash,Taxi Affiliation Services,41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201) bcab4490118535b7135fe1394c37507df7f6ad90,3665a72ee495b03f4dae72307dc6e5e58e21518f77d8e67dcd386c3b9daa1a0db86555cef4a877234542af8d1c0da6fa7a28a4e0e643e382236470d569d78668,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1800,14.6,,,76,,37,8.7,0,6,51.7,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,, 
b8f99264a52fe95e5160333451d51cd83f3b34c8,924ad289d7377302678c3954095a96778a3a5b2a9a2a69d5335f59ff00e672ea71a0d95fe058ecfa5d53fb86dc1eba63a1af3e51ab33ec04e8b5a679c91f564b,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1380,0,,,76,24,41,0,0,5,46,Cash,Taxicab Insurance Agency, LLC,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941) b819f56e53b38f9d75067716f5701a3bdbae8761,422aa525858cddde977f39fa4e58947555918726746ebd72be48d2a2d09af86e2b5e5318fea36ecc84de5b6af8354307064e73672f67e4bc907dcabc21e61c09,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1020,0.2,,,28,7,12.75,0,0,0,12.75,Cash,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843) b1a91a0cbb11e273aa8fe96eaf32cb26389570cf,c0d525ee45b1b77f1fcc69c7c56ff91661795d15482cc46a75ca8164ea25736a32169b1ba73fb5eee3ee98e629942c90ed23a5998f29bfe050afd3a08608a9a9,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1140,0.1,,,32,8,11.5,0,0,2,13.5,Cash,Taxi Affiliation Services,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) aa31ae6e712e0a21598087799a7439a280301056,82bc059c3b13e97341f941d60f772ae9f83687498e91f7c399644ec42449cced734834174cb0a29955229b910c3c9810dc67997226ce38a8600bf7f24d149423,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,5,0,,,32,32,25,5.1,0,0,30.6,Credit Card,5 Star Taxi,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841) a96c2ef996a9458f48eecfe4f62e2fcb0790cb9d,e8d374b4e7bc344add5893f1a1ae3b611823439ac1caf06087c4bf2cb6fe114201a38c49646f2851d86f9e73c21c1e99f29903cceb84612fd1118e3224528b2b,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,720,0.4,,,,,7.5,0,0,0,7.5,Cash,Taxi Affiliation Services,,,,,, 
a4d44dd31babbf742d35571b483e2f8ab7f5256a,0cae7ec64456b1830bd58df1991f046410f5506cf28b3aa16b6d5c4940b44ff0ca069324233093161b43d212c2c5eac61536cfa6e3284117bb62ee4105e945b1,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,0,0,,,77,,3.25,0,0,0,3.25,Cash,Taxi Affiliation Services,41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999),,, a2136304c06eeb1897684e0402905d1d2b528cc8,42560393a9c9b9ae28339f4b5aec77fd89bd49916ad54175d9ee679d69939f973c177065f2816d7990a6663a07270335a4a852190c3258497ba7978edced68c8,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,239,0.7,,,28,8,5.25,2,0,1,8.75,Credit Card,Patriot Taxi Dba Peace Taxi Associat,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) a1383cbd5fab084a75d9a0c6302d33e3cb6104d3,b4ac2893286a7c3a55df851a3732ea65d7fb82e1da7a19f728a71651761babfd88544152301073319650be263fd4e1aabc072601e6f452daf22ceb764f8d70d5,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,475,0.45,,,8,8,6,0,0,1,7,Cash,City Service,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111) 9f072b4e70e16ebb84b0a2f6ff718150ecc3e345,00f4b381570486f8575cbaa57ed41f116ed2e1f9d85f73bb2f6dba13a72541761d2ad4cb1727990d97795a2b0bdd99f0e4a8826245c81dac443cebb1c19b26fb,12/31/2023 11:30:00 PM,01/01/2024 12:15:00 AM,2460,1.1,,,56,3,48.25,16.4,0,6,70.65,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197) 9ca870a06a41bcf8a0472890df4d404d7d592d5d,42e3ec7750e4be6e56c47bcdefe5cb86ddb0d0c65bcf4d09773512b3e854ed08adeacdad835a4e92a8ca871021858984bb70a72c1dc17d22b49d2f664a6e0fd2,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1399,16.2,17031980000,17031839100,76,32,40.25,0,0,29,69.25,Cash,Taxicab Insurance Agency Llc,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707) 
97d0f9bb2bc7aed4e8c84da3444743f9f1256d32,f1c4fb891f4812fb2865e801d2185b401283b34401b71f25cafc8b108f48241363276826a3fe8f4830d1979de0179f5850a26e115de686d6af99b79e66218656,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1680,0.6,,,28,77,30,5,0,1,36,Credit Card,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999) 97059ca7943e828e9b3b5da926d6f27d6ddc9f30,f81c929ea7d9107e6de8bd7ee335f42563b3413e967e98288480648a66455138dcdcde8b46b353ca4d6c287be49cd3087636ba13de6b7db6db3854c2ac8a157f,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,240,0.9,,,7,7,5.25,0,0,1,6.25,Cash,Taxi Affiliation Services,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843)

II. Experimental Steps

(1) Experiment title: classification prediction with SVM

Program output:

Classification prediction with SVM

(2) Loading the CSV file

Mathematical model: the input is a data matrix $X \in \mathbb{R}^{n \times d}$ and a label vector $y \in \{-1, +1\}^n$.

1. Selecting the data

The CSV file contains 2002 lines (one header line plus 2001 records) and 23 columns. This does not fully meet the experiment's requirement of at least 100,000 records: training on a 100,000-record dataset on the CPU was very inefficient, and the run took so long that no result could be obtained. The loaded data and its types are printed as follows:

Step 1: Data loading
File linear.csv raw shape: (2001, 23)
First rows:
                                    Trip ID  ...            Dropoff Centroid  Location
0  011106b6114f83af0c17aace3867a464a7fc742b  ...  POINT (-87.6262149064 41.8925077809)
1  e9a66ddcc78cfd79f419165314cbe5ee380f16c3  ...                                   NaN
2  e765192268db3480b5d9bd0443f7ce7fd5ba047d  ...  POINT (-87.6559981815 41.9442266014)
3  c6510d4f82541cfacf8c20cab44fbb7c0b2c5efe  ...                                   NaN
4  f9445eed26da9a0eff247350df942616cb51e764  ...
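POINT (-87.6559981815 41.9442266014)
[5 rows x 23 columns]
Data types:
Trip ID                        object
Taxi ID                        object
Trip Start Timestamp           object
Trip End Timestamp             object
Trip Seconds                  float64
Trip Miles                    float64
Pickup Census Tract           float64
Dropoff Census Tract          float64
Pickup Community Area         float64
Dropoff Community Area        float64
Fare                          float64
Tips                          float64
Tolls                         float64
Extras                        float64
Trip Total                    float64
Payment Type                   object
Company                        object
Pickup Centroid Latitude      float64
Pickup Centroid Longitude     float64
Pickup Centroid Location       object
Dropoff Centroid Latitude     float64
Dropoff Centroid Longitude    float64
Dropoff Centroid  Location     object

This output shows three things. The latitude and longitude columns vary over a very small range, so they are not used. Among the non-numeric (object) columns, only column 16 (Payment Type) is used: it is mapped to a number, with Cash mapped to -1 and Credit Card mapped to 1. Columns 5 to 15 are numeric (float64) and can be used directly. The experiment therefore selects columns 5 to 16: columns 5 to 15 as features and column 16 as the label.

2. Processing the data

Feature column processing function. Procedure: each column $x_j$ of the feature matrix $X$ goes through numeric conversion and missing-value filling. If the conversion succeeds and the column contains missing values (NaN), they are filled with the column mean.

Numeric conversion and missing-value handling. Mathematical model: for a column vector $x$, the procedure is:

1) Try to convert $x$ to a numeric vector; entries that cannot be converted become NaN.
2) Over the set $O$ of non-missing entries, compute the mean $\bar{x} = \frac{1}{|O|} \sum_{i \in O} x_i$.
3) Fill the missing values: $x_i \leftarrow \bar{x}$ for every $i \notin O$.

Label processing function (the label column is handled separately). Mathematical model: for the label vector, define a mapping $g$:

- For string labels: $g(\ell) = -1$ if $\ell$ contains "cash" and $g(\ell) = +1$ if $\ell$ contains "credit" (case-insensitive).
- For numeric labels: $g(\ell) = -1$ if $\ell \le 0$ and $g(\ell) = +1$ otherwise.
- For missing labels (NaN): the sample is skipped.

Columns 5 to 16 may contain missing values and outliers, so the data is cleaned and then standardized. The processing output is:

dtype: object
Processing column feature column Trip Seconds, original dtype: int64
Processing column feature column Trip Miles, original dtype: float64
Processing column feature column Pickup Census Tract, original dtype: float64
Column feature column Pickup Census Tract: 1287 values could not be converted to numbers; filling with the mean
Processing column feature column Dropoff Census Tract, original dtype: float64
Column feature column Dropoff Census Tract: 1337 values could not be converted to numbers; filling with the mean
Processing column feature column Pickup Community Area, original dtype: float64
Column feature column Pickup Community Area: 66 values could not be converted to numbers; filling with the mean
Processing column feature column Dropoff Community Area, original dtype: float64
Column feature column Dropoff Community Area: 317 values could not be converted to numbers; filling with the mean
Processing column feature column Fare, original dtype: float64
Column feature column Fare: 2 values could not be converted to numbers; filling with the mean
Processing column feature column Tips, original dtype: float64
Column feature column Tips: 2 values could not be converted to numbers; filling with the mean
Processing column feature column Tolls, original dtype: float64
Column feature column Tolls: 2 values could not be converted to numbers; filling with the mean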
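Processing column feature column Extras, original dtype: float64
Column feature column Extras: 2 values could not be converted to numbers; filling with the mean
Processing column feature column Trip Total, original dtype: float64
Column feature column Trip Total: 2 values could not be converted to numbers; filling with the mean
Loaded file: linear.csv, feature data shape: (2000, 11), number of labels: 2000
Processing labels, data type: <class 'numpy.ndarray'>, shape: (2000,)
Unique label values: ['Cash' 'Credit Card']
Label distribution after processing: -1 (Cash): 969, 1 (Credit Card): 1031
Data processing finished - feature shape: (2000, 11), label distribution: negative (-1): 969, positive (1): 1031
CSV file loaded successfully

(3) Data inspection and preprocessing

Check the data dimensions and the class distribution. The output is:

Step 2: Data inspection and preprocessing
Data dimensions: X(2000, 11), y(2000,)
Class distribution - class (-1): 969, class (1): 1031

(4) Data standardization

The data is split into a training set and a test set (80% / 20%). The output is:

Step 3: Data standardization
Training set size: (1600, 11)
Test set size: (400, 11)

(5) Training with a hand-written SMO algorithm

1. Principle of the support vector machine

The basic idea of the support vector machine (SVM) is to find an optimal hyperplane in feature space such that samples of different classes lie on opposite sides of the hyperplane and the margin between them is as large as possible.

Primal optimization problem:

$$\min_{w, b} \ \frac{1}{2} \|w\|^2$$

subject to

$$y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n.$$

To handle data that is not linearly separable, slack variables $\xi_i$ and a penalty parameter $C$ are introduced:

$$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to

$$y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.$$

2. Lagrangian dual problem

Introducing Lagrange multipliers $\alpha_i$ turns the primal problem into its dual:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to

$$0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0.$$

3. Kernel functions

A kernel function computes inner products in a high-dimensional space. Commonly used kernels are:

- Linear kernel: $K(x, z) = x^\top z$
- Polynomial kernel: $K(x, z) = (x^\top z)^d$
- RBF kernel: $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$, where $\sigma$ is the bandwidth
- Sigmoid kernel: $K(x, z) = \tanh(\beta\, x^\top z + c)$

4. Sequential minimal optimization (SMO)

SMO optimizes two Lagrange multipliers per iteration. Below, $f(x) = \sum_k \alpha_k y_k K(x_k, x) + b$, $E_i = f(x_i) - y_i$ and $K_{ij} = K(x_i, x_j)$. The key steps are:

1) Select the multipliers: choose two variables $\alpha_i$ and $\alpha_j$ that violate the KKT conditions.

2) Compute the bounds from the constraints $0 \le \alpha \le C$ and $\sum_i \alpha_i y_i = 0$:
   When $y_i \ne y_j$: $L = \max(0, \alpha_j - \alpha_i)$, $H = \min(C, C + \alpha_j - \alpha_i)$.
   When $y_i = y_j$: $L = \max(0, \alpha_i + \alpha_j - C)$, $H = \min(C, \alpha_i + \alpha_j)$.

3) Update $\alpha_j$: $\alpha_j^{new} = \alpha_j^{old} - \frac{y_j (E_i - E_j)}{\eta}$, where $\eta = 2K_{ij} - K_{ii} - K_{jj}$.

4) Clip $\alpha_j$ to the interval: $\alpha_j^{new} \leftarrow \min(H, \max(L, \alpha_j^{new}))$.

5) Update $\alpha_i$: $\alpha_i^{new} = \alpha_i^{old} + y_i y_j (\alpha_j^{old} - \alpha_j^{new})$.

6) Compute the intercept $b$:
   $b_1 = b - E_i - y_i (\alpha_i^{new} - \alpha_i^{old}) K_{ii} - y_j (\alpha_j^{new} - \alpha_j^{old}) K_{ij}$
   $b_2 = b - E_j - y_i (\alpha_i^{new} - \alpha_i^{old}) K_{ij} - y_j (\alpha_j^{new} - \alpha_j^{old}) K_{jj}$
   If $0 < \alpha_i^{new} < C$, then $b = b_1$. If $0 < \alpha_j^{new} < C$, then $b = b_2$. Otherwise $b = (b_1 + b_2) / 2$.

7) Decision function: once the optimization has finished, the decision function is

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right).$$

The program prints the number of support vectors, the weight vector and the bias term. The output is:

Step 4: Training with the hand-written SMO algorithm
Number of support vectors: 1384
Weight vector w = [-0.1751, 0.9524]
Bias term b = -0.2822

Decision boundary visualization. Mathematical model: the decision boundary is drawn from the SVM decision function $f(x)$.

Manual SMO SVM - accuracy: 0.5975, precision: 0.7068, recall: 0.4352, F1: 0.5387

(6) Comparison of kernel functions

Confusion matrix and evaluation metrics. Mathematical model: the classification metrics are computed from the following counts, where $\hat{y}_i$ is the predicted label:

- True positives (TP): number of samples with $y_i = 1$ and $\hat{y}_i = 1$
- True negatives (TN): number of samples with $y_i = -1$ and $\hat{y}_i = -1$
- False positives (FP): number of samples with $y_i = -1$ and $\hat{y}_i = 1$
- False negatives (FN): number of samples with $y_i = 1$ and $\hat{y}_i = -1$

Metric definitions:

- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\frac{TP}{TP + FP}$
- Recall: $\frac{TP}{TP + FN}$
- F1 score: $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

The linear, RBF, polynomial and Sigmoid kernel SVMs are compared by computing accuracy, precision, recall and F1 for each. The output is:

Step 5: Comparison of kernel functions
Linear kernel SVM - accuracy: 0.9275, precision: 1.0000, recall: 0.8657, F1: 0.9280
RBF kernel SVM - accuracy: 0.9300, precision: 0.9896, recall: 0.8796, F1: 0.9314
Polynomial kernel SVM - accuracy: 0.8875, precision: 0.9476, recall: 0.8380, F1: 0.8894
Sigmoid kernel SVM - accuracy: 0.7600, precision: 0.7857, recall: 0.7639, F1: 0.7746

(7) Automatic parameter tuning

Automatic tuning by grid search. Mathematical model: grid search with cross-validation is used to find the best hyperparameters. The cross-validation procedure is:

1. Split the data into $k$ folds.
2. For each parameter combination $\theta$, compute the cross-validation score $\mathrm{CV}(\theta) = \frac{1}{k} \sum_{i=1}^{k} \mathrm{score}_i(\theta)$.
3. Select the best combination $\theta^* = \arg\max_\theta \mathrm{CV}(\theta)$.

The output is:

Step 6: Automatic hyperparameter tuning
Tuning on the general dataset...
Starting automatic tuning...
Automatic tuning failed: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

(8) Effect of the parameters on performance

Step 7: Effect of the parameters on performance

(9) Learning curve analysis

Step 8: Learning curve analysis

(10) Generating the 3D visualization (web version)

3D visualization. Mathematical model: PCA or t-SNE reduces the dimensionality so that the data distribution and the decision boundary can be shown in 3D space. The PCA procedure is:

1. Compute the covariance matrix $\Sigma = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$.
2. Compute the eigendecomposition of the covariance matrix, $\Sigma = V \Lambda V^\top$.
3. Select the eigenvectors of the three largest eigenvalues, $W = [v_1, v_2, v_3]$.
4. Project to the reduced space: $Z = (X - \bar{x}) W$.

The output is:

Step 9: Generating the 3D visualization

(11) Creating the decision boundary animation (web version)

Animated visualization. Mathematical model: a smooth transition between the decision boundaries of the different kernels is animated. Procedure: 1.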
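Compute the decision function for each kernel. 2. Blend the decision functions with a weight function to obtain a smooth transition. 3. Show the resulting sequence of frames as an animation.

The output is:

Step 10: Creating the decision boundary animation

(12) Generating the comprehensive performance report (web version)

Step 11: Generating the comprehensive performance report
All demonstrations finished
Best model parameters: {'kernel': 'linear', 'C': 1.0}
Data processing, model training, visualization and performance evaluation are all complete

III. Python Implementation of SVM-Based Classification Prediction

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score
from sklearn.impute import SimpleImputer  # missing-value handling
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import seaborn as sns
import os
import warnings

warnings.filterwarnings("ignore")

# Font settings for Chinese text in matplotlib figures
plt.rcParams["font.sans-serif"] = ["SimHei"]  # render Chinese labels correctly
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign correctly


# CSV data access
def load_csv_with_specific_columns(file_paths, selected_columns=[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], skip_header=True):
    """
    Load specific columns from one or two CSV files.

    Parameters:
        file_paths: str or list - CSV file path(s)
        selected_columns: list - indices of the columns to select
            (columns 5 to 16, i.e. indices 4-15; the last one is the label)
        skip_header: bool - whether to skip the first row

    Returns:
        X: np.array - feature data
        y: np.array - label data (Cash = -1, Credit Card = 1)
    """
    all_data = []
    all_labels = []
    # Accept a single path or a list of paths
    if isinstance(file_paths, str):
        file_paths = [file_paths]
    for file_path in file_paths:
        try:
            # Read the CSV file
            df = pd.read_csv(file_path)
            print(f"File {file_path} raw shape: {df.shape}")
            print("First rows:")
            print(df.head())
            print("Data types:")
            print(df.dtypes)
            # Skip the first row if requested.
            # Note: pd.read_csv has already consumed the header line, so this
            # drops the first data record (2001 rows become 2000).
            if skip_header:
                df = df.iloc[1:]
            # Check that the column indices are valid
            max_col = max(selected_columns) if selected_columns else df.shape[1] - 1
            if max_col >= df.shape[1]:
                print(f"Warning: file {file_path} has too few columns; largest column index: {df.shape[1] - 1}")
                # Keep only the valid column indices
                valid_columns = [col for col in selected_columns if col < df.shape[1]]
            else:
                valid_columns = selected_columns
            # At least two columns are required (feature columns + label column)
            if len(valid_columns) < 2:
                print(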
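                    "Warning: not enough valid columns; at least one feature column and one label column are required")
                continue
            # Separate the feature columns from the label column
            feature_columns = valid_columns[:-1]
            label_column = valid_columns[-1]
            # Process the feature columns (numeric)
            X_data = process_feature_columns(df, feature_columns)
            # Process the label column (kept as strings)
            labels_data = df.iloc[:, label_column].values
            all_data.append(X_data)
            all_labels.extend(labels_data)
            print(f"Loaded file: {file_path}, feature data shape: {X_data.shape}, number of labels: {len(labels_data)}")
        except Exception as e:
            print(f"Failed to load file {file_path}: {e}")
            continue
    if not all_data:
        print("No data could be loaded; generating simulated data...")
        return generate_sample_data()
    # Merge the feature data
    X = np.vstack(all_data) if len(all_data) > 1 else all_data[0]
    # Convert the label list to an array
    labels = np.array(all_labels)
    # Map the labels (no mean filling)
    y = process_labels(labels)
    print(f"Data processing finished - feature shape: {X.shape}, label distribution: negative (-1): {np.sum(y == -1)}, positive (1): {np.sum(y == 1)}")
    return X, y


def process_feature_columns(df, feature_columns):
    """
    Process the (numeric) feature columns.

    Parameters:
        df: DataFrame - input data
        feature_columns: list - indices of the feature columns

    Returns:
        X: np.array - processed feature data
    """
    # Extract the feature columns
    features_df = df.iloc[:, feature_columns]
    # Process each column
    processed_features = []
    for col_idx, col in enumerate(features_df.columns):
        series = features_df[col]
        # Convert to a numeric type and fill missing values
        numeric_series, _ = convert_to_numeric(series, f"feature column {col}")
        processed_features.append(numeric_series)
    # Stack the processed columns
    X = np.column_stack(processed_features)
    return X


def convert_to_numeric(series, col_name):
    """
    Convert a pandas Series to a numeric type.

    Parameters:
        series: pandas Series
        col_name: column name (used in the debug output)

    Returns:
        numeric_array: numeric array
        conversion_info: description of the conversion
    """
    print(f"Processing column {col_name}, original dtype: {series.dtype}")
    # Try a direct numeric conversion
    try:
        # First try pd.to_numeric; unconvertible entries become NaN
        numeric_series = pd.to_numeric(series, errors="coerce")
        # Count the missing values after conversion
        nan_count = numeric_series.isna().sum()
        if nan_count > 0:
            print(f"Column {col_name}: {nan_count} values could not be converted to numbers; filling with the mean")
            # Fill NaN values with the mean
            if not numeric_series.isna().all():  # make sure not everything is NaN
                mean_value = numeric_series.mean()
                numeric_series = numeric_series.fillna(mean_value)
            else:
                print(f"Column {col_name} is entirely non-numeric; filling with 0")
                numeric_series = numeric_series.fillna(0)
        return numeric_series.values, "numeric conversion succeeded"
    except Exception as e:
        print(f"Column {col_name}: numeric conversion failed: {e}")
        # For string columns, try the special handling
        if series.dtype == object:
            return process_numeric_object_column(series, col_name)
        else:
            # Last resort: set everything to 0
            print(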
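                f"Using default value 0 for column {col_name}")
            return np.zeros(len(series)), "default value used"


def process_numeric_object_column(series, col_name):
    """Process an object-typed (usually string) feature column by trying to convert it to numbers."""
    print(f"Processing object-typed feature column {col_name}")
    # Inspect the unique values
    unique_values = series.unique()
    if len(unique_values) <= 10:
        print(f"Unique values of column {col_name}: {unique_values}")
    else:
        print(f"Column {col_name} has {len(unique_values)} unique values")
    # Map common strings to numbers
    result = []
    for value in series:
        if pd.isna(value) or value is None:
            result.append(0)  # NaN is replaced by 0
        elif isinstance(value, str):
            # Try to extract a number
            numeric_value = extract_number_from_string(value)
            result.append(numeric_value)
        else:
            try:
                result.append(float(value))
            except Exception:
                result.append(0)
    return np.array(result), "string processing finished"


def extract_number_from_string(s):
    """Extract a number from a string."""
    if not isinstance(s, str):
        return 0
    # Strip whitespace
    s = s.strip()
    # Common string-to-number mappings
    string_to_number = {
        "cash": -1, "credit": 1, "credit card": 1, "debit": 0,
        "yes": 1, "no": 0, "true": 1, "false": 0,
        "male": 1, "female": 0, "high": 1, "low": -1, "medium": 0,
    }
    # Check the string mapping
    s_lower = s.lower()
    if s_lower in string_to_number:
        return string_to_number[s_lower]
    # Try to extract a number
    import re
    numbers = re.findall(r"-?\d+\.?\d*", s)
    if numbers:
        try:
            return float(numbers[0])
        except Exception:
            pass
    # If nothing can be extracted, fall back to a hash value.
    # Note: str hashes differ between Python processes unless PYTHONHASHSEED is fixed.
    return hash(s) % 1000 / 1000.0  # a decimal in [0, 1)


def process_labels(labels):
    """Process the label data: keep the string format, no mean filling."""
    print(f"Processing labels, data type: {type(labels)}, shape: {labels.shape if hasattr(labels, 'shape') else len(labels)}")
    # Inspect the unique label values
    if isinstance(labels, np.ndarray):
        unique_labels = np.unique(labels)
    else:
        unique_labels = pd.Series(labels).unique()
    # Show the unique label values
    if len(unique_labels) <= 10:
        print(f"Unique label values: {unique_labels}")
    else:
        print(f"Labels have {len(unique_labels)} unique values")
    # Convert the labels
    y = []
    for label in labels:
        # Samples with a missing label are skipped.
        # Note: the matching feature rows are not removed here, so X and y
        # differ in length if any label is missing.
        if pd.isna(label) or label is None:
            continue
        mapped_label = map_payment_label(label)
        y.append(mapped_label)
    # Print the label distribution after conversion
    y_array = np.array(y)
    print(f"Label distribution after processing: -1 (Cash): {np.sum(y_array == -1)}, 1 (Credit Card): {np.sum(y_array == 1)}")
    return y_array


def map_payment_label(label):
    """
    Map payment-type labels while keeping their string semantics.
    Cash/cash -> -1
    Credit Card/credit/credit card -> 1
    """
    # String labels
    if isinstance(label, str):
        label_lower = label.strip().lower()
        # Case-insensitive match
        if "cash" in label_lower:
            return -1
        elif "credit" in label_lower \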
                or "credit card" in label_lower:
            return 1
        # Exact (case-sensitive) match
        if label.strip() == "Cash":
            return -1
        elif label.strip() == "Credit Card":
            return 1
        # Other common string values
        elif label_lower in ["0", "false", "no", "negative", "failure", "fail", "n"]:
            return -1
        elif label_lower in ["1", "true", "yes", "positive", "success", "pass", "y"]:
            return 1
    else:
        # Numeric labels
        try:
            num_label = float(label)
            # Explicit -1/1 values are used directly
            if num_label == -1:
                return -1
            elif num_label == 1:
                return 1
            # Other numeric values follow the sign rule
            return -1 if num_label <= 0 else 1
        except Exception:
            pass
    # Default for labels that cannot be recognized
    return 1


def generate_linear_data(n_samples=200):
    """Generate a linearly separable dataset (data1)."""
    np.random.seed(42)
    X = np.random.randn(n_samples, 2) * 2
    # Linear decision boundary: x + y = 0
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    # Add a small amount of label noise
    noise_idx = np.random.choice(n_samples, size=int(0.05 * n_samples), replace=False)
    y[noise_idx] = -y[noise_idx]
    print("Generated linearly separable simulated data")
    return X, y


def generate_spiral_data(n_samples=200):
    """Generate a spiral dataset (data2)."""
    np.random.seed(42)

    def spiral_xy(i, spiral_num):
        """Generate spiral coordinates."""
        angle = i * np.pi / 16
        radius = 2 * i / n_samples
        if spiral_num == 0:
            return [radius * np.cos(angle), radius * np.sin(angle)]
        else:
            return [-radius * np.cos(angle), -radius * np.sin(angle)]

    half_samples = n_samples // 2
    X = np.zeros((n_samples, 2))
    y = np.zeros(n_samples)
    # First spiral (class 1)
    for i in range(half_samples):
        X[i] = spiral_xy(i, 0)
        y[i] = 1
    # Second spiral (class -1)
    for i in range(half_samples):
        X[i + half_samples] = spiral_xy(i, 1)
        y[i + half_samples] = -1
    # Add noise
    X += np.random.randn(n_samples, 2) * 0.1
    print("Generated spiral simulated data")
    return X, y


def generate_sample_data(n_samples=200, n_features=8):
    """Generate generic simulated data."""
    np.random.seed(42)
    X = np.random.randn(n_samples, n_features)
    y = np.where(X[:, 0] + X[:, 1] + 0.3 * X[:, 2] > 0, 1, -1)
    print("Generated generic simulated data")
    return X, y


# Data preprocessing
def preprocess_data(X, y):
    """
    Preprocess the data: handle missing values and scale the features.

    Parameters:
        X: feature data
        y: label data

    Returns:
        X_scaled: preprocessed feature data
        y: preprocessed label data
    """
    # 1. Handle missing values in the features
    if np.isnan(X).any():
        imputer = SimpleImputer(strategy="mean")
        X = imputer.fit_transform(X)
        print("Filled missing feature values with the mean")
    # 2. Handle missing values in the labels
    valid_indices = ~np.isnan(y)
    if not all(valid_indices):
        X = X[valid_indices]
        y = y[valid_indices]
        print(f"Removed {np.sum(~valid_indices)} samples with a missing label")
    # 3.
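    # Standardize the features.
    # Note: if this runs before the train/test split, the scaler also sees the test samples.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, y


# Evaluation metrics
def calculate_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {-1, 1}."""
    true_positives = np.sum((y_true == 1) & (y_pred == 1))
    true_negatives = np.sum((y_true == -1) & (y_pred == -1))
    false_positives = np.sum((y_true == -1) & (y_pred == 1))
    false_negatives = np.sum((y_true == 1) & (y_pred == -1))
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy, precision, recall, f1


# SMO algorithm
def SMO(x, y, ker, C, max_iter, tol=1e-3):
    """Train an SVM with the SMO algorithm; returns the multipliers alpha and the bias b."""
    m = x.shape[0]
    alpha = np.zeros(m)
    b = 0
    passes = 0
    # Precompute the kernel matrix
    K = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = ker(x[i], x[j])
    # Main SMO loop: stop after max_iter consecutive passes without any update
    while passes < max_iter:
        num_changed_alphas = 0
        for i in range(m):
            # Prediction error on sample i
            Ei = np.sum(alpha * y * K[:, i]) + b - y[i]
            # Only optimize alpha[i] if it violates the KKT conditions
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                # Pick the second multiplier at random, j != i
                j = np.random.choice([l for l in range(m) if l != i])
                Ej = np.sum(alpha * y * K[:, j]) + b - y[j]
                alpha_i_old = alpha[i]
                alpha_j_old = alpha[j]
                # Bounds L and H for alpha[j]
                if y[i] != \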
                        y[j]:
                    L = max(0, alpha[j] - alpha[i])
                    H = min(C, C + alpha[j] - alpha[i])
                else:
                    L = max(0, alpha[i] + alpha[j] - C)
                    H = min(C, alpha[i] + alpha[j])
                if L == H:
                    continue
                # eta is the second derivative of the objective along the constraint line
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Update alpha[j] and clip it to [L, H]
                alpha[j] = alpha[j] - (y[j] * (Ei - Ej)) / eta
                alpha[j] = np.clip(alpha[j], L, H)
                if abs(alpha[j] - alpha_j_old) < tol:
                    continue
                # Update alpha[i] so that sum(alpha * y) stays 0
                alpha[i] = alpha[i] + y[i] * y[j] * (alpha_j_old - alpha[j])
                # Update the bias
                b1 = b - Ei - y[i] * (alpha[i] - alpha_i_old) * K[i, i] - y[j] * (alpha[j] - alpha_j_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - alpha_i_old) * K[i, j] - y[j] * (alpha[j] - alpha_j_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                num_changed_alphas += 1
        if num_changed_alphas == 0:
            passes += 1
        else:
            passes = 0
    return alpha, b


# Kernel functions
def linear_kernel(x, y):
    """Linear kernel."""
    return np.inner(x, y)


def polynomial_kernel(d):
    """Polynomial kernel of degree d."""
    def kernel(x, y):
        return np.inner(x, y) ** d
    return kernel


def rbf_kernel(sigma):
    """RBF kernel with bandwidth sigma."""
    def kernel(x, y):
        return np.exp(-np.inner(x - y, x - y) / (2.0 * sigma ** 2))
    return kernel


def cosine_kernel(x, y):
    """Cosine-similarity kernel."""
    return np.inner(x, y) / (np.linalg.norm(x, 2) * np.linalg.norm(y, 2) + 1e-10)


def sigmoid_kernel(beta, c):
    """Sigmoid kernel."""
    def kernel(x, y):
        return np.tanh(beta * np.inner(x, y) + c)
    return kernel


# Enhanced visualization
def plot_decision_boundary_enhanced(X, y, model, title=None, ax=None, alpha=0.8,
                                    show_support_vectors=True, confidence=True,
                                    show_margin=True, point_size=60):
    """
    Draw an enhanced decision-boundary plot.

    Parameters:
        X: feature data
        y: label data
        model: SVM model
        title: plot title
        ax: axes object
        alpha: transparency
        show_support_vectors: whether to mark the support vectors
        confidence: whether to shade by confidence
        show_margin: whether to draw the margin
        point_size: size of the data points
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))
    # Use the first two features
    X_2d = X[:, :2] if X.shape[1] > 2 else X
    # Build the grid
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    # Predict on the grid points
    if X.shape[1] > 2:
        # Grid points with the same dimensionality as the original features
        grid = np.zeros((xx.size, X.shape[1]))
        grid[:, 0] = xx.ravel()
        grid[:, 1] = yy.ravel()
        # Fill the remaining features with their means
        for i in range(2, X.shape[1]):
            grid[:, i] = X[:, i].mean()
    else:
        grid = np.c_[xx.ravel(),
                     yy.ravel()]
    try:
        # Decision-function values (signed distance to the hyperplane)
        Z = model.decision_function(grid).reshape(xx.shape)
        # Predicted classes
        Z_pred = model.predict(grid).reshape(xx.shape)
        if confidence:
            # Shade the decision regions by absolute distance
            abs_Z = np.abs(Z)
            max_abs_Z = abs_Z.max()
            # Normalized confidence in [0, 1]
            conf = abs_Z / max_abs_Z
            # One colormap per class
            cmap_blue = plt.cm.Blues
            cmap_red = plt.cm.Reds
            # Extract the two class regions
            region_a = np.copy(conf)
            region_b = np.copy(conf)
            region_a[Z_pred != 1] = 0
            region_b[Z_pred != -1] = 0
            # Draw the regions with a confidence gradient
            ax.imshow(region_a, cmap=cmap_blue, alpha=alpha,
                      extent=(x_min, x_max, y_min, y_max), origin="lower")
            ax.imshow(region_b, cmap=cmap_red, alpha=alpha,
                      extent=(x_min, x_max, y_min, y_max), origin="lower")
        else:
            # Plain two-class regions
            ax.contourf(xx, yy, Z_pred, alpha=alpha, cmap=ListedColormap(["#FFAAAA", "#AAAAFF"]))
        # Draw the decision boundary and the margin boundaries
        if show_margin:
            ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors=["red", "black", "blue"],
                       linestyles=["--", "-", "--"], linewidths=[1, 2, 1])
        else:
            ax.contour(xx, yy, Z, levels=[0], colors=["black"],
                       linestyles=["-"], linewidths=[2])
    except Exception as e:
        print(f"Error while drawing the decision boundary: {e}")
        # Draw only the data points, without the decision boundary
        ax.text(0.5, 0.5, "Failed to draw the decision boundary",
                ha="center", va="center", transform=ax.transAxes,
                bbox=dict(facecolor="red", alpha=0.1))
    # Draw the data points
    scatter = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap=ListedColormap(["red", "blue"]),
                         s=point_size, edgecolors="k", alpha=0.8)
    # Mark the support vectors
    if show_support_vectors and hasattr(model, "support_vectors_"):
        sv = model.support_vectors_
        if sv.shape[1] > 2:
            sv = sv[:, :2]  # keep only the first two dimensions
        ax.scatter(sv[:, 0], sv[:, 1],
                   s=point_size * 2, linewidth=1, facecolors="none", edgecolors="green")
    # Title
    if title:
        ax.set_title(title, fontsize=14)
    else:
        kernel_type = model.kernel if hasattr(model, "kernel") else "unknown"
        ax.set_title(f"SVM (kernel={kernel_type})", fontsize=14)
    ax.set_xlabel("Feature 1", fontsize=12)
    ax.set_ylabel("Feature 2", fontsize=12)
    # Axis limits
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    # Legend
    handles, labels = scatter.legend_elements()
    class_labels = ["Class -1", "Class 1"]
    legend1 = ax.legend(handles, class_labels, loc="upper right")
    ax.add_artist(legend1)
    if show_support_vectors and hasattr(model, "support_vectors_"):
        sv_handle = plt.Line2D([0], [0], marker="o", color="w",
                               markerfacecolor="none", markeredgecolor="green", markersize=10, linewidth=0)
        ax.legend([sv_handle], ["Support vectors"], loc="upper left")
    # Grid
    ax.grid(True, linestyle="--", alpha=0.3)
    return ax


def create_3d_visualization_advanced(X, y, method="pca", model=None, title_suffix="",
                                     show_decision_surface=True):
    """Enhanced 3D visualization that can show the decision surface and the support vectors."""
    # Make sure there are no NaN values
    if np.isnan(X).any():
        print("Warning: the 3D visualization data contains NaN values; filling with the mean")
        imputer = SimpleImputer(strategy="mean")
        X = imputer.fit_transform(X)
    # Reduce to 3D
    if method == "pca":
        # PCA to 3 components
        pca = PCA(n_components=min(3, X.shape[1]))
        X_3d = pca.fit_transform(X)
        title = f"PCA 3D visualization {title_suffix}"
        explained_var = pca.explained_variance_ratio_
        axis_labels = [f"PC{i + 1} ({explained_var[i]:.1%})" for i in range(min(3, X.shape[1]))]
    else:
        # t-SNE to 3 components
        n_components = min(3, X.shape[1])
        perplexity = min(30, len(X) // 4) if len(X) > 12 else 3
        tsne = TSNE(n_components=n_components, random_state=42, perplexity=perplexity)
        X_3d = tsne.fit_transform(X)
        title = f"t-SNE 3D visualization {title_suffix}"
        axis_labels = [f"t-SNE {i + 1}" for i in range(n_components)]
    # Pad with zero columns if there are fewer than 3 dimensions
    if X_3d.shape[1] < 3:
        pad = np.zeros((X_3d.shape[0], 3 - X_3d.shape[1]))
        X_3d = np.hstack((X_3d, pad))
        for i in range(X_3d.shape[1] - len(axis_labels)):
            axis_labels.append(f"Padding dimension {i + 1}")
    # Draw the 3D scatter plot
    fig = go.Figure()
    # Add the decision surface (if requested and a model is available)
    if show_decision_surface and model is not None and X.shape[1] >= 3:
        try:
            # Build the grid in the reduced space
            x_min, x_max = X_3d[:, 0].min() - 0.5, X_3d[:, 0].max() + 0.5
            y_min, y_max = X_3d[:, 1].min() - 0.5, X_3d[:, 1].max() + 0.5
            xx, yy = np.meshgrid(np.linspace(x_min, x_max, 30),
                                 np.linspace(y_min, y_max, 30))
            # Coordinates of the grid points in the original space
            if method == "pca":
                grid = np.c_[xx.ravel(), yy.ravel(), np.zeros(xx.size)]
                # For each (x, y), pick the z value closest to the decision boundary.
                # This is a simplification; a real application may need a finer search.
                z_vals = []
                for i in range(grid.shape[0]):
                    # Candidate z values
                    z_test = np.linspace(X_3d[:, 2].min(), X_3d[:, 2].max(), 5)
                    decision_vals = []
                    for z in z_test:
                        point_3d = np.array([grid[i, 0], grid[i, 1], z])
                        try:
                            # Project the 3D point back to the original space
                            point_orig = pca.inverse_transform(point_3d)
                            decision_vals.append(model.decision_function([point_orig])[0])
                        except Exception:
                            decision_vals.append(float("inf"))
                    # The z value closest to the decision boundary
                    idx = \
                        np.argmin(np.abs(decision_vals))
                    z_vals.append(z_test[idx])
                grid[:, 2] = np.array(z_vals)
                # Reshape to the grid
                z = grid[:, 2].reshape(xx.shape)
                # Add the decision surface
                fig.add_trace(go.Surface(
                    x=xx, y=yy, z=z,
                    colorscale="RdBu",
                    opacity=0.7,
                    showscale=False,
                    name="Decision surface"
                ))
        except Exception as e:
            print(f"Failed to create the 3D decision surface: {e}")
    # Add the data points
    for class_val in np.unique(y):
        mask = y == class_val
        name = "Negative class" if class_val == -1 else "Positive class"
        color = "red" if class_val == -1 else "blue"
        fig.add_trace(go.Scatter3d(
            x=X_3d[mask, 0],
            y=X_3d[mask, 1],
            z=X_3d[mask, 2],
            mode="markers",
            marker=dict(
                size=5,
                color=color,
                opacity=0.8
            ),
            name=name,
            text=[f"Sample {i}, class: {'negative' if label == -1 else 'positive'}" for i, label in enumerate(y[mask])],
            hovertemplate="%{text}<br>x: %{x:.2f}<br>y: %{y:.2f}<br>z: %{z:.2f}<extra></extra>"
        ))
    # If a model is given, add its support vectors
    if model is not None and hasattr(model, "support_vectors_"):
        try:
            # Map the support vectors to the reduced space
            if method == "pca":
                sv_3d = pca.transform(model.support_vectors_)
                # Pad with zero columns if there are fewer than 3 dimensions
                if sv_3d.shape[1] < 3:
                    pad = np.zeros((sv_3d.shape[0], 3 - sv_3d.shape[1]))
                    sv_3d = np.hstack((sv_3d, pad))
            else:
                # t-SNE has no transform method; as a simple workaround, use the
                # embedding of the training sample closest to each support vector
                sv_3d = np.zeros((len(model.support_vectors_), 3))
                for i, sv in enumerate(model.support_vectors_):
                    # Nearest original sample
                    distances = np.sum((X - sv) ** 2, axis=1)
                    nearest_idx = np.argmin(distances)
                    sv_3d[i] = X_3d[nearest_idx]
            # Add the support vectors
            fig.add_trace(go.Scatter3d(
                x=sv_3d[:, 0],
                y=sv_3d[:, 1],
                z=sv_3d[:, 2],
                mode="markers",
                marker=dict(
                    size=8,
                    color="green",
                    symbol="circle",
                    line=dict(color="green", width=2),
                    opacity=0.9
                ),
                name="Support vectors"
            ))
        except Exception as e:
            print(f"Error while adding the support vectors: {e}")
    # Update the layout
    fig.update_layout(
        title=title,
        scene=dict(
            xaxis_title=axis_labels[0],
            yaxis_title=axis_labels[1],
            zaxis_title=axis_labels[2]
        ),
        width=900,
        height=700,
        margin=dict(l=0, r=0, b=0, t=40)
    )
    return fig


def create_animated_decision_boundary(X, y, models, model_names, steps=50):
    """Create an animation showing the decision boundaries of different kernels."""
    try:
        # Use the first two features
        X_2d = X[:, :2] if X.shape[1] > 2 else X
        # Build the grid
        x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
        y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                             np.linspace(y_min, y_max, 100))
        # For high-dimensional data, build a grid with the full dimensionality
        if X.shape[1] > 2:
            grid = np.zeros((xx.size, X.shape[1]))
            grid[:, 0] = \
xx.ravel()grid[:, 1] yy.ravel()# 使用平均值填充其余维度for i in range(2, X.shape[1]):grid[:, i] X[:, i].mean()else:grid np.c_[xx.ravel(), yy.ravel()]# 计算每个模型的决策函数Z_values []Z_pred_values []valid_models []valid_model_names []for i, (model, name) in enumerate(zip(models, model_names)):try:Z model.decision_function(grid).reshape(xx.shape)Z_pred model.predict(grid).reshape(xx.shape)Z_values.append(Z)Z_pred_values.append(Z_pred)valid_models.append(model)valid_model_names.append(name)except Exception as e:print(f模型 {name} 无法计算决策边界: {e})# 如果没有有效模型返回Noneif not valid_models:print(没有可用的模型来创建动画)return None# 创建动画帧frames []for step in range(steps):# 计算插值权重weights [np.sin(np.pi * (step / steps i / len(valid_models))) ** 2 for i in range(len(valid_models))]weights np.array(weights) / sum(weights) # 归一化权重# 混合决策函数Z_mix np.zeros_like(Z_values[0])for i, Z in enumerate(Z_values):Z_mix weights[i] * Z# 预测结果基于最高权重max_weight_idx np.argmax(weights)Z_pred_mix Z_pred_values[max_weight_idx]# 创建帧frame go.Frame(data[# 数据点go.Scatter(xX_2d[:, 0],yX_2d[:, 1],modemarkers,markerdict(size8,color[red if label -1 else blue for label in y],linedict(width1, colorblack)),showlegendFalse,),# 决策函数热图go.Contour(zZ_mix,xnp.linspace(x_min, x_max, 100),ynp.linspace(y_min, y_max, 100),colorscaleRdBu,showscaleFalse,contoursdict(start-2,end2,size0.5,showlabelsFalse),linedict(width1),opacity0.8),# 决策边界线go.Contour(zZ_mix,xnp.linspace(x_min, x_max, 100),ynp.linspace(y_min, y_max, 100),colorscale[[0, black], [1, black]],showscaleFalse,contoursdict(start0,end0,size1,showlabelsFalse),linedict(width2),opacity1)],namefframe{step})frames.append(frame)# 创建基础图形fig go.Figure(data[# 数据点go.Scatter(xX_2d[:, 0],yX_2d[:, 1],modemarkers,markerdict(size8,color[red if label -1 else blue for label in y],linedict(width1, colorblack)),name数据点),# 初始决策边界go.Contour(zZ_values[0],xnp.linspace(x_min, x_max, 100),ynp.linspace(y_min, y_max, 
                                  100),
                    colorscale='RdBu',
                    showscale=False,
                    contours=dict(start=-2, end=2, size=0.5, showlabels=False),
                    line=dict(width=1),
                    opacity=0.8,
                    name='Decision function'
                ),
                # Initial decision boundary line
                go.Contour(
                    z=Z_values[0],
                    x=np.linspace(x_min, x_max, 100),
                    y=np.linspace(y_min, y_max, 100),
                    colorscale=[[0, 'black'], [1, 'black']],
                    showscale=False,
                    contours=dict(start=0, end=0, size=1, showlabels=False),
                    line=dict(width=2),
                    opacity=1,
                    name='Decision boundary'
                )
            ],
            frames=frames,
            layout=go.Layout(
                title='SVM decision boundary animation',
                xaxis=dict(range=[x_min, x_max], title='Feature 1'),
                yaxis=dict(range=[y_min, y_max], title='Feature 2'),
                updatemenus=[{
                    'type': 'buttons',
                    'buttons': [
                        {'label': 'Play',
                         'method': 'animate',
                         'args': [None, {'frame': {'duration': 100, 'redraw': True}}]},
                        {'label': 'Pause',
                         'method': 'animate',
                         'args': [[None], {'frame': {'duration': 0, 'redraw': True}}]}
                    ],
                    'direction': 'left',
                    'pad': {'r': 10, 't': 10},
                    'x': 0.1,
                    'y': 0,
                    'xanchor': 'right',
                    'yanchor': 'top'
                }],
                sliders=[{
                    'steps': [
                        {'args': [[f'frame{k}'],
                                  {'frame': {'duration': 100, 'redraw': True}}],
                         'label': str(valid_model_names[i % len(valid_model_names)]),
                         'method': 'animate'}
                        for k, i in zip(range(0, steps, steps // len(valid_model_names)),
                                        range(len(valid_model_names)))
                    ],
                    'x': 0.1,
                    'y': 0,
                    'currentvalue': {'font': {'size': 12},
                                     'prefix': 'Model: ',
                                     'visible': True,
                                     'xanchor': 'center'},
                    'len': 0.9,
                    'pad': {'b': 10, 't': 50},
                    'transition': {'duration': 300}
                }]
            )
        )
        return fig
    except Exception as e:
        print(f"Error while creating the animation: {e}")
        return None


def visualize_metrics_over_C_gamma(X_train, y_train, X_test, y_test, kernel='rbf'):
    """Visualize how the C and gamma parameters affect the model metrics.

    Parameters:
        X_train, y_train: training data
        X_test, y_test: test data
        kernel: kernel type
    """
    try:
        # Grid for C
        C_range = np.logspace(-3, 3, 7)

        # Grid for gamma (non-linear kernels only)
        if kernel != 'linear':
            gamma_range = np.logspace(-3, 2, 6)
        else:
            # The linear kernel ignores gamma; one placeholder keeps the loops uniform
            gamma_range = [0.01]

        # Metrics for every parameter combination
        results = []

        # Train and evaluate a model for every combination
        for C in C_range:
            for gamma in gamma_range:
                try:
                    if kernel == 'linear':
                        model = SVC(kernel=kernel, C=C, probability=True)
                    else:
                        model = SVC(kernel=kernel, C=C, gamma=gamma, probability=True)

                    # Train the model
                    model.fit(X_train, y_train)

                    # Evaluate on the test set
                    y_pred = model.predict(X_test)
                    accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)

                    # Record the result
                    results.append({'C': C, 'gamma': gamma, 'accuracy': accuracy,
                                    'precision': precision, 'recall': recall, 'f1': f1})
                except Exception as e:
                    print(f"Training with C={C}, gamma={gamma} failed: {e}")
                    # Record a placeholder so the grid stays complete
                    results.append({'C': C, 'gamma': gamma, 'accuracy': 0,
                                    'precision': 0, 'recall': 0, 'f1': 0})

        # Convert to a DataFrame
        results_df = pd.DataFrame(results)

        # Build the figure
        if kernel != 'linear' and len(gamma_range) > 1:
            # 3D surfaces: (C, gamma) vs. each metric
            fig = plt.figure(figsize=(18, 10))
            metrics = ['accuracy', 'precision', 'recall', 'f1']
            titles = ['Accuracy', 'Precision', 'Recall', 'F1 score']

            for i, (metric, title) in enumerate(zip(metrics, titles)):
                ax = fig.add_subplot(2, 2, i + 1, projection='3d')
                try:
                    # Reshape the results into a C-by-gamma table
                    pivoted = results_df.pivot_table(values=metric, index='C', columns='gamma')
                    X, Y = np.meshgrid(np.log10(gamma_range), np.log10(C_range))
                    Z = pivoted.values

                    # Draw the surface
                    surf = ax.plot_surface(X, Y, Z, cmap='viridis',
                                           linewidth=0, antialiased=True, alpha=0.8)

                    # Title and labels
                    ax.set_title(f'{kernel} kernel: {title} vs C, gamma')
                    ax.set_xlabel('log10(gamma)')
                    ax.set_ylabel('log10(C)')
                    ax.set_zlabel(title)

                    # Color bar
                    fig.colorbar(surf, ax=ax, shrink=0.5, aspect=5)
                except Exception as e:
                    print(f"Failed to draw the 3D surface: {e}")
                    ax.text2D(0.5, 0.5, 'Plot failed',
                              ha='center', transform=ax.transAxes,
                              bbox=dict(facecolor='red', alpha=0.1))

            plt.tight_layout()
            plt.show()
        else:
            # For the linear kernel, show only the effect of C
            plt.figure(figsize=(15, 5))
            metrics = ['accuracy', 'precision', 'recall', 'f1']
            titles = ['Accuracy', 'Precision', 'Recall', 'F1 score']

            for i, (metric, title) in enumerate(zip(metrics, titles)):
                plt.subplot(1, 4, i + 1)
                plt.semilogx(results_df['C'], results_df[metric], marker='o', linewidth=2)
                plt.title(f'Linear kernel: {title} vs C')
                plt.xlabel('C (log scale)')
                plt.ylabel(title)
                plt.grid(True)

            plt.tight_layout()
            plt.show()

        return results_df
    except Exception as e:
        print(f"Parameter visualization failed: {e}")
        return pd.DataFrame()


def plot_learning_curve(X, y, model_type='svm', kernels=('linear', 'rbf', 'poly'),
                        train_sizes=np.linspace(0.1, 1.0, 10)):
    """Plot learning curves showing how the training-set size affects performance.

    Parameters:
        X, y: data and labels
        model_type: model type
        kernels: kernels to compare
        train_sizes: training-set fractions
    """
    try:
        plt.figure(figsize=(15, 5))

        for i, kernel in enumerate(kernels):
            # One subplot per kernel
            plt.subplot(1, len(kernels), i + 1)

            train_acc = []
            test_acc = []
            used_sizes = []  # fractions that actually produced a data point

            # Shuffle the data
            indices = np.random.permutation(len(X))
            X_shuffled = X[indices]
            y_shuffled = y[indices]

            for size in train_sizes:
                try:
                    # Split into training and test sets
                    train_size = max(10, int(len(X) * size))  # at least 10 training samples
                    if train_size > len(X) - 10:
                        train_size = len(X) - 10  # leave at least 10 test samples

                    X_train, X_test = X_shuffled[:train_size], X_shuffled[train_size:train_size + 10]
                    y_train, y_test = y_shuffled[:train_size], y_shuffled[train_size:train_size + 10]

                    # Skip if either split is missing a class
                    if len(np.unique(y_train)) < 2 or len(np.unique(y_test)) < 2:
                        continue

                    # Train the model
                    if kernel == 'linear':
                        model = SVC(kernel=kernel, C=1.0)
                    elif kernel == 'rbf':
                        model = SVC(kernel=kernel, C=10.0, gamma=0.1)
                    else:  # poly
                        model = SVC(kernel=kernel, C=1.0, degree=3)

                    model.fit(X_train, y_train)

                    # Evaluate the model
                    train_acc.append(model.score(X_train, y_train))
                    test_acc.append(model.score(X_test, y_test))
                    used_sizes.append(size)
                except Exception as e:
                    print(f"Learning curve computation failed (kernel={kernel}, size={size}): {e}")

            # Draw the learning curve; plot against the sizes that were actually
            # used, so skipped sizes do not shift the x axis
            if len(train_acc) > 0:
                plt.plot(used_sizes, train_acc, 'o-', label='Training accuracy')
                plt.plot(used_sizes, test_acc, 's-', label='Test accuracy')
            else:
                plt.text(0.5, 0.5, 'Not enough data for a learning curve',
                         ha='center', va='center', transform=plt.gca().transAxes)

            plt.title(f'Learning curve ({kernel} kernel)')
            plt.xlabel('Training-set fraction')
            plt.ylabel('Accuracy')
            plt.grid(True)
            plt.legend(loc='best')

        plt.tight_layout()
        plt.show()
    except Exception as e:
        print(f"Failed to plot the learning curve: {e}")


def create_comprehensive_performance_report(models, X_test, y_test, model_names):
    """Create a comprehensive performance report."""
    try:
        # Create the subplots
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Model comparison', 'Confusion matrix heat map',
                            'ROC curves', 'Feature importance'),
            specs=[[{'type': 'bar'}, {'type': 'heatmap'}],
                   [{'type': 'scatter'}, {'type': 'bar'}]]
        )

        # Metrics for every model
        results = {}
        colors = ['blue', 'red', 'green', 'orange', 'purple']

        # Keep only the models that can predict
        valid_models = []
        valid_model_names = []
        for model, name in zip(models, model_names):
            try:
                model.predict(X_test)
                valid_models.append(model)
                valid_model_names.append(name)
            except Exception as e:
                print(f"Model {name} is unusable: {e}")

        if not valid_models:
            print("No valid model to evaluate")
            return None, {}

        for i, (model, name) in enumerate(zip(valid_models, valid_model_names)):
            # Predict
            y_pred = model.predict(X_test)
            try:
                y_pred_proba = (model.predict_proba(X_test)[:, 1]
                                if hasattr(model, 'predict_proba') else None)
            except Exception:
                y_pred_proba = None

            # Compute the metrics
            accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)
            results[name] = {'accuracy': accuracy, 'precision': precision,
                             'recall': recall, 'f1_score': f1}

            # Confusion matrix (first model only)
            if i == 0:
                cm = confusion_matrix(y_test, y_pred)
                # Row-normalize the confusion matrix
                cm_sum = cm.sum(axis=1)
                cm_norm = np.zeros_like(cm, dtype=float)
                for j in range(len(cm_sum)):
                    if cm_sum[j] > 0:
                        cm_norm[j] = cm[j] / cm_sum[j]

                # Add the confusion matrix heat map
                fig.add_trace(
                    go.Heatmap(
                        z=cm_norm,
                        x=['Predicted -1', 'Predicted 1'],
                        y=['Actual -1', 'Actual 1'],
                        colorscale='Blues',
                        showscale=True,
                        text=[[f'{cm[r, c]}<br>({cm_norm[r, c]:.1%})' for c in range(2)]
                              for r in range(2)],
                        hoverinfo='text'
                    ),
                    row=1, col=2
                )

            # ROC curve
            if y_pred_proba is not None:
                try:
                    fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
                    auc_score = auc(fpr, tpr)
                    results[name]['auc'] = auc_score
                    fig.add_trace(
                        go.Scatter(x=fpr, y=tpr, mode='lines',
                                   name=f'{name} (AUC={auc_score:.3f})',
                                   line=dict(color=colors[i % len(colors)])),
                        row=2, col=1
                    )
                except Exception as e:
                    print(f"Error while computing the ROC curve: {e}")

        # Add the random-classifier diagonal
        fig.add_trace(
            go.Scatter(x=[0, 1], y=[0, 1], mode='lines',
                       name='Random classifier', line=dict(dash='dash', color='black')),
            row=2, col=1
        )

        # Bar chart comparing the metrics
        metrics = ['accuracy', 'precision', 'recall', 'f1_score']
        metric_names = ['Accuracy', 'Precision', 'Recall', 'F1 score']
        for i, (metric, metric_name) in enumerate(zip(metrics, metric_names)):
            values = [results[name].get(metric, 0) for name in valid_model_names]
            fig.add_trace(
                go.Bar(x=valid_model_names, y=values, name=metric_name,
                       marker_color=colors[i % len(colors)]),
                row=1, col=1
            )

        # Feature importance (only available for a linear model)
        has_linear = False
        for model, name in zip(valid_models, valid_model_names):
            if hasattr(model, 'coef_') and len(model.coef_) > 0:
                has_linear = True
                importance = np.abs(model.coef_[0])
                fig.add_trace(
                    go.Bar(x=importance,
                           y=[f'Feature {i + 1}' for i in range(len(importance))],
                           orientation='h',
                           name=name),
                    row=2, col=2
                )
                break  # show only one linear model

        if not has_linear:
            # The bottom-right subplot uses axes x4/y4 (the original referenced
            # x3/y3, which belongs to the ROC subplot)
            fig.add_annotation(
                text='Non-linear model<br>feature importance unavailable',
                x=0.5, y=0.5,
                xref='x4 domain', yref='y4 domain',
                showarrow=False,
                font=dict(size=14)
            )

        fig.update_layout(height=800, showlegend=True,
                          title_text='SVM comprehensive performance report')
        fig.update_xaxes(title_text='Model', row=1, col=1)
        fig.update_yaxes(title_text='Score', row=1, col=1)
        fig.update_xaxes(title_text='False positive rate', row=2, col=1)
        fig.update_yaxes(title_text='True positive rate', row=2, col=1)
        fig.update_xaxes(title_text='Importance', row=2, col=2)
        fig.update_yaxes(title_text='Feature', row=2, col=2)

        return fig, results
    except Exception as e:
        print(f"Failed to create the performance report: {e}")
        return None, {}


# Automatic hyperparameter tuning
def auto_hyperparameter_tuning(X_train, y_train, cv=5, dataset_type=None):
    """Automatic SVM tuning, with parameter grids adapted to the dataset type."""
    try:
        # Choose the parameter grid according to the dataset type
        if dataset_type == 'data1' or dataset_type == 'linear':
            # Linear dataset: favor the linear kernel
            param_grid = [
                {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
                {'kernel': ['rbf'], 'C': [1, 10, 100], 'gamma': [0.1, 1, 'scale']}
            ]
            print("Tuning for a linearly separable dataset...")
        elif dataset_type == 'data2' or dataset_type == 'spiral':
            # Spiral dataset: favor the RBF and polynomial kernels
            param_grid = [
                {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]},
                {'kernel': ['poly'], 'C': [0.1, 1, 10], 'gamma': [0.1, 1], 'degree': [2, 3, 4]}
            ]
            print("Tuning for the spiral dataset...")
        else:
            # Generic parameter grid
            param_grid = [
                {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
                {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100],
                 'gamma': [0.001, 0.01, 0.1, 1, 'scale']},
            ]
            print("Tuning for a generic dataset...")

        # For a small dataset, use a reduced grid
        if len(X_train) < 50:
            print("Small dataset; using a reduced grid...")
            param_grid = [
                {'kernel': ['linear'], 'C': [1, 10]},
                {'kernel': ['rbf'], 'C': [1, 10], 'gamma': ['scale']}
            ]
            cv = min(cv, 3)  # fewer cross-validation folds

        # Grid search
        grid_search = GridSearchCV(
            SVC(probability=True),
            param_grid,
            cv=cv,
            scoring='accuracy',
            n_jobs=-1,
            verbose=1
        )

        print("Starting automatic tuning...")
        grid_search.fit(X_train, y_train)

        print(f"Best parameters: {grid_search.best_params_}")
        print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

        return grid_search.best_estimator_, grid_search.best_params_
    except Exception as e:
        print(f"Automatic tuning failed: {e}")
        # Fall back to a default model
        default_model = SVC(kernel='linear', C=1.0, probability=True)
        default_model.fit(X_train, y_train)
        return default_model, {'kernel': 'linear', 'C': 1.0}


# Main function
def main():
    """Main entry point: demonstrates every feature."""
    print("=" *
          60)
    print("SVM-based classification")
    print("=" * 60)

    # 1. Data loading: try the CSV files first
    print("\nStep 1: Data loading")

    # Try linear.csv and spiral.csv
    csv_files = ['linear.csv', 'spiral.csv']  # replace with the actual file paths
    try:
        # Try linear.csv first
        X, y = load_csv_with_specific_columns(csv_files[0])
        print("CSV file loaded")
    except Exception as e:
        print(f"CSV loading failed: {e}")
        print("Using simulated data...")
        X, y = generate_linear_data()  # linearly separable data as the default

    # 2. Data check, preprocessing, and visualization
    print("\nStep 2: Data check and preprocessing")

    # Check for NaN values
    if np.isnan(X).any() or np.isnan(y).any():
        print("Data contains NaN values; preprocessing...")
        X, y = preprocess_data(X, y)

    # Basic statistics
    print(f"Data shape: X{X.shape}, y{y.shape}")
    print(f"Class distribution - class (-1): {np.sum(y == -1)}, class (1): {np.sum(y == 1)}")

    # Visualize the data
    plt.figure(figsize=(10, 8))
    if X.shape[1] >= 2:  # a 2D plot needs at least 2 features
        plt.scatter(X[y == -1, 0], X[y == -1, 1],
                    color='red', marker='o', label='Class -1')
        plt.scatter(X[y == 1, 0], X[y == 1, 1],
                    color='blue', marker='x', label='Class 1')
    else:  # 1D features: use y = 0 as the second axis
        plt.scatter(X[y == -1], np.zeros_like(X[y == -1]),
                    color='red', marker='o', label='Class -1')
        plt.scatter(X[y == 1], np.zeros_like(X[y == 1]),
                    color='blue', marker='x', label='Class 1')

    plt.title('Dataset visualization', fontsize=14)
    plt.xlabel('Feature 1', fontsize=12)
    plt.ylabel('Feature 2' if X.shape[1] >= 2 else 'Y = 0', fontsize=12)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

    # 3. Standardization and the 80/20 train/test split
    print("\nStep 3: Data standardization")
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42)

    print(f"Training set size: {X_train.shape}")
    print(f"Test set size: {X_test.shape}")

    # 4.
    # Manual SMO training (step 4)
    print("\nStep 4: Manual SMO training")

    if X.shape[1] >= 2:
        # Use the first two features for the linear SVM demo
        X_demo = X_scaled[:, :2]
        X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
            X_demo, y, test_size=0.2, random_state=42)

        try:
            # Train the linear SVM
            alpha, b = SMO(X_train_demo, y_train_demo, ker=linear_kernel, C=1.0, max_iter=100)

            # Compute the weights and find the support vectors
            sup_idx = alpha > 1e-5
            if np.sum(sup_idx) > 0:  # make sure there are support vectors
                w = np.sum((alpha[sup_idx] * y_train_demo[sup_idx]).reshape(-1, 1)
                           * X_train_demo[sup_idx], axis=0)

                print(f"Number of support vectors: {np.sum(sup_idx)}")
                print(f"Weight vector w = [{w[0]:.4f}, {w[1]:.4f}]")
                print(f"Bias b = {b:.4f}")

                # Plot the decision boundary of the manual SMO model
                plt.figure(figsize=(10, 8))

                # Build the grid
                x_min, x_max = X_train_demo[:, 0].min() - 1, X_train_demo[:, 0].max() + 1
                y_min, y_max = X_train_demo[:, 1].min() - 1, X_train_demo[:, 1].max() + 1
                xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                                     np.arange(y_min, y_max, 0.02))

                # Predicted class at every grid point
                Z = np.sign(xx * w[0] + yy * w[1] + b)

                # Shade the two decision regions
                plt.contourf(xx, yy, Z, alpha=0.3,
                             cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))

                # Training points
                plt.scatter(X_train_demo[y_train_demo == -1, 0],
                            X_train_demo[y_train_demo == -1, 1],
                            color='red', marker='o', label='Training set - class -1')
                plt.scatter(X_train_demo[y_train_demo == 1, 0],
                            X_train_demo[y_train_demo == 1, 1],
                            color='blue', marker='x', label='Training set - class 1')

                # Support vectors
                plt.scatter(X_train_demo[sup_idx, 0], X_train_demo[sup_idx, 1],
                            s=100, facecolors='none', edgecolors='green', linewidth=2,
                            label='Support vectors')

                # Test points
                plt.scatter(X_test_demo[:, 0], X_test_demo[:, 1],
                            marker='s', c=y_test_demo, cmap=ListedColormap(['red', 'blue']),
                            alpha=0.3, s=50, label='Test set')

                # Separating hyperplane: w0 * x + w1 * y + b = 0
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min) / w[1], (-b - w[0] * x_max) / w[1]],
                         'k-', linewidth=2)

                # Margins: w0 * x + w1 * y + b = -1 and = +1
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min - 1) / w[1], (-b - w[0] * x_max - 1) / w[1]],
                         'k--', linewidth=1)
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min + 1) / w[1], (-b - w[0] * x_max + 1) / w[1]],
                         'k--', linewidth=1)

                plt.title('SVM decision boundary from the manual SMO implementation', fontsize=14)
                plt.xlabel('Feature 1', fontsize=12)
                plt.ylabel('Feature 2', fontsize=12)
                plt.legend()
                plt.grid(True, linestyle='--', alpha=0.3)
                plt.xlim(x_min, x_max)
                plt.ylim(y_min, y_max)
                plt.tight_layout()
                plt.show()

                # Predict and evaluate
                y_pred_demo = np.sign(X_test_demo @
                                      w.reshape(-1, 1) + b).flatten()
                accuracy, precision, recall, f1 = calculate_metrics(y_test_demo, y_pred_demo)
                print(f"Manual SMO SVM - accuracy: {accuracy:.4f}, precision: {precision:.4f}, "
                      f"recall: {recall:.4f}, F1: {f1:.4f}")
            else:
                print("SMO found no support vectors; skipping the manual SVM demo")
        except Exception as e:
            print(f"Manual SMO training failed: {e}")
            print("Skipping the manual SVM demo")
    else:
        print("Too few feature dimensions; skipping the manual SVM demo")

    # 5. Kernel comparison
    print("\nStep 5: Kernel comparison")

    # Train scikit-learn SVM models with different kernels
    kernels = ['linear', 'rbf', 'poly', 'sigmoid']
    kernel_names = ['Linear kernel', 'RBF kernel', 'Polynomial kernel', 'Sigmoid kernel']

    fig, axs = plt.subplots(2, 2, figsize=(18, 14))
    axs = axs.flatten()

    models = []
    names = []

    for i, (kernel, name) in enumerate(zip(kernels, kernel_names)):
        try:
            # Kernel-specific parameters
            if kernel == 'linear':
                model = SVC(kernel=kernel, C=1.0, probability=True)
            elif kernel == 'rbf':
                model = SVC(kernel=kernel, C=10.0, gamma=0.1, probability=True)
            elif kernel == 'poly':
                model = SVC(kernel=kernel, C=1.0, degree=3, gamma=0.1, probability=True)
            else:  # sigmoid
                model = SVC(kernel=kernel, C=1.0, gamma=0.1, probability=True)

            # Train the model
            model.fit(X_train, y_train)
            models.append(model)
            names.append(name)

            # Evaluate the model
            y_pred = model.predict(X_test)
            accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)
            print(f"{name} SVM - accuracy: {accuracy:.4f}, precision: {precision:.4f}, "
                  f"recall: {recall:.4f}, F1: {f1:.4f}")

            # Plot the decision boundary
            try:
                plot_decision_boundary_enhanced(
                    X_scaled, y, model,
                    title=f'{name} SVM',
                    ax=axs[i],
                    confidence=True
                )
            except Exception as e:
                print(f"Failed to plot the decision boundary ({name}): {e}")
                axs[i].set_title(f'{name} SVM (plot failed)')
                axs[i].text(0.5, 0.5, 'Failed to plot the decision boundary',
                            ha='center', va='center', transform=axs[i].transAxes,
                            bbox=dict(facecolor='red', alpha=0.1))
        except Exception as e:
            print(f"Model training failed ({name}): {e}")
            axs[i].text(0.5, 0.5, f'Model training failed: {name}',
                        ha='center', va='center', transform=axs[i].transAxes,
                        bbox=dict(facecolor='red', alpha=0.1))

    plt.tight_layout()
    plt.show()

    # 6.
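    # Automatic tuning (step 6)
    print("\nStep 6: Automatic hyperparameter tuning")

    # Decide the dataset type: with at most 2 features the data is probably
    # the linear or the spiral set
    if X.shape[1] <= 2:
        # Simplification: assume linear data
        dataset_type = 'data1'
    else:
        # High-dimensional data: use the generic grid
        dataset_type = None

    best_model, best_params = auto_hyperparameter_tuning(
        X_train, y_train, dataset_type=dataset_type)

    # Show the decision boundary of the best model
    if X.shape[1] >= 2:
        plt.figure(figsize=(10, 8))
        try:
            plot_decision_boundary_enhanced(
                X_scaled, y, best_model,
                title=f'Best SVM model ({best_model.kernel})',
                confidence=True,
                show_margin=True
            )
            plt.show()
        except Exception as e:
            print(f"Failed to plot the best model's decision boundary: {e}")

    # 7. Effect of the parameters on performance
    print("\nStep 7: Effect of the parameters on performance")

    # Only shown when there are enough data points and no missing values
    if X_train.shape[0] > 20 and not np.isnan(X_train).any():
        results_df = visualize_metrics_over_C_gamma(
            X_train, y_train, X_test, y_test, kernel=best_model.kernel)
    else:
        print("Too little data or missing values; skipping the parameter visualization")

    # 8. Learning curves
    print("\nStep 8: Learning curve analysis")

    # Only shown when there is enough data
    if X.shape[0] > 50 and not np.isnan(X).any():
        plot_learning_curve(X_scaled, y, kernels=['linear', 'rbf'])
    else:
        print("Too little data or missing values; skipping the learning curve analysis")

    # 9. 3D visualization
    print("\nStep 9: Building the 3D visualization")

    # Check that the data is suitable for a 3D plot
    if not np.isnan(X_scaled).any() and len(X) > 10:
        try:
            fig_3d = create_3d_visualization_advanced(
                X_scaled, y,
                method='pca',
                model=best_model,
                title_suffix='(PCA)'
            )
            fig_3d.show()
        except Exception as e:
            print(f"Failed to create the 3D visualization: {e}")
    else:
        print("Data is not suitable for 3D visualization; skipping this step")

    # 10. Animated visualization
    print("\nStep 10: Creating the decision boundary animation")

    # Check that the data is suitable for the animation
    if not np.isnan(X_scaled).any() and len(X) > 10 and X.shape[1] >= 2:
        try:
            # Keep only the models that can predict
            valid_models = []
            valid_names = []
            for model, name in zip(models, names):
                try:
                    model.predict(X_test[:1])
                    valid_models.append(model)
                    valid_names.append(name)
                except Exception:
                    continue

            if valid_models:
                # Add the best model
                if best_model not in valid_models:
                    valid_models.append(best_model)
                    valid_names.append('Best model')

                # Create the animation
                anim_fig = create_animated_decision_boundary(
                    X_scaled, y, valid_models, valid_names)
                if anim_fig:
                    anim_fig.show()
            else:
                print("No valid model available for the animation")
        except Exception as e:
            print(f"Failed to create the animation: {e}")
    else:
        print("Data is not suitable for the animation; skipping this step")

    # 11.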
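    # Comprehensive performance report (step 11)
    print("\nStep 11: Building the comprehensive performance report")

    # Collect all models
    all_models = models.copy()
    all_names = names.copy()

    # Add the best model
    if best_model not in all_models:
        all_models.append(best_model)
        all_names.append('Best model')

    # Build the performance report
    performance_fig, performance_results = create_comprehensive_performance_report(
        all_models, X_test, y_test, all_names)
    if performance_fig:
        performance_fig.show()

    # 12. Summary
    print("\n=== All demos complete ===")
    print(f"Best model parameters: {best_params}")
    print("Data processing, model training, visualization, and evaluation are all done")


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        print(f"Program failed: {e}")
        print("Falling back to a simplified demo of the basic features...")

        # Simplified demo
        print("\nSimplified demo:")
        try:
            # Try to load the CSV
            X, y = load_csv_with_specific_columns('linear.csv')

            # Handle missing values
            if np.isnan(X).any() or np.isnan(y).any():
                imputer = SimpleImputer(strategy='mean')
                X = imputer.fit_transform(X)

                # Drop the samples whose label is missing
                valid_indices = ~np.isnan(y)
                if not all(valid_indices):
                    X = X[valid_indices]
                    y = y[valid_indices]
        except Exception:
            X, y = generate_linear_data()

        # Simple 2D visualization
        plt.figure(figsize=(8, 6))
        if X.shape[1] >= 2:
            colors = ['red' if label == -1 else 'blue' for label in y]
            plt.scatter(X[:, 0], X[:, 1], c=colors, alpha=0.7)
        else:
            plt.scatter(X[:, 0], np.zeros_like(X[:, 0]),
                        c=['red' if label == -1 else 'blue' for label in y],
                        alpha=0.7)

        plt.title('Data visualization')
        plt.xlabel('Feature 1')
        plt.ylabel('Feature 2' if X.shape[1] >= 2 else '')
        plt.legend(['Negative class (-1)', 'Positive class (1)'])
        plt.grid(True, linestyle='--', alpha=0.5)
        plt.show()

        # Simple SVM model
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42)
        try:
            model = SVC(kernel='linear')
            model.fit(X_train, y_train)
            print(f"Model accuracy: {model.score(X_test, y_test):.4f}")
        except Exception as e:
            print(f"Model training failed: {e}")

IV. What I Gained

The support vector machine is a powerful and elegant algorithm that combines advanced mathematical ideas such as optimization theory, convex analysis, and kernel methods with a practical classifier. Through this experiment I not only learned the theory and implementation of the SVM but, more importantly, connected theory with practice and strengthened my ability to analyze problems and build complex systems. In particular, the special handling required for the label column made me realize that in real applications an algorithm often has to be customized and adjusted to the specific business need. My takeaways from this experiment are listed below.

(1) Combining theory and practice

SVM theory looks very abstract in the textbook, especially where it involves Lagrange multipliers, the dual problem, and the KKT conditions. By implementing the SMO algorithm myself, I came to understand what these ideas mean in practice:

1. An intuitive feel for the maximum margin: visualizing the decision boundary showed me how the SVM maximizes the margin while keeping the classification correct, which made the abstract optimization objective concrete.
2. Why the dual problem matters: I used to know only that the SVM is solved through its dual, without understanding why. While coding it I saw that the dual form depends on the training data only through inner products, which makes it convenient to solve and is exactly what makes the kernel trick possible.
3. The role of support vectors: most training points end up with a Lagrange multiplier of zero, and only a few support vectors actually shape the decision boundary. This sparsity keeps prediction cheap and helps the model generalize.

(2) Kernel choice and its effect

The experiment compared several kernels (linear, RBF, polynomial, sigmoid) on different kinds of data:

1. Linear kernel: performs very well on linearly separable data, with a simple model and fast computation, but cannot find a useful decision boundary on complex data.
2. RBF kernel: the most adaptable and able to handle many complex patterns, but hard to tune. The γ parameter in particular has a strong effect: too small a value leads to underfitting, too large a value easily leads to overfitting.
3. Polynomial kernel: does well on certain specific problems, but is computationally expensive and numerically less stable. The degree must be chosen with care.
4. Sigmoid kernel: interesting in theory, but in practice it often performs worse than the other kernels and is harder to tune.

The 3D visualization and the animation showed clearly how each kernel builds its decision boundary in feature space, which greatly deepened my understanding of what kernel methods really do.

(3) The importance of data processing

This project paid particular attention to CSV handling, especially the special treatment of the label column, which showed me how much data preprocessing matters for a machine learning model:

1. Missing values: filling feature columns with the mean is common practice, but the label column needs a more careful strategy.
2. Mapping strings to numbers: designing a mapping that preserves the meaning of the original data while meeting the algorithm's requirements is a key challenge in real applications.
3. Why standardization is necessary: without standardization, some features can dominate the model's decisions. The experiment showed clearly how much standardization affects SVM performance.

(4) The value of visualization

Interactive visualization is not just attractive; it is a powerful tool for understanding and debugging a model:

1. Decision boundary visualization: by plotting the decision boundary and the support vectors, I could judge at a glance whether the model was overfitting or underfitting.
2. Parameter analysis: the 3D charts showed how C and γ affect model performance and helped me tune them in a more targeted way.
3. Dimensionality reduction: using PCA and t-SNE for 3D visualization helped me understand the structure of high-dimensional data and how the model behaves in the actual feature space.
4. Animation: showing the decision boundary change across kernels gives more information than a static chart.

(5) What I learned from writing the Python code

From a coding point of view, the project also taught me a lot:

1. Modular design: splitting a complex system into data access, algorithm implementation, visualization, and automatic tuning modules greatly improved readability and maintainability.
2. Error handling: in real data processing, exceptional cases are far more common than expected. Thorough error handling and fallback strategies kept the system running reliably.
3. Algorithmic efficiency: implementing SMO showed me the importance of algorithmic optimization, especially techniques such as heuristic variable selection and precomputing the kernel matrix.
4. Interactive design: designing an interactive interface is much more complex than plain data processing, but the improvement in user experience is significant.

(6) Directions for future improvement

1. More kernels: implement additional special-purpose kernels such as the chi-square kernel and the wave kernel, and explore how they perform on specific problems.
2. A better SMO: the current implementation is a simplified SMO. A complete heuristic variable-selection strategy could be added to speed up convergence further.
3. Multi-class extension: use a one-vs-one or one-vs-all strategy to extend the SVM to multi-class problems.
4. Ensemble learning: use the SVM as a base learner and explore ensemble methods such as SVM bagging or multi-kernel fusion.
5. Online learning: explore incremental SVM algorithms so that the model can handle streaming data.

V. My Reflections

Although the support vector machine has been overshadowed in recent years by the deep learning boom, it remains a cornerstone of machine learning and has irreplaceable value in many scenarios. This experiment deepened my understanding of machine learning, built up my ability to solve practical problems, and increased my interest in artificial intelligence algorithms. It was a very valuable learning experience.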
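
The observations above about standardization and the γ parameter can be illustrated with a small, self-contained sketch. It is not part of the experiment code: the toy rows, the column meanings (trip miles, trip seconds), and the helper names `rbf_kernel` and `standardize` are assumptions made only for illustration. It evaluates the RBF kernel K(x, z) = exp(-γ‖x − z‖²) on raw and on z-scored features using the standard library only.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def standardize(rows):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy rows: (trip miles, trip seconds). Not taken from the dataset.
rows = [[1.0, 300.0], [1.2, 900.0], [8.0, 320.0], [7.5, 1500.0]]
gamma = 0.1  # illustrative value only

# Raw features: the seconds column is hundreds of times larger than the miles
# column, so squared distances are huge and the kernel collapses towards 0.
raw_same_miles = rbf_kernel(rows[0], rows[1], gamma)    # similar miles
raw_same_seconds = rbf_kernel(rows[0], rows[2], gamma)  # similar seconds

# Standardized features: both columns now contribute on a comparable scale.
scaled = standardize(rows)
scaled_same_miles = rbf_kernel(scaled[0], scaled[1], gamma)
scaled_same_seconds = rbf_kernel(scaled[0], scaled[2], gamma)

print(f"raw:    {raw_same_miles:.3e}  {raw_same_seconds:.3e}")
print(f"scaled: {scaled_same_miles:.3f}  {scaled_same_seconds:.3f}")
```

On the raw rows both kernel values are practically zero, so every point looks unrelated to every other point at this γ, which is the same effect as choosing γ far too large. After z-scoring, the kernel values fall in a usable range and both features influence the similarity.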