Effect on normalization of higher-level information appearing late in an address #165

@Borber

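Removing higher-level information that appears late in the address greatly improves the similarity score. Could the author optimize for this kind of case?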

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#()2(单元)2()203()
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=null, 
	roadNum=null, 
	buildingNum=A-32, 
	text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典dic/region.dic
加载扩展词典dic/community.dic
加载扩展停止词典dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(), Term(), Term(), Term(项目), Term(地块), Term(11#), Term(), Term(22203), Term(), Term(单元), Term(), Term()], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64], 
	similarity=0.4886777774252209
)

After removing the second 海口市 (Haikou City):

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#()2(单元)2()203()
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4, 
	buildingNum=A-32, 
	text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典dic/region.dic
加载扩展词典dic/community.dic
加载扩展停止词典dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(), Term(), Term(), Term(项目), Term(地块), Term(11#), Term(), Term(22203), Term(), Term(单元), Term(), Term()], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4), roadNumValue=4), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f], 
	similarity=0.7152705001057788
)
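
As a possible workaround until the library handles this case, the late repeated region name could be stripped before normalization. The following is only a minimal sketch, not the library's API: `LateRegionStripper` and `stripRepeatedRegions` are hypothetical names, and the region names to check are assumed to be supplied by the caller (for example, taken from the province and city of a first parsing pass).

```java
// Hypothetical helper, not part of the geocoding library.
final class LateRegionStripper {

    private LateRegionStripper() {
    }

    /**
     * Keeps the first occurrence of each given region name and removes
     * every later occurrence of that name from the address.
     * Assumption: the caller already knows which region names to check.
     */
    static String stripRepeatedRegions(String address, String... regionNames) {
        String result = address;
        for (String name : regionNames) {
            if (name == null || name.isEmpty()) {
                continue; // nothing to match against
            }
            int first = result.indexOf(name);
            if (first < 0) {
                continue; // this region name does not occur at all
            }
            int tailStart = first + name.length();
            // Keep everything up to and including the first occurrence,
            // then drop later repeats from the remainder.
            String head = result.substring(0, tailStart);
            String tail = result.substring(tailStart).replace(name, "");
            result = head + tail;
        }
        return result;
    }
}
```

Calling `LateRegionStripper.stripRepeatedRegions(t1, "海南省", "海口市")` on the first `t1` above yields the second `t1`, i.e. the input that scored 0.7152705001057788. This is a plain string match, so it would also remove a region name that legitimately recurs inside a road or community name; it is meant to illustrate the idea rather than to be a complete fix.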
