Python implements breadth-first search BFS and heuristic search algorithm for path query of two subway stations
Operation ideas and processes:
- Get data from webpage
- process the data
- Data Mapping
- Search Algorithm Design
data collection
Required Package :
requests
#Prepare a link, grab the page information through requests, and print
r = requests . get ( 'http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json' )
r . encoding = r.apparent_encoding print ( r.text
) _ _ _ _
- 1
- 2
- 3
- 4
Part of the data is as follows:
Through the returned information [a series of dictionary types] we can find some key-value pair data that seems to be what we want [site name
,Longitude and latitude coordinates
Wait]
Data processing and extraction
Required Packages:
re
,numpy
,networkx
Before extracting, we need to think about what data we want to get?
- All line information
- Dictionary stores dict: key: line name [string]; value: station name [list]
- All site information
- Dictionary stores dict: key: site name [string]; value: site coordinates [tuple]
- 1
- 2
- 3
- 4
- 5
- 6
def get_lines_stations_info ( text ) :
# Traverse text format data to form a location data structure
# dict of all line information: key: line name; value: station name list
lines_info = { }
# dict of all station information: key: station name; value : station coordinates (x,y)
stations_info = { }
pattern = re . compile ( '"st".*?"kn"' )
lines_list = pattern . findall ( text )
for i in range ( len ( lines_list ) ) :
# subway line names
pattern = re . compile ( '"ln ":".*?"' )
line_name = pattern . findall ( lines_list [ i ] ) [ 0] [ 6 : - 1 ]
# station info list
pattern = re . compile ( '"rs".*?"sp"' )
temp_list = pattern . findall ( lines_list [ i ] )
station_name_list = [ ]
for j in range ( len ( temp_list ) ) :
# station name
pattern = re . compile ('"n":".*?"' )
station_name = pattern . findall ( temp_list [ j ] ) [ 0 ] [ 5 : - 1 ]
station_name_list . append ( station_name )
# coordinates(x,y)
pattern = re . compile ( '"sl":".*?"' )
#Convert the format
position = tuple ( map ( float , pattern.findall ( temp_list [ j ] ) [ 0 ] [ 6 : - 1 ] . split ( ' ,' ) ) ) # data join station info dict
stations_info [ station_name ] = position
# Add data to subway lines dict
lines_info [ line_name ] = station_name_list
return lines_info , stations_infos
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
After each function is written, test it as follows:
lines_info , stations_info = get_lines_stations_info ( r . text ) print ( lines_info ) #line info print ( stations_info ) #station info
- 1
- 2
- 3
data visualization
import networkx as nx
# This code solves the problem of not displaying Chinese characters Set the font
matplotlib . rcParams [ 'font.sans-serif' ] = [ 'SimHei' ]
# Specify the width and height of the figure
plt . figure ( figsize = ( 20 , 20 ) )
# define an undirected graph
stations_graph = nx . Graph ( )
# add nodes
stations_graph . add_nodes_from ( list ( stations_info . keys ( )) )
#plot
nx .draw ( stations_graph , stations_info , with_labels = True , node_size = 6 ) # show
plt .show ( ) _
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
Creation of graphs
There are generally two storage formats for graphs:adjacency matrix
andadjacency list
This time, the adjacency list is used to build the map:
#Add sta2 to the value of the dictionary whose key is sta1
def add_sta2tosta1 ( info , sta1 , sta2 ) :
list_tmp = info . get ( sta1 )
if not list_tmp :
list_tmp = [ ]
list_tmp . append ( sta2 )
info [ sta1 ] = list_tmp
return info
#Create an adjacency table based on line information
def create_map ( lines_info ) :
#Adjacency list: dictionary form key: station name value: adjacent station name list
neighbor_info = { } #Create an empty dictionary
for line_name , station_list in lines_info . items ( ) :
for i in range ( len ( station_list ) - 1 ) :
sta_one = station_list [ i ]
sta_two = station_list [ i + 1 ]
# Both parties must add each other to add information
neighbor_info = add_sta2tosta1 ( neighbor_info , sta_one , sta_two )
neighbor_info = add_sta2tosta1 ( neighbor_info , sta_two , sta_one )
return neighbor_info
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
Test the function:
nb_info = create_map ( lines_info ) print ( nb_info )
- 1
- 2
With the adjacency list, we can search and traverse the map. Before that, we add a polyline to the original graph according to the adjacency list information.
stations_connection_graph = nx . Graph ( nb_info )
plt . figure ( figsize = ( 30 , 30 ) )
nx . draw ( stations_connection_graph , stations_info , with_labels = True , node_size = 6 )
plt . show ( )
- 1
- 2
- 3
- 4
- 5
Thank you for being picturedugly
to ^ ^
Search Algorithm Design
Method 1: Non-heuristic searchDFS, BFS
If we want to find a shortest path, we can use
DFS
? The answer is no.
- When the purpose of solving is to find the optimal solution, use
BFS
.- When the solution goal is to just find the solution, and the memory size is limited, use
DFS
.
#BFD finds the shortest path
def BFS ( lines_info , neighbor_info , start_station , end_station ) :
# Search strategy: take the number of stations as the cost (because the ticket price is calculated by station)
# In this case, the coordinate distance between stations is difficult to convert into Reliable heuristic function, so only use a simple BFS algorithm
# Since each deep layer is the cost plus 1, the cost of each layer is the same, and there is no difference between counting and not, so omit #Input
information error
if ( not neighbor_info . get ( start_station ) ) or ( not neighbor_info . get ( end_station ) ) :
return None #Search
route: key: site name value: path from start_station to key
paths = { } #Add
initial site
paths [ start_station ] = [ start_station ]
while True : #Temporary
variable: store the node to be traversed in the next layer key: site name value: path from start_station to key
paths_tmp = { }
#traverse All nodes to be traversed this time
for ( k , val ) in paths . items ( ) : #Get
a list of adjacent nodes of the current node
neighbors = neighbor_info [ k ] . copy ( )
if ( len ( val ) >= 2 ) :
pre_station = val [ - 2 ]
# Do not go to the previous station
neighbors . remove ( pre_station )
# Traverse the adjacent nodes
for station in neighbors :
# If you have passed the
if station in paths :
continue #Add
the node
path = val . copy ( )
path .append ( station )
# enqueue
paths_tmp [ station ] = path
if station == end_station :
return path
# update queue
paths = paths_tmp
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
test:
print ( BFS ( lines_info , nb_info , 'Chaoyangmen' , 'Wangjing West' ) )
- 1
['Chaoyangmen', 'Dongshitiao', 'Dongzhimen', 'Liufang', 'Guangximen', 'Shaoyaoju', 'Wangjingxi']
Method 2: Heuristic search for the shortest distance first
#Heuristic search with path distance as cost
def get_path_Astar ( lines_info , neighbor_info , stations_info , from_station , to_station ) :
# Search strategy: The cost is the accumulation of the straight-line distance between the stations of the path, and the straight-line distance from the current station to the target is the heuristic function
# Check the input station name
if not neighbor_info . get ( from_station ) :
print ( 'The starting station "%s" does not exist. Please enter it correctly!' % from_station )
return None
if not neighbor_info . get ( to_station ) :
print ( 'Destination Station '%s' does not exist. Please enter it correctly!' % to_station )
return None
# Calculate the straight-line distance from all nodes to the target node, alternate
distances = { }
x , y = stations_info . get ( to_station )
for ( k , v ) in stations_info . items ( ) :
x0 , y0 = stations_info . get ( k )
l = ( ( x - x0 ) ** 2 + ( y -y0 ) ** 2 ) ** 0.5
distances [ k ] = l
# Searched nodes, dict
# key=site name, value is the minimum cost from a known origin to this site
searched = { }
searched [ from_station ] = 0
# The data structure is a pandas dataframe
# index is the station name
# g is the traveled path, h is the heuristic function value (the current straight-line distance to the target)
nodes = pd . DataFrame ( [ [ [ from_station ] , 0 , 0 , distances . get ( from_station ) ] ] ,
index = [ from_station ] , columns = [ 'path' , 'cost' , 'g' , 'h' ] )
print (nodes )
count = 0
while True :
if count > 1000 :
break
nodes . sort_values ( 'cost' , inplace = True )
#index: the name of the site with the smallest cost function node: some other attribute information
for index , node in nodes . iterrows ( ) :
#Number of iterations +1
count += 1
#Search neighbors to the station with the shortest distance from the destination: neighbors of the node closest to the destination
neighbors = neighbor_info . get (index ) . copy ( )
if len ( node [ 'path' ] ) >= 2 :
# Do not search for
neighbors in the reverse direction of this path . remove ( node [ 'path' ] [ - 2 ] )
for i in range ( len ( neighbors ) ) :
count += 1
neighbor = neighbors [ i ]
# g : the path already traveled + the distance
g = node [ 'g' ] + get_distance ( stations_info , index , neighbor )
# h : the distance from the current node to the target node
h = distances [ neighbor ]
# cost Calculate the estimated Hasan household
cost = g + h
#Add the current node to the shortest distance path index
path = node [ 'path' ] . copy ( )
path .append ( neighbor )
# find the target, end
if neighbor == to_station :
print ( 'retrieved %d times in total.' % count )
return path
#if the current node has been searched
if neighbor in searched :
# and now searched The path is not optimal, ignore
if g >= searched [ neighbor ] :
continue
else : #Otherwise
modify g information
searched [ neighbor ] = g
# Modify the node information corresponding to this site [delete first, then add]
nodes . drop ( neighbor , axis = 0 , inplace = True ) #inplace=True means to delete the original data
row = pd . DataFrame ( [ [ path , cost , g , h ] ] , index = [ neighbor ] , columns = [ 'path' , 'cost' , 'g', 'h' ] )
nodes = nodes . append ( row )
else :
# mark if not
searched [ neighbor ] = g
row = pd . DataFrame ( [ [ path , cost , g , h ] ] , index = [ neighbor ] , columns = [ 'path' , 'cost' , 'g' , 'h' ] )
nodes = nodes . append ( row )
# All neighbors of this site have been searched, delete this node
nodes . drop ( index, axis = 0 , inplace = True )
# The outer for loop only runs the first line of data, and then re-sort and then calculate the
continue
print ( 'Failed to find path' )
return None
#Get the distance between the two
def get_distance ( stations_info , str1 , str2 ) :
x1 , y1 = stations_info . get ( str1 )
x2 , y2 = stations_info . get ( str2 )
return ( ( x1 - x2 ) ** 2 + ( y1 - y2 ) ** 2 ) ** 0.5
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
test:
paths = get_path_Astar ( lines_info , nb_info , stations_info , 'Chaoyangmen' , 'Wangjing West' ) if paths : print ( "The path totals %d stations." % ( len ( paths ) - 1 ) ) print ( "- " .join ( paths ) )
- 1
- 2
- 3
- 4
A total of 122 searches were performed.
The route has a total of 6 stops.
Chaoyangmen-Dongshitiao-Dongzhimen-Liufang-Guangximen-Shaoyaoju-Wangjing West
Main function code:
if __name__ == '__main__' :
# get data
r = requests . get ( 'http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json' )
r . encoding = r . apparent_encoding
# data extraction and Processing
lines_info , stations_info = get_lines_stations_info ( r . text )
#print(lines_info)
#print(stations_info)
# Draw a subway map
matplotlib . rcParams [ 'font.sans-serif' ] = [ 'SimHei' ] # This code solves the problem of not displaying Chinese characters
plt . figure ( figsize = ( 20 , 20 ) )
stations_graph = nx . Graph ( )
stations_graph . add_nodes_from ( list ( stations_info . keys ( ) ) )
nx . draw (stations_graph , stations_info , with_labels = True , node_size = 6 )
#plt.show()
#Create a map [connection table/connection matrix]
nb_info = create_map ( lines_info )
#print(nb_info)
stations_connection_graph = nx .Graph ( nb_info )
plt . figure ( figsize = ( 30 , 30 ) )
nx . draw ( stations_connection_graph , stations_info , with_labels = True , node_size = 6 ) #plt.show( )
print ( BFS ( lines_info , nb_info , 'Chaoyangmen' , 'Wangjing West' ) )
paths = get_path_Astar ( lines_info , nb_info , stations_info , 'Chaoyangmen' , 'Wangjing West' )
if paths :
print ( "The path totals %d stations ." % ( len ( paths ) - 1 ) )
print ( "-" . join ( paths ) )
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31