Python implements breadth-first search BFS and heuristic search algorithm for path query of two subway stations

Operation ideas and processes:

  • Get data from webpage
  • process the data
  • Data Mapping
  • Search Algorithm Design

data collection

Required Package :requests

#Prepare a link, grab the page information through requests, and print 
r = requests . get ( 'http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json' ) 
r . encoding = r.apparent_encoding print ( r.text
 ) _ _ _ _
  • 1
  • 2
  • 3
  • 4

Part of the data is as follows:

insert image description here

Through the returned information [a series of dictionary types] we can find some key-value pair data that seems to be what we want [site name,Longitude and latitude coordinatesWait]

Data processing and extraction

Required Packages:re,numpy,networkx

Before extracting, we need to think about what data we want to get?
- All line information
   - Dictionary stores dict: key: line name [string]; value: station name [list]
   
- All site information
   - Dictionary stores dict: key: site name [string]; value: site coordinates [tuple]
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
def  get_lines_stations_info ( text ) : 
    # Traverse text format data to form a location data structure 
    # dict of all line information: key: line name; value: station name list 
    lines_info =  { } 
    # dict of all station information: key: station name; value : station coordinates (x,y) 
    stations_info =  { }

    pattern = re . compile ( '"st".*?"kn"' ) 
    lines_list = pattern . findall ( text ) 
    for i in  range ( len ( lines_list ) ) : 
        # subway line names 
        pattern = re . compile ( '"ln ":".*?"' ) 
        line_name = pattern . findall ( lines_list [ i ] ) [ 0] [ 6 : - 1 ] 
        # station info list 
        pattern = re . compile ( '"rs".*?"sp"' ) 
        temp_list = pattern . findall ( lines_list [ i ] ) 
        station_name_list =  [ ] 
        for j in  range ( len ( temp_list ) ) : 
            # station name 
            pattern = re . compile ('"n":".*?"' ) 
            station_name = pattern . findall ( temp_list [ j ] ) [ 0 ] [ 5 : - 1 ] 
            station_name_list . append ( station_name ) 
            # coordinates(x,y) 
            pattern = re . compile ( '"sl":".*?"' ) 
            #Convert the format 
            position =  tuple ( map ( float , pattern.findall ( temp_list [ j ] ) [ 0 ] [ 6 : - 1 ] . split ( ' ,' ) ) ) # data join station info dict 
            stations_info [ station_name ] = position
             

        # Add data to subway lines dict 
        lines_info [ line_name ]  = station_name_list
     return lines_info , stations_infos
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32

After each function is written, test it as follows:

lines_info , stations_info = get_lines_stations_info ( r . text ) 
 print ( lines_info ) #line   info 
 print ( stations_info ) #station  info
  • 1
  • 2
  • 3

insert image description here

insert image description here

data visualization

import networkx as nx
 # This code solves the problem of not displaying Chinese characters Set the font 
matplotlib . rcParams [ 'font.sans-serif' ]  =  [ 'SimHei' ] 
# Specify the width and height of the figure 
plt . figure ( figsize = ( 20 ,  20 ) ) 
# define an undirected graph 
stations_graph = nx . Graph ( ) 
# add nodes 
stations_graph . add_nodes_from ( list ( stations_info . keys ( )) ) 
#plot 
nx .draw ( stations_graph , stations_info , with_labels = True , node_size = 6 ) # show 
plt .show ( ) _

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13

insert image description here

Creation of graphs

There are generally two storage formats for graphs:adjacency matrixandadjacency list

This time, the adjacency list is used to build the map:

#Add sta2 to the value of the dictionary whose key is sta1 
def  add_sta2tosta1 ( info , sta1 , sta2 ) : 
    list_tmp = info . get ( sta1 ) 
    if  not list_tmp : 
        list_tmp =  [ ] 
    list_tmp . append ( sta2 ) 
    info [ sta1 ]  = list_tmp
     return info
 #Create an adjacency table based on line information 
def  create_map ( lines_info ) :
    #Adjacency list: dictionary form key: station name value: adjacent station name list 
    neighbor_info =  { } #Create  an empty dictionary 
    for line_name , station_list in lines_info . items ( ) : 
        for i in  range ( len ( station_list )  -  1 ) : 
            sta_one = station_list [ i ] 
            sta_two = station_list [ i + 1 ] 
            # Both parties must add each other to add information
            neighbor_info = add_sta2tosta1 ( neighbor_info , sta_one , sta_two ) 
            neighbor_info = add_sta2tosta1 ( neighbor_info , sta_two , sta_one ) 
    return neighbor_info
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20

Test the function:

nb_info = create_map ( lines_info ) 
print ( nb_info )
  • 1
  • 2

insert image description here

With the adjacency list, we can search and traverse the map. Before that, we add a polyline to the original graph according to the adjacency list information.

stations_connection_graph = nx . Graph ( nb_info ) 
plt . figure ( figsize = ( 30 ,  30 ) ) 
nx . draw ( stations_connection_graph , stations_info , with_labels = True , node_size = 6 ) 
plt . show ( )

  • 1
  • 2
  • 3
  • 4
  • 5

Thank you for being pictureduglyto ^ ^

insert image description here

Search Algorithm Design

Method 1: Non-heuristic searchDFS, BFS

If we want to find a shortest path, we can useDFS? The answer is no.

  • When the purpose of solving is to find the optimal solution, useBFS.
  • When the solution goal is to just find the solution, and the memory size is limited, useDFS.
#BFD finds the shortest path 
def  BFS ( lines_info , neighbor_info , start_station , end_station ) : 
    # Search strategy: take the number of stations as the cost (because the ticket price is calculated by station) 
    # In this case, the coordinate distance between stations is difficult to convert into Reliable heuristic function, so only use a simple BFS algorithm 
    # Since each deep layer is the cost plus 1, the cost of each layer is the same, and there is no difference between counting and not, so omit #Input 
    information error 
    if  ( not neighbor_info . get ( start_station ) )  or  ( not neighbor_info . get ( end_station ) ) : 
        return  None #Search 
    route: key: site name value: path from start_station to key
    paths =  { } #Add 
    initial site 
    paths [ start_station ]  =  [ start_station ] 
    while  True : #Temporary 
        variable: store the node to be traversed in the next layer key: site name value: path from start_station to key 
        paths_tmp =  { } 
        #traverse All nodes to be traversed this time 
        for  ( k , val )  in paths . items ( ) : #Get 
            a list of adjacent nodes of the current node 
            neighbors = neighbor_info [ k ] . copy ( )
            if  ( len ( val )  >=  2 ) : 
                pre_station = val [ - 2 ] 
                # Do not go to the previous station 
                neighbors . remove ( pre_station ) 
            # Traverse the adjacent nodes 
            for station in neighbors : 
                # If you have passed the 
                if station in paths : 
                    continue #Add 
                the node 
                path = val . copy ( ) 
                path .append ( station ) 
                # enqueue 
                paths_tmp [ station ]  = path
                 if station == end_station : 
                    return path
         # update queue 
        paths = paths_tmp
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37

test:

print ( BFS ( lines_info , nb_info , 'Chaoyangmen' , 'Wangjing West' ) )
  • 1

['Chaoyangmen', 'Dongshitiao', 'Dongzhimen', 'Liufang', 'Guangximen', 'Shaoyaoju', 'Wangjingxi']

Method 2: Heuristic search for the shortest distance first

#Heuristic search with path distance as cost 
def  get_path_Astar ( lines_info , neighbor_info , stations_info , from_station , to_station ) : 
    # Search strategy: The cost is the accumulation of the straight-line distance between the stations of the path, and the straight-line distance from the current station to the target is the heuristic function

    # Check the input station name 
    if  not neighbor_info . get ( from_station ) : 
        print ( 'The starting station "%s" does not exist. Please enter it correctly!' % from_station ) 
        return  None 
    if  not neighbor_info . get ( to_station ) : 
        print ( 'Destination Station '%s' does not exist. Please enter it correctly!' % to_station ) 
        return  None
    
    # Calculate the straight-line distance from all nodes to the target node, alternate 
    distances =  { } 
    x , y = stations_info . get ( to_station ) 
    for  ( k , v )  in stations_info . items ( ) : 
        x0 , y0 = stations_info . get ( k ) 
        l =  ( ( x - x0 ) ** 2  +  ( y -y0 ) ** 2 ) ** 0.5 
        distances [ k ]  = l
        
    # Searched nodes, dict 
    # key=site name, value is the minimum cost from a known origin to this site 
    searched =  { } 
    searched [ from_station ]  =  0
    
    # The data structure is a pandas dataframe 
    # index is the station name 
    # g is the traveled path, h is the heuristic function value (the current straight-line distance to the target) 
    nodes = pd . DataFrame ( [ [ [ from_station ] ,  0 ,  0 , distances . get ( from_station ) ] ] , 
                         index = [ from_station ] , columns = [ 'path' ,  'cost' ,  'g' ,  'h' ] ) 
    print (nodes )
    
    count =  0 
    while  True : 
        if count >  1000 : 
            break 
        nodes . sort_values ( 'cost' , inplace = True ) 
        #index: the name of the site with the smallest cost function node: some other attribute information 
        for index , node in nodes . iterrows ( ) : 
            #Number of iterations +1 
            count +=  1 
            #Search neighbors to the station with the shortest distance from the destination: neighbors of the node closest to the destination 
            neighbors = neighbor_info . get (index ) . copy ( ) 
            if  len ( node [ 'path' ] )  >=  2 : 
                # Do not search for 
                neighbors in the reverse direction of this path . remove ( node [ 'path' ] [ - 2 ] ) 
            for i in  range ( len ( neighbors ) ) : 
                count +=  1 
                neighbor = neighbors [ i ]
                # g : the path already traveled + the distance 
                g = node [ 'g' ]  + get_distance ( stations_info , index , neighbor ) 
                # h : the distance from the current node to the target node 
                h = distances [ neighbor ] 
                # cost Calculate the estimated Hasan household 
                cost = g + h
                 #Add the current node to the shortest distance path index 
                path = node [ 'path' ] . copy ( ) 
                path .append ( neighbor ) 
                # find the target, end 
                if neighbor == to_station : 
                    print ( 'retrieved %d times in total.' % count ) 
                    return path
                 #if the current node has been searched 
                if neighbor in searched : 
                    # and now searched The path is not optimal, ignore 
                    if g >= searched [ neighbor ] : 
                        continue 
                    else : #Otherwise 
                        modify g information 
                        searched [ neighbor ]  = g
                        # Modify the node information corresponding to this site [delete first, then add] 
                        nodes . drop ( neighbor , axis = 0 , inplace = True ) #inplace=True means to delete the original data 
                        row = pd . DataFrame ( [ [ path , cost , g , h ] ] , index = [ neighbor ] , columns = [ 'path' ,  'cost' ,  'g',  'h' ] ) 
                        nodes = nodes . append ( row ) 
                else : 
                    # mark if not 
                    searched [ neighbor ]  = g
                    row = pd . DataFrame ( [ [ path , cost , g , h ] ] , index = [ neighbor ] , columns = [ 'path' ,  'cost' ,  'g' ,  'h' ] ) 
                    nodes = nodes . append ( row ) 
            # All neighbors of this site have been searched, delete this node 
            nodes . drop ( index, axis = 0 , inplace = True )

        # The outer for loop only runs the first line of data, and then re-sort and then calculate the 
        continue     
        
    print ( 'Failed to find path' ) 
    return  None

#Get the distance between the two 
def  get_distance ( stations_info , str1 , str2 ) : 
    x1 , y1 = stations_info . get ( str1 ) 
    x2 , y2 = stations_info . get ( str2 ) 
    return  ( ( x1 - x2 ) ** 2  +  ( y1 - y2 ) ** 2 ) **  0.5
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93

test:

paths = get_path_Astar ( lines_info , nb_info , stations_info ,  'Chaoyangmen' ,  'Wangjing West' ) 
if paths : 
 print ( "The path totals %d stations." % ( len ( paths ) - 1 ) ) 
 print ( "- " .join ( paths ) )
  • 1
  • 2
  • 3
  • 4

A total of 122 searches were performed.
The route has a total of 6 stops.
Chaoyangmen-Dongshitiao-Dongzhimen-Liufang-Guangximen-Shaoyaoju-Wangjing West

Main function code:

if __name__ ==  '__main__' : 
    # get data 
    r = requests . get ( 'http://map.amap.com/service/subway?_1469083453978&srhdata=1100_drw_beijing.json' ) 
    r . encoding = r . apparent_encoding
     # data extraction and Processing 
    lines_info , stations_info = get_lines_stations_info ( r . text ) 
    #print(lines_info) 
    #print(stations_info)

    # Draw a subway map 
    matplotlib . rcParams [ 'font.sans-serif' ]  =  [ 'SimHei' ] # This code solves the problem of not displaying Chinese characters 
    plt . figure ( figsize = ( 20 ,  20 ) ) 
    stations_graph = nx . Graph ( ) 
    stations_graph . add_nodes_from ( list ( stations_info . keys ( ) ) ) 
    nx . draw (stations_graph , stations_info , with_labels = True , node_size = 6 ) 
    #plt.show()

    #Create a map [connection table/connection matrix] 
    nb_info = create_map ( lines_info ) 
    #print(nb_info)

    stations_connection_graph = nx .Graph ( nb_info ) 
    plt . figure ( figsize = ( 30 , 30 ) ) 
    nx . draw ( stations_connection_graph , stations_info , with_labels = True , node_size = 6 ) #plt.show( ) 
    

    print ( BFS ( lines_info , nb_info , 'Chaoyangmen' , 'Wangjing West' ) ) 
    paths = get_path_Astar ( lines_info , nb_info , stations_info ,  'Chaoyangmen' ,  'Wangjing West' ) 
    if paths : 
        print ( "The path totals %d stations ." % ( len ( paths ) - 1 ) ) 
        print ( "-" . join ( paths ) )
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31

Related: Python implements breadth-first search BFS and heuristic search algorithm for path query of two subway stations