8 classic data structures commonly used in Python

  • Python native data structures: tuple (), list [], set {}, dictionary {key: value};
  • Data structures in the NumPy package: the array Ndarray (supports many bulk operations) and the matrix Matrix (supports linear algebra calculations);
  • Data structures in the Pandas package: the series Series (an index plus one column of data) and the data frame DataFrame (an index plus a multi-column data table).

Table of contents

  • Data structures in the NumPy package
      • Array (Ndarray)
      • Matrix
  • Data structures in Pandas
      • Series
      • DataFrame
  • Python native data structures
      • Tuple
      • List
      • Set
      • Dictionary

Data structures in the NumPy package

The NumPy data structures covered here are Ndarray and Matrix.

Array (Ndarray)

Create Ndarray

  • Import the NumPy package under the alias np. The array data structure is available only after NumPy has been imported.
import numpy as np

To create an array object with the NumPy package:

  1. The array() function converts a sequence object into an array;
  2. The arange() function generates evenly spaced values within a given range (the end value is excluded);
  3. The ones() function generates an array filled with 1s;
  4. The empty() function allocates an array of the given type and dimensions without initializing its data, so the values are arbitrary;
  5. The random.rand() function generates an array of random values drawn uniformly from [0, 1);
  6. The linspace() function generates a one-dimensional array from a start value, an end value, and a number of elements (not a step size); for example, 5 evenly spaced elements from 10 to 30.
import numpy as np

array001 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
a2 = np.arange(5)             # integers 0 to 4
a3 = np.ones((2, 2))          # 2x2 array of 1.0
a4 = np.empty((2, 2))         # 2x2 array, contents not initialized
a5 = np.random.rand(4, 2)     # 4x2 array of uniform random values
a6 = np.linspace(10, 30, 5)   # 5 evenly spaced values from 10 to 30
print('\nThe sequence data is converted into an array:', array001,
      '\nShow the data structure type:', type(array001),
      '\narange() function created array:', a2,
      '\nones() function to create an array of all 1s:\n', a3,
      '\nempty() function to create an unassigned array:\n', a4,
      '\nrandom.rand() function to create a random array:\n', a5,
      '\nlinspace() function to create an evenly spaced array:', a6)

The sequence data is converted into an array: [ 1 2 3 4 5 6 7 8 9 10 11 12] 
Show the data structure type: <class 'numpy.ndarray'> 
arange() function created array: [0 1 2 3 4] 
ones() function to create an array of all 1s:
 [[1. 1.]
 [1. 1.]] 
empty() function to create an unassigned array:
 [[0. 0.]
 [0. 0.]] 
random.rand() function to create a random array:
 [[0.39902074 0.63298526]
 [0.09231821 0.23007193]
 [0.09899536 0.83000881]
 [0.27760961 0.65135898]] 
linspace() function to create an evenly spaced array: [10. 15. 20. 25. 30.]
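A step size and an element count are easy to confuse. The following minimal sketch contrasts arange(), which takes a step, with linspace(), which takes a count; the variable names by_step and by_count are illustrative and not part of the original example.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded
by_step = np.arange(10, 30, 5)

# linspace(start, stop, num): the stop value is included by default
by_count = np.linspace(10, 30, 5)

print(by_step)
print(by_count)
```

With the same endpoints, arange() yields four integers while linspace() yields five floats, because only linspace() includes the end value.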

Ndarray query operations

  • Arrays can be sliced with array[a:b] to extract a subset, and the same slice can be used as the target of a bulk assignment.
array002 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print('\n1D array index:', array001[4:],
      '\n2D array index:', array002[1:3, 2:4])  # rows 2-3, columns 3-4

1D array index: [ 5 6 7 8 9 10 11 12] 
2D array index: [[ 7 8] [11 12]]

The following attributes are commonly used with multidimensional arrays. shape returns a tuple with the size of each dimension, such as the number of rows and columns, and reshape() returns the same data arranged in a new shape. The dtype and itemsize values shown below depend on the platform's default integer type.

array004 = array001.reshape(3, -1)  # -1 lets NumPy infer the number of columns
print('\nThe array after changing the structure\n', array004,
      '\nThe shape of the array:', array004.shape,
      '\nArray data type:', array004.dtype,
      '\nNumber of array elements:', array004.size,
      '\nNumber of bytes per element:', array004.itemsize,
      '\nNumber of array dimensions:', array004.ndim)

The array after changing the structure
 [[ 1 2 3 4]
 [ 5 6 7 8]
 [ 9 10 11 12]] 
The shape of the array: (3, 4) 
Array data type: int32 
Number of array elements: 12 
Number of bytes per element: 4 
Number of array dimensions: 2

Ndarray increase operation

  • The append() function adds elements or list-type data and returns a new array; the shape of the added data must match the array along every other axis.
array003 = np.append(array002, [[1], [2], [3]], axis=1)  # axis=1 appends in the column direction
print('\nArray after adding one column\n', array003)

Array after adding one column
 [[ 1 2 3 4 1]
 [ 5 6 7 8 2]
 [ 9 10 11 12 3]]

Ndarray delete operation

  • Use delete(x, i, axis=) to remove rows or columns from an array; it returns a new array and leaves the original unchanged. The third parameter, axis, selects rows (0) or columns (1). The index i can be a single number or a tuple of numbers.
array003 = array002.T  # 4x3 transpose of array002
print('Array after deleting single row:\n', np.delete(array003, 1, axis=0))  # axis=0 deletes a row
print('Array after batch deleting:\n', np.delete(array003, (1, 3), 0))  # deletes rows 1 and 3
print('Array after deleting single column\n', np.delete(array003, 1, 1))  # axis=1 deletes a column

Array after deleting single row:
 [[ 1 5 9]
 [ 3 7 11]
 [ 4 8 12]]
Array after batch deleting:
 [[ 1 5 9]
 [ 3 7 11]]
Array after deleting single column
 [[ 1 9 ]
 [ 2 10]
 [ 3 11]
 [ 4 12]]

Ndarray modification operation

  • Array data can be modified in bulk through indexing.
array002[1:2] = 0  # set every element of row 1 to 0
print('Array mass assignment\n', array002)
array003 = array002.T  # .T returns a view, so the write below also changes array002
array003[1][1] = 100
print('Modified array\n', array003)

Array mass assignment
 [[ 1 2 3 4]
 [ 0 0 0 0]
 [ 9 10 11 12]]
Modified array
 [[ 1 0 9]
 [ 2 100 10]
 [ 3 0 11]
 [ 4 0 12]]
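The transpose used above is a view of the original array rather than a copy, so writing through it also changes the original. A minimal sketch of that behavior follows; the names base, view, and independent are illustrative.

```python
import numpy as np

base = np.array([[1, 2], [3, 4]])

view = base.T                 # a view: shares memory with base
view[0, 1] = 99               # position (0, 1) of the view is (1, 0) of base

independent = base.T.copy()   # a copy: owns its memory
independent[0, 0] = -1        # does not touch base

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, independent))
```

Calling copy() on the transpose is one way to keep the original array intact when the transposed data will be modified.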

Ndarray other operations

  1. Two-dimensional array transpose. array.T returns the transposed array.
  2. Stacking of arrays. Create two new arrays, then use vstack() for vertical stacking and hstack() for horizontal stacking.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print('After stacking vertically:\n', np.vstack((arr1, arr2)),
      '\nAfter stacking horizontally:\n', np.hstack((arr1, arr2)))

After stacking vertically:
 [[1 2 3]
 [4 5 6]] 
After stacking horizontally:
 [1 2 3 4 5 6]

Convert Ndarray to other data structures

import pandas as pd

arr3 = np.array([[1, 2, 3], [4, 5, 6]])
print('The Ndarray before conversion is:\n', arr3)
dfFromNdarray = pd.DataFrame(arr3)
print('The result of converting Ndarray to DataFrame is:\n', dfFromNdarray)  # with row and column labels

The Ndarray before conversion is:
 [[1 2 3]
 [4 5 6]]
The result of converting Ndarray to DataFrame is:
    0 1 2
0 1 2 3
1 4 5 6

arrFromDataFrame = dfFromNdarray.values  # to_numpy() is the newer equivalent
print('The result of converting DataFrame to Ndarray is:\n', arrFromDataFrame)  # only the values are extracted

The result of converting DataFrame to Ndarray is:
 [[1 2 3]
 [4 5 6]]

Matrix

Create Matrix

  • Objects of other data structures can be converted to the matrix type with mat(), an alias of asmatrix(). NumPy 2.0 removed the mat alias, so the code below calls np.asmatrix(), which also works on older versions.
array1 = [1, 2, 3]
array2 = [6, 7, 8]
array3 = [11, 12, 17]
matrix = np.asmatrix([array1, array2, array3])
print('Show the data structure type:', type(matrix))
print(matrix)

Show the data structure type: <class 'numpy.matrix'>
[[ 1 2 3]
 [ 6 7 8]
 [11 12 17]]

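Create an uninitialized matrix. NumPy has many functions for creating special matrices. Here the empty() function allocates a 3x3 array without initializing it; the values are whatever was already in memory, so they are arbitrary rather than random and differ between runs. empty() returns an Ndarray, so wrap the result in np.asmatrix() if a matrix object is required.

matrix1 = np.empty((3, 3))
print(matrix1)

[[0.00000000e+000 0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 2.27270197e-321]
 [9.30350261e+199 ... 3.38460783e+125]]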

Matrix query operation

  • The following attributes are commonly used to inspect a matrix.
print('The size of each dimension of the matrix:', matrix.shape)
print('The number of all data in the matrix:', matrix.size)
print('The type of each data in the matrix:', matrix.dtype)

The size of each dimension of the matrix: (3, 3)
The number of all data in the matrix: 9
The type of each data in the matrix: int32

Matrix increase operation

  • Matrix merge. The c_[] object concatenates along columns (side by side), and the order of the operands determines the order in the result; the r_[] object concatenates along rows (stacked vertically). Both are used with square brackets rather than called like functions.
mat1 = np.asmatrix([[1, 2], [3, 4]])  # np.asmatrix() replaces np.mat(), which NumPy 2.0 removed
mat2 = np.asmatrix([4, 5])
matrix_r = np.c_[mat1, mat2.T]
print('Add mat2 matrix to the right of the original matrix\n', matrix_r)
matrix_l = np.c_[mat2.T, mat1]
print('Add mat2 matrix to the left of the original matrix\n', matrix_l)
matrix_u = np.r_[np.asmatrix([array1]), matrix]
print('Connect the matrix above the original matrix\n', matrix_u)

Add mat2 matrix to the right of the original matrix
 [[1 2 4]
 [3 4 5]]
Add mat2 matrix to the left of the original matrix
 [[4 1 2]
 [5 3 4]]
Connect the matrix above the original matrix
 [[ 1 2 3 ]
 [ 1 2 3]
 [ 6 7 8]
 [11 12 17]]

Matrix delete operation

  • The delete() function removes the specified row or column of a matrix, in the same way as for arrays. Indexes start at 0, so index 1 refers to the second row or column.
matrix2 = np.delete(matrix, 1, axis=1)  # axis=1 deletes a column
print('The result after deleting the second column\n', matrix2)
matrix3 = np.delete(matrix, 1, axis=0)  # axis=0 deletes a row
print('The result after deleting the second row\n', matrix3)

The result after deleting the second column
 [[ 1 3]
 [ 6 8]
 [11 17]]
The result after deleting the second row
 [[ 1 2 3]
 [11 12 17]]

Matrix special operations

  1. Matrix operations. For matrix objects, * is overloaded to perform matrix multiplication, and dot() computes the same matrix product.
  2. To multiply the elements at corresponding positions, use another function such as np.multiply().
mat3 = np.asmatrix([[5, 6], [7, 8]])  # np.asmatrix() replaces np.mat(), which NumPy 2.0 removed
matrix4 = mat1 * mat3
print('Matrix multiplication result\n', matrix4)
matrix5 = mat1.dot(mat3)
print('Matrix dot product result\n', matrix5)

Matrix multiplication result
 [[19 22]
 [43 50]]
Matrix dot product result
 [[19 22]
 [43 50]]
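Element-wise multiplication is mentioned above but not shown. The following is a minimal sketch, assuming np.multiply() is the function intended; np.asmatrix() is used so the snippet also runs on NumPy 2.x, and the names m_a and m_b are illustrative stand-ins for mat1 and mat3.

```python
import numpy as np

m_a = np.asmatrix([[1, 2], [3, 4]])   # same values as mat1
m_b = np.asmatrix([[5, 6], [7, 8]])   # same values as mat3

matrix_product = m_a * m_b                    # matrix multiplication for matrix objects
elementwise_product = np.multiply(m_a, m_b)   # multiplies matching positions

print(matrix_product)
print(elementwise_product)
```

The two results differ: the matrix product combines rows with columns, while np.multiply() pairs each element with the element at the same position.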

Common matrix functions. A matrix can also be transposed with .T. linalg.inv() computes the inverse and raises a LinAlgError if the matrix is singular (has no inverse).

matrix6 = matrix.T
matrix7 = np.linalg.inv(mat1)
print('\nAfter matrix transposition:\n', matrix6,
      '\nAfter matrix inversion:\n', matrix7)

After matrix transposition:
 [[ 1 6 11]
 [ 2 7 12]
 [ 3 8 17]] 
After matrix inversion:
 [[-2. 1. ]
 [ 1.5 -0.5]]
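The error case can be handled explicitly. A minimal sketch, using an illustrative singular matrix whose second row is twice the first:

```python
import numpy as np

singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # rows are linearly dependent

try:
    np.linalg.inv(singular)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print('Cannot invert:', err)
```

Catching np.linalg.LinAlgError keeps a script running when an input matrix turns out to have no inverse.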

Find the eigenvalues and eigenvectors of a matrix with numpy.linalg.eig() (the matrix must be square)

matrix8 = np.linalg.eig(matrix)  # returns the eigenvalues and the eigenvectors
print(matrix8)

(array([24.88734753, -0.8418908 , 0.95454327]), matrix([[-0.1481723 , -0.87920199, 0.10036602],
        [-0.4447565 , 0.3814255 , -0.82855015],
        [-0.88331004, 0.28551435, 0.550846 ]]))

Matrix conversion to other data structures

  • Because their structures are similar, matrices are often converted to lists and arrays.
print('Matrix list conversion:\n', matrix.tolist(),
      '\nMatrix to array:\n', np.array(matrix))

Matrix list conversion:
 [[1, 2, 3], [6, 7, 8], [11, 12, 17]] 
Matrix to array:
 [[ 1 2 3]
 [ 6 7 8]
 [11 12 17]]

 

Data structures in Pandas, including Series and DataFrame

Series

Create Series

  • Import the Pandas package under the alias pd
import pandas as pd
  • First create a dictionary and use Series() to convert it into a Series object; the dictionary keys automatically become the index of the Series. If a list is converted instead, the resulting Series is automatically given an integer index starting at 0.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s0 = pd.Series(sdata)
print('Series object generated from a dictionary\n', s0)
print('Show the data structure type:', type(s0))
s1 = pd.Series([6, 1, 2, 9])
print('Series object generated from a list\n', s1)

Series object generated from a dictionary
 Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
dtype: int64
Show the data structure type: <class 'pandas.core.series.Series'>
Series object generated from a list
 0 6
1 1
2 2
3 9
dtype: int64

  • Add an index. Pass index= to give the Series custom index labels.
s1 = pd.Series([6, 1, 2, 9], index=['a', 'b', 'c', 'd'])
print(s1)

a 6
b 1
c 2
d 9
dtype: int64

Series query operation

  • values returns the values of the Series and index returns its index. Elements can be looked up by position with iloc[] or by index label with [].
print('The values of the Series\n', s0.values)
print('The index of the Series\n', s0.index)
print('Find an element by position', s0.iloc[2])  # iloc[] avoids the deprecated s0[2] on a label index
print('Find an element by index label', s0['Utah'])
print('Batch find elements by position\n', s0.iloc[:2])
print('Batch find elements by index label\n', s0[['Ohio', 'Oregon']])

The values of the Series
 [35000 71000 16000 5000]
The index of the Series
 Index(['Ohio', 'Texas', 'Oregon', 'Utah'], dtype='object')
Find an element by position 16000
Find an element by index label 5000
Batch find elements by position
 Ohio 35000
Texas 71000
dtype: int64
Batch find elements by index label
 Ohio 35000
Oregon 16000
dtype: int64

Series increase operation

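  • Elements are added to a Series by concatenating it with another Series, whose index specifies the label of the new element. Older code uses Series.append() for this, but pandas 2.0 removed that method in favor of pd.concat().
s2 = pd.concat([s1, pd.Series([12], index=['e'])])
print(s2)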

a 6
b 1
c 2
d 9
e 12
dtype: int64

Series delete operation

  • Delete elements from a Series with drop(), which returns a new Series (elements can only be deleted by index label)
s3 = s1.drop('a')
print(s3)

b 1
c 2
d 9
dtype: int64

Series modification operation

  • Elements of a Series can be looked up and updated directly through the index.
s1['a'] = 4  # change the element with index label a in s1 to 4
print(s1)

a 4
b 1
c 2
d 9
dtype: int64

Series special operations

  • Sorting. The sort_values() method sorts the Series by its values, in ascending order by default.
print(s1.sort_values())

b 1
c 2
a 4
d 9
dtype: int64

  • Find the median of a Series. The median() method returns the median directly, and the result can be used in comparisons to filter the Series.
print(s1)
print('The median is: ' + str(s1.median()))
print('The numbers greater than the median of the Series\n', s1[s1 > s1.median()])

a 4
b 1
c 2
d 9
dtype: int64
The median is: 3.0
The numbers greater than the median of the Series
 a 4
d 9
dtype: int64

  • Arithmetic. Two Series can be added, subtracted, multiplied, and divided; values are matched by index label, so both Series should share the same index.
s2 = pd.Series([4, 3, 5, 8], index=['a', 'b', 'c', 'd'])
print(s2 + s1)

a 8
b 4
c 7
d 17
dtype: int64
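The reason the indexes should match can be seen when they do not. A minimal sketch with two illustrative Series, left and right, that share only some labels:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are aligned by label; labels present in only one Series give NaN
total = left + right
print(total)
```

Labels b and c are summed, while a and d appear in the result as NaN because each exists in only one operand.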

  • Time series. The date_range() function in the pandas package generates a sequence of dates, which is convenient for time-based data processing.
s3 = pd.Series([100, 150, 200])
print('The resulting sequence is:\n', s3)
idx = pd.date_range(start='2019-9', freq='M', periods=3)  # month-end dates; pandas 2.2+ prefers freq='ME'
print('\nThe resulting time series is:\n', idx)
s3.index = idx
print('\nThe resulting time series is:\n', s3)

The resulting sequence is:
 0 100
1 150
2 200
dtype: int64

The resulting time series is:
 DatetimeIndex(['2019-09-30', '2019-10-31', '2019-11-30'], dtype='datetime64[ns]', freq='M')

The resulting time series is:
 2019-09-30 100
2019-10-31 150
2019-11-30 200
Freq: M, dtype: int64

Convert Series to other data structures

dfFromSeries = s2.to_frame()
print('Series to DataFrame\n', dfFromSeries)
print('Display data structure type:', type(dfFromSeries))

Series to DataFrame
    0
a 4
b 3
c 5
d 8
Display data structure type: <class 'pandas.core.frame.DataFrame'>

dictFromSeries = s2.to_dict()
print('Series to Dict\n', dictFromSeries)
print('Display data structure type:', type(dictFromSeries))

Series to Dict
 {'a': 4, 'b': 3, 'c': 5, 'd': 8}
Display data structure type: <class 'dict'>

DataFrame

Create DataFrame

Import the pandas package to create DataFrame objects. First create a dictionary, then build the DataFrame object from it with DataFrame(). Name its index through index.name. Finally, the to_csv() and to_excel() methods save it as a CSV or Excel file. A DataFrame can also be created from a list: pd.DataFrame(data, columns, index).

dic1 = {'name': ['Tom', 'Lily', 'Cindy', 'Petter'], 'no': ['001', '002', '003', '004'], 'age': [16, 16, 15, 16], 'gender': ['m', 'f', 'f', 'm']}
df1 = pd.DataFrame(dic1)
print('Show the data structure type', type(df1))
df1.index.name = 'id'
# df1.to_csv('students.csv')     # uncomment to save; a later example reads this file
# df1.to_excel('students.xlsx')  # needs an Excel writer such as openpyxl; writing .xls raises an error
print(df1)

Show the data structure type <class 'pandas.core.frame.DataFrame'>
      name no age gender
id                         
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
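The list-based form pd.DataFrame(data, columns, index) mentioned above can be sketched as follows. The row data and the index labels 'a' and 'b' are illustrative and not taken from the original example.

```python
import pandas as pd

# Each inner list is one row; columns and index supply the labels
rows = [['Tom', '001', 16], ['Lily', '002', 16]]
df_from_list = pd.DataFrame(rows, columns=['name', 'no', 'age'], index=['a', 'b'])

print(df_from_list)
```

With a dictionary the keys become column names, whereas with a list of rows the column names have to be passed separately.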

DataFrame query operations

  • DataFrame.name returns the entire column whose column label is name, and DataFrame.loc[i] returns all the data in the row whose index label is i. Rows can also be looked up through a time series index.
  • Note: loc[] selects by label, and iloc[] selects by integer position.
  • Get the column index: df.columns
  • Get the row index: df.index
  • Get the values: df.values
column = df1.no
row = df1.loc[3]
print('\nColumn data index\n', column, '\nRow data index\n', row)

Column data index
 id
0 001
1 002
2 003
3 004
Name: no, dtype: object 
Row data index
name Petter
no 004
age 16
gender m
Name: 3, dtype: object
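The difference between label-based and position-based selection is easiest to see when the row labels are not integers. A minimal sketch with an illustrative frame df indexed by 'x', 'y', 'z':

```python
import pandas as pd

df = pd.DataFrame({'name': ['Tom', 'Lily', 'Cindy'], 'age': [16, 16, 15]},
                  index=['x', 'y', 'z'])

by_label = df.loc['y', 'name']   # row label 'y', column label 'name'
by_position = df.iloc[1, 0]      # second row, first column

print(by_label, by_position)
print(list(df.columns), list(df.index))
print(df.values.shape)
```

Both lookups return the same cell here; loc[] keeps working if rows are reordered, while iloc[] always follows the current physical order.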

DataFrame increase operation

  • Add a classmate's information as a new row by concatenating a one-row DataFrame with pd.concat(); pandas 2.0 removed the older DataFrame.append() method. Assigning a list to a new column name adds a column to the data frame.
print('Before modification:\n', df1)
new_row = pd.DataFrame([{'name': 'Stark', 'no': '005', 'age': 15, 'gender': 'm'}])
df2 = pd.concat([df1, new_row], ignore_index=True)  # the new row gets index 4; without ignore_index it keeps index 0
print('Added row:\n', df2)
df2['new_Col'] = [1, 2, 3, 4, 5]
print('Add columns:\n', df2)

Before modification:
       name no age gender
id                         
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
Added row:
      name no age gender
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
4 Stark 005 15 m
Add columns:
      name no age gender new_Col
0 Tom 001 16 m 1
1 Lily 002 16 f 2
2 Cindy 003 15 f 3
3 Petter 004 16 m 4
4 Stark 005 15 m 5

DataFrame delete operation

  • Drop the 'name' column with the drop() method; changing the parameters drops rows instead. A whole column can also be deleted in place with the del statement (this cannot be undone, so work on a copy).
df3 = df1.copy()
print('Data before processing\n', df1)
df3b = df3.drop(['name'], axis=1)  # axis=1 drops a column
print('Data frame after deleting columns\n', df3b)
df3c = df3.drop([2])  # the default axis=0 drops the row with index label 2
print('Data frame after deleting rows\n', df3c)

Data before processing
       name no age gender
id                         
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
Data frame after deleting columns
      no age gender
id                 
0 001 16 m
1 002 16 f
2 003 15 f
3 004 16 m
Data frame after deleting rows
       name no age gender
id                         
0 Tom 001 16 m
1 Lily 002 16 f
3 Petter 004 16 m
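The del statement mentioned above has no accompanying example. A minimal sketch, using an illustrative two-column frame and a backup copy taken before the deletion:

```python
import pandas as pd

df_demo = pd.DataFrame({'name': ['Tom', 'Lily'], 'age': [16, 16]})
df_backup = df_demo.copy()   # del works in place, so keep a copy if needed

del df_demo['age']           # removes the column from df_demo itself

print(df_demo)
print(df_backup)
```

Unlike drop(), which returns a new data frame, del modifies the existing object, which is why the copy is taken first.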

DataFrame modification operations

  • Merge data frames by column (the effect is the same as adding columns)
df4 = pd.DataFrame({'address': ['school', 'home', 'school', 'school', 'home']})
df5 = pd.concat([df2, df4], axis=1)
print('Pre-merge df2\n', df2)
print('Pre-merge df4\n', df4)
print('merged df5\n', df5)

Pre-merge df2
      name no age gender new_Col
0 Tom 001 16 m 1
1 Lily 002 16 f 2
2 Cindy 003 15 f 3
3 Petter 004 16 m 4
4 Stark 005 15 m 5
Pre-merge df4
   address
0 school
1 home
2 school
3 school
4 home
merged df5
      name no age gender new_Col address
0 Tom 001 16 m 1 school
1 Lily 002 16 f 2 home
2 Cindy 003 15 f 3 school
3 Petter 004 16 m 4 school
4 Stark 005 15 m 5 home

  • Merge data frames by row (the effect is the same as adding a student's information)
df6 = pd.DataFrame({'name': ['Tony'], 'no': ['005'], 'age': [16], 'gender': ['m']})
df7 = pd.concat([df1, df6], axis=0)  # the original row labels are kept, so index 0 appears twice
print('df1 before merge\n', df1)
print('df6 before merge\n', df6)
print('merged df7\n', df7)

df1 before merge
       name no age gender
id                         
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
df6 before merge
    name no age gender
0 Tony 005 16 m
merged df7
      name no age gender
0 Tom 001 16 m
1 Lily 002 16 f
2 Cindy 003 15 f
3 Petter 004 16 m
0 Tony 005 16 m

DataFrame special operations

  • Time series in a data frame. Generate a date sequence with date_range() and attach it to the data, for example four dates spaced 7 days apart starting from September 21, 2019. Use the read_csv() function in the pandas package to read the previously saved student data (students.csv must first have been written with to_csv()). After the index is replaced, the generated dates appear in the data frame.
i1 = pd.date_range('2019/9/21', periods=4, freq='7D')
df10 = pd.read_csv('students.csv')
df10.index = i1
print(df10)

            id name no age gender
2019-09-21 0 Tom 1 16 m
2019-09-28 1 Lily 2 16 f
2019-10-05 2 Cindy 3 15 f
2019-10-12 3 Petter 4 16 m

Time series query

print('\nThe value obtained from the time series index\n', df10.loc['2019-09-21':'2019-09-30', ['gender', 'age', 'name']])

The value obtained from the time series index
            gender age name
2019-09-21 m 16 Tom
2019-09-28 f 16 Lily

Convert DataFrame to other data structures

print('DataFrame to ndarray\n', df10.values,
      '\nDataFrame to Series\n', df10['gender'])

DataFrame to ndarray
 [[0 'Tom' 1 16 'm']
 [1 'Lily' 2 16 'f']
 [2 'Cindy' 3 15 'f']
 [3 'Petter' 4 16 'm']] 
DataFrame to Series
 2019-09-21 m
2019-09-28 f
2019-10-05 f
2019-10-12 m
Freq: 7D, Name: gender, dtype: object

Python native data structures

Tuple

  1. Use () or tuple() to create tuples; a tuple can be empty, and its elements can have different types;
  2. If the tuple contains only one element, add a trailing comma so that the parentheses are not read as an ordinary grouped expression: tup = (1,);
  3. Once a tuple is created, its elements cannot be added, deleted, or modified.

Tuple query operation

  • The values in a tuple can be accessed by subscript index.
tup1 = ('Google', 'Runoob', 1997, 2000)
tup2 = (1,)  # create a tuple with a single element
print("tup1[0]:", tup1[0])  # access the first element of the tuple
print("tup2[1:5]:", tup2[1:5])  # a slice past the end returns an empty tuple

tup1[0]: Google
tup2[1:5]: ()

Tuple overall delete operation

  • The del statement can delete an entire tuple object, but it cannot delete an individual element by subscript.
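A minimal sketch of both cases, using an illustrative tuple named point: deleting one element fails, while deleting the whole object removes the name.

```python
point = (1, 2, 3)

# Deleting a single element is not allowed for tuples
try:
    del point[0]
    element_deleted = True
except TypeError as err:
    element_deleted = False
    print('Cannot delete an element:', err)

# Deleting the whole tuple object removes the name itself
del point
try:
    point
    name_exists = True
except NameError:
    name_exists = False
```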

Tuple connection and replication

  • Although the elements of a tuple cannot be modified, new tuples can be created by concatenating or repeating existing ones.
tup3 = tup1 + tup2
tup4 = tup2 * 3  # repeat the tuple three times

Tuple other operations

  1. len() returns the number of elements in the tuple;
  2. max() and min() return the largest and smallest elements of the tuple.
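These functions can be sketched on an illustrative tuple of scores, together with the concatenation and repetition described earlier:

```python
scores = (88, 95, 70)

more = scores + (100,)   # concatenation builds a new tuple
repeated = (0,) * 3      # repetition builds a new tuple

count = len(more)
highest = max(more)
lowest = min(more)

print(more, repeated)
print(count, highest, lowest)
```

max() and min() require elements that can be compared with each other, so they fail on a tuple that mixes strings and numbers.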

Tuple to other data structures (example)

  • Tuples can be converted to strings, lists, and so on, but a single tuple cannot be converted directly to a dictionary.
print("\nTuple to list:\n", list(tup1),
      "\nTuple to string:\n", str(tup1))

List

Create a list

  1. Creation of one-dimensional lists. Use [] to create a list object. A list is an ordered collection whose elements can be added and removed at any time;
  2. Creation of multidimensional lists. A list is one-dimensional by default, but nesting [] creates multidimensional lists.
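A minimal sketch of both forms, with illustrative names flat and grid:

```python
flat = [1, 'two', 3.0]           # one-dimensional; element types can differ
grid = [[1, 2, 3], [4, 5, 6]]    # two-dimensional: a list of lists

cell = grid[1][2]                # second row, third column
print(flat)
print(cell)
```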

List query operation

  1. list[a:b] returns a list containing the elements at indexes a through b-1;
  2. list[::a] returns a list of every a-th element, starting from the first element of the list;
  3. list[i] returns the element at index i. If i is negative, elements are counted from the end of the list, so list[-1] is the last element.
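The three forms can be sketched on an illustrative list of letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]        # indexes 1, 2, 3
every_other = letters[::2]   # indexes 0, 2, 4
last = letters[-1]           # counts from the end

print(middle, every_other, last)
```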

List addition operation

  • append() adds one new item at the end of the list. The item can be a single element, or a list object, in which case the list becomes nested (multidimensional).
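A minimal sketch of both uses of append(). The extend() call is not mentioned in the text above and is included only for contrast, since it adds the items of a list individually instead of nesting the list.

```python
numbers = [1, 2, 3]
numbers.append(4)          # adds a single element
numbers.append([5, 6])     # adds the list itself as one nested item

flat_numbers = [1, 2, 3]
flat_numbers.extend([5, 6])  # adds each item separately (shown for contrast)

print(numbers)
print(flat_numbers)
```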

List delete operation

  1. remove() deletes an element by value: list.remove(x) deletes the first element equal to x and raises an error if no such element exists;
  2. pop() deletes and returns the element at the specified subscript. The default is the last element of the list; list.pop(i) deletes the element at subscript i.
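Both deletion methods, including the error case, can be sketched on an illustrative list named items:

```python
items = ['a', 'b', 'c', 'b']

items.remove('b')         # removes only the first 'b'
last_item = items.pop()   # removes and returns the last element
first_item = items.pop(0) # removes and returns the element at subscript 0

try:
    items.remove('z')     # 'z' is not in the list
    missing_raised = False
except ValueError:
    missing_raised = True

print(items, last_item, first_item, missing_raised)
```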

List modification operation

  • list[i] = x directly replaces the element at the specified subscript in the list.

List other operations

  1. reverse() reverses the list in place;
  2. len() returns the number of elements in the list;
  3. sort() sorts the list elements in place, in ascending order by default.
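Element replacement and the three operations above can be sketched together on an illustrative list named values. reverse() and sort() change the list itself and return None.

```python
values = [3, 1, 2]

values[0] = 10               # replace the element at subscript 0
after_assign = list(values)

values.reverse()             # in place
after_reverse = list(values)

values.sort()                # in place, ascending
after_sort = list(values)

print(after_assign, after_reverse, after_sort, len(values))
```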

Convert List to other data structures

  • Lists can easily be converted to other data types such as tuples and sets; note that a single flat list cannot be converted to a dictionary.
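A minimal sketch of these conversions with illustrative lists. Pairing two lists with zip() is one common way to get a dictionary; it is an addition here, not something the text above prescribes.

```python
names = ['a', 'b', 'a']

as_tuple = tuple(names)
as_set = set(names)              # duplicates are dropped
joined = ','.join(names)         # list of strings to one string

keys = ['x', 'y']
vals = [1, 2]
as_dict = dict(zip(keys, vals))  # two lists paired into a dictionary

try:
    dict(names)                  # a flat list has no key-value pairs
    flat_list_ok = True
except ValueError:
    flat_list_ok = False

print(as_tuple, as_set, joined, as_dict, flat_list_ok)
```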

Set

Create Set

  • A set contains no duplicate values. Its elements are unordered, so the printed order is arbitrary and should not be relied on. Creating a set with set() from a string splits the string into its individual characters.
myset = set('aabc')  # set() splits the string into characters and drops duplicates
print(myset)
myset1 = set(('hello', 'world'))  # a tuple of strings keeps each string whole
print(myset1)

{'a', 'c', 'b'}
{'hello', 'world'}

Set query operation

  • Use in to test whether an element is in the set; the result is True if it is and False otherwise.
'a' in myset

Set increase operation

  1. add() adds a new element to the set. If the element already exists, it has no effect;
  2. update() adds each item of an iterable to the set one by one (it adds, it does not modify), so a string is added character by character.
myset.add('ghk')
myset.update('tyu')  # adds 't', 'y', 'u' as separate elements
print(myset)

{'t', 'b', 'a', 'ghk', 'c', 'y', 'u'}

Set delete operation

  1. remove() deletes an element from the set and raises an error if the element does not exist;
  2. discard() deletes the specified element from the set and does nothing if the element does not exist;
  3. pop() removes and returns an arbitrary element of the set;
  4. clear() removes all elements from the set.
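The four deletion methods can be sketched on an illustrative set named letters. Because sets are unordered, the element returned by pop() is not predictable.

```python
letters = {'a', 'b', 'c'}

letters.remove('a')       # 'a' exists, so this succeeds
letters.discard('z')      # 'z' is missing, silently ignored

try:
    letters.remove('z')   # 'z' is missing, so this raises
    remove_raised = False
except KeyError:
    remove_raised = True

popped = letters.pop()    # removes and returns an arbitrary element
size_after_pop = len(letters)

letters.clear()           # empties the set
print(remove_raised, popped, size_after_pop, letters)
```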

Set other operations

  • len() returns the number of elements in the set;
  • copy() copies the elements of a set into a new set
copy_myset = myset.copy()
print('\nlen() returns the length of the set:', len(myset),
      '\ncopy() generates the set:', copy_myset)

len() returns the length of the set: 7 
copy() generates the set: {'a', 'c', 'u', 't', 'ghk', 'b', 'y'}

  • Set operations. First create two sets. In set operations, '-' is the difference, '&' is the intersection, '|' is the union, and '^' is the symmetric difference (the union minus the intersection).
a = set('apple')
b = set('banana')
print('\nDifference:', a - b,
      '\nUnion:', a | b,
      '\nIntersection:', a & b,
      '\nSymmetric difference:', a ^ b)

Difference: {'e', 'p', 'l'} 
Union: {'p', 'n', 'l', 'a', 'b', 'e'} 
Intersection: {'a'} 
Symmetric difference: {'n', 'p', 'l', 'b', 'e'}

Dictionary

Create Dict

  • Create a dictionary, and a list that contains three dictionary objects. (The dictionaries are nested in the list: students is a list, and each element of students is a dictionary.)
dict1 = {"ID": "L100", "Name": "COCO"}
students = [{'name': 'n1', 'id': '001'}, {'name': 'n2', 'id': '002'}, {'name': 'n3', 'id': '003'}]
print("Show the data structure type", type(dict1))
print(dict1)

Show the data structure type <class 'dict'>
{'ID': 'L100', 'Name': 'COCO'}

  • Use zip() to create a dictionary. zip() pairs up the items of its arguments and returns an iterator of tuples (a list in Python 2), which makes it a quick way to build dictionaries.
demo_dict = dict(zip('abc', '123'))
print(demo_dict)

{'a': '1', 'b': '2', 'c': '3'}

Dict query operation

  • Find the student number of the first student (the value of the id key in the first dictionary element). The get(key, default=None) method also returns the value of the specified key, and returns default instead of raising an error when the key is missing.
print('General query:', students[0]['id'])
print('Query by key:', students[0].get('id'))

General query: 001
Query by key: 001
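The two lookup styles differ only when the key is missing. A minimal sketch with an illustrative dictionary named student and the fallback value 'unknown':

```python
student = {'name': 'n1', 'id': '001'}

no_default = student.get('school')               # missing key, returns None
with_default = student.get('school', 'unknown')  # missing key, returns the fallback

try:
    student['school']                            # missing key, raises
    bracket_raised = False
except KeyError:
    bracket_raised = True

print(no_default, with_default, bracket_raised)
```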

Dict increase operation

  • Add a student's information (adding a row means appending an element to the list), then add a school for some students (adding a column means adding a key-value pair to each dictionary)
students.append({'name': 'n4', 'id': '004'})
print('After adding a dictionary object:', students)
students[0]['school'] = 'school1'
students[1]['school'] = 'school2'
students[2]['school'] = 'school2'
print('The dictionary after adding key-value pairs:', students)

After adding a dictionary object: [{'name': 'n1', 'id': '001'}, {'name': 'n2', 'id': '002'}, {'name': 'n3', 'id': '003'}, {'name': 'n4', 'id': '004'}]
The dictionary after adding key-value pairs: [{'name': 'n1', 'id': '001', 'school': 'school1'}, {'name': 'n2', 'id': '002', 'school': 'school2'}, {'name': 'n3', 'id': '003', 'school': 'school2'}, {'name': 'n4', 'id': '004'}]

Dict delete operation

  • Use del to delete a student's information (deleting a row means deleting an element from the list). Then use pop() to delete the student number of the first student (deleting one cell means deleting a key-value pair from that dictionary)
del students[3]  # delete row 4 (subscript 3)
print('After deleting a dictionary object in the list:\n', students)
students[0].pop('id')
print('After deleting a key-value pair:\n', students)

After deleting a dictionary object in the list:
 [{'name': 'n1', 'id': '001', 'school': 'school1'}, {'name': 'n2', 'id': '002', 'school': 'school2'}, {'name': 'n3', 'id': '003', 'school': 'school2'}]
After deleting a key-value pair:
 [{'name': 'n1', 'school': 'school1'}, {'name': 'n2', 'id': '002', 'school': 'school2'}, {'name': 'n3', 'id': '003', 'school': 'school2'}]

  • Delete the school of all students (deleting a column means deleting the same key-value pair from every dictionary)
for student in students:
    student.pop('school')
print(students)

[{'name': 'n1'}, {'name': 'n2', 'id': '002'}, {'name': 'n3', 'id': '003'}]

Dict modification operation

  • Add or change the student number of the first student (update() adds or replaces a key-value pair in the first dictionary element of the list)
students[0].update({'id': '001'})
print('\nUpdated dictionary\n', students)

Updated dictionary
 [{'name': 'n1', 'id': '001'}, {'name': 'n2', 'id': '002'}, {'name': 'n3', 'id': '003'}]

Dict to other data structures

  • Dictionary keys and values can each be converted to a list
print("Dictionary values to List:", list(demo_dict.values()))
print("Dictionary keys to List:", list(demo_dict.keys()))

Dictionary values to List: ['1', '2', '3']
Dictionary keys to List: ['a', 'b', 'c']

 
