Pandas CSV File
CSV (Comma-Separated Values, sometimes also referred to as Character-Separated Values, as the separator can be any character) stores tabular data (numbers and text) in plain text format.
CSV is a common, relatively simple file format widely used in consumer, business, and scientific applications.
Pandas handles CSV files conveniently. This article uses nba.csv as the example file, which you can download or open to view.
Example
import pandas as pd
df = pd.read_csv('nba.csv')
print(df.to_string())
The to_string() method renders the entire DataFrame as a string, so every row is printed. Without it, printing a large DataFrame shows only the first 5 rows and the last 5 rows, with the middle part replaced by an ellipsis (...).
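The difference between the two forms can be checked without nba.csv. The sketch below uses a hypothetical 100-row DataFrame as a stand-in for a large file and compares the line counts of both renderings. It assumes the default pandas display options, where the truncation threshold is controlled by the display.max_rows setting.

```python
import pandas as pd

# Hypothetical stand-in for a large CSV: 100 rows, one column.
df = pd.DataFrame({"value": range(100)})

truncated = str(df)    # the text that print(df) would display
full = df.to_string()  # every row rendered, no truncation

# The full rendering has one header line plus one line per row.
print(len(full.splitlines()))
# The truncated rendering is much shorter.
print(len(truncated.splitlines()))
```

For very large files, to_string() can produce a lot of output, so the truncated form is often the more practical default.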
Example
import pandas as pd
df = pd.read_csv('nba.csv')
print(df)
Name Team Number Position Age Height Weight College Salary
0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 Texas 7730337.0
1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0
2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 Boston University NaN
3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0
4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 NaN 5000000.0
.. ... ... ... ... ... ... ... ... ...
453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0 Butler 2433333.0
454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0 NaN 900000.0
455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0 NaN 2900000.0
456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0 Kansas 947276.0
457 NaN NaN NaN NaN NaN NaN NaN NaN NaN
We can also use the to_csv() method to save the DataFrame as a CSV file:
Example
import pandas as pd
# Three fields: name, site, age
nme = ["Google", "tutorialpro", "Taobao", "Wiki"]
st = ["www.google.com", "www.tutorialpro.org", "www.taobao.com", "www.wikipedia.org"]
ag = [90, 40, 80, 98]
# Dictionary mapping column names to their values
# (named data rather than dict, to avoid shadowing the built-in)
data = {'name': nme, 'site': st, 'age': ag}
df = pd.DataFrame(data)
# Save the DataFrame
df.to_csv('site.csv')
After successful execution, we open the site.csv file, and the results are displayed as follows:
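To inspect the CSV text without opening the file, one option is to call to_csv() without a path, which returns the text as a string. The sketch below rebuilds the same DataFrame and also shows index=False, which leaves out the row index column that to_csv() writes by default.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Google", "tutorialpro", "Taobao", "Wiki"],
    "site": ["www.google.com", "www.tutorialpro.org",
             "www.taobao.com", "www.wikipedia.org"],
    "age": [90, 40, 80, 98],
})

# With no path argument, to_csv() returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first line is ",name,site,age" (unnamed index column)
print(without_index)  # first line is "name,site,age"
```

Writing with index=False is usually preferable when the index is just the default 0..n-1 range, because reading the file back would otherwise produce an extra unnamed column.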
---
## Data Processing
### head()
The `head(n)` method returns the first n rows. If n is omitted, it returns the first 5 rows.
## Example - Read the first 5 rows
import pandas as pd
df = pd.read_csv('nba.csv')
print(df.head())
The output is:
Name Team Number Position Age Height Weight College Salary
0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 Texas 7730337.0
1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0
2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 Boston University NaN
3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0
4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 NaN 5000000.0
## Example - Read the first 10 rows
import pandas as pd
df = pd.read_csv('nba.csv')
print(df.head(10))
The output is:
Name Team Number Position Age Height Weight College Salary
0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 Texas 7730337.0
1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0
2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 Boston University NaN
3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0
4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 NaN 5000000.0
5 Amir Johnson Boston Celtics 90.0 PF 29.0 6-9 240.0 NaN 12000000.0
6 Jordan Mickey Boston Celtics 55.0 PF 21.0 6-8 235.0 LSU 1170960.0
7 Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0 Gonzaga 2165160.0
8 Terry Rozier Boston Celtics 12.0 PG 22.0 6-2 190.0 Louisville 1824360.0
9 Marcus Smart Boston Celtics 36.0 PG 22.0 6-4 220.0 Oklahoma State 3431040.0
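The behaviour of `head(n)` can also be tried without nba.csv. The sketch below uses a small hypothetical DataFrame and shows a positive n as well as a negative n, which pandas interprets as "all rows except the last |n|".

```python
import pandas as pd

# Hypothetical six-row DataFrame standing in for a CSV file.
df = pd.DataFrame({"n": [10, 20, 30, 40, 50, 60]})

first_two = df.head(2)          # rows 0 and 1
all_but_last_two = df.head(-2)  # rows 0 through 3

print(first_two)
print(all_but_last_two)
```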
### tail()
The `tail(n)` method returns the last n rows. If n is omitted, it returns the last 5 rows. A row whose fields are empty in the file shows **NaN** for each missing value.
## Example - Reading the last 5 rows
```python
import pandas as pd
df = pd.read_csv('nba.csv')
print(df.tail())
```
Output:
Name Team Number Position Age Height Weight College Salary
453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0 Butler 2433333.0
454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0 NaN 900000.0
455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0 NaN 2900000.0
456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0 Kansas 947276.0
457 NaN NaN NaN NaN NaN NaN NaN NaN NaN
## Example - Reading the last 10 rows
import pandas as pd
df = pd.read_csv('nba.csv')
print(df.tail(10))
Output:
Name Team Number Position Age Height Weight College Salary
448 Gordon Hayward Utah Jazz 20.0 SF 26.0 6-8 226.0 Butler 15409570.0
449 Rodney Hood Utah Jazz 5.0 SG 23.0 6-8 206.0 Duke 1348440.0
450 Joe Ingles Utah Jazz 2.0 SF 28.0 6-8 226.0 NaN 2050000.0
451 Chris Johnson Utah Jazz 23.0 SF 26.0 6-6 206.0 Dayton 981348.0
452 Trey Lyles Utah Jazz 41.0 PF 20.0 6-10 234.0 Kentucky 2239800.0
453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0 Butler 2433333.0
454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0 NaN 900000.0
455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0 NaN 2900000.0
456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0 Kansas 947276.0
457 NaN NaN NaN NaN NaN NaN NaN NaN NaN
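The all-NaN row at index 457 is what pandas produces for a line that contains separators but no values. As a rough sketch, the snippet below reproduces that situation with a small hypothetical CSV held in memory and then drops such rows with dropna(how="all"). The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text whose last line holds only a separator.
csv_text = "Name,Age\nAvery,25\nJae,25\n,\n"
df = pd.read_csv(io.StringIO(csv_text))

last = df.tail(1)               # the row made up entirely of NaN
cleaned = df.dropna(how="all")  # drop rows where every field is missing

print(last)
print(len(df), len(cleaned))
```

Using how="all" keeps rows that are only partially empty, such as players without a College value.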
### info()
The info() method prints basic information about the table, including the number of rows, the columns, and the non-null count and data type of each column:
Example
import pandas as pd
df = pd.read_csv('nba.csv')
# info() prints its report directly and returns None,
# so wrapping it in print() would also print "None"
df.info()
Output result:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457 # Number of rows, 458 rows, first row number is 0
Data columns (total 9 columns): # Number of columns, 9 columns
# Column Non-Null Count Dtype # Data type of each column
--- ------ -------------- -----
0 Name 457 non-null object
1 Team 457 non-null object
2 Number 457 non-null float64
3 Position 457 non-null object
4 Age 457 non-null float64
5 Height 457 non-null object
6 Weight 457 non-null float64
7 College 373 non-null object # non-null means non-empty data
8 Salary 446 non-null float64
dtypes: float64(4), object(5) # Types
The Non-Null Count column gives the number of non-empty values in each column. From the information above, we can see that there are 458 rows in total, and that the College field has the most missing values (373 non-null, so 85 are missing).
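The number of missing values per column can also be computed directly rather than read off the info() report. The sketch below does this with isna().sum() on a small hypothetical DataFrame. The data is invented for illustration and is not taken from nba.csv.

```python
import pandas as pd

# Hypothetical data with some missing entries.
df = pd.DataFrame({
    "Name": ["A", "B", "C", None],
    "College": ["Texas", None, None, None],
})

# isna() marks missing cells as True; sum() counts them per column.
missing = df.isna().sum()

print(missing.sort_values(ascending=False))
print(missing.idxmax())  # the column with the most missing values
```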