
Pandas CSV File

CSV (Comma-Separated Values, sometimes also referred to as Character-Separated Values, as the separator can be any character) stores tabular data (numbers and text) in plain text format.
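As a minimal sketch of the "any character" point, the `sep` argument of pandas' `read_csv()` selects the separator. The semicolon-separated text and its column names below are made up for illustration, and the data is read from memory rather than from a file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data, held in memory instead of a file.
text = "name;age\nAnn;30\nBob;25\n"

# sep=";" tells read_csv() to split fields on semicolons instead of commas.
df = pd.read_csv(io.StringIO(text), sep=";")

print(df)
```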

CSV is a common, relatively simple file format that is widely used by consumers, businesses, and scientists.

Pandas can conveniently handle CSV files. This article uses nba.csv as an example. You can download nba.csv or open nba.csv to view it.

Example

import pandas as pd

df = pd.read_csv('nba.csv')

print(df.to_string())

The to_string() method renders the entire DataFrame as a string. Without it, printing a large DataFrame shows only the first 5 rows and the last 5 rows, with the rows in between replaced by a row of dots (...).
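The difference can be checked without nba.csv. The sketch below uses a made-up 100-row DataFrame and pins two display options to their default values (`display.max_rows` = 60, `display.min_rows` = 10) so the result does not depend on local settings.

```python
import pandas as pd

# Hypothetical frame: one column, 100 rows.
df = pd.DataFrame({"n": range(100)})

# Pin the display options to their defaults for a reproducible result.
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    truncated = str(df)   # the text that print(df) would show

full = df.to_string()     # every row, regardless of display options

print(len(truncated.splitlines()))  # far fewer lines than the full text
print(len(full.splitlines()))       # 101: one header line plus 100 rows
```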

Example

import pandas as pd

df = pd.read_csv('nba.csv')

print(df)

              Name            Team  Number Position   Age Height  Weight            College     Salary
0    Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1      Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2     John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3      R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4    Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0
..             ...             ...     ...      ...   ...    ...     ...                ...        ...
453   Shelvin Mack       Utah Jazz     8.0       PG  26.0    6-3   203.0             Butler  2433333.0
454      Raul Neto       Utah Jazz    25.0       PG  24.0    6-1   179.0                NaN   900000.0
455   Tibor Pleiss       Utah Jazz    21.0        C  26.0    7-3   256.0                NaN  2900000.0
456    Jeff Withey       Utah Jazz    24.0        C  26.0    7-0   231.0             Kansas   947276.0
457            NaN             NaN     NaN      NaN   NaN    NaN     NaN                NaN        NaN

We can also use the to_csv() method to save the DataFrame as a CSV file:

Example

import pandas as pd

# Three fields: name, site, age
names = ["Google", "tutorialpro", "Taobao", "Wiki"]
sites = ["www.google.com", "www.tutorialpro.org", "www.taobao.com", "www.wikipedia.org"]
ages = [90, 40, 80, 98]

# Build a dictionary that maps each column name to its values
data = {'name': names, 'site': sites, 'age': ages}

df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv('site.csv')


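After the script runs successfully, site.csv contains the DataFrame's data. Because to_csv() writes the row index by default, the first column of the file holds the index values under an empty header.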
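The saved content can also be inspected from Python. The sketch below assumes a shortened, two-row version of the table and calls `to_csv()` without a path, in which case it returns the CSV text instead of writing a file. It also shows `index=False`, which leaves out the index column.

```python
import pandas as pd

# Shortened, two-row version of the example table (an assumption for brevity).
df = pd.DataFrame({
    "name": ["Google", "Wiki"],
    "site": ["www.google.com", "www.wikipedia.org"],
    "age": [90, 98],
})

# With no path argument, to_csv() returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)      # header line is ",name,site,age"
print(without_index)   # header line is "name,site,age"
```

Passing `index=False` is usually the better choice when the index is just the default 0, 1, 2, ... numbering, since reading the file back would otherwise produce an extra unnamed column.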

---

## Data Processing

### head()

The `head(n)` method is used to read the first n rows. If the parameter n is not specified, it defaults to returning 5 rows.
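As a quick sketch that does not need nba.csv, the behavior of `n` can be tried on a small made-up DataFrame. A negative `n` is also accepted and returns every row except the last `|n|`.

```python
import pandas as pd

# Hypothetical frame with 8 rows (index 0 to 7).
df = pd.DataFrame({"n": range(8)})

print(len(df.head()))     # 5: the default
print(len(df.head(3)))    # 3
print(len(df.head(-2)))   # 6: all rows except the last 2
print(len(df.head(20)))   # 8: asking for more rows than exist returns them all
```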

Example - Read the first 5 rows

import pandas as pd

df = pd.read_csv('nba.csv')

print(df.head())


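The output is:

            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0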


Example - Read the first 10 rows

import pandas as pd

df = pd.read_csv('nba.csv')

print(df.head(10))


The output is:

            Name            Team  Number Position   Age Height  Weight            College      Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas   7730337.0
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette   6796117.0
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University         NaN
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State   1148640.0
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN   5000000.0
5   Amir Johnson  Boston Celtics    90.0       PF  29.0    6-9   240.0                NaN  12000000.0
6  Jordan Mickey  Boston Celtics    55.0       PF  21.0    6-8   235.0                LSU   1170960.0
7   Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0            Gonzaga   2165160.0
8   Terry Rozier  Boston Celtics    12.0       PG  22.0    6-2   190.0         Louisville   1824360.0
9   Marcus Smart  Boston Celtics    36.0       PG  22.0    6-4   220.0     Oklahoma State   3431040.0

### tail()

The `tail(n)` method returns the last n rows. If the parameter n is not specified, it defaults to returning 5 rows. A row whose fields are all empty, such as the last row of nba.csv, shows **NaN** for every field.
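The sketch below illustrates where such an all-**NaN** row comes from, using made-up in-memory data. It relies on the `read_csv()` default `skip_blank_lines=True`: a completely blank line is skipped, while a line that contains only separators is kept as a row of missing values. The `dropna(how="all")` call is one possible way to remove such rows and is not part of the original example.

```python
import io
import pandas as pd

# Hypothetical data: a blank line, then a final line holding only a separator.
text = "name,age\nAnn,30\nBob,25\n\n,\n"

df = pd.read_csv(io.StringIO(text))

print(df.tail(2))                 # Bob's row, then the all-NaN row
cleaned = df.dropna(how="all")    # drop rows in which every field is missing
print(len(df), len(cleaned))      # 3 2
```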

Example - Reading the last 5 rows

```python
import pandas as pd

df = pd.read_csv('nba.csv')

print(df.tail())
```

Output:

             Name       Team  Number Position   Age Height  Weight College     Salary
453  Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0  Butler  2433333.0
454     Raul Neto  Utah Jazz    25.0       PG  24.0    6-1   179.0     NaN   900000.0
455  Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN  2900000.0
456   Jeff Withey  Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   947276.0
457           NaN        NaN     NaN      NaN   NaN    NaN     NaN     NaN        NaN

Example - Reading the last 10 rows

import pandas as pd

df = pd.read_csv('nba.csv')

print(df.tail(10))

Output:

               Name       Team  Number Position   Age Height  Weight   College      Salary
448  Gordon Hayward  Utah Jazz    20.0       SF  26.0    6-8   226.0    Butler  15409570.0
449     Rodney Hood  Utah Jazz     5.0       SG  23.0    6-8   206.0      Duke   1348440.0
450      Joe Ingles  Utah Jazz     2.0       SF  28.0    6-8   226.0       NaN   2050000.0
451   Chris Johnson  Utah Jazz    23.0       SF  26.0    6-6   206.0    Dayton    981348.0
452      Trey Lyles  Utah Jazz    41.0       PF  20.0   6-10   234.0  Kentucky   2239800.0
453    Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0    Butler   2433333.0
454       Raul Neto  Utah Jazz    25.0       PG  24.0    6-1   179.0       NaN    900000.0
455    Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0       NaN   2900000.0
456     Jeff Withey  Utah Jazz    24.0        C  26.0    7-0   231.0    Kansas    947276.0
457             NaN        NaN     NaN      NaN   NaN    NaN     NaN       NaN         NaN

### info()

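The info() method prints basic information about the table, such as the number of rows and columns, the data type of each column, and the count of non-null values in each column: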

Example

import pandas as pd

df = pd.read_csv('nba.csv')

# info() prints its report directly and returns None, so print() is not needed
df.info()


Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457          # Number of rows, 458 rows, first row number is 0
Data columns (total 9 columns):            # Number of columns, 9 columns
 #   Column    Non-Null Count  Dtype       # Data type of each column
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object         # non-null means non-empty data    
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)                 # Types

Non-null means the value is present (not missing). From the output above, we can see that the table has 458 rows in total and that the College field has the most missing values: only 373 of its 458 entries are non-null, so 85 are missing.
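The same comparison can be made in code rather than by reading the report. The sketch below uses a small made-up DataFrame (not nba.csv) and counts missing values per column with `isna().sum()`. It also shows that `info()` returns `None` and that its report can be captured as text through the `buf` argument.

```python
import io
import pandas as pd

# Hypothetical frame with some missing values in every column.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", None],
    "College": ["Texas", None, None, None],
    "Salary": [100.0, None, 300.0, None],
})

missing = df.isna().sum()     # number of missing values in each column
print(missing)
print(missing.idxmax())       # College: the column with the most missing values

# info() writes its report to buf (stdout by default) and returns None.
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()
print(result is None)         # True
```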
