Sentry Now This tutorial has a related video course created by the Existent Python team. Watch it together with the written tutorial to deepen your understanding: Reading and Writing Files With Pandas

Pandas is a powerful and flexible Python parcel that allows you to piece of work with labeled and time series data. It likewise provides statistics methods, enables plotting, and more than. One crucial feature of Pandas is its ability to write and read Excel, CSV, and many other types of files. Functions like the Pandas read_csv() method enable y'all to piece of work with files effectively. You can use them to salve the data and labels from Pandas objects to a file and load them later as Pandas Series or DataFrame instances.

In this tutorial, you'll learn:

  • What the Pandas IO tools API is
  • How to read and write data to and from files
  • How to work with various file formats
  • How to work with big data efficiently

Let's start reading and writing files!

Installing Pandas

The code in this tutorial is executed with CPython 3.7.iv and Pandas 0.25.i. It would be beneficial to make sure yous accept the latest versions of Python and Pandas on your machine. You might want to create a new virtual surroundings and install the dependencies for this tutorial.

First, you'll need the Pandas library. You may already have it installed. If you don't, so yous can install it with pip:

Once the installation process completes, you should have Pandas installed and set up.

Anaconda is an excellent Python distribution that comes with Python, many useful packages similar Pandas, and a parcel and environment managing director called Conda. To acquire more almost Anaconda, check out Setting Upwards Python for Machine Learning on Windows.

If you don't accept Pandas in your virtual surround, and so you tin can install information technology with Conda:

Conda is powerful as information technology manages the dependencies and their versions. To learn more about working with Conda, you lot tin can check out the official documentation.

Preparing Data

In this tutorial, you'll use the data related to 20 countries. Here'south an overview of the data and sources you'll be working with:

  • Country is denoted by the country proper noun. Each country is in the superlative x list for either population, area, or gross domestic product (GDP). The row labels for the dataset are the three-letter of the alphabet country codes defined in ISO 3166-1. The column label for the dataset is Country.

  • Population is expressed in millions. The information comes from a listing of countries and dependencies by population on Wikipedia. The column characterization for the dataset is POP.

  • Area is expressed in thousands of kilometers squared. The data comes from a list of countries and dependencies by area on Wikipedia. The cavalcade label for the dataset is AREA.

  • Gross domestic product is expressed in millions of U.S. dollars, according to the United nations data for 2017. Yous can find this data in the list of countries by nominal GDP on Wikipedia. The column characterization for the dataset is Gross domestic product.

  • Continent is either Africa, Asia, Oceania, Europe, North America, or South America. Y'all can find this information on Wikipedia also. The column label for the dataset is CONT.

  • Independence twenty-four hours is a appointment that commemorates a nation'south independence. The information comes from the list of national independence days on Wikipedia. The dates are shown in ISO 8601 format. The first four digits stand for the year, the next two numbers are the month, and the concluding two are for the day of the month. The column label for the dataset is IND_DAY.

This is how the data looks as a table:

Country POP Surface area Gdp CONT IND_DAY
CHN People's republic of china 1398.72 9596.96 12234.78 Asia
IND India 1351.16 3287.26 2575.67 Asia 1947-08-15
USA US 329.74 9833.52 19485.39 N.America 1776-07-04
IDN Indonesia 268.07 1910.93 1015.54 Asia 1945-08-17
BRA Brazil 210.32 8515.77 2055.51 South.America 1822-09-07
PAK Pakistan 205.71 881.91 302.14 Asia 1947-08-fourteen
NGA Nigeria 200.96 923.77 375.77 Africa 1960-ten-01
BGD People's republic of bangladesh 167.09 147.57 245.63 Asia 1971-03-26
RUS Russia 146.79 17098.25 1530.75 1992-06-12
MEX Mexico 126.58 1964.38 1158.23 N.America 1810-09-16
JPN Nippon 126.22 377.97 4872.42 Asia
DEU Frg 83.02 357.xi 3693.twenty Europe
FRA French republic 67.02 640.68 2582.49 Europe 1789-07-14
GBR UK 66.44 242.l 2631.23 Europe
ITA Italia 60.36 301.34 1943.84 Europe
ARG Argentina 44.94 2780.40 637.49 South.America 1816-07-09
DZA People's democratic republic of algeria 43.38 2381.74 167.56 Africa 1962-07-05
Tin Canada 37.59 9984.67 1647.12 N.America 1867-07-01
AUS Australia 25.47 7692.02 1408.68 Oceania
KAZ Republic of kazakhstan xviii.53 2724.90 159.41 Asia 1991-12-16

You may notice that some of the information is missing. For case, the continent for Russian federation is not specified considering information technology spreads across both Europe and Asia. At that place are also several missing independence days considering the data source omits them.

Y'all tin organize this data in Python using a nested dictionary:

                                            data                =                {                'CHN'                :                {                'COUNTRY'                :                'Communist china'                ,                'Pop'                :                1_398.72                ,                'AREA'                :                9_596.96                ,                'GDP'                :                12_234.78                ,                'CONT'                :                'Asia'                },                'IND'                :                {                'State'                :                'Bharat'                ,                'Pop'                :                1_351.16                ,                'AREA'                :                3_287.26                ,                'Gross domestic product'                :                2_575.67                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1947-08-15'                },                'Us'                :                {                'COUNTRY'                :                'United states'                ,                'Pop'                :                329.74                ,                'Surface area'                :                9_833.52                ,                'Gdp'                :                19_485.39                ,                'CONT'                :                'N.America'                ,                'IND_DAY'                :                '1776-07-04'                },                'IDN'                :                {                'Land'                :                'Indonesia'                ,                'Popular'                :                268.07                ,                'AREA'                :                1_910.93                ,                'GDP'                :                1_015.54                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1945-08-17'                },                'BRA'                :                {                'COUNTRY'                :                'Brazil'                ,                'Pop'                :                210.32                ,                'AREA'                :                8_515.77                ,                'GDP'                :                2_055.51                ,                'CONT'                :                'S.America'                ,                'IND_DAY'                :                '1822-09-07'                },                'PAK'                :                {                'Country'                :                'Pakistan'                ,                'Pop'                :                205.71                ,                'Expanse'                :                881.91                ,                'GDP'                :                302.14                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1947-08-fourteen'                },                'NGA'                :                {                'State'                :                'Nigeria'                ,                'Popular'                :                200.96                ,                'AREA'                :                923.77                ,                'GDP'                :                375.77                ,                'CONT'                :                'Africa'                ,                'IND_DAY'                :                '1960-ten-01'                },                'BGD'                :                {                'Land'                :                'Bangladesh'                ,                'POP'                :                167.09                ,                'AREA'                :                147.57                ,                'GDP'                :                245.63                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1971-03-26'                },                'RUS'                :                {                'State'                :                'Russian federation'                ,                'Popular'                :                146.79                ,                'AREA'                :                17_098.25                ,                'Gdp'                :                1_530.75                ,                'IND_DAY'                :                '1992-06-12'                },                'MEX'                :                {                'State'                :                'Mexico'                ,                'POP'                :                126.58                ,                'Expanse'                :                1_964.38                ,                'GDP'                :                1_158.23                ,                'CONT'                :                'Northward.America'                ,                'IND_DAY'                :                '1810-09-sixteen'                },                'JPN'                :                {                'COUNTRY'                :                'Japan'                ,                'POP'                :                126.22                ,                'AREA'                :                377.97                ,                'GDP'                :                4_872.42                ,                'CONT'                :                'Asia'                },                'DEU'                :                {                'State'                :                'Deutschland'                ,                'Pop'                :                83.02                ,                'AREA'                :                357.11                ,                'Gross domestic product'                :                3_693.20                ,                'CONT'                :                'Europe'                },                'FRA'                :                {                'COUNTRY'                :                'France'                ,                'POP'                :                67.02                ,                'AREA'                :                640.68                ,                'Gross domestic product'                :                2_582.49                ,                'CONT'                :                'Europe'                ,                'IND_DAY'                :                '1789-07-14'                },                'GBR'                :                {                'COUNTRY'                :                'UK'                ,                'POP'                :                66.44                ,                'AREA'                :                242.l                ,                'GDP'                :                2_631.23                ,                'CONT'                :                'Europe'                },                'ITA'                :                {                'Country'                :                'Italy'                ,                'Pop'                :                60.36                ,                'AREA'                :                301.34                ,                'GDP'                :                1_943.84                ,                'CONT'                :                'Europe'                },                'ARG'                :                {                'State'                :                'Argentine republic'                ,                'Popular'                :                44.94                ,                'AREA'                :                2_780.forty                ,                'Gross domestic product'                :                637.49                ,                'CONT'                :                'Southward.America'                ,                'IND_DAY'                :                '1816-07-09'                },                'DZA'                :                {                'COUNTRY'                :                'Algeria'                ,                'Popular'                :                43.38                ,                'AREA'                :                2_381.74                ,                'GDP'                :                167.56                ,                'CONT'                :                'Africa'                ,                'IND_DAY'                :                '1962-07-05'                },                'Tin can'                :                {                'COUNTRY'                :                'Canada'                ,                'Popular'                :                37.59                ,                'Area'                :                9_984.67                ,                'GDP'                :                1_647.12                ,                'CONT'                :                'N.America'                ,                'IND_DAY'                :                '1867-07-01'                },                'AUS'                :                {                'COUNTRY'                :                'Australia'                ,                'Popular'                :                25.47                ,                'Expanse'                :                7_692.02                ,                'Gross domestic product'                :                1_408.68                ,                'CONT'                :                'Oceania'                },                'KAZ'                :                {                'COUNTRY'                :                'Kazakhstan'                ,                'POP'                :                18.53                ,                'AREA'                :                2_724.xc                ,                'GDP'                :                159.41                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1991-12-16'                }                }                columns                =                (                'Land'                ,                'POP'                ,                'AREA'                ,                'Gross domestic product'                ,                'CONT'                ,                'IND_DAY'                )                          

Each row of the table is written as an inner dictionary whose keys are the column names and values are the corresponding data. These dictionaries are and then nerveless as the values in the outer information dictionary. The respective keys for data are the three-letter country codes.

You can use this information to create an example of a Pandas DataFrame. Kickoff, y'all need to import Pandas:

>>>

                                            >>>                                import                pandas                as                pd                          

At present that you lot have Pandas imported, you lot tin use the DataFrame constructor and data to create a DataFrame object.

information is organized in such a way that the country codes represent to columns. You can reverse the rows and columns of a DataFrame with the belongings .T:

>>>

                                            >>>                                df                =                pd                .                DataFrame                (                data                =                data                )                .                T                >>>                                df                                  Country      POP     Surface area      GDP       CONT     IND_DAY                CHN       Red china  1398.72  9596.96  12234.8       Asia         NaN                IND       Republic of india  1351.16  3287.26  2575.67       Asia  1947-08-15                USA          U.s.a.   329.74  9833.52  19485.4  N.America  1776-07-04                IDN   Republic of indonesia   268.07  1910.93  1015.54       Asia  1945-08-17                BRA      Brazil   210.32  8515.77  2055.51  S.America  1822-09-07                PAK    Pakistan   205.71   881.91   302.14       Asia  1947-08-14                NGA     Nigeria   200.96   923.77   375.77     Africa  1960-10-01                BGD  Bangladesh   167.09   147.57   245.63       Asia  1971-03-26                RUS      Russia   146.79  17098.2  1530.75        NaN  1992-06-12                MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16                JPN       Nihon   126.22   377.97  4872.42       Asia         NaN                DEU     Germany    83.02   357.11   3693.2     Europe         NaN                FRA      France    67.02   640.68  2582.49     Europe  1789-07-14                GBR          UK    66.44    242.5  2631.23     Europe         NaN                ITA       Italian republic    60.36   301.34  1943.84     Europe         NaN                ARG   Argentine republic    44.94   2780.4   637.49  S.America  1816-07-09                DZA     Algeria    43.38  2381.74   167.56     Africa  1962-07-05                CAN      Canada    37.59  9984.67  1647.12  N.America  1867-07-01                AUS   Australia    25.47  7692.02  1408.68    Oceania         NaN                KAZ  Kazakhstan    xviii.53   2724.9   159.41       Asia  1991-12-sixteen                          

Now yous have your DataFrame object populated with the data almost each country.

Versions of Python older than 3.six did not guarantee the lodge of keys in dictionaries. To ensure the lodge of columns is maintained for older versions of Python and Pandas, you can specify alphabetize=columns:

>>>

                                            >>>                                df                =                pd                .                DataFrame                (                data                =                data                ,                index                =                columns                )                .                T                          

At present that you've prepared your information, you're set to start working with files!

Using the Pandas read_csv() and .to_csv() Functions

A comma-separated values (CSV) file is a plaintext file with a .csv extension that holds tabular information. This is ane of the most popular file formats for storing big amounts of data. Each row of the CSV file represents a unmarried table row. The values in the same row are past default separated with commas, but you could change the separator to a semicolon, tab, space, or another character.

Write a CSV File

You can relieve your Pandas DataFrame as a CSV file with .to_csv():

>>>

                                                  >>>                                    df                  .                  to_csv                  (                  'data.csv'                  )                              

That'southward it! You've created the file data.csv in your current working directory. You lot tin expand the code block below to run into how your CSV file should wait:

                                ,COUNTRY,Popular,AREA,GDP,CONT,IND_DAY CHN,Prc,1398.72,9596.96,12234.78,Asia, IND,Bharat,1351.16,3287.26,2575.67,Asia,1947-08-xv The states,US,329.74,9833.52,19485.39,Due north.America,1776-07-04 IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07 PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01 BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,,1992-06-12 MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16 JPN,Nihon,126.22,377.97,4872.42,Asia, DEU,Germany,83.02,357.11,3693.ii,Europe, FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,UK,66.44,242.5,2631.23,Europe, ITA,Italy,60.36,301.34,1943.84,Europe, ARG,Argentine republic,44.94,2780.4,637.49,S.America,1816-07-09 DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05 CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16                              

This text file contains the data separated with commas. The commencement column contains the row labels. In some cases, you'll find them irrelevant. If you don't want to continue them, and so you can pass the argument index=False to .to_csv().

Read a CSV File

Once your information is saved in a CSV file, you'll likely want to load and use it from time to time. You can do that with the Pandas read_csv() role:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  )                  >>>                                    df                                      State      POP      AREA       Gdp       CONT     IND_DAY                  CHN       Cathay  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.sixteen   3287.26   2575.67       Asia  1947-08-15                  The states          United states   329.74   9833.52  19485.39  Northward.America  1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Due south.America  1822-09-07                  PAK    Pakistan   205.71    881.91    302.xiv       Asia  1947-08-14                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-xvi                  JPN       Japan   126.22    377.97   4872.42       Asia         NaN                  DEU     Germany    83.02    357.eleven   3693.twenty     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-14                  GBR          UK    66.44    242.50   2631.23     Europe         NaN                  ITA       Italian republic    lx.36    301.34   1943.84     Europe         NaN                  ARG   Argentina    44.94   2780.xl    637.49  S.America  1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa  1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  N.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Republic of kazakhstan    eighteen.53   2724.ninety    159.41       Asia  1991-12-16                              

In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data.csv, which you specified with the offset argument. This cord tin be any valid path, including URLs.

The parameter index_col specifies the cavalcade from the CSV file that contains the row labels. You assign a zero-based column index to this parameter. You should determine the value of index_col when the CSV file contains the row labels to avert loading them as data.

Yous'll learn more than about using Pandas with CSV files later on in this tutorial. Y'all tin also check out Reading and Writing CSV Files in Python to see how to handle CSV files with the congenital-in Python library csv likewise.

Using Pandas to Write and Read Excel Files

Microsoft Excel is probably the nigh widely-used spreadsheet software. While older versions used binary .xls files, Excel 2007 introduced the new XML-based .xlsx file. You tin can read and write Excel files in Pandas, similar to CSV files. Nevertheless, yous'll need to install the following Python packages first:

  • xlwt to write to .xls files
  • openpyxl or XlsxWriter to write to .xlsx files
  • xlrd to read Excel files

You can install them using pip with a single command:

                                            $                pip install xlwt openpyxl xlsxwriter xlrd                          

Yous can also use Conda:

                                            $                conda install xlwt openpyxl xlsxwriter xlrd                          

Please annotation that y'all don't have to install all these packages. For example, you don't need both openpyxl and XlsxWriter. If you're going to work merely with .xls files, so you don't need whatever of them! However, if you intend to piece of work simply with .xlsx files, and then you lot're going to need at least one of them, simply non xlwt. Take some time to decide which packages are right for your project.

Write an Excel File

Once you have those packages installed, you tin save your DataFrame in an Excel file with .to_excel():

>>>

                                                  >>>                                    df                  .                  to_excel                  (                  'information.xlsx'                  )                              

The argument 'data.xlsx' represents the target file and, optionally, its path. The above statement should create the file information.xlsx in your current working directory. That file should look like this:

mmst-pandas-rw-files-excel

The outset column of the file contains the labels of the rows, while the other columns store data.

Read an Excel File

You can load data from Excel files with read_excel():

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'data.xlsx'                  ,                  index_col                  =                  0                  )                  >>>                                    df                                      COUNTRY      Pop      AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.16   3287.26   2575.67       Asia  1947-08-15                  United states of america          The states   329.74   9833.52  19485.39  N.America  1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Southward.America  1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia  1947-08-xiv                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-ten-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16                  JPN       Nippon   126.22    377.97   4872.42       Asia         NaN                  DEU     Germany    83.02    357.xi   3693.20     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-fourteen                  GBR          U.k.    66.44    242.50   2631.23     Europe         NaN                  ITA       Italian republic    sixty.36    301.34   1943.84     Europe         NaN                  ARG   Argentina    44.94   2780.40    637.49  S.America  1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa  1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Due north.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-sixteen                              

read_excel() returns a new DataFrame that contains the values from data.xlsx. Y'all can also employ read_excel() with OpenDocument spreadsheets, or .ods files.

You'll learn more about working with Excel files afterward on in this tutorial. Yous tin can as well check out Using Pandas to Read Large Excel Files in Python.

Understanding the Pandas IO API

Pandas IO Tools is the API that allows you to save the contents of Series and DataFrame objects to the clipboard, objects, or files of various types. It also enables loading data from the clipboard, objects, or files.

Write Files

Serial and DataFrame objects have methods that enable writing data and labels to the clipboard or files. They're named with the pattern .to_<file-blazon>() , where <file-type> is the blazon of the target file.

Yous've learned about .to_csv() and .to_excel(), but there are others, including:

  • .to_json()
  • .to_html()
  • .to_sql()
  • .to_pickle()

At that place are nonetheless more file types that you tin write to, so this list is not exhaustive.

These methods have parameters specifying the target file path where you saved the data and labels. This is mandatory in some cases and optional in others. If this option is available and you choose to omit it, then the methods return the objects (like strings or iterables) with the contents of DataFrame instances.

The optional parameter compression decides how to compress the file with the data and labels. You'll learn more about information technology afterwards on. There are a few other parameters, but they're mostly specific to one or several methods. Y'all won't go into them in detail here.

Read Files

Pandas functions for reading the contents of files are named using the pattern .read_<file-blazon>() , where <file-type> indicates the blazon of the file to read. Y'all've already seen the Pandas read_csv() and read_excel() functions. Here are a few others:

  • read_json()
  • read_html()
  • read_sql()
  • read_pickle()

These functions take a parameter that specifies the target file path. It can be any valid cord that represents the path, either on a local machine or in a URL. Other objects are also acceptable depending on the file blazon.

The optional parameter compression determines the type of decompression to use for the compressed files. You'll acquire near it subsequently on in this tutorial. There are other parameters, but they're specific to i or several functions. You won't go into them in detail here.

Working With Different File Types

The Pandas library offers a wide range of possibilities for saving your data to files and loading data from files. In this department, you lot'll learn more than about working with CSV and Excel files. You'll as well see how to use other types of files, like JSON, web pages, databases, and Python pickle files.

CSV Files

You lot've already learned how to read and write CSV files. At present let'southward dig a little deeper into the details. When you use .to_csv() to relieve your DataFrame, you lot can provide an statement for the parameter path_or_buf to specify the path, name, and extension of the target file.

path_or_buf is the first argument .to_csv() will go. Information technology can exist any string that represents a valid file path that includes the file name and its extension. You lot've seen this in a previous example. However, if yous omit path_or_buf, then .to_csv() won't create whatsoever files. Instead, information technology'll return the respective string:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    s                  =                  df                  .                  to_csv                  ()                  >>>                                    impress                  (                  south                  )                  ,State,POP,AREA,GDP,CONT,IND_DAY                  CHN,China,1398.72,9596.96,12234.78,Asia,                  IND,India,1351.16,3287.26,2575.67,Asia,1947-08-xv                  USA,US,329.74,9833.52,19485.39,N.America,1776-07-04                  IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,1945-08-17                  BRA,Brazil,210.32,8515.77,2055.51,Due south.America,1822-09-07                  PAK,Islamic republic of pakistan,205.71,881.91,302.14,Asia,1947-08-14                  NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01                  BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26                  RUS,Russia,146.79,17098.25,1530.75,,1992-06-12                  MEX,Mexico,126.58,1964.38,1158.23,Northward.America,1810-09-xvi                  JPN,Japan,126.22,377.97,4872.42,Asia,                  DEU,Federal republic of germany,83.02,357.11,3693.two,Europe,                  FRA,France,67.02,640.68,2582.49,Europe,1789-07-14                  GBR,Uk,66.44,242.5,2631.23,Europe,                  ITA,Italia,lx.36,301.34,1943.84,Europe,                  ARG,Argentina,44.94,2780.4,637.49,Due south.America,1816-07-09                  DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05                  CAN,Canada,37.59,9984.67,1647.12,North.America,1867-07-01                  AUS,Australia,25.47,7692.02,1408.68,Oceania,                  KAZ,Republic of kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16                              

Now you have the string s instead of a CSV file. Yous likewise have some missing values in your DataFrame object. For instance, the continent for Russia and the independence days for several countries (China, Japan, and and then on) are not available. In data science and machine learning, you must handle missing values carefully. Pandas excels here! By default, Pandas uses the NaN value to replace the missing values.

The continent that corresponds to Russia in df is nan:

>>>

                                                  >>>                                    df                  .                  loc                  [                  'RUS'                  ,                  'CONT'                  ]                  nan                              

This example uses .loc[] to get data with the specified row and column names.

When yous salvage your DataFrame to a CSV file, empty strings ('') will stand for the missing data. You tin run across this both in your file data.csv and in the string s. If you lot want to alter this behavior, then utilize the optional parameter na_rep:

>>>

                                                  >>>                                    df                  .                  to_csv                  (                  'new-information.csv'                  ,                  na_rep                  =                  '(missing)'                  )                              

This code produces the file new-data.csv where the missing values are no longer empty strings. You tin expand the code block below to run across how this file should look:

                                ,Land,Pop,AREA,Gross domestic product,CONT,IND_DAY CHN,China,1398.72,9596.96,12234.78,Asia,(missing) IND,India,1351.sixteen,3287.26,2575.67,Asia,1947-08-xv Us,The states,329.74,9833.52,19485.39,Northward.America,1776-07-04 IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,South.America,1822-09-07 PAK,Pakistan,205.71,881.91,302.fourteen,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01 BGD,People's republic of bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,(missing),1992-06-12 MEX,United mexican states,126.58,1964.38,1158.23,N.America,1810-09-16 JPN,Nihon,126.22,377.97,4872.42,Asia,(missing) DEU,Germany,83.02,357.eleven,3693.ii,Europe,(missing) FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,Britain,66.44,242.5,2631.23,Europe,(missing) ITA,Italy,60.36,301.34,1943.84,Europe,(missing) ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09 DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,1962-07-05 Tin,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania,(missing) KAZ,Kazakhstan,eighteen.53,2724.9,159.41,Asia,1991-12-16                              

Now, the string '(missing)' in the file corresponds to the nan values from df.

When Pandas reads files, it considers the empty string ('') and a few others as missing values past default:

  • 'nan'
  • '-nan'
  • 'NA'
  • 'N/A'
  • 'NaN'
  • 'null'

If you don't desire this beliefs, then you can pass keep_default_na=False to the Pandas read_csv() function. To specify other labels for missing values, apply the parameter na_values:

>>>

                                                  >>>                                    pd                  .                  read_csv                  (                  'new-information.csv'                  ,                  index_col                  =                  0                  ,                  na_values                  =                  '(missing)'                  )                                      Country      POP      AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.sixteen   3287.26   2575.67       Asia  1947-08-15                  U.s.          United states   329.74   9833.52  19485.39  N.America  1776-07-04                  IDN   Republic of indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Due south.America  1822-09-07                  PAK    Islamic republic of pakistan   205.71    881.91    302.fourteen       Asia  1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-ten-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia         NaN                  DEU     Deutschland    83.02    357.11   3693.20     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-14                  GBR          United kingdom of great britain and northern ireland    66.44    242.50   2631.23     Europe         NaN                  ITA       Italy    60.36    301.34   1943.84     Europe         NaN                  ARG   Argentine republic    44.94   2780.xl    637.49  Southward.America  1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa  1962-07-05                  Tin      Canada    37.59   9984.67   1647.12  N.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-xvi                              

Hither, you've marked the string '(missing)' every bit a new missing information label, and Pandas replaced it with nan when it read the file.

When you load data from a file, Pandas assigns the data types to the values of each cavalcade past default. You tin bank check these types with .dtypes:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  )                  >>>                                    df                  .                  dtypes                  COUNTRY     object                  POP        float64                  Expanse       float64                  GDP        float64                  CONT        object                  IND_DAY     object                  dtype: object                              

The columns with strings and dates ('State', 'CONT', and 'IND_DAY') have the data blazon object. Meanwhile, the numeric columns contain 64-bit floating-point numbers (float64).

You can use the parameter dtype to specify the desired data types and parse_dates to forcefulness use of datetimes:

>>>

                                                  >>>                                    dtypes                  =                  {                  'Pop'                  :                  'float32'                  ,                  'AREA'                  :                  'float32'                  ,                  'Gdp'                  :                  'float32'                  }                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  dtype                  =                  dtypes                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  Land            object                  POP               float32                  Expanse              float32                  GDP               float32                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  [                  'IND_DAY'                  ]                  CHN          NaT                  IND   1947-08-15                  United states of america   1776-07-04                  IDN   1945-08-17                  BRA   1822-09-07                  PAK   1947-08-14                  NGA   1960-10-01                  BGD   1971-03-26                  RUS   1992-06-12                  MEX   1810-09-16                  JPN          NaT                  DEU          NaT                  FRA   1789-07-14                  GBR          NaT                  ITA          NaT                  ARG   1816-07-09                  DZA   1962-07-05                  CAN   1867-07-01                  AUS          NaT                  KAZ   1991-12-16                  Proper noun: IND_DAY, dtype: datetime64[ns]                              

Now, you have 32-fleck floating-point numbers (float32) as specified with dtype. These differ slightly from the original 64-chip numbers because of smaller precision. The values in the last column are considered as dates and accept the data blazon datetime64. That'due south why the NaN values in this column are replaced with NaT.

Now that you lot have existent dates, yous can save them in the format y'all like:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  to_csv                  (                  'formatted-information.csv'                  ,                  date_format                  =                  '%B                                    %d                  , %Y'                  )                              

Here, yous've specified the parameter date_format to exist '%B %d, %Y'. You tin expand the lawmaking cake below to see the resulting file:

                                ,COUNTRY,Popular,AREA,GDP,CONT,IND_DAY CHN,China,1398.72,9596.96,12234.78,Asia, IND,India,1351.16,3287.26,2575.67,Asia,"August xv, 1947" United states,Us,329.74,9833.52,19485.39,N.America,"July 04, 1776" IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,"August 17, 1945" BRA,Brazil,210.32,8515.77,2055.51,S.America,"September 07, 1822" PAK,Pakistan,205.71,881.91,302.14,Asia,"Baronial 14, 1947" NGA,Nigeria,200.96,923.77,375.77,Africa,"October 01, 1960" BGD,Bangladesh,167.09,147.57,245.63,Asia,"March 26, 1971" RUS,Russia,146.79,17098.25,1530.75,,"June 12, 1992" MEX,Mexico,126.58,1964.38,1158.23,Due north.America,"September 16, 1810" JPN,Japan,126.22,377.97,4872.42,Asia, DEU,Frg,83.02,357.11,3693.2,Europe, FRA,France,67.02,640.68,2582.49,Europe,"July 14, 1789" GBR,United kingdom,66.44,242.five,2631.23,Europe, ITA,Italian republic,60.36,301.34,1943.84,Europe, ARG,Argentina,44.94,2780.4,637.49,South.America,"July 09, 1816" DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,"July 05, 1962" CAN,Canada,37.59,9984.67,1647.12,Due north.America,"July 01, 1867" AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,"December 16, 1991"                              

The format of the dates is dissimilar now. The format '%B %d, %Y' ways the date volition first display the full name of the month, then the day followed by a comma, and finally the full year.

In that location are several other optional parameters that you can use with .to_csv():

  • sep denotes a values separator.
  • decimal indicates a decimal separator.
  • encoding sets the file encoding.
  • header specifies whether you want to write column labels in the file.

Here's how you lot would laissez passer arguments for sep and header:

>>>

                                                  >>>                                    s                  =                  df                  .                  to_csv                  (                  sep                  =                  ';'                  ,                  header                  =                  False                  )                  >>>                                    print                  (                  s                  )                  CHN;China;1398.72;9596.96;12234.78;Asia;                  IND;Bharat;1351.16;3287.26;2575.67;Asia;1947-08-fifteen                  U.s.a.;U.s.a.;329.74;9833.52;19485.39;N.America;1776-07-04                  IDN;Republic of indonesia;268.07;1910.93;1015.54;Asia;1945-08-17                  BRA;Brazil;210.32;8515.77;2055.51;Due south.America;1822-09-07                  PAK;Pakistan;205.71;881.91;302.xiv;Asia;1947-08-14                  NGA;Nigeria;200.96;923.77;375.77;Africa;1960-x-01                  BGD;Bangladesh;167.09;147.57;245.63;Asia;1971-03-26                  RUS;Russian federation;146.79;17098.25;1530.75;;1992-06-12                  MEX;Mexico;126.58;1964.38;1158.23;N.America;1810-09-16                  JPN;Nippon;126.22;377.97;4872.42;Asia;                  DEU;Germany;83.02;357.11;3693.2;Europe;                  FRA;France;67.02;640.68;2582.49;Europe;1789-07-xiv                  GBR;United kingdom;66.44;242.5;2631.23;Europe;                  ITA;Italy;60.36;301.34;1943.84;Europe;                  ARG;Argentine republic;44.94;2780.four;637.49;Due south.America;1816-07-09                  DZA;Algeria;43.38;2381.74;167.56;Africa;1962-07-05                  Can;Canada;37.59;9984.67;1647.12;N.America;1867-07-01                  AUS;Australia;25.47;7692.02;1408.68;Oceania;                  KAZ;Kazakhstan;18.53;2724.9;159.41;Asia;1991-12-16                              

The data is separated with a semicolon (';') considering you've specified sep=';'. Likewise, since yous passed header=False, you see your data without the header row of cavalcade names.

The Pandas read_csv() role has many additional options for managing missing data, working with dates and times, quoting, encoding, treatment errors, and more than. For case, if you have a file with one data column and want to get a Serial object instead of a DataFrame, then you can pass squeeze=True to read_csv(). Y'all'll learn afterwards on about information compression and decompression, besides as how to skip rows and columns.

JSON Files

JSON stands for JavaScript object notation. JSON files are plaintext files used for data interchange, and humans can read them easily. They follow the ISO/IEC 21778:2017 and ECMA-404 standards and apply the .json extension. Python and Pandas work well with JSON files, as Python'southward json library offers congenital-in support for them.

You tin save the data from your DataFrame to a JSON file with .to_json(). Start by creating a DataFrame object again. Use the dictionary data that holds the data about countries and then apply .to_json():

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    df                  .                  to_json                  (                  'data-columns.json'                  )                              

This code produces the file data-columns.json. You can expand the lawmaking cake below to see how this file should wait:

                                                  {                  "Land"                  :{                  "CHN"                  :                  "People's republic of china"                  ,                  "IND"                  :                  "India"                  ,                  "USA"                  :                  "US"                  ,                  "IDN"                  :                  "Indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russian federation"                  ,                  "MEX"                  :                  "United mexican states"                  ,                  "JPN"                  :                  "Japan"                  ,                  "DEU"                  :                  "Deutschland"                  ,                  "FRA"                  :                  "French republic"                  ,                  "GBR"                  :                  "Uk"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "People's democratic republic of algeria"                  ,                  "Can"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "Popular"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.16                  ,                  "Usa"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  60.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "CAN"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  18.53                  },                  "AREA"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "United states of america"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.11                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.five                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.4                  ,                  "DZA"                  :                  2381.74                  ,                  "CAN"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.nine                  },                  "GDP"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "United states of america"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.14                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "Tin can"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "South.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  null                  ,                  "MEX"                  :                  "N.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "South.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "CAN"                  :                  "N.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  nothing                  ,                  "IND"                  :                  "1947-08-15"                  ,                  "USA"                  :                  "1776-07-04"                  ,                  "IDN"                  :                  "1945-08-17"                  ,                  "BRA"                  :                  "1822-09-07"                  ,                  "PAK"                  :                  "1947-08-fourteen"                  ,                  "NGA"                  :                  "1960-x-01"                  ,                  "BGD"                  :                  "1971-03-26"                  ,                  "RUS"                  :                  "1992-06-12"                  ,                  "MEX"                  :                  "1810-09-16"                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  null                  ,                  "FRA"                  :                  "1789-07-14"                  ,                  "GBR"                  :                  cypher                  ,                  "ITA"                  :                  zippo                  ,                  "ARG"                  :                  "1816-07-09"                  ,                  "DZA"                  :                  "1962-07-05"                  ,                  "CAN"                  :                  "1867-07-01"                  ,                  "AUS"                  :                  null                  ,                  "KAZ"                  :                  "1991-12-xvi"                  }}                              

information-columns.json has 1 large dictionary with the column labels every bit keys and the corresponding inner dictionaries equally values.

You lot tin get a different file construction if you laissez passer an argument for the optional parameter orient:

>>>

                                                  >>>                                    df                  .                  to_json                  (                  'data-index.json'                  ,                  orient                  =                  'index'                  )                              

The orient parameter defaults to 'columns'. Here, yous've set it to index.

You should go a new file information-index.json. You tin can expand the code block beneath to see the changes:

                                                  {                  "CHN"                  :{                  "State"                  :                  "Red china"                  ,                  "POP"                  :                  1398.72                  ,                  "AREA"                  :                  9596.96                  ,                  "Gross domestic product"                  :                  12234.78                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  goose egg                  },                  "IND"                  :{                  "COUNTRY"                  :                  "India"                  ,                  "POP"                  :                  1351.16                  ,                  "Surface area"                  :                  3287.26                  ,                  "GDP"                  :                  2575.67                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-15"                  },                  "Us"                  :{                  "COUNTRY"                  :                  "US"                  ,                  "Pop"                  :                  329.74                  ,                  "AREA"                  :                  9833.52                  ,                  "GDP"                  :                  19485.39                  ,                  "CONT"                  :                  "North.America"                  ,                  "IND_DAY"                  :                  "1776-07-04"                  },                  "IDN"                  :{                  "Land"                  :                  "Republic of indonesia"                  ,                  "Popular"                  :                  268.07                  ,                  "Expanse"                  :                  1910.93                  ,                  "Gdp"                  :                  1015.54                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1945-08-17"                  },                  "BRA"                  :{                  "COUNTRY"                  :                  "Brazil"                  ,                  "POP"                  :                  210.32                  ,                  "AREA"                  :                  8515.77                  ,                  "GDP"                  :                  2055.51                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1822-09-07"                  },                  "PAK"                  :{                  "COUNTRY"                  :                  "Pakistan"                  ,                  "Pop"                  :                  205.71                  ,                  "Expanse"                  :                  881.91                  ,                  "Gross domestic product"                  :                  302.fourteen                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-14"                  },                  "NGA"                  :{                  "COUNTRY"                  :                  "Nigeria"                  ,                  "Pop"                  :                  200.96                  ,                  "AREA"                  :                  923.77                  ,                  "Gdp"                  :                  375.77                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1960-10-01"                  },                  "BGD"                  :{                  "COUNTRY"                  :                  "People's republic of bangladesh"                  ,                  "Pop"                  :                  167.09                  ,                  "Expanse"                  :                  147.57                  ,                  "Gross domestic product"                  :                  245.63                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1971-03-26"                  },                  "RUS"                  :{                  "Country"                  :                  "Russian federation"                  ,                  "Pop"                  :                  146.79                  ,                  "AREA"                  :                  17098.25                  ,                  "Gdp"                  :                  1530.75                  ,                  "CONT"                  :                  nada                  ,                  "IND_DAY"                  :                  "1992-06-12"                  },                  "MEX"                  :{                  "Country"                  :                  "Mexico"                  ,                  "POP"                  :                  126.58                  ,                  "Expanse"                  :                  1964.38                  ,                  "GDP"                  :                  1158.23                  ,                  "CONT"                  :                  "North.America"                  ,                  "IND_DAY"                  :                  "1810-09-16"                  },                  "JPN"                  :{                  "COUNTRY"                  :                  "Japan"                  ,                  "POP"                  :                  126.22                  ,                  "Surface area"                  :                  377.97                  ,                  "Gross domestic product"                  :                  4872.42                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  nothing                  },                  "DEU"                  :{                  "COUNTRY"                  :                  "Frg"                  ,                  "Popular"                  :                  83.02                  ,                  "AREA"                  :                  357.11                  ,                  "GDP"                  :                  3693.2                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "FRA"                  :{                  "COUNTRY"                  :                  "France"                  ,                  "POP"                  :                  67.02                  ,                  "Expanse"                  :                  640.68                  ,                  "Gdp"                  :                  2582.49                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  "1789-07-fourteen"                  },                  "GBR"                  :{                  "Land"                  :                  "Britain"                  ,                  "Pop"                  :                  66.44                  ,                  "AREA"                  :                  242.5                  ,                  "GDP"                  :                  2631.23                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "ITA"                  :{                  "COUNTRY"                  :                  "Italia"                  ,                  "POP"                  :                  60.36                  ,                  "Area"                  :                  301.34                  ,                  "Gdp"                  :                  1943.84                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "ARG"                  :{                  "Country"                  :                  "Argentine republic"                  ,                  "POP"                  :                  44.94                  ,                  "AREA"                  :                  2780.4                  ,                  "GDP"                  :                  637.49                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1816-07-09"                  },                  "DZA"                  :{                  "COUNTRY"                  :                  "Algeria"                  ,                  "POP"                  :                  43.38                  ,                  "AREA"                  :                  2381.74                  ,                  "Gdp"                  :                  167.56                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1962-07-05"                  },                  "CAN"                  :{                  "Country"                  :                  "Canada"                  ,                  "POP"                  :                  37.59                  ,                  "AREA"                  :                  9984.67                  ,                  "GDP"                  :                  1647.12                  ,                  "CONT"                  :                  "Northward.America"                  ,                  "IND_DAY"                  :                  "1867-07-01"                  },                  "AUS"                  :{                  "Country"                  :                  "Australia"                  ,                  "Popular"                  :                  25.47                  ,                  "Expanse"                  :                  7692.02                  ,                  "Gdp"                  :                  1408.68                  ,                  "CONT"                  :                  "Oceania"                  ,                  "IND_DAY"                  :                  nada                  },                  "KAZ"                  :{                  "COUNTRY"                  :                  "Kazakhstan"                  ,                  "POP"                  :                  18.53                  ,                  "AREA"                  :                  2724.nine                  ,                  "Gross domestic product"                  :                  159.41                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1991-12-16"                  }}                              

data-alphabetize.json besides has ane large dictionary, only this time the row labels are the keys, and the inner dictionaries are the values.

At that place are few more options for orient. Ane of them is 'records':

>>>

                                                  >>>                                    df                  .                  to_json                  (                  'data-records.json'                  ,                  orient                  =                  'records'                  )                              

This code should yield the file data-records.json. You tin expand the code block below to see the content:

                                                  [{                  "Country"                  :                  "Prc"                  ,                  "Pop"                  :                  1398.72                  ,                  "AREA"                  :                  9596.96                  ,                  "GDP"                  :                  12234.78                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "State"                  :                  "India"                  ,                  "POP"                  :                  1351.16                  ,                  "AREA"                  :                  3287.26                  ,                  "Gross domestic product"                  :                  2575.67                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-15"                  },{                  "Land"                  :                  "Us"                  ,                  "POP"                  :                  329.74                  ,                  "Expanse"                  :                  9833.52                  ,                  "Gross domestic product"                  :                  19485.39                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1776-07-04"                  },{                  "State"                  :                  "Indonesia"                  ,                  "Pop"                  :                  268.07                  ,                  "Expanse"                  :                  1910.93                  ,                  "Gross domestic product"                  :                  1015.54                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1945-08-17"                  },{                  "State"                  :                  "Brazil"                  ,                  "Pop"                  :                  210.32                  ,                  "AREA"                  :                  8515.77                  ,                  "GDP"                  :                  2055.51                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1822-09-07"                  },{                  "COUNTRY"                  :                  "Pakistan"                  ,                  "POP"                  :                  205.71                  ,                  "Area"                  :                  881.91                  ,                  "Gdp"                  :                  302.14                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-fourteen"                  },{                  "State"                  :                  "Nigeria"                  ,                  "Popular"                  :                  200.96                  ,                  "Area"                  :                  923.77                  ,                  "Gdp"                  :                  375.77                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1960-10-01"                  },{                  "Country"                  :                  "Bangladesh"                  ,                  "POP"                  :                  167.09                  ,                  "AREA"                  :                  147.57                  ,                  "Gross domestic product"                  :                  245.63                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1971-03-26"                  },{                  "COUNTRY"                  :                  "Russian federation"                  ,                  "Pop"                  :                  146.79                  ,                  "Expanse"                  :                  17098.25                  ,                  "GDP"                  :                  1530.75                  ,                  "CONT"                  :                  zippo                  ,                  "IND_DAY"                  :                  "1992-06-12"                  },{                  "State"                  :                  "Mexico"                  ,                  "Popular"                  :                  126.58                  ,                  "Surface area"                  :                  1964.38                  ,                  "GDP"                  :                  1158.23                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1810-09-xvi"                  },{                  "COUNTRY"                  :                  "Japan"                  ,                  "Pop"                  :                  126.22                  ,                  "Expanse"                  :                  377.97                  ,                  "GDP"                  :                  4872.42                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  null                  },{                  "State"                  :                  "Federal republic of germany"                  ,                  "Popular"                  :                  83.02                  ,                  "Expanse"                  :                  357.11                  ,                  "GDP"                  :                  3693.2                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "Country"                  :                  "France"                  ,                  "POP"                  :                  67.02                  ,                  "AREA"                  :                  640.68                  ,                  "Gdp"                  :                  2582.49                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  "1789-07-14"                  },{                  "COUNTRY"                  :                  "UK"                  ,                  "Popular"                  :                  66.44                  ,                  "AREA"                  :                  242.5                  ,                  "GDP"                  :                  2631.23                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },{                  "Country"                  :                  "Italy"                  ,                  "Pop"                  :                  60.36                  ,                  "Expanse"                  :                  301.34                  ,                  "GDP"                  :                  1943.84                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  zip                  },{                  "Land"                  :                  "Argentina"                  ,                  "POP"                  :                  44.94                  ,                  "Area"                  :                  2780.four                  ,                  "Gross domestic product"                  :                  637.49                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1816-07-09"                  },{                  "COUNTRY"                  :                  "Algeria"                  ,                  "POP"                  :                  43.38                  ,                  "AREA"                  :                  2381.74                  ,                  "Gross domestic product"                  :                  167.56                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1962-07-05"                  },{                  "COUNTRY"                  :                  "Canada"                  ,                  "Popular"                  :                  37.59                  ,                  "Surface area"                  :                  9984.67                  ,                  "Gross domestic product"                  :                  1647.12                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1867-07-01"                  },{                  "State"                  :                  "Commonwealth of australia"                  ,                  "Popular"                  :                  25.47                  ,                  "AREA"                  :                  7692.02                  ,                  "GDP"                  :                  1408.68                  ,                  "CONT"                  :                  "Oceania"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "COUNTRY"                  :                  "Kazakhstan"                  ,                  "POP"                  :                  xviii.53                  ,                  "Area"                  :                  2724.9                  ,                  "GDP"                  :                  159.41                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1991-12-16"                  }]                              

data-records.json holds a list with i dictionary for each row. The row labels are non written.

Y'all can get some other interesting file structure with orient='split up':

>>>

                                                  >>>                                    df                  .                  to_json                  (                  'data-split.json'                  ,                  orient                  =                  'carve up'                  )                              

The resulting file is data-split.json. You can expand the code block beneath to come across how this file should look:

                                                  {                  "columns"                  :[                  "Land"                  ,                  "Pop"                  ,                  "AREA"                  ,                  "Gdp"                  ,                  "CONT"                  ,                  "IND_DAY"                  ],                  "index"                  :[                  "CHN"                  ,                  "IND"                  ,                  "USA"                  ,                  "IDN"                  ,                  "BRA"                  ,                  "PAK"                  ,                  "NGA"                  ,                  "BGD"                  ,                  "RUS"                  ,                  "MEX"                  ,                  "JPN"                  ,                  "DEU"                  ,                  "FRA"                  ,                  "GBR"                  ,                  "ITA"                  ,                  "ARG"                  ,                  "DZA"                  ,                  "Tin can"                  ,                  "AUS"                  ,                  "KAZ"                  ],                  "data"                  :[[                  "Prc"                  ,                  1398.72                  ,                  9596.96                  ,                  12234.78                  ,                  "Asia"                  ,                  null                  ],[                  "India"                  ,                  1351.16                  ,                  3287.26                  ,                  2575.67                  ,                  "Asia"                  ,                  "1947-08-15"                  ],[                  "US"                  ,                  329.74                  ,                  9833.52                  ,                  19485.39                  ,                  "N.America"                  ,                  "1776-07-04"                  ],[                  "Indonesia"                  ,                  268.07                  ,                  1910.93                  ,                  1015.54                  ,                  "Asia"                  ,                  "1945-08-17"                  ],[                  "Brazil"                  ,                  210.32                  ,                  8515.77                  ,                  2055.51                  ,                  "S.America"                  ,                  "1822-09-07"                  ],[                  "Islamic republic of pakistan"                  ,                  205.71                  ,                  881.91                  ,                  302.xiv                  ,                  "Asia"                  ,                  "1947-08-14"                  ],[                  "Nigeria"                  ,                  200.96                  ,                  923.77                  ,                  375.77                  ,                  "Africa"                  ,                  "1960-x-01"                  ],[                  "Bangladesh"                  ,                  167.09                  ,                  147.57                  ,                  245.63                  ,                  "Asia"                  ,                  "1971-03-26"                  ],[                  "Russia"                  ,                  146.79                  ,                  17098.25                  ,                  1530.75                  ,                  null                  ,                  "1992-06-12"                  ],[                  "Mexico"                  ,                  126.58                  ,                  1964.38                  ,                  1158.23                  ,                  "Due north.America"                  ,                  "1810-09-xvi"                  ],[                  "Nihon"                  ,                  126.22                  ,                  377.97                  ,                  4872.42                  ,                  "Asia"                  ,                  zero                  ],[                  "Germany"                  ,                  83.02                  ,                  357.11                  ,                  3693.2                  ,                  "Europe"                  ,                  aught                  ],[                  "French republic"                  ,                  67.02                  ,                  640.68                  ,                  2582.49                  ,                  "Europe"                  ,                  "1789-07-14"                  ],[                  "UK"                  ,                  66.44                  ,                  242.5                  ,                  2631.23                  ,                  "Europe"                  ,                  null                  ],[                  "Italian republic"                  ,                  60.36                  ,                  301.34                  ,                  1943.84                  ,                  "Europe"                  ,                  null                  ],[                  "Argentina"                  ,                  44.94                  ,                  2780.4                  ,                  637.49                  ,                  "S.America"                  ,                  "1816-07-09"                  ],[                  "People's democratic republic of algeria"                  ,                  43.38                  ,                  2381.74                  ,                  167.56                  ,                  "Africa"                  ,                  "1962-07-05"                  ],[                  "Canada"                  ,                  37.59                  ,                  9984.67                  ,                  1647.12                  ,                  "N.America"                  ,                  "1867-07-01"                  ],[                  "Australia"                  ,                  25.47                  ,                  7692.02                  ,                  1408.68                  ,                  "Oceania"                  ,                  null                  ],[                  "Kazakhstan"                  ,                  eighteen.53                  ,                  2724.9                  ,                  159.41                  ,                  "Asia"                  ,                  "1991-12-16"                  ]]}                              

data-carve up.json contains one dictionary that holds the following lists:

  • The names of the columns
  • The labels of the rows
  • The inner lists (ii-dimensional sequence) that hold information values

If you don't provide the value for the optional parameter path_or_buf that defines the file path, and then .to_json() will return a JSON cord instead of writing the results to a file. This beliefs is consequent with .to_csv().

At that place are other optional parameters you tin can use. For instance, yous tin set index=False to forgo saving row labels. You can manipulate precision with double_precision, and dates with date_format and date_unit. These concluding two parameters are particularly important when you have time series among your data:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  information                  =                  data                  )                  .                  T                  >>>                                    df                  [                  'IND_DAY'                  ]                  =                  pd                  .                  to_datetime                  (                  df                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  State            object                  Popular                object                  AREA               object                  Gdp                object                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  .                  to_json                  (                  'data-time.json'                  )                              

In this instance, you lot've created the DataFrame from the dictionary data and used to_datetime() to convert the values in the last column to datetime64. You tin expand the code block below to run across the resulting file:

                                                  {                  "COUNTRY"                  :{                  "CHN"                  :                  "Red china"                  ,                  "IND"                  :                  "Republic of india"                  ,                  "USA"                  :                  "US"                  ,                  "IDN"                  :                  "Republic of indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Islamic republic of pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russia"                  ,                  "MEX"                  :                  "Mexico"                  ,                  "JPN"                  :                  "Japan"                  ,                  "DEU"                  :                  "Germany"                  ,                  "FRA"                  :                  "France"                  ,                  "GBR"                  :                  "Uk"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "Algeria"                  ,                  "CAN"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "POP"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.16                  ,                  "USA"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  threescore.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "Tin"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  18.53                  },                  "AREA"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "United states of america"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.xi                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.5                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.four                  ,                  "DZA"                  :                  2381.74                  ,                  "Tin can"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.9                  },                  "Gdp"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "Usa"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.14                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "Can"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "South.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  naught                  ,                  "MEX"                  :                  "N.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "S.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "CAN"                  :                  "N.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  null                  ,                  "IND"                  :                  -706320000000                  ,                  "United states of america"                  :                  -6106060800000                  ,                  "IDN"                  :                  -769219200000                  ,                  "BRA"                  :                  -4648924800000                  ,                  "PAK"                  :                  -706406400000                  ,                  "NGA"                  :                  -291945600000                  ,                  "BGD"                  :                  38793600000                  ,                  "RUS"                  :                  708307200000                  ,                  "MEX"                  :                  -5026838400000                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  null                  ,                  "FRA"                  :                  -5694969600000                  ,                  "GBR"                  :                  null                  ,                  "ITA"                  :                  aught                  ,                  "ARG"                  :                  -4843411200000                  ,                  "DZA"                  :                  -236476800000                  ,                  "Tin can"                  :                  -3234729600000                  ,                  "AUS"                  :                  goose egg                  ,                  "KAZ"                  :                  692841600000                  }}                              

In this file, you have large integers instead of dates for the independence days. That's because the default value of the optional parameter date_format is 'epoch' whenever orient isn't 'tabular array'. This default behavior expresses dates every bit an epoch in milliseconds relative to midnight on January ane, 1970.

However, if you pass date_format='iso', and so you'll get the dates in the ISO 8601 format. In addition, date_unit decides the units of fourth dimension:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    df                  [                  'IND_DAY'                  ]                  =                  pd                  .                  to_datetime                  (                  df                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  to_json                  (                  'new-data-time.json'                  ,                  date_format                  =                  'iso'                  ,                  date_unit                  =                  'due south'                  )                              

This code produces the following JSON file:

                                                  {                  "COUNTRY"                  :{                  "CHN"                  :                  "China"                  ,                  "IND"                  :                  "India"                  ,                  "USA"                  :                  "Us"                  ,                  "IDN"                  :                  "Indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Islamic republic of pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russia"                  ,                  "MEX"                  :                  "Mexico"                  ,                  "JPN"                  :                  "Nihon"                  ,                  "DEU"                  :                  "Germany"                  ,                  "FRA"                  :                  "France"                  ,                  "GBR"                  :                  "UK"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "Algeria"                  ,                  "CAN"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "POP"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.xvi                  ,                  "USA"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  threescore.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "Can"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  xviii.53                  },                  "Expanse"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "Us"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.11                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.five                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.four                  ,                  "DZA"                  :                  2381.74                  ,                  "CAN"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.9                  },                  "Gdp"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "United states of america"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.fourteen                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "CAN"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "S.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  zilch                  ,                  "MEX"                  :                  "Due north.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "S.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "Tin can"                  :                  "North.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  null                  ,                  "IND"                  :                  "1947-08-15T00:00:00Z"                  ,                  "USA"                  :                  "1776-07-04T00:00:00Z"                  ,                  "IDN"                  :                  "1945-08-17T00:00:00Z"                  ,                  "BRA"                  :                  "1822-09-07T00:00:00Z"                  ,                  "PAK"                  :                  "1947-08-14T00:00:00Z"                  ,                  "NGA"                  :                  "1960-x-01T00:00:00Z"                  ,                  "BGD"                  :                  "1971-03-26T00:00:00Z"                  ,                  "RUS"                  :                  "1992-06-12T00:00:00Z"                  ,                  "MEX"                  :                  "1810-09-16T00:00:00Z"                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  nothing                  ,                  "FRA"                  :                  "1789-07-14T00:00:00Z"                  ,                  "GBR"                  :                  null                  ,                  "ITA"                  :                  null                  ,                  "ARG"                  :                  "1816-07-09T00:00:00Z"                  ,                  "DZA"                  :                  "1962-07-05T00:00:00Z"                  ,                  "Tin can"                  :                  "1867-07-01T00:00:00Z"                  ,                  "AUS"                  :                  nil                  ,                  "KAZ"                  :                  "1991-12-16T00:00:00Z"                  }}                              

The dates in the resulting file are in the ISO 8601 format.

Y'all can load the data from a JSON file with read_json():

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_json                  (                  'data-alphabetize.json'                  ,                  orient                  =                  'index'                  ,                  ...                                    convert_dates                  =                  [                  'IND_DAY'                  ])                              

The parameter convert_dates has a similar purpose as parse_dates when y'all utilize information technology to read CSV files. The optional parameter orient is very of import considering it specifies how Pandas understands the construction of the file.

There are other optional parameters you tin can use also:

  • Set the encoding with encoding.
  • Manipulate dates with convert_dates and keep_default_dates.
  • Bear upon precision with dtype and precise_float.
  • Decode numeric information directly to NumPy arrays with numpy=True.

Note that y'all might lose the club of rows and columns when using the JSON format to store your data.

HTML Files

An HTML is a plaintext file that uses hypertext markup linguistic communication to help browsers render web pages. The extensions for HTML files are .html and .htm. You'll need to install an HTML parser library like lxml or html5lib to be able to piece of work with HTML files:

                                                  $pip install lxml html5lib                              

You can also utilize Conda to install the same packages:

                                                  $                  conda install lxml html5lib                              

Once you have these libraries, you can salvage the contents of your DataFrame as an HTML file with .to_html():

>>>

                                                  df = pd.DataFrame(data=data).T                  df.to_html('information.html')                              

This code generates a file data.html. You lot tin can expand the code block below to see how this file should look:

                                                  <                  table                  border                  =                  "1"                  class                  =                  "dataframe"                  >                  <                  thead                  >                  <                  tr                  style                  =                  "text-align: correct;"                  >                  <                  th                  ></                  th                  >                  <                  th                  >COUNTRY</                  th                  >                  <                  thursday                  >POP</                  th                  >                  <                  th                  >AREA</                  th                  >                  <                  thursday                  >GDP</                  th                  >                  <                  th                  >CONT</                  th                  >                  <                  thursday                  >IND_DAY</                  th                  >                  </                  tr                  >                  </                  thead                  >                  <                  tbody                  >                  <                  tr                  >                  <                  th                  >CHN</                  th                  >                  <                  td                  >Red china</                  td                  >                  <                  td                  >1398.72</                  td                  >                  <                  td                  >9596.96</                  td                  >                  <                  td                  >12234.eight</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >IND</                  thursday                  >                  <                  td                  >Bharat</                  td                  >                  <                  td                  >1351.sixteen</                  td                  >                  <                  td                  >3287.26</                  td                  >                  <                  td                  >2575.67</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1947-08-15</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >USA</                  thursday                  >                  <                  td                  >United states of america</                  td                  >                  <                  td                  >329.74</                  td                  >                  <                  td                  >9833.52</                  td                  >                  <                  td                  >19485.4</                  td                  >                  <                  td                  >N.America</                  td                  >                  <                  td                  >1776-07-04</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >IDN</                  th                  >                  <                  td                  >Indonesia</                  td                  >                  <                  td                  >268.07</                  td                  >                  <                  td                  >1910.93</                  td                  >                  <                  td                  >1015.54</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1945-08-17</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >BRA</                  thursday                  >                  <                  td                  >Brazil</                  td                  >                  <                  td                  >210.32</                  td                  >                  <                  td                  >8515.77</                  td                  >                  <                  td                  >2055.51</                  td                  >                  <                  td                  >S.America</                  td                  >                  <                  td                  >1822-09-07</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >PAK</                  th                  >                  <                  td                  >Pakistan</                  td                  >                  <                  td                  >205.71</                  td                  >                  <                  td                  >881.91</                  td                  >                  <                  td                  >302.xiv</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1947-08-14</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >NGA</                  th                  >                  <                  td                  >Nigeria</                  td                  >                  <                  td                  >200.96</                  td                  >                  <                  td                  >923.77</                  td                  >                  <                  td                  >375.77</                  td                  >                  <                  td                  >Africa</                  td                  >                  <                  td                  >1960-x-01</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >BGD</                  th                  >                  <                  td                  >Bangladesh</                  td                  >                  <                  td                  >167.09</                  td                  >                  <                  td                  >147.57</                  td                  >                  <                  td                  >245.63</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1971-03-26</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >RUS</                  th                  >                  <                  td                  >Russia</                  td                  >                  <                  td                  >146.79</                  td                  >                  <                  td                  >17098.2</                  td                  >                  <                  td                  >1530.75</                  td                  >                  <                  td                  >NaN</                  td                  >                  <                  td                  >1992-06-12</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >MEX</                  th                  >                  <                  td                  >Mexico</                  td                  >                  <                  td                  >126.58</                  td                  >                  <                  td                  >1964.38</                  td                  >                  <                  td                  >1158.23</                  td                  >                  <                  td                  >Northward.America</                  td                  >                  <                  td                  >1810-09-xvi</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >JPN</                  thursday                  >                  <                  td                  >Japan</                  td                  >                  <                  td                  >126.22</                  td                  >                  <                  td                  >377.97</                  td                  >                  <                  td                  >4872.42</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  thursday                  >DEU</                  th                  >                  <                  td                  >Germany</                  td                  >                  <                  td                  >83.02</                  td                  >                  <                  td                  >357.11</                  td                  >                  <                  td                  >3693.ii</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >FRA</                  th                  >                  <                  td                  >France</                  td                  >                  <                  td                  >67.02</                  td                  >                  <                  td                  >640.68</                  td                  >                  <                  td                  >2582.49</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >1789-07-fourteen</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >GBR</                  th                  >                  <                  td                  >UK</                  td                  >                  <                  td                  >66.44</                  td                  >                  <                  td                  >242.v</                  td                  >                  <                  td                  >2631.23</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >ITA</                  th                  >                  <                  td                  >Italy</                  td                  >                  <                  td                  >60.36</                  td                  >                  <                  td                  >301.34</                  td                  >                  <                  td                  >1943.84</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  thursday                  >ARG</                  th                  >                  <                  td                  >Argentina</                  td                  >                  <                  td                  >44.94</                  td                  >                  <                  td                  >2780.iv</                  td                  >                  <                  td                  >637.49</                  td                  >                  <                  td                  >S.America</                  td                  >                  <                  td                  >1816-07-09</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >DZA</                  th                  >                  <                  td                  >People's democratic republic of algeria</                  td                  >                  <                  td                  >43.38</                  td                  >                  <                  td                  >2381.74</                  td                  >                  <                  td                  >167.56</                  td                  >                  <                  td                  >Africa</                  td                  >                  <                  td                  >1962-07-05</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >Tin</                  thursday                  >                  <                  td                  >Canada</                  td                  >                  <                  td                  >37.59</                  td                  >                  <                  td                  >9984.67</                  td                  >                  <                  td                  >1647.12</                  td                  >                  <                  td                  >N.America</                  td                  >                  <                  td                  >1867-07-01</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >AUS</                  th                  >                  <                  td                  >Commonwealth of australia</                  td                  >                  <                  td                  >25.47</                  td                  >                  <                  td                  >7692.02</                  td                  >                  <                  td                  >1408.68</                  td                  >                  <                  td                  >Oceania</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >KAZ</                  th                  >                  <                  td                  >Kazakhstan</                  td                  >                  <                  td                  >xviii.53</                  td                  >                  <                  td                  >2724.ix</                  td                  >                  <                  td                  >159.41</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1991-12-xvi</                  td                  >                  </                  tr                  >                  </                  tbody                  >                  </                  tabular array                  >                              

This file shows the DataFrame contents nicely. Even so, notice that yous haven't obtained an entire spider web page. You've but output the information that corresponds to df in the HTML format.

.to_html() won't create a file if y'all don't provide the optional parameter buf, which denotes the buffer to write to. If you go out this parameter out, and so your code will render a string as information technology did with .to_csv() and .to_json().

Here are some other optional parameters:

  • header determines whether to save the column names.
  • index determines whether to relieve the row labels.
  • classes assigns cascading style sheet (CSS) classes.
  • render_links specifies whether to convert URLs to HTML links.
  • table_id assigns the CSS id to the table tag.
  • escape decides whether to catechumen the characters <, >, and & to HTML-safe strings.

You employ parameters like these to specify different aspects of the resulting files or strings.

You can create a DataFrame object from a suitable HTML file using read_html(), which will render a DataFrame example or a list of them:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_html                  (                  'data.html'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                              

This is very similar to what yous did when reading CSV files. You also have parameters that help you work with dates, missing values, precision, encoding, HTML parsers, and more.

Excel Files

You've already learned how to read and write Excel files with Pandas. However, there are a few more options worth considering. For 1, when you use .to_excel(), y'all tin specify the proper noun of the target worksheet with the optional parameter sheet_name:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  information                  )                  .                  T                  >>>                                    df                  .                  to_excel                  (                  'data.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  )                              

Here, you create a file information.xlsx with a worksheet chosen COUNTRIES that stores the data. The string 'information.xlsx' is the argument for the parameter excel_writer that defines the proper noun of the Excel file or its path.

The optional parameters startrow and startcol both default to 0 and betoken the upper left-most prison cell where the data should start being written:

>>>

                                                  >>>                                    df                  .                  to_excel                  (                  'data-shifted.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  ,                  ...                                    startrow                  =                  2                  ,                  startcol                  =                  4                  )                              

Here, you specify that the tabular array should showtime in the third row and the fifth column. Yous also used nothing-based indexing, so the third row is denoted past 2 and the fifth cavalcade by 4.

At present the resulting worksheet looks similar this:

mmst-pandas-rw-files-excel-shifted

Equally y'all tin can see, the table starts in the tertiary row two and the fifth column E.

.read_excel() also has the optional parameter sheet_name that specifies which worksheets to read when loading information. It can take on i of the post-obit values:

  • The zero-based index of the worksheet
  • The name of the worksheet
  • The list of indices or names to read multiple sheets
  • The value None to read all sheets

Hither'southward how you lot would use this parameter in your code:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'information.xlsx'                  ,                  sheet_name                  =                  0                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'information.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                              

Both statements above create the aforementioned DataFrame considering the sheet_name parameters have the aforementioned values. In both cases, sheet_name=0 and sheet_name='COUNTRIES' refer to the same worksheet. The argument parse_dates=['IND_DAY'] tells Pandas to endeavor to consider the values in this column equally dates or times.

There are other optional parameters you can employ with .read_excel() and .to_excel() to determine the Excel engine, the encoding, the style to handle missing values and infinities, the method for writing cavalcade names and row labels, and so on.

SQL Files

Pandas IO tools can also read and write databases. In this side by side example, you'll write your information to a database called information.db. To get started, you'll need the SQLAlchemy package. To larn more most information technology, you lot can read the official ORM tutorial. Y'all'll also need the database driver. Python has a built-in commuter for SQLite.

Y'all can install SQLAlchemy with pip:

You tin also install it with Conda:

                                                  $                  conda install sqlalchemy                              

Once y'all have SQLAlchemy installed, import create_engine() and create a database engine:

>>>

                                                  >>>                                    from                  sqlalchemy                  import                  create_engine                  >>>                                    engine                  =                  create_engine                  (                  'sqlite:///data.db'                  ,                  echo                  =                  Imitation                  )                              

Now that you have everything gear up upwardly, the next pace is to create a DataFrame object. It's convenient to specify the information types and apply .to_sql().

>>>

                                                  >>>                                    dtypes                  =                  {                  'Pop'                  :                  'float64'                  ,                  'AREA'                  :                  'float64'                  ,                  'GDP'                  :                  'float64'                  ,                  ...                                    'IND_DAY'                  :                  'datetime64'                  }                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  .                  astype                  (                  dtype                  =                  dtypes                  )                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  Popular               float64                  AREA              float64                  Gdp               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                              

.astype() is a very convenient method y'all can apply to prepare multiple data types at once.

One time you've created your DataFrame, you can save it to the database with .to_sql():

>>>

                                                  >>>                                    df                  .                  to_sql                  (                  'information.db'                  ,                  con                  =                  engine                  ,                  index_label                  =                  'ID'                  )                              

The parameter con is used to specify the database connection or engine that you want to employ. The optional parameter index_label specifies how to call the database column with the row labels. You lot'll frequently encounter it take on the value ID, Id, or id.

You should go the database data.db with a single table that looks like this:

mmst-pandas-rw-files-db

The get-go column contains the row labels. To omit writing them into the database, pass alphabetize=False to .to_sql(). The other columns represent to the columns of the DataFrame.

In that location are a few more optional parameters. For example, you can use schema to specify the database schema and dtype to make up one's mind the types of the database columns. You can also utilise if_exists, which says what to do if a database with the same name and path already exists:

  • if_exists='fail' raises a ValueError and is the default.
  • if_exists='replace' drops the tabular array and inserts new values.
  • if_exists='append' inserts new values into the table.

You lot tin can load the data from the database with read_sql():

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_sql                  (                  'information.db'                  ,                  con                  =                  engine                  ,                  index_col                  =                  'ID'                  )                  >>>                                    df                                      COUNTRY      Pop      AREA       Gdp       CONT    IND_DAY                  ID                  CHN       China  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.sixteen   3287.26   2575.67       Asia 1947-08-15                  United states          US   329.74   9833.52  19485.39  North.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Southward.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75       None 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.eleven   3693.twenty     Europe        NaT                  FRA      French republic    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          United kingdom    66.44    242.50   2631.23     Europe        NaT                  ITA       Italia    sixty.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16                              

The parameter index_col specifies the proper noun of the column with the row labels. Notation that this inserts an extra row after the header that starts with ID. You can fix this behavior with the following line of code:

>>>

                                                  >>>                                    df                  .                  alphabetize                  .                  name                  =                  None                  >>>                                    df                                      State      POP      AREA       GDP       CONT    IND_DAY                  CHN       Red china  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.sixteen   3287.26   2575.67       Asia 1947-08-xv                  United states of america          US   329.74   9833.52  19485.39  N.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-xiv                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75       None 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.xi   3693.20     Europe        NaT                  FRA      French republic    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          UK    66.44    242.50   2631.23     Europe        NaT                  ITA       Italian republic    60.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  Due south.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Northward.America 1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    eighteen.53   2724.xc    159.41       Asia 1991-12-16                              

Now you accept the same DataFrame object every bit before.

Note that the continent for Russian federation is now None instead of nan. If you want to fill the missing values with nan, and so you can use .fillna():

>>>

                                                  >>>                                    df                  .                  fillna                  (                  value                  =                  float                  (                  'nan'                  ),                  inplace                  =                  True                  )                              

.fillna() replaces all missing values with any you pass to value. Hither, you passed float('nan'), which says to fill all missing values with nan.

Likewise annotation that you didn't have to pass parse_dates=['IND_DAY'] to read_sql(). That's because your database was able to observe that the last column contains dates. However, you tin pass parse_dates if you'd similar. Yous'll get the same results.

In that location are other functions that you can utilize to read databases, like read_sql_table() and read_sql_query(). Feel costless to try them out!

Pickle Files

Pickling is the act of converting Python objects into byte streams. Unpickling is the changed process. Python pickle files are the binary files that keep the information and hierarchy of Python objects. They usually have the extension .pickle or .pkl.

You can save your DataFrame in a pickle file with .to_pickle():

>>>

                                                  >>>                                    dtypes                  =                  {                  'POP'                  :                  'float64'                  ,                  'Expanse'                  :                  'float64'                  ,                  'GDP'                  :                  'float64'                  ,                  ...                                    'IND_DAY'                  :                  'datetime64'                  }                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  .                  astype                  (                  dtype                  =                  dtypes                  )                  >>>                                    df                  .                  to_pickle                  (                  'data.pickle'                  )                              

Like you did with databases, it can be user-friendly first to specify the data types. Then, you create a file data.pickle to contain your data. Y'all could also laissez passer an integer value to the optional parameter protocol, which specifies the protocol of the pickler.

Y'all can get the data from a pickle file with read_pickle():

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_pickle                  (                  'information.pickle'                  )                  >>>                                    df                                      Country      POP      AREA       GDP       CONT    IND_DAY                  CHN       Prc  1398.72   9596.96  12234.78       Asia        NaT                  IND       Bharat  1351.16   3287.26   2575.67       Asia 1947-08-15                  USA          US   329.74   9833.52  19485.39  N.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-sixteen                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.xi   3693.20     Europe        NaT                  FRA      France    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          UK    66.44    242.fifty   2631.23     Europe        NaT                  ITA       Italy    threescore.36    301.34   1943.84     Europe        NaT                  ARG   Argentine republic    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Northward.America 1867-07-01                  AUS   Commonwealth of australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16                              

read_pickle() returns the DataFrame with the stored data. You can too check the data types:

>>>

                                                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  POP               float64                  AREA              float64                  GDP               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                              

These are the aforementioned ones that you specified before using .to_pickle().

As a word of caution, you should always beware of loading pickles from untrusted sources. This can be dangerous! When y'all unpickle an untrustworthy file, it could execute arbitrary lawmaking on your machine, gain remote access to your computer, or otherwise exploit your device in other ways.

Working With Big Data

If your files are also large for saving or processing, then there are several approaches you tin take to reduce the required disk space:

  • Compress your files
  • Choose merely the columns you lot want
  • Omit the rows you don't demand
  • Force the use of less precise data types
  • Split the information into chunks

You'll take a look at each of these techniques in plough.

Compress and Decompress Files

You can create an archive file similar you would a regular one, with the addition of a suffix that corresponds to the desired compression type:

  • '.gz'
  • '.bz2'
  • '.zip'
  • '.xz'

Pandas can deduce the compression blazon by itself:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  information                  =                  data                  )                  .                  T                  >>>                                    df                  .                  to_csv                  (                  'data.csv.zip'                  )                              

Hither, you create a compressed .csv file as an archive. The size of the regular .csv file is 1048 bytes, while the compressed file merely has 766 bytes.

You can open up this compressed file as usual with the Pandas read_csv() office:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'information.csv.zip'                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                                      COUNTRY      Popular      AREA       Gdp       CONT    IND_DAY                  CHN       People's republic of china  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15                  USA          US   329.74   9833.52  19485.39  North.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  South.America 1822-09-07                  PAK    Islamic republic of pakistan   205.71    881.91    302.14       Asia 1947-08-14                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Nihon   126.22    377.97   4872.42       Asia        NaT                  DEU     Deutschland    83.02    357.11   3693.20     Europe        NaT                  FRA      France    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          U.k.    66.44    242.50   2631.23     Europe        NaT                  ITA       Italian republic    60.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa 1962-07-05                  Tin      Canada    37.59   9984.67   1647.12  North.America 1867-07-01                  AUS   Commonwealth of australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    eighteen.53   2724.90    159.41       Asia 1991-12-16                              

read_csv() decompresses the file earlier reading it into a DataFrame.

You lot tin can specify the type of compression with the optional parameter compression, which can take on whatever of the following values:

  • 'infer'
  • 'gzip'
  • 'bz2'
  • 'null'
  • 'xz'
  • None

The default value compression='infer' indicates that Pandas should deduce the pinch type from the file extension.

Hither'south how you would shrink a pickle file:

>>>

                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  information                  )                  .                  T                  >>>                                    df                  .                  to_pickle                  (                  'data.pickle.shrink'                  ,                  compression                  =                  'gzip'                  )                              

Y'all should get the file data.pickle.shrink that you tin can later decompress and read:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_pickle                  (                  'data.pickle.compress'                  ,                  compression                  =                  'gzip'                  )                              

df again corresponds to the DataFrame with the same data as before.

You can give the other compression methods a try, as well. If you're using pickle files, and then keep in mind that the .zip format supports reading only.

Choose Columns

The Pandas read_csv() and read_excel() functions accept the optional parameter usecols that y'all can apply to specify the columns you want to load from the file. You can laissez passer the list of cavalcade names as the corresponding argument:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  usecols                  =                  [                  'COUNTRY'                  ,                  'AREA'                  ])                  >>>                                    df                                      Country      AREA                  0        Red china   9596.96                  1        India   3287.26                  2           Us   9833.52                  3    Indonesia   1910.93                  4       Brazil   8515.77                  5     Pakistan    881.91                  6      Nigeria    923.77                  7   Bangladesh    147.57                  8       Russia  17098.25                  9       Mexico   1964.38                  10       Nippon    377.97                  xi     Germany    357.11                  12      French republic    640.68                  13          United kingdom of great britain and northern ireland    242.l                  14       Italy    301.34                  15   Argentina   2780.xl                  16     Algeria   2381.74                  17      Canada   9984.67                  xviii   Australia   7692.02                  19  Kazakhstan   2724.90                              

Now you accept a DataFrame that contains less data than earlier. Hither, in that location are but the names of the countries and their areas.

Instead of the column names, you lot can also pass their indices:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  usecols                  =                  [                  0                  ,                  1                  ,                  three                  ])                  >>>                                    df                                      COUNTRY      Surface area                  CHN       China   9596.96                  IND       India   3287.26                  USA          US   9833.52                  IDN   Indonesia   1910.93                  BRA      Brazil   8515.77                  PAK    Pakistan    881.91                  NGA     Nigeria    923.77                  BGD  Bangladesh    147.57                  RUS      Russian federation  17098.25                  MEX      Mexico   1964.38                  JPN       Japan    377.97                  DEU     Germany    357.11                  FRA      France    640.68                  GBR          UK    242.50                  ITA       Italia    301.34                  ARG   Argentina   2780.twoscore                  DZA     Algeria   2381.74                  Tin can      Canada   9984.67                  AUS   Australia   7692.02                  KAZ  Republic of kazakhstan   2724.ninety                              

Expand the code block below to compare these results with the file 'data.csv':

                                ,State,POP,Expanse,Gross domestic product,CONT,IND_DAY CHN,Communist china,1398.72,9596.96,12234.78,Asia, IND,Bharat,1351.sixteen,3287.26,2575.67,Asia,1947-08-15 The states,US,329.74,9833.52,19485.39,N.America,1776-07-04 IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07 PAK,Islamic republic of pakistan,205.71,881.91,302.14,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-ten-01 BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,,1992-06-12 MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-sixteen JPN,Japan,126.22,377.97,4872.42,Asia, DEU,Germany,83.02,357.xi,3693.2,Europe, FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,UK,66.44,242.v,2631.23,Europe, ITA,Italy,60.36,301.34,1943.84,Europe, ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09 DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,1962-07-05 CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-sixteen                              

You can see the following columns:

  • The column at index 0 contains the row labels.
  • The column at alphabetize i contains the country names.
  • The column at index 3 contains the areas.

Simlarly, read_sql() has the optional parameter columns that takes a list of column names to read:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_sql                  (                  'data.db'                  ,                  con                  =                  engine                  ,                  index_col                  =                  'ID'                  ,                  ...                                    columns                  =                  [                  'Country'                  ,                  'AREA'                  ])                  >>>                                    df                  .                  alphabetize                  .                  name                  =                  None                  >>>                                    df                                      Land      AREA                  CHN       China   9596.96                  IND       Bharat   3287.26                  United states          United states   9833.52                  IDN   Indonesia   1910.93                  BRA      Brazil   8515.77                  PAK    Pakistan    881.91                  NGA     Nigeria    923.77                  BGD  People's republic of bangladesh    147.57                  RUS      Russian federation  17098.25                  MEX      United mexican states   1964.38                  JPN       Japan    377.97                  DEU     Germany    357.11                  FRA      France    640.68                  GBR          UK    242.l                  ITA       Italy    301.34                  ARG   Argentine republic   2780.40                  DZA     People's democratic republic of algeria   2381.74                  CAN      Canada   9984.67                  AUS   Commonwealth of australia   7692.02                  KAZ  Republic of kazakhstan   2724.90                              

Once again, the DataFrame merely contains the columns with the names of the countries and areas. If columns is None or omitted, and then all of the columns will exist read, as you saw before. The default behavior is columns=None.

Omit Rows

When you test an algorithm for data processing or machine learning, you frequently don't need the unabridged dataset. It's convenient to load only a subset of the information to speed up the procedure. The Pandas read_csv() and read_excel() functions have some optional parameters that allow you to select which rows you want to load:

  • skiprows: either the number of rows to skip at the offset of the file if it's an integer, or the zero-based indices of the rows to skip if information technology'due south a list-like object
  • skipfooter: the number of rows to skip at the end of the file
  • nrows: the number of rows to read

Hither's how you would skip rows with odd zero-based indices, keeping the fifty-fifty ones:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  skiprows                  =                  range                  (                  1                  ,                  xx                  ,                  2                  ))                  >>>                                    df                                      COUNTRY      POP     AREA      GDP       CONT     IND_DAY                  IND       Bharat  1351.sixteen  3287.26  2575.67       Asia  1947-08-15                  IDN   Indonesia   268.07  1910.93  1015.54       Asia  1945-08-17                  PAK    Islamic republic of pakistan   205.71   881.91   302.14       Asia  1947-08-fourteen                  BGD  People's republic of bangladesh   167.09   147.57   245.63       Asia  1971-03-26                  MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16                  DEU     Frg    83.02   357.11  3693.20     Europe         NaN                  GBR          United kingdom of great britain and northern ireland    66.44   242.l  2631.23     Europe         NaN                  ARG   Argentina    44.94  2780.40   637.49  Southward.America  1816-07-09                  Tin can      Canada    37.59  9984.67  1647.12  North.America  1867-07-01                  KAZ  Kazakhstan    18.53  2724.90   159.41       Asia  1991-12-xvi                              

In this example, skiprows is range(ane, 20, 2) and corresponds to the values ane, 3, …, 19. The instances of the Python built-in class range deport like sequences. The get-go row of the file information.csv is the header row. It has the index 0, so Pandas loads information technology in. The second row with index 1 corresponds to the label CHN, and Pandas skips it. The third row with the index two and label IND is loaded, so on.

If y'all desire to choose rows randomly, and so skiprows can exist a listing or NumPy array with pseudo-random numbers, obtained either with pure Python or with NumPy.

Force Less Precise Data Types

If you're okay with less precise data types, then yous tin potentially save a significant corporeality of memory! First, become the data types with .dtypes over again:

>>>

                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  Popular               float64                  Expanse              float64                  GDP               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                              

The columns with the floating-point numbers are 64-flake floats. Each number of this type float64 consumes 64 bits or viii bytes. Each column has 20 numbers and requires 160 bytes. You tin can verify this with .memory_usage():

>>>

                                                  >>>                                    df                  .                  memory_usage                  ()                  Index      160                  COUNTRY    160                  Pop        160                  Area       160                  Gdp        160                  CONT       160                  IND_DAY    160                  dtype: int64                              

.memory_usage() returns an case of Series with the retentivity usage of each column in bytes. You tin can conveniently combine it with .loc[] and .sum() to get the retentivity for a group of columns:

>>>

                                                  >>>                                    df                  .                  loc                  [:,                  [                  'Pop'                  ,                  'Surface area'                  ,                  'GDP'                  ]]                  .                  memory_usage                  (                  index                  =                  Fake                  )                  .                  sum                  ()                  480                              

This case shows how you tin combine the numeric columns 'POP', 'Expanse', and 'Gdp' to get their total memory requirement. The argument index=False excludes data for row labels from the resulting Series object. For these 3 columns, you'll demand 480 bytes.

Y'all can also extract the data values in the form of a NumPy array with .to_numpy() or .values. Then, use the .nbytes attribute to get the full bytes consumed by the items of the array:

>>>

                                                  >>>                                    df                  .                  loc                  [:,                  [                  'Pop'                  ,                  'Surface area'                  ,                  'GDP'                  ]]                  .                  to_numpy                  ()                  .                  nbytes                  480                              

The issue is the same 480 bytes. So, how do you lot salve memory?

In this case, you tin can specify that your numeric columns 'POP', 'Area', and 'Gross domestic product' should have the type float32. Utilise the optional parameter dtype to do this:

>>>

                                                  >>>                                    dtypes                  =                  {                  'Popular'                  :                  'float32'                  ,                  'Expanse'                  :                  'float32'                  ,                  'Gdp'                  :                  'float32'                  }                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  dtype                  =                  dtypes                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                              

The lexicon dtypes specifies the desired data types for each column. It's passed to the Pandas read_csv() function as the argument that corresponds to the parameter dtype.

At present yous can verify that each numeric column needs 80 bytes, or 4 bytes per item:

>>>

                                                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  POP               float32                  AREA              float32                  GDP               float32                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  .                  memory_usage                  ()                  Index      160                  Land    160                  Popular         80                  Surface area        fourscore                  GDP         80                  CONT       160                  IND_DAY    160                  dtype: int64                  >>>                                    df                  .                  loc                  [:,                  [                  'POP'                  ,                  'AREA'                  ,                  'Gross domestic product'                  ]]                  .                  memory_usage                  (                  index                  =                  Fake                  )                  .                  sum                  ()                  240                  >>>                                    df                  .                  loc                  [:,                  [                  'POP'                  ,                  'AREA'                  ,                  'Gross domestic product'                  ]]                  .                  to_numpy                  ()                  .                  nbytes                  240                              

Each value is a floating-signal number of 32 bits or 4 bytes. The three numeric columns contain xx items each. In total, yous'll demand 240 bytes of memory when you work with the blazon float32. This is one-half the size of the 480 bytes you'd need to work with float64.

In addition to saving retentiveness, you can significantly reduce the fourth dimension required to process data by using float32 instead of float64 in some cases.

Utilize Chunks to Iterate Through Files

Another fashion to deal with very large datasets is to split the information into smaller chunks and process one chunk at a time. If yous use read_csv(), read_json() or read_sql(), then you can specify the optional parameter chunksize:

>>>

                                                  >>>                                    data_chunk                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  chunksize                  =                  8                  )                  >>>                                    type                  (                  data_chunk                  )                  <class 'pandas.io.parsers.TextFileReader'>                  >>>                                    hasattr                  (                  data_chunk                  ,                  '__iter__'                  )                  True                  >>>                                    hasattr                  (                  data_chunk                  ,                  '__next__'                  )                  True                              

chunksize defaults to None and can have on an integer value that indicates the number of items in a single clamper. When chunksize is an integer, read_csv() returns an iterable that you can use in a for loop to get and process only a fragment of the dataset in each iteration:

>>>

                                                  >>>                                    for                  df_chunk                  in                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  chunksize                  =                  8                  ):                  ...                                    print                  (                  df_chunk                  ,                  terminate                  =                  '                  \n\n                  '                  )                  ...                                    print                  (                  'retentiveness:'                  ,                  df_chunk                  .                  memory_usage                  ()                  .                  sum                  (),                  'bytes'                  ,                  ...                                    end                  =                  '                  \n\northward\n                  '                  )                  ...                                      COUNTRY      Popular     AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72  9596.96  12234.78       Asia         NaN                  IND       Bharat  1351.16  3287.26   2575.67       Asia  1947-08-15                  U.s.a.          U.s.   329.74  9833.52  19485.39  N.America  1776-07-04                  IDN   Indonesia   268.07  1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32  8515.77   2055.51  S.America  1822-09-07                  PAK    Pakistan   205.71   881.91    302.14       Asia  1947-08-14                  NGA     Nigeria   200.96   923.77    375.77     Africa  1960-10-01                  BGD  Bangladesh   167.09   147.57    245.63       Asia  1971-03-26                  retentivity: 448 bytes                                      State     Pop      Area      Gross domestic product       CONT     IND_DAY                  RUS     Russia  146.79  17098.25  1530.75        NaN  1992-06-12                  MEX     Mexico  126.58   1964.38  1158.23  Due north.America  1810-09-16                  JPN      Japan  126.22    377.97  4872.42       Asia         NaN                  DEU    Germany   83.02    357.xi  3693.xx     Europe         NaN                  FRA     France   67.02    640.68  2582.49     Europe  1789-07-14                  GBR         UK   66.44    242.fifty  2631.23     Europe         NaN                  ITA      Italy   60.36    301.34  1943.84     Europe         NaN                  ARG  Argentina   44.94   2780.40   637.49  S.America  1816-07-09                  memory: 448 bytes                                      COUNTRY    POP     Area      GDP       CONT     IND_DAY                  DZA     People's democratic republic of algeria  43.38  2381.74   167.56     Africa  1962-07-05                  CAN      Canada  37.59  9984.67  1647.12  N.America  1867-07-01                  AUS   Australia  25.47  7692.02  1408.68    Oceania         NaN                  KAZ  Kazakhstan  xviii.53  2724.ninety   159.41       Asia  1991-12-16                  memory: 224 bytes                              

In this example, the chunksize is 8. The starting time iteration of the for loop returns a DataFrame with the showtime eight rows of the dataset just. The 2d iteration returns some other DataFrame with the adjacent eight rows. The third and last iteration returns the remaining four rows.

In each iteration, you get and process the DataFrame with the number of rows equal to chunksize. It'southward possible to accept fewer rows than the value of chunksize in the last iteration. You can employ this functionality to control the amount of retentiveness required to process data and proceed that amount reasonably small.

Conclusion

You now know how to save the data and labels from Pandas DataFrame objects to unlike kinds of files. You too know how to load your information from files and create DataFrame objects.

You've used the Pandas read_csv() and .to_csv() methods to read and write CSV files. You also used similar methods to read and write Excel, JSON, HTML, SQL, and pickle files. These functions are very convenient and widely used. They permit you to save or load your information in a single function or method call.

Y'all've also learned how to relieve fourth dimension, retention, and disk infinite when working with large information files:

  • Compress or decompress files
  • Choose the rows and columns you want to load
  • Use less precise data types
  • Separate data into chunks and procedure them 1 by i

You've mastered a pregnant step in the machine learning and data scientific discipline process! If you have whatsoever questions or comments, then please put them in the comments section beneath.

Lookout man Now This tutorial has a related video form created past the Real Python squad. Watch information technology together with the written tutorial to deepen your agreement: Reading and Writing Files With Pandas