How to Read .data File in Pandas
Sentry Now This tutorial has a related video course created by the Existent Python team. Watch it together with the written tutorial to deepen your understanding: Reading and Writing Files With Pandas
          Pandas          is a powerful and flexible Python parcel that allows you to piece of work with labeled and time series data. It likewise provides statistics methods, enables plotting, and more than. One crucial feature of Pandas is its ability to write and read Excel, CSV, and many other types of files. Functions like the Pandas          read_csv()          method enable y'all to piece of work with files effectively. You can use them to salve the data and labels from Pandas objects to a file and load them later as Pandas          Series          or          DataFrame          instances.
In this tutorial, you'll learn:
- What the Pandas IO tools API is
 - How to read and write data to and from files
 - How to work with various file formats
 - How to work with big data efficiently
 
Let's start reading and writing files!
Installing Pandas
The code in this tutorial is executed with CPython 3.7.iv and Pandas 0.25.i. It would be beneficial to make sure yous accept the latest versions of Python and Pandas on your machine. You might want to create a new virtual surroundings and install the dependencies for this tutorial.
First, you'll need the Pandas library. You may already have it installed. If you don't, so yous can install it with pip:
Once the installation process completes, you should have Pandas installed and set up.
Anaconda is an excellent Python distribution that comes with Python, many useful packages similar Pandas, and a parcel and environment managing director called Conda. To acquire more almost Anaconda, check out Setting Upwards Python for Machine Learning on Windows.
If you don't accept Pandas in your virtual surround, and so you tin can install information technology with Conda:
Conda is powerful as information technology manages the dependencies and their versions. To learn more about working with Conda, you lot tin can check out the official documentation.
Preparing Data
In this tutorial, you'll use the data related to 20 countries. Here'south an overview of the data and sources you'll be working with:
-               
Country is denoted by the country proper noun. Each country is in the superlative x list for either population, area, or gross domestic product (GDP). The row labels for the dataset are the three-letter of the alphabet country codes defined in ISO 3166-1. The column label for the dataset is
Country. -               
Population is expressed in millions. The information comes from a listing of countries and dependencies by population on Wikipedia. The column characterization for the dataset is
POP. -               
Area is expressed in thousands of kilometers squared. The data comes from a list of countries and dependencies by area on Wikipedia. The cavalcade label for the dataset is
AREA. -               
Gross domestic product is expressed in millions of U.S. dollars, according to the United nations data for 2017. Yous can find this data in the list of countries by nominal GDP on Wikipedia. The column characterization for the dataset is
Gross domestic product. -               
Continent is either Africa, Asia, Oceania, Europe, North America, or South America. Y'all can find this information on Wikipedia also. The column label for the dataset is
CONT. -               
Independence twenty-four hours is a appointment that commemorates a nation'south independence. The information comes from the list of national independence days on Wikipedia. The dates are shown in ISO 8601 format. The first four digits stand for the year, the next two numbers are the month, and the concluding two are for the day of the month. The column label for the dataset is
IND_DAY. 
This is how the data looks as a table:
| Country | POP | Surface area | Gdp | CONT | IND_DAY | |
|---|---|---|---|---|---|---|
| CHN | People's republic of china | 1398.72 | 9596.96 | 12234.78 | Asia | |
| IND | India | 1351.16 | 3287.26 | 2575.67 | Asia | 1947-08-15 | 
| USA | US | 329.74 | 9833.52 | 19485.39 | N.America | 1776-07-04 | 
| IDN | Indonesia | 268.07 | 1910.93 | 1015.54 | Asia | 1945-08-17 | 
| BRA | Brazil | 210.32 | 8515.77 | 2055.51 | South.America | 1822-09-07 | 
| PAK | Pakistan | 205.71 | 881.91 | 302.14 | Asia | 1947-08-fourteen | 
| NGA | Nigeria | 200.96 | 923.77 | 375.77 | Africa | 1960-ten-01 | 
| BGD | People's republic of bangladesh | 167.09 | 147.57 | 245.63 | Asia | 1971-03-26 | 
| RUS | Russia | 146.79 | 17098.25 | 1530.75 | 1992-06-12 | |
| MEX | Mexico | 126.58 | 1964.38 | 1158.23 | N.America | 1810-09-16 | 
| JPN | Nippon | 126.22 | 377.97 | 4872.42 | Asia | |
| DEU | Frg | 83.02 | 357.xi | 3693.twenty | Europe | |
| FRA | French republic | 67.02 | 640.68 | 2582.49 | Europe | 1789-07-14 | 
| GBR | UK | 66.44 | 242.l | 2631.23 | Europe | |
| ITA | Italia | 60.36 | 301.34 | 1943.84 | Europe | |
| ARG | Argentina | 44.94 | 2780.40 | 637.49 | South.America | 1816-07-09 | 
| DZA | People's democratic republic of algeria | 43.38 | 2381.74 | 167.56 | Africa | 1962-07-05 | 
| Tin | Canada | 37.59 | 9984.67 | 1647.12 | N.America | 1867-07-01 | 
| AUS | Australia | 25.47 | 7692.02 | 1408.68 | Oceania | |
| KAZ | Republic of kazakhstan | xviii.53 | 2724.90 | 159.41 | Asia | 1991-12-16 | 
You may notice that some of the information is missing. For case, the continent for Russian federation is not specified considering information technology spreads across both Europe and Asia. At that place are also several missing independence days considering the data source omits them.
Y'all tin organize this data in Python using a nested dictionary:
                                            data                =                {                'CHN'                :                {                'COUNTRY'                :                'Communist china'                ,                'Pop'                :                1_398.72                ,                'AREA'                :                9_596.96                ,                'GDP'                :                12_234.78                ,                'CONT'                :                'Asia'                },                'IND'                :                {                'State'                :                'Bharat'                ,                'Pop'                :                1_351.16                ,                'AREA'                :                3_287.26                ,                'Gross domestic product'                :                2_575.67                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1947-08-15'                },                'Us'                :                {                'COUNTRY'                :                'United states'                ,                'Pop'                :                329.74                ,                'Surface area'                :                9_833.52                ,                'Gdp'                :                19_485.39                ,                'CONT'                :                'N.America'                ,                'IND_DAY'                :                '1776-07-04'                },                'IDN'                :                {                'Land'                :                'Indonesia'                ,                'Popular'                :                268.07                ,                'AREA'                :                1_910.93                ,                'GDP'                :                1_015.54                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1945-08-17'                },                'BRA'                :                {                'COUNTRY'                :                'Brazil'                ,                'Pop'                :                210.32                ,                'AREA'                :                8_515.77                ,                'GDP'                :                2_055.51                ,                'CONT'                :                'S.America'                ,                'IND_DAY'                :                '1822-09-07'                },                'PAK'                :                {                'Country'                :                'Pakistan'                ,                'Pop'                :                205.71                ,                'Expanse'                :                881.91                ,                'GDP'                :                302.14                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1947-08-fourteen'                },                'NGA'                :                {                'State'                :                'Nigeria'                ,                'Popular'                :                200.96                ,                'AREA'                :                923.77                ,                'GDP'                :                375.77                ,                'CONT'                :                'Africa'                ,                'IND_DAY'                :                '1960-ten-01'                },                'BGD'                :                {                'Land'                :                'Bangladesh'                ,                'POP'                :                167.09                ,                'AREA'                :                147.57                ,                'GDP'                :                245.63                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1971-03-26'                },                'RUS'                :                {                'State'                :                'Russian federation'                ,                'Popular'                :                146.79                ,                'AREA'                :                17_098.25                ,                'Gdp'                :                1_530.75                ,                'IND_DAY'                :                '1992-06-12'                },                'MEX'                :                {                'State'                :                'Mexico'                ,                'POP'                :                126.58                ,                'Expanse'                :                1_964.38                ,                'GDP'                :                1_158.23                ,                'CONT'                :                'Northward.America'                ,                'IND_DAY'                :                '1810-09-sixteen'                },                'JPN'                :                {                'COUNTRY'                :                'Japan'                ,                'POP'                :                126.22                ,                'AREA'                :                377.97                ,                'GDP'                :                4_872.42                ,                'CONT'                :                'Asia'                },                'DEU'                :                {                'State'                :                'Deutschland'                ,                'Pop'                :                83.02                ,                'AREA'                :                357.11                ,                'Gross domestic product'                :                3_693.20                ,                'CONT'                :                'Europe'                },                'FRA'                :                {                'COUNTRY'                :                'France'                ,                'POP'                :                67.02                ,                'AREA'                :                640.68                ,                'Gross domestic product'                :                2_582.49                ,                'CONT'                :                'Europe'                ,                'IND_DAY'                :                '1789-07-14'                },                'GBR'                :                {                'COUNTRY'                :                'UK'                ,                'POP'                :                66.44                ,                'AREA'                :                242.l                ,                'GDP'                :                2_631.23                ,                'CONT'                :                'Europe'                },                'ITA'                :                {                'Country'                :                'Italy'                ,                'Pop'                :                60.36                ,                'AREA'                :                301.34                ,                'GDP'                :                1_943.84                ,                'CONT'                :                'Europe'                },                'ARG'                :                {                'State'                :                'Argentine republic'                ,                'Popular'                :                44.94                ,                'AREA'                :                2_780.forty                ,                'Gross domestic product'                :                637.49                ,                'CONT'                :                'Southward.America'                ,                'IND_DAY'                :                '1816-07-09'                },                'DZA'                :                {                'COUNTRY'                :                'Algeria'                ,                'Popular'                :                43.38                ,                'AREA'                :                2_381.74                ,                'GDP'                :                167.56                ,                'CONT'                :                'Africa'                ,                'IND_DAY'                :                '1962-07-05'                },                'Tin can'                :                {                'COUNTRY'                :                'Canada'                ,                'Popular'                :                37.59                ,                'Area'                :                9_984.67                ,                'GDP'                :                1_647.12                ,                'CONT'                :                'N.America'                ,                'IND_DAY'                :                '1867-07-01'                },                'AUS'                :                {                'COUNTRY'                :                'Australia'                ,                'Popular'                :                25.47                ,                'Expanse'                :                7_692.02                ,                'Gross domestic product'                :                1_408.68                ,                'CONT'                :                'Oceania'                },                'KAZ'                :                {                'COUNTRY'                :                'Kazakhstan'                ,                'POP'                :                18.53                ,                'AREA'                :                2_724.xc                ,                'GDP'                :                159.41                ,                'CONT'                :                'Asia'                ,                'IND_DAY'                :                '1991-12-16'                }                }                columns                =                (                'Land'                ,                'POP'                ,                'AREA'                ,                'Gross domestic product'                ,                'CONT'                ,                'IND_DAY'                )                                    Each row of the table is written as an inner dictionary whose keys are the column names and values are the corresponding data. These dictionaries are and then nerveless as the values in the outer            information            dictionary. The respective keys for            data            are the three-letter country codes.
You can use this            information            to create an example of a Pandas            DataFrame. Kickoff, y'all need to import Pandas:
>>>
                                            >>>                                import                pandas                as                pd                                    At present that you lot have Pandas imported, you lot tin use the            DataFrame            constructor and            data            to create a            DataFrame            object.
            information            is organized in such a way that the country codes represent to columns. You can reverse the rows and columns of a            DataFrame            with the belongings            .T:
>>>
                                            >>>                                df                =                pd                .                DataFrame                (                data                =                data                )                .                T                >>>                                df                                  Country      POP     Surface area      GDP       CONT     IND_DAY                CHN       Red china  1398.72  9596.96  12234.8       Asia         NaN                IND       Republic of india  1351.16  3287.26  2575.67       Asia  1947-08-15                USA          U.s.a.   329.74  9833.52  19485.4  N.America  1776-07-04                IDN   Republic of indonesia   268.07  1910.93  1015.54       Asia  1945-08-17                BRA      Brazil   210.32  8515.77  2055.51  S.America  1822-09-07                PAK    Pakistan   205.71   881.91   302.14       Asia  1947-08-14                NGA     Nigeria   200.96   923.77   375.77     Africa  1960-10-01                BGD  Bangladesh   167.09   147.57   245.63       Asia  1971-03-26                RUS      Russia   146.79  17098.2  1530.75        NaN  1992-06-12                MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16                JPN       Nihon   126.22   377.97  4872.42       Asia         NaN                DEU     Germany    83.02   357.11   3693.2     Europe         NaN                FRA      France    67.02   640.68  2582.49     Europe  1789-07-14                GBR          UK    66.44    242.5  2631.23     Europe         NaN                ITA       Italian republic    60.36   301.34  1943.84     Europe         NaN                ARG   Argentine republic    44.94   2780.4   637.49  S.America  1816-07-09                DZA     Algeria    43.38  2381.74   167.56     Africa  1962-07-05                CAN      Canada    37.59  9984.67  1647.12  N.America  1867-07-01                AUS   Australia    25.47  7692.02  1408.68    Oceania         NaN                KAZ  Kazakhstan    xviii.53   2724.9   159.41       Asia  1991-12-sixteen                                    Now yous have your            DataFrame            object populated with the data almost each country.
Versions of Python older than 3.six did not guarantee the lodge of keys in dictionaries. To ensure the lodge of columns is maintained for older versions of Python and Pandas, you can specify            alphabetize=columns:
>>>
                                            >>>                                df                =                pd                .                DataFrame                (                data                =                data                ,                index                =                columns                )                .                T                                    At present that you've prepared your information, you're set to start working with files!
Using the Pandas            read_csv()            and            .to_csv()            Functions
          A comma-separated values (CSV) file is a plaintext file with a            .csv            extension that holds tabular information. This is ane of the most popular file formats for storing big amounts of data. Each row of the CSV file represents a unmarried table row. The values in the same row are past default separated with commas, but you could change the separator to a semicolon, tab, space, or another character.
Write a CSV File
You can relieve your Pandas              DataFrame              as a CSV file with              .to_csv():
>>>
                                                  >>>                                    df                  .                  to_csv                  (                  'data.csv'                  )                                          That'southward it! You've created the file              data.csv              in your current working directory. You lot tin expand the code block below to run into how your CSV file should wait:
                                ,COUNTRY,Popular,AREA,GDP,CONT,IND_DAY CHN,Prc,1398.72,9596.96,12234.78,Asia, IND,Bharat,1351.16,3287.26,2575.67,Asia,1947-08-xv The states,US,329.74,9833.52,19485.39,Due north.America,1776-07-04 IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07 PAK,Pakistan,205.71,881.91,302.14,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01 BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,,1992-06-12 MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16 JPN,Nihon,126.22,377.97,4872.42,Asia, DEU,Germany,83.02,357.11,3693.ii,Europe, FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,UK,66.44,242.5,2631.23,Europe, ITA,Italy,60.36,301.34,1943.84,Europe, ARG,Argentine republic,44.94,2780.4,637.49,S.America,1816-07-09 DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05 CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16                                          This text file contains the data separated with              commas. The commencement column contains the row labels. In some cases, you'll find them irrelevant. If you don't want to continue them, and so you can pass the argument              index=False              to              .to_csv().
Read a CSV File
Once your information is saved in a CSV file, you'll likely want to load and use it from time to time. You can do that with the Pandas              read_csv()              role:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  )                  >>>                                    df                                      State      POP      AREA       Gdp       CONT     IND_DAY                  CHN       Cathay  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.sixteen   3287.26   2575.67       Asia  1947-08-15                  The states          United states   329.74   9833.52  19485.39  Northward.America  1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Due south.America  1822-09-07                  PAK    Pakistan   205.71    881.91    302.xiv       Asia  1947-08-14                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-xvi                  JPN       Japan   126.22    377.97   4872.42       Asia         NaN                  DEU     Germany    83.02    357.eleven   3693.twenty     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-14                  GBR          UK    66.44    242.50   2631.23     Europe         NaN                  ITA       Italian republic    lx.36    301.34   1943.84     Europe         NaN                  ARG   Argentina    44.94   2780.xl    637.49  S.America  1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa  1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  N.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Republic of kazakhstan    eighteen.53   2724.ninety    159.41       Asia  1991-12-16                                          In this case, the Pandas              read_csv()              function returns a new              DataFrame              with the data and labels from the file              data.csv, which you specified with the offset argument. This cord tin be any valid path, including URLs.
The parameter              index_col              specifies the cavalcade from the CSV file that contains the row labels. You assign a zero-based column index to this parameter. You should determine the value of              index_col              when the CSV file contains the row labels to avert loading them as data.
Yous'll learn more than about using Pandas with CSV files later on in this tutorial. Y'all tin also check out Reading and Writing CSV Files in Python to see how to handle CSV files with the congenital-in Python library csv likewise.
Using Pandas to Write and Read Excel Files
Microsoft Excel is probably the nigh widely-used spreadsheet software. While older versions used binary            .xls            files, Excel 2007 introduced the new XML-based            .xlsx            file. You tin can read and write Excel files in Pandas, similar to CSV files. Nevertheless, yous'll need to install the following Python packages first:
- xlwt to write to              
.xlsfiles - openpyxl or XlsxWriter to write to              
.xlsxfiles - xlrd to read Excel files
 
You can install them using pip with a single command:
                                            $                pip install xlwt openpyxl xlsxwriter xlrd                                    Yous can also use Conda:
                                            $                conda install xlwt openpyxl xlsxwriter xlrd                                    Please annotation that y'all don't have to install            all            these packages. For example, you don't need both openpyxl and XlsxWriter. If you're going to work merely with            .xls            files, so you don't need whatever of them! However, if you intend to piece of work simply with            .xlsx            files, and then you lot're going to need at least one of them, simply non            xlwt. Take some time to decide which packages are right for your project.
Write an Excel File
Once you have those packages installed, you tin save your              DataFrame              in an Excel file with              .to_excel():
>>>
                                                  >>>                                    df                  .                  to_excel                  (                  'information.xlsx'                  )                                          The argument              'data.xlsx'              represents the target file and, optionally, its path. The above statement should create the file              information.xlsx              in your current working directory. That file should look like this:
            The outset column of the file contains the labels of the rows, while the other columns store data.
Read an Excel File
You can load data from Excel files with              read_excel():
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'data.xlsx'                  ,                  index_col                  =                  0                  )                  >>>                                    df                                      COUNTRY      Pop      AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.16   3287.26   2575.67       Asia  1947-08-15                  United states of america          The states   329.74   9833.52  19485.39  N.America  1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Southward.America  1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia  1947-08-xiv                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-ten-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16                  JPN       Nippon   126.22    377.97   4872.42       Asia         NaN                  DEU     Germany    83.02    357.xi   3693.20     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-fourteen                  GBR          U.k.    66.44    242.50   2631.23     Europe         NaN                  ITA       Italian republic    sixty.36    301.34   1943.84     Europe         NaN                  ARG   Argentina    44.94   2780.40    637.49  S.America  1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa  1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Due north.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-sixteen                                                        read_excel()              returns a new              DataFrame              that contains the values from              data.xlsx. Y'all can also employ              read_excel()              with OpenDocument spreadsheets, or              .ods              files.
You'll learn more about working with Excel files afterward on in this tutorial. Yous tin can as well check out Using Pandas to Read Large Excel Files in Python.
Understanding the Pandas IO API
            Pandas IO Tools            is the API that allows you to save the contents of            Series            and            DataFrame            objects to the clipboard, objects, or files of various types. It also enables loading data from the clipboard, objects, or files.
Write Files
              Serial              and              DataFrame              objects have methods that enable writing data and labels to the clipboard or files. They're named with the pattern                              .to_<file-blazon>()              , where              <file-type>              is the blazon of the target file.
Yous've learned about              .to_csv()              and              .to_excel(), but there are others, including:
-                 
.to_json() -                 
.to_html() -                 
.to_sql() -                 
.to_pickle() 
At that place are nonetheless more file types that you tin write to, so this list is not exhaustive.
These methods have parameters specifying the target file path where you saved the data and labels. This is mandatory in some cases and optional in others. If this option is available and you choose to omit it, then the methods return the objects (like strings or iterables) with the contents of              DataFrame              instances.
The optional parameter              compression              decides how to compress the file with the data and labels. You'll learn more about information technology afterwards on. There are a few other parameters, but they're mostly specific to one or several methods. Y'all won't go into them in detail here.
Read Files
Pandas functions for reading the contents of files are named using the pattern                              .read_<file-blazon>()              , where              <file-type>              indicates the blazon of the file to read. Y'all've already seen the Pandas              read_csv()              and              read_excel()              functions. Here are a few others:
-                 
read_json() -                 
read_html() -                 
read_sql() -                 
read_pickle() 
These functions take a parameter that specifies the target file path. It can be any valid cord that represents the path, either on a local machine or in a URL. Other objects are also acceptable depending on the file blazon.
The optional parameter              compression              determines the type of decompression to use for the compressed files. You'll acquire near it subsequently on in this tutorial. There are other parameters, but they're specific to i or several functions. You won't go into them in detail here.
Working With Different File Types
The Pandas library offers a wide range of possibilities for saving your data to files and loading data from files. In this department, you lot'll learn more than about working with CSV and Excel files. You'll as well see how to use other types of files, like JSON, web pages, databases, and Python pickle files.
CSV Files
You lot've already learned how to read and write CSV files. At present let'southward dig a little deeper into the details. When you use              .to_csv()              to relieve your              DataFrame, you lot can provide an statement for the parameter              path_or_buf              to specify the path, name, and extension of the target file.            
              path_or_buf              is the first argument              .to_csv()              will go. Information technology can exist any string that represents a valid file path that includes the file name and its extension. You lot've seen this in a previous example. However, if yous omit              path_or_buf, then              .to_csv()              won't create whatsoever files. Instead, information technology'll return the respective string:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    s                  =                  df                  .                  to_csv                  ()                  >>>                                    impress                  (                  south                  )                  ,State,POP,AREA,GDP,CONT,IND_DAY                  CHN,China,1398.72,9596.96,12234.78,Asia,                  IND,India,1351.16,3287.26,2575.67,Asia,1947-08-xv                  USA,US,329.74,9833.52,19485.39,N.America,1776-07-04                  IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,1945-08-17                  BRA,Brazil,210.32,8515.77,2055.51,Due south.America,1822-09-07                  PAK,Islamic republic of pakistan,205.71,881.91,302.14,Asia,1947-08-14                  NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01                  BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26                  RUS,Russia,146.79,17098.25,1530.75,,1992-06-12                  MEX,Mexico,126.58,1964.38,1158.23,Northward.America,1810-09-xvi                  JPN,Japan,126.22,377.97,4872.42,Asia,                  DEU,Federal republic of germany,83.02,357.11,3693.two,Europe,                  FRA,France,67.02,640.68,2582.49,Europe,1789-07-14                  GBR,Uk,66.44,242.5,2631.23,Europe,                  ITA,Italia,lx.36,301.34,1943.84,Europe,                  ARG,Argentina,44.94,2780.4,637.49,Due south.America,1816-07-09                  DZA,Algeria,43.38,2381.74,167.56,Africa,1962-07-05                  CAN,Canada,37.59,9984.67,1647.12,North.America,1867-07-01                  AUS,Australia,25.47,7692.02,1408.68,Oceania,                  KAZ,Republic of kazakhstan,18.53,2724.9,159.41,Asia,1991-12-16                                          Now you have the string              s              instead of a CSV file. Yous likewise have some              missing values              in your              DataFrame              object. For instance, the continent for Russia and the independence days for several countries (China, Japan, and and then on) are not available. In data science and machine learning, you must handle missing values carefully. Pandas excels here! By default, Pandas uses the NaN value to replace the missing values.
The continent that corresponds to Russia in              df              is              nan:
>>>
                                                  >>>                                    df                  .                  loc                  [                  'RUS'                  ,                  'CONT'                  ]                  nan                                          This example uses              .loc[]              to get data with the specified row and column names.
When yous salvage your              DataFrame              to a CSV file, empty strings ('') will stand for the missing data. You tin run across this both in your file              data.csv              and in the string              s. If you lot want to alter this behavior, then utilize the optional parameter              na_rep:
>>>
                                                  >>>                                    df                  .                  to_csv                  (                  'new-information.csv'                  ,                  na_rep                  =                  '(missing)'                  )                                          This code produces the file              new-data.csv              where the missing values are no longer empty strings. You tin expand the code block below to run across how this file should look:
                                ,Land,Pop,AREA,Gross domestic product,CONT,IND_DAY CHN,China,1398.72,9596.96,12234.78,Asia,(missing) IND,India,1351.sixteen,3287.26,2575.67,Asia,1947-08-xv Us,The states,329.74,9833.52,19485.39,Northward.America,1776-07-04 IDN,Indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,South.America,1822-09-07 PAK,Pakistan,205.71,881.91,302.fourteen,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-10-01 BGD,People's republic of bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,(missing),1992-06-12 MEX,United mexican states,126.58,1964.38,1158.23,N.America,1810-09-16 JPN,Nihon,126.22,377.97,4872.42,Asia,(missing) DEU,Germany,83.02,357.eleven,3693.ii,Europe,(missing) FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,Britain,66.44,242.5,2631.23,Europe,(missing) ITA,Italy,60.36,301.34,1943.84,Europe,(missing) ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09 DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,1962-07-05 Tin,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania,(missing) KAZ,Kazakhstan,eighteen.53,2724.9,159.41,Asia,1991-12-16                                          Now, the string              '(missing)'              in the file corresponds to the              nan              values from              df.
When Pandas reads files, it considers the empty string ('') and a few others as missing values past default:
-                 
'nan' -                 
'-nan' -                 
'NA' -                 
'N/A' -                 
'NaN' -                 
'null' 
If you don't desire this beliefs, then you can pass              keep_default_na=False              to the Pandas              read_csv()              function. To specify other labels for missing values, apply the parameter              na_values:
>>>
                                                  >>>                                    pd                  .                  read_csv                  (                  'new-information.csv'                  ,                  index_col                  =                  0                  ,                  na_values                  =                  '(missing)'                  )                                      Country      POP      AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72   9596.96  12234.78       Asia         NaN                  IND       India  1351.sixteen   3287.26   2575.67       Asia  1947-08-15                  U.s.          United states   329.74   9833.52  19485.39  N.America  1776-07-04                  IDN   Republic of indonesia   268.07   1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Due south.America  1822-09-07                  PAK    Islamic republic of pakistan   205.71    881.91    302.fourteen       Asia  1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa  1960-ten-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia  1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75        NaN  1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America  1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia         NaN                  DEU     Deutschland    83.02    357.11   3693.20     Europe         NaN                  FRA      France    67.02    640.68   2582.49     Europe  1789-07-14                  GBR          United kingdom of great britain and northern ireland    66.44    242.50   2631.23     Europe         NaN                  ITA       Italy    60.36    301.34   1943.84     Europe         NaN                  ARG   Argentine republic    44.94   2780.xl    637.49  Southward.America  1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa  1962-07-05                  Tin      Canada    37.59   9984.67   1647.12  N.America  1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania         NaN                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia  1991-12-xvi                                          Hither, you've marked the string              '(missing)'              every bit a new missing information label, and Pandas replaced it with              nan              when it read the file.
When you load data from a file, Pandas assigns the data types to the values of each cavalcade past default. You tin bank check these types with              .dtypes:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  )                  >>>                                    df                  .                  dtypes                  COUNTRY     object                  POP        float64                  Expanse       float64                  GDP        float64                  CONT        object                  IND_DAY     object                  dtype: object                                          The columns with strings and dates ('State',              'CONT', and              'IND_DAY') have the data blazon              object. Meanwhile, the numeric columns contain 64-bit floating-point numbers (float64).
You can use the parameter              dtype              to specify the desired data types and              parse_dates              to forcefulness use of datetimes:
>>>
                                                  >>>                                    dtypes                  =                  {                  'Pop'                  :                  'float32'                  ,                  'AREA'                  :                  'float32'                  ,                  'Gdp'                  :                  'float32'                  }                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  dtype                  =                  dtypes                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  Land            object                  POP               float32                  Expanse              float32                  GDP               float32                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  [                  'IND_DAY'                  ]                  CHN          NaT                  IND   1947-08-15                  United states of america   1776-07-04                  IDN   1945-08-17                  BRA   1822-09-07                  PAK   1947-08-14                  NGA   1960-10-01                  BGD   1971-03-26                  RUS   1992-06-12                  MEX   1810-09-16                  JPN          NaT                  DEU          NaT                  FRA   1789-07-14                  GBR          NaT                  ITA          NaT                  ARG   1816-07-09                  DZA   1962-07-05                  CAN   1867-07-01                  AUS          NaT                  KAZ   1991-12-16                  Proper noun: IND_DAY, dtype: datetime64[ns]                                          Now, you have 32-fleck floating-point numbers (float32) as specified with              dtype. These differ slightly from the original 64-chip numbers because of smaller              precision. The values in the last column are considered as dates and accept the data blazon              datetime64. That'due south why the              NaN              values in this column are replaced with              NaT.
Now that you lot have existent dates, yous can save them in the format y'all like:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  to_csv                  (                  'formatted-information.csv'                  ,                  date_format                  =                  '%B                                    %d                  , %Y'                  )                                          Here, yous've specified the parameter              date_format              to exist              '%B %d, %Y'. You tin expand the lawmaking cake below to see the resulting file:
                                ,COUNTRY,Popular,AREA,GDP,CONT,IND_DAY CHN,China,1398.72,9596.96,12234.78,Asia, IND,India,1351.16,3287.26,2575.67,Asia,"August xv, 1947" United states,Us,329.74,9833.52,19485.39,N.America,"July 04, 1776" IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,"August 17, 1945" BRA,Brazil,210.32,8515.77,2055.51,S.America,"September 07, 1822" PAK,Pakistan,205.71,881.91,302.14,Asia,"Baronial 14, 1947" NGA,Nigeria,200.96,923.77,375.77,Africa,"October 01, 1960" BGD,Bangladesh,167.09,147.57,245.63,Asia,"March 26, 1971" RUS,Russia,146.79,17098.25,1530.75,,"June 12, 1992" MEX,Mexico,126.58,1964.38,1158.23,Due north.America,"September 16, 1810" JPN,Japan,126.22,377.97,4872.42,Asia, DEU,Frg,83.02,357.11,3693.2,Europe, FRA,France,67.02,640.68,2582.49,Europe,"July 14, 1789" GBR,United kingdom,66.44,242.five,2631.23,Europe, ITA,Italian republic,60.36,301.34,1943.84,Europe, ARG,Argentina,44.94,2780.4,637.49,South.America,"July 09, 1816" DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,"July 05, 1962" CAN,Canada,37.59,9984.67,1647.12,Due north.America,"July 01, 1867" AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,"December 16, 1991"                                          The format of the dates is dissimilar now. The format              '%B %d, %Y'              ways the date volition first display the full name of the month, then the day followed by a comma, and finally the full year.
In that location are several other optional parameters that you can use with              .to_csv():
-                                   
sepdenotes a values separator. -                                   
decimalindicates a decimal separator. -                                   
encodingsets the file encoding. -                                   
headerspecifies whether you want to write column labels in the file. 
Here's how you lot would laissez passer arguments for              sep              and              header:
>>>
                                                  >>>                                    s                  =                  df                  .                  to_csv                  (                  sep                  =                  ';'                  ,                  header                  =                  False                  )                  >>>                                    print                  (                  s                  )                  CHN;China;1398.72;9596.96;12234.78;Asia;                  IND;Bharat;1351.16;3287.26;2575.67;Asia;1947-08-fifteen                  U.s.a.;U.s.a.;329.74;9833.52;19485.39;N.America;1776-07-04                  IDN;Republic of indonesia;268.07;1910.93;1015.54;Asia;1945-08-17                  BRA;Brazil;210.32;8515.77;2055.51;Due south.America;1822-09-07                  PAK;Pakistan;205.71;881.91;302.xiv;Asia;1947-08-14                  NGA;Nigeria;200.96;923.77;375.77;Africa;1960-x-01                  BGD;Bangladesh;167.09;147.57;245.63;Asia;1971-03-26                  RUS;Russian federation;146.79;17098.25;1530.75;;1992-06-12                  MEX;Mexico;126.58;1964.38;1158.23;N.America;1810-09-16                  JPN;Nippon;126.22;377.97;4872.42;Asia;                  DEU;Germany;83.02;357.11;3693.2;Europe;                  FRA;France;67.02;640.68;2582.49;Europe;1789-07-xiv                  GBR;United kingdom;66.44;242.5;2631.23;Europe;                  ITA;Italy;60.36;301.34;1943.84;Europe;                  ARG;Argentine republic;44.94;2780.four;637.49;Due south.America;1816-07-09                  DZA;Algeria;43.38;2381.74;167.56;Africa;1962-07-05                  Can;Canada;37.59;9984.67;1647.12;N.America;1867-07-01                  AUS;Australia;25.47;7692.02;1408.68;Oceania;                  KAZ;Kazakhstan;18.53;2724.9;159.41;Asia;1991-12-16                                          The data is separated with a semicolon (';') considering you've specified              sep=';'. Likewise, since yous passed              header=False, you see your data without the header row of cavalcade names.
The Pandas              read_csv()              role has many additional options for managing missing data, working with dates and times, quoting, encoding, treatment errors, and more than. For case, if you have a file with one data column and want to get a              Serial              object instead of a              DataFrame, then you can pass              squeeze=True              to              read_csv(). Y'all'll learn afterwards on about information compression and decompression, besides as how to skip rows and columns.
JSON Files
JSON stands for JavaScript object notation. JSON files are plaintext files used for data interchange, and humans can read them easily. They follow the ISO/IEC 21778:2017 and ECMA-404 standards and apply the              .json              extension. Python and Pandas work well with JSON files, as Python'southward json library offers congenital-in support for them.
You tin save the data from your              DataFrame              to a JSON file with              .to_json(). Start by creating a              DataFrame              object again. Use the dictionary              data              that holds the data about countries and then apply              .to_json():
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    df                  .                  to_json                  (                  'data-columns.json'                  )                                          This code produces the file              data-columns.json. You can expand the lawmaking cake below to see how this file should wait:
                                                  {                  "Land"                  :{                  "CHN"                  :                  "People's republic of china"                  ,                  "IND"                  :                  "India"                  ,                  "USA"                  :                  "US"                  ,                  "IDN"                  :                  "Indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russian federation"                  ,                  "MEX"                  :                  "United mexican states"                  ,                  "JPN"                  :                  "Japan"                  ,                  "DEU"                  :                  "Deutschland"                  ,                  "FRA"                  :                  "French republic"                  ,                  "GBR"                  :                  "Uk"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "People's democratic republic of algeria"                  ,                  "Can"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "Popular"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.16                  ,                  "Usa"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  60.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "CAN"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  18.53                  },                  "AREA"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "United states of america"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.11                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.five                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.4                  ,                  "DZA"                  :                  2381.74                  ,                  "CAN"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.nine                  },                  "GDP"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "United states of america"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.14                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "Tin can"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "South.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  null                  ,                  "MEX"                  :                  "N.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "South.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "CAN"                  :                  "N.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  nothing                  ,                  "IND"                  :                  "1947-08-15"                  ,                  "USA"                  :                  "1776-07-04"                  ,                  "IDN"                  :                  "1945-08-17"                  ,                  "BRA"                  :                  "1822-09-07"                  ,                  "PAK"                  :                  "1947-08-fourteen"                  ,                  "NGA"                  :                  "1960-x-01"                  ,                  "BGD"                  :                  "1971-03-26"                  ,                  "RUS"                  :                  "1992-06-12"                  ,                  "MEX"                  :                  "1810-09-16"                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  null                  ,                  "FRA"                  :                  "1789-07-14"                  ,                  "GBR"                  :                  cypher                  ,                  "ITA"                  :                  zippo                  ,                  "ARG"                  :                  "1816-07-09"                  ,                  "DZA"                  :                  "1962-07-05"                  ,                  "CAN"                  :                  "1867-07-01"                  ,                  "AUS"                  :                  null                  ,                  "KAZ"                  :                  "1991-12-xvi"                  }}                                                        information-columns.json              has 1 large dictionary with the column labels every bit keys and the corresponding inner dictionaries equally values.
You lot tin get a different file construction if you laissez passer an argument for the optional parameter              orient:
>>>
                                                  >>>                                    df                  .                  to_json                  (                  'data-index.json'                  ,                  orient                  =                  'index'                  )                                          The              orient              parameter defaults to              'columns'. Here, yous've set it to              index.
You should go a new file              information-index.json. You tin can expand the code block beneath to see the changes:
                                                  {                  "CHN"                  :{                  "State"                  :                  "Red china"                  ,                  "POP"                  :                  1398.72                  ,                  "AREA"                  :                  9596.96                  ,                  "Gross domestic product"                  :                  12234.78                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  goose egg                  },                  "IND"                  :{                  "COUNTRY"                  :                  "India"                  ,                  "POP"                  :                  1351.16                  ,                  "Surface area"                  :                  3287.26                  ,                  "GDP"                  :                  2575.67                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-15"                  },                  "Us"                  :{                  "COUNTRY"                  :                  "US"                  ,                  "Pop"                  :                  329.74                  ,                  "AREA"                  :                  9833.52                  ,                  "GDP"                  :                  19485.39                  ,                  "CONT"                  :                  "North.America"                  ,                  "IND_DAY"                  :                  "1776-07-04"                  },                  "IDN"                  :{                  "Land"                  :                  "Republic of indonesia"                  ,                  "Popular"                  :                  268.07                  ,                  "Expanse"                  :                  1910.93                  ,                  "Gdp"                  :                  1015.54                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1945-08-17"                  },                  "BRA"                  :{                  "COUNTRY"                  :                  "Brazil"                  ,                  "POP"                  :                  210.32                  ,                  "AREA"                  :                  8515.77                  ,                  "GDP"                  :                  2055.51                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1822-09-07"                  },                  "PAK"                  :{                  "COUNTRY"                  :                  "Pakistan"                  ,                  "Pop"                  :                  205.71                  ,                  "Expanse"                  :                  881.91                  ,                  "Gross domestic product"                  :                  302.fourteen                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-14"                  },                  "NGA"                  :{                  "COUNTRY"                  :                  "Nigeria"                  ,                  "Pop"                  :                  200.96                  ,                  "AREA"                  :                  923.77                  ,                  "Gdp"                  :                  375.77                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1960-10-01"                  },                  "BGD"                  :{                  "COUNTRY"                  :                  "People's republic of bangladesh"                  ,                  "Pop"                  :                  167.09                  ,                  "Expanse"                  :                  147.57                  ,                  "Gross domestic product"                  :                  245.63                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1971-03-26"                  },                  "RUS"                  :{                  "Country"                  :                  "Russian federation"                  ,                  "Pop"                  :                  146.79                  ,                  "AREA"                  :                  17098.25                  ,                  "Gdp"                  :                  1530.75                  ,                  "CONT"                  :                  nada                  ,                  "IND_DAY"                  :                  "1992-06-12"                  },                  "MEX"                  :{                  "Country"                  :                  "Mexico"                  ,                  "POP"                  :                  126.58                  ,                  "Expanse"                  :                  1964.38                  ,                  "GDP"                  :                  1158.23                  ,                  "CONT"                  :                  "North.America"                  ,                  "IND_DAY"                  :                  "1810-09-16"                  },                  "JPN"                  :{                  "COUNTRY"                  :                  "Japan"                  ,                  "POP"                  :                  126.22                  ,                  "Surface area"                  :                  377.97                  ,                  "Gross domestic product"                  :                  4872.42                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  nothing                  },                  "DEU"                  :{                  "COUNTRY"                  :                  "Frg"                  ,                  "Popular"                  :                  83.02                  ,                  "AREA"                  :                  357.11                  ,                  "GDP"                  :                  3693.2                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "FRA"                  :{                  "COUNTRY"                  :                  "France"                  ,                  "POP"                  :                  67.02                  ,                  "Expanse"                  :                  640.68                  ,                  "Gdp"                  :                  2582.49                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  "1789-07-fourteen"                  },                  "GBR"                  :{                  "Land"                  :                  "Britain"                  ,                  "Pop"                  :                  66.44                  ,                  "AREA"                  :                  242.5                  ,                  "GDP"                  :                  2631.23                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "ITA"                  :{                  "COUNTRY"                  :                  "Italia"                  ,                  "POP"                  :                  60.36                  ,                  "Area"                  :                  301.34                  ,                  "Gdp"                  :                  1943.84                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },                  "ARG"                  :{                  "Country"                  :                  "Argentine republic"                  ,                  "POP"                  :                  44.94                  ,                  "AREA"                  :                  2780.4                  ,                  "GDP"                  :                  637.49                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1816-07-09"                  },                  "DZA"                  :{                  "COUNTRY"                  :                  "Algeria"                  ,                  "POP"                  :                  43.38                  ,                  "AREA"                  :                  2381.74                  ,                  "Gdp"                  :                  167.56                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1962-07-05"                  },                  "CAN"                  :{                  "Country"                  :                  "Canada"                  ,                  "POP"                  :                  37.59                  ,                  "AREA"                  :                  9984.67                  ,                  "GDP"                  :                  1647.12                  ,                  "CONT"                  :                  "Northward.America"                  ,                  "IND_DAY"                  :                  "1867-07-01"                  },                  "AUS"                  :{                  "Country"                  :                  "Australia"                  ,                  "Popular"                  :                  25.47                  ,                  "Expanse"                  :                  7692.02                  ,                  "Gdp"                  :                  1408.68                  ,                  "CONT"                  :                  "Oceania"                  ,                  "IND_DAY"                  :                  nada                  },                  "KAZ"                  :{                  "COUNTRY"                  :                  "Kazakhstan"                  ,                  "POP"                  :                  18.53                  ,                  "AREA"                  :                  2724.nine                  ,                  "Gross domestic product"                  :                  159.41                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1991-12-16"                  }}                                                        data-alphabetize.json              besides has ane large dictionary, only this time the row labels are the keys, and the inner dictionaries are the values.
At that place are few more options for              orient. Ane of them is              'records':
>>>
                                                  >>>                                    df                  .                  to_json                  (                  'data-records.json'                  ,                  orient                  =                  'records'                  )                                          This code should yield the file              data-records.json. You tin expand the code block below to see the content:
                                                  [{                  "Country"                  :                  "Prc"                  ,                  "Pop"                  :                  1398.72                  ,                  "AREA"                  :                  9596.96                  ,                  "GDP"                  :                  12234.78                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "State"                  :                  "India"                  ,                  "POP"                  :                  1351.16                  ,                  "AREA"                  :                  3287.26                  ,                  "Gross domestic product"                  :                  2575.67                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-15"                  },{                  "Land"                  :                  "Us"                  ,                  "POP"                  :                  329.74                  ,                  "Expanse"                  :                  9833.52                  ,                  "Gross domestic product"                  :                  19485.39                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1776-07-04"                  },{                  "State"                  :                  "Indonesia"                  ,                  "Pop"                  :                  268.07                  ,                  "Expanse"                  :                  1910.93                  ,                  "Gross domestic product"                  :                  1015.54                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1945-08-17"                  },{                  "State"                  :                  "Brazil"                  ,                  "Pop"                  :                  210.32                  ,                  "AREA"                  :                  8515.77                  ,                  "GDP"                  :                  2055.51                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1822-09-07"                  },{                  "COUNTRY"                  :                  "Pakistan"                  ,                  "POP"                  :                  205.71                  ,                  "Area"                  :                  881.91                  ,                  "Gdp"                  :                  302.14                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1947-08-fourteen"                  },{                  "State"                  :                  "Nigeria"                  ,                  "Popular"                  :                  200.96                  ,                  "Area"                  :                  923.77                  ,                  "Gdp"                  :                  375.77                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1960-10-01"                  },{                  "Country"                  :                  "Bangladesh"                  ,                  "POP"                  :                  167.09                  ,                  "AREA"                  :                  147.57                  ,                  "Gross domestic product"                  :                  245.63                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1971-03-26"                  },{                  "COUNTRY"                  :                  "Russian federation"                  ,                  "Pop"                  :                  146.79                  ,                  "Expanse"                  :                  17098.25                  ,                  "GDP"                  :                  1530.75                  ,                  "CONT"                  :                  zippo                  ,                  "IND_DAY"                  :                  "1992-06-12"                  },{                  "State"                  :                  "Mexico"                  ,                  "Popular"                  :                  126.58                  ,                  "Surface area"                  :                  1964.38                  ,                  "GDP"                  :                  1158.23                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1810-09-xvi"                  },{                  "COUNTRY"                  :                  "Japan"                  ,                  "Pop"                  :                  126.22                  ,                  "Expanse"                  :                  377.97                  ,                  "GDP"                  :                  4872.42                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  null                  },{                  "State"                  :                  "Federal republic of germany"                  ,                  "Popular"                  :                  83.02                  ,                  "Expanse"                  :                  357.11                  ,                  "GDP"                  :                  3693.2                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "Country"                  :                  "France"                  ,                  "POP"                  :                  67.02                  ,                  "AREA"                  :                  640.68                  ,                  "Gdp"                  :                  2582.49                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  "1789-07-14"                  },{                  "COUNTRY"                  :                  "UK"                  ,                  "Popular"                  :                  66.44                  ,                  "AREA"                  :                  242.5                  ,                  "GDP"                  :                  2631.23                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  null                  },{                  "Country"                  :                  "Italy"                  ,                  "Pop"                  :                  60.36                  ,                  "Expanse"                  :                  301.34                  ,                  "GDP"                  :                  1943.84                  ,                  "CONT"                  :                  "Europe"                  ,                  "IND_DAY"                  :                  zip                  },{                  "Land"                  :                  "Argentina"                  ,                  "POP"                  :                  44.94                  ,                  "Area"                  :                  2780.four                  ,                  "Gross domestic product"                  :                  637.49                  ,                  "CONT"                  :                  "S.America"                  ,                  "IND_DAY"                  :                  "1816-07-09"                  },{                  "COUNTRY"                  :                  "Algeria"                  ,                  "POP"                  :                  43.38                  ,                  "AREA"                  :                  2381.74                  ,                  "Gross domestic product"                  :                  167.56                  ,                  "CONT"                  :                  "Africa"                  ,                  "IND_DAY"                  :                  "1962-07-05"                  },{                  "COUNTRY"                  :                  "Canada"                  ,                  "Popular"                  :                  37.59                  ,                  "Surface area"                  :                  9984.67                  ,                  "Gross domestic product"                  :                  1647.12                  ,                  "CONT"                  :                  "N.America"                  ,                  "IND_DAY"                  :                  "1867-07-01"                  },{                  "State"                  :                  "Commonwealth of australia"                  ,                  "Popular"                  :                  25.47                  ,                  "AREA"                  :                  7692.02                  ,                  "GDP"                  :                  1408.68                  ,                  "CONT"                  :                  "Oceania"                  ,                  "IND_DAY"                  :                  cypher                  },{                  "COUNTRY"                  :                  "Kazakhstan"                  ,                  "POP"                  :                  xviii.53                  ,                  "Area"                  :                  2724.9                  ,                  "GDP"                  :                  159.41                  ,                  "CONT"                  :                  "Asia"                  ,                  "IND_DAY"                  :                  "1991-12-16"                  }]                                                        data-records.json              holds a list with i dictionary for each row. The row labels              are non              written.
Y'all can get some other interesting file structure with              orient='split up':
>>>
                                                  >>>                                    df                  .                  to_json                  (                  'data-split.json'                  ,                  orient                  =                  'carve up'                  )                                          The resulting file is              data-split.json. You can expand the code block beneath to come across how this file should look:
                                                  {                  "columns"                  :[                  "Land"                  ,                  "Pop"                  ,                  "AREA"                  ,                  "Gdp"                  ,                  "CONT"                  ,                  "IND_DAY"                  ],                  "index"                  :[                  "CHN"                  ,                  "IND"                  ,                  "USA"                  ,                  "IDN"                  ,                  "BRA"                  ,                  "PAK"                  ,                  "NGA"                  ,                  "BGD"                  ,                  "RUS"                  ,                  "MEX"                  ,                  "JPN"                  ,                  "DEU"                  ,                  "FRA"                  ,                  "GBR"                  ,                  "ITA"                  ,                  "ARG"                  ,                  "DZA"                  ,                  "Tin can"                  ,                  "AUS"                  ,                  "KAZ"                  ],                  "data"                  :[[                  "Prc"                  ,                  1398.72                  ,                  9596.96                  ,                  12234.78                  ,                  "Asia"                  ,                  null                  ],[                  "India"                  ,                  1351.16                  ,                  3287.26                  ,                  2575.67                  ,                  "Asia"                  ,                  "1947-08-15"                  ],[                  "US"                  ,                  329.74                  ,                  9833.52                  ,                  19485.39                  ,                  "N.America"                  ,                  "1776-07-04"                  ],[                  "Indonesia"                  ,                  268.07                  ,                  1910.93                  ,                  1015.54                  ,                  "Asia"                  ,                  "1945-08-17"                  ],[                  "Brazil"                  ,                  210.32                  ,                  8515.77                  ,                  2055.51                  ,                  "S.America"                  ,                  "1822-09-07"                  ],[                  "Islamic republic of pakistan"                  ,                  205.71                  ,                  881.91                  ,                  302.xiv                  ,                  "Asia"                  ,                  "1947-08-14"                  ],[                  "Nigeria"                  ,                  200.96                  ,                  923.77                  ,                  375.77                  ,                  "Africa"                  ,                  "1960-x-01"                  ],[                  "Bangladesh"                  ,                  167.09                  ,                  147.57                  ,                  245.63                  ,                  "Asia"                  ,                  "1971-03-26"                  ],[                  "Russia"                  ,                  146.79                  ,                  17098.25                  ,                  1530.75                  ,                  null                  ,                  "1992-06-12"                  ],[                  "Mexico"                  ,                  126.58                  ,                  1964.38                  ,                  1158.23                  ,                  "Due north.America"                  ,                  "1810-09-xvi"                  ],[                  "Nihon"                  ,                  126.22                  ,                  377.97                  ,                  4872.42                  ,                  "Asia"                  ,                  zero                  ],[                  "Germany"                  ,                  83.02                  ,                  357.11                  ,                  3693.2                  ,                  "Europe"                  ,                  aught                  ],[                  "French republic"                  ,                  67.02                  ,                  640.68                  ,                  2582.49                  ,                  "Europe"                  ,                  "1789-07-14"                  ],[                  "UK"                  ,                  66.44                  ,                  242.5                  ,                  2631.23                  ,                  "Europe"                  ,                  null                  ],[                  "Italian republic"                  ,                  60.36                  ,                  301.34                  ,                  1943.84                  ,                  "Europe"                  ,                  null                  ],[                  "Argentina"                  ,                  44.94                  ,                  2780.4                  ,                  637.49                  ,                  "S.America"                  ,                  "1816-07-09"                  ],[                  "People's democratic republic of algeria"                  ,                  43.38                  ,                  2381.74                  ,                  167.56                  ,                  "Africa"                  ,                  "1962-07-05"                  ],[                  "Canada"                  ,                  37.59                  ,                  9984.67                  ,                  1647.12                  ,                  "N.America"                  ,                  "1867-07-01"                  ],[                  "Australia"                  ,                  25.47                  ,                  7692.02                  ,                  1408.68                  ,                  "Oceania"                  ,                  null                  ],[                  "Kazakhstan"                  ,                  eighteen.53                  ,                  2724.9                  ,                  159.41                  ,                  "Asia"                  ,                  "1991-12-16"                  ]]}                                                        data-carve up.json              contains one dictionary that holds the following lists:
- The names of the columns
 - The labels of the rows
 - The inner lists (ii-dimensional sequence) that hold information values
 
If you don't provide the value for the optional parameter              path_or_buf              that defines the file path, and then              .to_json()              will return a JSON cord instead of writing the results to a file. This beliefs is consequent with              .to_csv().
At that place are other optional parameters you tin can use. For instance, yous tin set              index=False              to forgo saving row labels. You can manipulate precision with              double_precision, and dates with              date_format              and              date_unit. These concluding two parameters are particularly important when you have time series among your data:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  information                  =                  data                  )                  .                  T                  >>>                                    df                  [                  'IND_DAY'                  ]                  =                  pd                  .                  to_datetime                  (                  df                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  State            object                  Popular                object                  AREA               object                  Gdp                object                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  .                  to_json                  (                  'data-time.json'                  )                                          In this instance, you lot've created the              DataFrame              from the dictionary              data              and used              to_datetime()              to convert the values in the last column to              datetime64. You tin expand the code block below to run across the resulting file:
                                                  {                  "COUNTRY"                  :{                  "CHN"                  :                  "Red china"                  ,                  "IND"                  :                  "Republic of india"                  ,                  "USA"                  :                  "US"                  ,                  "IDN"                  :                  "Republic of indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Islamic republic of pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russia"                  ,                  "MEX"                  :                  "Mexico"                  ,                  "JPN"                  :                  "Japan"                  ,                  "DEU"                  :                  "Germany"                  ,                  "FRA"                  :                  "France"                  ,                  "GBR"                  :                  "Uk"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "Algeria"                  ,                  "CAN"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "POP"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.16                  ,                  "USA"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  threescore.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "Tin"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  18.53                  },                  "AREA"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "United states of america"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.xi                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.5                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.four                  ,                  "DZA"                  :                  2381.74                  ,                  "Tin can"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.9                  },                  "Gdp"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "Usa"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.14                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "Can"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "South.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  naught                  ,                  "MEX"                  :                  "N.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "S.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "CAN"                  :                  "N.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  null                  ,                  "IND"                  :                  -706320000000                  ,                  "United states of america"                  :                  -6106060800000                  ,                  "IDN"                  :                  -769219200000                  ,                  "BRA"                  :                  -4648924800000                  ,                  "PAK"                  :                  -706406400000                  ,                  "NGA"                  :                  -291945600000                  ,                  "BGD"                  :                  38793600000                  ,                  "RUS"                  :                  708307200000                  ,                  "MEX"                  :                  -5026838400000                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  null                  ,                  "FRA"                  :                  -5694969600000                  ,                  "GBR"                  :                  null                  ,                  "ITA"                  :                  aught                  ,                  "ARG"                  :                  -4843411200000                  ,                  "DZA"                  :                  -236476800000                  ,                  "Tin can"                  :                  -3234729600000                  ,                  "AUS"                  :                  goose egg                  ,                  "KAZ"                  :                  692841600000                  }}                                          In this file, you have large integers instead of dates for the independence days. That's because the default value of the optional parameter              date_format              is              'epoch'              whenever              orient              isn't              'tabular array'. This default behavior expresses dates every bit an epoch in milliseconds relative to midnight on January ane, 1970.
However, if you pass              date_format='iso', and so you'll get the dates in the ISO 8601 format. In addition,              date_unit              decides the units of fourth dimension:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  >>>                                    df                  [                  'IND_DAY'                  ]                  =                  pd                  .                  to_datetime                  (                  df                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  to_json                  (                  'new-data-time.json'                  ,                  date_format                  =                  'iso'                  ,                  date_unit                  =                  'due south'                  )                                          This code produces the following JSON file:
                                                  {                  "COUNTRY"                  :{                  "CHN"                  :                  "China"                  ,                  "IND"                  :                  "India"                  ,                  "USA"                  :                  "Us"                  ,                  "IDN"                  :                  "Indonesia"                  ,                  "BRA"                  :                  "Brazil"                  ,                  "PAK"                  :                  "Islamic republic of pakistan"                  ,                  "NGA"                  :                  "Nigeria"                  ,                  "BGD"                  :                  "Bangladesh"                  ,                  "RUS"                  :                  "Russia"                  ,                  "MEX"                  :                  "Mexico"                  ,                  "JPN"                  :                  "Nihon"                  ,                  "DEU"                  :                  "Germany"                  ,                  "FRA"                  :                  "France"                  ,                  "GBR"                  :                  "UK"                  ,                  "ITA"                  :                  "Italy"                  ,                  "ARG"                  :                  "Argentina"                  ,                  "DZA"                  :                  "Algeria"                  ,                  "CAN"                  :                  "Canada"                  ,                  "AUS"                  :                  "Australia"                  ,                  "KAZ"                  :                  "Kazakhstan"                  },                  "POP"                  :{                  "CHN"                  :                  1398.72                  ,                  "IND"                  :                  1351.xvi                  ,                  "USA"                  :                  329.74                  ,                  "IDN"                  :                  268.07                  ,                  "BRA"                  :                  210.32                  ,                  "PAK"                  :                  205.71                  ,                  "NGA"                  :                  200.96                  ,                  "BGD"                  :                  167.09                  ,                  "RUS"                  :                  146.79                  ,                  "MEX"                  :                  126.58                  ,                  "JPN"                  :                  126.22                  ,                  "DEU"                  :                  83.02                  ,                  "FRA"                  :                  67.02                  ,                  "GBR"                  :                  66.44                  ,                  "ITA"                  :                  threescore.36                  ,                  "ARG"                  :                  44.94                  ,                  "DZA"                  :                  43.38                  ,                  "Can"                  :                  37.59                  ,                  "AUS"                  :                  25.47                  ,                  "KAZ"                  :                  xviii.53                  },                  "Expanse"                  :{                  "CHN"                  :                  9596.96                  ,                  "IND"                  :                  3287.26                  ,                  "Us"                  :                  9833.52                  ,                  "IDN"                  :                  1910.93                  ,                  "BRA"                  :                  8515.77                  ,                  "PAK"                  :                  881.91                  ,                  "NGA"                  :                  923.77                  ,                  "BGD"                  :                  147.57                  ,                  "RUS"                  :                  17098.25                  ,                  "MEX"                  :                  1964.38                  ,                  "JPN"                  :                  377.97                  ,                  "DEU"                  :                  357.11                  ,                  "FRA"                  :                  640.68                  ,                  "GBR"                  :                  242.five                  ,                  "ITA"                  :                  301.34                  ,                  "ARG"                  :                  2780.four                  ,                  "DZA"                  :                  2381.74                  ,                  "CAN"                  :                  9984.67                  ,                  "AUS"                  :                  7692.02                  ,                  "KAZ"                  :                  2724.9                  },                  "Gdp"                  :{                  "CHN"                  :                  12234.78                  ,                  "IND"                  :                  2575.67                  ,                  "United states of america"                  :                  19485.39                  ,                  "IDN"                  :                  1015.54                  ,                  "BRA"                  :                  2055.51                  ,                  "PAK"                  :                  302.fourteen                  ,                  "NGA"                  :                  375.77                  ,                  "BGD"                  :                  245.63                  ,                  "RUS"                  :                  1530.75                  ,                  "MEX"                  :                  1158.23                  ,                  "JPN"                  :                  4872.42                  ,                  "DEU"                  :                  3693.2                  ,                  "FRA"                  :                  2582.49                  ,                  "GBR"                  :                  2631.23                  ,                  "ITA"                  :                  1943.84                  ,                  "ARG"                  :                  637.49                  ,                  "DZA"                  :                  167.56                  ,                  "CAN"                  :                  1647.12                  ,                  "AUS"                  :                  1408.68                  ,                  "KAZ"                  :                  159.41                  },                  "CONT"                  :{                  "CHN"                  :                  "Asia"                  ,                  "IND"                  :                  "Asia"                  ,                  "USA"                  :                  "N.America"                  ,                  "IDN"                  :                  "Asia"                  ,                  "BRA"                  :                  "S.America"                  ,                  "PAK"                  :                  "Asia"                  ,                  "NGA"                  :                  "Africa"                  ,                  "BGD"                  :                  "Asia"                  ,                  "RUS"                  :                  zilch                  ,                  "MEX"                  :                  "Due north.America"                  ,                  "JPN"                  :                  "Asia"                  ,                  "DEU"                  :                  "Europe"                  ,                  "FRA"                  :                  "Europe"                  ,                  "GBR"                  :                  "Europe"                  ,                  "ITA"                  :                  "Europe"                  ,                  "ARG"                  :                  "S.America"                  ,                  "DZA"                  :                  "Africa"                  ,                  "Tin can"                  :                  "North.America"                  ,                  "AUS"                  :                  "Oceania"                  ,                  "KAZ"                  :                  "Asia"                  },                  "IND_DAY"                  :{                  "CHN"                  :                  null                  ,                  "IND"                  :                  "1947-08-15T00:00:00Z"                  ,                  "USA"                  :                  "1776-07-04T00:00:00Z"                  ,                  "IDN"                  :                  "1945-08-17T00:00:00Z"                  ,                  "BRA"                  :                  "1822-09-07T00:00:00Z"                  ,                  "PAK"                  :                  "1947-08-14T00:00:00Z"                  ,                  "NGA"                  :                  "1960-x-01T00:00:00Z"                  ,                  "BGD"                  :                  "1971-03-26T00:00:00Z"                  ,                  "RUS"                  :                  "1992-06-12T00:00:00Z"                  ,                  "MEX"                  :                  "1810-09-16T00:00:00Z"                  ,                  "JPN"                  :                  null                  ,                  "DEU"                  :                  nothing                  ,                  "FRA"                  :                  "1789-07-14T00:00:00Z"                  ,                  "GBR"                  :                  null                  ,                  "ITA"                  :                  null                  ,                  "ARG"                  :                  "1816-07-09T00:00:00Z"                  ,                  "DZA"                  :                  "1962-07-05T00:00:00Z"                  ,                  "Tin can"                  :                  "1867-07-01T00:00:00Z"                  ,                  "AUS"                  :                  nil                  ,                  "KAZ"                  :                  "1991-12-16T00:00:00Z"                  }}                                          The dates in the resulting file are in the ISO 8601 format.
Y'all can load the data from a JSON file with              read_json():
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_json                  (                  'data-alphabetize.json'                  ,                  orient                  =                  'index'                  ,                  ...                                    convert_dates                  =                  [                  'IND_DAY'                  ])                                          The parameter              convert_dates              has a similar purpose as              parse_dates              when y'all utilize information technology to read CSV files. The optional parameter              orient              is very of import considering it specifies how Pandas understands the construction of the file.
There are other optional parameters you tin can use also:
-                 Set the encoding                with                
encoding. -                 Manipulate dates                with                
convert_datesandkeep_default_dates. -                 Bear upon precision                with                
dtypeandprecise_float. -                 Decode numeric information                directly to NumPy arrays with                
numpy=True. 
Note that y'all might lose the club of rows and columns when using the JSON format to store your data.
HTML Files
An HTML is a plaintext file that uses hypertext markup linguistic communication to help browsers render web pages. The extensions for HTML files are              .html              and              .htm. You'll need to install an HTML parser library like lxml or html5lib to be able to piece of work with HTML files:
                                                  $pip install lxml html5lib                                          You can also utilize Conda to install the same packages:
                                                  $                  conda install lxml html5lib                                          Once you have these libraries, you can salvage the contents of your              DataFrame              as an HTML file with              .to_html():
>>>
                                                  df = pd.DataFrame(data=data).T                  df.to_html('information.html')                                          This code generates a file              data.html. You lot tin can expand the code block below to see how this file should look:
                                                  <                  table                  border                  =                  "1"                  class                  =                  "dataframe"                  >                  <                  thead                  >                  <                  tr                  style                  =                  "text-align: correct;"                  >                  <                  th                  ></                  th                  >                  <                  th                  >COUNTRY</                  th                  >                  <                  thursday                  >POP</                  th                  >                  <                  th                  >AREA</                  th                  >                  <                  thursday                  >GDP</                  th                  >                  <                  th                  >CONT</                  th                  >                  <                  thursday                  >IND_DAY</                  th                  >                  </                  tr                  >                  </                  thead                  >                  <                  tbody                  >                  <                  tr                  >                  <                  th                  >CHN</                  th                  >                  <                  td                  >Red china</                  td                  >                  <                  td                  >1398.72</                  td                  >                  <                  td                  >9596.96</                  td                  >                  <                  td                  >12234.eight</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >IND</                  thursday                  >                  <                  td                  >Bharat</                  td                  >                  <                  td                  >1351.sixteen</                  td                  >                  <                  td                  >3287.26</                  td                  >                  <                  td                  >2575.67</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1947-08-15</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >USA</                  thursday                  >                  <                  td                  >United states of america</                  td                  >                  <                  td                  >329.74</                  td                  >                  <                  td                  >9833.52</                  td                  >                  <                  td                  >19485.4</                  td                  >                  <                  td                  >N.America</                  td                  >                  <                  td                  >1776-07-04</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >IDN</                  th                  >                  <                  td                  >Indonesia</                  td                  >                  <                  td                  >268.07</                  td                  >                  <                  td                  >1910.93</                  td                  >                  <                  td                  >1015.54</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1945-08-17</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >BRA</                  thursday                  >                  <                  td                  >Brazil</                  td                  >                  <                  td                  >210.32</                  td                  >                  <                  td                  >8515.77</                  td                  >                  <                  td                  >2055.51</                  td                  >                  <                  td                  >S.America</                  td                  >                  <                  td                  >1822-09-07</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >PAK</                  th                  >                  <                  td                  >Pakistan</                  td                  >                  <                  td                  >205.71</                  td                  >                  <                  td                  >881.91</                  td                  >                  <                  td                  >302.xiv</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1947-08-14</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >NGA</                  th                  >                  <                  td                  >Nigeria</                  td                  >                  <                  td                  >200.96</                  td                  >                  <                  td                  >923.77</                  td                  >                  <                  td                  >375.77</                  td                  >                  <                  td                  >Africa</                  td                  >                  <                  td                  >1960-x-01</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >BGD</                  th                  >                  <                  td                  >Bangladesh</                  td                  >                  <                  td                  >167.09</                  td                  >                  <                  td                  >147.57</                  td                  >                  <                  td                  >245.63</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1971-03-26</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >RUS</                  th                  >                  <                  td                  >Russia</                  td                  >                  <                  td                  >146.79</                  td                  >                  <                  td                  >17098.2</                  td                  >                  <                  td                  >1530.75</                  td                  >                  <                  td                  >NaN</                  td                  >                  <                  td                  >1992-06-12</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >MEX</                  th                  >                  <                  td                  >Mexico</                  td                  >                  <                  td                  >126.58</                  td                  >                  <                  td                  >1964.38</                  td                  >                  <                  td                  >1158.23</                  td                  >                  <                  td                  >Northward.America</                  td                  >                  <                  td                  >1810-09-xvi</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >JPN</                  thursday                  >                  <                  td                  >Japan</                  td                  >                  <                  td                  >126.22</                  td                  >                  <                  td                  >377.97</                  td                  >                  <                  td                  >4872.42</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  thursday                  >DEU</                  th                  >                  <                  td                  >Germany</                  td                  >                  <                  td                  >83.02</                  td                  >                  <                  td                  >357.11</                  td                  >                  <                  td                  >3693.ii</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >FRA</                  th                  >                  <                  td                  >France</                  td                  >                  <                  td                  >67.02</                  td                  >                  <                  td                  >640.68</                  td                  >                  <                  td                  >2582.49</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >1789-07-fourteen</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >GBR</                  th                  >                  <                  td                  >UK</                  td                  >                  <                  td                  >66.44</                  td                  >                  <                  td                  >242.v</                  td                  >                  <                  td                  >2631.23</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >ITA</                  th                  >                  <                  td                  >Italy</                  td                  >                  <                  td                  >60.36</                  td                  >                  <                  td                  >301.34</                  td                  >                  <                  td                  >1943.84</                  td                  >                  <                  td                  >Europe</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  thursday                  >ARG</                  th                  >                  <                  td                  >Argentina</                  td                  >                  <                  td                  >44.94</                  td                  >                  <                  td                  >2780.iv</                  td                  >                  <                  td                  >637.49</                  td                  >                  <                  td                  >S.America</                  td                  >                  <                  td                  >1816-07-09</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >DZA</                  th                  >                  <                  td                  >People's democratic republic of algeria</                  td                  >                  <                  td                  >43.38</                  td                  >                  <                  td                  >2381.74</                  td                  >                  <                  td                  >167.56</                  td                  >                  <                  td                  >Africa</                  td                  >                  <                  td                  >1962-07-05</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >Tin</                  thursday                  >                  <                  td                  >Canada</                  td                  >                  <                  td                  >37.59</                  td                  >                  <                  td                  >9984.67</                  td                  >                  <                  td                  >1647.12</                  td                  >                  <                  td                  >N.America</                  td                  >                  <                  td                  >1867-07-01</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >AUS</                  th                  >                  <                  td                  >Commonwealth of australia</                  td                  >                  <                  td                  >25.47</                  td                  >                  <                  td                  >7692.02</                  td                  >                  <                  td                  >1408.68</                  td                  >                  <                  td                  >Oceania</                  td                  >                  <                  td                  >NaN</                  td                  >                  </                  tr                  >                  <                  tr                  >                  <                  th                  >KAZ</                  th                  >                  <                  td                  >Kazakhstan</                  td                  >                  <                  td                  >xviii.53</                  td                  >                  <                  td                  >2724.ix</                  td                  >                  <                  td                  >159.41</                  td                  >                  <                  td                  >Asia</                  td                  >                  <                  td                  >1991-12-xvi</                  td                  >                  </                  tr                  >                  </                  tbody                  >                  </                  tabular array                  >                                          This file shows the              DataFrame              contents nicely. Even so, notice that yous haven't obtained an entire spider web page. You've but output the information that corresponds to              df              in the HTML format.
              .to_html()              won't create a file if y'all don't provide the optional parameter              buf, which denotes the buffer to write to. If you go out this parameter out, and so your code will render a string as information technology did with              .to_csv()              and              .to_json().
Here are some other optional parameters:
-                                   
headerdetermines whether to save the column names. -                                   
indexdetermines whether to relieve the row labels. -                                   
classesassigns cascading style sheet (CSS) classes. -                                   
render_linksspecifies whether to convert URLs to HTML links. -                                   
table_idassigns the CSSidto thetabletag. -                                   
escapedecides whether to catechumen the characters<,>, and&to HTML-safe strings. 
You employ parameters like these to specify different aspects of the resulting files or strings.
You can create a              DataFrame              object from a suitable HTML file using              read_html(), which will render a              DataFrame              example or a list of them:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_html                  (                  'data.html'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                                          This is very similar to what yous did when reading CSV files. You also have parameters that help you work with dates, missing values, precision, encoding, HTML parsers, and more.
Excel Files
You've already learned how to read and write Excel files with Pandas. However, there are a few more options worth considering. For 1, when you use              .to_excel(), y'all tin specify the proper noun of the target worksheet with the optional parameter              sheet_name:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  information                  )                  .                  T                  >>>                                    df                  .                  to_excel                  (                  'data.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  )                                          Here, you create a file              information.xlsx              with a worksheet chosen              COUNTRIES              that stores the data. The string              'information.xlsx'              is the argument for the parameter              excel_writer              that defines the proper noun of the Excel file or its path.
The optional parameters              startrow              and              startcol              both default to              0              and betoken the upper left-most prison cell where the data should start being written:
>>>
                                                  >>>                                    df                  .                  to_excel                  (                  'data-shifted.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  ,                  ...                                    startrow                  =                  2                  ,                  startcol                  =                  4                  )                                          Here, you specify that the tabular array should showtime in the third row and the fifth column. Yous also used nothing-based indexing, so the third row is denoted past              2              and the fifth cavalcade by              4.
At present the resulting worksheet looks similar this:
            Equally y'all tin can see, the table starts in the tertiary row              two              and the fifth column              E.
              .read_excel()              also has the optional parameter              sheet_name              that specifies which worksheets to read when loading information. It can take on i of the post-obit values:
- The zero-based index of the worksheet
 - The name of the worksheet
 - The list of indices or names to read multiple sheets
 -                 The value                  
Noneto read all sheets 
Hither'southward how you lot would use this parameter in your code:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'information.xlsx'                  ,                  sheet_name                  =                  0                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  =                  pd                  .                  read_excel                  (                  'information.xlsx'                  ,                  sheet_name                  =                  'COUNTRIES'                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                                          Both statements above create the aforementioned              DataFrame              considering the              sheet_name              parameters have the aforementioned values. In both cases,              sheet_name=0              and              sheet_name='COUNTRIES'              refer to the same worksheet. The argument              parse_dates=['IND_DAY']              tells Pandas to endeavor to consider the values in this column equally dates or times.
There are other optional parameters you can employ with              .read_excel()              and              .to_excel()              to determine the Excel engine, the encoding, the style to handle missing values and infinities, the method for writing cavalcade names and row labels, and so on.
SQL Files
Pandas IO tools can also read and write databases. In this side by side example, you'll write your information to a database called              information.db. To get started, you'll need the SQLAlchemy package. To larn more most information technology, you lot can read the official ORM tutorial. Y'all'll also need the database driver. Python has a built-in commuter for SQLite.            
Y'all can install SQLAlchemy with pip:
You tin also install it with Conda:
                                                  $                  conda install sqlalchemy                                          Once y'all have SQLAlchemy installed, import              create_engine()              and create a database engine:
>>>
                                                  >>>                                    from                  sqlalchemy                  import                  create_engine                  >>>                                    engine                  =                  create_engine                  (                  'sqlite:///data.db'                  ,                  echo                  =                  Imitation                  )                                          Now that you have everything gear up upwardly, the next pace is to create a              DataFrame              object. It's convenient to specify the information types and apply              .to_sql().
>>>
                                                  >>>                                    dtypes                  =                  {                  'Pop'                  :                  'float64'                  ,                  'AREA'                  :                  'float64'                  ,                  'GDP'                  :                  'float64'                  ,                  ...                                    'IND_DAY'                  :                  'datetime64'                  }                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  .                  astype                  (                  dtype                  =                  dtypes                  )                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  Popular               float64                  AREA              float64                  Gdp               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                                                        .astype()              is a very convenient method y'all can apply to prepare multiple data types at once.
One time you've created your              DataFrame, you can save it to the database with              .to_sql():
>>>
                                                  >>>                                    df                  .                  to_sql                  (                  'information.db'                  ,                  con                  =                  engine                  ,                  index_label                  =                  'ID'                  )                                          The parameter              con              is used to specify the database connection or engine that you want to employ. The optional parameter              index_label              specifies how to call the database column with the row labels. You lot'll frequently encounter it take on the value              ID,              Id, or              id.
You should go the database              data.db              with a single table that looks like this:
            The get-go column contains the row labels. To omit writing them into the database, pass              alphabetize=False              to              .to_sql(). The other columns represent to the columns of the              DataFrame.
In that location are a few more optional parameters. For example, you can use              schema              to specify the database schema and              dtype              to make up one's mind the types of the database columns. You can also utilise              if_exists, which says what to do if a database with the same name and path already exists:
-                                   
if_exists='fail'raises a ValueError and is the default. -                                   
if_exists='replace'drops the tabular array and inserts new values. -                                   
if_exists='append'inserts new values into the table. 
You lot tin can load the data from the database with              read_sql():
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_sql                  (                  'information.db'                  ,                  con                  =                  engine                  ,                  index_col                  =                  'ID'                  )                  >>>                                    df                                      COUNTRY      Pop      AREA       Gdp       CONT    IND_DAY                  ID                  CHN       China  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.sixteen   3287.26   2575.67       Asia 1947-08-15                  United states          US   329.74   9833.52  19485.39  North.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  Southward.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75       None 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.eleven   3693.twenty     Europe        NaT                  FRA      French republic    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          United kingdom    66.44    242.50   2631.23     Europe        NaT                  ITA       Italia    sixty.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  N.America 1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16                                          The parameter              index_col              specifies the proper noun of the column with the row labels. Notation that this inserts an extra row after the header that starts with              ID. You can fix this behavior with the following line of code:
>>>
                                                  >>>                                    df                  .                  alphabetize                  .                  name                  =                  None                  >>>                                    df                                      State      POP      AREA       GDP       CONT    IND_DAY                  CHN       Red china  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.sixteen   3287.26   2575.67       Asia 1947-08-xv                  United states of america          US   329.74   9833.52  19485.39  N.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-xiv                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russian federation   146.79  17098.25   1530.75       None 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.xi   3693.20     Europe        NaT                  FRA      French republic    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          UK    66.44    242.50   2631.23     Europe        NaT                  ITA       Italian republic    60.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  Due south.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Northward.America 1867-07-01                  AUS   Australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    eighteen.53   2724.xc    159.41       Asia 1991-12-16                                          Now you accept the same              DataFrame              object every bit before.
Note that the continent for Russian federation is now              None              instead of              nan. If you want to fill the missing values with              nan, and so you can use              .fillna():
>>>
                                                  >>>                                    df                  .                  fillna                  (                  value                  =                  float                  (                  'nan'                  ),                  inplace                  =                  True                  )                                                        .fillna()              replaces all missing values with any you pass to              value. Hither, you passed              float('nan'), which says to fill all missing values with              nan.
Likewise annotation that you didn't have to pass              parse_dates=['IND_DAY']              to              read_sql(). That's because your database was able to observe that the last column contains dates. However, you tin pass              parse_dates              if you'd similar. Yous'll get the same results.
In that location are other functions that you can utilize to read databases, like              read_sql_table()              and              read_sql_query(). Feel costless to try them out!
Pickle Files
Pickling is the act of converting Python objects into byte streams. Unpickling is the changed process. Python pickle files are the binary files that keep the information and hierarchy of Python objects. They usually have the extension              .pickle              or              .pkl.
You can save your              DataFrame              in a pickle file with              .to_pickle():
>>>
                                                  >>>                                    dtypes                  =                  {                  'POP'                  :                  'float64'                  ,                  'Expanse'                  :                  'float64'                  ,                  'GDP'                  :                  'float64'                  ,                  ...                                    'IND_DAY'                  :                  'datetime64'                  }                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  data                  )                  .                  T                  .                  astype                  (                  dtype                  =                  dtypes                  )                  >>>                                    df                  .                  to_pickle                  (                  'data.pickle'                  )                                          Like you did with databases, it can be user-friendly first to specify the data types. Then, you create a file              data.pickle              to contain your data. Y'all could also laissez passer an integer value to the optional parameter              protocol, which specifies the protocol of the pickler.
Y'all can get the data from a pickle file with              read_pickle():
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_pickle                  (                  'information.pickle'                  )                  >>>                                    df                                      Country      POP      AREA       GDP       CONT    IND_DAY                  CHN       Prc  1398.72   9596.96  12234.78       Asia        NaT                  IND       Bharat  1351.16   3287.26   2575.67       Asia 1947-08-15                  USA          US   329.74   9833.52  19485.39  N.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  S.America 1822-09-07                  PAK    Pakistan   205.71    881.91    302.14       Asia 1947-08-fourteen                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  People's republic of bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-sixteen                  JPN       Japan   126.22    377.97   4872.42       Asia        NaT                  DEU     Germany    83.02    357.xi   3693.20     Europe        NaT                  FRA      France    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          UK    66.44    242.fifty   2631.23     Europe        NaT                  ITA       Italy    threescore.36    301.34   1943.84     Europe        NaT                  ARG   Argentine republic    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     Algeria    43.38   2381.74    167.56     Africa 1962-07-05                  CAN      Canada    37.59   9984.67   1647.12  Northward.America 1867-07-01                  AUS   Commonwealth of australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    18.53   2724.90    159.41       Asia 1991-12-16                                                        read_pickle()              returns the              DataFrame              with the stored data. You can too check the data types:
>>>
                                                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  POP               float64                  AREA              float64                  GDP               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                                          These are the aforementioned ones that you specified before using              .to_pickle().
As a word of caution, you should always beware of loading pickles from untrusted sources. This can be dangerous! When y'all unpickle an untrustworthy file, it could execute arbitrary lawmaking on your machine, gain remote access to your computer, or otherwise exploit your device in other ways.
Working With Big Data
If your files are also large for saving or processing, then there are several approaches you tin take to reduce the required disk space:
- Compress your files
 - Choose merely the columns you lot want
 - Omit the rows you don't demand
 - Force the use of less precise data types
 - Split the information into chunks
 
You'll take a look at each of these techniques in plough.
Compress and Decompress Files
You can create an archive file similar you would a regular one, with the addition of a suffix that corresponds to the desired compression type:
-                 
'.gz' -                 
'.bz2' -                 
'.zip' -                 
'.xz' 
Pandas can deduce the compression blazon by itself:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  information                  =                  data                  )                  .                  T                  >>>                                    df                  .                  to_csv                  (                  'data.csv.zip'                  )                                          Hither, you create a compressed              .csv              file as an archive. The size of the regular              .csv              file is 1048 bytes, while the compressed file merely has 766 bytes.
You can open up this compressed file as usual with the Pandas              read_csv()              office:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'information.csv.zip'                  ,                  index_col                  =                  0                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                                      COUNTRY      Popular      AREA       Gdp       CONT    IND_DAY                  CHN       People's republic of china  1398.72   9596.96  12234.78       Asia        NaT                  IND       India  1351.16   3287.26   2575.67       Asia 1947-08-15                  USA          US   329.74   9833.52  19485.39  North.America 1776-07-04                  IDN   Indonesia   268.07   1910.93   1015.54       Asia 1945-08-17                  BRA      Brazil   210.32   8515.77   2055.51  South.America 1822-09-07                  PAK    Islamic republic of pakistan   205.71    881.91    302.14       Asia 1947-08-14                  NGA     Nigeria   200.96    923.77    375.77     Africa 1960-10-01                  BGD  Bangladesh   167.09    147.57    245.63       Asia 1971-03-26                  RUS      Russia   146.79  17098.25   1530.75        NaN 1992-06-12                  MEX      Mexico   126.58   1964.38   1158.23  N.America 1810-09-16                  JPN       Nihon   126.22    377.97   4872.42       Asia        NaT                  DEU     Deutschland    83.02    357.11   3693.20     Europe        NaT                  FRA      France    67.02    640.68   2582.49     Europe 1789-07-14                  GBR          U.k.    66.44    242.50   2631.23     Europe        NaT                  ITA       Italian republic    60.36    301.34   1943.84     Europe        NaT                  ARG   Argentina    44.94   2780.40    637.49  S.America 1816-07-09                  DZA     People's democratic republic of algeria    43.38   2381.74    167.56     Africa 1962-07-05                  Tin      Canada    37.59   9984.67   1647.12  North.America 1867-07-01                  AUS   Commonwealth of australia    25.47   7692.02   1408.68    Oceania        NaT                  KAZ  Kazakhstan    eighteen.53   2724.90    159.41       Asia 1991-12-16                                                        read_csv()              decompresses the file earlier reading it into a              DataFrame.
You lot tin can specify the type of compression with the optional parameter              compression, which can take on whatever of the following values:
-                 
'infer' -                 
'gzip' -                 
'bz2' -                 
'null' -                 
'xz' -                 
None 
The default value              compression='infer'              indicates that Pandas should deduce the pinch type from the file extension.
Hither'south how you would shrink a pickle file:
>>>
                                                  >>>                                    df                  =                  pd                  .                  DataFrame                  (                  data                  =                  information                  )                  .                  T                  >>>                                    df                  .                  to_pickle                  (                  'data.pickle.shrink'                  ,                  compression                  =                  'gzip'                  )                                          Y'all should get the file              data.pickle.shrink              that you tin can later decompress and read:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_pickle                  (                  'data.pickle.compress'                  ,                  compression                  =                  'gzip'                  )                                                        df              again corresponds to the              DataFrame              with the same data as before.
You can give the other compression methods a try, as well. If you're using pickle files, and then keep in mind that the              .zip              format supports reading only.
Choose Columns
The Pandas              read_csv()              and              read_excel()              functions accept the optional parameter              usecols              that y'all can apply to specify the columns you want to load from the file. You can laissez passer the list of cavalcade names as the corresponding argument:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  usecols                  =                  [                  'COUNTRY'                  ,                  'AREA'                  ])                  >>>                                    df                                      Country      AREA                  0        Red china   9596.96                  1        India   3287.26                  2           Us   9833.52                  3    Indonesia   1910.93                  4       Brazil   8515.77                  5     Pakistan    881.91                  6      Nigeria    923.77                  7   Bangladesh    147.57                  8       Russia  17098.25                  9       Mexico   1964.38                  10       Nippon    377.97                  xi     Germany    357.11                  12      French republic    640.68                  13          United kingdom of great britain and northern ireland    242.l                  14       Italy    301.34                  15   Argentina   2780.xl                  16     Algeria   2381.74                  17      Canada   9984.67                  xviii   Australia   7692.02                  19  Kazakhstan   2724.90                                          Now you accept a              DataFrame              that contains less data than earlier. Hither, in that location are but the names of the countries and their areas.
Instead of the column names, you lot can also pass their indices:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  usecols                  =                  [                  0                  ,                  1                  ,                  three                  ])                  >>>                                    df                                      COUNTRY      Surface area                  CHN       China   9596.96                  IND       India   3287.26                  USA          US   9833.52                  IDN   Indonesia   1910.93                  BRA      Brazil   8515.77                  PAK    Pakistan    881.91                  NGA     Nigeria    923.77                  BGD  Bangladesh    147.57                  RUS      Russian federation  17098.25                  MEX      Mexico   1964.38                  JPN       Japan    377.97                  DEU     Germany    357.11                  FRA      France    640.68                  GBR          UK    242.50                  ITA       Italia    301.34                  ARG   Argentina   2780.twoscore                  DZA     Algeria   2381.74                  Tin can      Canada   9984.67                  AUS   Australia   7692.02                  KAZ  Republic of kazakhstan   2724.ninety                                          Expand the code block below to compare these results with the file              'data.csv':
                                ,State,POP,Expanse,Gross domestic product,CONT,IND_DAY CHN,Communist china,1398.72,9596.96,12234.78,Asia, IND,Bharat,1351.sixteen,3287.26,2575.67,Asia,1947-08-15 The states,US,329.74,9833.52,19485.39,N.America,1776-07-04 IDN,Republic of indonesia,268.07,1910.93,1015.54,Asia,1945-08-17 BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07 PAK,Islamic republic of pakistan,205.71,881.91,302.14,Asia,1947-08-14 NGA,Nigeria,200.96,923.77,375.77,Africa,1960-ten-01 BGD,Bangladesh,167.09,147.57,245.63,Asia,1971-03-26 RUS,Russia,146.79,17098.25,1530.75,,1992-06-12 MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-sixteen JPN,Japan,126.22,377.97,4872.42,Asia, DEU,Germany,83.02,357.xi,3693.2,Europe, FRA,France,67.02,640.68,2582.49,Europe,1789-07-14 GBR,UK,66.44,242.v,2631.23,Europe, ITA,Italy,60.36,301.34,1943.84,Europe, ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09 DZA,People's democratic republic of algeria,43.38,2381.74,167.56,Africa,1962-07-05 CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01 AUS,Australia,25.47,7692.02,1408.68,Oceania, KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,1991-12-sixteen                                          You can see the following columns:
- The column at                index                  
0contains the row labels. - The column at                alphabetize                  
icontains the country names. - The column at                index                  
3contains the areas. 
Simlarly,              read_sql()              has the optional parameter              columns              that takes a list of column names to read:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_sql                  (                  'data.db'                  ,                  con                  =                  engine                  ,                  index_col                  =                  'ID'                  ,                  ...                                    columns                  =                  [                  'Country'                  ,                  'AREA'                  ])                  >>>                                    df                  .                  alphabetize                  .                  name                  =                  None                  >>>                                    df                                      Land      AREA                  CHN       China   9596.96                  IND       Bharat   3287.26                  United states          United states   9833.52                  IDN   Indonesia   1910.93                  BRA      Brazil   8515.77                  PAK    Pakistan    881.91                  NGA     Nigeria    923.77                  BGD  People's republic of bangladesh    147.57                  RUS      Russian federation  17098.25                  MEX      United mexican states   1964.38                  JPN       Japan    377.97                  DEU     Germany    357.11                  FRA      France    640.68                  GBR          UK    242.l                  ITA       Italy    301.34                  ARG   Argentine republic   2780.40                  DZA     People's democratic republic of algeria   2381.74                  CAN      Canada   9984.67                  AUS   Commonwealth of australia   7692.02                  KAZ  Republic of kazakhstan   2724.90                                          Once again, the              DataFrame              merely contains the columns with the names of the countries and areas. If              columns              is              None              or omitted, and then all of the columns will exist read, as you saw before. The default behavior is              columns=None.
Omit Rows
When you test an algorithm for data processing or machine learning, you frequently don't need the unabridged dataset. It's convenient to load only a subset of the information to speed up the procedure. The Pandas              read_csv()              and              read_excel()              functions have some optional parameters that allow you to select which rows you want to load:
-                                   
skiprows: either the number of rows to skip at the offset of the file if it's an integer, or the zero-based indices of the rows to skip if information technology'due south a list-like object -                                   
skipfooter: the number of rows to skip at the end of the file -                                   
nrows: the number of rows to read 
Hither's how you would skip rows with odd zero-based indices, keeping the fifty-fifty ones:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  skiprows                  =                  range                  (                  1                  ,                  xx                  ,                  2                  ))                  >>>                                    df                                      COUNTRY      POP     AREA      GDP       CONT     IND_DAY                  IND       Bharat  1351.sixteen  3287.26  2575.67       Asia  1947-08-15                  IDN   Indonesia   268.07  1910.93  1015.54       Asia  1945-08-17                  PAK    Islamic republic of pakistan   205.71   881.91   302.14       Asia  1947-08-fourteen                  BGD  People's republic of bangladesh   167.09   147.57   245.63       Asia  1971-03-26                  MEX      Mexico   126.58  1964.38  1158.23  N.America  1810-09-16                  DEU     Frg    83.02   357.11  3693.20     Europe         NaN                  GBR          United kingdom of great britain and northern ireland    66.44   242.l  2631.23     Europe         NaN                  ARG   Argentina    44.94  2780.40   637.49  Southward.America  1816-07-09                  Tin can      Canada    37.59  9984.67  1647.12  North.America  1867-07-01                  KAZ  Kazakhstan    18.53  2724.90   159.41       Asia  1991-12-xvi                                          In this example,              skiprows              is              range(ane, 20, 2)              and corresponds to the values              ane,              3, …,              19. The instances of the Python built-in class              range              deport like sequences. The get-go row of the file              information.csv              is the header row. It has the index              0, so Pandas loads information technology in. The second row with index              1              corresponds to the label              CHN, and Pandas skips it. The third row with the index              two              and label              IND              is loaded, so on.
If y'all desire to choose rows randomly, and so              skiprows              can exist a listing or NumPy array with pseudo-random numbers, obtained either with pure Python or with NumPy.
Force Less Precise Data Types
If you're okay with less precise data types, then yous tin potentially save a significant corporeality of memory! First, become the data types with              .dtypes              over again:
>>>
                                                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  parse_dates                  =                  [                  'IND_DAY'                  ])                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  Popular               float64                  Expanse              float64                  GDP               float64                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                                          The columns with the floating-point numbers are 64-flake floats. Each number of this type              float64              consumes 64 bits or viii bytes. Each column has 20 numbers and requires 160 bytes. You tin can verify this with              .memory_usage():
>>>
                                                  >>>                                    df                  .                  memory_usage                  ()                  Index      160                  COUNTRY    160                  Pop        160                  Area       160                  Gdp        160                  CONT       160                  IND_DAY    160                  dtype: int64                                                        .memory_usage()              returns an case of              Series              with the retentivity usage of each column in bytes. You tin can conveniently combine it with              .loc[]              and              .sum()              to get the retentivity for a group of columns:
>>>
                                                  >>>                                    df                  .                  loc                  [:,                  [                  'Pop'                  ,                  'Surface area'                  ,                  'GDP'                  ]]                  .                  memory_usage                  (                  index                  =                  Fake                  )                  .                  sum                  ()                  480                                          This case shows how you tin combine the numeric columns              'POP',              'Expanse', and              'Gdp'              to get their total memory requirement. The argument              index=False              excludes data for row labels from the resulting              Series              object. For these 3 columns, you'll demand 480 bytes.
Y'all can also extract the data values in the form of a NumPy array with              .to_numpy()              or              .values. Then, use the              .nbytes              attribute to get the full bytes consumed by the items of the array:
>>>
                                                  >>>                                    df                  .                  loc                  [:,                  [                  'Pop'                  ,                  'Surface area'                  ,                  'GDP'                  ]]                  .                  to_numpy                  ()                  .                  nbytes                  480                                          The issue is the same 480 bytes. So, how do you lot salve memory?
In this case, you tin can specify that your numeric columns              'POP',              'Area', and              'Gross domestic product'              should have the type              float32. Utilise the optional parameter              dtype              to do this:
>>>
                                                  >>>                                    dtypes                  =                  {                  'Popular'                  :                  'float32'                  ,                  'Expanse'                  :                  'float32'                  ,                  'Gdp'                  :                  'float32'                  }                  >>>                                    df                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  dtype                  =                  dtypes                  ,                  ...                                    parse_dates                  =                  [                  'IND_DAY'                  ])                                          The lexicon              dtypes              specifies the desired data types for each column. It's passed to the Pandas              read_csv()              function as the argument that corresponds to the parameter              dtype.
At present yous can verify that each numeric column needs 80 bytes, or 4 bytes per item:
>>>
                                                  >>>                                    df                  .                  dtypes                  COUNTRY            object                  POP               float32                  AREA              float32                  GDP               float32                  CONT               object                  IND_DAY    datetime64[ns]                  dtype: object                  >>>                                    df                  .                  memory_usage                  ()                  Index      160                  Land    160                  Popular         80                  Surface area        fourscore                  GDP         80                  CONT       160                  IND_DAY    160                  dtype: int64                  >>>                                    df                  .                  loc                  [:,                  [                  'POP'                  ,                  'AREA'                  ,                  'Gross domestic product'                  ]]                  .                  memory_usage                  (                  index                  =                  Fake                  )                  .                  sum                  ()                  240                  >>>                                    df                  .                  loc                  [:,                  [                  'POP'                  ,                  'AREA'                  ,                  'Gross domestic product'                  ]]                  .                  to_numpy                  ()                  .                  nbytes                  240                                          Each value is a floating-signal number of 32 bits or 4 bytes. The three numeric columns contain xx items each. In total, yous'll demand 240 bytes of memory when you work with the blazon              float32. This is one-half the size of the 480 bytes you'd need to work with              float64.
In addition to saving retentiveness, you can significantly reduce the fourth dimension required to process data by using              float32              instead of              float64              in some cases.
Utilize Chunks to Iterate Through Files
Another fashion to deal with very large datasets is to split the information into smaller              chunks              and process one chunk at a time. If yous use              read_csv(),              read_json()              or              read_sql(), then you can specify the optional parameter              chunksize:
>>>
                                                  >>>                                    data_chunk                  =                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  chunksize                  =                  8                  )                  >>>                                    type                  (                  data_chunk                  )                  <class 'pandas.io.parsers.TextFileReader'>                  >>>                                    hasattr                  (                  data_chunk                  ,                  '__iter__'                  )                  True                  >>>                                    hasattr                  (                  data_chunk                  ,                  '__next__'                  )                  True                                                        chunksize              defaults to              None              and can have on an integer value that indicates the number of items in a single clamper. When              chunksize              is an integer,              read_csv()              returns an iterable that you can use in a              for              loop to get and process only a fragment of the dataset in each iteration:
>>>
                                                  >>>                                    for                  df_chunk                  in                  pd                  .                  read_csv                  (                  'data.csv'                  ,                  index_col                  =                  0                  ,                  chunksize                  =                  8                  ):                  ...                                    print                  (                  df_chunk                  ,                  terminate                  =                  '                  \n\n                  '                  )                  ...                                    print                  (                  'retentiveness:'                  ,                  df_chunk                  .                  memory_usage                  ()                  .                  sum                  (),                  'bytes'                  ,                  ...                                    end                  =                  '                  \n\northward\n                  '                  )                  ...                                      COUNTRY      Popular     AREA       GDP       CONT     IND_DAY                  CHN       China  1398.72  9596.96  12234.78       Asia         NaN                  IND       Bharat  1351.16  3287.26   2575.67       Asia  1947-08-15                  U.s.a.          U.s.   329.74  9833.52  19485.39  N.America  1776-07-04                  IDN   Indonesia   268.07  1910.93   1015.54       Asia  1945-08-17                  BRA      Brazil   210.32  8515.77   2055.51  S.America  1822-09-07                  PAK    Pakistan   205.71   881.91    302.14       Asia  1947-08-14                  NGA     Nigeria   200.96   923.77    375.77     Africa  1960-10-01                  BGD  Bangladesh   167.09   147.57    245.63       Asia  1971-03-26                  retentivity: 448 bytes                                      State     Pop      Area      Gross domestic product       CONT     IND_DAY                  RUS     Russia  146.79  17098.25  1530.75        NaN  1992-06-12                  MEX     Mexico  126.58   1964.38  1158.23  Due north.America  1810-09-16                  JPN      Japan  126.22    377.97  4872.42       Asia         NaN                  DEU    Germany   83.02    357.xi  3693.xx     Europe         NaN                  FRA     France   67.02    640.68  2582.49     Europe  1789-07-14                  GBR         UK   66.44    242.fifty  2631.23     Europe         NaN                  ITA      Italy   60.36    301.34  1943.84     Europe         NaN                  ARG  Argentina   44.94   2780.40   637.49  S.America  1816-07-09                  memory: 448 bytes                                      COUNTRY    POP     Area      GDP       CONT     IND_DAY                  DZA     People's democratic republic of algeria  43.38  2381.74   167.56     Africa  1962-07-05                  CAN      Canada  37.59  9984.67  1647.12  N.America  1867-07-01                  AUS   Australia  25.47  7692.02  1408.68    Oceania         NaN                  KAZ  Kazakhstan  xviii.53  2724.ninety   159.41       Asia  1991-12-16                  memory: 224 bytes                                          In this example, the              chunksize              is              8. The starting time iteration of the              for              loop returns a              DataFrame              with the showtime eight rows of the dataset just. The 2d iteration returns some other              DataFrame              with the adjacent eight rows. The third and last iteration returns the remaining four rows.
In each iteration, you get and process the              DataFrame              with the number of rows equal to              chunksize. It'southward possible to accept fewer rows than the value of              chunksize              in the last iteration. You can employ this functionality to control the amount of retentiveness required to process data and proceed that amount reasonably small.
Conclusion
You now know how to            save            the data and labels from Pandas            DataFrame            objects to unlike kinds of files. You too know how to            load            your information from files and create            DataFrame            objects.
You've used the Pandas            read_csv()            and            .to_csv()            methods to read and write CSV files. You also used similar methods to read and write Excel, JSON, HTML, SQL, and pickle files. These functions are very convenient and widely used. They permit you to save or load your information in a single function or method call.
Y'all've also learned how to relieve fourth dimension, retention, and disk infinite when working with large information files:
- Compress or decompress files
 - Choose the rows and columns you want to load
 - Use less precise data types
 - Separate data into chunks and procedure them 1 by i
 
You've mastered a pregnant step in the machine learning and data scientific discipline process! If you have whatsoever questions or comments, then please put them in the comments section beneath.
Lookout man Now This tutorial has a related video form created past the Real Python squad. Watch information technology together with the written tutorial to deepen your agreement: Reading and Writing Files With Pandas
johnsoncoughts1965.blogspot.com
Source: https://realpython.com/pandas-read-write-files/
0 Response to "How to Read .data File in Pandas"
Post a Comment