Loading data into a Table

A Table can be created from a dataset or a schema, the specifics of which are discussed in the JavaScript section of the user's guide. In Python, however, Perspective supports additional data types that are commonly used when processing data:

  • pandas.DataFrame
  • polars.DataFrame
  • bytes (encoding an Apache Arrow)
  • objects (either extracting a repr or via reference)
  • str (encoding as a CSV)

A Table is created in a similar fashion to its JavaScript equivalent:

from datetime import date, datetime
import numpy as np
import pandas as pd
import perspective

data = pd.DataFrame({
    "int": np.arange(100),
    "float": [i * 1.5 for i in range(100)],
    "bool": [True for i in range(100)],
    "date": [date.today() for i in range(100)],
    "datetime": [datetime.now() for i in range(100)],
    "string": [str(i) for i in range(100)]
})

table = perspective.table(data, index="float")

Likewise, a View can be created via the view() method:

view = table.view(group_by=["float"], filter=[["bool", "==", True]])
column_data = view.to_columns()
row_data = view.to_json()

Polars Support

Polars DataFrame types work similarly to Apache Arrow input, which Perspective uses to interface with Polars.

df = polars.DataFrame({"a": [1,2,3,4,5]})
table = perspective.table(df)

Pandas Support

Perspective's Table can be constructed from pandas.DataFrame objects. Internally, this just uses pyarrow::from_pandas, which dictates behavior of this feature including type support.

If the dataframe does not have an index set, an integer-typed column named "index" is created. If you want to preserve the indexing behavior of the dataframe passed into Perspective, simply create the Table with index="index" as a keyword argument. This tells Perspective to once again treat the index as a primary key:

data.set_index("datetime")
table = perspective.table(data, index="index")

Time Zone Handling

When parsing "datetime" strings, times are assumed local time unless an explicit timezone offset is parsed. All "datetime" columns (regardless of input time zone) are output to the user as datetime.datetime objects in local time according to the Python runtime.

This behavior is consistent with Perspective's behavior in JavaScript. For more details, see this in-depth explanation of perspective-python semantics around time zone handling.