Skip to main content

Data Binding

Gcluster_11Python Servercluster_thread_2PerspectiveManager - Thread 2cluster_thread_1PerspectiveManager - Thread 1cluster_browserBrowsercluster_2WebWorker 1cluster_webworker2WebWorker 2cluster_41<html>table_thread_2table(df)view_thread_2view({    group_by: ["State"]]})table_thread_2->view_thread_2view_thread_2_2view({    group_by: ["City"]]})table_thread_2->view_thread_2_2table12table(...)view_thread_2->table12table_thread_1table(arrow)view_thread_1view({    group_by:'["State"]    split_by:'["Segment"]'})table_thread_1->view_thread_1viewer4<perspective-viewer    view="heatmap"    row-pivots='["State"]    column-pivots='["Segment"]'>view_thread_1->viewer4table1table(csv)view1view({    group_by: ["Category"]    filter: [["State","==","Texas"]]})table1->view1view2view({    group_by: ["Sub-Category"]})table1->view2table_remote_viewtable(json)view3view()table_remote_view->view3viewer1<perspective-viewer    view="Y Bar"    row-pivots='["Category"]'    filters='[["State","==","Texas"]]'>view1->viewer1viewer2<perspective-viewer    view="xy_scatter"    row-pivots='["Sub-Category"]'>view2->viewer2viewer3<perspective-viewer    view="grid">view3->viewer3view12view({    group_by: ["Category"]    filter: [["State","==","Texas"]]})table12->view12viewer5<perspective-viewer    view="heatmap"    row-pivots='["State"]    column-pivots='["Segment"]'>view12->viewer5

Application developers can choose from Client (WebAssembly), Server (Python/Node) or Client/Server Replicated designs to bind data, and a web application can use one or a mix of these designs as needed. By serializing to Apache Arrow, tables are duplicated and synchronized across runtimes efficiently.

Client-only

Gcluster_browserBrowsercluster_2WebWorker 1cluster_41<html>table1table(csv)view2view({    group_by: ["Sub-Category"]})table1->view2viewer2<perspective-viewer    view="xy_scatter"    row-pivots='["Sub-Category"]'>view2->viewer2

For static datasets, datasets provided by the user, and simple server-less and read-only web applications.

In this design, Perspective is run as a client Browser WebAssembly library, the dataset is downloaded entirely to the client and all calculations and UI interactions are performed locally. Interactive performance is very good, using WebAssembly engine for near-native runtime plus WebWorker isolation for parallel rendering within the browser. Operations like scrolling and creating new views are responsive. However, the entire dataset must be downloaded to the client. Perspective is not a typical browser component, and datset sizes of 1gb+ in Apache Arrow format will load fine with good interactive performance!

Horizontal scaling is a non-issue, since here is no concurrent state to scale, and only uses client-side computation via WebAssembly client. Client-only perspective can support as many concurrent users as can download the web application itself. Once the data is loaded, no server connection is needed and all operations occur in the client browser, imparting no additional runtime cost on the server beyond initial load. This also means updates and edits are local to the browser client and will be lost when the page is refreshed, unless otherwise persisted by your application.

As the client-only design starts with creating a client-side Perspective Table, data can be provided by any standard web service in any Perspective compatible format (JSON, CSV or Apache Arrow).

Javascript client

const worker = perspective.worker();
const table = await worker.table(csv);

const viewer = document.createElement("perspective-viewer");
document.body.appendChild(viewer);
viewer.load(table);

Client/Server Replicated

Gcluster_11Python Servercluster_thread_2PerspectiveManager - Thread 2cluster_browserBrowsercluster_webworker2WebWorker 2cluster_41<html>table_thread_2table(df)view_thread_2view({    group_by: ["State"]]})table_thread_2->view_thread_2table12table(...)view_thread_2->table12view12view({    group_by: ["Category"]    filter: [["State","==","Texas"]]})table12->view12viewer5<perspective-viewer    view="heatmap"    row-pivots='["State"]    column-pivots='["Segment"]'>view12->viewer5

For medium-sized, real-time, synchronized and/or editable data sets with many concurrent users.

The dataset is instantiated in-memory with a Python or Node.js Perspective server, and web applications create duplicates of these tables in a local WebAssembly client in the browser, synchonized efficiently to the server via Apache Arrow. This design scales well with additional concurrent users, as browsers only need to download the initial data set and subsequent update deltas, while operations like scrolling, pivots, sorting, etc. are performed on the client.

Python servers can make especially good use of additional threads, as Perspective will release the GIL for almost all operations. Interactive performance on the client is very good and identical to client-only architecture. Updates and edits are seamlessly synchonized across clients via their virtual server counterparts using websockets and Apache Arrow.

Python and Tornado server

from perspective import Table, PerspectiveManager, PerspectiveTornadoHandler

table = Table(csv)
manager = PerspectiveManager()
manager.host("my_table", table)
routes = [(
r"/websocket",
PerspectiveTornadoHandler,
{"manager": manager, "check_origin": True},
)]

app = tornado.web.Application(routes)
app.listen(8080)
loop = tornado.ioloop.IOLoop.current()
loop.start()

Javascript client

Perspective's websocket client interfaces with the Python server. then replicates the server-side Table.

const websocket = perspective.websocket("ws://localhost:8080");
const server_table = websocket.open("my_table");
const server_view = await server_table.view();

const worker = perspective.worker();
const client_table = await worker.table(server_view);

const viewer = document.createElement("perspective-viewer");
document.body.appendChild(viewer);
viewer.load(client_table);

Server-only

Gcluster_11Python Servercluster_thread_1PerspectiveManager - Thread 1cluster_browserBrowsercluster_41<html>table_thread_1table(arrow)view_thread_1view({    group_by:'["State"]    split_by:'["Segment"]'})table_thread_1->view_thread_1viewer4<perspective-viewer    view="heatmap"    row-pivots='["State"]    column-pivots='["Segment"]'>view_thread_1->viewer4

For extremely large datasets with a small number of concurrent users.

The dataset is instantiated in-memory with a Python or Node.js server, and web applications connect virtually. Has very good initial load performance, since no data is downloaded. Group-by and other operations will run column-parallel if configured.

But interactive performance is poor, as every user interaction must page the server to render. Operations like scrolling are not as responsive and can be impacted by network latency. Web applications must be "always connected" to the server via WebSocket. Disconnecting will prevent any interaction, scrolling, etc. of the UI. Does not use WebAssembly.

Each connected browser will impact server performance as long as the connection is open, which in turn impacts interactive performance of every client. This ultimately limits the horizontal scalabity of this architecture. Since each client reads the perspective Table virtually, changes like edits and updates are automatically reflected to all clients and persist across browser refresh. Using the same Python server as the previous design, we can simply skip the intermediate WebAssembly Table and pass the virtual table directly to load()

const websocket = perspective.websocket("ws://localhost:8080");
const server_table = websocket.open("my_table");

const viewer = document.createElement("perspective-viewer");
document.body.appendChild(viewer);
viewer.load(server_table);