SVGPlot
Introduction
SVGPlot will be the combination of several things:
- JavaScript API for creating plots/graphs/charts from data (JSON, XML, CSV, SQL query, HTML <table> element)
- XML file format (PlotXML) that is built up by the JavaScript, but can be modified by hand (to edit plot parameters) and re-opened.
- Online web service to graphically create and manipulate plots and save your work as JavaScript, PlotXML, pure SVG, PDF, PNG, JPEG, etc.
- Plot interactivity so your plots can be manipulated by end users -- zoom in, drill down, get values of points.
The goal is to make all of these as clean as possible:
- Simple things are easy -- good defaults and intelligent assumptions
- It's general enough to allow any complicated plot
- Advanced things are only as complex as they need to be -- no need to always drop back to low-level graphics commands.
I set out to create a plotting program that:
- Allows server-side command-line rendering through Batik, a Java program with an embedded JavaScript interpreter that handles SVG.
- Has the good features of Matlab, Mathematica, GNUPlot, Ploticus, Super Mongo, Asymptote. Eventually I'd like to write bridges so people's existing scripts and familiarity can be used to generate interactive web content.
- Can reproduce any plot in journals like Science and Nature as straightforwardly as possible. These tend to convey large amounts of data in a small space.
- Can reproduce any plot in Physics and Math textbooks as straightforwardly as possible. These tend to be more instructive with thick lines, simple axes, and lots of explanation arrows.
- Can reproduce all of the eye catching business graphics of other online charting tools, but this is not the initial focus.
- Has client-side features for panning, zooming, and exploring the data.
- Allows raw data to optionally be published inside the PlotXML file along with the "suggested way" of viewing it, while the GUI lets the viewer painlessly explore the data (e.g. change the scale to log).
- Can produce maps, coordinate projections, and GIS data (TODO)
Live Demos
- live demo - Table of tests that you can modify and re-run client side.
- csv demo - Type or paste in CSV from a spread sheet and plot it.
Browse Code in SVN Repository
Design/Philosophy
Looking around, I found most of the existing plotting APIs lacking and decided to start from scratch.
The idea is to plot data sets. You can think of data sets as an area in a spreadsheet or a table in a database. Each piece of data (row) has values for attributes (columns). Plotting is the act of mapping one or more of these attributes to one or more plot characteristics: x-position, y-position, color, shape, size, orientation, panel-number.
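As a rough illustration of this idea, here is a minimal sketch of mapping attributes to plot characteristics. Everything in it (the `moonData` rows, the `mapping` object, and the `applyMapping` helper) is hypothetical and invented for this example; it is not the SVGPlot API.

```javascript
// Hypothetical sketch: none of these names come from the SVGPlot API.
// A data set is a list of rows; each row has a value for every attribute.
const moonData = [
  { day: 1, brightness: 0.1, phase: 'new' },
  { day: 8, brightness: 0.5, phase: 'first quarter' },
  { day: 15, brightness: 1.0, phase: 'full' },
];

// The mapping says which attribute drives which plot characteristic.
const mapping = { x: 'day', y: 'brightness', shape: 'phase' };

// Turn each row into a "mark" whose keys are plot characteristics.
function applyMapping(rows, mapping) {
  return rows.map((row) => {
    const mark = {};
    for (const [characteristic, attribute] of Object.entries(mapping)) {
      mark[characteristic] = row[attribute];
    }
    return mark;
  });
}

const marks = applyMapping(moonData, mapping);
```

Each resulting mark carries only plot characteristics (`x`, `y`, `shape`), so a renderer would not need to know the original column names.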
<plot where the phase of the moon is encoded in little pictures of what the moon would look like in that phase>
The simplest and most common case is plotting one variable against another.
There are two main categories of data, which affect how they are plotted and how ticks and labels are done:
- Continuous -- numbers (reals and integers), dates, times, money
- Discrete -- categories, strings, names, days of the week, some dates and numbers which you don't want relatively positioned. There may or may not be a natural ordering.
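One way to picture the distinction is a default guess made per column, which the user could always override (for example, to treat some numbers as discrete). The helpers below, `classifyColumn` and `categoryPositions`, are assumptions made up for this sketch and are not part of SVGPlot.

```javascript
// Hypothetical helpers, not part of SVGPlot.
// Guess how a column should be treated: finite numbers and Date objects
// are continuous, everything else is discrete.
function classifyColumn(values) {
  const allContinuous = values.every(
    (v) => (typeof v === 'number' && Number.isFinite(v)) || v instanceof Date
  );
  return allContinuous ? 'continuous' : 'discrete';
}

// For a discrete column with no natural ordering, place each category
// at an integer slot in order of first appearance.
function categoryPositions(values) {
  const positions = new Map();
  for (const v of values) {
    if (!positions.has(v)) positions.set(v, positions.size);
  }
  return positions;
}
```

A continuous column would then get numeric ticks, while a discrete column would get one tick and label per category slot.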
A histogram is a higher-level construct. Here the data isn't directly mapped to the plot, but something is first calculated (how many data points fit into each category) and then the category versus counts are plotted.
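The calculate-then-plot step can be sketched as follows. This is only an illustration with equal-width bins; the `histogram` function and its parameters are hypothetical names, not an existing SVGPlot call.

```javascript
// Hypothetical sketch: count how many values fall into each equal-width bin.
function histogram(values, min, max, binCount) {
  const width = (max - min) / binCount;
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // Skip NaN and anything outside [min, max].
    if (!(v >= min && v <= max)) continue;
    // The maximum value belongs to the last bin rather than a bin of its own.
    const index = Math.min(Math.floor((v - min) / width), binCount - 1);
    counts[index] += 1;
  }
  // Pair each bin's lower edge with its count: these pairs are what gets plotted.
  return counts.map((count, i) => ({ binStart: min + i * width, count }));
}

const bins = histogram([1, 2, 2, 3, 7, 9, 10], 0, 10, 5);
```

The returned list of bin/count pairs is itself an ordinary data set, so it could be handed to the same mapping machinery as any other data.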
Plot Features to Support
- Auto-scaling axes based on data range. Option to definitely include zero or a list of important "must haves." Option to include a little extra padding (go to the nearest integer)
- Tick marks on the axes, explicitly given or auto generated to be reasonably spaced.
- Grid - lines, stripes, checkerboard
- Labels along the axes, explicit or auto
- Multiple axes referring to the same plot area -- either dependent axes (like Celsius and Fahrenheit) or independent axes for plotting two different types of variables in the same area (usually with a common horizontal axis like population and GDP over time)
- Axis titles, plot titles
- Legends
- Multiple plot panels either independent or dependent (sharing axes which must auto-scale together despite different data.)
- Annotations
- labels on each dataset, which is often much easier to read than a legend.
- arrows with or without labels
- span indicators of various kinds:
|---- x ----|  or  ---> <--- x
- SVG graphics and effects on top of plot, but whose position can be given in terms of plot coordinates.
- Polar Coordinates (and other coordinate systems like hyperbolic and triangular). This is tricky because if you want to plot a polar graph and a cartesian graph on top of each other, you'd like them to share the same cartesian coordinate system and possibly auto-scale based on both the cartesian and polar data.
- Polar Coordinates - adjust scale to fit data (possibly moving 0,0 off center) or make sure 0,0 is in the center and things are auto-ranged only to the biggest radius?
- Logarithmic, arctan, and other kinds of scales. (Should this be a property of the plot canvas or a property of the mapping of the dataset onto the plot canvas?)
- Pie Charts
- Area Charts
- Bar Charts, both horizontal and vertical of many styles and information (absolute, percent of whole, etc)
- Double bar charts where one category is horizontal and within that there are several bars for a second category.
- Gaps in data which break up lines, areas, stair steps etc. These are either unplottable y-values (like infinity or NaN floating point), or a break in the regular progression of x-data. Option to auto-detect either kind.
- Histogram (for ranges) and Count (for repeated x-values.)
- Accumulation: accumulate(data) returns for each position the sum of that data point and the previous data points. accumulate([1,2,1]) -> [1,3,4]
- Dates, times, and Date-time stamps. Relative times (durations expressed in seconds, minutes, days, etc.)
- European decimal notation.
- Financial notation. ($2,000), ($1,000), $0, $1000, $2000 and with other currencies.
- Sort rows by one or more attributes, e.g. last name, then first name.
- Filter data on a particular set of attributes so, for example, only a window is plotted (otherwise all points will be plotted and go beyond the range). Iterators/generators would be great for this to save time and memory.
- Clipping applied to the data area.
- line plot has a "firstpoint" option to add an additional point, usually at (0,0). Alternatively plot( prepend(data, data_point) )
- Legend auto (how to get the names), add manually, specify symbol and/or line to be shown next to name
- values: write the value for each point. How to specify style & relative position? How to add extra space or avoid curve?
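A few of the data-handling features in this list are small enough to sketch. The block below shows one possible take on auto-scaling with "must have" values and reasonably spaced ticks, on accumulation, and on breaking a line at unplottable y-values. The names `niceStep`, `autoScale`, and `splitAtGaps` are assumptions for illustration (only `accumulate` is named in the list), and the 1/2/5 step rule is a common convention rather than a decided SVGPlot behavior. `autoScale` assumes at least one finite value is supplied.

```javascript
// Hypothetical sketches; not the SVGPlot implementation.

// Round a positive step up to the nearest 1, 2, 5, or 10 times a power of ten.
function niceStep(rawStep) {
  const power = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const fraction = rawStep / power;
  if (fraction <= 1) return power;
  if (fraction <= 2) return 2 * power;
  if (fraction <= 5) return 5 * power;
  return 10 * power;
}

// Auto-scale an axis: widen the data range to multiples of a nice step and
// list the tick positions. `mustInclude` holds values (such as 0) that the
// range has to cover.
function autoScale(values, targetTickCount = 5, mustInclude = []) {
  const finite = values.concat(mustInclude).filter((v) => Number.isFinite(v));
  const lo = Math.min(...finite);
  const hi = Math.max(...finite);
  const step = niceStep((hi - lo) / targetTickCount || 1);
  const min = Math.floor(lo / step) * step;
  const max = Math.ceil(hi / step) * step;
  const ticks = [];
  // The half-step tolerance guards against floating-point drift at the top end.
  for (let i = 0; min + i * step <= max + step / 2; i++) {
    ticks.push(min + i * step);
  }
  return { min, max, step, ticks };
}

// Running sum: each position holds the sum of that point and all previous ones.
function accumulate(data) {
  let total = 0;
  return data.map((value) => (total += value));
}

// Break a series into separate segments wherever y is NaN or infinite,
// so a line plot can leave a gap there instead of connecting across it.
function splitAtGaps(points) {
  const segments = [];
  let current = [];
  for (const point of points) {
    if (Number.isFinite(point.y)) {
      current.push(point);
    } else if (current.length > 0) {
      segments.push(current);
      current = [];
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}
```

For example, `autoScale([3, 47, 12, 88])` widens the range to 0..100 with ticks every 20, and `accumulate([1, 2, 1])` gives `[1, 3, 4]` as described above.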
Hand Crafted SVGs to guide Development
Inspiration/Other Programs
Also see: jsxgraph SVG, Wikipedia's List of information graphics software
Data Plotting Ploticus / GNUPlot / Matplotlib / PyX / Chaco / GNU PlotUtils / Super Mongo / Igor Pro Gallery / Origin (Comm) / Matlab Tutorial / Dplot Gallery (Win Trial) / Graphis / LabPlot KDE / JFreeChart / Grace for X / Create A Graph Website / Octave (OSS Matlab) / PlotKit / .netCharting / pgplot / swivel.com / ChartFX / JpGraph / SciDavis / QtiPlot / Stolte & Hanrahan's Polaris /
Function Plotting Asymptote / NuCalc (old Mac) / Grapher (replacement) / GrafEQ Interval / DLSin / MTAC Java / Java Online / GraphCalc OSS Win / GraphPlotter Win / Fornux Win / Math 3D Explorer / Virtual Labs / PyX /
Complex and Vector Plotting Visual Complex Analysis Websites / Draw Rect Vector Flow / Vector Field Analizer (Applet) / Complex Mapping Viewer / Complex Functions /
Web Charting Flex Charting & Dashboard Example / PlotKit / Dojo / GlobFX /
Visualization VTK / TVTK / MayaVi & MayaVi2 / ParaView / VisIT / OpenDX (IBM's Viz Data Explorer) / SCIRun / NAG's Iris Explorer / OSS Package / Gallery of Viz / vpython / Gapminder / JunkCharts /
Data Analysis Tableau / Root / Java Analysis Studio (JAS3) / HippoDraw / AIDA: Abstract Interfaces for Data Analysis /
Computer Algebra Systems Maxima / Yacas / Euler / Eclipse / Sage /
Design Issues/Tradeoffs
- Densely pack information in the available space, minimizing margins and extraneous "ink"? This is good for web viewing. Or should it arrange things more leisurely for optimal print viewing?
- Pixel align vertical and horizontal lines to avoid annoying anti-aliasing effects on screen or position plotted object "properly" for accurate printing? (see equally_spaced_lines vs rounded_lines)
- When the SVG DOM is modified, those changes are reflected in the resulting image. I don't think there's a way to "capture" possible modifications on the PlotXML DOM tree and to change the plot (or other calculated information) accordingly in real time. This means there really needs to be a duplication between the API function calls and the XML representation. Put another way, changing the DOM can't have side effects like calling a function, so to actually re-draw stuff, we need getter and setter functions.
- Procedural model (plot function) versus data model (XML/JSON rep).
- This issue is like Canvas vs SVG.
- The procedural model makes it easier to create variables based on the data and use them in other parts without a lot of xrefs. (How would this even be done in JSON?)
- The data model makes it easier to change parameters with a UI, add data, and re-render.
- Ways to pass graph properties:
- Stack-based state method like Canvas
strokeStyle = 'red'; plot(func, ['x', 0, 10])
- Explicitly with each function like Mathematica:
plot(func, ['x', 0, 10], {'strokeStyle': 'red'})
- A Combo: passing in a dictionary as the last parameter which overrides current state
- Create objects and set properties with setter functions explicitly like vtk
- Properties are nice for stack-based, but bad for object-based unless you're in Python where you can capture the setting, or you're willing to register callbacks that check if the state is different than it was when it was drawn. Periodic updates aren't so bad. Mozilla's native SVG element['width']=10 does it this way.
- Defaults are hard to deal with. Should axes, ticks, and labels be turned on automatically? Sure. But then there's a difference between changing the existing ones and adding new ones. When you change a parameter, does it affect the existing axes or just the drawing of new axes? What if you don't want all of the default ticks and stuff -- do you have to delete them all explicitly?
- Input data:
- Table (2D array) with column headings (most efficient). First row may be heading. Spreadsheet-like.
[ ['x', 'y', 'a'], [7, 3, 6], [4, 2, 9], ...]
- Dictionary with variable names as keys and data as arrays.
{ x:[7, 4, ...], y:[3, 2, ...], a:[...], ... }
- List of objects with uniform attributes. This is what SQLObject returns. Inefficient, but easily passed to a drawing function.
[ {x:7, y:3, a:6}, {x:4, y:2, a:9}, ... ]
- What about truly 2D or 3D data like an image or height map? Would like to read in PNG/JPEG, etc. Would like to read astronomy's FITS files.
- What about topological information like regular grids, irregular grids, mesh of cells? These can all be described by tables where each datum includes all of this information, but that's not a good way to format or think about the data.
- Should 3D data even be supported by generating one-view SVG, or should this wait until a true web-friendly XML-based 3D data standard like X3D becomes universally adopted?
- Should the PlotXML format and JavaScript API be written extensibly to support these kinds of plots once the underlying 3D technology is feasible, or would that make simple things too complicated?
- I'd love to have written this whole thing in Python or a combination of Python and C++ for the low-level stuff, but alas JavaScript is the only supported scripting language on the web and in SVG. Opening and saving files is impossible without going through a server, as is connecting to real-time data acquisition hardware.
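Two of the trade-offs above can be made concrete with a small sketch. The first part accepts any of the three input shapes (table with headings, dictionary of arrays, list of objects) and normalizes them to a dictionary of columns. The second part illustrates the "combo" way of passing properties: a Canvas-like state stack plus an optional dictionary that overrides the current state for one call. All names here (`toColumns`, `save`, `restore`, `setStyle`, `resolveStyle`) are hypothetical and chosen for this example; they do not settle how SVGPlot itself would do it.

```javascript
// Hypothetical sketches; not the SVGPlot API.

// Normalize the three input shapes to { columnName: [values...] }.
function toColumns(data) {
  const columns = {};
  if (Array.isArray(data) && Array.isArray(data[0])) {
    // Table (2D array): the first row holds the column headings.
    const [headings, ...rows] = data;
    headings.forEach((name, i) => {
      columns[name] = rows.map((row) => row[i]);
    });
  } else if (Array.isArray(data)) {
    // List of objects with uniform attributes: take names from the first one.
    for (const name of Object.keys(data[0] || {})) {
      columns[name] = data.map((item) => item[name]);
    }
  } else {
    // Dictionary of arrays: copy so the caller's arrays are not shared.
    for (const [name, values] of Object.entries(data)) {
      columns[name] = values.slice();
    }
  }
  return columns;
}

// Canvas-like state stack; the top entry is the current drawing state.
const stateStack = [{ strokeStyle: 'black', lineWidth: 1 }];

function save() {
  stateStack.push({ ...stateStack[stateStack.length - 1] });
}

function restore() {
  if (stateStack.length > 1) stateStack.pop();
}

function setStyle(name, value) {
  stateStack[stateStack.length - 1][name] = value;
}

// The "combo": per-call overrides win over the current state,
// but do not change it.
function resolveStyle(overrides = {}) {
  return { ...stateStack[stateStack.length - 1], ...overrides };
}
```

With this shape, a plot call could take the overrides dictionary as its last parameter and call `resolveStyle` internally, so simple scripts rely on state while one-off tweaks stay local to a single call.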
Future Plans or Additional Packages
- Curve Fitting
- Histograms
- Statistical Analysis
- Computer Algebra system for function plotting
Keywords: plot, plotting, graph, graphing, chart, charting