Project Functionality

Overview

Functionality for managing a DataDiVR project, including reading and writing project files in both JSON and binary formats.

Data Structure

[Figure: visual representation of the Project data structure]

Core Components

Project

The root model representing a DataDiVR project. Contains all data necessary to represent and visualize a network of nodes, their connections, and various layouts.

Key Attributes:

  • name: Project display name
  • attributes: Optional key-value pairs for project metadata
  • nodes_data: Efficient storage for node information (IDs and typed attribute arrays)
  • links_data: Storage for node connections
  • layouts_data: Dictionary of layout configurations
  • selections: Optional list of node selection groups

Data Storage Classes

NodeData

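Efficient storage for large node datasets, using parallel numpy arrays:

  • ids: Array of node IDs (numpy int32)
  • str_attributes, float_attributes, int_attributes, bool_attributes: Dictionaries mapping an attribute name to an array holding one value per node, grouped by data type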

LayoutData

Storage for layout positions:

  • node_ids: Array of node IDs (numpy int32)
  • positions: Array of 3D positions (numpy float32)
  • colors: Array of RGBA colors (numpy uint8)

LinkData

Storage for node connections:

  • start_ids: Array of source IDs (numpy int32)
  • end_ids: Array of target IDs (numpy int32)
  • colors: Array of RGBA colors (numpy uint8)
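
The storage classes above share one idea: every field is a numpy array, and row i of each array in a class describes the same node or link. The following is a minimal sketch with plain numpy arrays rather than the library classes; the variable names mirror the fields listed above and the values are made up for illustration.

```python
import numpy as np

# Layout for N = 3 nodes: row i of each array describes the same node.
node_ids = np.array([10, 11, 12], dtype=np.int32)  # shape (N,)
positions = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.5]],
    dtype=np.float32,
)  # shape (N, 3)
colors = np.array(
    [[255, 0, 0, 255], [0, 255, 0, 255], [0, 0, 255, 128]],
    dtype=np.uint8,
)  # shape (N, 4)

# Links for M = 2 connections, referring to nodes by ID.
start_ids = np.array([10, 11], dtype=np.int32)  # shape (M,)
end_ids = np.array([11, 12], dtype=np.int32)  # shape (M,)

# Look up everything about node 11 through its row index.
row = int(np.flatnonzero(node_ids == 11)[0])
node_position = positions[row].tolist()
node_color = tuple(int(c) for c in colors[row])
```

Because the arrays are only aligned by position, any reordering or filtering has to be applied to all arrays of a class with the same index.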

File Formats

The project supports two file formats:

JSON Format

  • Human-readable format
  • Uses optimized orjson serialization
  • Suitable for very small projects
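
Going by the model_dump and model_validate listings further down this page, a JSON project file appears to have six top-level keys: name, attributes, nodes, links, layouts, and selections. The sketch below builds such a document by hand and round-trips it with the standard json module. The values are illustrative, and the library itself writes files with orjson rather than json.

```python
import json

# A hand-written document in the shape produced by Project.model_dump();
# the values are illustrative only.
document = {
    "name": "Tiny Project",
    "attributes": {"author": "example"},
    "nodes": {"ids": [1, 2], "attributes": {"label": ["a", "b"]}},
    "links": {"start_ids": [1], "end_ids": [2], "colors": [[255, 255, 255, 255]]},
    "layouts": {
        "default": {
            "node_ids": [1, 2],
            "positions": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
            "colors": [[255, 0, 0, 255], [0, 0, 255, 255]],
        }
    },
    "selections": [],
}

# Serialize to text and parse it back.
text = json.dumps(document, indent=2)
restored = json.loads(text)
top_level_keys = sorted(restored)
```

Every array is written out as nested lists of numbers, which is why this format grows quickly with the node count.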

Binary Format

  • Compressed zip file containing:
      • metadata.json: Project metadata and non-array data
      • arrays/*.npy: Binary numpy arrays for large datasets
  • Optimized for large projects with significant numerical data
  • Uses numpy's native format for efficient storage of arrays
  • Much smaller file size than JSON
  • Faster to load than JSON (10x or more)
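
To make the archive layout concrete, here is a minimal sketch that writes and reads a zip file with one metadata.json member and one .npy member. It works entirely in memory, whereas the library's own save and load methods go through a temporary directory. The npy_bytes helper and the metadata values are assumptions made for this illustration.

```python
import io
import json
import zipfile

import numpy as np


def npy_bytes(array: np.ndarray) -> bytes:
    """Serialize an array to the .npy format in memory (illustrative helper)."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()


node_ids = np.array([1, 2, 3], dtype=np.int32)
metadata = {"name": "Tiny Project", "attributes": {}, "layouts": []}

# Write the archive: JSON metadata plus one .npy member per array.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("metadata.json", json.dumps(metadata))
    zf.writestr("arrays/node_ids.npy", npy_bytes(node_ids))

# Read it back without touching the file system.
with zipfile.ZipFile(archive, "r") as zf:
    members = sorted(zf.namelist())
    loaded_meta = json.loads(zf.read("metadata.json"))
    loaded_ids = np.load(io.BytesIO(zf.read("arrays/node_ids.npy")))
```

The .npy member keeps the array's dtype and shape in its header, so the int32 IDs come back without any type inference.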

Color Representation

Colors are represented using RGBA format:

  • Type: RGBAColor = tuple[int, int, int, int]
  • Values: Four integers representing Red, Green, Blue, and Alpha channels
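
Since colors end up in uint8 arrays, each channel is expected to lie in the range 0 to 255. The sketch below shows the alias in use together with a small range check. The make_rgba helper is hypothetical and not part of the library.

```python
RGBAColor = tuple[int, int, int, int]


def make_rgba(r: int, g: int, b: int, a: int = 255) -> RGBAColor:
    """Build an RGBA tuple, rejecting channels outside 0..255 (hypothetical helper)."""
    channels = (r, g, b, a)
    for value in channels:
        if not 0 <= value <= 255:
            raise ValueError(f"channel value {value} is outside 0..255")
    return channels


opaque_red = make_rgba(255, 0, 0)
half_transparent_blue = make_rgba(0, 0, 255, 128)
```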

datadivr.project.model

Attributes

RGBAColor = tuple[int, int, int, int] module-attribute

Type alias for RGBA colors represented as a tuple of 4 integers (r,g,b,a).

Classes

LayoutData dataclass

Efficient storage for layout positions

Source code in datadivr/project/model.py
@dataclass
class LayoutData:
    """Efficient storage for layout positions"""

    node_ids: npt.NDArray[np.int32]  # Array of node IDs (N,)
    positions: npt.NDArray[np.float32]  # Array of positions (N, 3)
    colors: npt.NDArray[np.uint8]  # Array of RGBA colors (N, 4)

LayoutNotFoundError

Bases: ValueError

Raised when a requested layout is not found in the project.

Source code in datadivr/project/model.py
class LayoutNotFoundError(ValueError):
    """Raised when a requested layout is not found in the project."""

    pass
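
Because the exception derives from ValueError, callers can catch either type. The following sketch uses a stand-in class with the same base class and a hypothetical lookup helper, so that it runs without the library installed.

```python
class LayoutNotFoundError(ValueError):
    """Stand-in with the same base class as the library's exception."""


def lookup(layouts: dict[str, list[float]], name: str) -> list[float]:
    """Hypothetical helper mirroring the lookup-or-raise pattern."""
    if name not in layouts:
        raise LayoutNotFoundError(name)
    return layouts[name]


layouts = {"default": [0.0, 0.0, 0.0]}

try:
    lookup(layouts, "spring")
except ValueError as exc:  # also catches LayoutNotFoundError
    message = str(exc)
```

The exception message is the requested layout name, which makes it easy to report which layout was missing.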

LinkData dataclass

Efficient storage for links

Source code in datadivr/project/model.py
@dataclass
class LinkData:
    """Efficient storage for links"""

    start_ids: npt.NDArray[np.int32]  # Array of source IDs (M,)
    end_ids: npt.NDArray[np.int32]  # Array of target IDs (M,)
    colors: npt.NDArray[np.uint8]  # Array of RGBA colors (M, 4)

NodeData dataclass

Efficient storage for large node datasets using parallel numpy arrays

Source code in datadivr/project/model.py
@dataclass
class NodeData:
    """Efficient storage for large node datasets using parallel numpy arrays"""

    ids: npt.NDArray[np.int32]  # Array of node IDs (N,)

    # Store attributes in separate arrays by data type for efficiency
    str_attributes: dict[str, npt.NDArray[np.dtype("O")]]  # type: ignore[valid-type]  # String attributes (N,)
    float_attributes: dict[str, npt.NDArray[np.float32]]  # Float attributes (N,)
    int_attributes: dict[str, npt.NDArray[np.int32]]  # Integer attributes (N,)
    bool_attributes: dict[str, npt.NDArray[np.bool_]]  # Boolean attributes (N,)

    def __init__(self, ids: npt.NDArray[np.int32]):
        self.ids = ids
        self.str_attributes = {}
        self.float_attributes = {}
        self.int_attributes = {}
        self.bool_attributes = {}

    def add_attribute(self, name: str, values: npt.NDArray, dtype: Any) -> None:
        """Add a new attribute array of specified type"""
        if np.issubdtype(dtype, np.floating):
            self.float_attributes[name] = values.astype(np.float32)
        elif np.issubdtype(dtype, np.integer):
            self.int_attributes[name] = values.astype(np.int32)
        elif np.issubdtype(dtype, np.bool_):
            self.bool_attributes[name] = values.astype(np.bool_)
        else:
            self.str_attributes[name] = values.astype("O")

    def get_attribute(self, name: str) -> npt.NDArray:
        """Get attribute array by name"""
        if name not in self.attribute_names:
            logger.error(
                "Requested attribute not found", requested_attribute=name, available_attributes=self.attribute_names
            )
            raise AttributeNotFoundError(name)

        for attr_dict in [self.str_attributes, self.float_attributes, self.int_attributes, self.bool_attributes]:
            if name in attr_dict:
                return attr_dict[name]
        # This point should not be reached due to the check above
        raise AttributeNotFoundError(name)

    @property
    def attribute_names(self) -> set[str]:
        """Get all available attribute names"""
        names: set[str] = set()
        for attr_dict in [self.str_attributes, self.float_attributes, self.int_attributes, self.bool_attributes]:
            names.update(attr_dict.keys())
        return names

    def get_attributes_by_index(self, index: int) -> dict[str, Any]:
        """Get all attributes for a node by its index."""
        if index < 0 or index >= len(self.ids):
            raise NodeIndexOutOfBoundsError(index, len(self.ids))

        attributes = {}
        for attr_dict in [self.str_attributes, self.float_attributes, self.int_attributes, self.bool_attributes]:
            for name, values in attr_dict.items():
                attributes[name] = values[index]
        return attributes
Attributes
attribute_names: set[str] property

Get all available attribute names

Functions
add_attribute(name, values, dtype)

Add a new attribute array of specified type

Source code in datadivr/project/model.py
def add_attribute(self, name: str, values: npt.NDArray, dtype: Any) -> None:
    """Add a new attribute array of specified type"""
    if np.issubdtype(dtype, np.floating):
        self.float_attributes[name] = values.astype(np.float32)
    elif np.issubdtype(dtype, np.integer):
        self.int_attributes[name] = values.astype(np.int32)
    elif np.issubdtype(dtype, np.bool_):
        self.bool_attributes[name] = values.astype(np.bool_)
    else:
        self.str_attributes[name] = values.astype("O")
get_attribute(name)

Get attribute array by name

Source code in datadivr/project/model.py
def get_attribute(self, name: str) -> npt.NDArray:
    """Get attribute array by name"""
    if name not in self.attribute_names:
        logger.error(
            "Requested attribute not found", requested_attribute=name, available_attributes=self.attribute_names
        )
        raise AttributeNotFoundError(name)

    for attr_dict in [self.str_attributes, self.float_attributes, self.int_attributes, self.bool_attributes]:
        if name in attr_dict:
            return attr_dict[name]
    # This point should not be reached due to the check above
    raise AttributeNotFoundError(name)
get_attributes_by_index(index)

Get all attributes for a node by its index.

Source code in datadivr/project/model.py
def get_attributes_by_index(self, index: int) -> dict[str, Any]:
    """Get all attributes for a node by its index."""
    if index < 0 or index >= len(self.ids):
        raise NodeIndexOutOfBoundsError(index, len(self.ids))

    attributes = {}
    for attr_dict in [self.str_attributes, self.float_attributes, self.int_attributes, self.bool_attributes]:
        for name, values in attr_dict.items():
            attributes[name] = values[index]
    return attributes
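
The dtype dispatch in add_attribute can be tried out in isolation. The sketch below repeats the same np.issubdtype checks in a standalone function; attribute_bucket is a hypothetical name used only here, and it returns the name of the dictionary an array would be stored in.

```python
import numpy as np


def attribute_bucket(dtype: np.dtype) -> str:
    """Name the NodeData dictionary an array of this dtype would land in."""
    if np.issubdtype(dtype, np.floating):
        return "float_attributes"
    if np.issubdtype(dtype, np.integer):
        return "int_attributes"
    if np.issubdtype(dtype, np.bool_):
        return "bool_attributes"
    return "str_attributes"  # anything else is stored as Python objects


buckets = {
    "weight": attribute_bucket(np.array([0.5, 1.5]).dtype),
    "degree": attribute_bucket(np.array([3, 7]).dtype),
    "active": attribute_bucket(np.array([True, False]).dtype),
    "label": attribute_bucket(np.array(["a", "b"]).dtype),
}

# Floats are narrowed to float32 on storage, so precision can be lost.
stored = np.array([0.1], dtype=np.float64).astype(np.float32)
```

Two consequences follow from the listing: float64 input is narrowed to float32, and integer input is cast to int32, so values outside the int32 range would not survive unchanged.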

Project

Bases: BaseModel

Root model representing a DataDiVR project.

This model contains all data necessary to represent and visualize a network of nodes, their connections, and various layouts using efficient data structures for large datasets.

Attributes:

  • name (str): Project display name
  • attributes (dict[str, str]): Optional key-value pairs for project metadata
  • nodes_data (NodeData | None): Efficient storage for node data (ids and typed attribute arrays)
  • links_data (LinkData | None): Efficient storage for link data (start_ids, end_ids, and colors)
  • layouts_data (dict[str, LayoutData]): Dictionary of layout configurations with efficient array storage
  • selections (list[Selection] | None): Optional list of node Selection groups

Example
project = Project(
    name="My Project",
    attributes={},
    nodes_data=NodeData(ids=np.array([1], dtype=np.int32)),
    links_data=None,
    layouts_data={},
    selections=[]
)
Source code in datadivr/project/model.py
class Project(BaseModel):
    """Root model representing a DataDiVR project.

    This model contains all data necessary to represent and visualize
    a network of nodes, their connections, and various layouts using
    efficient data structures for large datasets.

    Attributes:
        name: Project display name
        attributes: Optional key-value pairs for project metadata
        nodes_data: Efficient storage for node data (ids and typed attribute arrays)
        links_data: Efficient storage for link data (start_ids, end_ids, and colors)
        layouts_data: Dictionary of layout configurations with efficient array storage
        selections: Optional list of node Selection groups

    Example:
        ```python
        project = Project(
            name="My Project",
            attributes={},
            nodes_data=NodeData(ids=np.array([1], dtype=np.int32)),
            links_data=None,
            layouts_data={},
            selections=[]
        )
        ```
    """

    model_config = {
        "arbitrary_types_allowed": True,
        "json_schema_extra": {
            "examples": [
                {
                    "name": "Example Project",
                    "nodes": {"ids": [1], "attributes": {"name": ["First Node"]}},
                    "links": {"start_ids": [], "end_ids": [], "colors": []},
                    "layouts": {},
                }
            ]
        },
    }

    name: str
    attributes: dict[str, str] = Field(default_factory=dict, description="Custom metadata key-value pairs")

    # Change to public names
    nodes_data: NodeData | None = None
    links_data: LinkData | None = None
    layouts_data: dict[str, LayoutData] = Field(default_factory=dict)
    selections: list[Selection] | None = []

    def add_nodes_bulk(self, ids: npt.NDArray[np.int32], attributes: dict[str, npt.NDArray]) -> None:
        """Efficiently add multiple nodes at once with attribute arrays

        Args:
            ids: Array of node IDs
            attributes: Dictionary mapping attribute names to numpy arrays of values
        """
        self.nodes_data = NodeData(ids=ids)
        for name, values in attributes.items():
            self.nodes_data.add_attribute(name, values, values.dtype)

    def add_layout_bulk(
        self,
        name: str,
        node_ids: npt.NDArray[np.int32],
        positions: npt.NDArray[np.float32],
        colors: npt.NDArray[np.uint8],
    ) -> None:
        """Efficiently add layout data"""
        self.layouts_data[name] = LayoutData(node_ids=node_ids, positions=positions, colors=colors)

    def add_links_bulk(
        self, start_ids: npt.NDArray[np.int32], end_ids: npt.NDArray[np.int32], colors: npt.NDArray[np.uint8]
    ) -> None:
        """Efficiently add multiple links at once"""
        self.links_data = LinkData(start_ids=start_ids, end_ids=end_ids, colors=colors)

    def model_dump(
        self,
        *,
        mode: str = "python",
        include: Any = None,
        exclude: Any = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        round_trip: bool = False,
        warnings: bool | str = True,
        serialize_as_any: bool = False,
        context: Any = None,
    ) -> dict[str, Any]:
        """Custom serialization optimized for speed and memory efficiency"""
        # Combine all attribute dictionaries
        attributes = {}
        if self.nodes_data:
            attributes.update({str(k): v.tolist() for k, v in self.nodes_data.str_attributes.items()})
            attributes.update({str(k): v.tolist() for k, v in self.nodes_data.float_attributes.items()})
            attributes.update({str(k): v.tolist() for k, v in self.nodes_data.int_attributes.items()})
            attributes.update({str(k): v.tolist() for k, v in self.nodes_data.bool_attributes.items()})

        # Ensure all keys in attributes are strings
        attributes = {str(k): v for k, v in attributes.items()}

        return {
            "name": self.name,
            "attributes": self.attributes,
            "nodes": {
                "ids": self.nodes_data.ids.astype(int).tolist() if self.nodes_data else [],
                "attributes": attributes,  # Use the updated attributes dictionary
            },
            "links": {
                "start_ids": self.links_data.start_ids.astype(int).tolist() if self.links_data else [],
                "end_ids": self.links_data.end_ids.astype(int).tolist() if self.links_data else [],
                "colors": self.links_data.colors.tolist() if self.links_data else [],
            },
            "layouts": {
                str(name): {  # Ensure layout names are strings
                    "node_ids": layout.node_ids.astype(int).tolist(),
                    "positions": layout.positions.tolist(),
                    "colors": layout.colors.tolist(),
                }
                for name, layout in self.layouts_data.items()
            },
            "selections": [s.model_dump() for s in self.selections] if self.selections else [],
        }

    @classmethod
    def model_validate(
        cls,
        obj: Any,
        *,
        strict: bool | None = None,
        from_attributes: bool | None = None,
        context: Any | None = None,
    ) -> "Project":
        """Custom deserialization from efficient storage"""
        data = obj  # obj will contain our dictionary data
        project = cls(name=data["name"], attributes=data.get("attributes", {}))

        # Load nodes
        if "nodes" in data:
            project.nodes_data = NodeData(ids=np.array(data["nodes"]["ids"], dtype=np.int32))

            # Load attributes into appropriate typed dictionaries
            if "attributes" in data["nodes"]:
                for name, values in data["nodes"]["attributes"].items():
                    # Convert to numpy array and infer type
                    arr = np.array(values)
                    project.nodes_data.add_attribute(name, arr, arr.dtype)

        # Load links
        if "links" in data:
            project.links_data = LinkData(
                start_ids=np.array(data["links"]["start_ids"], dtype=np.int32),
                end_ids=np.array(data["links"]["end_ids"], dtype=np.int32),
                colors=np.array(data["links"]["colors"], dtype=np.uint8),
            )

        # Load layouts
        if "layouts" in data:
            for name, layout_data in data["layouts"].items():
                project.layouts_data[name] = LayoutData(
                    node_ids=np.array(layout_data["node_ids"], dtype=np.int32),
                    positions=np.array(layout_data["positions"], dtype=np.float32),
                    colors=np.array(layout_data["colors"], dtype=np.uint8),
                )

        # Load selections
        if "selections" in data:
            project.selections = [Selection.model_validate(s) for s in data["selections"]]

        return project

    @classmethod
    def load_from_json_file(cls, file_path: Path | str) -> "Project":
        """Load a project from a JSON file.

        Args:
            file_path: Path to the JSON file

        Returns:
            Project: Loaded and validated Project instance

        Raises:
            ValidationError: If the JSON data doesn't match the expected schema
            OSError: If there are file access issues
        """
        file_path = Path(file_path)
        logger.debug("Loading project", file_path=str(file_path))

        try:
            with file_path.open("r", encoding="utf-8") as f:
                data = json.load(f)
                project = cls.model_validate(data)
                logger.info("Project loaded successfully", project_name=project.name)
                return project
        except Exception as e:
            logger.exception("Failed to load project", error=str(e))
            raise

    def save_to_json_file(self, file_path: Path | str) -> None:
        """Save the project to a JSON file with optimized performance."""
        file_path = Path(file_path)
        logger.debug("Saving project", file_path=str(file_path))

        try:
            # Convert to JSON-compatible dict first
            data = self.model_dump()

            # Use a faster JSON encoder
            import orjson  # Much faster than standard json

            # Write in binary mode with orjson
            with file_path.open("wb") as f:
                f.write(orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SERIALIZE_NUMPY))

            logger.info("Project saved successfully", project_name=self.name)
        except Exception as e:
            logger.exception("Failed to save project", error=str(e))
            raise

    def save_to_binary_file(self, file_path: Path | str) -> None:
        """Save the project using numpy binary format for large arrays."""
        file_path = Path(file_path)
        logger.debug("Saving project in binary format", file_path=str(file_path))

        try:
            with tempfile.TemporaryDirectory() as temp_dir:
                temp_path = Path(temp_dir)
                self._save_arrays_to_temp(temp_path)
                self._save_metadata_to_temp(temp_path)
                self._create_zip_archive(file_path, temp_path)
                logger.info("Project saved successfully in binary format", project_name=self.name)
        except Exception as e:
            logger.exception("Failed to save project in binary format", error=str(e))
            raise

    def _save_arrays_to_temp(self, temp_path: Path) -> None:
        """Save numpy arrays to temporary directory."""
        arrays_dir = temp_path / "arrays"
        arrays_dir.mkdir()

        # Save node data
        if self.nodes_data:
            np.save(arrays_dir / "node_ids.npy", self.nodes_data.ids)
            self._save_node_attributes(arrays_dir)

        # Save links
        if self.links_data:
            np.save(arrays_dir / "link_start_ids.npy", self.links_data.start_ids)
            np.save(arrays_dir / "link_end_ids.npy", self.links_data.end_ids)
            np.save(arrays_dir / "link_colors.npy", self.links_data.colors)

        # Save layouts
        for name, layout in self.layouts_data.items():
            layout_dir = arrays_dir / f"layout_{name}"
            layout_dir.mkdir()
            np.save(layout_dir / "node_ids.npy", layout.node_ids)
            np.save(layout_dir / "positions.npy", layout.positions)
            np.save(layout_dir / "colors.npy", layout.colors)

    def _save_node_attributes(self, arrays_dir: Path) -> None:
        """Save node attributes to arrays directory."""
        if self.nodes_data is None:
            return  # Exit early if nodes_data is None

        for attr_dict in [
            self.nodes_data.str_attributes,
            self.nodes_data.float_attributes,
            self.nodes_data.int_attributes,
            self.nodes_data.bool_attributes,
        ]:
            for name, values in attr_dict.items():
                np.save(arrays_dir / f"node_attr_{name}.npy", values)

    def _save_metadata_to_temp(self, temp_path: Path) -> None:
        """Save metadata JSON to temporary directory."""
        metadata = {
            "name": self.name,
            "attributes": self.attributes,
            "nodes": {
                "attributes": {
                    name: str(values.dtype)
                    for attr_dict in [
                        self.nodes_data.str_attributes,
                        self.nodes_data.float_attributes,
                        self.nodes_data.int_attributes,
                        self.nodes_data.bool_attributes,
                    ]
                    for name, values in attr_dict.items()
                }
                if self.nodes_data
                else {}
            },
            "layouts": list(self.layouts_data.keys()),
            "selections": [s.model_dump() for s in self.selections] if self.selections else [],
        }

        with open(temp_path / "metadata.json", "wb") as f:
            f.write(orjson.dumps(metadata, option=orjson.OPT_INDENT_2))

    def _create_zip_archive(self, file_path: Path, temp_path: Path) -> None:
        """Create zip archive from temporary directory."""
        with ZipFile(file_path, "w", compression=ZIP_DEFLATED) as zf:
            # Use a separate loop variable so the archive path parameter is not shadowed
            for member in temp_path.rglob("*"):
                if member.is_file():
                    zf.write(member, member.relative_to(temp_path))

    @classmethod
    def load_from_binary_file(cls, file_path: Path | str) -> "Project":
        """Load a project from a binary format file."""
        file_path = Path(file_path)
        logger.debug("Loading project from binary format", file_path=str(file_path))

        try:
            with tempfile.TemporaryDirectory() as temp_dir:
                temp_path = Path(temp_dir)

                # Extract zip archive
                with ZipFile(file_path, "r") as zf:
                    zf.extractall(temp_path)

                # Load metadata
                with open(temp_path / "metadata.json", "rb") as f:
                    metadata = orjson.loads(f.read())

                # Create project instance
                project = cls(name=metadata["name"], attributes=metadata.get("attributes", {}))

                # Load nodes
                if (temp_path / "arrays/node_ids.npy").exists():
                    node_ids = np.load(temp_path / "arrays/node_ids.npy")
                    project.nodes_data = NodeData(ids=node_ids)

                    # Load attributes from metadata
                    if "nodes" in metadata and "attributes" in metadata["nodes"]:
                        for name, _dtype_info in metadata["nodes"]["attributes"].items():
                            attr_path = temp_path / f"arrays/node_attr_{name}.npy"
                            if attr_path.exists():
                                values = np.load(attr_path, allow_pickle=True)
                                project.nodes_data.add_attribute(name, values, values.dtype)

                # Load links if present
                if (temp_path / "arrays/link_start_ids.npy").exists():
                    project.links_data = LinkData(
                        start_ids=np.load(temp_path / "arrays/link_start_ids.npy"),
                        end_ids=np.load(temp_path / "arrays/link_end_ids.npy"),
                        colors=np.load(temp_path / "arrays/link_colors.npy"),
                    )

                # Load layouts
                for layout_name in metadata["layouts"]:
                    layout_dir = temp_path / f"arrays/layout_{layout_name}"
                    project.layouts_data[layout_name] = LayoutData(
                        node_ids=np.load(layout_dir / "node_ids.npy"),
                        positions=np.load(layout_dir / "positions.npy"),
                        colors=np.load(layout_dir / "colors.npy"),
                    )

                # Load selections
                if "selections" in metadata:
                    project.selections = [Selection.model_validate(s) for s in metadata["selections"]]

                return project

        except Exception as e:
            logger.exception("Failed to load project from binary format", error=str(e))
            raise

    def get_layout_positions(self, layout_name: str = "default") -> npt.NDArray[np.float32]:
        """Get node positions for a specific layout"""
        if layout_name not in self.layouts_data:
            raise LayoutNotFoundError(layout_name)
        return self.layouts_data[layout_name].positions

    def get_layout_colors(self, layout_name: str = "default") -> npt.NDArray[np.uint8]:
        """Get node colors for a specific layout"""
        if layout_name not in self.layouts_data:
            raise LayoutNotFoundError(layout_name)
        return self.layouts_data[layout_name].colors

    def create_textures(self, output_dir: str = "static/projects/") -> None:
        """Create textures for the project."""
        create_textures_from_project(
            self.name,
            self.layouts_data,
            {
                "start_ids": self.links_data.start_ids,
                "end_ids": self.links_data.end_ids,
                "colors": self.links_data.colors,
            }
            if self.links_data
            else None,
            output_dir,
        )

    def create_json_files(self, output_dir: str = "static/projects/") -> None:
        """Create JSON files for nodes and links."""
        if self.nodes_data:
            create_nodes_json(
                self.nodes_data.ids.tolist(),  # Convert ndarray to list
                [str(i) for i in self.nodes_data.ids],  # Assuming node names are string representations of IDs
                self.name,
                output_dir,
            )

        if self.links_data:
            create_links_json(
                list(zip(self.links_data.start_ids, self.links_data.end_ids, strict=False)), self.name, output_dir
            )

    def create_project_summary(self, output_dir: str = "static/projects/") -> None:
        """Create a project summary JSON file."""
        project_summary = {
            "name": self.name,
            "layouts": list(self.layouts_data.keys()),
            "layoutsRGB": [f"{layout}RGB" for layout in self.layouts_data],
            "links": ["links_XYZ"],  # Placeholder, adjust as needed
            "linksRGB": ["links_RGB"],  # Placeholder, adjust as needed
            "nodecount": len(self.nodes_data.ids) if self.nodes_data else 0,
            "linkcount": len(self.links_data.start_ids) if self.links_data else 0,
            "labelcount": 0,  # Placeholder, adjust as needed
            "annotationTypes": False,  # Placeholder, adjust as needed
        }

        file_path = Path(output_dir) / self.name / "project.json"
        file_path.parent.mkdir(parents=True, exist_ok=True)

        with file_path.open("w", encoding="utf-8") as f:
            json.dump(project_summary, f, indent=4)

        logger.info(f"Project summary saved to {file_path}")

    def create_all_assets(self, output_dir: str = "static/projects/") -> None:
        """Create all project assets including textures, JSON files, and project summary."""
        self.create_textures(output_dir)
        self.create_json_files(output_dir)
        self.create_project_summary(output_dir)
        logger.info("All project assets created successfully", project_name=self.name)
Functions
add_layout_bulk(name, node_ids, positions, colors)

Efficiently add layout data

Source code in datadivr/project/model.py
def add_layout_bulk(
    self,
    name: str,
    node_ids: npt.NDArray[np.int32],
    positions: npt.NDArray[np.float32],
    colors: npt.NDArray[np.uint8],
) -> None:
    """Efficiently add layout data"""
    self.layouts_data[name] = LayoutData(node_ids=node_ids, positions=positions, colors=colors)
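
As listed, add_layout_bulk stores the arrays it receives without checking their shapes. A caller who wants an early failure could validate the arrays first. The sketch below is one way to do that; check_layout_arrays is a hypothetical helper, not part of the library, and it assumes the (N,), (N, 3), and (N, 4) shapes noted in the LayoutData listing.

```python
import numpy as np


def check_layout_arrays(node_ids: np.ndarray, positions: np.ndarray, colors: np.ndarray) -> int:
    """Return N if the arrays have shapes (N,), (N, 3), (N, 4); raise ValueError otherwise."""
    if node_ids.ndim != 1:
        raise ValueError(f"node_ids must have shape (N,), got {node_ids.shape}")
    n = node_ids.shape[0]
    if positions.shape != (n, 3):
        raise ValueError(f"positions must have shape ({n}, 3), got {positions.shape}")
    if colors.shape != (n, 4):
        raise ValueError(f"colors must have shape ({n}, 4), got {colors.shape}")
    return n


ids = np.array([1, 2], dtype=np.int32)
xyz = np.zeros((2, 3), dtype=np.float32)
rgba = np.full((2, 4), 255, dtype=np.uint8)

node_count = check_layout_arrays(ids, xyz, rgba)

# A positions array with the wrong number of columns is rejected.
try:
    check_layout_arrays(ids, np.zeros((2, 2), dtype=np.float32), rgba)
    error = None
except ValueError as exc:
    error = str(exc)
```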

add_links_bulk(start_ids, end_ids, colors)

Efficiently add multiple links at once

Source code in datadivr/project/model.py
def add_links_bulk(
    self, start_ids: npt.NDArray[np.int32], end_ids: npt.NDArray[np.int32], colors: npt.NDArray[np.uint8]
) -> None:
    """Efficiently add multiple links at once"""
    self.links_data = LinkData(start_ids=start_ids, end_ids=end_ids, colors=colors)
add_nodes_bulk(ids, attributes)

Efficiently add multiple nodes at once with attribute arrays

Parameters:

  • ids (NDArray[int32], required): Array of node IDs
  • attributes (dict[str, NDArray], required): Dictionary mapping attribute names to numpy arrays of values
Source code in datadivr/project/model.py
def add_nodes_bulk(self, ids: npt.NDArray[np.int32], attributes: dict[str, npt.NDArray]) -> None:
    """Efficiently add multiple nodes at once with attribute arrays

    Args:
        ids: Array of node IDs
        attributes: Dictionary mapping attribute names to numpy arrays of values
    """
    self.nodes_data = NodeData(ids=ids)
    for name, values in attributes.items():
        self.nodes_data.add_attribute(name, values, values.dtype)
create_all_assets(output_dir='static/projects/')

Create all project assets including textures, JSON files, and project summary.

Source code in datadivr/project/model.py
def create_all_assets(self, output_dir: str = "static/projects/") -> None:
    """Create all project assets including textures, JSON files, and project summary."""
    self.create_textures(output_dir)
    self.create_json_files(output_dir)
    self.create_project_summary(output_dir)
    logger.info("All project assets created successfully", project_name=self.name)
create_json_files(output_dir='static/projects/')

Create JSON files for nodes and links.

Source code in datadivr/project/model.py
def create_json_files(self, output_dir: str = "static/projects/") -> None:
    """Create JSON files for nodes and links."""
    if self.nodes_data:
        create_nodes_json(
            self.nodes_data.ids.tolist(),  # Convert ndarray to list
            [str(i) for i in self.nodes_data.ids],  # Assuming node names are string representations of IDs
            self.name,
            output_dir,
        )

    if self.links_data:
        create_links_json(
            list(zip(self.links_data.start_ids, self.links_data.end_ids, strict=False)), self.name, output_dir
        )
create_project_summary(output_dir='static/projects/')

Create a project summary JSON file.

Source code in datadivr/project/model.py
def create_project_summary(self, output_dir: str = "static/projects/") -> None:
    """Create a project summary JSON file."""
    project_summary = {
        "name": self.name,
        "layouts": list(self.layouts_data.keys()),
        "layoutsRGB": [f"{layout}RGB" for layout in self.layouts_data],
        "links": ["links_XYZ"],  # Placeholder, adjust as needed
        "linksRGB": ["links_RGB"],  # Placeholder, adjust as needed
        "nodecount": len(self.nodes_data.ids) if self.nodes_data else 0,
        "linkcount": len(self.links_data.start_ids) if self.links_data else 0,
        "labelcount": 0,  # Placeholder, adjust as needed
        "annotationTypes": False,  # Placeholder, adjust as needed
    }

    file_path = Path(output_dir) / self.name / "project.json"
    file_path.parent.mkdir(parents=True, exist_ok=True)

    with file_path.open("w", encoding="utf-8") as f:
        json.dump(project_summary, f, indent=4)

    logger.info(f"Project summary saved to {file_path}")
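
To make the resulting file layout concrete, here is a standalone sketch of the same summary written with only the standard library. `write_project_summary` is a hypothetical helper, not part of datadivr; the placeholder fields are copied from the source above.

```python
import json
import tempfile
from pathlib import Path


def write_project_summary(name, layout_names, node_count, link_count, output_dir):
    """Write <output_dir>/<name>/project.json and return its path."""
    summary = {
        "name": name,
        "layouts": list(layout_names),
        "layoutsRGB": [f"{layout}RGB" for layout in layout_names],
        "links": ["links_XYZ"],  # placeholder, as in the source
        "linksRGB": ["links_RGB"],  # placeholder, as in the source
        "nodecount": node_count,
        "linkcount": link_count,
        "labelcount": 0,
        "annotationTypes": False,
    }
    file_path = Path(output_dir) / name / "project.json"
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with file_path.open("w", encoding="utf-8") as f:
        json.dump(summary, f, indent=4)
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_project_summary("demo", ["default", "spring"], 3, 2, tmp)
    written = json.loads(path.read_text(encoding="utf-8"))
    relative_path = path.relative_to(tmp).as_posix()
```

The summary lands in a subdirectory named after the project, so two projects that share an output directory do not overwrite each other's project.json.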
create_textures(output_dir='static/projects/')

Create textures for the project.

Source code in datadivr/project/model.py
def create_textures(self, output_dir: str = "static/projects/") -> None:
    """Create textures for the project."""
    create_textures_from_project(
        self.name,
        self.layouts_data,
        {
            "start_ids": self.links_data.start_ids,
            "end_ids": self.links_data.end_ids,
            "colors": self.links_data.colors,
        }
        if self.links_data
        else None,
        output_dir,
    )
get_layout_colors(layout_name='default')

Get node colors for a specific layout. Raises LayoutNotFoundError if no layout with that name exists.

Source code in datadivr/project/model.py
def get_layout_colors(self, layout_name: str = "default") -> npt.NDArray[np.uint8]:
    """Get node colors for a specific layout"""
    if layout_name not in self.layouts_data:
        raise LayoutNotFoundError(layout_name)
    return self.layouts_data[layout_name].colors
get_layout_positions(layout_name='default')

Get node positions for a specific layout. Raises LayoutNotFoundError if no layout with that name exists.

Source code in datadivr/project/model.py
def get_layout_positions(self, layout_name: str = "default") -> npt.NDArray[np.float32]:
    """Get node positions for a specific layout"""
    if layout_name not in self.layouts_data:
        raise LayoutNotFoundError(layout_name)
    return self.layouts_data[layout_name].positions
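
Both getters follow the same check-then-return pattern. The sketch below reproduces it with plain dictionaries; the exception class here is a stand-in, since the definition of the library's LayoutNotFoundError (including its base class) is not shown in this section, and `get_layout_field` is a hypothetical helper.

```python
class LayoutNotFoundError(KeyError):
    """Stand-in for the library's exception; the real base class is an assumption."""

    def __init__(self, layout_name: str) -> None:
        super().__init__(f"layout {layout_name!r} not found")
        self.layout_name = layout_name


def get_layout_field(layouts: dict, layout_name: str = "default", field: str = "positions"):
    """Return one field of a named layout, raising if the layout does not exist."""
    if layout_name not in layouts:
        raise LayoutNotFoundError(layout_name)
    return layouts[layout_name][field]


layouts = {"default": {"positions": [[0.0, 0.0, 0.0]], "colors": [[255, 0, 0, 255]]}}
positions = get_layout_field(layouts)

try:
    get_layout_field(layouts, "spring")
    missing = None
except LayoutNotFoundError as exc:
    missing = exc.layout_name
```

Raising a dedicated exception instead of letting a bare KeyError escape lets callers tell a missing layout apart from other lookup failures.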
load_from_binary_file(file_path) classmethod

Load a project from a binary format file.

Source code in datadivr/project/model.py
@classmethod
def load_from_binary_file(cls, file_path: Path | str) -> "Project":
    """Load a project from a binary format file."""
    file_path = Path(file_path)
    logger.debug("Loading project from binary format", file_path=str(file_path))

    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)

            # Extract zip archive
            with ZipFile(file_path, "r") as zf:
                zf.extractall(temp_path)

            # Load metadata
            with open(temp_path / "metadata.json", "rb") as f:
                metadata = orjson.loads(f.read())

            # Create project instance
            project = cls(name=metadata["name"], attributes=metadata.get("attributes", {}))

            # Load nodes
            if (temp_path / "arrays/node_ids.npy").exists():
                node_ids = np.load(temp_path / "arrays/node_ids.npy")
                project.nodes_data = NodeData(ids=node_ids)

                # Load attributes from metadata
                if "nodes" in metadata and "attributes" in metadata["nodes"]:
                    for name, _dtype_info in metadata["nodes"]["attributes"].items():
                        attr_path = temp_path / f"arrays/node_attr_{name}.npy"
                        if attr_path.exists():
                            values = np.load(attr_path, allow_pickle=True)
                            project.nodes_data.add_attribute(name, values, values.dtype)

            # Load links if present
            if (temp_path / "arrays/link_start_ids.npy").exists():
                project.links_data = LinkData(
                    start_ids=np.load(temp_path / "arrays/link_start_ids.npy"),
                    end_ids=np.load(temp_path / "arrays/link_end_ids.npy"),
                    colors=np.load(temp_path / "arrays/link_colors.npy"),
                )

            # Load layouts
            for layout_name in metadata["layouts"]:
                layout_dir = temp_path / f"arrays/layout_{layout_name}"
                project.layouts_data[layout_name] = LayoutData(
                    node_ids=np.load(layout_dir / "node_ids.npy"),
                    positions=np.load(layout_dir / "positions.npy"),
                    colors=np.load(layout_dir / "colors.npy"),
                )

            # Load selections
            if "selections" in metadata:
                project.selections = [Selection.model_validate(s) for s in metadata["selections"]]

            return project

    except Exception as e:
        logger.exception("Failed to load project from binary format", error=str(e))
        raise
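
The loader above treats node and link arrays as optional, but reads three files for every layout named in metadata.json without checking that they exist. A pre-flight check along the following lines could report missing entries before loading. The helper names are hypothetical and the archive payloads are placeholders, not real .npy data.

```python
import io
import json
from zipfile import ZipFile

LAYOUT_PARTS = ("node_ids", "positions", "colors")


def required_layout_entries(metadata: dict) -> list[str]:
    """Archive paths that the loader reads unconditionally for each layout."""
    return [
        f"arrays/layout_{layout}/{part}.npy"
        for layout in metadata["layouts"]
        for part in LAYOUT_PARTS
    ]


def missing_layout_entries(archive) -> list[str]:
    """Compare metadata.json against the archive listing without extracting anything."""
    with ZipFile(archive, "r") as zf:
        metadata = json.loads(zf.read("metadata.json"))
        present = set(zf.namelist())
    return [entry for entry in required_layout_entries(metadata) if entry not in present]


# Build a deliberately incomplete archive in memory.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("metadata.json", json.dumps({"name": "demo", "layouts": ["default"]}))
    zf.writestr("arrays/layout_default/node_ids.npy", b"")
    zf.writestr("arrays/layout_default/positions.npy", b"")
buffer.seek(0)

missing = missing_layout_entries(buffer)
```

Here `missing` names the absent colors file, which the real loader would only discover as an exception from np.load.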
load_from_json_file(file_path) classmethod

Load a project from a JSON file.

Parameters:

  • file_path (Path | str, required): Path to the JSON file

Returns:

  • Project: Loaded and validated Project instance

Raises:

  • ValidationError: If the JSON data doesn't match the expected schema
  • OSError: If there are file access issues

Source code in datadivr/project/model.py
@classmethod
def load_from_json_file(cls, file_path: Path | str) -> "Project":
    """Load a project from a JSON file.

    Args:
        file_path: Path to the JSON file

    Returns:
        Project: Loaded and validated Project instance

    Raises:
        ValidationError: If the JSON data doesn't match the expected schema
        OSError: If there are file access issues
    """
    file_path = Path(file_path)
    logger.debug("Loading project", file_path=str(file_path))

    try:
        with file_path.open("r", encoding="utf-8") as f:
            data = json.load(f)
            project = cls.model_validate(data)
            logger.info("Project loaded successfully", project_name=project.name)
            return project
    except Exception as e:
        logger.exception("Failed to load project", error=str(e))
        raise
model_dump(*, mode='python', include=None, exclude=None, by_alias=False, exclude_unset=False, exclude_defaults=False, exclude_none=False, round_trip=False, warnings=True, serialize_as_any=False, context=None)

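Custom serialization optimized for speed and memory efficiency. The keyword arguments mirror the Pydantic model_dump signature, but the implementation shown below does not use them and always returns the full dictionary.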

Source code in datadivr/project/model.py
def model_dump(
    self,
    *,
    mode: str = "python",
    include: Any = None,
    exclude: Any = None,
    by_alias: bool = False,
    exclude_unset: bool = False,
    exclude_defaults: bool = False,
    exclude_none: bool = False,
    round_trip: bool = False,
    warnings: bool | str = True,
    serialize_as_any: bool = False,
    context: Any = None,
) -> dict[str, Any]:
    """Custom serialization optimized for speed and memory efficiency"""
    # Combine all attribute dictionaries
    attributes = {}
    if self.nodes_data:
        attributes.update({str(k): v.tolist() for k, v in self.nodes_data.str_attributes.items()})
        attributes.update({str(k): v.tolist() for k, v in self.nodes_data.float_attributes.items()})
        attributes.update({str(k): v.tolist() for k, v in self.nodes_data.int_attributes.items()})
        attributes.update({str(k): v.tolist() for k, v in self.nodes_data.bool_attributes.items()})

    # Ensure all keys in attributes are strings
    attributes = {str(k): v for k, v in attributes.items()}

    return {
        "name": self.name,
        "attributes": self.attributes,
        "nodes": {
            "ids": self.nodes_data.ids.astype(int).tolist() if self.nodes_data else [],
            "attributes": attributes,  # Use the updated attributes dictionary
        },
        "links": {
            "start_ids": self.links_data.start_ids.astype(int).tolist() if self.links_data else [],
            "end_ids": self.links_data.end_ids.astype(int).tolist() if self.links_data else [],
            "colors": self.links_data.colors.tolist() if self.links_data else [],
        },
        "layouts": {
            str(name): {  # Ensure layout names are strings
                "node_ids": layout.node_ids.astype(int).tolist(),
                "positions": layout.positions.tolist(),
                "colors": layout.colors.tolist(),
            }
            for name, layout in self.layouts_data.items()
        },
        "selections": [s.model_dump() for s in self.selections] if self.selections else [],
    }
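
The four typed attribute dictionaries are merged into one with successive dict.update calls, so if the same attribute name ever appeared in two of them, the later dictionary would silently win. Whether NodeData prevents such duplicates is not shown here. The sketch below illustrates the merge order with made-up data and plain lists.

```python
from collections import Counter

# Made-up attribute columns; "score" deliberately appears under two types.
str_attributes = {"label": ["a", "b"]}
float_attributes = {"score": [0.5, 1.5]}
int_attributes = {"score": [1, 2]}
bool_attributes = {"visible": [True, False]}

# Same merge order as in model_dump: str, float, int, bool.
typed_dicts = (str_attributes, float_attributes, int_attributes, bool_attributes)

merged = {}
for typed in typed_dicts:
    merged.update(typed)

# Names that occur in more than one typed dictionary.
name_counts = Counter(name for typed in typed_dicts for name in typed)
collisions = sorted(name for name, count in name_counts.items() if count > 1)
```

In this example the integer column replaces the float column under "score", and `collisions` reports the clash.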
model_validate(obj, *, strict=None, from_attributes=None, context=None) classmethod

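Custom deserialization from efficient storage. Builds a Project from a plain dictionary with the same shape that model_dump produces; the strict, from_attributes, and context arguments are not used by the implementation shown below.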

Source code in datadivr/project/model.py
@classmethod
def model_validate(
    cls,
    obj: Any,
    *,
    strict: bool | None = None,
    from_attributes: bool | None = None,
    context: Any | None = None,
) -> "Project":
    """Custom deserialization from efficient storage"""
    data = obj  # obj will contain our dictionary data
    project = cls(name=data["name"], attributes=data.get("attributes", {}))

    # Load nodes
    if "nodes" in data:
        project.nodes_data = NodeData(ids=np.array(data["nodes"]["ids"], dtype=np.int32))

        # Load attributes into appropriate typed dictionaries
        if "attributes" in data["nodes"]:
            for name, values in data["nodes"]["attributes"].items():
                # Convert to numpy array and infer type
                arr = np.array(values)
                project.nodes_data.add_attribute(name, arr, arr.dtype)

    # Load links
    if "links" in data:
        project.links_data = LinkData(
            start_ids=np.array(data["links"]["start_ids"], dtype=np.int32),
            end_ids=np.array(data["links"]["end_ids"], dtype=np.int32),
            colors=np.array(data["links"]["colors"], dtype=np.uint8),
        )

    # Load layouts
    if "layouts" in data:
        for name, layout_data in data["layouts"].items():
            project.layouts_data[name] = LayoutData(
                node_ids=np.array(layout_data["node_ids"], dtype=np.int32),
                positions=np.array(layout_data["positions"], dtype=np.float32),
                colors=np.array(layout_data["colors"], dtype=np.uint8),
            )

    # Load selections
    if "selections" in data:
        project.selections = [Selection.model_validate(s) for s in data["selections"]]

    return project
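
The dictionary shape that this method reads can be written out explicitly. The example below builds such a dictionary, round-trips it through JSON, and runs a small shape check. `check_project_dict` is a hypothetical helper, and its length checks are an added assumption; the source above performs no such validation.

```python
import json


def check_project_dict(doc: dict) -> list[str]:
    """Return a list of shape problems; an empty list means the dict looks loadable."""
    problems = []
    if "name" not in doc:
        problems.append("missing 'name'")
    links = doc.get("links")
    if links is not None:
        lengths = {len(links[key]) for key in ("start_ids", "end_ids", "colors")}
        if len(lengths) != 1:
            problems.append("link arrays differ in length")
    for layout_name, layout in doc.get("layouts", {}).items():
        lengths = {len(layout[key]) for key in ("node_ids", "positions", "colors")}
        if len(lengths) != 1:
            problems.append(f"layout {layout_name!r} arrays differ in length")
    return problems


document = {
    "name": "demo",
    "attributes": {"source": "example"},
    "nodes": {"ids": [1, 2, 3], "attributes": {"degree": [2, 1, 1]}},
    "links": {
        "start_ids": [1, 1],
        "end_ids": [2, 3],
        "colors": [[255, 255, 255, 255], [255, 0, 0, 255]],
    },
    "layouts": {
        "default": {
            "node_ids": [1, 2, 3],
            "positions": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
            "colors": [[0, 0, 255, 255]] * 3,
        }
    },
    "selections": [],
}

restored = json.loads(json.dumps(document))
problems = check_project_dict(restored)

broken = json.loads(json.dumps(document))
broken["links"]["colors"].pop()  # drop one color so the link arrays no longer match
broken_problems = check_project_dict(broken)
```

Note that once a "links" or "layouts" key is present, the method indexes its sub-keys directly, so a partially filled section raises KeyError rather than being skipped.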
save_to_binary_file(file_path)

Save the project using numpy binary format for large arrays.

Source code in datadivr/project/model.py
def save_to_binary_file(self, file_path: Path | str) -> None:
    """Save the project using numpy binary format for large arrays."""
    file_path = Path(file_path)
    logger.debug("Saving project in binary format", file_path=str(file_path))

    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)
            self._save_arrays_to_temp(temp_path)
            self._save_metadata_to_temp(temp_path)
            self._create_zip_archive(file_path, temp_path)
            logger.info("Project saved successfully in binary format", project_name=self.name)
    except Exception as e:
        logger.exception("Failed to save project in binary format", error=str(e))
        raise
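
The private helpers called above are not shown in this section. As a rough illustration of the stage-then-zip pattern they appear to implement, here is a standard-library sketch. `write_archive` is a hypothetical function, the byte payloads stand in for real .npy files, and the choice of ZIP_DEFLATED compression is an assumption.

```python
import json
import tempfile
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def write_archive(file_path: Path, metadata: dict, arrays: dict[str, bytes]) -> None:
    """Stage files in a temporary directory, then zip them with archive-relative names."""
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)
        (temp_path / "arrays").mkdir()
        for name, payload in arrays.items():
            (temp_path / "arrays" / f"{name}.npy").write_bytes(payload)
        (temp_path / "metadata.json").write_text(json.dumps(metadata), encoding="utf-8")

        with ZipFile(file_path, "w", compression=ZIP_DEFLATED) as zf:
            for path in sorted(temp_path.rglob("*")):
                if path.is_file():
                    # Store paths relative to the staging directory, not absolute ones.
                    zf.write(path, arcname=path.relative_to(temp_path).as_posix())


with tempfile.TemporaryDirectory() as out_dir:
    archive = Path(out_dir) / "demo.zip"
    write_archive(archive, {"name": "demo", "layouts": []}, {"node_ids": b"placeholder"})
    with ZipFile(archive) as zf:
        names = sorted(zf.namelist())
```

Staging in a temporary directory means a failure partway through leaves no half-written files next to the target path, although the zip itself is still written in place.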
save_to_json_file(file_path)

Save the project to a JSON file with optimized performance.

Source code in datadivr/project/model.py
def save_to_json_file(self, file_path: Path | str) -> None:
    """Save the project to a JSON file with optimized performance."""
    file_path = Path(file_path)
    logger.debug("Saving project", file_path=str(file_path))

    try:
        # Convert to JSON-compatible dict first
        data = self.model_dump()

        # Use a faster JSON encoder
        import orjson  # Much faster than standard json

        # Write in binary mode with orjson
        with file_path.open("wb") as f:
            f.write(orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SERIALIZE_NUMPY))

        logger.info("Project saved successfully", project_name=self.name)
    except Exception as e:
        logger.exception("Failed to save project", error=str(e))
        raise
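
The file is opened in binary mode because orjson.dumps returns bytes rather than str. The sketch below shows the same write path with a fallback to the standard json module when orjson is not installed. The fallback and the `dump_json_bytes` helper are hypothetical additions for illustration, not part of the library.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson  # third-party; used by the method above
except ImportError:
    orjson = None  # hypothetical fallback path, not part of the library


def dump_json_bytes(data: dict) -> bytes:
    """Serialize to indented UTF-8 JSON bytes, preferring orjson when installed."""
    if orjson is not None:
        return orjson.dumps(data, option=orjson.OPT_INDENT_2)
    return json.dumps(data, indent=2).encode("utf-8")


data = {"name": "demo", "nodes": {"ids": [1, 2, 3], "attributes": {}}}

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "demo.json"
    # Binary mode, because the serializer returns bytes rather than str.
    with file_path.open("wb") as f:
        f.write(dump_json_bytes(data))
    loaded = json.loads(file_path.read_bytes())
```

Either branch produces valid JSON that the standard json module can read back, which is what load_from_json_file relies on.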

Functions
