Index and MultiIndex Support
By default, the Index of an ObjectsBackingDataframe is used to uniquely identify rows and associate them with dataframe-backed objects.
However, it is often more convenient to work with dataframes when they have a meaningful Index or MultiIndex. For example, objects often have ID attributes anyway.
Example: Using a dataclass field in an Index
import dataclasses
from typing import Annotated

import pandas as pd
import papaya as pya
from papaya import ObjectsBackingDataframe  # assumed to be importable from the top-level package


@pya.dataframe_backed_object
@dataclasses.dataclass
class User:
    user_id: Annotated[int, pya.DataframeIndex]  # (1)!
    name: str


UserDataframe = ObjectsBackingDataframe[User]
user_df = UserDataframe(
    pd.DataFrame(
        [User(user_id=0, name="Wall-E")]
    ).set_index("user_id")  # (2)!
)
1. This specifies that User.user_id should be used as the Index.
2. By default, the Index must be set before instantiating the ObjectsBackingDataframe. This can be overridden by setting PapayaConfig.set_index to True or "auto".
user_df can be used as if it were a normal pandas.DataFrame with an Index called "user_id". Interoperability between user_df and User objects is maintained, except that trying to set the index field will raise a SettingOnIndexLevelError (as it would change or break the mapping between user_df rows and User objects).
(user_0,) = list(user_df)
user_0.user_id  # `0`
user_0.name  # `'Wall-E'`
user_0.user_id = 2  # Trying to set on an index field raises a `SettingOnIndexLevelError`.
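To see why assigning to an index field is blocked, it may help to look at plain pandas with papaya left out entirely. The sketch below assumes that an object locates its row by index label; row_key is a hypothetical stand-in for that stored label, not part of papaya. Relabelling the row makes the old lookup fail, which is the kind of broken mapping the error guards against.

```python
import pandas as pd

# A dataframe indexed by user_id, as in the example above.
df = pd.DataFrame({"user_id": [0], "name": ["Wall-E"]}).set_index("user_id")

# Hypothetical: the label an object would keep in order to find its row again.
row_key = 0
name_before = df.loc[row_key, "name"]

# Changing the index label (0 -> 2) returns a relabelled copy of the dataframe.
relabelled = df.rename(index={0: 2})

# The old label no longer points at any row, so the lookup raises KeyError.
try:
    relabelled.loc[row_key, "name"]
    lookup_survived = True
except KeyError:
    lookup_survived = False
```

Here name_before holds the original name, while lookup_survived ends up False because no row carries the label 0 after the rename.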
Example: Using multiple dataclass fields in a MultiIndex
import dataclasses
from typing import Annotated
from uuid import UUID

import pandas as pd
import papaya as pya
from papaya import ObjectsBackingDataframe  # assumed to be importable from the top-level package


@pya.dataframe_backed_object
@dataclasses.dataclass
class UserAddress:
    user_id: Annotated[int, pya.DataframeIndex]
    address_id: Annotated[UUID, pya.DataframeIndex]
    name: str


UserAddressDataframe = ObjectsBackingDataframe[UserAddress]
user_address_df = UserAddressDataframe(
    pd.DataFrame(
        [UserAddress(user_id=0, address_id=UUID(int=42, version=4), name="The Axiom")]
    ).set_index(["user_id", "address_id"])
)
A MultiIndex behaves essentially the same way as an Index:
(address_0,) = list(user_address_df)
address_0.user_id  # `0`
address_0.address_id  # `UUID('00000000-0000-4000-8000-00000000002a')`
address_0.name  # `'The Axiom'`
address_0.user_id = 2  # Trying to set on an index field raises a `SettingOnIndexLevelError`.
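For readers less familiar with MultiIndex lookups, here is a plain-pandas sketch of the dataframe that the example above builds before wrapping it. It uses only pandas and the standard library, so it shows the underlying index structure rather than papaya's behavior; the dict-based row is an illustrative substitute for the dataclass instance.

```python
from uuid import UUID

import pandas as pd

address_id = UUID(int=42, version=4)

# Same shape as the example: two index levels, one remaining column.
df = pd.DataFrame(
    [{"user_id": 0, "address_id": address_id, "name": "The Axiom"}]
).set_index(["user_id", "address_id"])

# Both fields now live in the MultiIndex instead of the columns.
index_names = list(df.index.names)
remaining_columns = list(df.columns)

# A row is addressed by a tuple with one entry per index level.
name = df.loc[(0, address_id), "name"]
```

Each row is identified by the full (user_id, address_id) tuple, which is why both annotated fields are treated as index fields.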
Configuration
The behavior when instantiating an ObjectsBackingDataframe can be controlled using PapayaConfig.set_index:
- set_index=False is the default, used in the above examples.
- set_index=True requires that index field(s) are in the columns. They will be moved to the [Multi]Index automatically:

    @pya.dataframe_backed_object
    @dataclasses.dataclass
    class User:
        user_id: Annotated[int, pya.DataframeIndex]
        name: str

        papaya_config = pya.PapayaConfig(set_index=True)  # (1)!


    UserDataframe = ObjectsBackingDataframe[User]
    user_df = UserDataframe([User(user_id=0, name="Wall-E")])  # (2)!

    1. As a result of set_index=True, user_id will automatically be set as the index of user_df.
    2. Manually calling .set_index("user_id") is not necessary.

- set_index="auto" moves index field(s) to the [Multi]Index if they are all in the columns.
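The difference between the three settings can be sketched in plain pandas. The helper below, move_index_fields, is hypothetical and is not papaya's implementation. It only mirrors the rules as described above, and the ValueError raised for set_index=True is an assumption, since the source does not say which error papaya raises when the fields are missing from the columns.

```python
import pandas as pd


def move_index_fields(df, index_fields, set_index=False):
    """Hypothetical stand-in that mirrors the three set_index settings."""
    all_in_columns = all(field in df.columns for field in index_fields)
    if set_index is True:
        # True: the index fields must be present as columns.
        if not all_in_columns:
            raise ValueError("index fields must be in the columns")  # assumed error type
        return df.set_index(index_fields)
    if set_index == "auto":
        # "auto": move the fields only when all of them are columns.
        return df.set_index(index_fields) if all_in_columns else df
    # False: the caller is expected to have set the index already.
    return df


raw = pd.DataFrame({"user_id": [0], "name": ["Wall-E"]})
already_indexed = raw.set_index("user_id")

moved = move_index_fields(raw, ["user_id"], set_index=True)
auto_moved = move_index_fields(raw, ["user_id"], set_index="auto")
auto_untouched = move_index_fields(already_indexed, ["user_id"], set_index="auto")
```

In this sketch, "auto" is the only setting that accepts both the raw and the already-indexed dataframe, which matches the description of it as the permissive option.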