What does the term "canonical form" or "canonical representation" in Java mean?

Question

I have often heard this term being used, but I have never really understood it. What does it mean, and can anyone give some examples or point me to some links?

EDIT: Thanks to everyone for the replies. Can you also tell me how the canonical representation is useful for equals() performance, as stated in Effective Java?

User · Answer

Another good example might be: you have a class that supports the use of cartesian (x, y, z), spherical (r, theta, phi) and cylindrical (r, phi, z) coordinates. For purposes of establishing equality (the equals method), you would probably want to convert all representations to one "canonical" representation of your choosing, e.g. spherical coordinates. (Or maybe you would want to do this in general, i.e. use one internal representation.) I am not an expert, but this did occur to me as maybe a good concrete example.
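The idea of one internal representation can be sketched roughly as follows. The class name Point3D and its factory methods are hypothetical, and the sketch picks cartesian rather than spherical coordinates as the canonical form (an assumption made here because spherical angles are not unique at r = 0). Exact floating-point comparison is fragile after trigonometric conversions, so treat this as an illustration rather than a robust implementation.

```java
import java.util.Objects;

// Hypothetical class: every factory converts its input to one canonical
// (cartesian) representation, so equals() only ever compares that form.
final class Point3D {
    private final double x, y, z;

    private Point3D(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    static Point3D ofCartesian(double x, double y, double z) {
        return new Point3D(x, y, z);
    }

    // Cylindrical (r, phi, z): phi is the azimuth in radians.
    static Point3D ofCylindrical(double r, double phi, double z) {
        return new Point3D(r * Math.cos(phi), r * Math.sin(phi), z);
    }

    // Spherical (r, theta, phi): theta is the polar angle from the z axis,
    // phi the azimuth, both in radians.
    static Point3D ofSpherical(double r, double theta, double phi) {
        return new Point3D(r * Math.sin(theta) * Math.cos(phi),
                           r * Math.sin(theta) * Math.sin(phi),
                           r * Math.cos(theta));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point3D)) {
            return false;
        }
        Point3D p = (Point3D) o;
        // Exact comparison of the canonical fields; no conversion needed here.
        return Double.compare(x, p.x) == 0
            && Double.compare(y, p.y) == 0
            && Double.compare(z, p.z) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    public static void main(String[] args) {
        Point3D a = Point3D.ofCylindrical(2.0, 0.0, 5.0);
        Point3D b = Point3D.ofCartesian(2.0, 0.0, 5.0);
        System.out.println(a.equals(b));
    }
}
```

Because the conversion happens once at construction time, equals and hashCode never need to know which coordinate system the caller used.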

User · Answer

I believe there are two related uses of canonical: forms and instances.

A canonical form means that values of a particular type of resource can be described or represented in multiple ways, and one of those ways is chosen as the favored canonical form. (That form is canonized, like books that made it into the bible, and the other forms are not.) A classic example of a canonical form is paths in a hierarchical file system, where a single file can be referenced in a number of ways:

    myFile.txt                                   # in current working dir
    ../conf/myFile.txt                           # relative to the CWD
    /apps/tomcat/conf/myFile.txt                 # absolute path using symbolic links
    /u1/local/apps/tomcat-5.5.1/conf/myFile.txt  # absolute path with no symlinks

The classic definition of the canonical representation of that file would be the last path. With local or relative paths you cannot globally identify the resource without contextual information. With absolute paths you can identify the resource, but cannot tell if two paths refer to the same entity. With two or more paths converted to their canonical forms, you can do all the above, plus determine whether two resources are the same or not, if that is important to your application (solve the aliasing problem).

Note that the canonical form of a resource is not a quality of that particular form itself; there can be multiple possible canonical forms for a given type like file paths (say, lexicographically first of all possible absolute paths). One form is just selected as the canonical form for a particular application reason, or maybe arbitrarily so that everyone speaks the same language.

Forcing objects into their canonical instances is the same basic idea, but instead of determining one "best" representation of a resource, it arbitrarily chooses one instance of a class of instances with the same "content" as the canonical reference, then converts all references to equivalent objects to use the one canonical instance.

This can be used as a technique for optimizing both time and space. If there are multiple instances of equivalent objects in an application, then by forcing them all to be resolved as the single canonical instance of a particular value, you can eliminate all but one of each value, saving space and possibly time, since you can now compare those values with reference identity (==) as opposed to object equivalence (the equals() method).

A classic example of optimizing performance with canonical instances is collapsing strings with the same content. Calling String.intern() on two strings with the same character sequence is guaranteed to return the same canonical String object for that text. If you pass all your strings through that canonicalizer, you know equivalent strings are actually identical object references, i.e., aliases.

The enum types in Java 5.0+ force all instances of a particular enum value to use the same canonical instance within a VM, even if the value is serialized and deserialized. That is why you can use if (day == Days.SUNDAY) with impunity in Java if Days is an enum type. Doing this for your own classes is certainly possible, but takes care. Read Effective Java by Josh Bloch for details and advice.
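A small sketch of the three ideas in this answer, under stated assumptions: the class CanonicalInstanceDemo and its helper methods are hypothetical names, and the path helper uses Path.normalize(), which is purely lexical (it removes "." and "name/.." segments) and does not resolve symbolic links. Resolving symlinks as in the last path of the listing needs a call that touches the file system, such as Path.toRealPath() or File.getCanonicalPath().

```java
import java.nio.file.Paths;

// Hypothetical demo class; only the JDK calls (intern, normalize, enum ==) are real APIs.
final class CanonicalInstanceDemo {
    enum Days { SATURDAY, SUNDAY }

    // Two distinct String objects with equal content map to one interned instance.
    static boolean internedAreIdentical(String text) {
        String a = new String(text); // deliberately a fresh object
        String b = new String(text); // and another one
        return a != b && a.intern() == b.intern();
    }

    // Enum constants are canonical instances, so reference comparison is safe.
    static boolean isSunday(Days day) {
        return day == Days.SUNDAY;
    }

    // Lexical canonical form only: symbolic links are NOT resolved here.
    static boolean sameAfterNormalize(String first, String second) {
        return Paths.get(first).normalize().equals(Paths.get(second).normalize());
    }

    public static void main(String[] args) {
        System.out.println(internedAreIdentical("canonical"));
        System.out.println(isSunday(Days.valueOf("SUNDAY")));
        System.out.println(sameAfterNormalize("conf/./sub/../myFile.txt", "conf/myFile.txt"));
    }
}
```

Interning every string is not free (the pool lookup itself costs time), so whether it pays off depends on how often the values are compared afterwards.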

User · Answer

Wikipedia points to the term Canonicalization:

"A process for converting data that has more than one possible representation into a 'standard' canonical representation. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order."

The Unicode example made the most sense to me:

"Variable-length encodings in the Unicode standard, in particular UTF-8, have more than one possible encoding for most common characters. This makes string validation more complicated, since every possible encoding of each string character must be considered. A software implementation which does not consider all character encodings runs the risk of accepting strings considered invalid in the application design, which could cause bugs or allow attacks. The solution is to allow a single encoding for each character. Canonicalization is then the process of translating every string character to its single allowed encoding. An alternative is for software to determine whether a string is canonicalized, and then reject it if it is not. In this case, in a client/server context, the canonicalization would be the responsibility of the client."

In summary: a standard form of representation for data. From this form you can then convert to any representation you may need.
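A related case that is easy to try in Java is Unicode normalization, where the same visible character can be written as different code point sequences. This is not the UTF-8 byte-encoding issue the quote describes, but it follows the same pattern of "translate to the single allowed form, or reject what is not already in it". The class UnicodeCanonicalizer is a hypothetical wrapper around the JDK's java.text.Normalizer, and the choice of NFC as the canonical form is an assumption.

```java
import java.text.Normalizer;

// Hypothetical helper around java.text.Normalizer; NFC is chosen here as the canonical form.
final class UnicodeCanonicalizer {
    // Translate any equivalent spelling to the canonical (NFC) one.
    static String canonicalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // The "reject if not canonical" alternative: just test, do not convert.
    static boolean isCanonical(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";    // e with acute accent as one code point
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent
        System.out.println(composed.equals(decomposed));
        System.out.println(canonicalize(composed).equals(canonicalize(decomposed)));
    }
}
```

Comparing the raw strings and comparing their canonical forms give different answers, which is exactly why a canonicalization step is useful before equality checks.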

User · Answer

The word "canonical" is just a synonym for "standard" or "usual". It doesn't have any Java-specific meaning.

User · Answer

The OP's questions about canonical form and how it can improve the performance of the equals method can both be answered by extending the example provided in Effective Java.

Consider the following class:

    import java.util.Objects;

    public final class CaseInsensitiveString {
        private final String s;

        public CaseInsensitiveString(String s) {
            this.s = Objects.requireNonNull(s);
        }

        @Override
        public boolean equals(Object o) {
            // Case-insensitive comparison is done on every call.
            return o instanceof CaseInsensitiveString &&
                ((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
        }
    }

The equals method in this example has added cost by using String's equalsIgnoreCase method. As mentioned in the text:

"you may want to store a canonical form of the field so the equals method can do a cheap exact comparison on canonical forms rather than a more costly nonstandard comparison."

What does Joshua Bloch mean when he says canonical form? Well, I think Dónal's concise answer is very appropriate. We can store the underlying String field in the CaseInsensitiveString example in a standard way, perhaps the uppercase form of the String. Now you can reference this canonical form of the CaseInsensitiveString, its uppercase variant, and perform cheap evaluations in your equals and hashCode methods.
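A minimal sketch of that suggestion, assuming the uppercase form is the canonical one. The class name CanonicalCaseInsensitiveString and the extra field are hypothetical and not taken from the book. Note also that toUpperCase(Locale.ROOT) is not guaranteed to agree with equalsIgnoreCase for every Unicode string (some characters change length when uppercased), so this changes the equality semantics slightly for such input.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical variant: the canonical (uppercase) form is computed once in the
// constructor, so equals() and hashCode() use a plain exact comparison.
final class CanonicalCaseInsensitiveString {
    private final String original;  // kept only for display
    private final String canonical; // assumed canonical form: uppercase, Locale.ROOT

    CanonicalCaseInsensitiveString(String s) {
        this.original = Objects.requireNonNull(s);
        this.canonical = s.toUpperCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CanonicalCaseInsensitiveString
            && ((CanonicalCaseInsensitiveString) o).canonical.equals(canonical);
    }

    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }

    public static void main(String[] args) {
        CanonicalCaseInsensitiveString a = new CanonicalCaseInsensitiveString("Polish");
        CanonicalCaseInsensitiveString b = new CanonicalCaseInsensitiveString("pOLISH");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

The trade-off is one extra String per instance in exchange for cheaper comparisons, which tends to be worthwhile when instances are compared or hashed far more often than they are created.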

User · Answer

Canonical data in an RDBMS or graph database: think of it as the "normalization" or "normal form" of data in an RDBMS. When the same data is needed by different tables, it is stored once, given a unique identifier, and referenced by that identifier from the other tables. Or think of a single form of a piece of data in a graph database that is referenced by many triples. The major benefit is to make DML (data manipulation) more efficient, since you can upsert (insert/update) only one value instead of many.

User · Answer

Reduced to the simplest and most significant form without losing generality.

User · Answer

An easy way to remember it is the way "canonical" is used in theological circles: canonical truth is the real truth, so if two people find it, they have found the same truth. Same with a canonical instance. If you think you have found two of them (i.e. a.equals(b)), you really only have one (i.e. a == b). So equality implies identity in the case of canonical objects.

Now for the comparison. You have the choice of using a == b or a.equals(b), since they will produce the same answer in the case of canonical instances. But a == b is a comparison of the references: the JVM can compare two references extremely rapidly, as they are just two fixed-width bit patterns (typically 32 or 64 bits, depending on the JVM), whereas a.equals(b) is a method call and involves more overhead.
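One way to get canonical instances for your own class is a static factory backed by a pool, sketched below. The class Symbol and its of method are hypothetical names. The sketch assumes the pool may grow without bound and ignores serialization, both of which a real implementation would have to address.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class with canonical instances: at most one Symbol exists per name,
// so a.equals(b) and a == b always agree.
final class Symbol {
    private static final Map<String, Symbol> POOL = new ConcurrentHashMap<>();

    private final String name;

    // Private constructor: callers must go through of(), which canonicalizes.
    private Symbol(String name) {
        this.name = name;
    }

    static Symbol of(String name) {
        // Returns the pooled instance, creating it only on first request.
        return POOL.computeIfAbsent(name, Symbol::new);
    }

    String name() {
        return name;
    }

    // equals() and hashCode() are deliberately inherited from Object:
    // with one instance per value, identity is equality.

    public static void main(String[] args) {
        Symbol a = Symbol.of("x");
        Symbol b = Symbol.of(new String("x")); // equal content, different String object
        System.out.println(a == b);
    }
}
```

The cost moves from comparison time to creation time: every call to of pays for a map lookup so that later comparisons can use ==.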

User · Answer

A canonical form means a naturally unique representation of the element.

User · Answer

Canonical representation relates to viewing the same character written in different styles: for example, if I write the letter A, another person may write the letter A in a different style. This usage comes from the field of optical character recognition (OCR).

User · Answer

A good example for understanding "canonical form/representation" is to look at the XML Schema datatype definition of "boolean": the "lexical representation" of boolean can be one of {true, false, 1, 0}, whereas the "canonical representation" can only be one of {true, false}.

This, in essence, means that "true" and "1" get mapped to the canonical representation "true", and "false" and "0" get mapped to the canonical representation "false".

See the W3C XML Schema datatype definition for boolean.
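The mapping from lexical to canonical representation can be sketched in a few lines. The class XsdBoolean and its canonicalize method are hypothetical names, not part of any XML library, and the sketch assumes the input has already had surrounding whitespace removed.

```java
// Hypothetical helper: maps each lexical form of an XML Schema boolean
// to its canonical form, and rejects anything else.
final class XsdBoolean {
    static String canonicalize(String lexical) {
        switch (lexical) {
            case "true":
            case "1":
                return "true";
            case "false":
            case "0":
                return "false";
            default:
                throw new IllegalArgumentException(
                    "not a boolean literal: " + lexical);
        }
    }

    public static void main(String[] args) {
        // Two different lexical forms share one canonical form.
        System.out.println(canonicalize("1").equals(canonicalize("true")));
    }
}
```

Once values are stored in canonical form, two booleans can be compared with a plain string comparison, without having to remember that "1" and "true" mean the same thing.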
