I am a university student, and our task is to create a search engine. I am having difficulty generating a unique ID to assign to each URL when it is added to the frontier. I have tried the SHA-256 hashing algorithm as well as Guid. Here is the code that I used to implement the Guid approach:
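public string generateID(string url_add)
{
    // Note: this product of the GUID bytes is computed but never used below.
    long i = 1;
    foreach (byte b in Guid.NewGuid().ToByteArray())
    {
        i *= ((int)b + 1);
    }
    // The returned id depends only on the clock: microseconds modulo 10^9, zero-padded to 9 digits.
    string number = String.Format("{0:d9}", (DateTime.Now.Ticks / 10) % 1000000000);
    return number;
}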
This question is related to: c#, asp.net, .net, uniqueidentifier
Here is a YouTube-video-ID-style generator, producing ids such as "UcBKmq2XE5a":
// Requires: using System; using System.Linq; using System.Text;
StringBuilder builder = new StringBuilder();
Enumerable
    .Range(65, 26)                          // 'A'..'Z'
    .Select(e => ((char)e).ToString())
    .Concat(Enumerable.Range(97, 26).Select(e => ((char)e).ToString()))  // 'a'..'z'
    .Concat(Enumerable.Range(0, 10).Select(e => e.ToString()))           // '0'..'9'
    .OrderBy(e => Guid.NewGuid())           // shuffle the 62 characters
    .Take(11)                               // keep the first 11
    .ToList().ForEach(e => builder.Append(e));
string id = builder.ToString();
It creates random ids that are 11 characters long. To make them longer or shorter, change the argument of the Take method (up to 62, since each character is used at most once per id).
0.001% duplicates in 100 million.
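Because the snippet above shuffles the alphabet and takes a prefix, no character can appear twice in one id, which shrinks the space of possible ids. As a rough sketch of a variant that draws each character independently, the following uses a hypothetical `ShortId` helper (not part of the original answer) and assumes .NET Core 3.0 or later, where `RandomNumberGenerator.GetInt32` is available:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; the name ShortId is an assumption.
public static class ShortId
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Builds an id of the requested length, picking every character
    // independently, so characters may repeat within one id.
    public static string Generate(int length = 11)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));

        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            // GetInt32 returns a uniformly distributed index in [0, Alphabet.Length).
            builder.Append(Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)]);
        }
        return builder.ToString();
    }
}
```

Calling `ShortId.Generate()` yields an 11-character id and `ShortId.Generate(16)` a longer one. Random ids of this kind can still collide, so a uniqueness check against the ids already stored is still needed.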
Why not use a GUID?
Guid guid = Guid.NewGuid();
string str = guid.ToString();
Why not build the unique id as follows?
We can combine DateTime.Now.Ticks and Guid.NewGuid().ToString() to make a unique id.
Because DateTime.Now.Ticks is part of the id, we can later recover the date and time at which the id was created.
Please see the code:
var ticks = DateTime.Now.Ticks;
var guid = Guid.NewGuid().ToString();
var uniqueSessionId = ticks.ToString() + '-' + guid; // id created by combining ticks and guid
var datetime = new DateTime(ticks); // for checking purposes: reconstructs the creation time
var datetimenow = DateTime.Now; // a later reading, so it differs from datetime
We can also extract the ticks portion of the id later to look up its creation date and time.
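Recovering the creation time from such an id could look roughly like this. The `TickStampedId` class and its method names are hypothetical, and the sketch uses `DateTime.UtcNow` rather than `DateTime.Now` so that ids created on machines in different time zones remain comparable:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; names are assumptions.
public static class TickStampedId
{
    // Produces "<ticks>-<guid>", e.g. "638...-3f2504e0-...".
    public static string Create()
    {
        return DateTime.UtcNow.Ticks.ToString(CultureInfo.InvariantCulture)
               + "-" + Guid.NewGuid();
    }

    // Reads the ticks portion (everything before the first '-') back into a DateTime.
    public static DateTime GetCreationTime(string id)
    {
        int separator = id.IndexOf('-');
        if (separator <= 0) throw new FormatException("Expected '<ticks>-<guid>'.");

        long ticks = long.Parse(id.Substring(0, separator), CultureInfo.InvariantCulture);
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

Since the ticks portion contains only digits, the first hyphen in the string always marks the boundary, even though the GUID itself contains hyphens.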
If you want to use SHA-256 (a GUID would be faster), you would need to do something like:
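// Requires: using System; using System.Security.Cryptography; using System.Text;
using (SHA256 shaAlgorithm = SHA256.Create())
{
    byte[] shaDigest = shaAlgorithm.ComputeHash(Encoding.ASCII.GetBytes(url));
    return BitConverter.ToString(shaDigest); // hex pairs separated by '-'
}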
Of course, the encoding doesn't have to be ASCII, and any other hashing algorithm would work as well.
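One property of the hashing approach that matters for a crawler frontier is that it is deterministic: the same URL always maps to the same id, so duplicates can be detected by id. A minimal sketch, assuming UTF-8 encoding and lowercase hex output (the `UrlId` class name is hypothetical):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; the name UrlId is an assumption.
public static class UrlId
{
    // Returns the SHA-256 digest of the URL as 64 lowercase hex characters.
    public static string FromUrl(string url)
    {
        if (url == null) throw new ArgumentNullException(nameof(url));

        using (SHA256 sha = SHA256.Create())
        {
            // UTF-8 keeps non-ASCII characters distinct; ASCII encoding would map them all to '?'.
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(url));

            var hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }
}
```

Note that two spellings of the same address (for example with and without a trailing slash) hash to different ids, so URLs would need to be normalized before hashing if they should be treated as one entry.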
This question seems to be answered; however, for completeness, I will add another approach.
You can use a unique ID number generator based on Twitter's Snowflake ID generator. A C# implementation can be found here.
var id64Generator = new Id64Generator();
// ...
public string generateID(string sourceUrl)
{
    return string.Format("{0}_{1}", sourceUrl, id64Generator.GenerateId());
}
Note that one very nice feature of this approach is that multiple generators can run on independent nodes (probably useful for a search engine), each generating globally unique identifiers in real time.
// node 0
var id64Generator = new Id64Generator(0);
// node 1
var id64Generator = new Id64Generator(1);
// ... node 10
var id64Generator = new Id64Generator(10);
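To show why independent nodes do not collide, here is a rough sketch of a Snowflake-style generator. It is not the Id64Generator library's implementation. The class name, the 10-bit node and 12-bit sequence widths (modeled on the commonly described Snowflake layout), and the 2020 epoch are all assumptions chosen for illustration:

```csharp
using System;

// Hypothetical sketch of a Snowflake-style generator, not a library API.
public sealed class SnowflakeSketch
{
    private const int NodeBits = 10;      // assumed width: up to 1024 nodes
    private const int SequenceBits = 12;  // assumed width: up to 4096 ids per millisecond per node
    private const long MaxSequence = (1L << SequenceBits) - 1;

    // Arbitrary custom epoch, chosen only for this example.
    private static readonly DateTime Epoch = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    private readonly object _lock = new object();
    private readonly long _nodeId;
    private long _lastMillis = -1;
    private long _sequence;

    public SnowflakeSketch(int nodeId)
    {
        if (nodeId < 0 || nodeId >= (1 << NodeBits))
            throw new ArgumentOutOfRangeException(nameof(nodeId));
        _nodeId = nodeId;
    }

    public long NextId()
    {
        lock (_lock)
        {
            long now = CurrentMillis();
            if (now < _lastMillis) now = _lastMillis; // clock went backwards: reuse the last timestamp

            if (now == _lastMillis)
            {
                _sequence = (_sequence + 1) & MaxSequence;
                if (_sequence == 0)
                {
                    // Sequence exhausted for this millisecond: spin until the clock advances.
                    while ((now = CurrentMillis()) <= _lastMillis) { }
                }
            }
            else
            {
                _sequence = 0;
            }

            _lastMillis = now;
            // Layout: [timestamp | node id | sequence]
            return (now << (NodeBits + SequenceBits)) | (_nodeId << SequenceBits) | _sequence;
        }
    }

    private static long CurrentMillis()
    {
        return (long)(DateTime.UtcNow - Epoch).TotalMilliseconds;
    }
}
```

Because the node id occupies its own bit field, two generators constructed with different node ids cannot produce the same value even when they run in the same millisecond, provided each node id is assigned to only one running generator.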
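We can do something like this. Note that the first 10 digits of Ticks change only about once every 10 seconds, so ids generated within that window will be identical: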
string TransactionID = "BTRF"+DateTime.Now.Ticks.ToString().Substring(0, 10);
Source: Stackoverflow.com