There is a small problem with Firebird 2.5 SuperServer and MacOSX 10.7 (Lion)...
SuperServer will not start, and produces a crash report header similar to this
Update 26th August 2011
Its not just 2.5 SuperServer, reports have come in that SuperServer for Firebird 2.1.4 will not start, also the same problem exists for Firebird 2.5 SuperClassic.
Process: fbserver [706]
Path: /Library/Frameworks/firebird.framework/Versions/A/Resources/bin/fbserver
Identifier: fbserver
Version: ??? (???)
Code Type: X86 (Native)
Parent Process: fbguard [166]
Date/Time: 2011-06-10 07:59:42.374 -0400
OS Version: Mac OS X 10.7 (11A480b)
Report Version: 9
Anonymous UUID: B7EDF790-CE72-4B21-A982-B9EA4F4E2088
Crashed Thread: 1 Dispatch queue: com.apple.libdispatch-manager
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Application Specific Information:
BUG IN CLIENT OF LIBDISPATCH: Do not close random Unix descriptors
The bug is fixed in Firebird 2.5.1 but Firebird 2.5.1 is not available yet. So I have fixed the problem also in the Firebird 2.5 code and produced 32bit and 64bit builds that work properly.
I have had new builds uploaded to Sourceforge to fix the problem with Firebird
2.1.4 SuperServer, you can distinguish the new builds from the old by the build no.
Old = 18393, New = 18393. Because we have some issues with 2.5.1 that are currently being investigated, I plan to update the 2.5 builds on Sourceforge soon.
Update 28th Sep 2011
The Firebird 2.5 builds have also been replaced on Sourceforge, so you just need to download the latest build. However Firebird 2.5 will be replaced by Firebird 2.5.1 very shortly, builds are now taking place and will be QA'd and released shortly.
Any problems - please let me know.
Tuesday, July 26, 2011
Monday, July 4, 2011
Firebird V2.1 Error: value exceeds the range for valid dates
You are restoring a backup of a Firebird 1.5 or 2.0 database to Firebird 2.1 and you see an error similar to this:
gbak: writing data for table xyz
gbak:20000 records written
gbak: Error: value exceeds the range for valid dates
gbak: Error: gds_$receive failed
gbak:Exiting before completion due to errors
How do you go about solving the problem?
Well first of all a quick visit to the Firebird bug tracker reveals
http://tracker.firebirdsql.org/browse/CORE-1714 and a couple of comments from Dimitry Yemanov.
“The error means to say that some column has an invalid date value (outside the supported range). Prior to V2.1, it was possible to store such invalid values in the database, but now it's prohibited. A verbose output should point you to a problematic table.” Also “The current behavior is intended and is unlikely to be changed.”
1. Firstly use the –v (verbose option of gbak) to find out the table that is causing the problem.
2. Check the table xyz for date columns
3. Perform the following SQL operation on all the date columns you found in 2.
isql> update XYZ set MY_DATE_COLUMN = MY_DATE_COLUMN;
At some point during the process you will see the same error that gbak produced, but now you will know which column is causing the problem.
4. Install Firebird 1.5 or 2.0, by going back to the version of Firebird that allows for invalid dates, you will be at least able to correct the invalid date to something more appropriate.
5. Now lets check for dates that are outside of their proper range (01 Jan 0001 - 31 Dec 9999)
Below date zero:
isql> select PRIMARY_KEY from XYZ where MY_DATE_COLUMN < '0001-01-01'
if an error occurs correct the date to something more appropriate and meaningful
Maximum date
isql> select PRIMARY_KEY from XYZ where MY_DATE_COLUMN > '9999-12-31'
if an error occurs correct the date to something more appropriate and meaningful
6. You can now backup the database using Firebird 1.5 or 2.1 and successfully restore under 2.1
gbak: writing data for table xyz
gbak:20000 records written
gbak: Error: value exceeds the range for valid dates
gbak: Error: gds_$receive failed
gbak:Exiting before completion due to errors
How do you go about solving the problem?
Well first of all a quick visit to the Firebird bug tracker reveals
http://tracker.firebirdsql.org/browse/CORE-1714 and a couple of comments from Dimitry Yemanov.
“The error means to say that some column has an invalid date value (outside the supported range). Prior to V2.1, it was possible to store such invalid values in the database, but now it's prohibited. A verbose output should point you to a problematic table.” Also “The current behavior is intended and is unlikely to be changed.”
1. Firstly use the –v (verbose option of gbak) to find out the table that is causing the problem.
2. Check the table xyz for date columns
3. Perform the following SQL operation on all the date columns you found in 2.
isql> update XYZ set MY_DATE_COLUMN = MY_DATE_COLUMN;
At some point during the process you will see the same error that gbak produced, but now you will know which column is causing the problem.
4. Install Firebird 1.5 or 2.0, by going back to the version of Firebird that allows for invalid dates, you will be at least able to correct the invalid date to something more appropriate.
5. Now lets check for dates that are outside of their proper range (01 Jan 0001 - 31 Dec 9999)
Below date zero:
isql> select PRIMARY_KEY from XYZ where MY_DATE_COLUMN < '0001-01-01'
if an error occurs correct the date to something more appropriate and meaningful
Maximum date
isql> select PRIMARY_KEY from XYZ where MY_DATE_COLUMN > '9999-12-31'
if an error occurs correct the date to something more appropriate and meaningful
6. You can now backup the database using Firebird 1.5 or 2.1 and successfully restore under 2.1
Tuesday, May 10, 2011
Firebird 3.0 - Shared Page Cache
Posted by Vlad on the Firebird Development List.
After more than a year of development, testing, switching on another tasks and returning back I'm ready to commit shared page cache implementation into trunk.
I consider it stable enough to be committed into SVN. It runs some stability tests more than 10 hours for longest run and many shorter runs at different configurations.
Here I want to do overview of changes, what was tested, what was not, and what is work in progress.
Page cache was re-implemented looking at both Vulcan code (thanks to Jim) and our code base. It is not a blind copy of Vulcan code, every decision was thinking on and adjusted when needed with my understanding and expirience with existing code.
The cache is syncronized at different levels. There is a few BufferControl (BCB) level sync's used to syncronize access to the common structures such as LRU queue, Precedence graph, Dirty page list and so one. These common sync's should be locked at as short as possible period of time as this is kind of global locks. Also there is a two syncs for every BufferDesc (BDB) - one for guard contents of page buffer tself and second one is to prevent partial writes of page wich edition is not completed and to prevent simultaneous write of the same page by different threads.
I don't explain cache algoritms here, if someone have a question I'll be happy to answer it. And I not going to said that page cache implementation is frozen - if necessary we will change it of course (there is still space for impovements and bugs fixing).
You know I already committed SyncObject class (and related classes) ported from Vulcan. I removed some unused code and restored fairness of locking : initially it was fair in Vulcan too (i.e. all lock requests was put in waiting queue and granted in first-in-first-out, or fair, order), but later Jim found some (unknown to me) ssue with this and made preference for SHARED locks. I found no performance issues with fair locking so I decided to revert that and restore original behavior. Of course it could be changed when necessary.
Shared page cache is not an isolated change. It affects many parts of engine and our syncronization model changed significantly again. There was agreement that we will not implement shared metadata cache in v3 as this is risky and we could not deliver V3 in reasonable time frame.
In shared cache mode we have single Database instance and many attachment instances linked to it (this is not new).
All metadata objects moved into Attachment. Metadata syncronization is guarded by attachment's mutex now. Database::SyncGuard and company are replaced by corresponding Attachment::XXX classes.
To make AST's work we need to release attachment mutex sometimes. This is very important change after v2.5 : in v2.5 attachment mutex is locked during whole duration of API call and no other API call (except of asyncronous fb_cancel_operation) could work with "busy" attacment. In V3 this rule is not worked anymore. So, now we could run more that one API call on the same attachment (of course not really simultaneously). I'm not sure it is safe but not disabled it so far.
To make asyncronous detach safe I introduced att_use_count counter which is incremented each time when API call is entered and decremented on exit. detach now marks attachment as shutdown and waits for att_use_count == 0 before processing.
Parallel access to the attachment could be easy disabled making every API call wait for att_use_count == 0 on enter or even introducing one more mutex to avoid spin wait.
Also it seems this counter make obsolete att_in_use member as detach call should wait for att_use_count == 0 and drop call should return "object is in use" if att_use_count != 0.
All AST's related to attachment-level objects should take attachment mutex before access attachment internals. This is implemented but not tested!
Transaction inventory pages cache (TIP cache) was reworked and is shared by all attachments.
To avoid contention on common dbb_pool its usage was replaced by att_pool when possible. To make this task slightly easy there was introduced jrd_rel::rel_pool which points currently to the attachment's pool. All relation's "sub-objects" (such as formats, fields, etc) is allocated using rel_pool (it was dbb_pool before). When we'll return metadata objects back to the Database it will be easy to redirect rel_pool to dbb_pool at one place in code instead of makeing tens of small changes again.
About stability testing of different parts of the engine:
- page cache - tested and worked
- nbackup - tested and worked
- monitoring - tested (less) and seems worked
- server stop\database shutdown - somewhat tested, no crash observed, client reaction is not perfect (mostly network write errors)
- shadow - not tested
- metadata changes in concurrent environment - not tested
- garbage collection thread - not tested, needs review and rework of some
implementation details
- cache writer thread - not tested, needs review and rework
- external connections - not tested
- external engines - not tested
In configuration file there are two new settings :
a) SharedCache - boolean value which rules the page cache mode
b) SharedDatabase - boolean value which which rules the database file open mode.
Currently they are common (as whole configuration) but soon it will be per-database settings. Below is few examples of how it could be used :
- SharedCache = true, SharedDatabase = false (default mode)
this is traditional SuperServer mode when all attachments share page cache and database file is opens in exclusive mode (only one server process could work with database)
- SharedCache = false, SharedDatabase = true
this is Classic mode when each attachment have its own page cache and many server processes could work with the same database.
To run SuperClassic you should use switch -m in command line of firebird.exe (on Windows) or run fb_smp_server (on Posix, here i'm not sure and Alex will correct me)
Else ClassicServer will run.
- SharedCache = true, SharedDatabase = true
this is completely new mode in which database file could be opened by many server processes and each process could handle many attachments which will share page cache (i.e. per-process page cache).
It could be used to run few SS processes, for example, or to run "main" SS process and have ability to attach using embedded to make some special tasks.
Must note that our lock manager is not ready to work in this mode, so we can't use it right now.
Also there is unknown how performance will be affected when few SS with big cache will work with the same database.
- SharedCache = false, SharedDatabase = false
Looks like single process with single attachment will be allowed to work with database with such settings. Probably you can find an applications for it ;)
One more change in configuration is that CpuAffinityMask setting changed its default value and it is 0 now. It allows new SS to use all available CPU's\cores by default.
After more than a year of development, testing, switching on another tasks and returning back I'm ready to commit shared page cache implementation into trunk.
I consider it stable enough to be committed into SVN. It runs some stability tests more than 10 hours for longest run and many shorter runs at different configurations.
Here I want to do overview of changes, what was tested, what was not, and what is work in progress.
Page cache was re-implemented looking at both Vulcan code (thanks to Jim) and our code base. It is not a blind copy of Vulcan code, every decision was thinking on and adjusted when needed with my understanding and expirience with existing code.
The cache is syncronized at different levels. There is a few BufferControl (BCB) level sync's used to syncronize access to the common structures such as LRU queue, Precedence graph, Dirty page list and so one. These common sync's should be locked at as short as possible period of time as this is kind of global locks. Also there is a two syncs for every BufferDesc (BDB) - one for guard contents of page buffer tself and second one is to prevent partial writes of page wich edition is not completed and to prevent simultaneous write of the same page by different threads.
I don't explain cache algoritms here, if someone have a question I'll be happy to answer it. And I not going to said that page cache implementation is frozen - if necessary we will change it of course (there is still space for impovements and bugs fixing).
You know I already committed SyncObject class (and related classes) ported from Vulcan. I removed some unused code and restored fairness of locking : initially it was fair in Vulcan too (i.e. all lock requests was put in waiting queue and granted in first-in-first-out, or fair, order), but later Jim found some (unknown to me) ssue with this and made preference for SHARED locks. I found no performance issues with fair locking so I decided to revert that and restore original behavior. Of course it could be changed when necessary.
Shared page cache is not an isolated change. It affects many parts of engine and our syncronization model changed significantly again. There was agreement that we will not implement shared metadata cache in v3 as this is risky and we could not deliver V3 in reasonable time frame.
In shared cache mode we have single Database instance and many attachment instances linked to it (this is not new).
All metadata objects moved into Attachment. Metadata syncronization is guarded by attachment's mutex now. Database::SyncGuard and company are replaced by corresponding Attachment::XXX classes.
To make AST's work we need to release attachment mutex sometimes. This is very important change after v2.5 : in v2.5 attachment mutex is locked during whole duration of API call and no other API call (except of asyncronous fb_cancel_operation) could work with "busy" attacment. In V3 this rule is not worked anymore. So, now we could run more that one API call on the same attachment (of course not really simultaneously). I'm not sure it is safe but not disabled it so far.
To make asyncronous detach safe I introduced att_use_count counter which is incremented each time when API call is entered and decremented on exit. detach now marks attachment as shutdown and waits for att_use_count == 0 before processing.
Parallel access to the attachment could be easy disabled making every API call wait for att_use_count == 0 on enter or even introducing one more mutex to avoid spin wait.
Also it seems this counter make obsolete att_in_use member as detach call should wait for att_use_count == 0 and drop call should return "object is in use" if att_use_count != 0.
All AST's related to attachment-level objects should take attachment mutex before access attachment internals. This is implemented but not tested!
Transaction inventory pages cache (TIP cache) was reworked and is shared by all attachments.
To avoid contention on common dbb_pool its usage was replaced by att_pool when possible. To make this task slightly easy there was introduced jrd_rel::rel_pool which points currently to the attachment's pool. All relation's "sub-objects" (such as formats, fields, etc) is allocated using rel_pool (it was dbb_pool before). When we'll return metadata objects back to the Database it will be easy to redirect rel_pool to dbb_pool at one place in code instead of makeing tens of small changes again.
About stability testing of different parts of the engine:
- page cache - tested and worked
- nbackup - tested and worked
- monitoring - tested (less) and seems worked
- server stop\database shutdown - somewhat tested, no crash observed, client reaction is not perfect (mostly network write errors)
- shadow - not tested
- metadata changes in concurrent environment - not tested
- garbage collection thread - not tested, needs review and rework of some
implementation details
- cache writer thread - not tested, needs review and rework
- external connections - not tested
- external engines - not tested
In configuration file there are two new settings :
a) SharedCache - boolean value which rules the page cache mode
b) SharedDatabase - boolean value which which rules the database file open mode.
Currently they are common (as whole configuration) but soon it will be per-database settings. Below is few examples of how it could be used :
- SharedCache = true, SharedDatabase = false (default mode)
this is traditional SuperServer mode when all attachments share page cache and database file is opens in exclusive mode (only one server process could work with database)
- SharedCache = false, SharedDatabase = true
this is Classic mode when each attachment have its own page cache and many server processes could work with the same database.
To run SuperClassic you should use switch -m in command line of firebird.exe (on Windows) or run fb_smp_server (on Posix, here i'm not sure and Alex will correct me)
Else ClassicServer will run.
- SharedCache = true, SharedDatabase = true
this is completely new mode in which database file could be opened by many server processes and each process could handle many attachments which will share page cache (i.e. per-process page cache).
It could be used to run few SS processes, for example, or to run "main" SS process and have ability to attach using embedded to make some special tasks.
Must note that our lock manager is not ready to work in this mode, so we can't use it right now.
Also there is unknown how performance will be affected when few SS with big cache will work with the same database.
- SharedCache = false, SharedDatabase = false
Looks like single process with single attachment will be allowed to work with database with such settings. Probably you can find an applications for it ;)
One more change in configuration is that CpuAffinityMask setting changed its default value and it is 0 now. It allows new SS to use all available CPU's\cores by default.
Thursday, March 31, 2011
OAT and MON$
Just as a "for your information", should anybody be interested.
The following query on Firebird V2.x gets the attachment that holds the oldest active transaction via the monitoring tables.
SELECT *
FROM MON$ATTACHMENTS
where MON$ATTACHMENT_ID =
(select MON$ATTACHMENT_ID FROM MON$TRANSACTIONS
where MON$TRANSACTION_ID =
(SELECT min(MON$TRANSACTION_ID) FROM MON$TRANSACTIONS))
The following query on Firebird V2.x gets the attachment that holds the oldest active transaction via the monitoring tables.
SELECT *
FROM MON$ATTACHMENTS
where MON$ATTACHMENT_ID =
(select MON$ATTACHMENT_ID FROM MON$TRANSACTIONS
where MON$TRANSACTION_ID =
(SELECT min(MON$TRANSACTION_ID) FROM MON$TRANSACTIONS))
HPUX Oracle and Itanium
The announcement by Oracle last week that it will stop developing software for the Itanium processor will affect users of Oracle on HPUX Integrity servers. No surprise then that we see the following blog post - Oracle is abandoning HP-UX customers…Fight back! from Ed Boyajian the CEO of EnterpriseDB suggesting that users should consider migrating to their version of PostgreSQL, even though as he admits PostgreSQL has not supported the HP-UX platform very well, but that is now going to change.
For those who are curious, Firebird V2.5 builds just fine on Itanium HPUX (IA-64). See prefix.hpux_aCC and prefix.hpux_ia64 in the builds/posix directory. The only issue we have in providing such builds, is access to the relevant hardware/operating system to do a build, and bringing upto date the HPUX installer.
For those who are curious, Firebird V2.5 builds just fine on Itanium HPUX (IA-64). See prefix.hpux_aCC and prefix.hpux_ia64 in the builds/posix directory. The only issue we have in providing such builds, is access to the relevant hardware/operating system to do a build, and bringing upto date the HPUX installer.
Wednesday, February 9, 2011
Forced Writes?
Forced Writes are turned ON by default on Windows since the release of Firebird V1.0.
See the following item on the tracker: CORE-591
But Forced Writes were OFF on Linux even if they were turned ON, until an issue with Linux (Posix) was resolved in Firebird V2.1 Beta 2 and this fix has now also ben backported to Firebird V2.0.4.
See the following item on the tracker: CORE-1476
There was some discussion amongst the Firebird developers about the change (either to OFF on Win or to ON on Linux so we would have consistent defaults), but so far there has not been a final resolution since Windows and Linux handle I/O differently, and its perfectly valid to have different default settings. Before the fix in Firebird V2.1 Linux never did forced writes even when asked to, and pretty much everybody got used to the performance with Forced Writes off, when we turned them back on correctly the performance decrement was definitely going to be a problem.
Basically, Linux writes pages out of the page cache as often as possible, so the file on disk stays pretty much current, whilst Windows writes to disk only when it must, so the difference between the disk and cache pre Firebird V1.5 could be enormous - hours or days of changes if the database file was not closed correctly. So, although it may be preferable that all systems should run with forced writes on, for Firebird on Windows it was pretty critical.
Within Firebird V1.5 we attempted to make sure that the issue of Windows cache, the Firebird cache and what was on disk could not be vastly different even if Forced Writes were turn off by introducing the following two parameters in the Firebird configuration file. However the new parameters do not guarentee complete consistency between the cache and disk, for that you do need Forced Writes turned on.
The parameters:
# ----------------------------
#
# How often the pages are flushed on disk
# (for databases with ForcedWrites=Off only)
#
# Number of unflushed writes which will accumulate before they are
# flushed, at the next transaction commit. For non-Win32 ports,
# the default value is -1 (Disabled)
#
# Type: integer
#
#MaxUnflushedWrites = 100
#
# Number of seconds during which unflushed writes will accumulate
# before they are flushed, at the next transaction commit. For non-Win32
# ports, the default value is -1 (Disabled)
#
# Type: integer
#
#MaxUnflushedWriteTime = 5
See the following item on the tracker: CORE-591
But Forced Writes were OFF on Linux even if they were turned ON, until an issue with Linux (Posix) was resolved in Firebird V2.1 Beta 2 and this fix has now also ben backported to Firebird V2.0.4.
See the following item on the tracker: CORE-1476
There was some discussion amongst the Firebird developers about the change (either to OFF on Win or to ON on Linux so we would have consistent defaults), but so far there has not been a final resolution since Windows and Linux handle I/O differently, and its perfectly valid to have different default settings. Before the fix in Firebird V2.1 Linux never did forced writes even when asked to, and pretty much everybody got used to the performance with Forced Writes off, when we turned them back on correctly the performance decrement was definitely going to be a problem.
Basically, Linux writes pages out of the page cache as often as possible, so the file on disk stays pretty much current, whilst Windows writes to disk only when it must, so the difference between the disk and cache pre Firebird V1.5 could be enormous - hours or days of changes if the database file was not closed correctly. So, although it may be preferable that all systems should run with forced writes on, for Firebird on Windows it was pretty critical.
Within Firebird V1.5 we attempted to make sure that the issue of Windows cache, the Firebird cache and what was on disk could not be vastly different even if Forced Writes were turn off by introducing the following two parameters in the Firebird configuration file. However the new parameters do not guarentee complete consistency between the cache and disk, for that you do need Forced Writes turned on.
The parameters:
# ----------------------------
#
# How often the pages are flushed on disk
# (for databases with ForcedWrites=Off only)
#
# Number of unflushed writes which will accumulate before they are
# flushed, at the next transaction commit. For non-Win32 ports,
# the default value is -1 (Disabled)
#
# Type: integer
#
#MaxUnflushedWrites = 100
#
# Number of seconds during which unflushed writes will accumulate
# before they are flushed, at the next transaction commit. For non-Win32
# ports, the default value is -1 (Disabled)
#
# Type: integer
#
#MaxUnflushedWriteTime = 5
Colissimo - What a joke.
If you want to send a package to anywhere in France, La Poste insist that you send it by Colissimo. Colissimo is more "expensive" than normal post, but it has insurance just in case its delivered late, goes missing or whatever you are sending gets broken.
Recently I sent a couple of Firebird mugs via Colissimo to a customer/Firebird user in France. After receipt and opening of the carefully packed box, the customer found that one of the mugs was cracked. After a brief email exchange he agreed to send me the damaged mug and I would send him a new one.
Once I had the broken mug, I asked the local post office for the basic repayment for the damaged mug from Colissimo. Their response was to hand me a form to fill in, and suggest I post the completed form direct to Colissimo. OK - job done - now I sit back and wait for them to send me the money for something they had managed to break.
After a couple of weeks a letter arrives telling me that they are looking into it, two weeks later another letter arrives, this one apologises, and says we cannot pay you the insurance because the reciever of the damaged goods should have filled in the complaint form and not the sender.
WTF? When you receive damaged goods from someone, what is the first thing you do?
Well, I would have thought, you contact the sender, ask if they want it back and ask for a replacement.... Its then the senders problem to deal with the people he used to send it.
Not in France it seems. By now we have paid for three lots of Colissimo (send original, send it back, send another) and wasted a whole lot of time. The result - huge profits for Colissimo, zero service for the customer. I now give up and throw the whole lot in the bin. Life is just too short to start aguing with a "fonctionnaire".
Recently I sent a couple of Firebird mugs via Colissimo to a customer/Firebird user in France. After receipt and opening of the carefully packed box, the customer found that one of the mugs was cracked. After a brief email exchange he agreed to send me the damaged mug and I would send him a new one.
Once I had the broken mug, I asked the local post office for the basic repayment for the damaged mug from Colissimo. Their response was to hand me a form to fill in, and suggest I post the completed form direct to Colissimo. OK - job done - now I sit back and wait for them to send me the money for something they had managed to break.
After a couple of weeks a letter arrives telling me that they are looking into it, two weeks later another letter arrives, this one apologises, and says we cannot pay you the insurance because the reciever of the damaged goods should have filled in the complaint form and not the sender.
WTF? When you receive damaged goods from someone, what is the first thing you do?
Well, I would have thought, you contact the sender, ask if they want it back and ask for a replacement.... Its then the senders problem to deal with the people he used to send it.
Not in France it seems. By now we have paid for three lots of Colissimo (send original, send it back, send another) and wasted a whole lot of time. The result - huge profits for Colissimo, zero service for the customer. I now give up and throw the whole lot in the bin. Life is just too short to start aguing with a "fonctionnaire".
Subscribe to:
Posts (Atom)