OpenAFS VLDB Address Troubleshooting
Author: Nate Coraor <nate@psu.???>
Version: 0.1
Copyright: Creative Commons Attribution Non-Commercial Share Alike
Thanks: Andrew Deason, Derrick Brashear, Jeffrey Altman, Freenode #openafs, and the OpenAFS Jabber conference room.
Background
I have vlservers and fileservers in EC2. In EC2, your public IP address is not
assigned to any of the Ethernet interfaces configured in the instance. Instead,
a private (10.0.0.0/8) address is assigned.
The problem
Twice, I've ended up with private addresses in the VLDB where public addresses
should be. To avoid this, you need to create the following (on Debian/Ubuntu,
${afslocaldir} is /var/lib/openafs/local):
- ${afslocaldir}/NetInfo with contents:
      f <public-ip>
      <private-ip>
- ${afslocaldir}/NetRestrict with contents:
      <private-ip>
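If you have several fileservers, writing these files by hand gets repetitive. Below is a minimal shell sketch that writes both files in the format described above. It assumes you already know the instance's public and private addresses. The helper name write_netfiles is hypothetical, and the default directory is the Debian/Ubuntu ${afslocaldir}.

```sh
#!/bin/sh
# Sketch only: write_netfiles is a hypothetical helper, not part of OpenAFS.
# Writes NetInfo and NetRestrict in the format described in the text.
write_netfiles() {
    public_ip=$1                        # EC2 public address (not on any local interface)
    private_ip=$2                       # address actually configured on the interface
    dir=${3:-/var/lib/openafs/local}    # ${afslocaldir} on Debian/Ubuntu

    # NetInfo: the "f" line carries the public address, which is not bound to a
    # local interface; the private address follows on its own line.
    printf 'f %s\n%s\n' "$public_ip" "$private_ip" > "$dir/NetInfo" || return 1

    # NetRestrict: the private address, so it is kept out of the VLDB.
    printf '%s\n' "$private_ip" > "$dir/NetRestrict" || return 1
}
```

For example, run write_netfiles 203.0.113.10 10.1.2.3 (placeholder addresses) before starting dafs for the first time.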
If you start a dafs instance without these files, the private IP will be
registered in the VLDB and you will have to get rid of it.
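To check whether this has already happened, you can look for private addresses among the fileserver addresses registered in the VLDB. This sketch uses vos listaddrs -noresolve and assumes that each registered address is printed on a line of its own. It only matches the 10.0.0.0/8 range used in EC2. find_private_addrs is a hypothetical helper, and the VOS variable, which selects the vos binary, is also an assumption.

```sh
#!/bin/sh
# Sketch only: find_private_addrs is a hypothetical helper.
# VOS can be overridden to point at a different vos binary.
VOS=${VOS:-vos}

# Prints every 10.x.x.x address found in the VLDB server list.
# Exit status 0 means at least one private address is registered.
find_private_addrs() {
    # -noresolve prints raw IPs instead of hostnames; we assume one
    # address per line, possibly with surrounding whitespace.
    $VOS listaddrs -noresolve |
        grep -E '^[[:space:]]*10\.[0-9]+\.[0-9]+\.[0-9]+[[:space:]]*$'
}
```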
The solution
You want to remove the incorrect address and make sure that the fileserver is registered with the correct address. To do this:
- Fix your NetInfo and NetRestrict files.
- Use bos restart <fileserver-address> dafs to restart the fileserver with
the corrected address info. This will register the correct address(es) in
the VLDB.
- Use vos listvldb -server <private-ip> to confirm that no VLDB entries
reference the private IP (it should list 0 entries).
- Execute vos changeaddr <private-ip> -remove to remove the private IP from
the VLDB.
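The last two steps can be combined so that the address is removed only after vos listvldb reports no entries for it. This is a minimal sketch. It assumes that the summary line of vos listvldb reads "Total entries: N". remove_stale_addr is a hypothetical helper name, and the VOS override exists only for convenience.

```sh
#!/bin/sh
# Sketch only: remove_stale_addr is a hypothetical helper.
# VOS can be overridden to point at a different vos binary.
VOS=${VOS:-vos}

remove_stale_addr() {
    stale_ip=$1

    # Assumes vos listvldb ends with a line like "Total entries: N".
    count=$($VOS listvldb -server "$stale_ip" 2>/dev/null |
            sed -n 's/^Total entries: *\([0-9][0-9]*\).*/\1/p')

    # Refuse to touch the address while any volume still references it
    # (or when the count could not be determined at all).
    if [ "$count" != "0" ]; then
        echo "refusing to remove $stale_ip: ${count:-unknown} VLDB entries" >&2
        return 1
    fi
    $VOS changeaddr "$stale_ip" -remove
}
```

For example, run remove_stale_addr 10.1.2.3 (a placeholder) after bos restart has re-registered the fileserver.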
What not to do
- Do not use vos setaddrs.
- Do not use vos changeaddr <private-ip> <public-ip>.
- Do not remove the sysid file unless you are doing a full disaster recovery.
Disaster recovery
You can recover from a screwed up VLDB with the following steps (on
Debian/Ubuntu, ${afsdbdir} is /var/lib/openafs/db):
- bos stop <vlserver-address> vlserver for all of your vlservers
- rm ${afsdbdir}/vldb* on all of your vlservers, and rm ${afslocaldir}/sysid
on all of your fileservers. This removes the VLDB and the fileservers' sysid
files; each sysid file will be regenerated upon a dafs restart (with the
correct IPs in it)
- bos start <vlserver-address> vlserver for all of your vlservers
- Wait until udebug <vlserver-address> 7003 shows Recovery state 1f on
whichever vlserver is elected as the sync site
- bos restart <fileserver-address> dafs for all of your fileservers
- vos syncvldb <fileserver-address> for all of your fileservers
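You can script the wait step instead of re-running udebug by hand. This sketch polls each vlserver on port 7003 and returns once one of them reports Recovery state 1f. Following the text above, the elected sync site is the one expected to report it, and the function echoes that server's name. wait_for_vldb_recovery, its retry-count argument, and the UDEBUG and INTERVAL variables are hypothetical and are not part of OpenAFS.

```sh
#!/bin/sh
# Sketch only: wait_for_vldb_recovery is a hypothetical helper.
# UDEBUG can be overridden to point at a different udebug binary.
UDEBUG=${UDEBUG:-udebug}

# Usage: wait_for_vldb_recovery <rounds> <vlserver>...
# Echoes the vlserver reporting "Recovery state 1f" and returns 0,
# or returns 1 after <rounds> polling rounds without success.
wait_for_vldb_recovery() {
    tries=$1; shift
    interval=${INTERVAL:-5}   # seconds between rounds (assumed default)
    while [ "$tries" -gt 0 ]; do
        for vl in "$@"; do
            if $UDEBUG "$vl" 7003 2>/dev/null | grep -q 'Recovery state 1f'; then
                echo "$vl"
                return 0
            fi
        done
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, after bos start on each vlserver, run wait_for_vldb_recovery 60 <vlserver-1> <vlserver-2> <vlserver-3>, and only then continue with bos restart and vos syncvldb on the fileservers.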
Alternatively, Michael Meffie has a tool in development that can repair damaged VLDBs.